2-AIN-506, 2-AIN-252: Seminar in Bioinformatics (2), (4)
Summer 2025
Abstract

Walfred Ma, Mark J.P. Chaisson. Genotyping sequence-resolved copy number variation using pangenomes reveals paralog-specific global diversity and expression divergence of duplicated genes. bioRxiv, 2025.

Download from publisher: https://www.biorxiv.org/content/10.1101/2024.08.11.607269v6

Abstract:

Copy number variant (CNV) genes are important in evolution and disease, yet 
sequence variation in CNV genes remains a blind spot in large-scale studies. We 
present ctyper, a method that leverages pangenomes to produce allele-specific 
copy numbers with locally phased variants from next-generation sequencing (NGS) 
reads. In benchmarks on 3,351 CNV genes, including HLA, SMN, and CYP2D6, and on
212 challenging medically relevant (CMR) genes that are poorly mapped by NGS,
ctyper captures 96.5% of phased variants with copy-number correctness of
≥99.1% on CNV genes, and 94.8% of phased variants on CMR genes. Using
alignment-free algorithms, ctyper requires 1.5 hours per genome on a single
CPU. The results substantially improve gene expression prediction compared
with known expression quantitative trait loci (eQTL) variants. Allele-specific
expression analysis identified divergent expression in 7.94% of paralogs and
tissue-specific biases in 4.68% of paralogs. We found reduced expression of
SMN2 due to SMN1 conversion, potentially
affecting spinal muscular atrophy, and increased expression of translocated 
duplications of AMY2B. Overall, ctyper enables biobank-scale genotyping of CNV 
and CMR genes.
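The abstract does not describe how allele-specific copy numbers are derived from reads without alignment. As a rough illustration only, the sketch below shows one simple k-mer counting idea; it is not ctyper's algorithm, and the function names (`kmers`, `estimate_copy_numbers`), the restriction to allele-specific k-mers, and the median-over-haploid-depth estimator are all assumptions made for this example. Each candidate allele's copy number is estimated as the median read count of k-mers found in that allele alone, divided by the expected per-copy (haploid) k-mer depth.

```python
from collections import Counter
from statistics import median
from typing import Dict, Iterator, Optional


def kmers(seq: str, k: int) -> Iterator[str]:
    """Yield every overlapping k-mer of seq (forward strand only)."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def estimate_copy_numbers(
    alleles: Dict[str, str],
    read_kmer_counts: Counter,
    haploid_depth: float,
    k: int,
) -> Dict[str, Optional[int]]:
    """Hypothetical estimator: copy number of each allele from its unique k-mers.

    Returns None for an allele with no k-mers unique to it, since its
    copy number cannot be separated from the other alleles this way.
    """
    if haploid_depth <= 0:
        raise ValueError("haploid_depth must be positive")

    # Count how many alleles contain each k-mer; only k-mers owned by a
    # single allele are used, so shared sequence does not inflate estimates.
    allele_kmers = {name: set(kmers(seq, k)) for name, seq in alleles.items()}
    owners: Counter = Counter()
    for kmer_set in allele_kmers.values():
        owners.update(kmer_set)

    estimates: Dict[str, Optional[int]] = {}
    for name, kmer_set in allele_kmers.items():
        specific = [km for km in kmer_set if owners[km] == 1]
        if not specific:
            estimates[name] = None
            continue
        # The median is robust to a few k-mers hit by errors or other variants.
        depth = median(read_kmer_counts.get(km, 0) for km in specific)
        estimates[name] = round(depth / haploid_depth)
    return estimates
```

Real data would also require handling sequencing errors, reverse complements, and k-mers shared among many paralogs, all of which this sketch ignores; the paper itself describes how ctyper resolves these cases.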