2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1), (3)
Winter 2024
Abstract

Jouni Sirén, Parsa Eskandar, Matteo Tommaso Ungaro, Glenn Hickey, Jordan M. Eizenga, Adam M. Novak, Xian Chang, Pi-Chuan Chang, Mikhail Kolmogorov, Andrew Carroll, Jean Monlong, Benedict Paten. Personalized pangenome references. Nature Methods, 2024.

Download preprint: not available

Download from publisher: https://doi.org/10.1038/s41592-024-02407-2

Related web page: not available


Abstract:

Pangenomes reduce reference bias by representing genetic diversity better than a 
single reference sequence. Yet when comparing a sample to a pangenome, variants 
in the pangenome that are not part of the sample can be misleading, for example, 
causing false read mappings. These irrelevant variants are generally rarer in 
terms of allele frequency, and have previously been dealt with by filtering rare 
variants. However, this blunt heuristic both fails to remove some irrelevant 
variants and removes many relevant variants. We propose a new approach that 
imputes a personalized pangenome subgraph by sampling local haplotypes according 
to k-mer counts in the reads. We implement the approach in the vg toolkit
(https://github.com/vgteam/vg) for the Giraffe short-read aligner and compare its
accuracy to state-of-the-art methods using human pangenome graphs from the Human 
Pangenome Reference Consortium. This reduces small variant genotyping errors by 
four times relative to the Genome Analysis Toolkit and makes short-read 
structural variant genotyping of known variants competitive with long-read 
variant discovery methods.
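
The idea of choosing local haplotypes according to k-mer counts in the reads can be illustrated with a toy sketch. This is not the vg implementation: the function names, the scoring rule (informative k-mers seen in the reads minus informative k-mers not seen), and the example sequences are all assumptions made for illustration, and a real pangenome would be processed region by region on a graph rather than on plain strings.

```python
from collections import Counter


def kmers(seq, k):
    """Return all k-mers of seq, in order of occurrence."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def count_read_kmers(reads, k):
    """Count how often each k-mer occurs across all reads."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def sample_haplotypes(haplotypes, read_counts, k, n):
    """Pick the n candidate haplotypes best supported by the read k-mers.

    Hypothetical scoring rule: +1 for every informative k-mer that occurs
    in the reads, -1 for every informative k-mer that does not.
    """
    kmer_sets = {name: set(kmers(seq, k)) for name, seq in haplotypes.items()}
    # k-mers present in every haplotype cannot tell the haplotypes apart.
    shared = set.intersection(*kmer_sets.values())
    scores = {}
    for name, ks in kmer_sets.items():
        informative = ks - shared
        present = sum(1 for x in informative if read_counts[x] > 0)
        scores[name] = present - (len(informative) - present)
    # Highest score first; ties are broken by name for determinism.
    ranked = sorted(scores, key=lambda name: (-scores[name], name))
    return ranked[:n], scores


# Made-up local haplotypes for one region of a pangenome.
haplotypes = {
    "hapA": "ACGTACGGTA",
    "hapB": "ACGTTCGGTA",
    "hapC": "ACGTACGCTA",
}
# Made-up reads drawn from a sample that carries hapA.
reads = ["ACGTACG", "TACGGTA"]

read_counts = count_read_kmers(reads, k=4)
selected, scores = sample_haplotypes(haplotypes, read_counts, k=4, n=2)
print(selected)
print(scores)
```

In this toy setting the haplotype that matches the sample receives the highest score, and the retained subset plays the role of the personalized reference to which reads would then be mapped.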