2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2021

Jana Ebler et al. Pangenome-based genome inference. Technical Report 10.1101/2020.11.11.378133, bioRxiv, 2020.

Download from publisher: https://www.biorxiv.org/content/10.1101/2020.11.11.378133v1

Typical analysis workflows map reads to a reference genome in order to 
detect genetic variants. Generating such alignments introduces reference 
biases, in particular against insertion alleles absent from the reference, 
and comes with a substantial computational burden. In contrast, recent 
k-mer-based genotyping methods are fast, but struggle in repetitive or 
duplicated regions of the genome. We propose a novel algorithm, called 
PanGenie, that leverages a pangenome reference built from haplotype-
resolved genome assemblies in conjunction with k-mer count information 
from raw, short-read sequencing data to genotype a wide spectrum of 
genetic variation. The given haplotypes enable our method to take 
advantage of linkage information to aid genotyping in regions poorly 
covered by unique k-mers and provide access to regions otherwise 
inaccessible to short reads. Compared to classic mapping-based approaches, 
our approach is more than 4× faster at 30× coverage and, at the same time, 
reaches significantly better genotype concordances for almost all variant 
types and coverages tested. Improvements are especially pronounced for 
large insertions (> 50 bp): we are able to genotype > 99.9% of all 
tested variants with over 90% accuracy at 30× short-read coverage, whereas 
the best competing tools either typed less than 60% of variants or reached 
accuracies below 70%. PanGenie now enables the inclusion of this commonly 
neglected variant type in downstream analyses.
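
To make the idea of genotyping from k-mer counts concrete, the following is a minimal toy sketch in Python. It is not PanGenie's algorithm: it ignores the haplotype panel and linkage information, sequencing errors, coverage depth, and reverse-complement strands, and all function names (`kmers`, `count_read_kmers`, `toy_genotype`) are hypothetical. It only illustrates how k-mers that are unique to one allele of a single biallelic variant can indicate which alleles are present in a set of reads, without aligning those reads to a reference.

```python
from collections import Counter


def kmers(seq, k):
    """Yield every overlapping substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def count_read_kmers(reads, k):
    """Count k-mer occurrences across all reads (forward strand only)."""
    counts = Counter()
    for read in reads:
        counts.update(kmers(read, k))
    return counts


def toy_genotype(reads, ref_allele, alt_allele, k):
    """Call a naive genotype for one biallelic variant.

    ref_allele and alt_allele are the two allele sequences including
    some shared flanking sequence. Only k-mers found in exactly one of
    the two alleles are treated as evidence. This presence/absence rule
    is an illustrative assumption, not the model used in the paper.
    """
    counts = count_read_kmers(reads, k)
    ref_set = set(kmers(ref_allele, k))
    alt_set = set(kmers(alt_allele, k))
    # Total read support for k-mers specific to each allele.
    ref_support = sum(counts[km] for km in ref_set - alt_set)
    alt_support = sum(counts[km] for km in alt_set - ref_set)
    if ref_support and alt_support:
        return "0/1"  # evidence for both alleles
    if alt_support:
        return "1/1"  # evidence for the alternative allele only
    if ref_support:
        return "0/0"  # evidence for the reference allele only
    return "./."      # no allele-specific k-mer observed


# Toy data: the alternative allele carries a 3-base insertion ("GGA").
REF = "ACGTTGCA"
ALT = "ACGTGGATGCA"
print(toy_genotype([REF, ALT], REF, ALT, k=4))
```

When a region contains no k-mers specific to either allele, this toy rule returns a missing call. That is the situation in which, according to the abstract, PanGenie falls back on linkage information from the known haplotypes.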