2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1) and (3)
Winter 2020

Jouni Sirén, Erik Garrison, Adam M. Novak, Benedict Paten, Richard Durbin. Haplotype-aware graph indexes. Bioinformatics, 36(2):400-407, 2020.

Download preprint: not available

Download from publisher: https://doi.org/10.1093/bioinformatics/btaa640

Related web page: not available


MOTIVATION: The variation graph toolkit (VG) represents genetic variation as a
graph. Although each path in the graph is a potential haplotype, most paths are
non-biological, unlikely recombinations of true haplotypes.

RESULTS: We augment the VG model with haplotype information to identify which
paths are more likely to exist in nature. For this purpose, we develop a
scalable implementation of the graph extension of the positional
Burrows-Wheeler transform. We demonstrate the scalability of the new
implementation by building a whole-genome index of the 5008 haplotypes of the
1000 Genomes Project, and an index of all 108 070 Trans-Omics for Precision
Medicine Freeze 5 chromosome 17 haplotypes. We also develop an algorithm for
simplifying variation graphs for k-mer indexing without losing any k-mers in
the haplotypes.

AVAILABILITY AND IMPLEMENTATION: Our software is available at
https://github.com/vgteam/vg, https://github.com/jltsiren/gbwt and
https://github.com/jltsiren/gcsa2.

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics
online.
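
For seminar readers, the abstract's central observation can be illustrated with a toy example. The sketch below is not the paper's algorithm and does not use the VG, GBWT or GCSA2 APIs; the graph, the two haplotypes, the value of k and all function names are hypothetical. It only shows that a small graph with two variant sites contains more paths, and more k-mers, than the haplotypes that were used to build it.

```python
# Toy variation graph (hypothetical data): node id -> sequence, plus edges.
NODES = {1: "GAT", 2: "T", 3: "C", 4: "ACA", 5: "G", 6: "A", 7: "TTC"}
EDGES = {1: [2, 3], 2: [4], 3: [4], 4: [5, 6], 5: [7], 6: [7], 7: []}

# Two observed haplotypes, each stored as a path (list of node ids).
HAPLOTYPES = [[1, 2, 4, 5, 7], [1, 3, 4, 6, 7]]


def all_paths(node, sink):
    """Enumerate every path from node to sink (fine for a tiny acyclic graph)."""
    if node == sink:
        return [[node]]
    return [[node] + rest for nxt in EDGES[node] for rest in all_paths(nxt, sink)]


def spell(path):
    """Concatenate the node sequences along a path."""
    return "".join(NODES[n] for n in path)


def path_kmers(paths, k):
    """Collect the distinct k-mers spelled by a collection of paths."""
    kmers = set()
    for path in paths:
        seq = spell(path)
        kmers.update(seq[i:i + k] for i in range(len(seq) - k + 1))
    return kmers


K = 5  # long enough to span both variant sites in this toy graph
graph_paths = all_paths(1, 7)
graph_kmers = path_kmers(graph_paths, K)
hap_kmers = path_kmers(HAPLOTYPES, K)

# k-mers that occur only on recombinant paths, never in a haplotype.
recombinant_kmers = graph_kmers - hap_kmers

print(len(graph_paths), "graph paths vs", len(HAPLOTYPES), "haplotypes")
print("k-mers only on recombinant paths:", sorted(recombinant_kmers))
```

In this toy setting every haplotype k-mer is also a graph k-mer, so an index restricted to haplotype k-mers discards only sequences that never occur in the input haplotypes.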