2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1), (3)
Winter 2021

Xian Chang, Jordan Eizenga, Adam M. Novak, Jouni Sirén, Benedict Paten. Distance indexing and seed clustering in sequence graphs. Bioinformatics, 36(Suppl_1):i146-i153. 2020.

Download from publisher: https://academic.oup.com/bioinformatics/article/36/Supplement_1/i146/5870464?login=true


MOTIVATION: Graph representations of genomes are capable of expressing more
genetic variation and can therefore better represent a population than standard
linear genomes. However, due to the greater complexity of genome graphs relative 
to linear genomes, some functions that are trivial on linear genomes become much 
more difficult in genome graphs. Calculating distance is one such function that
is simple in a linear genome but complicated in a graph context. In read-mapping
algorithms, such distance calculations are fundamental to determining whether seed
alignments could belong to the same mapping. RESULTS: We have developed an
algorithm for quickly calculating the minimum distance between positions on a
sequence graph using a minimum distance index. We have also developed an
algorithm that uses the distance index to cluster seeds on a graph. We
demonstrate that our implementations of these algorithms are efficient and
practical to use for a new generation of mapping algorithms based upon genome
graphs. AVAILABILITY AND IMPLEMENTATION: Our algorithms have been implemented as 
part of the vg toolkit and are available at https://github.com/vgteam/vg.
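The abstract names two operations: computing the minimum distance between two positions in a sequence graph, and grouping seeds by that distance. The sketch below only illustrates what those operations compute. It does not reproduce the paper's minimum distance index; it substitutes a plain Dijkstra search per query and a naive all-pairs union-find clustering, which is the kind of brute-force baseline an index is meant to speed up. It assumes a directed graph on the forward strand only (real vg graphs are bidirected). `SequenceGraph`, `min_distance`, `cluster_seeds`, the `(node_id, offset)` position encoding, and the `limit` parameter are all hypothetical names chosen for illustration, not part of vg's API.

```python
import heapq
from typing import Dict, List, Optional, Tuple

# Hypothetical position encoding: (node_id, 0-based offset into the node's sequence).
Position = Tuple[int, int]


class SequenceGraph:
    """Hypothetical minimal directed sequence graph (forward strand only)."""

    def __init__(self) -> None:
        self.length: Dict[int, int] = {}      # node id -> sequence length
        self.edges: Dict[int, List[int]] = {}  # node id -> successor node ids

    def add_node(self, node_id: int, sequence: str) -> None:
        self.length[node_id] = len(sequence)
        self.edges.setdefault(node_id, [])

    def add_edge(self, src: int, dst: int) -> None:
        self.edges[src].append(dst)


def min_distance(g: SequenceGraph, a: Position, b: Position) -> Optional[int]:
    """Fewest bases stepped to walk from a to b, or None if b is unreachable."""
    a_node, a_off = a
    b_node, b_off = b
    best: Optional[int] = None
    if a_node == b_node and b_off >= a_off:
        best = b_off - a_off  # b lies later on the same node
    # Dijkstra where dist[u] is the cost to reach offset 0 of node u
    # after leaving a's node through its end.
    dist: Dict[int, int] = {}
    heap: List[Tuple[int, int]] = []
    leave_cost = g.length[a_node] - a_off
    for nxt in g.edges[a_node]:
        heapq.heappush(heap, (leave_cost, nxt))
    while heap:
        d, u = heapq.heappop(heap)
        if u in dist:
            continue  # already settled with a smaller cost
        dist[u] = d
        if best is not None and d >= best:
            break  # every remaining walk is at least as long as best
        if u == b_node:
            cand = d + b_off
            best = cand if best is None else min(best, cand)
            break  # b_node is settled, so no shorter graph walk exists
        for v in g.edges[u]:
            if v not in dist:
                heapq.heappush(heap, (d + g.length[u], v))
    return best


def cluster_seeds(g: SequenceGraph, seeds: List[Position], limit: int) -> List[List[int]]:
    """Naive O(n^2) clustering: transitively join seeds within `limit` bases."""
    parent = list(range(len(seeds)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i in range(len(seeds)):
        for j in range(i + 1, len(seeds)):
            # Seeds may be ordered either way along the graph, so check both.
            ds = [d for d in (min_distance(g, seeds[i], seeds[j]),
                              min_distance(g, seeds[j], seeds[i])) if d is not None]
            if ds and min(ds) <= limit:
                parent[find(i)] = find(j)
    clusters: Dict[int, List[int]] = {}
    for i in range(len(seeds)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())


if __name__ == "__main__":
    # A SNP bubble (nodes 2/3) between two flanking nodes, then a long node.
    g = SequenceGraph()
    for nid, seq in [(1, "ACGT"), (2, "A"), (3, "G"), (4, "TTGCA"), (5, "C" * 20)]:
        g.add_node(nid, seq)
    for s, d in [(1, 2), (1, 3), (2, 4), (3, 4), (4, 5)]:
        g.add_edge(s, d)
    print(min_distance(g, (1, 1), (4, 2)))  # 6
    print(cluster_seeds(g, [(1, 1), (4, 2), (5, 15)], 10))  # [[0, 1], [2]]
```

In this toy graph the first two seeds lie 6 bases apart across the bubble and fall into one cluster, while the third is 18 bases past the second and stays separate. Every pair here triggers a full graph search, which is why a precomputed distance index matters once graphs and seed sets are genome-sized.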