2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2018

David Haussler, Maciej Smuga-Otto, Benedict Paten, Adam M. Novak, Sergei Nikitin, Maria Zueva, Dmitrii Miagkov. A Flow Procedure for the Linearization of Genome Sequence Graphs. In RECOMB 2017: Research in Computational Molecular Biology, pp. 34-49.

Download preprint: not available

Download from publisher: https://link.springer.com/chapter/10.1007/978-3-319-56970-3_3

Related web page: not available



Efforts to incorporate human genetic variation into the reference human 
genome have converged on the idea of a graph representation of genetic 
variation within a species, a genome sequence graph. A sequence graph 
represents a set of individual haploid reference genomes as paths in a 
single graph. When that set of reference genomes is sufficiently diverse, 
the sequence graph implicitly contains all frequent human genetic 
variations, including translocations, inversions, deletions, and insertions.

In representing a set of genomes as a sequence graph one encounters certain 
challenges. One of the most important is the problem of graph 
linearization, which is essential both for efficient storage and access 
and for natural graph visualization and compatibility with other tools. The 
goal of graph linearization is to order nodes of the graph in such a way 
that operations such as access, traversal and visualization are as 
efficient and effective as possible.
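
To make the idea of a linearization concrete, here is a minimal Python sketch. It is not the flow procedure from the paper: it uses plain Kahn topological sorting as a stand-in baseline, and the two quality measures (number of backward edges and average cut width) are illustrative assumptions about what a good ordering might minimize. All function names and the toy graph are hypothetical.

```python
from collections import defaultdict, deque


def topological_linearization(edges):
    """Order nodes with Kahn's algorithm (a baseline, not the flow procedure).

    Nodes that cannot be placed because they lie on or behind a cycle
    are appended at the end in sorted order.
    """
    nodes = sorted({n for edge in edges for n in edge})
    successors = defaultdict(list)
    indegree = {n: 0 for n in nodes}
    for u, v in edges:
        successors[u].append(v)
        indegree[v] += 1

    queue = deque(n for n in nodes if indegree[n] == 0)
    order = []
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in successors[u]:
            indegree[v] -= 1
            if indegree[v] == 0:
                queue.append(v)

    placed = set(order)
    order.extend(n for n in nodes if n not in placed)
    return order


def backward_edges(order, edges):
    """Count edges that point against the left-to-right ordering."""
    pos = {n: i for i, n in enumerate(order)}
    return sum(1 for u, v in edges if pos[u] >= pos[v])


def average_cut_width(order, edges):
    """Average number of edges crossing each gap between adjacent nodes."""
    if len(order) < 2:
        return 0.0
    pos = {n: i for i, n in enumerate(order)}
    total = 0
    for gap in range(len(order) - 1):
        total += sum(
            1
            for u, v in edges
            if min(pos[u], pos[v]) <= gap < max(pos[u], pos[v])
        )
    return total / (len(order) - 1)


# Toy graph: a single "bubble" (two alternative alleles B and C between A and D).
bubble = [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")]
order = topological_linearization(bubble)
print(order, backward_edges(order, bubble), average_cut_width(order, bubble))
```

On an acyclic graph such as the bubble, a topological order has no backward edges. Adding an edge that closes a cycle (for example from D back to B) forces at least one backward edge in any ordering, which is the kind of trade-off a linearization method has to handle.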

A new algorithm for the linearization of sequence graphs, called the flow 
procedure, is proposed in this paper. Comparative experimental evaluation 
of the flow procedure against other algorithms shows that it outperforms 
its rivals in the metrics most relevant to sequence graphs.