2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2019
Abstract

Alexandru I. Tomescu, Paul Medvedev. Safe and Complete Contig Assembly Through Omnitigs. Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, 24(6):590-602, 2017.

Abstract:

Contig assembly is the first stage that most assemblers solve when reconstructing
a genome from a set of reads. Its output consists of contigs: a set of strings
that are promised to appear in any genome that could have generated the reads.
From the introduction of contigs 20 years ago, assemblers have tried to obtain
longer and longer contigs, but the following question remains: given a genome
graph G (e.g., a de Bruijn, or a string graph), what are all the strings that can
be safely reported from G as contigs? In this article, we answer this question
using a model in which the genome is a circular covering walk. We also give a
polynomial-time algorithm to find such strings, which we call omnitigs. Our
experiments show that omnitigs are 66%-82% longer on average than the popular
unitigs, and 29% of dbSNP locations have more neighbors in omnitigs than in
unitigs.
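
For readers unfamiliar with the baseline that the abstract compares against, the following is a minimal sketch of unitig extraction from a de Bruijn graph. It is not the omnitig algorithm of the paper. It assumes an edge-centric graph (nodes are (k-1)-mers, edges are k-mers), error-free reads, and a single strand with no reverse complements. The function names `de_bruijn_edges` and `unitigs` are hypothetical and chosen only for this illustration.

```python
from collections import defaultdict


def de_bruijn_edges(reads, k):
    """Return the set of distinct k-mers (graph edges) occurring in the reads."""
    return {read[i:i + k] for read in reads for i in range(len(read) - k + 1)}


def unitigs(reads, k):
    """Spell the maximal walks whose internal nodes have in-degree 1 and out-degree 1."""
    edges = de_bruijn_edges(reads, k)
    out_edges = defaultdict(list)  # (k-1)-mer -> outgoing k-mers
    in_deg = defaultdict(int)      # (k-1)-mer -> number of incoming k-mers
    for kmer in sorted(edges):
        out_edges[kmer[:-1]].append(kmer)
        in_deg[kmer[1:]] += 1

    def is_internal(node):
        return in_deg[node] == 1 and len(out_edges[node]) == 1

    def extend(first_edge, used):
        # Follow the unique continuation while the walk stays on internal nodes
        # and does not re-enter an edge that was already consumed.
        walk = first_edge
        used.add(first_edge)
        node = first_edge[1:]
        while is_internal(node) and out_edges[node][0] not in used:
            edge = out_edges[node][0]
            used.add(edge)
            walk += edge[-1]
            node = edge[1:]
        return walk

    used = set()
    result = []
    nodes = sorted(set(out_edges) | set(in_deg))
    # Walks that start at a branching node, a source, or a sink.
    for node in nodes:
        if is_internal(node):
            continue
        for edge in out_edges[node]:
            result.append(extend(edge, used))
    # Remaining edges lie on isolated cycles in which every node is internal.
    for edge in sorted(edges):
        if edge not in used:
            result.append(extend(edge, used))
    return result


if __name__ == "__main__":
    # The walk AACGT is unambiguous; after GT the graph branches into GTA and GTC.
    print(unitigs(["AACGT", "ACGTA", "ACGTC"], 3))  # ['AACGT', 'GTA', 'GTC']
```

In this toy input the unitig stops at the branch after GT. According to the abstract, omnitigs generalize such strings to everything that can be safely reported under the circular covering walk model, which is why they can be longer than unitigs.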