Bioinformatics Seminar

Tue 14 May 2013, 17:20

Title: Paul Medvedev et al. (2011) Paired de Bruijn graphs: a novel approach for incorporating mate pair information into genome assemblers
Speaker: Martin Bobák

The recent proliferation of next-generation sequencing with short reads has
enabled many new experimental opportunities but, at the same time, has raised
formidable computational challenges in genome assembly. One of the key advances
that has led to an improvement in contig lengths has been mate pairs, which
facilitate the assembly of repeating regions. Mate pairs have been
algorithmically incorporated into most next-generation assemblers as various
heuristic post-processing steps to correct the assembly graph or to link contigs
into scaffolds. Such methods have allowed the identification of longer contigs
than would be possible with single reads; however, they can still fail to resolve
complex repeats. Thus, improved methods for incorporating mate pairs will have a
strong effect on contig length in the future. Here, we introduce the paired de
Bruijn graph, a generalization of the de Bruijn graph that incorporates mate pair
information into the graph structure itself instead of analyzing mate pairs in a
post-processing step. This graph has the potential to be used in place of the de
Bruijn graph in any de Bruijn graph-based assembler, while keeping all other
assembly steps such as error correction and repeat resolution. Through assembly
results on simulated perfect data, we argue that this can effectively improve
contig sizes in assembly.
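To make the central idea concrete, here is a minimal Python sketch, not the authors' implementation, under the idealized conditions of simulated perfect data: error-free reads covering every position and mates separated by an exact, known gap. The helper names (`kmers`, `bimers`, `de_bruijn`, `paired_de_bruijn`, `walk_from_source`, `spell_paired_path`) and the convention that `d` counts the bases between the two k-mers of a pair are assumptions chosen for illustration. In the ordinary de Bruijn graph each k-mer is an edge from its (k-1)-prefix to its (k-1)-suffix; in the paired version each pair of k-mers (a|b) is an edge from (prefix of a | prefix of b) to (suffix of a | suffix of b), so two copies of a repeated (k-1)-mer are merged into one node only when their mates also agree.

```python
from collections import defaultdict


def kmers(genome, k):
    """All k-mers of the genome, left to right (idealized error-free reads)."""
    return [genome[i:i + k] for i in range(len(genome) - k + 1)]


def bimers(genome, k, d):
    """All k-mer pairs (a|b) where b starts after a gap of d bases following a.

    The gap convention (d bases between the two k-mers) is an assumption of this sketch.
    """
    return [(genome[i:i + k], genome[i + k + d:i + 2 * k + d])
            for i in range(len(genome) - (2 * k + d) + 1)]


def de_bruijn(edges):
    """Ordinary de Bruijn graph: each k-mer links its prefix to its suffix."""
    graph = defaultdict(set)
    for e in edges:
        graph[e[:-1]].add(e[1:])
    return graph


def paired_de_bruijn(edges):
    """Paired graph: (a|b) links (prefix(a)|prefix(b)) to (suffix(a)|suffix(b))."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[(a[:-1], b[:-1])].add((a[1:], b[1:]))
    return graph


def max_out_degree(graph):
    """Largest number of distinct successors of any node (>1 means a branch)."""
    return max(len(succ) for succ in graph.values())


def walk_from_source(graph):
    """Follow unambiguous edges from a node with no incoming edges; stop at a branch.

    Assumes at least one source node exists (raises StopIteration otherwise).
    """
    targets = {v for succ in graph.values() for v in succ}
    path = [next(u for u in graph if u not in targets)]
    while len(graph.get(path[-1], ())) == 1:
        (nxt,) = graph[path[-1]]
        if nxt in path:  # guard against cycles
            break
        path.append(nxt)
    return path


def spell_paired_path(path, k, d):
    """Rebuild the string spelled by a path of ((k-1)-mer | (k-1)-mer) nodes."""
    first = "".join(a[0] for a, _ in path) + path[-1][0][1:]
    second = "".join(b[0] for _, b in path) + path[-1][1][1:]
    shift = k + d  # the second string starts k + d positions after the first
    if len(first) < shift:
        raise ValueError("path too short to bridge the gap between mates")
    if first[shift:] != second[:len(first) - shift]:
        raise ValueError("inconsistent paired path")
    return first[:shift] + second


genome = "ACTGGACTAC"  # "ACT" occurs twice, so "CT" is followed by both G and A
k, d = 3, 1
print(max_out_degree(de_bruijn(kmers(genome, k))))      # 2: branch at "CT"
pdbg = paired_de_bruijn(bimers(genome, k, d))
print(max_out_degree(pdbg))                             # 1: no branch
print(spell_paired_path(walk_from_source(pdbg), k, d))  # ACTGGACTAC
```

In this toy genome the repeated `ACT` makes the ordinary graph branch at `CT`, so a contig built from it would stop there, while the paired graph stays a single unbranched path that spells the whole genome. Real data with sequencing errors and variable insert sizes would need further handling that this sketch omits.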