2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1) and (3)
Winter 2014
Abstract

Viraj Deshpande, Eric DK Fung, Son Pham, Vineet Bafna. Cerulean: A hybrid assembly using high throughput short and long reads. In WABI 2013, 2013.

Download preprint: not available

Download from publisher: http://arxiv.org/pdf/1307.7933v1

Related web page: not available

Bibliography entry: BibTeX

Abstract:

Genome assembly using high throughput data with short reads, arguably, 
remains an unresolvable task in repetitive genomes, since when the length 
of a repeat exceeds the read length, it becomes difficult to unambiguously 
connect the flanking regions. The emergence of third generation sequencing
(Pacific Biosciences) with long reads offers the opportunity to resolve
complicated repeats that could not be resolved by the short read data.
However, these long reads have a high error rate and it is an uphill task to
assemble the genome without using additional high quality short reads.
Recently, Koren et al. 2012 proposed an approach to use high quality short 
reads data to correct these long reads and, thus, make the assembly from 
long reads possible. However, due to the large size of both datasets (short
and long reads), error correction of these long reads requires excessively
high computational resources, even on small bacterial genomes. In this
work, instead of error correction of long reads, we first assemble the 
short reads and later map these long reads on the assembly graph to 
resolve repeats.

Contribution: We present a hybrid assembly approach that is
computationally efficient and produces high quality assemblies. Our
algorithm first operates with a simplified version of the assembly graph 
consisting only of long contigs and gradually improves the assembly by 
adding smaller contigs in each iteration. In contrast to the
state-of-the-art long-read error-correction technique, which requires high
computational resources and long running time on a supercomputer even for
bacterial genome datasets, our software can produce a comparable assembly
on a standard desktop in a short running time.
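
The idea of starting from long contigs only and then adding smaller contigs can be illustrated with a minimal sketch. It is not Cerulean's implementation: the function names, the length thresholds, and the representation of each long read as an ordered list of the contigs it maps to are all assumptions made for illustration, and strand, alignment quality, and path resolution are ignored.

```python
from collections import Counter


def links_at_threshold(contig_lengths, read_paths, min_length):
    """Count contig adjacencies supported by long reads, using only
    contigs of length >= min_length (hypothetical helper)."""
    links = Counter()
    for path in read_paths:
        # Drop contigs that are too short for the current iteration.
        kept = [c for c in path if contig_lengths[c] >= min_length]
        # Consecutive surviving contigs on the same read form a link.
        for a, b in zip(kept, kept[1:]):
            links[(a, b)] += 1
    return links


def iterative_skeleton(contig_lengths, read_paths, thresholds):
    """Return (threshold, links) pairs, from the coarsest graph
    (long contigs only) to the finest (smaller contigs included)."""
    return [
        (t, links_at_threshold(contig_lengths, read_paths, t))
        for t in sorted(thresholds, reverse=True)
    ]


# Toy data (invented): R is a short repeat flanked by long contigs.
contig_lengths = {"A": 50000, "R": 800, "B": 40000, "C": 30000}
read_paths = [["A", "R", "B"], ["A", "R", "B"], ["C", "R", "A"]]

steps = iterative_skeleton(contig_lengths, read_paths, [500, 10000])
for threshold, links in steps:
    print(threshold, dict(links))
```

At the coarse threshold the repeat R is skipped, so the reads directly link the long contigs on either side of it. Lowering the threshold reintroduces R as an intermediate node, which is the step where a real assembler would have to place the small contig within the already established skeleton.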