2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2015
Abstract

Leena Salmela, Eric Rivals. LoRDEC: accurate and efficient long read error correction. Bioinformatics, 30(24):3506-3514. 2014.

Abstract:

MOTIVATION: PacBio single molecule real-time sequencing is a third-generation
sequencing technique producing long reads, with comparatively lower throughput
and higher error rate. Errors include numerous indels and complicate downstream
analysis such as mapping or de novo assembly. A hybrid strategy that takes
advantage of the high accuracy of second-generation short reads has been proposed
for correcting long reads. Mapping of short reads on long reads provides
sufficient coverage to eliminate up to 99% of errors, but at the expense of
prohibitive running times and considerable amounts of disk and memory space.

RESULTS: We present LoRDEC, a hybrid error correction method that builds a
succinct de Bruijn graph representing the short reads, and seeks a corrective
sequence for each erroneous region in the long reads by traversing chosen paths
in the graph. In comparison, LoRDEC is at least six times faster and requires at
least 93% less memory or disk space than available tools, while achieving
comparable accuracy.

AVAILABILITY AND IMPLEMENTATION: LoRDEC is written in C++, tested on Linux
platforms and freely available at http://atgc.lirmm.fr/lordec.
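
The idea described in the abstract (build a de Bruijn graph from the short reads, then look for a path in the graph that bridges an erroneous region of a long read) can be illustrated with a small Python sketch. This is not LoRDEC's implementation: the tool itself is written in C++ and uses a succinct graph representation, whereas the sketch below assumes a plain dictionary, no k-mer abundance filtering, and an exhaustive depth-first search. The function names (`build_de_bruijn`, `edit_distance`, `correct_region`), the `slack` parameter, and the toy reads are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of bases that follow it in some k-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[-1])
    return graph


def edit_distance(a, b):
    """Levenshtein distance, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # match or substitution
        prev = cur
    return prev[-1]


def correct_region(graph, k, left, right, region, slack=2):
    """Return the graph-supported sequence between two anchors.

    `left` and `right` are (k-1)-mers flanking the erroneous `region`.
    All paths from `left` to `right` whose bridging part is at most
    len(region) + slack bases long are explored; the bridge with the
    smallest edit distance to `region` is returned, or None if no path exists.
    """
    max_len = len(region) + slack + (k - 1)  # bridge plus the right anchor
    best = None
    stack = [(left, "")]  # (current node, bases appended so far)
    while stack:
        node, path = stack.pop()
        if node == right and len(path) >= k - 1:
            bridge = path[:len(path) - (k - 1)]  # drop the right anchor itself
            dist = edit_distance(bridge, region)
            if best is None or dist < best[0]:
                best = (dist, bridge)
        if len(path) < max_len:
            for base in sorted(graph.get(node, ())):
                stack.append((node[1:] + base, path + base))
    return best[1] if best else None


# Toy data (hypothetical): accurate short reads and a long read with a deletion.
short_reads = ["ATGGCGTA", "GCGTACCT", "TACCTGA"]
graph = build_de_bruijn(short_reads, k=4)

# Long read "ATGG" + "CGAC" + "CTGA"; the middle part is treated as erroneous.
fixed = correct_region(graph, 4, left="TGG", right="CTG", region="CGAC")
print("ATGG" + fixed + "CTGA")
```

In this toy example the search recovers the bridge `CGTAC`, restoring the base that was missing from the long read. The edit-distance criterion is only one plausible way to choose among candidate paths; a practical tool would also need to bound the search in repetitive regions and handle reads for which no bridging path is found.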