Bioinformatics Seminar

Tue 19 Mar. 2013, 17:20

Title: Pierre Peterlongo, Rayan Chikhi (2012). Mapsembler, targeted and micro assembly of large NGS datasets on a desktop computer
Speaker: Jaro Budiš

BACKGROUND: The analysis of next-generation sequencing data from large
genomes is a timely research topic. Sequencers produce billions of short
sequence fragments from newly sequenced organisms. Computational methods for
reconstructing whole genomes or transcriptomes (de novo assemblers) are
typically used to process such data. However, these methods require large
amounts of memory and long computation times. Many basic biological questions
could instead be answered by targeting specific information in the reads, thus
avoiding a complete assembly.

RESULTS: We present Mapsembler, an iterative micro and targeted assembler that
processes large datasets of reads on commodity hardware. Mapsembler checks
whether given regions of interest can be constructed from the reads and builds
a short assembly around each of them, either as a plain sequence or as a graph
showing the contextual structure. We introduce new algorithms to retrieve
approximate occurrences of a sequence from reads and to construct an extension
graph. Among other results presented in the paper, Mapsembler made it possible
to retrieve previously described candidate fusion genes in human breast cancer
and to detect new ones that were not previously known.

CONCLUSIONS: Mapsembler is the first software that enables de novo discovery of
repeats, SNPs, exon skipping, gene fusions, and other structural events around
a region of interest, directly from raw sequencing reads. Because indexing is
localized, the memory footprint of Mapsembler is negligible.
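Localized indexing and the retrieval of approximate occurrences of a sequence can be illustrated with a minimal Python sketch. This is not Mapsembler's algorithm. It only assumes a simple seed-and-verify scheme: it indexes the k-mers of the region of interest (never those of the reads), streams the reads once, and keeps a read if it shares an exact k-mer with the target and overlaps it with at most a few substitutions. The function names (`build_local_index`, `hamming_overlap`, `recruit_reads`), the parameters `k`, `max_mismatches`, and `min_overlap`, and the Hamming-only comparison (no indels) are hypothetical choices made for this illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_local_index(target: str, k: int) -> Dict[str, List[int]]:
    """Index every k-mer of the target only, so memory grows with the
    target length rather than with the size of the read set."""
    index: Dict[str, List[int]] = defaultdict(list)
    for i in range(len(target) - k + 1):
        index[target[i:i + k]].append(i)
    return index


def hamming_overlap(target: str, read: str, shift: int) -> Tuple[int, int]:
    """Place the read so that it starts at target position `shift` (which may
    be negative or run past the target's end) and count substitutions over
    the overlapping part. Returns (overlap_length, mismatches)."""
    start = max(0, shift)
    end = min(len(target), shift + len(read))
    mismatches = sum(1 for t in range(start, end) if target[t] != read[t - shift])
    return max(0, end - start), mismatches


def recruit_reads(target: str, reads: Iterable[str], k: int = 5,
                  max_mismatches: int = 1, min_overlap: int = 8) -> List[Tuple[int, int]]:
    """Stream the reads once and return (read_id, shift) for each read that
    has an approximate occurrence overlapping the target (hypothetical scheme)."""
    index = build_local_index(target, k)
    hits: List[Tuple[int, int]] = []
    for read_id, read in enumerate(reads):
        shifts = set()
        # Seed step: every exact k-mer hit implies a candidate alignment shift.
        for j in range(len(read) - k + 1):
            for i in index.get(read[j:j + k], ()):
                shifts.add(i - j)  # read position j aligns to target position i
        # Verify step: accept the first shift with enough overlap and few mismatches.
        for shift in sorted(shifts):
            overlap, mism = hamming_overlap(target, read, shift)
            if overlap >= min_overlap and mism <= max_mismatches:
                hits.append((read_id, shift))
                break  # one occurrence per read is enough for this sketch
    return hits


if __name__ == "__main__":
    target = "AACCGGTTACGT"
    reads = ["GGTTACGTCCAA", "TTTTTTTTTTTT", "AACCGTTTAC"]
    print(recruit_reads(target, reads))
```

With the toy target and reads in the `__main__` block, the sketch returns `[(0, 4), (2, 0)]`: the first read overhangs the target's right end, the second shares no k-mer with it, and the third matches its left end with one substitution. Reads that overhang the target are the material an extension step would use to grow a short assembly around the region of interest. That step, and the extension graph, are deliberately left out of this sketch.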
Mapsembler is released under the CeCILL license and can be freely downloaded from