Vladimír Boža, Broňa Brejová, Tomáš Vinař. GAML: Genome Assembly by Maximum Likelihood. In D. Brown, B. Morgenstern, eds., Algorithms in Bioinformatics, 14th International Workshop (WABI), volume 8701, pp. 122-134, Wroclaw, Poland, 2014. Springer.

Download preprint: not available

Download from publisher: http://link.springer.com/chapter/10.1007/978-3-662-44753-6_10

Related web page: http://compbio.fmph.uniba.sk/gaml


The critical part of genome assembly is the resolution of repeats
and the scaffolding of shorter contigs. Modern assemblers usually
perform this step with heuristics, often tailored to a particular
technology for producing paired reads or long reads. We propose a
new framework that allows systematic combination of diverse
sequencing datasets into a single assembly. We achieve this by
searching for an assembly with maximum likelihood in a
probabilistic model capturing error rate, insert lengths, and
other characteristics of each sequencing technology.
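To make the idea of scoring an assembly by likelihood concrete, here is a minimal sketch in Python. It is not GAML's actual model: it assumes a toy setting with single-stranded, ungapped reads, substitution errors only, and a uniform choice of start position, and it ignores insert lengths entirely. The function names and the default `error_rate` are hypothetical.

```python
import math

def read_log_likelihood(read, assembly, error_rate):
    """Log-probability of one read under a toy substitution-only model.

    Assumption: the read starts at a uniformly random position of the
    assembly and each base is miscalled independently with probability
    error_rate (split evenly among the three other bases).
    """
    n, length = len(read), len(assembly)
    if n > length:
        return float("-inf")
    total = 0.0
    for start in range(length - n + 1):
        window = assembly[start:start + n]
        mismatches = sum(a != b for a, b in zip(read, window))
        total += (error_rate / 3) ** mismatches * (1 - error_rate) ** (n - mismatches)
    if total == 0.0:
        return float("-inf")
    return math.log(total / (length - n + 1))

def assembly_log_likelihood(reads, assembly, error_rate=0.01):
    """Sum of per-read log-likelihoods; reads are treated as independent."""
    return sum(read_log_likelihood(r, assembly, error_rate) for r in reads)
```

Under these assumptions, a candidate assembly that explains the reads with fewer mismatches receives a higher score, so candidate assemblies can be compared by `assembly_log_likelihood` alone.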

We have implemented a prototype genome assembler GAML that can
use any combination of insert sizes with Illumina or 454 reads,
as well as PacBio reads. Our experiments show that we can
assemble short genomes with N50 sizes and error rates comparable
to those of ALLPATHS-LG or Cerulean. While ALLPATHS-LG and
Cerulean each require a specific combination of datasets, GAML
works on any combination.
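For readers unfamiliar with the N50 statistic mentioned above, the following short Python sketch computes it under the standard definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the total assembly length. The function name `n50` is illustrative and is not taken from the GAML software.

```python
def n50(contig_lengths):
    """Return the N50 of a collection of contig lengths."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        # The first contig that pushes the running total to at least
        # half of the assembly length defines the N50.
        if running >= half:
            return length
    raise ValueError("no contigs given")
```

For example, contigs of lengths 2, 3, 4, 5 and 6 total 20 bases; the two longest (6 and 5) already cover 11, so the N50 is 5.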

Data and software are available at http://compbio.fmph.uniba.sk/gaml