2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1) and (3)
Winter 2014
Abstract

Sante Gnerre, Iain MacCallum, Dariusz Przybylski, Filipe J. Ribeiro, Joshua N. Burton, Bruce J. Walker, Ted Sharpe, Giles Hall, Terrance P. Shea, Sean Sykes, Aaron M. Berlin, Daniel Aird, Maura Costello, Riza Daza, Louise Williams, Robert Nicol, Andreas Gnirke, Chad Nusbaum, Eric S. Lander, David B. Jaffe. High-quality draft assemblies of mammalian genomes from massively parallel sequence data. Proceedings of the National Academy of Sciences of the United States of America, 108(4):1513-1518. 2011.

Abstract:

Massively parallel DNA sequencing technologies are revolutionizing genomics by
making it possible to generate billions of relatively short (~100-base) sequence 
reads at very low cost. Whereas such data can be readily used for a wide range of
biomedical applications, it has proven difficult to use them to generate
high-quality de novo genome assemblies of large, repeat-rich vertebrate genomes. 
To date, the genome assemblies generated from such data have fallen far short of 
those obtained with the older (but much more expensive) capillary-based
sequencing approach. Here, we report the development of an algorithm for genome
assembly, ALLPATHS-LG, and its application to massively parallel DNA sequence
data from the human and mouse genomes, generated on the Illumina platform. The
resulting draft genome assemblies have good accuracy, short-range contiguity,
long-range connectivity, and coverage of the genome. In particular, the base
accuracy is high (≥99.95%) and the scaffold sizes (N50 size = 11.5 Mb for human
and 7.2 Mb for mouse) approach those obtained with capillary-based sequencing.
The combination of improved sequencing technology and improved computational
methods should now make it possible to increase dramatically the de novo
sequencing of large genomes. The ALLPATHS-LG program is available at
http://www.broadinstitute.org/science/programs/genome-biology/crd.
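
The abstract reports scaffold contiguity as an N50 size. As a reading aid, the sketch below shows one common way to compute N50 from a list of scaffold lengths: the largest length L such that scaffolds of length at least L together contain at least half of the assembled bases. The function name `n50` and the example lengths are illustrative assumptions; this is not code from ALLPATHS-LG, and the paper may use a slightly different convention (for example, excluding gaps).

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold (or contig) lengths.

    Assumed definition: sort lengths from longest to shortest and return
    the length at which the running total first reaches half of the
    summed length of all scaffolds.
    """
    if not lengths:
        raise ValueError("lengths must not be empty")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Compare 2 * running with total to avoid fractional halves.
        if 2 * running >= total:
            return length


# Hypothetical toy assembly, lengths in bases (not data from the paper).
toy_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]
print(n50(toy_lengths))
```

In the toy example the total is 54 bases, and the three longest scaffolds (10, 9, 8) already sum to 27, so the value printed is 8.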