Publication details

Avril Coghlan, Tristan J. Fiedler, Sheldon J. McKay, Paul Flicek, Todd W. Harris, Darin Blasiar, The nGASP Consortium, Lincoln D. Stein. nGASP - the nematode genome annotation assessment project. BMC Bioinformatics, 9:549. 2008.

Abstract

While the C. elegans genome is extensively annotated, relatively little 
information is available for other Caenorhabditis species. The nematode 
genome annotation assessment project (nGASP) was launched to objectively 
assess the accuracy of protein-coding gene prediction software in C. 
elegans, and to apply this knowledge to the annotation of the genomes of 
four additional Caenorhabditis species and other nematodes. Seventeen 
groups worldwide participated in nGASP, and submitted 47 prediction sets 
across 10 Mb of the C. elegans genome. Predictions were compared to 
reference gene sets consisting of confirmed or manually curated gene 
models from WormBase.

The most accurate gene-finders were 'combiner' algorithms, which made use 
of transcript- and protein-alignments and multi-genome alignments, as well 
as gene predictions from other gene-finders. Gene-finders that used 
alignments of ESTs, mRNAs and proteins came in second. There was a tie for 
third place between gene-finders that used multi-genome alignments and ab 
initio gene-finders. The median gene-level sensitivity of combiners was 
78% and their specificity was 42%, which is nearly the same accuracy as 
that reported for combiners in the human genome. C. elegans genes with 
exons of
unusual hexamer content, as well as those with unusually many exons, short 
exons, long introns, a weak translation start signal, weak splice sites, 
or poorly conserved orthologs posed the greatest difficulty for 
gene-finders.
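To make the gene-level figures above concrete, here is a minimal sketch of how gene-level sensitivity and specificity are commonly computed in gene-prediction assessments: sensitivity is the fraction of reference genes that were predicted correctly, and specificity is the fraction of predicted genes that are correct (what other fields call precision). It rests on a simplifying assumption that is not taken from the abstract: each gene model is reduced to a set of coding-exon coordinate pairs, and a prediction counts as correct only if its exon set is identical to a reference gene's. The names `GeneModel` and `gene_level_accuracy` are hypothetical and do not come from nGASP or WormBase.

```python
from typing import FrozenSet, Iterable, Tuple

# Hypothetical representation: a gene model is the set of its coding exons,
# each exon an inclusive (start, end) pair on the same contig and strand.
Exon = Tuple[int, int]
GeneModel = FrozenSet[Exon]


def gene_level_accuracy(reference: Iterable[GeneModel],
                        predicted: Iterable[GeneModel]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the gene level.

    Simplifying assumption: a gene is correct only when the predicted exon
    set exactly equals a reference gene's exon set.
    """
    ref = set(reference)
    pred = set(predicted)
    if not ref or not pred:
        raise ValueError("need at least one reference and one predicted gene")
    correct = ref & pred                    # exact structural matches (TP)
    sensitivity = len(correct) / len(ref)   # TP / (TP + FN)
    specificity = len(correct) / len(pred)  # TP / (TP + FP), i.e. precision
    return sensitivity, specificity


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    reference = [
        frozenset({(100, 250), (400, 520)}),
        frozenset({(1000, 1300)}),
        frozenset({(2000, 2100), (2300, 2450), (2600, 2700)}),
        frozenset({(5000, 5600)}),
    ]
    predicted = [
        frozenset({(100, 250), (400, 520)}),     # exact match
        frozenset({(1000, 1300)}),               # exact match
        frozenset({(2000, 2100), (2600, 2700)}), # misses the middle exon
    ]
    sn, sp = gene_level_accuracy(reference, predicted)
    print(f"sensitivity={sn:.2f} specificity={sp:.2f}")
    # prints "sensitivity=0.50 specificity=0.67"
```

An exact-match rule is deliberately strict: a prediction that misses one short exon counts as wrong, which is one reason why genes with many exons or short exons are hard to predict fully correctly at the gene level.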

This experiment establishes a baseline of gene prediction accuracy in 
Caenorhabditis genomes, and has guided the choice of gene-finders for the 
annotation of newly sequenced genomes of Caenorhabditis and other nematode 
species. We have created new gene sets for C. briggsae, C. remanei, C. 
brenneri, C. japonica, and Brugia malayi using some of the best-performing 
gene-finders.