Brona Brejova, Daniel G. Brown, Ming Li, Tomas Vinar. ExonHunter: A Comprehensive Approach to Gene Finding. Technical Report CS-2004-57, School of Computer Science, University of Waterloo, 2004.

Download preprint: 04exonhunter-tr.pdf, 242 KB

Download from publisher: not available

Related web page: not available

Bibliography entry: BibTeX

See also: early version

Abstract:

We present ExonHunter, a new and comprehensive gene finder system that
outperforms existing systems, featuring several new ideas and
approaches.  Our system combines numerous sources of information
(genomic sequences, ESTs, and protein databases of related species)
with a gene finder based on a hidden Markov model in a novel and
systematic way. In our framework, various sources of information are
expressed as partial probabilistic statements about positions in the
sequence and their annotation. We then combine these into the final
prediction with a quadratic programming method extending existing
methods. Allowing only partial statements is key to our transparent
handling of missing information and coping with the heterogeneous
character of individual sources of information. We also give a new
method for modeling the length distribution of intergenic regions in
hidden Markov models.  On a commonly used test set, ExonHunter
performs significantly better than ROSETTA, SLAM, or TWINSCAN, and
more than two thirds of genes were predicted completely correctly.