Brona Brejova, Daniel G. Brown, Ming Li, Tomas Vinar. ExonHunter: A Comprehensive Approach to Gene Finding. Bioinformatics, 21(S1):i57-i65. 2005. Intelligent Systems for Molecular Biology (ISMB 2005).

Download preprint: 05eh.pdf, 144Kb


Motivation: We present ExonHunter, a new and comprehensive gene finding
system that outperforms existing systems and features several new ideas
and approaches. Our system combines numerous sources of information
(genomic sequences, ESTs, and protein databases of related species) into a 
gene finder based on a hidden Markov model in a novel and systematic
way. In our framework, various sources of information are expressed as
partial probabilistic statements about positions in the sequence and their 
annotation. We then combine these into the final prediction via a quadratic 
programming method, which we show is an extension of existing methods.
Allowing only partial statements is key to our transparent handling of
missing information and coping with the heterogeneous character of 
individual sources of information. We also give a new method for
modeling the length distribution of intergenic regions in hidden Markov models.

Results: On a commonly used test set, ExonHunter performs significantly 
better than the existing gene finders ROSETTA, SLAM, or TWINSCAN, with more 
than two thirds of genes predicted completely correctly.