Brona Brejova, Daniel G. Brown, Ming Li, Tomas Vinar. ExonHunter: A Comprehensive Approach to Gene Finding. Technical Report CS-2004-57, School of Computer Science, University of Waterloo, 2004.
Download preprint: 04exonhunter-tr.pdf, 242Kb
We present ExonHunter, a new and comprehensive gene-finding system that outperforms existing systems and features several new ideas and approaches. Our system combines numerous sources of information (genomic sequences, ESTs, and protein databases of related species) with a gene finder based on a hidden Markov model in a novel and systematic way. In our framework, the various sources of information are expressed as partial probabilistic statements about positions in the sequence and their annotation. We then combine these statements into the final prediction using a quadratic programming method that extends existing methods. Allowing statements to be only partial is key to our transparent handling of missing information and to coping with the heterogeneous character of the individual sources of information. We also give a new method for modeling the length distribution of intergenic regions in hidden Markov models. On a commonly used test set, ExonHunter performs significantly better than ROSETTA, SLAM, or TWINSCAN, and it predicts more than two thirds of genes completely correctly.
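As background for the remark on length distributions, the sketch below illustrates a general property of hidden Markov models, not the method of the report, which the abstract does not describe. A single state with a self-loop emits a geometrically distributed number of symbols, and chaining identical copies of that state gives a negative binomial distribution instead. The function names `chain_length_pmf` and `mean_length` and the parameters `self_loop` and `copies` are hypothetical and chosen only for this illustration.

```python
from math import comb


def chain_length_pmf(n, self_loop, copies=1):
    """Probability that a chain of `copies` identical HMM states emits exactly n symbols.

    Each state emits one symbol per visit, stays in place with probability
    `self_loop`, and otherwise moves to the next state in the chain.
    With copies=1 this is the geometric distribution of a plain self-loop state;
    with copies>1 it is a negative binomial distribution.
    This is textbook background, not the technique proposed in the report.
    """
    if n < copies:
        return 0.0  # every state in the chain emits at least one symbol
    stays = n - copies  # number of self-transitions taken in total
    return comb(n - 1, copies - 1) * self_loop ** stays * (1.0 - self_loop) ** copies


def mean_length(self_loop, copies=1):
    """Expected number of symbols emitted by the chain."""
    return copies / (1.0 - self_loop)


if __name__ == "__main__":
    # Compare a single self-loop state with a chain of three states of equal mean.
    for n in (1, 5, 10, 20):
        single = chain_length_pmf(n, self_loop=0.9, copies=1)
        chained = chain_length_pmf(n, self_loop=0.7, copies=3)
        print(n, round(single, 4), round(chained, 4))
```

With one copy, the most probable length is always 1, which is a poor fit for long regions such as intergenic sequence; adding copies moves the mode away from 1 at the cost of more states.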