Xuefeng Cui, Tomas Vinar, Brona Brejova, Dennis Shasha, Ming Li. Homology Search for Genes. Bioinformatics, 23(13):i97-i103. 2007. Intelligent Systems for Molecular Biology (ISMB 2007).

Download preprint: 07homology.pdf, 159Kb

Download from publisher: http://dx.doi.org/10.1093/bioinformatics/btm225

Related web page: http://www.bioinformatics.uwaterloo.ca/~xfcui/ismb07/

Bibliography entry: BibTeX

Abstract:

Motivation: Life science researchers often require an exhaustive list of
protein-coding genes similar to a given query gene. To find such genes,
homology search tools, such as BLAST or PatternHunter, return a set of 
high-scoring pairs (HSPs). These HSPs then need to be correlated with 
existing sequence annotations, or assembled manually into putative gene 
structures. This process is error-prone and labor-intensive, especially in 
genomes without reliable gene annotation.

Results: We have developed a homology search solution that automates this
process and returns complete gene structures instead of HSPs. We achieve
better sensitivity and specificity by adapting a hidden Markov model for 
gene finding to reflect features of the query gene. Compared to traditional 
homology search, our novel approach identifies splice sites much more 
reliably and can even locate exons that were lost in the query gene.

On a test set of 400 mouse query genes, we report 79% exon sensitivity
and 80% exon specificity in the human genome, measured against orthologous genes
annotated in NCBI HomoloGene. In the same set, we also found 50 (12%) gene
structures with better protein alignment scores than the ones identified in 
HomoloGene.
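
For readers unfamiliar with exon-level accuracy figures such as the ones above, the following is a minimal sketch of how they are commonly computed in gene-finding evaluations. It assumes the usual convention that a predicted exon counts as correct only if both of its boundaries match an annotated exon exactly, that sensitivity is the fraction of annotated exons recovered, and that specificity is the fraction of predicted exons that are correct. The function name, the exon tuple layout, and the toy coordinates are hypothetical and are not taken from the paper or its software; the paper's exact evaluation protocol may differ.

```python
from typing import Iterable, Set, Tuple

# Hypothetical exon representation: (sequence name, start, end).
Exon = Tuple[str, int, int]


def exon_accuracy(annotated: Iterable[Exon],
                  predicted: Iterable[Exon]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) at the exon level.

    Assumption: an exon is correct only if both boundaries match exactly.
    """
    annotated_set: Set[Exon] = set(annotated)
    predicted_set: Set[Exon] = set(predicted)
    correct = len(annotated_set & predicted_set)

    # Sensitivity: share of annotated exons that were predicted exactly.
    sensitivity = correct / len(annotated_set) if annotated_set else 0.0
    # Specificity (gene-finding usage): share of predicted exons that are correct.
    specificity = correct / len(predicted_set) if predicted_set else 0.0
    return sensitivity, specificity


# Toy data, invented for illustration only.
annotated = [("chr1", 100, 200), ("chr1", 300, 400),
             ("chr1", 500, 600), ("chr1", 700, 800)]
predicted = [("chr1", 100, 200), ("chr1", 300, 400),
             ("chr1", 510, 600),   # wrong start, so not counted as correct
             ("chr1", 700, 800),
             ("chr1", 900, 950)]   # extra exon with no annotation

sn, sp = exon_accuracy(annotated, predicted)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

In this toy case three of four annotated exons are recovered exactly and three of five predictions are correct, which illustrates why a single shifted splice site lowers both measures at once.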

Availability: The Java implementation is available for download from 
http://www.bioinformatics.uwaterloo.ca/software.