Publication details

Xuefeng Cui, Tomas Vinar, Brona Brejova, Dennis Shasha and Ming Li. Homology Search for Genes. Bioinformatics, 23(13):i97-103. 2007. Intelligent Systems for Molecular Biology and European Conference on Computational Biology (ISMB/ECCB 2007).

Abstract

Motivation: Life science researchers often require an exhaustive list
of protein-coding genes similar to a given query gene. To find such
genes, homology search tools such as BLAST or PatternHunter return a
set of high-scoring segment pairs (HSPs). These HSPs then need to be
correlated with existing sequence annotations, or assembled manually
into putative gene structures. This process is error-prone and
labor-intensive, especially in genomes without reliable gene
annotation.

Results: We have developed a homology search solution that automates
this process and returns complete gene structures instead of HSPs. We
achieve better sensitivity and specificity by adapting a hidden
Markov model for gene finding to reflect features of the query
gene. Compared to traditional homology search, our novel approach
identifies splice sites much more reliably and can even locate exons
that were lost in the query gene.

On a testing set of 400 mouse query genes, we report 79% exon
sensitivity and 80% exon specificity in the human genome based on
orthologous genes annotated in NCBI HomoloGene. In the same set, we
also found 50 (12%) gene structures with better protein alignment
scores than the ones identified in HomoloGene.
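
The abstract does not spell out how exon sensitivity and specificity are computed. The sketch below assumes the convention common in gene-finding evaluation: an exon counts as correct only if both boundaries match the annotation exactly, sensitivity is the fraction of annotated exons that were predicted, and specificity is the fraction of predicted exons that are annotated. The function name `exon_accuracy` and the coordinates are hypothetical and serve only as illustration.

```python
def exon_accuracy(predicted, annotated):
    """Return (sensitivity, specificity) at the exon level.

    Exons are (start, end) tuples. Assumption: a predicted exon is
    correct only when both coordinates match an annotated exon exactly.
    """
    predicted = set(predicted)
    annotated = set(annotated)
    correct = len(predicted & annotated)
    # Sensitivity: share of annotated exons that were recovered.
    sensitivity = correct / len(annotated) if annotated else 0.0
    # Specificity (gene-finding usage): share of predictions that are correct.
    specificity = correct / len(predicted) if predicted else 0.0
    return sensitivity, specificity


# Hypothetical coordinates: the third prediction misses one boundary by 10 bp.
annotated = [(100, 200), (300, 400), (500, 600), (700, 800)]
predicted = [(100, 200), (300, 400), (510, 600)]
sn, sp = exon_accuracy(predicted, annotated)
print(f"exon sensitivity = {sn:.2f}, exon specificity = {sp:.2f}")
```

Under this definition, a prediction that misses a single splice site counts against both measures, so the metrics reward the reliable splice-site identification mentioned in the Results.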

Availability: The Java implementation is available for download from
http://www.bioinformatics.uwaterloo.ca/software.