Publication details

Brona Brejova, Daniel G. Brown, Tomas Vinar. Optimal spaced seeds for homologous coding regions. Journal of Bioinformatics and Computational Biology, 1(4):595-610. January 2004. Early version in CPM 2003.

Abstract

Optimal spaced seeds were developed as a method to increase sensitivity of
local alignment programs similar to BLASTN. Such seeds have been used before
in the program PatternHunter, and have given improved sensitivity and running
time relative to BLASTN in genome-genome comparison. We study the problem of
computing optimal spaced seeds for detecting homologous coding regions in
unannotated genomic sequences. By using well-chosen seeds, we are able to
improve the sensitivity of coding sequence alignment over that of TBLASTX,
while keeping runtime comparable to BLASTN. We identify good seeds by first
giving effective hidden Markov models of conservation in alignments of
homologous coding regions. We give an efficient algorithm to compute the
optimal spaced seed when conservation patterns are generated by these models.
Our results offer the hope of improved gene finding due to fewer missed exons
in DNA/DNA comparison, and more effective homology search in general, and may
have applications outside of bioinformatics.
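
As a rough illustration of what "sensitivity" of a spaced seed means, the sketch below checks whether a seed hits a 0/1 conservation pattern and computes the hit probability by brute-force enumeration. It makes simplifying assumptions that are not from the paper: alignment positions are conserved independently with a fixed probability p (rather than following the hidden Markov models described above), and every pattern is enumerated (rather than using the efficient algorithm). The function names seed_hits and sensitivity are hypothetical.

```python
from itertools import product


def seed_hits(seed: str, conservation: str) -> bool:
    """Return True if the spaced seed matches the conservation string at some offset.

    seed: string over {'1', '0'}; '1' marks a position that must be conserved,
          '0' marks a "don't care" position.
    conservation: string over {'1', '0'}; '1' marks a conserved alignment column.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(conservation) - len(seed) + 1):
        if all(conservation[start + i] == "1" for i in required):
            return True
    return False


def sensitivity(seed: str, length: int, p: float) -> float:
    """Probability that the seed hits a random conservation string of the given length.

    Assumption (illustrative only): each position is conserved independently
    with probability p. Enumerates all 2**length strings, so keep length small.
    """
    total = 0.0
    for bits in product("01", repeat=length):
        pattern = "".join(bits)
        if seed_hits(seed, pattern):
            conserved = pattern.count("1")
            total += p ** conserved * (1 - p) ** (length - conserved)
    return total


if __name__ == "__main__":
    # Compare a contiguous seed with a spaced seed of the same weight (3).
    for seed in ("111", "1101"):
        print(seed, sensitivity(seed, length=12, p=0.7))
```

Finding an optimal seed in this toy setting would amount to evaluating sensitivity for every candidate seed of a given weight and span and keeping the best one; the enumeration here is exponential in the alignment length and is meant only to make the definition concrete.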