Publication details

Brona Brejova, Daniel G. Brown. Optimal Spaced Seeds for Finding Homologous Coding Regions. Technical Report CS-2002-43, Dept. of Computer Science, University of Waterloo, October 2002.

Abstract

We study the problem of computing optimal spaced seeds for identifying
homologous coding DNA sequences in large genomic data sets. We develop
a model of DNA sequence alignment in coding regions, and using data
sets from human/Drosophila and human/mouse comparisons, we compute
optimal spaced seeds using a dynamic programming algorithm. The seeds
we identify are far more sensitive at detecting homologous
regions than the seeds from BLAST or PatternHunter, and also
significantly improve on the sensitivity of WABA, which also uses a
simple spaced seed. In particular, in human/Drosophila comparisons, we
offer an 82% improvement in false negatives over BLAST and a 33%
improvement over WABA. Our results offer the hope of improved gene
finding due to fewer missed exons in DNA/DNA comparison, and more
effective homology search in general.
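
To illustrate what a spaced seed is, here is a minimal Python sketch. It is not taken from the report: the function name seed_hits, the toy alignment, and both seeds are hypothetical and chosen only for illustration. An alignment is written as a string in which '1' marks a matching column and '0' a mismatch; a seed hits at an offset when every '1' position of the seed falls on a match. The toy alignment mimics a coding region in which every third (wobble) position mismatches.

```python
def seed_hits(seed: str, alignment: str) -> list[int]:
    """Return the offsets at which `seed` hits `alignment`.

    Both arguments are strings over {'0', '1'}. In the seed, '1' means
    "this position must match" and '0' means "don't care". In the
    alignment, '1' is a matching column and '0' a mismatching one.
    """
    required = [i for i, c in enumerate(seed) if c == "1"]
    span = len(seed)
    return [
        start
        for start in range(len(alignment) - span + 1)
        if all(alignment[start + i] == "1" for i in required)
    ]


# Hypothetical toy alignment: two matches, then a mismatch, repeated
# (as if every third codon position had diverged).
alignment = "110" * 6

contiguous_seed = "1111"   # four consecutive required matches (weight 4)
spaced_seed = "110110"     # same weight 4, but skips every third position

print(seed_hits(contiguous_seed, alignment))  # no run of four matches exists
print(seed_hits(spaced_seed, alignment))      # hits at every codon boundary
```

In this toy case the contiguous seed finds nothing, while the spaced seed of equal weight hits at offsets 0, 3, 6, 9 and 12. Choosing which seed maximizes the chance of at least one hit under a probabilistic model of coding alignments is the optimization problem the abstract describes; the sketch above only shows hit detection, not that optimization.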