Publication details
Brona Brejova, Daniel G. Brown.
Optimal Spaced Seeds for Finding Homologous Coding Regions.
Technical Report CS-2002-43, Dept. of Computer Science, University of Waterloo,
October 2002.
Abstract
We study the problem of computing optimal spaced seeds for identifying homologous coding DNA sequences in large genomic data sets. We develop a model of DNA sequence alignment in coding regions, and using data sets from human/Drosophila and human/mouse comparisons, we compute optimal spaced seeds using a dynamic programming algorithm. The seeds we identify are more sensitive by far at identifying homologous regions than the seeds from BLAST or from PatternHunter, and also significantly improve on the sensitivity of WABA, which also uses a simple spaced seed. In particular, in human/Drosophila comparisons, we offer an 82% improvement in false negatives over BLAST and a 33% improvement over WABA. Our results offer the hope of improved gene finding due to fewer missed exons in DNA/DNA comparison, and more effective homology search in general.
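To make the notion of a spaced seed concrete, here is a minimal sketch in Python. It is not the authors' dynamic programming algorithm and does not reproduce their seeds or data. The function name `seed_hits`, the encoding of an alignment as a string of '1' (match) and '0' (mismatch), and the toy seeds are all assumptions made for illustration. The toy alignment puts a mismatch at every third position, loosely motivated by the fact that the third codon position is often the most variable in coding regions.

```python
def seed_hits(seed, alignment):
    """Return the start offsets at which a seed hits an alignment.

    seed:      string over {'1', '0'}; '1' = position must match,
               '0' = don't-care position (hypothetical encoding).
    alignment: string over {'1', '0'}; '1' = aligned bases match,
               '0' = mismatch (hypothetical encoding).
    """
    # Offsets within the seed that are required to be matches.
    required = [i for i, c in enumerate(seed) if c == "1"]
    hits = []
    for start in range(len(alignment) - len(seed) + 1):
        if all(alignment[start + i] == "1" for i in required):
            hits.append(start)
    return hits


# Toy alignment with a mismatch at every third position.
alignment = "110110110110"

# Contiguous seed of weight 4: needs four consecutive matches.
print(seed_hits("1111", alignment))    # prints []

# Spaced seed of the same weight that skips every third position.
print(seed_hits("110110", alignment))  # prints [0, 3, 6]
```

Both seeds require four matching positions, yet only the spaced one finds this alignment. A seed's sensitivity is the probability that it hits at least once in an alignment drawn from a given model, and that is the quantity the report optimizes.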