Publication details

Brona Brejova, Daniel Brown, Tomas Vinar. Vector seeds: an extension to spaced seeds allows substantial improvements in sensitivity and specificity. In G. Benson, R. Page, eds., Algorithms in Bioinformatics: 3rd International Workshop (WABI), volume 2812 of Lecture Notes in Bioinformatics, pp. 39-54, Budapest, Hungary, September 2003. Springer.

Abstract

We present improved techniques for finding homologous regions in DNA and
protein sequences. Our approach focuses on the core region of a local
pairwise alignment; we suggest new ways to characterize these regions
that allow marked improvements in both specificity and sensitivity
over existing techniques for sequence alignment. For any such
characterization, which we call a vector seed, we give an efficient
algorithm that estimates the specificity and sensitivity of that seed
under reasonable probabilistic models of sequence. We also
characterize the probability of a match when an alignment is required
to have multiple hits before it is detected. Our extensions fit well
with existing approaches to sequence alignment, while still offering
substantial improvement in runtime and sensitivity, particularly for
the important problem of identifying matches between homologous coding
DNA sequences.
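
To make the idea of a seed "hit" more concrete, the following is a minimal illustrative sketch, not the algorithm from the paper. It assumes that a vector seed is a pair (v, T) of a weight vector and a threshold, and that the seed hits an alignment at a position when the dot product of v with the window of per-column alignment scores starting there is at least T. The function names, the 0/1 match scoring, and the simple Monte Carlo estimate of sensitivity under an independent-match model are all assumptions made for this example; the abstract refers to an efficient algorithm for this estimate, which the sketch does not reproduce.

```python
import random


def seed_hits(v, threshold, scores):
    """Return the start positions where the seed (v, threshold) hits.

    A hit at position p means the dot product of v with
    scores[p:p + len(v)] is at least threshold.
    """
    k = len(v)
    return [
        p
        for p in range(len(scores) - k + 1)
        if sum(w * s for w, s in zip(v, scores[p:p + k])) >= threshold
    ]


def estimate_sensitivity(v, threshold, length, p_match, trials=10000, rng=None):
    """Monte Carlo estimate of the chance that a random alignment is hit.

    Assumption for this sketch: each alignment column is an independent
    match (score 1) with probability p_match, otherwise a mismatch (score 0).
    """
    rng = rng or random.Random(0)  # fixed seed so repeated runs agree
    detected = 0
    for _ in range(trials):
        scores = [1 if rng.random() < p_match else 0 for _ in range(length)]
        if seed_hits(v, threshold, scores):
            detected += 1
    return detected / trials


if __name__ == "__main__":
    # With 0/1 weights and a threshold equal to the number of ones,
    # the seed behaves like the spaced seed 1*1 (match, don't care, match).
    print(seed_hits([1, 0, 1], 2, [1, 1, 1, 0, 1]))
    print(estimate_sensitivity([1, 0, 1], 2, length=20, p_match=0.7))
```

Lowering the threshold below the sum of the weights lets a window with some mismatches still count as a hit, which in this sketch is the extra flexibility a vector seed has over a plain spaced seed.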