Brona Brejova, Daniel Brown, Tomas Vinar. Vector seeds: an extension to spaced seeds allows substantial improvements in sensitivity and specificity. In G. Benson, R. Page, eds., Algorithms in Bioinformatics: 3rd International Workshop (WABI), volume 2812 of Lecture Notes in Bioinformatics, pp. 39-54, Budapest, Hungary, September 2003. Springer.

Download preprint: 03wabi-seeds.ps.gz, 170 KB

Download from publisher: http://www.springerlink.com/link.asp?id=u6fr322a7yfh6u7f

Related web page: http://www.bioinformatics.uwaterloo.ca/supplements/03seeds/

Bibliography entry: BibTeX

See also: early version

Abstract:

We present improved techniques for finding homologous regions in DNA and
protein sequences.  Our approach focuses on the core region of a local
pairwise alignment; we suggest new ways to characterize these regions that
allow marked improvements in both specificity and sensitivity over existing
techniques for sequence alignment.  For any such characterization, which we
call a vector seed, we give an efficient algorithm that estimates the
specificity and sensitivity of that seed under reasonable probabilistic
models of sequence.  We also characterize the probability of a match when
an alignment is required to have multiple hits before it is detected.  Our
extensions fit well with existing approaches to sequence alignment, while
still offering substantial improvement in runtime and sensitivity,
particularly for the important problem of identifying matches between
homologous coding DNA sequences.
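
Illustrative sketch (not taken from the paper): the abstract does not spell out how a vector seed is represented or how its sensitivity is computed, so the following is only a minimal example under stated assumptions. It assumes a seed is a pair (v, T) of a weight vector and a threshold, that an alignment is reduced to a 0/1 sequence (1 = match) with independent match probability p, and that a hit occurs wherever the weighted sum over a window of len(v) positions reaches T. The names seed_hits and sensitivity are hypothetical, and the paper's own algorithm, which covers more general score models, may differ.

```python
from itertools import product


def seed_hits(window, v, T):
    """Return True if the weighted sum of the window reaches the threshold T."""
    return sum(w * x for w, x in zip(v, window)) >= T


def sensitivity(v, T, length, p):
    """Probability that a random 0/1 alignment of the given length contains
    at least one hit of the seed (v, T).

    Assumption: positions are independent and each is a match (1) with
    probability p. The computation tracks the probability of every possible
    suffix of len(v) - 1 positions, restricted to alignments with no hit yet.
    """
    span = len(v)
    if length < span:
        return 0.0

    # Initial states: all prefixes of span - 1 positions (no window complete yet).
    states = {}
    for prefix in product((0, 1), repeat=span - 1):
        prob = 1.0
        for x in prefix:
            prob *= p if x else 1.0 - p
        states[prefix] = prob

    # Extend by one position at a time; each extension completes one window.
    for _ in range(length - span + 1):
        next_states = {}
        for prefix, prob in states.items():
            for x in (0, 1):
                window = prefix + (x,)
                if seed_hits(window, v, T):
                    continue  # a hit occurred: remove this mass from "no hit"
                key = window[1:]
                step = p if x else 1.0 - p
                next_states[key] = next_states.get(key, 0.0) + prob * step
        states = next_states

    return 1.0 - sum(states.values())
```

Under these assumptions, a spaced seed corresponds to a 0/1 weight vector with T equal to the number of ones, for example sensitivity((1, 0, 1, 1), 3, 64, 0.7). Lowering T below the total weight tolerates some mismatches inside the window. The state space grows as 2^(len(v) - 1), so this sketch is practical only for short seeds.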