Brona Brejova, Daniel G. Brown, Tomas Vinar. Vector seeds: an extension to spaced seeds. Journal of Computer and System Sciences, 70(3):364-380. May 2005. Special issue on bioinformatics. Early version in WABI 2003.

Download preprint: 04jcss.pdf, 406Kb

Download from publisher: http://dx.doi.org/10.1016/j.jcss.2004.12.008

Abstract:

We present improved techniques for finding homologous regions in DNA and 
protein sequences. Our approach focuses on the core regions of a local 
pairwise alignment; we suggest new ways to characterize these regions that 
allow marked improvements in both specificity and sensitivity over existing 
techniques for sequence alignment. For any such characterization, which we 
call a vector seed, we give an efficient algorithm that estimates the
specificity and sensitivity of that seed under reasonable probabilistic 
models of sequence. We also characterize the probability of a match when an 
alignment is required to have multiple hits before it is detected. Our 
extensions fit well with existing approaches to sequence alignment, while 
still offering substantial improvement in runtime and sensitivity, 
particularly for the important problem of identifying matches between 
homologous coding DNA sequences.
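As a rough illustration of the terms in the abstract, the sketch below assumes one common formulation of a vector seed: a pair (v, T), where v is a vector of k nonnegative integer weights and T is a threshold. A window of k consecutive alignment-column scores "hits" when the weighted sum is at least T. Under this reading, the spaced seed 11011 is v = (1, 1, 0, 1, 1) with T = 4, and v = (1, 1, 1, 1, 1) with T = 4 tolerates one mismatch in five columns. Sensitivity is estimated under an assumed i.i.d. model of alignment columns using a straightforward dynamic program over the last k-1 columns. This is not necessarily the paper's algorithm, and its state space grows exponentially in k. The per-window hit probability under a background model stands in as a proxy for specificity (lower means fewer random hits). All function names and the example probabilities are hypothetical.

```python
from itertools import product


def hits_at(v, T, window):
    """True if the vector seed (v, T) hits this window of column scores."""
    return sum(vi * ai for vi, ai in zip(v, window)) >= T


def is_hit(v, T, alignment):
    """True if the seed hits at some position of the alignment."""
    k = len(v)
    return any(hits_at(v, T, alignment[i:i + k])
               for i in range(len(alignment) - k + 1))


def sensitivity(v, T, n, probs):
    """P(at least one hit) for an alignment of n i.i.d. columns.

    probs maps each column score to its probability (an assumed model).
    DP state: the last k-1 columns, holding the probability mass of all
    prefixes that have not produced a hit yet.
    """
    k = len(v)
    states = {(): 1.0}
    for _ in range(n):
        nxt = {}
        for suffix, mass in states.items():
            for a, p in probs.items():
                window = suffix + (a,)
                if len(window) == k and hits_at(v, T, window):
                    continue  # extension creates a hit: leave hit-free mass
                key = window[-(k - 1):] if k > 1 else ()
                nxt[key] = nxt.get(key, 0.0) + mass * p
        states = nxt
    return 1.0 - sum(states.values())


def hit_probability(v, T, probs):
    """P(a single window of k i.i.d. columns hits); a specificity proxy."""
    total = 0.0
    for window in product(probs, repeat=len(v)):
        if hits_at(v, T, window):
            p = 1.0
            for a in window:
                p *= probs[a]
            total += p
    return total


if __name__ == "__main__":
    homologous = {0: 0.3, 1: 0.7}  # hypothetical: 1 = match, 0 = mismatch
    background = {0: 0.75, 1: 0.25}  # hypothetical: unrelated DNA
    seeds = {
        "contiguous 1111": ((1, 1, 1, 1), 4),
        "spaced 11011": ((1, 1, 0, 1, 1), 4),
        "vector 11111, T=4": ((1, 1, 1, 1, 1), 4),
    }
    for name, (v, T) in seeds.items():
        print(name,
              round(sensitivity(v, T, 32, homologous), 4),
              round(hit_probability(v, T, background), 5))
```

Running the script prints, for each seed, its estimated sensitivity on 32-column alignments and its per-window hit rate on unrelated sequence. These two numbers trade off: lowering T or spreading out the weights raises sensitivity, but it also raises the random hit rate.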