Rastislav Sramek, Brona Brejova, Tomas Vinar. On-line Viterbi Algorithm for Analysis of Long Biological Sequences. In Raffaele Giancarlo, Sridhar Hannenhalli, eds., Algorithms in Bioinformatics: 7th International Workshop (WABI), volume 4645 of Lecture Notes in Computer Science, pp. 240-251, Philadelphia, PA, USA, September 2007. Springer.

Download preprint: 07hmmmem.pdf, 243Kb

Download from publisher: http://dx.doi.org/10.1007/978-3-540-74126-8_23

Related web page: not available

Bibliography entry: BibTeX

See also: early version

Abstract:

Hidden Markov models (HMMs) are routinely used for analysis of long genomic 
sequences to identify various features such as genes, CpG islands, and 
conserved elements. A commonly used Viterbi algorithm requires $O(mn)$ memory
to annotate a sequence of length $n$ with an $m$-state HMM, which is
impractical for analyzing whole chromosomes. In this paper, we introduce the
on-line Viterbi algorithm for decoding HMMs in much smaller space. Our
analysis shows that our algorithm has the expected maximum memory
$\Theta(m \log n)$ on two-state HMMs. We also experimentally demonstrate that
our algorithm significantly reduces memory of decoding a simple HMM for gene 
finding on both simulated and real DNA sequences, without a significant 
slow-down compared to the classical Viterbi algorithm.