Rastislav Sramek, Brona Brejova, Tomas Vinar.
On-line Viterbi algorithm for analysis of long biological sequences.
In Raffaele Giancarlo, Sridhar Hannenhalli, ed.,
Algorithms in Bioinformatics: 7th International Workshop (WABI),
4645 volume of Lecture Notes in Computer Science,
pp. 240-251, Philadelphia, PA, USA, September 2007. Springer.
Preprint, 243Kb | Download from publisher | BibTeX
Hidden Markov models (HMMs) are routinely used for analysis of long genomic sequences to identify various features such as genes, CpG islands, and conserved elements. A commonly used Viterbi algorithm requires O(mn) memory to annotate a sequence of length n with an m-state HMM, which is impractical for analyzing whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm for decoding HMMs in much smaller space. Our analysis shows that our algorithm has the expected maximum memory Theta(mlog n) on two-state HMMs. We also experimentally demonstrate that our algorithm significantly reduces memory of decoding a simple HMM for gene finding on both simulated and real DNA sequences, without a significant slow-down compared to the classical Viterbi algorithm.