Rastislav Sramek, Brona Brejova, Tomas Vinar. On-line Viterbi Algorithm for Analysis of Long Biological Sequences. In Raffaele Giancarlo, Sridhar Hannenhalli, ed., Algorithms in Bioinformatics: 7th International Workshop (WABI), 4645 volume of Lecture Notes in Computer Science, pp. 240-251, Philadelphia, PA, USA, September 2007. Springer.
Download preprint: 07hmmmem.pdf, 243Kb
Download from publisher: http://dx.doi.org/10.1007/978-3-540-74126-8_23
Related web page: not available
Bibliography entry: BibTeX
See also: early version
Abstract:
Hidden Markov models (HMMs) are routinely used for analysis of long genomic sequences to identify various features such as genes, CpG islands, and conserved elements. A commonly used Viterbi algorithm requires \$O(mn)\$ memory to annotate a sequence of length \$n\$ with an \$m\$-state HMM, which is impractical for analyzing whole chromosomes. In this paper, we introduce the on-line Viterbi algorithm for decoding HMMs in much smaller space. Our analysis shows that our algorithm has the expected maximum memory \$Theta(mlog n)\$ on two-state HMMs. We also experimentally demonstrate that our algorithm significantly reduces memory of decoding a simple HMM for gene finding on both simulated and real DNA sequences, without a significant slow-down compared to the classical Viterbi algorithm.