Michal Nánási. Decoding of Hidden Markov Models with Applications to Sequence Alignment. PhD thesis, Comenius University in Bratislava, 2014. Supervised by Broňa Brejová.
Download preprint: 14nanasith.pdf, 1398Kb
Download from publisher: http://alis.uniba.sk/storage/dpg/dostupne/FM/2015/2015-FM-17881/
Related web page: not available
Bibliography entry: BibTeX
We study two important problems in computational biology: sequence annotation and sequence alignment. In the thesis we concentrate on the use of hidden Markov models (HMMs), well established generative probabilistic models. In the first part, we study the sequence annotation problem, specifically the two-stage HMM decoding algorithms and the computational complexity of related problems. In particular, we demonstrate that two-stage algorithms can be used to increase the accuracy of decoding, and we prove the NP-hardness for three problems appropriate for the first stage: the most probable set problem, the most probable restriction problem and the most probable footprint problem. The second part of the thesis focuses on alignment of sequences that contain tandem repeats. Tandem repeats are highly repetitive elements withing genomic sequences that cause biases in alignments. To address this issue, we introduce a new HMM that mod- els alignments containing tandem repeats, combine it with existing and new decoding algorithms, and evaluate our approach experimentally. In both problems, we use the decoding algorithms to improve the accuracy of HMM predictions. Decoding algorithms are often neglected, and most of the development is focused on the structure of an HMM. However, a proper selection of a decoding method can lead to significant improvements in the predictions. Keywords: hidden Markov models, decoding, annotation, alignment