Marcel Kucharík. A New Algorithm for Using External Information in Gene Finding. Master thesis, Comenius University in Bratislava, 2011. Supervised by Broňa Brejová.

Download preprint: 11kucharikth.pdf, 989Kb

Download from publisher: not available

Related web page: not available

Bibliography entry: BibTeX

Abstract:

The goal of gene finding is to mark important
places in DNA sequence -- genes. Programs solving
this task are using information about DNA structure
from already annotated sequences and external
information. We study existing gene finding
programs and find their limitations in using
external information. Typically they can process
only simple hints consisting of a single interval,
because these are relatively easy to incorporate
to standard algorithms, but cannot cope with
complex hint consisting of multiple intervals.
This is unwanted information loss, and therefore
we developed an algorithm able to process
complex information. It is based on Hidden
Markov models and the Viterbi algorithm as
previous gene-finders, but uses a different
approach of adding external information into
the model. Our algorithm finds the sequence
of state labels of the HMM with the best score,
where a sequence gets a bonus for each respected
hint. Hints are modeled as alternative paths
in a graph representation of the problem. We
proved that this approach increases the accuracy
of gene prediction, especially at the gene level.
Time complexity of the algorithm is linear in
sequence length and in the total length of
all hints, while quadratic in the number of
hints and HMM states. This is comparable to
time complexity of existing gene finders.