Bioinformatický seminár

Thu 27 Sep. 2012, 10:00

Title: New fast algorithms for phylogeny reconstruction
Speaker: Jakub Truszkowski, University of Waterloo, Canada

Due to rapid expansion in sequence databases, very fast phylogenetic
reconstruction algorithms are becoming necessary. Currently, large sequence
alignments can contain up to hundreds of thousands of sequences, making tra-
ditional methods, such as Neighbour Joining, computationally prohibitive. To
address this problem, we have developed two novel fast phylogenetic algorithms.
The rst algorithm is a quartet-based heuristic that runs in O(n log n) time. It
is based on a theoretical algorithm that reconstructs the correct
tree, with high probability, assuming every quartet is inferred correctly with
constant probability.
The core of our algorithm is a balanced search tree structure that enables
us to locate an edge in the tree in O(log n) time. Our algorithm is several
times faster than all the current methods, while its accuracy approaches that of
Neighbour Joining. This work appeared in CPM 2011 and WABI 2011.

In the second part of the talk, I will present the first sub-quadratic time
algorithm with theoretical performance guarantees under a Markov model of
sequence evolution. Our new algorithm runs in O(n^{1+\gamma(g)} log^2 n) time,
where \gamma is an increasing function of an upper bound on the mutation rate along any
branch in the phylogeny, the upper bound g must be below
1/2 -\sqrt{1/8} \approx 0.15, and \gamma(g) < 1 for all g. For
phylogenies with very short branches, the running time of our algorithm is close to linear.
In preliminary experiments, our prototype implementation was more accurate
than the current fast algorithms, while being comparably fast.
This work appeared in WABI 2012.