Bioinformatics Seminar

Tue 7 May 2013, 17:20
I-9

Title: Yu Lin et al. (2012) Bootstrapping phylogenies inferred from rearrangement data
Speaker: Martin Višňovec

Large-scale sequencing of genomes has enabled the inference of
phylogenies based on the evolution of genomic architecture, under such events as 
rearrangements, duplications, and losses. Many evolutionary models and associated
algorithms have been designed over the last few years and have found use in
comparative genomics and phylogenetic inference. However, the assessment of
phylogenies built from such data has not been properly addressed to date. The
standard method used in sequence-based phylogenetic inference is the bootstrap,
but it relies on a large number of homologous characters that can be resampled;
yet in the case of rearrangements, the entire genome is a single character.
Alternatives such as the jackknife suffer from the same problem, while likelihood
tests cannot be applied in the absence of well-established probabilistic models.
We present a new approach to the assessment of distance-based phylogenetic
inference from whole-genome data; our approach combines features of the jackknife
and the bootstrap and remains nonparametric. For each feature of our method, we
give an equivalent feature in the sequence-based framework; we also present the
results of extensive experimental testing, in both sequence-based and
genome-based frameworks. Through the feature-by-feature comparison and the
experimental results, we show that our bootstrapping approach is on par with the 
classic phylogenetic bootstrap used in sequence-based reconstruction, and we
establish the clear superiority of the classic bootstrap for sequence data and of
our corresponding new approach for rearrangement data over proposed variants.
Finally, we test our approach on a small dataset of mammalian genomes, verifying 
that the support values match current thinking about the respective branches. Our
method is the first to provide a standard of assessment to match that of the
classic phylogenetic bootstrap for aligned sequences. Its support values follow a
similar scale and its receiver-operating characteristics are nearly identical,
indicating that it provides similar levels of sensitivity and specificity. Thus
our assessment method makes it possible to conduct phylogenetic analyses on whole
genomes with the same degree of confidence as for analyses on aligned sequences. 
Extensions to search-based inference methods such as maximum parsimony and
maximum likelihood are possible, but remain to be thoroughly tested.
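
Background note: for attendees unfamiliar with the classic phylogenetic bootstrap
that the abstract uses as its reference point, the following is a minimal sketch
of its resampling step for aligned sequences. It is not the method of the paper:
it only illustrates why the classic bootstrap needs many homologous characters
(alignment columns) to resample. The function name bootstrap_columns and the toy
alignment are assumptions made up for this illustration.

```python
import random


def bootstrap_columns(alignment, rng):
    """Build one bootstrap replicate by sampling alignment columns with replacement.

    alignment: dict mapping taxon name -> aligned sequence (all of equal length).
    rng: a random.Random instance, passed in so that results are reproducible.
    """
    n_cols = len(next(iter(alignment.values())))
    # Draw as many column indices as the alignment has columns, with replacement.
    picked = [rng.randrange(n_cols) for _ in range(n_cols)]
    # Apply the same column choice to every taxon so that homology is preserved.
    return {name: "".join(seq[i] for i in picked) for name, seq in alignment.items()}


# Toy alignment (hypothetical data, for illustration only).
alignment = {
    "human": "ACGTACGT",
    "mouse": "ACGTTCGT",
    "dog": "ACGAACGT",
}

rng = random.Random(0)
replicate = bootstrap_columns(alignment, rng)
```

In the classic procedure, a tree is inferred from each of many such replicates,
and the support of a branch is the fraction of replicate trees that contain it.
With rearrangement data the whole genome is a single character, so there are no
columns to resample in this way, which is the gap the paper addresses.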