2-AIN-505, 2-AIN-251: Seminár z bioinformatiky (1) a (3)
Zima 2017

Jarno Alanko, Fabio Cunial, Djamal Belazzougui, Veli Makinen. A framework for space-efficient read clustering in metagenomic samples. BMC bioinformatics, 18(Suppl 3):59. 2017.

Download preprint: not available

Download from publisher: not available PubMed

Related web page: not available

Bibliography entry: BibTeX


BACKGROUND: A metagenomic sample is a set of DNA fragments, randomly extracted
from multiple cells in an environment, belonging to distinct, often unknown
species. Unsupervised metagenomic clustering aims at partitioning a metagenomic
sample into sets that approximate taxonomic units, without using reference
genomes. Since samples are large and steadily growing, space-efficient clustering
algorithms are strongly needed. RESULTS: We design and implement a
space-efficient algorithmic framework that solves a number of core primitives in 
unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler 
index and a union-find data structure on the set of reads. When run on a sample
of total length n, with m reads of maximum length l each, on an alphabet of total
size sigma, our algorithms take O(n(t+logsigma)) time and just 2n+o(n)+O(max{l
sigmalogn,K logm}) bits of space in addition to the index and to the union-find
data structure, where K is a measure of the redundancy of the sample and t is the
query time of the union-find data structure. CONCLUSIONS: Our experimental
results show that our algorithms are practical, they can exploit multiple cores
by a parallel traversal of the suffix-link tree, and they are competitive both in
space and in time with the state of the art.