Jarno Alanko, Fabio Cunial, Djamal Belazzougui, Veli Makinen. A framework for space-efficient read clustering in metagenomic samples. BMC bioinformatics, 18(Suppl 3):59. 2017.
Download preprint: not available
Download from publisher: not available PubMed
Related web page: not available
Bibliography entry: BibTeX
Abstract:
BACKGROUND: A metagenomic sample is a set of DNA fragments, randomly extracted from multiple cells in an environment, belonging to distinct, often unknown species. Unsupervised metagenomic clustering aims at partitioning a metagenomic sample into sets that approximate taxonomic units, without using reference genomes. Since samples are large and steadily growing, space-efficient clustering algorithms are strongly needed. RESULTS: We design and implement a space-efficient algorithmic framework that solves a number of core primitives in unsupervised metagenomic clustering using just the bidirectional Burrows-Wheeler index and a union-find data structure on the set of reads. When run on a sample of total length n, with m reads of maximum length l each, on an alphabet of total size sigma, our algorithms take O(n(t+logsigma)) time and just 2n+o(n)+O(max{l sigmalogn,K logm}) bits of space in addition to the index and to the union-find data structure, where K is a measure of the redundancy of the sample and t is the query time of the union-find data structure. CONCLUSIONS: Our experimental results show that our algorithms are practical, they can exploit multiple cores by a parallel traversal of the suffix-link tree, and they are competitive both in space and in time with the state of the art.