2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2020

David C. Danko, Dmitry Meleshko, Daniela Bezdan, Christopher Mason, Iman Hajirasouliha. Minerva: an alignment- and reference-free approach to deconvolve Linked-Reads for metagenomics. Genome Research, 29(1):116-124. 2019.

Download preprint: not available

Download from publisher: https://genome.cshlp.org/content/29/1/116.full

Related web page: not available



Emerging Linked-Read technologies (aka read cloud or barcoded short-reads) have
revived interest in short-read technology as a viable approach to understand
large-scale structures in genomes and metagenomes. Linked-Read technologies, such
as the 10x Chromium system, use a microfluidic system and a specialized set of 3'
barcodes (aka UIDs) to tag short DNA reads sourced from the same long fragment of
DNA; subsequently, the tagged reads are sequenced on standard short-read
platforms. This approach results in interesting compromises. Each long fragment
of DNA is only sparsely covered by reads, no information about the ordering of
reads from the same fragment is preserved, and 3' barcodes match reads from
roughly 2-20 long fragments of DNA. However, compared to long-read technologies, 
the cost per base to sequence is far lower, far less input DNA is required, and
the per base error rate is that of Illumina short-reads. In this paper, we
formally describe a particular algorithmic issue common to Linked-Read
technology: the deconvolution of reads with a single 3' barcode into clusters
that represent single long fragments of DNA. We introduce Minerva, a graph-based 
algorithm that approximately solves the barcode deconvolution problem for
metagenomic data (where reference genomes may be incomplete or unavailable).
Additionally, we develop two demonstrations where the deconvolution of barcoded
reads improves downstream results, improving the specificity of taxonomic
assignments and of k-mer-based clustering. To the best of our knowledge, we are
the first to address the problem of barcode deconvolution in metagenomics.
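
The barcode deconvolution problem stated in the abstract can be illustrated with a toy sketch. This is not the Minerva algorithm from the paper; it is a simplified illustration under assumptions of our own: two reads carrying the same barcode are placed in one cluster if both share at least one k-mer with reads from the same other barcode, and clusters are the connected components of that relation. The function names (`kmers`, `deconvolve`), the parameter `k`, and the example sequences are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def kmers(seq, k):
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def deconvolve(target_reads, other_barcodes, k=6):
    """Split the reads of one barcode into clusters (toy heuristic).

    target_reads: list of read sequences that share a single barcode.
    other_barcodes: dict mapping barcode id -> list of read sequences.
    Returns a sorted list of clusters, each a list of indices into target_reads.
    """
    # Index every k-mer seen in the other barcodes.
    kmer_to_barcodes = defaultdict(set)
    for barcode, reads in other_barcodes.items():
        for read in reads:
            for km in kmers(read, k):
                kmer_to_barcodes[km].add(barcode)

    # For each target read, collect the other barcodes it shares a k-mer with.
    neighbours = []
    for read in target_reads:
        shared = set()
        for km in kmers(read, k):
            shared |= kmer_to_barcodes.get(km, set())
        neighbours.append(shared)

    # Union-find: merge reads that point to a common other barcode.
    parent = list(range(len(target_reads)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(target_reads)), 2):
        if neighbours[i] & neighbours[j]:
            parent[find(i)] = find(j)

    clusters = defaultdict(list)
    for i in range(len(target_reads)):
        clusters[find(i)].append(i)
    return sorted(clusters.values())


# Hypothetical example: reads 0 and 1 overlap a fragment seen under barcode
# "b1", read 2 overlaps a different fragment seen under barcode "b2".
example_reads = ["AAAACCCC", "CCCCGGGG", "TTTTATAT"]
example_others = {"b1": ["AAAACCCCGGGG"], "b2": ["TTTTATATAT"]}
example_clusters = deconvolve(example_reads, example_others, k=6)
```

In this sketch, reads that share no k-mer with any other barcode end up as singleton clusters. A real method would also need to handle repeats, sequencing errors, and k-mers common to many genomes, none of which this toy version attempts.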