2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1) and (3)
Winter 2019

Pierre Marijon, Rayan Chikhi, Jean-Stephane Varre. Graph analysis of fragmented long-read bacterial genome assemblies. Bioinformatics, 35(21):4239-4246. 2019.

Download from publisher: https://academic.oup.com/bioinformatics/article-lookup/doi/10.1093/bioinformatics/btz219

MOTIVATION: Long-read genome assembly tools are expected to reconstruct bacterial
genomes nearly perfectly; however, they still produce fragmented assemblies in
some cases. It would be beneficial to understand whether these cases are
intrinsically impossible to resolve, or if assemblers are at fault, implying that
genomes could be refined or even finished with little to no additional
experimental cost.

RESULTS: We propose a set of computational techniques to
assist inspection of fragmented bacterial genome assemblies, through careful
analysis of assembly graphs. By finding paths of overlapping raw reads between
pairs of contigs, we recover potential short-range connections between contigs
that were lost during the assembly process. We show that our procedure recovers
45% of missing contig adjacencies in fragmented Canu assemblies, on samples from
the NCTC bacterial sequencing project. We also observe that a simple procedure
based on enumerating weighted Hamiltonian cycles can suggest likely contig
orderings. In our tests, the correct contig order is ranked first in half of the
cases and within the top-three predictions in nearly all evaluated cases,
providing a direction for finishing fragmented long-read assemblies.

AVAILABILITY AND IMPLEMENTATION: https://gitlab.inria.fr/pmarijon/knot

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
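
For seminar discussion, the idea of ranking contig orderings by enumerating weighted Hamiltonian cycles can be illustrated with a small sketch. This is not the implementation from the paper or the KNOT tool. It assumes a simplified model in which each contig is a single node (contig ends and orientations are ignored) and each pair of contigs has one symmetric weight, here a made-up number standing in for something like the length of the read path connecting them. The function name `rank_contig_orders`, the contig names, and all weights are hypothetical.

```python
from itertools import permutations


def rank_contig_orders(contigs, weight):
    """Enumerate Hamiltonian cycles over `contigs` and sort them by total weight.

    `weight` maps frozenset({a, b}) to a number (hypothetical link cost);
    pairs missing from `weight` are treated as unconnected.
    Brute force: only practical for a handful of contigs.
    """
    first, rest = contigs[0], contigs[1:]
    ranked = []
    for perm in permutations(rest):
        # A cycle and its mirror image describe the same circular order,
        # so keep only one of the two.
        if len(perm) > 1 and perm[0] > perm[-1]:
            continue
        order = (first,) + perm
        total = 0
        connected = True
        # Walk consecutive pairs, closing the cycle back to the first contig.
        for a, b in zip(order, order[1:] + (first,)):
            w = weight.get(frozenset((a, b)))
            if w is None:
                connected = False
                break
            total += w
        if connected:
            ranked.append((total, order))
    ranked.sort()  # lowest total weight first
    return ranked


# Toy example with four hypothetical contigs and invented weights.
contigs = ["c1", "c2", "c3", "c4"]
weight = {
    frozenset(("c1", "c2")): 10,
    frozenset(("c2", "c3")): 20,
    frozenset(("c3", "c4")): 15,
    frozenset(("c4", "c1")): 7,
    frozenset(("c1", "c3")): 100,
    frozenset(("c2", "c4")): 80,
}
ranked = rank_contig_orders(contigs, weight)
for total, order in ranked:
    print(total, " -> ".join(order))
```

In this toy input the cycle c1, c2, c3, c4 has the lowest total weight and is listed first; the remaining cycles form the lower-ranked alternative orderings. The enumeration is factorial in the number of contigs, which is only tolerable because fragmented bacterial assemblies typically consist of few contigs.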