2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2020

Mitchell R. Vollger, Philip C. Dishuck, Melanie Sorensen, AnneMarie E. Welch, Vy Dang, Max L. Dougherty, Tina A. Graves-Lindsay, Richard K. Wilson, Mark J. P. Chaisson, Evan E. Eichler. Long-read sequence and assembly of segmental duplications. Nature Methods, 16(1):88-94. 2019.

Download from publisher: https://dx.doi.org/10.1038%2Fs41592-018-0236-3


We have developed a computational method based on polyploid phasing of long
sequence reads to resolve collapsed regions of segmental duplications within
genome assemblies. Segmental Duplication Assembler (SDA;
https://github.com/mvollger/SDA) constructs graphs in which paralogous sequence
variants define the nodes and long-read sequences provide attraction and
repulsion edges, enabling the partition and assembly of long reads corresponding
to distinct paralogs. We apply it to single-molecule, real-time sequence data
from three human genomes and recover 33-79 megabase pairs (Mb) of duplications in
which approximately half of the loci are diverged (<99.8% identity) compared to the
reference genome. We show that the corresponding sequence is highly accurate
(>99.9%) and that the diverged sequence corresponds to copy-number-variable
paralogs that are absent from the human reference genome. Our method can be
applied to other complex genomes to resolve the last gene-rich gaps, improve
duplicate gene annotation, and better understand copy-number-variant genetic
diversity at the base-pair level.
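
The graph idea in the abstract can be illustrated with a toy sketch. This is not the SDA implementation: the input format, the function names (build_edges, cluster_psvs, partition_reads), and the simple rule "merge two PSVs when attraction outweighs repulsion" are all assumptions made for illustration. Each read is assumed to report, for every paralogous sequence variant (PSV) it covers, whether it carries that variant. Two PSVs gain attraction when one read carries both, and repulsion when a read covering both carries only one.

```python
from collections import defaultdict
from itertools import combinations


def build_edges(reads):
    """Count attraction and repulsion between PSV pairs.

    reads: dict read_id -> dict psv_id -> bool
           (True if the read carries the PSV, False if it covers
           the site without carrying it). Hypothetical input format.
    """
    attract = defaultdict(int)
    repel = defaultdict(int)
    for calls in reads.values():
        for a, b in combinations(sorted(calls), 2):
            if calls[a] and calls[b]:
                attract[(a, b)] += 1   # read supports both PSVs together
            elif calls[a] != calls[b]:
                repel[(a, b)] += 1     # read carries one PSV but not the other
    return attract, repel


def cluster_psvs(reads):
    """Group PSVs by merging pairs whose attraction exceeds their repulsion.

    A simple union-find stand-in for a real graph clustering step.
    """
    attract, repel = build_edges(reads)
    psvs = sorted({p for calls in reads.values() for p in calls})
    parent = {p: p for p in psvs}

    def find(p):
        while parent[p] != p:
            parent[p] = parent[parent[p]]  # path halving
            p = parent[p]
        return p

    for (a, b), weight in attract.items():
        if weight > repel.get((a, b), 0):
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for p in psvs:
        clusters[find(p)].add(p)
    return sorted(sorted(c) for c in clusters.values())


def partition_reads(reads, clusters):
    """Assign each read to the PSV cluster it shares the most carried PSVs with."""
    groups = defaultdict(list)
    for read_id, calls in reads.items():
        carried = {p for p, has in calls.items() if has}
        best = max(range(len(clusters)),
                   key=lambda i: len(carried & set(clusters[i])))
        if carried & set(clusters[best]):  # skip reads carrying no clustered PSV
            groups[best].append(read_id)
    return dict(groups)


# Made-up example: two paralogs, one tagged by p1/p2 and one by p3/p4.
reads = {
    "r1": {"p1": True, "p2": True, "p3": False},
    "r2": {"p1": True, "p2": True, "p4": False},
    "r3": {"p3": True, "p4": True, "p1": False},
    "r4": {"p3": True, "p4": True, "p2": False},
}
clusters = cluster_psvs(reads)
print(clusters)
print(partition_reads(reads, clusters))
```

In this toy input the PSVs fall into two groups, and each group of reads could then be assembled separately, which mirrors the partition-then-assemble step the abstract describes.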