2-AIN-506, 2-AIN-252: Seminar in Bioinformatics (2), (4)
Summer 2026
Abstract

Sumit Walia, Harsh Motwani, Yu-Hsiang Tseng, Kyle Smith, Russell Corbett-Detig, Yatish Turakhia. Compressive pangenomics using mutation-annotated networks. Nature Genetics, 58(2):445-453. 2026.

Download preprint: not available

Download from publisher: https://doi.org/10.1038/s41588-025-02478-7 PubMed

Related web page: not available

Abstract:

Pangenomics is an emerging field that uses collections of genomes, rather than a 
single reference, to reduce bias and capture intra-species diversity. However, 
existing pangenomic data formats face challenges in scaling to millions of 
genomes and primarily emphasize variation, often neglecting the underlying 
mutational events and evolutionary relationships. This work introduces Pangenome 
Mutation-Annotated Network (PanMAN), a lossless pangenome representation that 
achieves file-size compression ratios ranging from 3.5x to 1,391x compared to 
existing variation-preserving formats, with performance generally improving on 
larger datasets. In addition to compression, PanMAN increases representational 
capacity by encoding detailed mutational and evolutionary histories inferred 
across genomes, thereby enabling new biological insights. Using PanMAN, a 
comprehensive SARS-CoV-2 pangenome was constructed from 8 million publicly 
available sequences, requiring only 366 MB of disk space. We also present 
'panmanUtils', a toolkit that supports common analyses and ensures 
interoperability with existing software. PanMAN is poised to greatly improve the 
scale, speed, resolution and scope of pangenomic analysis and data sharing.
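
To make the idea of a mutation-annotated representation concrete, here is a minimal toy sketch in Python. It is not the PanMAN file format and not the panmanUtils API. It assumes a single tree rather than a network, substitutions only (no insertions, deletions or rearrangements), and hypothetical names (`Node`, `reconstruct`). The point is only that storing one root sequence plus the mutations on each edge is enough to recover every genome losslessly, while closely related genomes share most of their stored history.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node; not a PanMAN data structure."""
    name: str
    # Substitutions on the edge from the parent: (0-based position, new base).
    mutations: list[tuple[int, str]] = field(default_factory=list)
    children: list["Node"] = field(default_factory=list)


def reconstruct(root: Node, root_sequence: str) -> dict[str, str]:
    """Return every leaf sequence by replaying edge mutations from the root."""
    sequences: dict[str, str] = {}
    stack = [(root, list(root_sequence))]
    while stack:
        node, parent_seq = stack.pop()
        seq = parent_seq.copy()  # copy so sibling subtrees stay independent
        for pos, base in node.mutations:
            seq[pos] = base
        if not node.children:
            sequences[node.name] = "".join(seq)
        for child in node.children:
            stack.append((child, seq))
    return sequences


# Toy data: three genomes stored as one root sequence plus three substitutions.
tree = Node("root", children=[
    Node("inner", mutations=[(2, "T")], children=[
        Node("genome_A", mutations=[(5, "G")]),
        Node("genome_B"),
    ]),
    Node("genome_C", mutations=[(0, "C")]),
])
genomes = reconstruct(tree, "ACGTACGT")
```

In this toy case, `genome_A` and `genome_B` share the substitution stored once on the `inner` edge, which hints at why an approach of this kind can pay off more as the number of related genomes grows.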