2-AIN-506, 2-AIN-252: Seminar in Bioinformatics (2), (4)
Summer 2025
Abstract

Jim Shaw, Yun William Yu. Rapid species-level metagenome profiling and containment estimation with sylph. Nature Biotechnology, 43(8):1348-1359. 2025.

Download from publisher: https://www.nature.com/articles/s41587-024-02412-y

Abstract:

Profiling metagenomes against databases allows for the detection and 
quantification of microorganisms, even at low abundances where assembly is not 
possible. We introduce sylph, a species-level metagenome profiler that estimates 
genome-to-metagenome containment average nucleotide identity (ANI) through 
zero-inflated Poisson k-mer statistics, enabling ANI-based taxa detection. On the 
Critical Assessment of Metagenome Interpretation II (CAMI2) Marine dataset, sylph 
was the most accurate profiling method of seven tested. For multisample 
profiling, sylph took >10-fold less central processing unit time compared to 
Kraken2 and used 30-fold less memory. Sylph's ANI estimates provided an 
orthogonal signal to abundance, allowing for an ANI-based metagenome-wide 
association study for Parkinson disease (PD) against 289,232 genomes while 
confirming known butyrate-PD associations at the strain level. Sylph took <1 min 
and 16 GB of random-access memory to profile metagenomes against 85,205 
prokaryotic and 2,917,516 viral genomes, detecting 30-fold more viral sequences 
in the human gut compared to RefSeq. Sylph offers precise, efficient profiling 
with accurate containment ANI estimation even for low-coverage genomes.
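
For seminar discussion, the idea of containment ANI mentioned in the abstract can be illustrated with a minimal sketch. It is not sylph's method: it computes the naive fraction of a genome's k-mers found in the reads and converts it to an identity estimate with the commonly used relation ANI ≈ C^(1/k), which assumes independent point mutations. It omits the zero-inflated Poisson correction for low coverage that the abstract describes, and it ignores reverse complements and k-mer subsampling. All function names and the toy sequences are hypothetical.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of distinct k-mers in seq (forward strand only)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def containment(genome: str, reads: list[str], k: int) -> float:
    """Fraction of the genome's distinct k-mers that occur in any read.

    Naive estimate: at low coverage many genome k-mers are simply not
    sampled, so this value underestimates the true containment.
    """
    genome_kmers = kmers(genome, k)
    if not genome_kmers:
        raise ValueError("genome is shorter than k")
    read_kmers: set[str] = set()
    for read in reads:
        read_kmers |= kmers(read, k)
    return len(genome_kmers & read_kmers) / len(genome_kmers)


def containment_ani(c: float, k: int) -> float:
    """Convert a containment index to an ANI estimate via ANI ~ c ** (1 / k).

    Assumes each base mutates independently, so a k-mer survives
    unchanged with probability ANI ** k.
    """
    return c ** (1.0 / k)


if __name__ == "__main__":
    toy_genome = "ACGTAC"          # hypothetical toy sequence
    toy_reads = ["ACGT"]           # covers 2 of the 4 genome 3-mers
    c = containment(toy_genome, toy_reads, k=3)
    print(c, containment_ani(c, k=3))
```

In this toy case only half of the genome's 3-mers are seen, so the naive ANI estimate drops even though the read matches the genome exactly. This coverage effect is the kind of bias that a statistical correction for low-coverage genomes is meant to address.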