2-AIN-506, 2-AIN-252: Seminar in Bioinformatics (2), (4)
Summer 2025
Abstract

Pedro Tomaz da Silva, Alexander Karollus, Johannes Hingerl, Gihanna Sta Teresa Galindez, Nils Wagner, Xavier Hernandez-Alias, Danny Incarnato, Julien Gagneur. Nucleotide dependency analysis of genomic language models detects functional elements. Nature Genetics, 57(10):2589-2602. 2025.

Download from publisher: https://doi.org/10.1038/s41588-025-02347-3

Abstract:

Deciphering how nucleotides in genomes encode regulatory instructions and 
molecular machines is a long-standing goal. Genomic language models (gLMs) 
implicitly capture functional elements and their organization from genomic 
sequences alone by modeling probabilities of each nucleotide given its sequence 
context. However, discovering functional genomic elements from gLMs has been 
challenging due to the lack of interpretable methods. Here we introduce 
nucleotide dependencies, which quantify how nucleotide substitutions at one 
genomic position affect the probabilities of nucleotides at other positions. We 
demonstrate that nucleotide dependencies are more effective at indicating the 
deleteriousness of genetic variants than alignment-based conservation and gLM 
reconstruction. Dependency analysis accurately detects regulatory motifs and 
highlights bases in contact within RNAs, including pseudoknots and tertiary 
structure contacts, revealing new, experimentally validated RNA structures. 
Finally, we leverage dependency maps to reveal critical limitations of several 
gLM architectures and training strategies. Altogether, nucleotide dependency 
analysis opens a new avenue for discovering and studying functional elements and 
their interactions in genomes.
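
The idea of a nucleotide dependency can be illustrated with a minimal sketch: substitute the base at position i, re-query the model, and record how much the predicted nucleotide probabilities at every other position j change. Everything below is an assumption made for illustration: `toy_model` is a hypothetical stand-in for a real gLM (it simply favors the complement of a mirrored partner position, mimicking a base-paired stem), and the score used here, the maximum absolute change in log2 odds over substitutions and target nucleotides, is one plausible choice that may differ from the exact definition in the paper.

```python
import math

NUCS = "ACGT"
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def toy_model(seq):
    """Hypothetical stand-in for a gLM (not a real model or library API).

    Returns, for each position, a list of probabilities over A, C, G, T.
    Position k is assumed to pair with position len(seq) - 1 - k, so the
    model puts most of its probability on the complement of the partner base.
    """
    n = len(seq)
    probs = []
    for k in range(n):
        partner = n - 1 - k
        if partner == k:
            # An unpaired middle position gets a uniform distribution.
            probs.append([0.25] * 4)
            continue
        favored = COMPLEMENT[seq[partner]]
        probs.append([0.85 if b == favored else 0.05 for b in NUCS])
    return probs


def log2_odds(p):
    """Log2 odds of a probability strictly between 0 and 1."""
    return math.log2(p / (1.0 - p))


def dependency_map(seq, model):
    """Return an n x n matrix where dep[i][j] scores how strongly a
    substitution at position i shifts the predicted probabilities at j."""
    n = len(seq)
    ref = model(seq)
    dep = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for alt in NUCS:
            if alt == seq[i]:
                continue
            # Substitute a single base and re-query the model.
            mutated = seq[:i] + alt + seq[i + 1:]
            mut = model(mutated)
            for j in range(n):
                if j == i:
                    continue
                for b in range(4):
                    change = abs(log2_odds(mut[j][b]) - log2_odds(ref[j][b]))
                    dep[i][j] = max(dep[i][j], change)
    return dep


dep = dependency_map("GATTAC", toy_model)
for row in dep:
    print(" ".join(f"{value:5.2f}" for value in row))
```

In this toy setting the nonzero entries fall on the anti-diagonal, because each position only influences its assumed pairing partner. With a real gLM, such off-diagonal structure is the kind of signal the abstract describes as highlighting bases in contact within RNAs.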