2-AIN-506, 2-AIN-252: Seminar in Bioinformatics (2), (4)
Summer 2024

Shuvom Sadhuka, Daniel Fridman, Bonnie Berger, Hyunghoon Cho. Assessing transcriptomic reidentification risks using discriminative sequence models. Genome Research, 33(7):1101-1112. 2023.

Download from publisher: https://doi.org/10.1101/gr.277699.123


Gene expression data provide molecular insights into the functional impact of 
genetic variation, for example, through expression quantitative trait loci 
(eQTLs). With an improving understanding of the association between genotypes and 
gene expression comes a greater concern that gene expression profiles could be 
matched to genotype profiles of the same individuals in another data set, known 
as a linking attack. Prior works demonstrating such a risk could analyze only a 
fraction of eQTLs that are independent, owing to restrictive model assumptions, 
leaving the full extent of this risk incompletely understood. To address this 
challenge, we 
introduce the discriminative sequence model (DSM), a novel probabilistic 
framework for predicting a sequence of genotypes based on gene expression data. 
By modeling the joint distribution over all known eQTLs in a genomic region, DSM 
improves the power of linking attacks with necessary calibration for linkage 
disequilibrium and redundant predictive signals. We show greater linking accuracy 
of DSM compared with existing approaches across a range of attack scenarios and 
data sets including up to 22,288 individuals, suggesting that DSM helps uncover a 
substantial additional risk overlooked by previous studies. Our work provides a 
unified framework for assessing the privacy risks of sharing diverse omics data 
sets beyond transcriptomics.
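
To make the notion of a linking attack concrete, the following is a minimal sketch of the simpler baseline setting described above, in which each eQTL SNP is treated as independent. It is not the DSM from the paper. The function names, the identifiers, and the toy probabilities are all hypothetical; the per-SNP genotype probabilities are assumed to come from some expression-based predictor that is not shown here.

```python
import math


def link_score(genotype_probs, genotypes):
    """Log-likelihood of one genotype profile given per-SNP predicted probabilities.

    Assumption: SNPs are scored independently. This is the restrictive
    assumption attributed to earlier approaches, not the joint DSM model.
    """
    return sum(math.log(probs[g]) for probs, g in zip(genotype_probs, genotypes))


def link(predictions, genotype_db):
    """Match each expression profile to the genotype profile with the highest score."""
    matches = {}
    for expr_id, probs in predictions.items():
        matches[expr_id] = max(
            genotype_db,
            key=lambda geno_id: link_score(probs, genotype_db[geno_id]),
        )
    return matches


# Hypothetical toy data: 2 individuals, 3 eQTL SNPs, genotypes coded 0/1/2.
# Each tuple holds assumed predicted probabilities for genotypes 0, 1, and 2.
predictions = {
    "expr_A": [(0.7, 0.2, 0.1), (0.1, 0.3, 0.6), (0.2, 0.6, 0.2)],
    "expr_B": [(0.1, 0.2, 0.7), (0.6, 0.3, 0.1), (0.3, 0.4, 0.3)],
}
genotype_db = {
    "geno_1": (2, 0, 1),
    "geno_2": (0, 2, 1),
}

matches = link(predictions, genotype_db)
print(matches)
```

Summing per-SNP log-probabilities counts correlated SNPs as separate evidence, which is why such a baseline is limited to independent eQTLs; the abstract's point is that modeling the joint distribution over a region avoids that restriction.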