2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1), (3)
Winter 2022
Abstract

Wontack Han, Haixu Tang, Yuzhen Ye. Locality-Sensitive Hashing-Based k-Mer Clustering for Identification of Differential Microbial Markers Related to Host Phenotype. Journal of computational biology: a journal of computational molecular cell biology, 29(7):738-751. 2022.

Download from publisher: https://www.liebertpub.com/doi/10.1089/cmb.2021.0640

Abstract:

Microbial organisms play important roles in many aspects of human health and
diseases. Encouraged by the numerous studies that show the association between
microbiomes and human diseases, computational and machine learning methods have
been recently developed to generate and utilize microbiome features for
prediction of host phenotypes such as disease versus healthy, and cancer immunotherapy
responder versus nonresponder. We have previously developed a subtractive
assembly approach, which focuses on extraction and assembly of differential reads
from metagenomic data sets that are likely sampled from differential genomes or
genes between two groups of microbiome data sets (e.g., healthy vs. disease). In 
this article, we further improved our subtractive assembly approach by utilizing 
groups of k-mers with similar abundance profiles across multiple samples. We
implemented a locality-sensitive hashing (LSH)-enabled approach (called
kmerLSHSA) to group billions of k-mers into k-mer coabundance groups (kCAGs),
which were subsequently used for the retrieval of differential kCAGs for
subtractive assembly. Testing of the kmerLSHSA approach on simulated data sets
and real microbiome data sets showed that, compared with the conventional
approach that utilizes all genes, our approach can quickly identify differential 
genes that can be used for building promising predictive models for
microbiome-based host phenotype prediction. We also discussed other potential
applications of LSH-enabled clustering of k-mers according to their abundance
profiles across multiple microbiome samples.
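
The abstract does not say which LSH family kmerLSHSA uses, so the following is only a minimal sketch of the general idea of grouping k-mers by abundance profile. It assumes random-hyperplane (sign) hashing on mean-centered profiles; the function names, the `num_bits` parameter, and the toy counts are hypothetical and are not taken from the paper.

```python
import random
from collections import defaultdict


def lsh_signature(profile, hyperplanes):
    """One bit per hyperplane: 1 if the dot product is non-negative, else 0."""
    return tuple(
        1 if sum(p * h for p, h in zip(profile, plane)) >= 0 else 0
        for plane in hyperplanes
    )


def group_kmers(profiles, num_bits=8, seed=0):
    """Bucket k-mers whose abundance profiles hash to the same signature.

    profiles maps each k-mer to its list of abundances, one value per sample.
    This is an illustrative stand-in for kCAG construction, not the
    published algorithm.
    """
    rng = random.Random(seed)
    num_samples = len(next(iter(profiles.values())))
    # Random Gaussian hyperplanes (assumption: sign-random-projection LSH).
    hyperplanes = [
        [rng.gauss(0, 1) for _ in range(num_samples)] for _ in range(num_bits)
    ]
    buckets = defaultdict(list)
    for kmer, profile in profiles.items():
        # Center each profile so the hash reflects its shape across samples
        # rather than its overall abundance level.
        mean = sum(profile) / len(profile)
        centered = [x - mean for x in profile]
        buckets[lsh_signature(centered, hyperplanes)].append(kmer)
    return list(buckets.values())


# Toy data: two k-mers with proportional profiles and one with the opposite pattern.
toy_profiles = {
    "ACG": [10, 0, 10, 0],
    "CGT": [20, 0, 20, 0],
    "TTA": [0, 10, 0, 10],
}
toy_groups = group_kmers(toy_profiles)
```

In this toy input, "ACG" and "CGT" have proportional profiles and therefore receive the same signature, while "TTA" has the mirrored pattern and falls into a different bucket. A real pipeline would also need to handle billions of k-mers, noise in the counts, and the choice of the number of hash bits, none of which this sketch addresses.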