2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1), (3)
Winter 2022
Abstract

Kristoffer Sahlin. Effective sequence similarity detection with strobemers. Genome Research, 31(11):2080-2094. 2021.

Download preprint: not available

Download from publisher: https://doi.org/10.1101%2Fgr.275648.121 PubMed

Related web page: not available

Bibliography entry: BibTeX

Abstract:

k-mer-based methods are widely used in bioinformatics for various types of
sequence comparisons. However, a single mutation will mutate k consecutive k-mers
and make most k-mer-based applications for sequence comparison sensitive to
variable mutation rates. Many techniques have been studied to overcome this
sensitivity, for example, spaced k-mers and k-mer permutation techniques, but
these techniques do not handle indels well. For indels, pairs or groups of small 
k-mers are commonly used, but these methods first produce k-mer matches, and only
in a second step, a pairing or grouping of k-mers is performed. Such techniques
produce many redundant k-mer matches owing to the size of k. Here, we propose
strobemers as an alternative to k-mers for sequence comparison. Intuitively,
strobemers consist of two or more linked shorter k-mers, where the combination of
linked k-mers is decided by a hash function. We use simulated data to show that
strobemers provide more evenly distributed sequence matches and are less
sensitive to different mutation rates than k-mers and spaced k-mers. Strobemers
also produce higher match coverage across sequences. We further implement a
proof-of-concept sequence-matching tool StrobeMap and use synthetic and
biological Oxford Nanopore sequencing data to show the utility of using
strobemers for sequence comparison in different contexts such as sequence
clustering and alignment scenarios.
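
To make the two central ideas of the abstract concrete, the following is a minimal Python sketch, not the paper's implementation and not the StrobeMap code. It first counts how many k-mers a single substitution changes, then builds seeds from two linked k-mers, where a hash decides which downstream k-mer is linked to the first. The function names (`kmers`, `stable_hash`, `linked_kmers`), the window bounds `w_min` and `w_max`, the choice of BLAKE2b as the hash, and the "smallest hash of the concatenation" selection rule are all assumptions made for illustration; the paper defines its own concrete variants, which may differ.

```python
import hashlib


def kmers(seq, k):
    """All overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def stable_hash(s):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per run)."""
    digest = hashlib.blake2b(s.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def linked_kmers(seq, k, w_min, w_max):
    """Illustrative order-2 seeds: (start1, start2, strobe1 + strobe2).

    For each start i, the second k-mer is picked from the window
    [i + w_min, i + w_max] by a hash-based rule (an assumed rule).
    """
    seeds = []
    last_start = len(seq) - k
    for i in range(last_start + 1):
        lo, hi = i + w_min, min(i + w_max, last_start)
        if lo > hi:
            break  # no room left for a second k-mer
        first = seq[i:i + k]
        # The hash decides which downstream k-mer gets linked to the first one.
        j = min(range(lo, hi + 1),
                key=lambda p: stable_hash(first + seq[p:p + k]))
        seeds.append((i, j, first + seq[j:j + k]))
    return seeds


# Hypothetical 40-base example sequence with one substitution at position 20.
seq = "ACGTTGCATGCCGATAGCTTAGGCATCGATCCGTAGCTAA"
mutated = seq[:20] + ("A" if seq[20] != "A" else "C") + seq[21:]

# Number of positions whose k-mer differs between the two sequences.
changed = sum(a != b for a, b in zip(kmers(seq, 5), kmers(mutated, 5)))

seeds = linked_kmers(seq, k=5, w_min=6, w_max=10)
```

With k = 5, `changed` is 5: every k-mer that overlaps the substituted position differs, which is the effect the abstract describes for plain k-mers. Each entry of `seeds` joins two 5-mers whose start positions are between `w_min` and `w_max` apart, so the seed spans a variable distance rather than a fixed contiguous stretch.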