2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1), (3)
Winter 2022
Abstract

Robert Edgar. Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences. PeerJ, 9:e10805. 2021.

Download preprint: not available

Download from publisher: https://peerj.com/articles/10805/

Related web page: not available

Abstract:

Minimizers are widely used to select subsets of fixed-length substrings (k-mers) 
from biological sequences in applications ranging from read mapping to taxonomy
prediction and indexing of large datasets. The minimizer of a string of w
consecutive k-mers is the k-mer with smallest value according to an ordering of
all k-mers. Syncmers are defined here as a family of alternative methods which
select k-mers by inspecting the position of the smallest-valued substring of
length s < k within the k-mer. For example, a closed syncmer is selected if its
smallest s-mer is at the start or end of the k-mer. At least one closed syncmer
must be found in every window of length (k - s) k-mers. Unlike a minimizer, a
syncmer is identified by its sequence alone, and is therefore synchronized in the
following sense: if a given k-mer is selected from one sequence, it will also be 
selected from any other sequence. Also, minimizers can be deleted by mutations in
flanking sequence, which cannot happen with syncmers. Experiments on minimizers
with parameters used in the minimap2 read mapper and Kraken taxonomy prediction
algorithm respectively show that syncmers can simultaneously achieve both lower
density and higher conservation compared to minimizers.
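
The two selection rules described in the abstract can be sketched in a few lines of Python. This is an illustrative sketch only, not the paper's implementation: the function names are hypothetical, k-mers and s-mers are ordered lexicographically by default (practical tools typically order by a hash value instead), and the handling of ties (leftmost k-mer for minimizers, any tied s-mer for closed syncmers) is an assumption made here.

```python
def minimizers(seq, k, w, key=None):
    """Start positions of the minimizer of every window of w consecutive k-mers."""
    key = key or (lambda kmer: kmer)  # assumption: lexicographic ordering
    n_kmers = len(seq) - k + 1
    selected = set()
    for start in range(n_kmers - w + 1):
        # Smallest-valued k-mer in the window; ties go to the leftmost (assumption).
        best = min(range(start, start + w),
                   key=lambda i: (key(seq[i:i + k]), i))
        selected.add(best)
    return sorted(selected)


def is_closed_syncmer(kmer, s, key=None):
    """True if a smallest s-mer of kmer sits at its first or last s-mer position."""
    key = key or (lambda smer: smer)  # assumption: lexicographic ordering
    values = [key(kmer[i:i + s]) for i in range(len(kmer) - s + 1)]
    smallest = min(values)
    # Tie handling is an assumption: any tied smallest s-mer counts.
    return values[0] == smallest or values[-1] == smallest


def closed_syncmers(seq, k, s, key=None):
    """Start positions of all k-mers in seq that are closed syncmers."""
    return [i for i in range(len(seq) - k + 1)
            if is_closed_syncmer(seq[i:i + k], s, key)]
```

Note that `is_closed_syncmer` looks only at the k-mer itself, whereas `minimizers` must compare each k-mer with its neighbours in the window. This is the context independence the abstract calls synchronization: a k-mer's syncmer status cannot change when the flanking sequence mutates, while its minimizer status can.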