2-AIN-505, 2-AIN-251: Seminar in Bioinformatics (1) and (3)
Winter 2020
Abstract

Robert C. Edgar. Syncmers are more sensitive than minimizers for selecting conserved k-mers in biological sequences. Technical Report doi:10.1101/2020.09.29.319095, bioRxiv, 2020.

Download from publisher: https://doi.org/10.1101/2020.09.29.319095

Abstract:

Minimizers are widely used to select subsets of fixed-length substrings
(k-mers) from biological sequences in applications ranging from read mapping to 
taxonomy prediction and indexing of large datasets. Syncmers are an 
alternative method for selecting a subset of k-mers. Unlike a minimizer, a 
syncmer is identified by its k-mer sequence alone and is therefore 
synchronized in the following sense: if a given k-mer is selected from one 
sequence, it will also be selected from any other sequence. Bounded syncmers 
are defined by a small and fast function of the k-mer sequence which 
exploits correlations between overlapping k-mers to guarantee that at least 
one syncmer must appear in a window of predetermined length, and therefore 
comprise a universal hitting set which does not require a precomputed lookup 
table. Bounded syncmers are shown to be unambiguously superior to minimizers 
because they achieve both lower density and better conservation in mutated 
sequences.
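
The abstract does not spell out the selection function, so the following is
only a minimal sketch under stated assumptions. It assumes that the bounded
syncmers of the abstract follow the rule later published as "closed syncmers":
a k-mer is selected when its smallest substring of length s (s-mer) sits at
the first or the last position inside the k-mer. The function names, the
parameter s, and the use of plain lexicographic order (instead of a hash of
the s-mer) are illustrative choices, not taken from the source.

```python
def is_closed_syncmer(kmer, s):
    """Return True if the smallest s-mer of kmer is its first or last s-mer.

    Assumption: s-mers are compared lexicographically; a real tool would
    more likely compare hash values. Ties count as a match.
    """
    smers = [kmer[i:i + s] for i in range(len(kmer) - s + 1)]
    smallest = min(smers)
    return smers[0] == smallest or smers[-1] == smallest


def select_syncmers(seq, k, s):
    """Return the start positions of all k-mers in seq selected by the rule."""
    return [
        i
        for i in range(len(seq) - k + 1)
        if is_closed_syncmer(seq[i:i + k], s)
    ]


if __name__ == "__main__":
    # Hypothetical example sequence, chosen only for illustration.
    seq = "ACGTTGCATGACCGTAGGCTA"
    print(select_syncmers(seq, k=8, s=3))
```

Because the decision depends only on the k-mer itself, the same k-mer is
selected wherever it occurs, which is the synchronization property described
above. Under this rule, any k - s consecutive k-mer positions contain at
least one selected k-mer, so no lookup table is needed to obtain the window
guarantee.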