Jozef Sitarcik, Tomas Vinar, Brona Brejova, Werner Krampl, Jaroslav Budis, Jan Radvanszky, Maria Lucka. WarpSTR: Determining tandem repeat lengths using raw nanopore signals. Technical Report 2022.11.05.515275, bioRxiv, 2022.

Download from publisher: https://doi.org/10.1101/2022.11.05.515275

Related web page: https://github.com/fmfi-compbio/warpstr

Abstract:

Motivation: Short tandem repeats (STRs) are regions of a genome containing 
many consecutive copies of the same short motif, possibly with small 
variations. Analysis of STRs has many clinical uses, but is limited by
technology, mainly because STRs can exceed the read length used. Nanopore
sequencing, as one of the long-read sequencing technologies, produces very
long reads, thus offering more possibilities to study and analyze STRs.
However, basecalling of nanopore reads is particularly unreliable in
repeating regions, and therefore direct analysis of raw nanopore data is
required. Results: Here we present WarpSTR, a novel method for
characterizing both simple and complex tandem repeats directly from raw 
nanopore signals using a finite-state automaton and a search algorithm 
analogous to dynamic time warping. By applying this approach to determine 
the lengths of 241 STRs, we demonstrate that our approach decreases the 
mean absolute error of the STR length estimate compared to basecalling and 
STRique. Availability: WarpSTR is freely available at 
https://github.com/fmfi-compbio/warpstr
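
Illustration: the abstract describes a search "analogous to dynamic time warping". The sketch below is not WarpSTR's implementation; it only shows classic dynamic time warping (DTW) on a toy problem, to give a feel for how a raw signal could be matched against expected signals for different repeat counts. The per-base signal levels in LEVELS and the helper names expected_levels and estimate_copies are hypothetical, chosen for this example; real nanopore signal depends on k-mers rather than single bases, and WarpSTR additionally uses a finite-state automaton, which is not modeled here.

```python
from math import inf


def dtw_cost(signal, template):
    """Classic DTW cost between two sequences, using absolute difference."""
    n, m = len(signal), len(template)
    # cost[i][j] = minimal cost of aligning signal[:i] with template[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(signal[i - 1] - template[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # signal sample repeats a template level
                cost[i][j - 1],      # template level repeats a signal sample
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


# Hypothetical signal level per base (toy values, not a real pore model).
LEVELS = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}


def expected_levels(motif, copies):
    """Expected toy signal for `copies` consecutive copies of `motif`."""
    return [LEVELS[base] for base in motif * copies]


def estimate_copies(signal, motif, max_copies):
    """Return the copy number whose expected signal has the lowest DTW cost."""
    return min(
        range(1, max_copies + 1),
        key=lambda k: dtw_cost(signal, expected_levels(motif, k)),
    )


# Toy signal: three copies of CAG, each base observed for two samples.
toy_signal = [level for level in expected_levels("CAG", 3) for _ in range(2)]
print(estimate_copies(toy_signal, "CAG", 6))
```

In this toy setting the warping absorbs the fact that each base is observed for more than one sample, so only the template with the right number of motif copies aligns at zero cost.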