Jaroslav Budis, Marcel Kucharik, Frantisek Duris, Juraj Gazdarica, Michaela Zrubcova, Andrej Ficek, Tomas Szemes, Brona Brejova, Jan Radvanszky. Dante: genotyping of known complex and expanded short tandem repeats. Bioinformatics, 35(8):1310-1317. 2019.

Download from publisher: https://doi.org/10.1093/bioinformatics/bty791

Abstract:

Motivation: Short tandem repeats (STRs) are stretches of repetitive DNA in which 
short sequences, typically made of 2-6 nucleotides, are repeated several times.
Because STRs have many important biological roles and are among the most
polymorphic parts of the human genome, they are used in several
molecular-genetic applications. Precise genotyping of STR alleles has therefore
been of high relevance over the last decades. Despite this, massively parallel
sequencing (MPS) still lacks the analysis methods to fully utilize the
information value of STRs in genome-scale assays.

Results: We propose an
alignment-free algorithm, called Dante, for genotyping and characterization of
STR alleles at user-specified known loci based on sequence reads originating from
STR loci of interest. The method accounts for natural deviations from the
expected sequence, such as variation in the repeat count, sequencing errors,
ambiguous bases, and complex loci containing several different motifs. In
addition, we implemented a correction for copy number defects caused by the
polymerase-induced stutter effect as well as a prediction of STR expansions that,
according to the conventional view, cannot be fully captured by inherently short 
MPS reads. We tested Dante on simulated data sets and on data sets obtained by
targeted sequencing of protein-coding parts of thousands of selected clinically
relevant genes. On both data sets, Dante outperformed the HipSTR and GATK
genotyping tools. Furthermore, Dante was able to predict allele expansions in all
tested clinical cases.

Availability: Dante is open-source software, freely available for download at
https://github.com/jbudis/dante.

Supplementary Information: Supplementary data are available at Bioinformatics
online.
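
To make the idea of alignment-free genotyping at a known locus more concrete, here is a deliberately naive sketch. It is not Dante's algorithm: it requires exact matches, handles only a single motif, and ignores sequencing errors, ambiguous bases, stutter correction, and expansions longer than a read, all of which the abstract says Dante accounts for. The function names, the flank sequences, and the `min_fraction` threshold are hypothetical and chosen only for illustration.

```python
import re
from collections import Counter


def count_repeats(read, left_flank, motif, right_flank):
    """Count consecutive motif copies between two known flanks.

    Returns None when the read does not contain both flanks around an
    uninterrupted run of the motif (exact matching only).
    """
    pattern = re.compile(
        re.escape(left_flank)
        + "((?:" + re.escape(motif) + ")+)"
        + re.escape(right_flank)
    )
    match = pattern.search(read)
    if match is None:
        return None
    return len(match.group(1)) // len(motif)


def call_genotype(reads, left_flank, motif, right_flank, min_fraction=0.2):
    """Call a diploid genotype as a sorted pair of repeat counts.

    The most frequent repeat count is always reported. The second most
    frequent count is reported as the other allele only if it is supported
    by at least min_fraction of the informative reads (an arbitrary cutoff
    assumed here); otherwise the locus is called homozygous.
    """
    counts = Counter()
    for read in reads:
        n = count_repeats(read, left_flank, motif, right_flank)
        if n is not None:
            counts[n] += 1

    total = sum(counts.values())
    if total == 0:
        return None  # no read spans the whole repeat

    ranked = counts.most_common(2)
    first = ranked[0][0]
    if len(ranked) == 2 and ranked[1][1] / total >= min_fraction:
        return tuple(sorted((first, ranked[1][0])))
    return (first, first)
```

For example, with hypothetical flanks `ACGT` and `TTGA` around a `CAG` repeat, reads carrying three and five copies in roughly equal numbers would be called as the pair of counts 3 and 5, while a single read with a deviating count among many concordant reads would be ignored by the threshold.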