Jaroslav Budis, Werner Krampl, Marcel Kucharik, Rastislav Hekel, Adrian Goga, Jozef Sitarcik, Michal Lichvar, David Smol'ak, Miroslav Bohmer, Andrej Balaz, Frantisek Duris, Juraj Gazdarica, Katarina Soltys, Jan Turna, Jan Radvanszky, Tomas Szemes. SnakeLines: integrated set of computational pipelines for sequencing reads. Journal of Integrative Bioinformatics, 2023. Online ahead of print.

Download from publisher: https://dx.doi.org/10.1515/jib-2022-0059

Abstract:

With the rapid growth of massively parallel sequencing technologies, more and 
more laboratories are utilising sequenced DNA fragments for genomic analyses. 
Interpretation of sequencing data is, however, strongly dependent on 
bioinformatics processing, which is often too demanding for clinicians and 
researchers without a computational background. Another problem is the 
reproducibility of computational analyses across separate computational centres 
with inconsistent versions of installed libraries and bioinformatics tools. We 
propose an easily extensible set of computational pipelines, called SnakeLines, 
for processing sequencing reads, including mapping, assembly, variant calling, 
viral identification, transcriptomics, and metagenomics analysis. Individual 
steps of an analysis, along with methods and their parameters, can be readily 
modified in a single configuration file. Provided pipelines are embedded in 
virtual environments that ensure isolation of required resources from the host 
operating system, rapid deployment, and reproducibility of analysis across 
different Unix-based platforms. SnakeLines is a powerful framework for the 
automation of bioinformatics analyses, with emphasis on a simple set-up, 
modifications, extensibility, and reproducibility. The framework is already 
routinely used in various research projects and their applications, especially in 
the Slovak national surveillance of SARS-CoV-2.