Andrej Baláž, Alessia Petescia. Prefix-free graphs and suffix array construction in sublinear space. Technical Report 2306.14689v1, arXiv, 2023.

Download from publisher: https://arxiv.org/pdf/2306.14689.pdf

Abstract:

A recent paradigm shift in bioinformatics from a single reference genome to 
a pangenome has brought with it several graph structures. These graph 
structures must support operations such as efficient construction from 
multiple genomes and read mapping. Read mapping is a well-studied problem on 
sequential data, where data structures such as the suffix array and the 
Burrows-Wheeler transform allow for efficient computation. Attempts to 
achieve comparatively high performance on graphs bring many complications 
since the common data structures on strings are not easily obtainable for 
graphs. In this work, we introduce prefix-free graphs, a novel pangenomic 
data structure; we show how to construct them and how to use them to obtain 
well-known data structures from stringology in sublinear space, allowing for 
many efficient operations on pangenomes.
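
For readers unfamiliar with the two string data structures named in the abstract, the following is a minimal sketch of a suffix array and the Burrows-Wheeler transform derived from it. It is a naive textbook construction, not the prefix-free graph method of the report, and it is not sublinear in space. The function names and the example string are assumptions chosen for illustration only.

```python
def suffix_array(text: str) -> list[int]:
    """Naive suffix array: sort suffix start positions lexicographically.

    Illustrative only; this materializes suffixes while sorting and is far
    from the space-efficient construction discussed in the report.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def bwt_from_suffix_array(text: str, sa: list[int]) -> str:
    """Burrows-Wheeler transform: the character preceding each sorted suffix.

    For the suffix starting at position 0, text[-1] wraps around to the
    last character, which is the terminator.
    """
    return "".join(text[i - 1] for i in sa)


# Hypothetical example input; '$' is a terminator that sorts before letters.
text = "banana$"
sa = suffix_array(text)
bwt = bwt_from_suffix_array(text, sa)
print(sa)   # [6, 5, 3, 1, 0, 4, 2]
print(bwt)  # annb$aa
```

The sketch shows only what the two structures contain for a single string; obtaining them for a whole pangenome without holding the full input in memory is the problem the report addresses.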