Adrian Goga, Andrej Balaz, Alessia Petescia, Travis Gagie. MARIA: Multiple-alignment r-index with aggregation. Technical Report 2209.09218, arXiv, 2022.

Download from publisher: https://doi.org/10.48550/arXiv.2209.09218

Abstract:

There now exist compact indexes that can efficiently list all the 
occurrences of a pattern in a dataset consisting of thousands of genomes, or 
even all the occurrences of all the pattern's maximal exact matches (MEMs) 
with respect to the dataset. Unless we are lucky and the pattern is specific 
to only a few genomes, however, we could be swamped by hundreds of matches
-- or even hundreds per MEM -- only to discover that most or all of the
matches are to substrings that occupy the same few columns in a multiple 
alignment. To address this issue, in this paper we present a simple and 
compact data index MARIA that stores a multiple alignment such that, given 
the position of one match of a pattern (or a MEM or other substring of a 
pattern) and its length, we can quickly list all the distinct columns of the 
multiple alignment where matches start.
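
To make the query described in the abstract concrete, here is a brute-force sketch in Python. It is not the MARIA data structure and makes no claim about how the paper implements the query: it simply rescans every row, whereas the point of the index is to avoid that. The function name, the row/offset convention for identifying the known match, and the use of "-" as the gap symbol are all assumptions made for illustration.

```python
def distinct_start_columns(alignment, row, pos, length, gap="-"):
    """Return the sorted distinct alignment columns where matches start.

    alignment: list of equal-length gapped strings (hypothetical input format).
    row, pos:  the known match starts at ungapped offset `pos` of row `row`.
    length:    length of the known match.
    """
    # For each row, build the ungapped sequence and, for each of its
    # characters, the alignment column that character came from.
    seqs, cols = [], []
    for aligned in alignment:
        keep = [c for c, ch in enumerate(aligned) if ch != gap]
        cols.append(keep)
        seqs.append("".join(aligned[c] for c in keep))

    if length <= 0 or pos < 0 or pos + length > len(seqs[row]):
        raise ValueError("match does not fit inside the chosen row")

    pattern = seqs[row][pos:pos + length]

    # Naive scan: find every occurrence in every row (overlaps included)
    # and record the alignment column of its first character.
    found = set()
    for seq, col in zip(seqs, cols):
        start = seq.find(pattern)
        while start != -1:
            found.add(col[start])
            start = seq.find(pattern, start + 1)
    return sorted(found)


# Toy alignment of three sequences (made up for this example).
alignment = [
    "ACGT-ACGT",
    "ACGTTACGT",
    "A-GTTACGT",
]

# "ACGT" occurs five times across the rows but starts in only two columns.
print(distinct_start_columns(alignment, row=0, pos=0, length=4))  # [0, 5]
```

In this toy input the pattern has five occurrences, yet the answer is just two columns, which is the kind of aggregation the abstract motivates.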