Vladimír Boža, Jakub Jursa, Tomáš Vinař, Broňa Brejová. Fishing in Read Collections: Memory Efficient Indexing for Sequence Assembly. In Costas S. Iliopoulos, Simon J. Puglisi, Emine Yilmaz, eds., String Processing and Information Retrieval (SPIRE), volume 9309 of Lecture Notes in Computer Science, pp. 188-198, London, UK, September 2015. Springer.
Download preprint: not available
Download from publisher: http://dx.doi.org/10.1007/978-3-319-23826-5_19
Related web page: not available
Abstract:
In this paper, we present a memory-efficient index for storing a large set of DNA sequencing reads. The index allows us to quickly retrieve the set of reads containing a certain query k-mer. Instead of the usual approach of treating each read as a separate string, we take advantage of the significant overlap between reads and compress the data by aligning the reads to an approximate superstring constructed specifically for this purpose, in combination with several succinct data structures.
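The query described in the abstract can be illustrated with a toy sketch. This is not the paper's data structure. It assumes that every read has the same length and aligns to the superstring exactly (no sequencing errors), and it uses a plain string scan where a real index would use a succinct text index. The names `find_occurrences`, `reads_containing`, `read_starts`, and `read_len` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left, bisect_right


def find_occurrences(text, kmer):
    """Return all start positions of kmer in text, overlapping matches included.

    A plain scan stands in for a succinct text index (assumption).
    """
    positions = []
    pos = text.find(kmer)
    while pos != -1:
        positions.append(pos)
        pos = text.find(kmer, pos + 1)
    return positions


def reads_containing(superstring, read_starts, read_len, kmer):
    """Return indices of reads that contain kmer.

    read_starts is a sorted list of offsets; read i is assumed to equal
    superstring[read_starts[i] : read_starts[i] + read_len] exactly.
    """
    k = len(kmer)
    hits = set()
    for pos in find_occurrences(superstring, kmer):
        # A read starting at s covers the occurrence when
        # s <= pos and pos + k <= s + read_len.
        lo = bisect_left(read_starts, pos + k - read_len)
        hi = bisect_right(read_starts, pos)
        hits.update(range(lo, hi))
    return sorted(hits)


superstring = "ACGTACGGTCA"
read_starts = [0, 2, 4, 6]   # reads: ACGTA, GTACG, ACGGT, GGTCA
print(reads_containing(superstring, read_starts, 5, "ACG"))
```

Storing only one offset per read, rather than the read itself, is where the space saving in this toy version comes from; handling reads that differ from the superstring requires additional bookkeeping that this sketch omits.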