Martin Kravec, Martin Bobák, Broňa Brejová, Tomáš Vinař. Variants of Genes from the Next Generation Sequencing Data. In Tomáš Vinař, ed., Information Technologies - Applications and Theory (ITAT), volume 1003 of CEUR-WS, pp. 44-51, 2013.


A typical next generation sequencing (NGS) platform produces highly fragmented DNA sequence data, consisting of many short overlapping reads that must be assembled into a longer DNA sequence by an assembly program. Many gene families evolve by gene duplication. In a genome, such genes are present in several copies that are very similar to each other, and therefore they pose a difficult challenge for sequence assembly programs. In this paper, we present a method for recovering variants of genes from NGS data without the need to assemble the whole DNA sequence. We show that our problem is NP-hard, but also demonstrate that in practice it can be solved by integer linear programming.