Martin Kravec, Martin Bobák, Broňa Brejová, Tomáš Vinař. Variants of Genes from the Next Generation Sequencing Data. In Tomáš Vinař, ed., Information Technologies - Applications and Theory (ITAT), volume 1003 of CEUR-WS, pp. 44-51, 2013.
Download preprint: not available
Download from publisher: http://ceur-ws.org/Vol-1003/44.pdf
Related web page: not available
Bibliography entry: BibTeX
A typical next generation sequencing (NGS) platform produces highly fragmented DNA sequence data, consisting of many short overlapping reads that need to be assembled into a longer DNA sequence by an assembly program. Many gene families evolve by gene duplication. In a genome, such genes have several copies that are very similar to each other, and therefore they pose a difficult challenge for sequence assembly programs. In this paper, we present a method for recovering variants of genes from NGS data without the need to assemble the whole DNA sequence. We show that our problem is NP-hard, but also demonstrate that in practice it can be solved by integer linear programming.