Brona Brejova, Michal Burger, Tomas Vinar.
Automated Segmentation of DNA Sequences with Complex Evolutionary Histories.
In Teresa M. Przytycka and Marie-France Sagot, eds.,
Algorithms in Bioinformatics, 11th International Workshop (WABI),
volume 6833 of Lecture Notes in Computer Science,
pp. 1-13, Saarbrücken, Germany, September 2011. Springer.
Most algorithms for reconstructing evolutionary histories involving large-scale events such as duplications, deletions, or rearrangements work on sequences of predetermined markers, for example protein-coding genes or other functional elements. However, markers defined in this way ignore information contained in non-coding sequences, are prone to annotation errors, and may even introduce artifacts due to partial gene copies or chimeric genes. We propose the problem of sequence segmentation, where the goal is to automatically select suitable markers based on sequence homology alone. We design an algorithm for this problem that can tolerate a certain amount of inaccuracy in the input alignments and still produce a segmentation of the sequence into markers with high coverage and accuracy. We test our algorithm on several artificial and real data sets representing complex clusters of segmental duplications. Our software is available at http://compbio.fmph.uniba.sk/atomizer/