Bioinformatics seminar

Tue 20 Sep. 2011, 17:20

Title: Alkan et al. Genome-wide characterization of centromeric satellites from multiple mammalian genomes
Speaker: Tomáš Vinař

Despite its importance in cell biology and evolution, the centromere has
remained the final frontier in genome assembly and annotation due to its
complex repeat structure. However, isolation and characterization of the
centromeric repeats from newly sequenced species are necessary for a
complete understanding of genome evolution and function. In recent years,
various genomes have been sequenced, but the characterization of the
corresponding centromeric DNA has lagged behind. Here, we present a
computational method (RepeatNet) to systematically identify higher-order
repeat structures from unassembled whole-genome shotgun sequence and test
whether these sequence elements correspond to functional centromeric
sequences. We analyzed genome datasets from six species of mammals
representing the diversity of the mammalian lineage, namely, horse, dog,
elephant, armadillo, opossum, and platypus. We define candidate monomer
satellite repeats and demonstrate centromeric localization for five of the
six genomes. Our analysis revealed the greatest diversity of centromeric
sequences in horse and dog in contrast to elephant and armadillo, which
showed high centromeric sequence homogeneity. We could not isolate
centromeric sequences within the platypus genome, suggesting that
centromeres in platypus are not enriched in satellite DNA. Our method can
be applied to the characterization of thousands of other vertebrate
genomes anticipated for sequencing in the near future, providing an
important tool for annotation of centromeres.