2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2015
Abstract

Beth L. Dumont, Evan E. Eichler. Signals of historical interlocus gene conversion in human segmental duplications. PLoS One, 8(10):e75949. 2013.

Abstract:

Standard methods of DNA sequence analysis assume that sequences evolve
independently, yet this assumption may not be appropriate for segmental
duplications that exchange variants via interlocus gene conversion (IGC). Here,
we use high quality multiple sequence alignments from well-annotated segmental
duplications to systematically identify IGC signals in the human reference
genome. Our analysis combines two complementary methods: (i) a paralog quartet
method that uses DNA sequence simulations to identify a statistical excess of
sites consistent with inter-paralog exchange, and (ii) the alignment-based method
implemented in the GENECONV program. One-quarter (25.4%) of the paralog families 
in our analysis harbor clear IGC signals by the quartet approach. Using GENECONV,
we identify 1477 gene conversion tracks that cumulatively span 1.54 Mb of the
genome. Our analyses confirm the previously reported high rates of IGC in
subtelomeric regions and Y-chromosome palindromes, and identify multiple novel
IGC hotspots, including the pregnancy specific glycoproteins and the
neuroblastoma breakpoint gene families. Although the duplication history of a
paralog family is described by a single tree, we show that IGC has introduced
incredible site-to-site variation in the evolutionary relationships among
paralogs in the human genome. Our findings indicate that IGC has left significant
footprints in patterns of sequence diversity across segmental duplications in the
human genome, out-pacing the contributions of single base mutation by orders of
magnitude. Collectively, the IGC signals we report comprise a catalog that will
provide a critical reference for interpreting observed patterns of DNA sequence
variation across duplicated genomic regions, including targets of recent adaptive
evolution in humans.
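
The site-counting idea behind a quartet-style analysis can be illustrated with a minimal sketch. It is not the authors' implementation, and the quartet layout is an assumption made here for illustration: two paralogs (A and B) aligned in each of two species, with the duplication assumed to predate the species split. Under that assumption, sites that group the two paralogs of the same species are the ones consistent with inter-paralog exchange. The function name and the labels `by_paralog`, `by_species`, and `mixed` are hypothetical. The sketch only counts site patterns; it does not perform the sequence simulations that the abstract describes for judging whether an excess is statistically significant.

```python
from collections import Counter


def quartet_site_patterns(a1, b1, a2, b2):
    """Count parsimony-informative sites for each of the three quartet groupings.

    a1, b1: aligned paralogs A and B in species 1 (hypothetical layout)
    a2, b2: aligned paralogs A and B in species 2
    All four strings must have the same length (one alignment column per index).
    """
    if not (len(a1) == len(b1) == len(a2) == len(b2)):
        raise ValueError("aligned sequences must have equal length")

    valid = set("ACGT")
    counts = Counter()
    for w, x, y, z in zip(a1.upper(), b1.upper(), a2.upper(), b2.upper()):
        # Skip columns with gaps or ambiguous bases.
        if not {w, x, y, z} <= valid:
            continue
        if w == y and x == z and w != x:
            # (a1, a2 | b1, b2): orthologs group together, as expected
            # when the duplication predates the species split.
            counts["by_paralog"] += 1
        elif w == x and y == z and w != y:
            # (a1, b1 | a2, b2): paralogs within a species group together,
            # the pattern consistent with inter-paralog exchange.
            counts["by_species"] += 1
        elif w == z and x == y and w != x:
            # (a1, b2 | b1, a2): the remaining informative grouping.
            counts["mixed"] += 1
    return counts
```

Calling `quartet_site_patterns` on four aligned sequences returns a `Counter` with one tally per grouping. In a fuller analysis, the `by_species` tally would then be compared against a null distribution obtained by simulating sequences without exchange.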