2-AIN-506 and 2-AIN-252: Seminar in Bioinformatics (2) and (4)
Summer 2018

Daniel John Lawson, Garrett Hellenthal, Simon Myers, Daniel Falush. Inference of population structure using dense haplotype data. PLoS Genet, 8(1):e1002453. 2012.

Download from publisher: http://dx.plos.org/10.1371/journal.pgen.1002453

The advent of genome-wide dense variation data provides an opportunity to
investigate ancestry in unprecedented detail, but presents new statistical
challenges. We propose a novel inference framework that aims to efficiently
capture information on population structure provided by patterns of haplotype
similarity. Each individual in a sample is considered in turn as a recipient,
whose chromosomes are reconstructed using chunks of DNA donated by the other
individuals. Results of this "chromosome painting" can be summarized as a
"coancestry matrix," which directly reveals key information about ancestral
relationships among individuals. If markers are viewed as independent, we show
that this matrix almost completely captures the information used by both standard
Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE 
in a unified manner. Furthermore, when markers are in linkage disequilibrium, the
matrix combines information across successive markers to increase the ability to 
discern fine-scale population structure using PCA. In parallel, we have developed
an efficient model-based approach to identify discrete populations using this
matrix, which offers advantages over PCA in terms of interpretability and over
existing clustering algorithms in terms of speed, number of separable
populations, and sensitivity to subtle population structure. We analyse Human
Genome Diversity Panel data for 938 individuals and 641,000 markers, and we
identify 226 populations reflecting differences on continental, regional, local, 
and family scales. We present multiple lines of evidence that, while many methods
capture similar information among strongly differentiated groups, more subtle
population structure in human populations is consistently present at a much finer
level than currently available geographic labels and is only captured by the
haplotype-based approach. The software used for this article, ChromoPainter and
fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
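The abstract describes a pipeline: paint each recipient with chunks donated by the other individuals, tally the chunks into a coancestry matrix, and run PCA on that matrix. The toy sketch below shows only the tallying and PCA steps. It skips the painting model itself and is not ChromoPainter's or fineSTRUCTURE's implementation. The input format (one donor index per painted chunk), the function names `coancestry_matrix` and `pca_scores`, and the plain centre-then-SVD PCA (with no further normalisation) are all assumptions made for illustration.

```python
import numpy as np


def coancestry_matrix(paintings, n):
    """Tally painted chunks into an n x n coancestry (chunk-count) matrix.

    paintings[r] is an assumed input format: a list with one donor index
    per chunk painted onto recipient r's chromosomes.
    Entry [r, d] counts the chunks recipient r copied from donor d.
    """
    x = np.zeros((n, n))
    for recipient, donors in enumerate(paintings):
        for donor in donors:
            # Donors are the *other* individuals, so the diagonal stays zero.
            if donor == recipient:
                raise ValueError("an individual cannot donate to itself")
            x[recipient, donor] += 1
    return x


def pca_scores(x, k=2):
    """Plain PCA of the rows of x: centre each column, then take an SVD."""
    centred = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centred, full_matrices=False)
    return u[:, :k] * s[:k]  # projections of each individual onto the top k PCs


# Toy data (hypothetical): two groups of three individuals. Each recipient
# copies 5 chunks from each group-mate and 1 chunk from each outsider.
groups = [0, 0, 0, 1, 1, 1]
paintings = []
for r in range(len(groups)):
    donors = []
    for d in range(len(groups)):
        if d != r:
            donors += [d] * (5 if groups[d] == groups[r] else 1)
    paintings.append(donors)

x = coancestry_matrix(paintings, len(groups))
scores = pca_scores(x)
print(np.round(scores[:, 0], 3))  # PC1 splits {0, 1, 2} from {3, 4, 5}
```

In this toy example, the first principal component places individuals 0 to 2 on one side and 3 to 5 on the other. The sign of a principal component is arbitrary, so only the split is meaningful. Real data would involve far more individuals and chunks. The paper also fits a model-based clustering to this matrix, which the sketch does not attempt.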