Bioinformatics Seminar

Tue 22 Feb. 2011, 17:20

Title: Durbin et al.: A map of human genome variation from population-scale sequencing
Speaker: Peter Kováč

The 1000 Genomes Project aims to provide a deep characterization of human
genome sequence variation as a foundation for investigating the
relationship between genotype and phenotype. Here we present results of
the pilot phase of the project, designed to develop and compare different
strategies for genome-wide sequencing with high-throughput platforms. We
undertook three projects: low-coverage whole-genome sequencing of 179
individuals from four populations; high-coverage sequencing of two
mother-father-child trios; and exon-targeted sequencing of 697 individuals
from seven populations. We describe the location, allele frequency and
local haplotype structure of approximately 15 million single nucleotide
polymorphisms, 1 million short insertions and deletions, and 20,000
structural variants, most of which were previously undescribed. We show
that, because we have catalogued the vast majority of common variation,
over 95% of the currently accessible variants found in any individual are
present in this data set. On average, each person is found to carry
approximately 250 to 300 loss-of-function variants in annotated genes and
50 to 100 variants previously implicated in inherited disorders. We
demonstrate how these results can be used to inform association and
functional studies. From the two trios, we directly estimate the rate of
de novo germline base substitution mutations to be approximately 10^-8
per base pair per generation. We explore the data with regard to
signatures of natural selection, and identify a marked reduction of
genetic variation in the neighbourhood of genes, due to selection at
linked sites. These methods and public data will support the next phase of
human genetic research.