Gil McVean
What makes us different?
Image: Wikimedia commons
The genetic axes
[Figure: axes labelled Strong/Weak and Inherited/Somatic; items placed on them: Cancer, Complex disease, Genetic disorders, Aging]
Images: Wikimedia commons
Characterising individual genomes
Image: Illumina Cambridge Ltd
Images: Wikimedia commons
Why 1000 genomes?
• To find all common (>5%) variants in the accessible human genome
• To find at least 95% of variants at 1% in populations of medical genetics interest
  – 95% of variants at 0.1% in genes
• To provide a fully public framework for interpreting rare genetic variation in the context of disease
  – Screening
  – Imputation
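The frequency targets above are tied to sample size. As a rough sketch, under the idealised assumption of independent sampling and perfect detection (no coverage or genotyping error, which real sequencing data do not satisfy), the chance that a variant at population frequency f is carried by at least one of n sampled diploid individuals is 1 - (1 - f)^(2n). The function name below is hypothetical and is not taken from the talk.

```python
def prob_variant_sampled(freq: float, n_individuals: int) -> float:
    """Probability that a variant at population frequency `freq` is carried
    by at least one of the 2 * n_individuals sampled chromosomes.

    Assumption: chromosomes are independent draws and every carried variant
    is detected, so this is an upper bound for real sequencing data.
    """
    if not 0.0 <= freq <= 1.0:
        raise ValueError("freq must be between 0 and 1")
    if n_individuals < 0:
        raise ValueError("n_individuals must be non-negative")
    n_chromosomes = 2 * n_individuals
    # Complement of the event "no sampled chromosome carries the variant".
    return 1.0 - (1.0 - freq) ** n_chromosomes


if __name__ == "__main__":
    # Frequencies named in the project goals: 5%, 1% and 0.1%.
    for freq in (0.05, 0.01, 0.001):
        for n in (100, 1000):
            p = prob_variant_sampled(freq, n)
            print(f"freq={freq:.3f} n={n}: P(sampled at least once)={p:.3f}")
```

Being present in the sample is necessary but not sufficient for discovery, so the values printed here overstate what low-coverage sequencing can actually find.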
The 1000 Genomes Project
1000 Genomes Project design
[Figure labels: Haplotypes, 2x, 10x]
Population sequencing
A map of shared variation
http://browser.1000genomes.org
www.1000genomes.org
Good, but not perfect
Variant type       Validation methods             Estimated FDR
Low-coverage SNPs  Sequenom, 454, PacBio          1.8%
Exome SNPs         454                            1.6%
LOF variants       454                            5.2%
Short indels       PCR, Sanger, array genotypes   36% -> 5.4%
Large deletions    PCR, array CGH, SNP genotype   2.1%
Other large SVs    PCR, array CGH, SNP genotype   1.4% – 3.7%
Slide annotations: Post-hoc filtering; Not genotyped
4 million sites that differ from the human reference genome
12,000 changes to proteins
100 changes that knock out gene function
5 rare variants that are known to cause disease
Most variation is common – Most common variation is cosmopolitan
Number of variants in typical genome
Found in all continents: 92%
Found only in Europe: 0.3%
Found only in the UK: 0.1%
Found only in you: 0.002%
Imputation from 1000 Genomes
• Imputation similar for all variant types across populations
• Comparable to imputation from high quality SNP haplotypes
…but it can work for common variants
The 1000 Genomes Sampling design
What have we learned about low-frequency genetic variation from the 1000 Genomes Project?
• How many rare (<0.5%) and low-frequency (0.5-5%) variants are there, how does their number vary between populations, and what does this tell us about demography?
• To what extent has natural selection shaped the distribution of rare variants within and between populations?
• What are the implications of these findings for the interpretation of genetic variation in individual genomes?
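The frequency classes used in these questions can be stated explicitly. The following is a minimal sketch that uses the thresholds given in the slides (rare below 0.5%, low-frequency from 0.5% to 5%, common above 5%); the function name and the handling of the exact boundary values are assumptions, not something the talk specifies.

```python
from collections import Counter


def frequency_class(allele_freq: float) -> str:
    """Assign a variant to a frequency class from its allele frequency.

    Thresholds follow the slides; placing exactly 0.5% and exactly 5% in
    the low-frequency class is an assumption made for this sketch.
    """
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if allele_freq < 0.005:
        return "rare"
    if allele_freq <= 0.05:
        return "low-frequency"
    return "common"


if __name__ == "__main__":
    # Made-up allele frequencies, for illustration only.
    example_freqs = [0.0004, 0.002, 0.01, 0.03, 0.12, 0.45]
    counts = Counter(frequency_class(f) for f in example_freqs)
    for label in ("rare", "low-frequency", "common"):
        print(label, counts[label])
```

Tallying the classes separately for each population is one simple way to approach the first question above.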
Populations differ in load of rare and common variants
Most rare variation is private
Rare variant differentiation within ancestry groupings increases as variant frequency decreases
Not all populations are equal
Rare variants identify recent historical links between populations
48% of IBS variants shared with American populations
ASW shows stronger sharing with YRI than LWK
What about variants that affect gene function?
Conserved variant load per individual
The proportion of rare variants is predicted by conservation, with the exception of splice-disrupting and STOP+ variants
KEGG ‘pathways’ show variation in excess rare-variant load
Patterns of variation inform about selective constraint
CTCF-binding motif
Variants under selection showed elevated levels of population differentiation
Proportion of pairwise comparisons where nonsynonymous variants are more differentiated than synonymous ones
Rare variant differentiation can confound the genetic study of disease
Mathieson and McVean (2012)
Implications
• Rare variants have spatial and ancestry-related distributions that reflect recent demographic events and selection.
• Purifying selection elevates local differentiation of rare variants.
• The functional and aetiological interpretation of rare variants in the context of disease needs to be aware of the local genetic background.
AFRICA
Gambian in Western Division, The Gambia (GWD)
Malawian in Blantyre, Malawi (MAB)
Mende in Sierra Leone (MSL)
Esan in Nigeria (ESN)
SOUTH ASIAN
Punjabi in Lahore, Pakistan (PJL)
Bengali in Bangladesh (BEB)
Sri Lankan Tamil in the UK (STU)
Indian Telugu in the UK (ITU)
AMERICAS
African American in Jackson, MS (AJM)
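[Slide figure values, in extraction order: 100, 200, 100, 100, 100, 100, 80; their mapping to the populations above is not recoverable]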
The final resource – mid 2013
What more could we learn about human population genetics?
• There is a need to continue developing public resources that describe genetic variation across new populations, with high-resolution spatial information.
  – This will not just shed light on population history and selection, but will be important for interpreting (rare) genetic variation in individual genomes.
• The Phase 1 1000 Genomes data has made clear the extent of variation in conserved regulatory sequence within genomes
  – How does this relate to variation in function in different cell types?
• Many of the most interesting parts of the genome (for the study of selection) are still poorly covered by HTS data
  – Need to collect ‘bespoke’ data types for some genomic regions
The 1000 Genomes Project Consortium
http://www.1000genomes.org/