Gil McVean

http://www.well.ox.ac.uk/

http://www.ndm.ox.ac.uk/

http://www.ox.ac.uk/

What makes us different?

Image: Wikimedia commons

The genetic axes

[Figure: axes labelled Strong to Weak and Inherited to Somatic; items placed on them: Cancer, Complex disease, Genetic disorders, Aging]

Images: Wikimedia commons

Characterising individual genomes

Image: Illumina Cambridge Ltd

Images: Wikimedia commons

Why 1000 genomes?

• To find all common (>5%) variants in the accessible human genome

• To find at least 95% of variants at 1% in populations of medical genetics interest
– 95% of variants at 0.1% in genes

• To provide a fully public framework for interpreting rare genetic variation in the context of disease
– Screening
– Imputation
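The discovery targets above depend on sample size. As a rough illustration only, the sketch below computes the chance that a variant at a given population frequency is present at least once among the sampled chromosomes. It assumes independent sampling of chromosomes and ignores sequencing coverage and variant calling, so it is an upper bound on discovery rather than the project's own power calculation. The function name and the sample sizes are illustrative assumptions.

```python
def prob_variant_sampled(freq: float, n_individuals: int) -> float:
    """Probability that a variant at population frequency `freq` is carried
    by at least one of the 2 * n_individuals sampled chromosomes.

    Assumption: chromosomes are sampled independently; coverage and
    calling errors are ignored.
    """
    n_chromosomes = 2 * n_individuals  # diploid samples
    return 1.0 - (1.0 - freq) ** n_chromosomes


# Illustrative frequencies (5%, 1%, 0.1%) and hypothetical sample sizes.
for freq in (0.05, 0.01, 0.001):
    for n in (100, 500, 1000):
        p = prob_variant_sampled(freq, n)
        print(f"freq={freq:.3f}  n={n:4d}  P(present in sample)={p:.3f}")
```

Being present in the sample is necessary but not sufficient for discovery, which is one reason low-coverage designs need more samples than this simple bound suggests.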

The 1000 Genomes Project

1000 Genomes Project design

[Figure labels: Haplotypes; 2x; 10x; Population sequencing]

A map of shared variation

http://browser.1000genomes.org

www.1000genomes.org

Good, but not perfect

Variant type        Validation methods              Estimated FDR
Low-coverage SNPs   Sequenom, 454, PacBio           1.8%
Exome SNPs          454                             1.6%
LOF variants        454                             5.2%
Short indels        PCR, Sanger, array genotypes    36% -> 5.4%
Large deletions     PCR, array CGH, SNP genotype    2.1%
Other large SVs     PCR, array CGH, SNP genotype    1.4% – 3.7%

Post-hoc filtering

Not genotyped
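For readers unfamiliar with the term, an estimated false discovery rate (FDR) of this kind is usually the fraction of tested calls that fail independent validation. The sketch below shows that arithmetic only. The slides do not describe the project's validation procedure in detail, and the counts used here are hypothetical, chosen to give a value similar to one in the table.

```python
def estimated_fdr(calls_tested: int, calls_failed: int) -> float:
    """Fraction of tested variant calls that failed validation.

    Assumption: every tested call gives a clear pass/fail result.
    """
    if calls_tested <= 0:
        raise ValueError("calls_tested must be positive")
    if not 0 <= calls_failed <= calls_tested:
        raise ValueError("calls_failed must be between 0 and calls_tested")
    return calls_failed / calls_tested


# Hypothetical counts, not taken from the project.
print(f"{estimated_fdr(calls_tested=500, calls_failed=9):.1%}")
```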

4 million sites that differ from the human reference genome

12,000 changes to proteins

100 changes that knock out gene function

5 rare variants that are known to cause disease

Most variation is common – Most common variation is cosmopolitan

Number of variants in typical genome

Found only in Europe: 0.3%

Found in all continents: 92%

Found only in the UK: 0.1%

Found only in you: 0.002%
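To put these percentages in terms of variant counts, the sketch below multiplies them by the 4 million sites quoted earlier for a typical genome. Applying the percentages to that total is an assumption made here for illustration; the slides do not state the denominator. The four categories are also not exhaustive, so the counts do not sum to 4 million.

```python
SITES_PER_GENOME = 4_000_000  # figure quoted on the earlier slide

# Shares as shown on the slide, expressed as fractions.
SHARE_BY_DISTRIBUTION = {
    "Found only in Europe": 0.003,
    "Found in all continents": 0.92,
    "Found only in the UK": 0.001,
    "Found only in you": 0.00002,
}


def approximate_counts(total_sites: int, shares: dict[str, float]) -> dict[str, int]:
    """Convert fractional shares into approximate variant counts."""
    return {label: round(total_sites * share) for label, share in shares.items()}


counts = approximate_counts(SITES_PER_GENOME, SHARE_BY_DISTRIBUTION)
for label, count in counts.items():
    print(f"{label}: about {count:,} variants")
```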

Imputation from 1000 Genomes

• Imputation similar for all variant types across populations

• Comparable to imputation from high quality SNP haplotypes

…but it can work for common variants

The 1000 Genomes Sampling design

What have we learned about low-frequency genetic variation from the 1000 Genomes Project?

• How many rare (<0.5%) and low-frequency (0.5-5%) variants are there, how do their numbers vary between populations, and what does this tell us about demography?

• To what extent has natural selection shaped the distribution of rare variants within and between populations?

• What are the implications of these findings for the interpretation of genetic variation in individual genomes?
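The frequency classes used in these questions can be written down as a small helper. This is a minimal sketch using the thresholds stated on the slides (rare below 0.5%, low-frequency from 0.5% to 5%, common above 5%). The function name is hypothetical, and treating the input as the frequency of the variant allele in one population is an assumption.

```python
def frequency_class(allele_freq: float) -> str:
    """Classify a variant by allele frequency using the slide thresholds."""
    if not 0.0 <= allele_freq <= 1.0:
        raise ValueError("allele_freq must be between 0 and 1")
    if allele_freq < 0.005:   # below 0.5%
        return "rare"
    if allele_freq <= 0.05:   # 0.5% to 5%
        return "low-frequency"
    return "common"           # above 5%


for freq in (0.001, 0.02, 0.3):
    print(freq, frequency_class(freq))
```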

Populations differ in load of rare and common variants

Most rare variation is private

Rare variant differentiation within ancestry groupings increases as variant frequency decreases

Not all populations are equal

Rare variants identify recent historical links between populations

48% of IBS variants shared with American populations

ASW shows stronger sharing with YRI than LWK

What about variants that affect gene function?

Conserved variant load per individual

The proportion of rare variants is predicted by conservation, with the exception of splice-disrupting and STOP+ variants

KEGG ‘pathways’ show variation in excess rare-variant load

Patterns of variation inform about selective constraint

CTCF-binding motif

Variants under selection showed elevated levels of population differentiation

Proportion of pairwise comparisons where nonsynonymous variants are more differentiated than synonymous ones

Rare variant differentiation can confound the genetic study of disease

Mathieson and McVean (2012)

Implications

• Rare variants have spatial and ancestry-related distributions that reflect recent demographic events and selection.

• Purifying selection elevates local differentiation of rare variants.

• The functional and aetiological interpretation of rare variants in the context of disease needs to be aware of the local genetic background.

AFRICA

Gambian in Western Division, The Gambia (GWD)

Malawian in Blantyre, Malawi (MAB)

Mende in Sierra Leone (MSL)

Esan in Nigeria (ESN)

SOUTH ASIAN

Punjabi in Lahore, Pakistan (PJL)

Bengali in Bangladesh (BEB)

Sri Lankan Tamil in the UK (STU)

Indian Telugu in the UK (ITU)

AMERICAS

African American in Jackson, MS (AJM)

[Map labels, sample sizes not matched to populations in this transcript: 100, 200, 100, 100, 100, 100, 80]

The final resource – mid 2013

What more could we learn about human population genetics?

• There is a need for continuing the programme of developing public resources describing genetic variation across new populations, with high resolution spatial information.
– This will not just shed light on population history and selection, but be important for interpreting (rare) genetic variation in individual genomes.

• The Phase 1 1000 Genomes data has made clear the extent of variation in conserved regulatory sequence within genomes
– How does this relate to variation in function in different cell types?

• Many of the most interesting parts of the genome (for the study of selection) are still poorly-covered by HTS data
– Need to collect ‘bespoke’ data types for some genomic regions

The 1000 Genomes Project Consortium

http://www.1000genomes.org/
