Lessons learnt from the 1000 Genomes Project about sequencing in populations Gil McVean Wellcome...

Lessons learnt from the 1000 Genomes Project about sequencing in populations

Gil McVean

Wellcome Trust Centre for Human Genetics and Department of Statistics, University of Oxford

Some questions

• What has the 1000 Genomes Project told us about how to sequence (in) populations?

• What has the 1000 Genomes Project told us about populations?

Samples for the 1000 Genomes Project

Major population groups, each made up of subpopulations of c. 100 samples

Populations shown: GBR, FIN, TSI, IBS, JPT, CHB, KHV, GWB, ASW, AJM

Samples from S. Asia

The role of the 1000G Project in medical genetics

• A catalogue of variants

– 95% of variants at 1% frequency in populations of interest

• A representation of ‘normal’ variation

• A set of haplotypes for imputation into GWAS

• A training ground for sequencing/statistical/computational technologies

Samples for the 1000 Genomes Project: Pilot

Populations shown: JPT, CHB, CHS*, YRI

*Exon pilot only

Population-scale genome sequencing

Haplotypes

2x

What has the project generated?

>15 million SNPs, >50% of them novel

dbSNP entries increased by 70%

A huge increase in the set of structural variants

A robust and modular pipeline for analysis of population-scale sequence data

An efficient format for storing aligned reads and a set of tools to manipulate and view the files

• SAM/BAM format for storing (aligned) reads

Bioinformatics (2009) http://samtools.sourceforge.net
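As a rough illustration of what a SAM record holds, the sketch below splits one alignment line into the eleven mandatory tab-separated fields of the SAM format. The example read, the `SamRecord` type and the `parse_sam_line` helper are made up for illustration; they are not part of samtools, and real files should be read with an established library.

```python
from typing import NamedTuple


class SamRecord(NamedTuple):
    """The eleven mandatory SAM fields (hypothetical container type)."""
    qname: str   # read name
    flag: int    # bitwise flags
    rname: str   # reference sequence name
    pos: int     # 1-based leftmost mapping position
    mapq: int    # mapping quality
    cigar: str   # alignment description
    rnext: str   # reference name of the mate
    pnext: int   # position of the mate
    tlen: int    # observed template length
    seq: str     # read bases
    qual: str    # base qualities (ASCII-encoded)


def parse_sam_line(line: str) -> SamRecord:
    """Parse one alignment line; optional tag fields are ignored."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 11:
        raise ValueError("a SAM alignment line needs at least 11 fields")
    qname, flag, rname, pos, mapq, cigar, rnext, pnext, tlen, seq, qual = fields[:11]
    return SamRecord(qname, int(flag), rname, int(pos), int(mapq),
                     cigar, rnext, int(pnext), int(tlen), seq, qual)


def is_reverse_strand(rec: SamRecord) -> bool:
    """Flag bit 0x10 marks a read aligned to the reverse strand."""
    return bool(rec.flag & 0x10)


# Invented example read, not taken from project data.
example = "read1\t16\t20\t1000\t60\t8M\t*\t0\t0\tACGTACGT\tIIIIIIII"
rec = parse_sam_line(example)
```

BAM is the compressed binary form of the same records, so it cannot be parsed as text in this way.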

An information-rich format for storing generic haplotype/genotype data and tools for manipulating the files

http://vcftools.sourceforge.net
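To give a feel for the genotype data stored in a VCF file, here is a minimal sketch that reads one record and computes the alternate allele frequency from the `GT` entries. The record, the sample names and both helper functions are invented for illustration and are not part of vcftools; multi-allelic sites and header parsing are deliberately left out.

```python
def parse_vcf_record(line, sample_names):
    """Split one VCF data line into its fixed columns and per-sample genotypes."""
    fields = line.rstrip("\n").split("\t")
    chrom, pos, vid, ref, alt, qual, filt, info, fmt = fields[:9]
    keys = fmt.split(":")
    genotypes = {}
    for name, column in zip(sample_names, fields[9:]):
        values = dict(zip(keys, column.split(":")))
        genotypes[name] = values["GT"]
    return {"chrom": chrom, "pos": int(pos), "ref": ref, "alt": alt,
            "filter": filt, "genotypes": genotypes}


def alt_allele_frequency(genotypes):
    """Fraction of called alleles that are non-reference ('.' means missing)."""
    alleles = []
    for gt in genotypes.values():
        # '|' separates phased alleles, '/' unphased ones.
        for allele in gt.replace("|", "/").split("/"):
            if allele != ".":
                alleles.append(allele)
    if not alleles:
        return None
    return sum(allele != "0" for allele in alleles) / len(alleles)


# Invented record with three hypothetical samples.
samples = ["S1", "S2", "S3"]
line = "20\t14370\t.\tG\tA\t29\tPASS\tNS=3\tGT\t0|0\t1|0\t1/1"
record = parse_vcf_record(line, samples)
freq = alt_allele_frequency(record["genotypes"])
```

Three of the six called alleles in this example are non-reference, so `freq` is 0.5.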

An understanding of the ‘rare functional variant load’ carried by individuals

c. 250 LOF / person

c. 75 HGMD DM

• Mutations associated with Usher syndrome

• 66 missense variants in dbSNP

• 2/3 detected in 1000 Genomes Pilot

• One HGMD ‘disease-causing’ variant homozygous in 3 YRI

– Other reports indicate this is not a real disease-causing variant

Samples for the 1000 Genomes Project: Phase 1

Populations shown: GBR, FIN, JPT, CHB, CHS, YRI

Lessons learnt about sequencing in populations

Lesson 1.

The low-coverage model works for variant discovery

A near complete record of common variants
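One way to see why many samples at low depth can still find common variants is a back-of-envelope power calculation. The sketch below is a toy model, not the project's calling pipeline: it assumes the number of variant copies among 2n haplotypes is binomial, that each copy is covered by a Poisson number of reads with mean depth/2, that there are no sequencing errors, and that a variant counts as discovered once at least `min_reads` reads carry it. The function names and the threshold are assumptions made for illustration.

```python
import math


def binom_pmf(k, n, p):
    """P(K = k) for K ~ Binomial(n, p)."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)


def poisson_sf(m, lam):
    """P(X >= m) for X ~ Poisson(lam)."""
    below = sum(math.exp(-lam) * lam**i / math.factorial(i) for i in range(m))
    return 1.0 - below


def discovery_power(freq, n_samples, depth, min_reads=2):
    """Probability that at least `min_reads` reads carry the variant allele.

    freq      -- population frequency of the variant
    n_samples -- number of diploid individuals sequenced
    depth     -- mean per-individual read depth at the site
    """
    n_hap = 2 * n_samples
    power = 0.0
    for copies in range(n_hap + 1):
        # Each variant copy contributes Poisson(depth / 2) variant reads.
        power += binom_pmf(copies, n_hap, freq) * poisson_sf(min_reads, copies * depth / 2)
    return power


# Compare a 1% variant with a 0.1% variant in 100 samples at an assumed 4x depth.
common = discovery_power(0.01, 100, 4)
rare = discovery_power(0.001, 100, 4)
```

Under this model the power for very deep sequencing approaches 1 - (1 - freq)^(2n), the chance that the variant is present in the sample at all, which is the ceiling that any sequencing depth can reach.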

Lesson 2.

The low-coverage model works for SNP genotyping

A set of accurate genotypes/haplotypes

Lesson 3.

The genome has a large grey area where variant calling is hard

Lesson 4.

Joint calling of different variant types substantially improves the quality of calls

Lesson 5.

Managing uncertainty is important
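A small worked example shows why uncertainty has to be carried along at low coverage. The sketch below computes posterior probabilities for the three genotypes at a biallelic site from read counts, using a simple binomial read model and a Hardy-Weinberg prior. The error rate, the allele frequency and the function name are assumptions for illustration; this is not the method used by the project.

```python
def genotype_posteriors(n_ref, n_alt, alt_freq, error=0.01):
    """Posterior P(genotype | reads) for 0, 1 or 2 copies of the alternate allele.

    n_ref, n_alt -- reads supporting the reference / alternate allele
    alt_freq     -- assumed population frequency of the alternate allele
    error        -- assumed per-read probability of reporting the wrong allele
    """
    # Probability that a single read shows the alternate allele, per genotype.
    p_alt_read = {0: error, 1: 0.5, 2: 1.0 - error}
    # Hardy-Weinberg prior on the genotype.
    prior = {0: (1 - alt_freq) ** 2,
             1: 2 * alt_freq * (1 - alt_freq),
             2: alt_freq ** 2}
    unnormalised = {
        g: prior[g] * p_alt_read[g] ** n_alt * (1 - p_alt_read[g]) ** n_ref
        for g in (0, 1, 2)
    }
    total = sum(unnormalised.values())
    return {g: value / total for g, value in unnormalised.items()}


# Two reads, both showing the alternate allele, at a site where it is rare.
post = genotype_posteriors(n_ref=0, n_alt=2, alt_freq=0.01)
```

With only two reads, both alternate, the heterozygous genotype is still far more probable than the homozygous alternate under these assumptions, so a hard call of "homozygous" would discard real uncertainty.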

Lesson 6.

Data visualisation is key

Lessons learnt about populations

Closely related populations can have substantially different rare variants

Spatial heterogeneity in non-genetic risk can differentially confound association studies for rare and common variants

Iain Mathieson

Thanks to the many...

• Steering committee

– Co-chairs: Richard Durbin and David Altshuler

• Samples and ELSI Committee

– Co-chairs: Aravinda Chakravarti and Leena Peltonen

• Data Production Group

– Co-chairs: Elaine Mardis and Stacey Gabriel

• Analysis Group

– Co-chairs: Gil McVean and Goncalo Abecasis

– Subgroups in gene-targeted sequencing (Richard Gibbs) and population genetics (Molly Przeworski)

• Structural Variation Group

– Co-chairs: Matt Hurles, Charles Lee and Evan Eichler

• DCC

– Co-chairs: Paul Flicek and Steve Sherry
