1 Population Genomics Gil McVean, Department of Statistics, Oxford.


Population Genomics

Gil McVean, Department of Statistics, Oxford


Questions about genetic variation

• How different are our genomes?

• How is the variation distributed within and between genomes?

• What does variation tell us about human evolution?


How different are our genomes?


Serological techniques for detecting variation

[Figure: serological typing scheme labelled Human, Rabbit and A, tested against blood groups A, B, AB and O]


Blood group systems in humans

• 28 known systems
– 39 genes, 643 alleles

System               Genes               Alleles
ABO                  ABO                 102
Chido-Rodgers        C4A, C4B            7+
Colton               AQP1                7
Cromer               DAF                 10
Diego                SLC4A1              78
Dombrock             DO                  9
Duffy                FY                  9
Gerbich              GYPC                9
GIL                  AQP3                2
H/h                  FUT1, FUT2          27/22
I                    GCNT2               7
Indian               CD44                2
Kell                 KEL, XK             33/30
Kidd                 SLC14A1             8
Knops                CR1                 24+
Landsteiner-Wiener   ICAM4               3
Lewis                FUT3, FUT6          14/20
Lutheran             LU                  16
MNS                  GYPA, GYPB, GYPE    43
OK                   BSG                 2
P-related            A4GALT, B3GALT3     14/5
RAPH (MER2)          CD151               3
Rh                   RHCE, RHD, RHAG     129
Scianna              ERMAP               4
Xg                   XG, CD99            -
YT                   ACHE                4

Source: http://www.bioc.aecom.yu.edu/bgmut/summary.htm


Protein electrophoresis

• Changes in the mass/charge ratio of proteins caused by amino acid substitutions can be detected as differences in how far the proteins travel through a gel

• In humans, about 30% of loci are polymorphic, and the chance that two randomly drawn alleles at a locus differ is about 6%

[Figure: protein variants carrying different charges (+/-) migrating through a starch or agar gel; arrow shows the direction of travel]

Lewontin and Hubby (1966)

Harris (1966)
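The 6% figure is an average heterozygosity: for allele frequencies p_i at a locus, the chance that two alleles drawn at random differ is 1 − Σ p_i². A minimal sketch, using made-up allele frequencies rather than the Lewontin and Hubby or Harris data, shows how a minority of polymorphic loci produces a low genome-wide average:

```python
def heterozygosity(freqs):
    """Chance that two alleles drawn at random (with replacement) at one locus differ."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    return 1.0 - sum(p * p for p in freqs)


def average_heterozygosity(loci):
    """Mean heterozygosity over loci; a monomorphic locus ([1.0]) contributes 0."""
    return sum(heterozygosity(f) for f in loci) / len(loci)


# Hypothetical data: 10 loci, 3 of them polymorphic (30%), each with a 90/10 allele split
loci = [[1.0]] * 7 + [[0.9, 0.1]] * 3
print(f"polymorphic loci: {sum(len(f) > 1 for f in loci) / len(loci):.0%}")
print(f"average heterozygosity: {average_heterozygosity(loci):.3f}")
```

Monomorphic loci drag the average down, which is why a locus-level polymorphism rate of 30% can coexist with a much lower average heterozygosity.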


The rise of DNA sequence analysis

• RFLPs – Cann et al (1987)

• Sequencing of small regions – Vigilant et al (1991)

• Whole mitochondrial genome sequencing – Ingman et al (2000)


The human genomes…

• The draft human genome sequence was published in 2001
– It is a mosaic of DNA from several individuals

• Since then, several more genomes have been sequenced, at least partially
– Variation discovered by shotgun sequencing

• Other methods have been developed to look for gross chromosomal differences

[Figure: NimbleGen array CGH]


The International HapMap Project

• Launched in 2002 with the goal of characterising single nucleotide variation among 540 human genomes from individuals of European, Nigerian, Chinese and Japanese ancestry

• Not a sequencing project: rather, it genotypes known polymorphisms

• Has so far assembled information on over 6 million single nucleotide polymorphisms (SNPs)


The 1000 Genomes Project


How do we differ? – Let me count the ways

• Single nucleotide polymorphisms
– 1 every few hundred bp

• Short indels (insertions/deletions)
– 1 every few kb

• Microsatellite (STR) repeat number
– 1 every few kb

• Minisatellites
– 1 every few kb

• Repeated genes
– rRNA, histones

• Large inversions and deletions
– Y chromosome, copy number variants (CNVs)

[Figure: example aligned sequences]

TGCATTGCGTAGGCTGCATTCCGTAGGC
TGCATT---TAGGCTGCATTCCGTAGGC
TGCTCATCATCATCAGCTGCTCATCA------GC

[Size labels in figure: ≤100 bp; 1-5 kb]
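Once sequences are aligned, SNPs and short indels can be told apart by inspecting columns. The sketch below is a simplification that assumes a pairwise alignment with '-' marking gaps: mismatched columns count as SNPs and each run of gap columns counts as one indel event. Applied to the first two sequences shown above, it finds no SNPs and a single 3 bp deletion.

```python
def classify_alignment(seq_a, seq_b):
    """Return (number of SNP columns, list of indel lengths) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    snps, indels, gap_run = 0, [], 0
    for a, b in zip(seq_a, seq_b):
        if a == "-" or b == "-":
            gap_run += 1          # extend the current run of gap columns
            continue
        if gap_run:
            indels.append(gap_run)  # a gap run just ended: record one indel event
            gap_run = 0
        if a != b:
            snps += 1             # both bases present but different: a SNP column
    if gap_run:
        indels.append(gap_run)
    return snps, indels


print(classify_alignment("TGCATTGCGTAGGCTGCATTCCGTAGGC",
                         "TGCATT---TAGGCTGCATTCCGTAGGC"))  # (0, [3])
```

Adjacent gaps in either sequence are merged into one event here; real variant callers handle such cases more carefully.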


Y chromosome variation

• Non-pathological rearrangements of the AZFc region on the Y chromosome

Tyler-Smith and McVean (2003)


Mutation is the ultimate source of variation

• New mutations occur in the germ-line

• Point mutations occur at about 2×10⁻⁸ per nucleotide per generation
– You pass on about 60 new mutations to each of your children, of which perhaps 1 changes the protein sequence encoded by a gene

• Microsatellite mutations can occur much faster
– Up to 10⁻⁴ per generation
– Some, e.g. in Huntington's disease, have important consequences

• Minisatellites can mutate at rates of up to 10⁻¹ per generation
– The uniqueness of the resulting patterns is the basis of DNA fingerprinting

• Most of the differences between genomes are the result of inheriting mutations from our ancestors
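The figure of about 60 new mutations follows from multiplying the per-site rate by the number of sites passed on. A rough sketch, assuming a haploid genome of about 3×10⁹ bp and that mutations arise independently (so their count is Poisson-distributed), also gives the chance that a child inherits no new protein-changing mutation from one parent if, as the slide suggests, the expected number of such mutations is about 1:

```python
import math

MU_POINT = 2e-8          # point mutations per nucleotide per generation (slide value)
GAMETE_LENGTH_BP = 3e9   # assumption: ~3 x 10^9 bp of haploid genome passed to each child


def expected_mutations(mu, length_bp):
    """Expected number of new mutations = rate per site x number of sites."""
    return mu * length_bp


def poisson_pmf(k, lam):
    """P(X = k) when new mutations arise independently (Poisson approximation)."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


lam_all = expected_mutations(MU_POINT, GAMETE_LENGTH_BP)
# Assumption: about 1 protein-changing mutation expected per transmitted genome
p_no_coding_change = poisson_pmf(0, 1.0)
print(f"{lam_all:.0f} new mutations expected; P(no protein change) = {p_no_coding_change:.2f}")
```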


[Figure: our genealogical tree, showing mutations in our ancestors inherited as differences between our genomes]


Different, but not that different

• Humans are one of the least diverse organisms (cheetahs are even less diverse)

Species                Diversity (percent)
Humans                 0.08-0.1
Chimpanzees            0.12-0.17
Drosophila simulans    2
E. coli                5
HIV1                   30

Photos from UN photo gallery www.un.org/av/photo
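Because nucleotide diversity is the probability that a site differs between two randomly chosen copies, it converts directly into an expected number of differences. Assuming a haploid human genome of about 3×10⁹ bp (a figure not given on the slide), a diversity of 0.08 to 0.1% means two copies differ at roughly 2.4 to 3 million sites:

```python
HUMAN_GENOME_BP = 3e9  # assumption: ~3 x 10^9 bp haploid genome


def expected_pairwise_differences(diversity_percent, genome_bp=HUMAN_GENOME_BP):
    """Diversity is the chance a site differs between two random copies,
    so the expected number of differences is diversity x number of sites."""
    return diversity_percent / 100.0 * genome_bp


low = expected_pairwise_differences(0.08)
high = expected_pairwise_differences(0.1)
print(f"about {low / 1e6:.1f} to {high / 1e6:.1f} million differences")
```

The same conversion applies to other species only with their own genome sizes, so the human genome length should not be reused for the other rows of the table.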


An aside on the genetics of race

• It is sometimes claimed that there is a ‘genetic basis to race’

• What is true is that groups of individuals from different parts of the world tend to have similar genomes because they share recent ancestry

• But there are very few ‘fixed’ genetic differences between populations (I can think of one example – the FY gene)

• The differences between populations lie in the combinations of variants

Rosenberg et al (2002)


How is genetic variation distributed within and between genomes?


Diversity is not evenly distributed across the genome I

Genome          Average pairwise differences per kb    Relative copy number
Autosomes       0.5-0.85                               1
X chromosome    0.47                                   3/4
Y chromosome    0.15                                   1/4
mtDNA           2.8                                    1/4

TISMWG (2001), Jobling et al (2004)

• Autosomes, sex chromosomes and mtDNA have systematically different levels of diversity

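• This reflects differences in the number of copies of each chromosome in the population (for every four copies of an autosome, a population with equal numbers of males and females carries three X chromosomes and one Y) and in the mutation rate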
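Under a simple neutral model, diversity is proportional to the number of copies of a chromosome in the population multiplied by the mutation rate. The back-of-envelope sketch below takes the midpoint of the autosomal range, scales it by relative copy number, and treats any remaining excess as a difference in mutation rate; it is a consistency check, not a formal estimate. The X and Y values come out close to the copy-number prediction, while mtDNA is roughly 17 times higher, consistent with its higher mutation rate.

```python
AUTOSOMAL_PI = (0.5 + 0.85) / 2  # midpoint of the autosomal range, differences per kb

# name: (observed differences per kb, relative copy number), from the table
TABLE = {
    "X chromosome": (0.47, 3 / 4),
    "Y chromosome": (0.15, 1 / 4),
    "mtDNA": (2.8, 1 / 4),
}


def predicted_pi(relative_copies, autosomal_pi=AUTOSOMAL_PI, relative_mu=1.0):
    """Neutral expectation: diversity scales with copy number and mutation rate."""
    return autosomal_pi * relative_copies * relative_mu


for name, (observed, copies) in TABLE.items():
    pred = predicted_pi(copies)
    print(f"{name:13s} observed {observed:.2f}  predicted {pred:.2f}  "
          f"implied mutation-rate ratio {observed / pred:.1f}")
```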


Diversity is not evenly distributed across the genome II

TISMWG (2001)

[Figure: level of variation along chromosome 6, with the HLA region marked]

• There are fluctuations in the level of variation across the genome
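Fluctuations of this kind are usually displayed by counting polymorphisms in windows along a chromosome. A minimal sketch, with hypothetical SNP positions and a window size chosen only for illustration (the slide does not state how the plot was made):

```python
from collections import Counter


def snp_density_per_kb(positions, chrom_length, window=100_000):
    """SNPs per kb in consecutive non-overlapping windows along a chromosome.

    positions: 0-based SNP coordinates; chrom_length: chromosome length in bp.
    """
    counts = Counter(pos // window for pos in positions if 0 <= pos < chrom_length)
    densities = []
    for i in range((chrom_length + window - 1) // window):
        width = min(window, chrom_length - i * window)  # the last window may be shorter
        densities.append(counts[i] * 1000 / width)
    return densities


# Hypothetical SNP positions on a 2.2 kb stretch, using 1 kb windows
print(snp_density_per_kb([100, 250, 900, 1500, 2100, 2150, 2199], 2200, window=1000))
# [3.0, 1.0, 15.0]
```

Scaling by the actual width of the final window avoids underestimating density at the chromosome end; in practice, windows would also be corrected for unsequenced or inaccessible bases.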


Diversity is not evenly distributed across genes I

• Purifying selection eliminates deleterious mutations and reduces diversity in regions of strong functional constraint

[Figure: SNPs per 10 kb (axis 0 to 9) in intergenic, intronic, exonic and UTR sequence]

Zhao et al (2003)


Diversity is not evenly distributed across genes II

• Adaptive evolution ‘wipes out’ diversity nearby due to the hitch-hiking effects of a selective sweep

– e.g. the Duffy-null locus in sub-Saharan Africa, which protects against Plasmodium vivax malaria

[Figure: haplotypes around the Duffy (FY) locus in two population samples (European and African), with the FY*O mutation marked; legend: ancestral allele, derived allele, missing data]

Hamblin and Di Rienzo (2000)
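One way to quantify the loss of diversity around a sweep is nucleotide diversity (π), the average number of differences per site between pairs of sequences. The sketch assumes complete data coded as strings of 0 (ancestral) and 1 (derived), unlike the figure, which includes missing data; both samples are hypothetical, with the "swept" sample dominated by a single haplotype:

```python
from itertools import combinations


def nucleotide_diversity(haplotypes, length_bp):
    """pi: mean pairwise differences per site among aligned haplotypes of equal length."""
    pairs = list(combinations(haplotypes, 2))
    if not pairs:
        raise ValueError("need at least two haplotypes")
    total = sum(sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs)
    return total / len(pairs) / length_bp


# Hypothetical 4-site samples: one with several distinct haplotypes, one after a sweep
neutral = ["0101", "1001", "0110", "1010"]
swept = ["0000", "0000", "0000", "0001"]
print(f"neutral pi = {nucleotide_diversity(neutral, 4):.3f}")
print(f"swept pi   = {nucleotide_diversity(swept, 4):.3f}")
```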


Diversity is not evenly distributed across genes III

• Some genes are under balancing or diversifying selection, which actively maintains diversity

– MHC: heterozygote advantage and frequency-dependent selection driven by the recognition of pathogens

Horton et al (1998)


Diversity is not evenly distributed across populations I

• African populations are more diverse than non-African populations
– More polymorphisms
– Polymorphisms at less skewed frequencies

• These differences reflect bottlenecks associated with the colonisation of the rest of the world from Africa c. 65 KYA

Population         Segregating sites per kb (n = 30)    Diversity per kb    Tajima's D
Hausa (African)    4.8                                  0.11                -0.33
Italian            3.2                                  0.10                1.18
Chinese            3.0                                  0.07                1.19

Frisse et al (2001)
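Tajima's D compares two estimates of diversity: one from the number of segregating sites (S, via Watterson's estimator S/a₁) and one from the average number of pairwise differences (π). Negative values indicate an excess of rare variants; positive values an excess of intermediate-frequency variants. The sketch implements the standard formula for n sequences; S and π must be totals over the same region (not per kb), and the inputs below are hypothetical rather than the values in the table.

```python
import math


def tajimas_d(n, s, pi):
    """Tajima's D for n sequences with s segregating sites and mean pairwise differences pi.

    s and pi are totals over the same region (not per kb).
    """
    if n < 4 or s == 0:
        raise ValueError("need n >= 4 sequences and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1, e2 = c1 / a1, c2 / (a1 ** 2 + a2)
    theta_w = s / a1  # Watterson's estimate from segregating sites
    return (pi - theta_w) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical region: 30 sequences, 20 segregating sites, 6.5 differences per pair on average
print(round(tajimas_d(30, 20, 6.5), 2))
```

For n = 2 or 3 the variance term is zero, so the function refuses samples smaller than 4.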


mtDNA phylogeography

Ingman et al (2000)

[Figure: mtDNA phylogeny with African and non-African lineages marked]


The colonisation process as inferred from mtDNA variation


What does genetic variation tell us about human evolution?

• Modern humans appear in the fossil record about 200K years ago

• The mitochondrial Eve dates back to about 150K years ago

• The Y-chromosome Adam dates back to about 70K years ago

• For most of our genome, however, the common ancestor is about 500K – 1M years ago

– This predates the origin of Homo sapiens considerably
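These dates can be compared with a rough coalescent expectation. Under the standard neutral coalescent with constant population size, a population with M gene copies has an expected time to the most recent common ancestor of 2M(1 − 1/n) generations for n sampled copies. The sketch assumes a diploid effective size of 10,000, 25 years per generation, and the relative copy numbers listed earlier (all assumptions, not values from this slide). It gives about 0.5 to 1 million years for autosomes, matching the range above, but puts the mtDNA and Y MRCAs at about 250,000 years, older than the dates quoted. Treat it as an order-of-magnitude guide only.

```python
def expected_tmrca_years(n, ne=10_000, relative_copies=1.0, generation_years=25):
    """Expected TMRCA of n sampled copies under the neutral constant-size coalescent.

    ne: assumed diploid effective population size; relative_copies: 1 for autosomes,
    3/4 for X, 1/4 for Y and mtDNA.
    """
    if n < 2:
        raise ValueError("need at least two sampled copies")
    copies = 2 * ne * relative_copies       # effective number of gene copies, M
    generations = 2 * copies * (1 - 1 / n)  # E[T_MRCA] = 2M(1 - 1/n) generations
    return generations * generation_years


for label, n, rel in [("autosome, 2 copies", 2, 1.0),
                      ("autosome, large sample", 10_000, 1.0),
                      ("mtDNA or Y, large sample", 10_000, 0.25)]:
    print(f"{label:26s} ~{expected_tmrca_years(n, relative_copies=rel):,.0f} years")
```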


[Figure: timeline marking the human-chimp split, the autosomal MRCA and the origin of H. sapiens]


Did early humans interbreed with Neanderthals?

Ovchinnikov et al (2000)

[Figure: comparison of mtDNA sequences, with Neanderthal sequences labelled]

mtDNA sequences say no…


But…

• There is some evidence for interbreeding: unusual haplotypes found in Europe are composed of SNPs not found in non-European populations

Plagnol and Wall (2006)


Deeper trees in the human genome

• There is growing evidence that some regions of our genome have truly ancient common ancestors

• Dystrophin has an ancient haplotype found primarily outside Africa, suggesting a colonisation more than 160 KYA

• There is an inversion found primarily in Europeans that is roughly 3 million years old

Stefansson et al (2005)

[Figure: the two inversion haplotypes, labelled Haplotype 1 and Haplotype 2]


What are the genetic differences that make us human?


Chromosomal changes

• Human chromosome 2 arose from the fusion of two chromosomes that are still separate in the other great apes

• There are several inversion differences between human and chimpanzee chromosomes

Feuk et al (2005)


Gene loss

• Loss of an enzyme that makes one form of sialic acid (Neu5Gc)

– A sugar on the cell surface that mediates a variety of recognition events involving pathogenic microbes and toxins

• Myosin heavy chain
– Associated with gracilization

Wang et al (2006)


Gene evolution

• FOXP2 is highly conserved across mammals and is expressed in the brain. In humans, mutations in the gene are associated with specific language impairment

• Across the entire mammalian phylogeny, there have been very few amino acid substitutions in the protein

• However, two amino acid changes have become fixed on the lineage leading to modern humans since the split from the chimpanzee lineage

Enard et al. (2002)


What are the genetic differences that make people and peoples different?


Detecting recent adaptive evolution

• Let’s look closely at the dynamics of the fixation process for adaptive mutations

• The fixation of a beneficial mutation is associated with a change in the patterns of linked neutral genetic variation

• This is known as the hitch-hiking effect (Maynard Smith and Haigh 1974)

• Looking for the signature of hitch-hiking can be a good way of detecting very recent fixation events
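How fast is a fixation event? A deterministic sketch, assuming genic selection with coefficient s, a constant population of N diploids, a mutation starting as a single copy, and no drift (the stochastic early phase is ignored), iterates the recursion p' = p(1 + s)/(1 + ps) until the allele is within one copy of fixation. For the illustrative values s = 0.01 and N = 10,000 this takes about 2,000 generations, close to the commonly quoted approximation (2/s)·ln(2N) and much shorter than the roughly 4N generations of a typical neutral genealogy. That short window leaves little time for recombination to separate linked variants from the selected allele, which is what produces the hitch-hiking signature.

```python
import math


def sweep_duration(s, n_diploid):
    """Generations for a beneficial allele to go from one copy to within one copy of fixation,
    ignoring drift (deterministic genic selection)."""
    if s <= 0:
        raise ValueError("s must be positive for the allele to spread")
    two_n = 2 * n_diploid
    p, generations = 1.0 / two_n, 0
    while p < 1.0 - 1.0 / two_n:
        p = p * (1 + s) / (1 + p * s)  # frequency after one generation of selection
        generations += 1
    return generations


s, N = 0.01, 10_000
print(sweep_duration(s, N), "generations (simulated)")
print(round(2 / s * math.log(2 * N)), "generations ((2/s) ln 2N approximation)")
```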


Long haplotypes

• A selective sweep at the Lactase gene in Europeans
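A recent sweep leaves the selected allele on an unusually long shared haplotype. One common way to measure this is extended haplotype homozygosity (EHH): among chromosomes carrying a given allele at a core SNP, the probability that two chosen at random are identical from the core out to a given SNP. The slide does not say which statistic underlies the lactase example; the sketch below, with hypothetical haplotypes coded as strings of alleles, only illustrates the idea.

```python
from collections import Counter
from math import comb


def ehh(haplotypes, core, end, core_allele):
    """EHH between SNP index `core` and SNP index `end` for carriers of `core_allele`."""
    carriers = [h for h in haplotypes if h[core] == core_allele]
    if len(carriers) < 2:
        raise ValueError("need at least two carriers of the core allele")
    lo, hi = min(core, end), max(core, end)
    groups = Counter(h[lo:hi + 1] for h in carriers)  # identical stretches from core to end
    return sum(comb(k, 2) for k in groups.values()) / comb(len(carriers), 2)


# Hypothetical haplotypes; SNP 0 is the core (1 = derived allele)
haps = ["1000", "1000", "1000", "1001", "0010", "0100", "0001", "0110"]
for allele in "10":
    print(allele, [round(ehh(haps, 0, j, allele), 2) for j in range(4)])
# 1 [1.0, 1.0, 1.0, 0.5]
# 0 [1.0, 0.33, 0.0, 0.0]
```

EHH decays more slowly on the derived-allele background, which is the pattern a recent sweep is expected to produce.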


Strong population differentiation

Lamason et al (Science 2005)

• SLC24A5, a putative cation exchanger that affects skin pigmentation
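Strong differentiation between populations is often summarised by F_ST. A minimal sketch for a single biallelic SNP in two equally weighted populations, using the heterozygosity-based definition F_ST = (H_T − H_S)/H_T (other estimators exist, and the slide does not say which was used); the allele frequencies are hypothetical:

```python
def fst(p1, p2):
    """F_ST = (H_T - H_S) / H_T for one biallelic SNP in two equally sized populations."""
    def het(p):
        return 2 * p * (1 - p)  # expected heterozygosity at allele frequency p

    h_t = het((p1 + p2) / 2)         # heterozygosity of the pooled population
    h_s = (het(p1) + het(p2)) / 2    # mean heterozygosity within populations
    if h_t == 0:
        return 0.0                   # monomorphic overall: no differentiation
    return (h_t - h_s) / h_t


# Identical frequencies, strongly different frequencies, and a fixed difference
print(fst(0.5, 0.5), fst(0.1, 0.9), fst(0.0, 1.0))
```

F_ST runs from 0 (no difference in allele frequency) to 1 (a fixed difference); a variant with near-fixed differences between populations sits at the top of the genome-wide distribution.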


Classes of selected genes

Voight et al. (2005)


Reading

• Human genetic variation
– Rosenberg et al. Genetic structure of human populations. Science 2002, 298:2381-2385.
– Conrad et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nature Genet. 2006, 1251-1260.
– McVean et al. Perspectives on human genetic variation from the International HapMap Project. PLoS Genetics 2005, 1:e54.

• The origin of modern humans
– Reed & Tishkoff. African human diversity, origins and migrations. Curr. Opin. Genet. Dev. 2006, 16:597-605.
– Jobling et al. Human evolutionary genetics: origins, peoples, and disease. Garland Science, 2004.
– Harding & McVean. A structured ancestral population for the evolution of modern humans. Curr. Opin. Genet. Dev. 2004, 14:667-674.

• Natural selection
– Lamason et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 2005, 310:1782-1786.
– Sabeti et al. Positive natural selection in the human lineage. Science 2006, 312:1614-1620.
– Tishkoff et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat. Genet. 2007, 39:31-40.