1 Population Genomics Gil McVean, Department of Statistics, Oxford.
1
Population Genomics
Gil McVean, Department of Statistics, Oxford
2
Questions about genetic variation
• How different are our genomes?
• How is the variation distributed within and between genomes?
• What does variation tell us about human evolution?
3
How different are our genomes?
4
Serological techniques for detecting variation
[Figure labels: Human, Rabbit; blood types A, B, AB, O]
5
Blood group systems in humans
• 28 known systems
– 39 genes, 643 alleles
System               Genes                Alleles
ABO                  ABO                  102
Chido-Rodgers        C4A, C4B             7+
Colton               AQP1                 7
Cromer               DAF                  10
Diego                SLC4A1               78
Dombrock             DO                   9
Duffy                FY                   9
Gerbich              GYPC                 9
GIL                  AQP3                 2
H/h                  FUT1, FUT2           27/22
I                    GCNT2                7
Indian               CD44                 2
Kell                 KEL, XK              33/30
Kidd                 SLC14A1              8
Knops                CR1                  24+
Landsteiner-Wiener   ICAM4                3
Lewis                FUT3, FUT6           14/20
Lutheran             LU                   16
MNS                  GYPA, GYPB, GYPE     43
OK                   BSG                  2
P-related            A4GALT, B3GALT3      14/5
RAPH-MER2            CD151                3
Rh                   RHCE, RHD, RHAG      129
Scianna              ERMAP                4
Xg                   XG, CD99             -
YT                   ACHE                 4
http://www.bioc.aecom.yu.edu/bgmut/summary.htm
6
Protein electrophoresis
• Changes in mass/charge ratio resulting from amino acid substitutions in proteins can be detected
• In humans, about 30% of all loci show polymorphism with a 6% chance of a pair of randomly drawn alleles at a locus being different
[Figure: protein bands on a starch or agar gel, with the direction of travel indicated]
Lewontin and Hubby (1966)
Harris (1966)
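The 6% figure is a heterozygosity: the chance that two alleles drawn at random from a locus are different. A minimal sketch of that calculation follows. The function name and the allele frequencies are illustrative assumptions, not values from Lewontin and Hubby or Harris.

```python
def expected_heterozygosity(freqs):
    """Probability that two alleles drawn at random (with replacement) differ."""
    if abs(sum(freqs) - 1.0) > 1e-9:
        raise ValueError("allele frequencies must sum to 1")
    # Two draws match only if both pick the same allele: sum of p_i squared.
    return 1.0 - sum(p * p for p in freqs)


# Hypothetical locus with one common and one rare electrophoretic allele.
h = expected_heterozygosity([0.97, 0.03])
print(round(h, 4))
```

With these assumed frequencies the heterozygosity comes out close to 6%, which shows that a rare second allele is enough to give a value of that size.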
7
The rise of DNA sequence analysis
• RFLPs
– Cann et al 1987
• Sequencing of small regions
– Vigilant et al 1991
• Whole genome sequencing
– Ingman et al 2000
8
The human genomes…
• The draft human genome sequence was published in 2001
– This is a mosaic from several individuals
• Since then, several more genomes have been sequenced, at least partially
– Shotgun sequencing variation discovery
• Other methods have been developed to look for gross chromosomal differences
Nimblegen array CGH
9
The International HapMap Project
• Launched in 2002 with the goal of characterising single nucleotide variation between 540 human genomes from individuals of European, Nigerian, Chinese and Japanese ancestry
• Not a sequencing project, rather it types known polymorphisms
• Has currently assembled information on over 6 million SNPs (single nucleotide polymorphisms)
10
The 1000 Genomes Project
11
How do we differ? – Let me count the ways
• Single nucleotide polymorphisms
– 1 every few hundred bp
• Short indels (=insertion/deletion)
– 1 every few kb
• Microsatellite (STR) repeat number
– 1 every few kb
• Minisatellites
– 1 every few kb
• Repeated genes
– rRNA, histones
• Large inversions, deletions
– Y chromosome, Copy Number Variants (CNVs)
[Figure: example sequence pairs]
TGCATTGCGTAGGC / TGCATTCCGTAGGC
TGCATT---TAGGC / TGCATTCCGTAGGC
TGCTCATCATCATCAGC / TGCTCATCA------GC
[Figure size labels: ≤100bp, 1-5kb]
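The example sequences on this slide can be compared column by column. The sketch below counts substitutions and gapped columns in each aligned pair. The function name is an assumption, and splitting the slide text into three pairs of equal length is my reading of the figure.

```python
def compare_aligned(seq_a, seq_b, gap="-"):
    """Count substitutions and gapped columns between two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    substitutions = 0
    gap_columns = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            gap_columns += 1      # column inside an insertion/deletion
        elif a != b:
            substitutions += 1    # single nucleotide difference
    return substitutions, gap_columns


pairs = {
    "SNP": ("TGCATTGCGTAGGC", "TGCATTCCGTAGGC"),
    "indel": ("TGCATT---TAGGC", "TGCATTCCGTAGGC"),
    "microsatellite": ("TGCTCATCATCATCAGC", "TGCTCATCA------GC"),
}
for label, (first, second) in pairs.items():
    print(label, compare_aligned(first, second))
```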
12
Y chromosome variation
• Non-pathological rearrangements of the AZFc region on the Y chromosome
Tyler-Smith and McVean (2003)
13
Mutation is the ultimate source of variation
• New mutations occur in the germ-line
• Point mutations at about 2 × 10^-8 per nucleotide per generation
– You pass on about 60 new mutations to your children, of which perhaps 1 changes the protein sequence encoded by a gene
• Microsatellite mutations can occur much faster
– Up to 10^-4 per generation
– Some, e.g. in Huntington’s disease, have important consequences
• Minisatellites can mutate at rates of up to 10^-1 per generation
– The uniqueness of these patterns gives rise to DNA fingerprinting
• Most of the differences between genomes are the result of inheriting mutations from our ancestors
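The figure of about 60 new mutations follows from the point mutation rate once a genome size is fixed. The sketch below does the arithmetic, assuming a haploid genome of roughly 3 × 10^9 nucleotides. That size is a round figure I have supplied, not one given on the slide.

```python
MUTATION_RATE = 2e-8          # per nucleotide per generation (from the slide)
HAPLOID_GENOME_SIZE = 3e9     # nucleotides; assumed round figure

# Expected number of new point mutations in one transmitted haploid genome.
new_mutations = MUTATION_RATE * HAPLOID_GENOME_SIZE

# The slide suggests perhaps 1 of these changes a protein sequence.
implied_protein_changing_fraction = 1 / new_mutations

print(round(new_mutations), round(implied_protein_changing_fraction, 3))
```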
14
[Figure: our genealogical tree, with mutations in our ancestors appearing as inherited mutations in our genomes]
15
Different, but not that different
• Humans are one of the least diverse organisms (excepting cheetahs)
Species               Diversity (percent)
Humans                0.08 - 0.1
Chimpanzees           0.12 - 0.17
Drosophila simulans   2
E. coli               5
HIV1                  30
Photos from UN photo gallery www.un.org/av/photo
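Diversity here is the proportion of sites that differ between two sequences, averaged over all pairs in a sample. A minimal sketch of that calculation follows. The function name and the four short sequences are made up for illustration, and their diversity is far higher than any value in the table.

```python
from itertools import combinations


def nucleotide_diversity(seqs):
    """Mean proportion of differing sites over all pairs of aligned sequences."""
    if len(seqs) < 2:
        raise ValueError("need at least two sequences")
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to the same length")
    pairs = list(combinations(seqs, 2))
    total_differences = sum(
        sum(a != b for a, b in zip(x, y)) for x, y in pairs
    )
    return total_differences / (len(pairs) * length)


# Hypothetical sample of four aligned 10 bp sequences.
sample = ["ACGTACGTAC", "ACGTACGTAC", "ACGTACGAAC", "ACCTACGTAC"]
print(100 * nucleotide_diversity(sample), "percent")
```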
16
An aside on the genetics of race
• It is sometimes claimed that there is a ‘genetic basis to race’
• What is true is that groups of individuals from different parts of the world tend to have similar genomes because they share recent ancestry
• But there are very few ‘fixed’ genetic differences between populations (I can think of one example – the FY gene)
• The differences between populations are in terms of the combinations of variants
Rosenberg et al (2002)
17
How is genetic variation distributed within and between genomes?
18
Diversity is not evenly distributed across the genome I
Genome         Average pairwise differences / kb   Relative copy number
Autosomes      0.5 – 0.85                          1
X chromosome   0.47                                3/4
Y chromosome   0.15                                1/4
mtDNA          2.8                                 1/4
TISMWG (2001), Jobling et al (2004)
• Autosomes, sex chromosomes and mtDNA have systematically different levels of diversity
• This reflects differences in the number of chromosomes and the mutation rate
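The relative copy numbers can be recovered by counting copies in one female and one male. The sketch below does that, then scales the autosomal diversity range to show what copy number alone would predict. It assumes an equal sex ratio and equal mutation rates everywhere, which are simplifications of mine.

```python
from fractions import Fraction

# Copies carried by one female plus one male.
# mtDNA counts once because only the mother transmits it.
COPIES_PER_PAIR = {
    "Autosomes": 2 + 2,
    "X chromosome": 2 + 1,
    "Y chromosome": 0 + 1,
    "mtDNA": 1 + 0,
}

AUTOSOMAL_DIVERSITY_PER_KB = (0.5, 0.85)  # range quoted in the table


def relative_copy_number(region):
    """Copy number of a region relative to the autosomes."""
    return Fraction(COPIES_PER_PAIR[region], COPIES_PER_PAIR["Autosomes"])


for region in COPIES_PER_PAIR:
    rel = relative_copy_number(region)
    low, high = (float(rel) * d for d in AUTOSOMAL_DIVERSITY_PER_KB)
    print(f"{region}: {rel}, predicted {low:.2f} - {high:.2f} differences per kb")
```

Under these assumptions the observed X and Y values fall inside the predicted ranges. mtDNA is far above its range, which is where the mutation rate comes in.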
19
Diversity is not evenly distributed across the genome II
TISMWG (2001)
Chromosome 6
HLA
• There are fluctuations in the level of variation across the genome
20
Diversity is not evenly distributed across genes I
• Purifying selection eliminates deleterious mutations and reduces diversity in regions of strong functional constraint
[Figure: SNPs per 10kb (axis 0 to 9) in intergenic, intronic, exonic and UTR sequence]
Zhao et al (2003)
21
Diversity is not evenly distributed across genes II
• Adaptive evolution ‘wipes out’ diversity nearby due to the hitch-hiking effects of a selective sweep
– e.g. Duffy-null locus in sub-Saharan Africa, protects against P. vivax
[Figure: haplotype table for the FY region in two samples (Pop1, Pop2; European and African), with the FY*O mutation marked. Key: ancestral allele, derived allele, missing data]
Hamblin and Di Rienzo (2000)
22
Diversity is not evenly distributed across genes III
• Some genes are under balancing or diversifying selection, where diversity is actively selected for
– MHC complex: heterozygote advantage and frequency-dependent selection driven by recognition of pathogens
Horton et al (1998)
23
Diversity is not evenly distributed across populations I
• African populations are more diverse than non-African populations
– More polymorphisms
– Polymorphisms at less skewed frequencies
• Differences reflect bottlenecks associated with the colonisation from Africa c. 65 KYA
Population        Segregating sites per kb (n = 30)   Diversity per kb   Tajima D statistic
Hausa (African)   4.8                                 0.11               -0.33
Italian           3.2                                 0.10               1.18
Chinese           3.0                                 0.07               1.19
Frisse et al (2001)
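The Tajima D statistic compares two estimates of diversity: one from the mean number of pairwise differences and one from the number of segregating sites. A sketch using the standard formula from Tajima (1989) follows. The function name is mine, and the inputs must be totals over a region, so the per-kb values in the table cannot be plugged in without the sequence length.

```python
import math


def tajimas_d(n, seg_sites, pi):
    """Tajima's D from sample size n, segregating sites S and mean pairwise differences pi.

    seg_sites and pi are totals for the same region, not per-kb values.
    """
    if n < 4 or seg_sites < 1:
        raise ValueError("need n >= 4 and at least one segregating site")
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    # Numerator: pairwise-difference estimate minus segregating-sites estimate.
    return (pi - seg_sites / a1) / math.sqrt(variance)


# Hypothetical region: 30 chromosomes, 10 segregating sites.
print(tajimas_d(30, 10, 1.0))   # excess of rare variants gives D < 0
print(tajimas_d(30, 10, 5.0))   # excess of common variants gives D > 0
```

Negative values indicate skewed frequencies with many rare variants, and positive values indicate an excess of intermediate-frequency variants, as in the non-African samples above.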
24
mtDNA phylogeography
Ingman et al (2000)
African
Non-African
25
The colonisation process as inferred from mtDNA variation
26
What does genetic variation tell us about human evolution?
• Modern humans appear in the fossil record about 200K years ago
• The mitochondrial Eve dates back to about 150K years ago
• The Y-chromosome Adam dates back to about 70K years ago
• For most of our genome, however, the common ancestor is about 500K – 1M years ago
– This predates the origin of Homo sapiens considerably
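The ordering of these dates is roughly what a neutral model predicts from copy number alone. The sketch below uses the standard coalescent expectation for the time to the most recent common ancestor, assuming an effective population size of 10,000, a 25-year generation time and a constant, randomly mating population. All three are conventional assumptions of mine, not figures from the slides.

```python
def expected_tmrca_years(ne, relative_copy_number, generation_years, sample_size=None):
    """Expected TMRCA in years under the standard neutral coalescent.

    With N gene copies the expectation is 2 * N * (1 - 1/n) generations;
    autosomes have N = 2 * ne, other regions scale by relative copy number.
    """
    copies = 2 * ne * relative_copy_number
    factor = 1.0 if sample_size is None else 1.0 - 1.0 / sample_size
    return 2 * copies * factor * generation_years


NE = 10_000            # assumed effective population size
GENERATION_YEARS = 25  # assumed generation time

for region, rel in [("Autosomes", 1.0), ("X chromosome", 0.75),
                    ("Y chromosome", 0.25), ("mtDNA", 0.25)]:
    years = expected_tmrca_years(NE, rel, GENERATION_YEARS)
    print(f"{region}: about {years:,.0f} years")
```

These are expectations with very large variance, so they only indicate the order of magnitude: older for autosomes, younger for mtDNA and the Y chromosome.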
27
Human – chimp split
Autosomal MRCA
Origin of H. sapiens
28
Did early humans interbreed with Neanderthals?
Ovchinnikov et al (2000)
Neanderthals
mtDNA sequences say no…
29
But…
• There is some evidence for this in the presence of unusual haplotypes found in Europe composed of SNPs not found in non-European populations
Plagnol and Wall (2006)
30
Deeper trees in the human genome
• There is growing evidence that some regions of our genome have truly ancient common ancestors
• Dystrophin has an ancient haplotype found primarily outside Africa, suggesting a colonisation event >160 KYA
• There is an inversion found primarily in Europeans that is roughly 3MY old
Stefansson et al (2005)
Haplotype 1
Haplotype 2
31
What are the genetic differences that make us human?
32
Chromosomal changes
• Human chromosome 2 is a fusion of two chromosomes in great apes
• There are several inversion differences between the chromosomes
Feuk et al (2005)
33
Gene loss
• Loss of enzymes that make sialic acid
– Sugar on cell surface that mediates a variety of recognition events involving pathogenic microbes and toxins
• Myosin heavy chain
– Associated with gracilization
Wang et al (2006)
34
Gene evolution
• FOXP2 is a highly conserved gene (across the mammalia), expressed in the brain. Mutations in the gene in humans are associated with specific language impairment
• Across the entire mammalian phylogeny, there have only been a very few amino acid changing substitutions
• However, two amino acid changes have become fixed in the lineage leading to modern humans since the split with the chimpanzee lineage
Enard et al. (2002)
35
What are the genetic differences that make people and peoples different?
36
Detecting recent adaptive evolution
• Let’s look closely at the dynamics of the fixation process for adaptive mutations
• The fixation of a beneficial mutation is associated with a change in the patterns of linked neutral genetic variation
• This is known as the hitch-hiking effect (Maynard Smith and Haigh 1974)
• Looking for the signature of hitch-hiking can be a good way of detecting very recent fixation events
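To get a feel for the dynamics, the sketch below iterates the deterministic frequency of a beneficial allele under simple haploid (genic) selection. It ignores genetic drift, and the selection coefficient and number of gene copies are hypothetical values chosen for illustration.

```python
import math


def sweep_trajectory(s, n_copies):
    """Frequency of a beneficial allele each generation, from 1/n_copies upward.

    Uses p' = p(1 + s) / (1 + s p) and stops once p passes 1 - 1/n_copies.
    """
    if s <= 0 or n_copies <= 2:
        raise ValueError("need s > 0 and n_copies > 2")
    p = 1.0 / n_copies
    trajectory = [p]
    while p < 1.0 - 1.0 / n_copies:
        p = p * (1.0 + s) / (1.0 + s * p)
        trajectory.append(p)
    return trajectory


# Hypothetical sweep: 1% advantage among 20,000 gene copies.
S, N_COPIES = 0.01, 20_000
generations = len(sweep_trajectory(S, N_COPIES)) - 1
approximation = 2 * math.log(N_COPIES) / S
print(generations, round(approximation))
```

Under these assumptions the sweep takes on the order of 2 ln(N)/s generations, where N is the number of gene copies. That is fast compared with the time a neutral mutation would need to drift to fixation, which is why linked neutral variants are carried along before recombination separates them.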
37
Long haplotypes
• A selective sweep at the Lactase gene in Europeans
38
Strong population differentiation
Lamason et al (Science 2005)
• SLC24A5
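Population differentiation at a single site is commonly summarised by FST, the share of total heterozygosity explained by differences between populations. The sketch below computes it for one biallelic site in two equally weighted populations. The function name and the allele frequencies are hypothetical, not values from Lamason et al.

```python
def fst_two_populations(p1, p2):
    """FST for one biallelic site from allele frequencies in two populations."""
    p_bar = (p1 + p2) / 2.0
    h_total = 2.0 * p_bar * (1.0 - p_bar)
    if h_total == 0.0:
        return 0.0  # monomorphic overall, so there is nothing to partition
    h_within = (2.0 * p1 * (1.0 - p1) + 2.0 * p2 * (1.0 - p2)) / 2.0
    return (h_total - h_within) / h_total


# Hypothetical site that is nearly fixed for different alleles.
print(round(fst_two_populations(0.99, 0.02), 3))
# Hypothetical site with similar frequencies in both populations.
print(round(fst_two_populations(0.30, 0.35), 3))
```

Sites with values near 1 stand out against a genome where most variants have similar frequencies in different populations, which is what makes strong differentiation a useful signal.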
39
40
Classes of selected genes
Voight et al. (2005)
41
Reading
• Human genetic variation
– Rosenberg et al. Genetic structure of human populations. Science 2002, 298:2381-2385.
– Conrad et al. A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet. 2006, 1251-1260.
– McVean et al. Perspectives on human genetic variation from the International HapMap Project. PLoS Genetics 2005, 1:e54.
• The origin of modern humans
– Reed & Tishkoff. African human diversity, origins and migrations. Curr Opin Genet Dev. 2006, 16:597-605.
– Jobling et al. Human evolutionary genetics: origins, peoples, and disease. Garland Science, 2004.
– Harding & McVean. A structured ancestral population for the evolution of modern humans. Curr Opin Genet Dev. 2004, 14:667-674.
• Natural selection
– Lamason et al. SLC24A5, a putative cation exchanger, affects pigmentation in zebrafish and humans. Science 2005, 310:1782-1786.
– Sabeti et al. Positive natural selection in the human lineage. Science 2006, 312:1614-1620.
– Tishkoff et al. Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet. 2007, 39:31-40.