Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.
Human Evolution:
Searching for Selection
Andrew Shah
Algorithms in Biology
374 Spring 2008
Overview
Given a DNA sequence, how do we know when natural selection has occurred?
Different methods of answering this question
How does having the entire genome available change this?
Natural Selection
Introduction
What sort of artifacts would this leave within the genome?
The frequency of the long gene increases from one generation to the next.
It eventually reaches 100%, or fixation.
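The rise toward fixation can be sketched with a simple deterministic model. This is only an illustration under stated assumptions: a haploid population, a favored allele with relative fitness 1 + s, and a starting frequency and selection coefficient chosen purely for the example (none of these values come from the slides).

```python
def allele_trajectory(p0, s, generations):
    """Frequency of a favored allele over time (haploid, deterministic).

    p0 and s are illustrative assumptions: p0 is the starting frequency,
    s is the selective advantage of the favored allele.
    """
    freqs = [p0]
    p = p0
    for _ in range(generations):
        # Carriers leave (1 + s) offspring for every 1 left by non-carriers.
        p = p * (1 + s) / (1 + p * s)
        freqs.append(p)
    return freqs


traj = allele_trajectory(p0=0.125, s=0.1, generations=200)
for gen in (0, 25, 50, 100, 200):
    print(f"generation {gen:3d}: frequency {traj[gen]:.4f}")
```

The frequency climbs every generation and approaches 1, which is the fixation described above.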
Natural Selection: Gene Perspective
Same process at the gene level
Let the yellow dot represent the advantageous allele
It begins at a small frequency (0.125 in this case)
During selection, the allele has risen in frequency!
Because of linkage, the nearby alleles have also risen in frequency
The allele has reached fixation!
As time goes on, the nearby alleles will slowly begin to reach fixation as well
Diversity has been lost
Effect of Selection on the Genome
Next challenge: How does this effect differ from what happens without selection?
Neutral Theory (N.T.)
Problem: Need to distinguish natural selection from neutral change
Therefore: Need a null hypothesis
Solution: Create a model that approximates neutral evolution
Kimura, 1960s
N.T. & Genetic Drift
Most variation is neutral with respect to selection
Therefore most changes in frequency are due to genetic drift
A neutral gene has an equal probability of increasing or decreasing in frequency in the next generation
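Genetic drift of a neutral allele can be illustrated with a small Wright-Fisher style simulation. This is a minimal sketch, not the model used in any of the papers discussed: the population size, starting frequency, and the function name `wright_fisher` are assumptions made for the example.

```python
import random


def wright_fisher(p0, pop_size, generations, seed=None):
    """Simulate neutral drift of one allele in a diploid population.

    Each generation, all 2N gene copies are drawn at random from the
    previous generation, so the allele frequency wanders by chance alone.
    """
    rng = random.Random(seed)
    n_copies = 2 * pop_size
    count = round(p0 * n_copies)
    freqs = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Binomial sampling: each new copy carries the allele with probability p.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        freqs.append(count / n_copies)
    return freqs


# Five replicate populations starting from the same frequency end up in different places.
for replicate in range(5):
    final = wright_fisher(p0=0.5, pop_size=50, generations=100, seed=replicate)[-1]
    print(f"replicate {replicate}: final frequency {final:.2f}")
```

Averaged over many replicates the frequency stays near its starting value, while individual populations drift up or down.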
N.T. & Mutation
New alleles are introduced at a constant rate (at a particular point)
To think about: How will this help us search for selection?
N.T. & Recombination
Recombination occurs at a near-constant rate at a given position
Testing the N.T.
How would natural selection differ from these assumptions?
“Positive Natural Selection in the Human Lineage”
P. C. Sabeti, S. F. Schaffner, B. Fry, J. Lohmueller, P. Varilly, O. Shamovsky, A. Palma, T. S. Mikkelsen, D. Altshuler, E. S. Lander
Testing for Selection
Sabeti et al.
Review of current state of genomic selection
Five statistical tests which use divergence from neutral theory to test for selection
Ideas? Functional Alteration, Decreased Diversity, High Frequency of Derived Alleles, Population Differences, Long Haplotypes
I. Functional Alteration
Get a section of genome, and compare synonymous vs. non-synonymous mutations between two species
Definition of synonymous mutation: a nucleotide change that leaves the encoded amino acid unchanged
Silent/Synonymous vs. Non-Synonymous
Long time scale, because it is an interspecies metric
Limited value--only finds ongoing or recurrent selection
Use a Ka/Ks statistical test, or McDonald-Kreitman
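The synonymous vs. non-synonymous comparison can be sketched as a codon-by-codon count. This is a simplified illustration, not a full Ka/Ks estimate: a real Ka/Ks test also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple hits. The sequences and the function name `count_substitutions` are made up for the example; the codon table is the standard genetic code.

```python
from itertools import product

# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_substitutions(seq_a, seq_b):
    """Count (synonymous, non-synonymous) codon differences.

    Both inputs must be aligned, gap-free coding sequences of equal length.
    Only codons that differ at exactly one position are classified.
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3 != 0:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        n_diffs = sum(x != y for x, y in zip(codon_a, codon_b))
        if n_diffs != 1:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid: silent change
        else:
            nonsyn += 1   # different amino acid: protein is altered
    return syn, nonsyn


# Hypothetical sequences: GCT -> GCC is silent (Ala), AAA -> AGA changes Lys to Arg.
print(count_substitutions("ATGGCTCTGAAA", "ATGGCCCTGAGA"))
```

An excess of non-synonymous over synonymous changes, relative to the neutral expectation, is the kind of signal the Ka/Ks and McDonald-Kreitman tests look for.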
II. Decreased Diversity
Way of detecting a selective sweep
Requires knowing which genes are ancestral and which are derived
A derived gene is one that is a descendant of the ancestral one; the ancestral state can be inferred by comparison to other species
The two small bars represent mutations. They are derived genes of the blue ancestor gene.
After the selective sweep the frequency of the derived alleles has jumped vis-a-vis the ancestral gene
A real example: derived alleles in red
Key idea: need to have ancestral genes present
The genes must not have reached fixation!
The pattern will be that of normal diversity of alleles but with skewed distribution of variation
Statistical Tests: Tajima’s D, Fu and Li’s D*
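As a rough illustration of how such a test works, Tajima's D compares two estimates of diversity: the average number of pairwise differences and the number of segregating sites. The sketch below uses the standard formula; the input format (one allele count per segregating site) and the example data are assumptions made for illustration.

```python
import math


def tajimas_d(n, allele_counts):
    """Tajima's D from per-site allele counts in a sample of n sequences.

    allele_counts[i] is how many of the n sequences carry one of the two
    alleles at segregating site i. Sites with a count of 0 or n are ignored.
    """
    if n < 4:
        raise ValueError("need at least 4 sequences")
    counts = [k for k in allele_counts if 0 < k < n]
    s = len(counts)  # number of segregating sites
    if s == 0:
        raise ValueError("no segregating sites")
    # Average number of pairwise differences.
    pi = sum(2 * k * (n - k) for k in counts) / (n * (n - 1))
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - s / a1) / math.sqrt(e1 * s + e2 * s * (s - 1))


# Hypothetical samples of 10 sequences with 10 segregating sites each.
print("all rare variants:        ", round(tajimas_d(10, [1] * 10), 2))
print("all intermediate variants:", round(tajimas_d(10, [5] * 10), 2))
```

Values near zero are what the neutral model predicts; a skewed distribution of variation pushes D away from zero, with an excess of rare variants giving negative values.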
III. New Alleles (AKA High Frequency of Derived Alleles)
Another technique for detecting a selective sweep
Gene ‘hitch-hiking’
Limited diversity because of fixation
Key idea: low frequency of new genes, but high diversity of rare alleles
Gene has reached fixation
Low diversity in this region compared to other regions
Next, mutations slowly increase the diversity
Because they are all new, their frequency remains low
As more time progresses, any pre-selective sweep alleles die out, and diversity is replaced by many derived alleles
Real world example: Red dots indicate rare alleles
Key Idea: The genes will have reached fixation and decreased diversity
The diversity will all be in the form of rare alleles (because they are new)
Statistical Test: Fay and Wu’s H
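Fay and Wu's H can be sketched in a few lines. It is the difference between two diversity estimates, one of which gives extra weight to derived alleles at high frequency. The input format (derived-allele count per segregating site) and the example counts are assumptions for illustration; the test needs the ancestral state at each site, typically inferred from an outgroup species.

```python
def fay_wu_h(n, derived_counts):
    """Fay and Wu's H for a sample of n sequences.

    derived_counts[i] is the number of sequences carrying the derived
    allele at segregating site i (0 < count < n).
    """
    counts = [k for k in derived_counts if 0 < k < n]
    denom = n * (n - 1)
    # Pairwise diversity: weights intermediate frequencies most.
    theta_pi = sum(2 * k * (n - k) for k in counts) / denom
    # Theta_H: weights high-frequency derived alleles most.
    theta_h = sum(2 * k * k for k in counts) / denom
    return theta_pi - theta_h


# Hypothetical samples of 10 sequences.
print("derived alleles at high frequency:", round(fay_wu_h(10, [9, 9, 8]), 2))
print("derived alleles at low frequency: ", round(fay_wu_h(10, [1, 1, 2]), 2))
```

Strongly negative H values indicate more high-frequency derived alleles than the neutral model predicts.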
Comparing Methods
What is the difference between decreased diversity and increased frequency of new alleles?
IV. Population Differences
Requires population split
Disproportionate shift in gene frequencies
Limited utility
Tall Tree Island
Two separated populations--a specific gene will show a disproportionate shift in frequency with respect to the other genes
Limited to cases where there are two populations
Statistical Test: F(st), P(excess)
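A minimal sketch of the F(st) idea for a single two-allele locus: compare the heterozygosity expected within each population to the heterozygosity expected if the two populations were pooled. The allele frequencies, the equal weighting of the two populations, and the function name `fst_two_populations` are assumptions for this example.

```python
def fst_two_populations(p1, p2):
    """F_ST for one biallelic locus in two equally weighted populations.

    p1 and p2 are the frequencies of the same allele in each population.
    """
    # Mean expected heterozygosity within populations.
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    # Expected heterozygosity of the pooled population.
    p_mean = (p1 + p2) / 2
    h_total = 2 * p_mean * (1 - p_mean)
    if h_total == 0:
        return 0.0  # allele fixed or absent in both populations
    return (h_total - h_within) / h_total


# Hypothetical frequencies: a typical locus vs. a locus that shifted in one population.
print("similar frequencies:  ", round(fst_two_populations(0.40, 0.45), 3))
print("divergent frequencies:", round(fst_two_populations(0.10, 0.90), 3))
```

A locus whose F(st) is far above the genome-wide background is a candidate for selection in one of the populations.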
V. Long Haplotypes
Based on Linkage Disequilibrium (LD)
Long haplotype block and high frequency
Under neutral conditions, a new allele has low frequency and high linkage disequilibrium
As time goes on and the neutral allele increases in frequency, recombination erodes the LD
Under selection, the allele rises in frequency too quickly for recombination to erode the LD, leaving a long haplotype at high frequency
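The erosion of LD by recombination follows a standard population-genetics result: the LD coefficient D shrinks by a factor of (1 - r) each generation, where r is the recombination rate between the two sites. The sketch below uses that formula; the recombination rate and the generation counts are illustrative assumptions, not values from the paper.

```python
def ld_after(d0, recomb_rate, generations):
    """LD coefficient D after a number of generations of recombination.

    d0 is the initial LD; recomb_rate is the per-generation recombination
    rate between the two sites.
    """
    return d0 * (1.0 - recomb_rate) ** generations


r = 0.001   # assumed recombination rate between the two sites
d0 = 0.25   # assumed starting LD

# An allele that reached high frequency quickly (e.g. by a sweep) vs. slowly (by drift).
print("after  200 generations:", round(ld_after(d0, r, 200), 4))
print("after 5000 generations:", round(ld_after(d0, r, 5000), 4))
```

An allele that is both common and still surrounded by strong LD has therefore risen recently and quickly, which is the signature the long haplotype test looks for.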
Genome-Wide Scanning
Better estimation of background rate
Helps to confirm previous studies
Suggests future areas of research
MORE POWER
SNP: Single Nucleotide Polymorphism (excludes other types of mutations) that occurs at > 1% frequency
SNPs are the basis of many genome-wide analyses
“Forces Shaping the Fastest Evolving Regions in the Human Genome”
K. S. Pollard, S. R. Salama, B. King, A. D. Kern, T. Dreszer, S. Katzman, A. Siepel, J. S. Pedersen, G. Bejerano, R. Baertsch, K. R. Rosenbloom, J. Kent, D. Haussler
Background
Exploits the very recent sequencing of the chimp and human genomes
Uses the rate of allele replacement as test for selection
Assumption is that highly changing parts of the genome have been under selective pressure
Pollard et al.
Idea
Take chimp and mouse genome, find common regions
Compare these regions to human genome
Method Part I
First half: Find conserved regions. Use sequence tests to look for regions of 100bp with 96% similarity
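The conserved-region search can be sketched as a sliding window over an alignment. This is a simplified pairwise version for illustration only (the paper worked from multi-species genome alignments); the function name `conserved_windows` and the toy sequences are assumptions, while the 100 bp window and 96% threshold come from the slide.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.96):
    """Return start positions of windows where two aligned sequences
    are at least min_identity identical.

    Gap characters ('-') never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    n = len(seq_a)
    if n < window:
        return []
    matches = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]
    count = sum(matches[:window])
    hits = []
    for start in range(n - window + 1):
        if start > 0:
            # Slide the window: add the new right edge, drop the old left edge.
            count += matches[start + window - 1] - matches[start - 1]
        if count / window >= min_identity:
            hits.append(start)
    return hits


# Toy alignment: identical for the first 100 bp, completely different afterwards.
seq_a = "A" * 200
seq_b = "A" * 100 + "C" * 100
print(conserved_windows(seq_a, seq_b))
```

Overlapping hits would normally be merged into larger conserved blocks before the next step.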
Results Part I
Conclusion: These areas represent genes with deep functionality
Method Part II
Search human genome for conserved regions
For every region that doesn’t match up, label it a Human Accelerated Region (HAR)
Results Part II
Found 202 Human Accelerated Regions in total
These were regions where there had been rapid evolution in the past 5 million years
But evolution doesn’t mean selection
Possible Explanations
Relaxation of negative selection -- ruled out because the neutral rate of evolution is slower than the observed rate for 201/202 HARs
Natural selection
Sudden change in mutation rate
But was it Selection?
A Digression
Biased Gene Conversion (BGC): Tendency to replace mismatched nucleotides with G or C
In all but two of the HARs there was no evidence of a selective sweep but significant evidence of GC-favored replacement
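One simple way to look for GC-favored replacement is to count substitutions from "weak" bases (A, T) to "strong" bases (G, C) and compare them with the reverse direction. The sketch below is an illustration of that counting step only, not the statistical analysis from the paper; the sequences and the function name `substitution_bias` are made up for the example.

```python
WEAK = set("AT")
STRONG = set("GC")


def substitution_bias(ancestral, derived):
    """Count (weak-to-strong, strong-to-weak, other) substitutions
    between two aligned sequences.

    Positions with gaps or ambiguous bases are skipped.
    """
    if len(ancestral) != len(derived):
        raise ValueError("sequences must be aligned to the same length")
    w2s = s2w = other = 0
    for a, d in zip(ancestral.upper(), derived.upper()):
        if a == d or a not in "ACGT" or d not in "ACGT":
            continue
        if a in WEAK and d in STRONG:
            w2s += 1      # A/T replaced by G/C
        elif a in STRONG and d in WEAK:
            s2w += 1      # G/C replaced by A/T
        else:
            other += 1    # A<->T or G<->C: GC content unchanged
    return w2s, s2w, other


# Hypothetical ancestral (e.g. inferred from chimp) and human sequences.
print(substitution_bias("AATTGCAT-", "GCTTATTA-"))
```

A large excess of weak-to-strong changes in a region is consistent with biased gene conversion rather than selection.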
A newer paper suggests BGC hotspots change between species
Conserved areas may suddenly become a BGC hotspot, explaining the HARs’ high BGC rates
Adaptation or biased gene conversion? Extending the null hypothesis of molecular evolution, Galtier & Duret 2007
General Implications
Illustrates utility of genome-wide approaches--by using the full genome to establish a background rate, signals stand out from the noise
Weaknesses: approach did not take into account failure to meet the assumptions of neutral theory (mutation rate)
“Global Landscape of Recent Inferred Darwinian Selection for Homo Sapiens”
E. Wang, G. Kodama, P. Baldi, and R. K. Moyzis
Background
Ever-growing catalog of SNPs for human populations
SNP data can be used to construct haplotype maps
Can screen whole genome for haplotype outliers
Wang et al.
Idea
Take only homozygotes
Bin the alleles together
Calculate the LD for each allele
Formalized Description
Description of the Formalized Description
Expected decay of LD for an allele of a specific frequency
Selective sweep will be more resistant to decay
Normalize with respect to the sigmoidal curve
Advantages of Method
By using the whole genome, the test can track not only LD but the exponential decay of LD over distance. This helps to distinguish selective sweeps from demographic events such as bottlenecks
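Measuring LD between a pair of SNPs is the building block for tracking its decay with distance. The sketch below computes the common r² measure from phased haplotypes; it is an illustration under assumptions (alleles coded 0/1, toy data, function name `r_squared`) and is not the statistic defined in the paper.

```python
def r_squared(site_a, site_b):
    """LD (r^2) between two biallelic sites.

    site_a and site_b list the allele (0 or 1) carried at each site by the
    same haplotypes, in the same order.
    """
    if len(site_a) != len(site_b) or not site_a:
        raise ValueError("sites must cover the same non-empty set of haplotypes")
    n = len(site_a)
    p_a = sum(site_a) / n
    p_b = sum(site_b) / n
    p_ab = sum(1 for a, b in zip(site_a, site_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b  # deviation from independent assortment
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both sites must be polymorphic")
    return d * d / denom


# Toy haplotypes: the first pair of sites is perfectly linked, the second is independent.
print(r_squared([0, 0, 1, 1], [0, 0, 1, 1]))
print(r_squared([0, 0, 1, 1], [0, 1, 0, 1]))
```

Computing this for SNP pairs at increasing distances from a candidate allele gives the decay curve that can be compared against the genome-wide expectation.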
Results
“Darwin’s Fingerprint”: Using different datasets from different populations, certain areas show consistent evidence of selection
Discussion
Compare regions to known gene functions
Six groups predominate
Test was well designed
Limited detection: Genes can’t be at fixation
Overall Conclusions
It all comes down to statistics. What are the null assumptions? What are the alternate assumptions?
Genome-wide scans improve on this by allowing us to exploit this elegant statistical method in new ways
Improved data for null hypothesis
Increased volume of potential candidates
Thank You!