Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Page 1: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Human Evolution:

Searching for Selection

Andrew Shah

Algorithms in Biology

374 Spring 2008

Page 2: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Overview

Given a DNA sequence, how do we know when natural selection has occurred?

Different methods of answering this question

How does having the entire genome available change this?

Page 3: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection

Introduction

Page 4: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection

Introduction

Page 5: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection

Introduction

Page 6: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection

What sort of artifacts would this leave within the genome?

Introduction

Page 7: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection

Introduction

The frequency of the long gene increases from one generation to the next.

It eventually reaches 100%, or fixation.

Page 8: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection: Gene Perspective

Introduction

Same process at the gene level

Let the yellow dot represent the advantageous allele

It begins at a small frequency (.125 in this case)

Page 9: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection: Gene Perspective

Introduction

During selection: the allele has risen in frequency!

Because of linkage, the nearby alleles have also risen in frequency

Page 10: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection: Gene Perspective

Introduction

The allele has reached fixation!

As time goes on the nearby genes will slowly begin to reach fixation as well

Diversity has been lost

Page 11: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Natural Selection: Gene Perspective

Introduction

Effect of Selection on the Genome

Next challenge: How does this effect differ from what happens without selection?

Page 12: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Neutral Theory (N.T.)

Problem: Need to distinguish natural selection from non-selective change

Therefore: Need a null hypothesis

Solution: Create model that approximates neutral evolution

Introduction

Kimura, 1960s

Page 13: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

N.T. & Genetic Drift

Most variation is neutral with respect to selection

Therefore most changes in frequency are due to genetic drift

Introduction

Page 14: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

N.T. & Genetic Drift

A neutral gene has an equal probability of increasing or decreasing in frequency in the next generation
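A minimal sketch of drift, assuming a Wright-Fisher style model (an assumption of this example, not something the slides specify): each generation, every gene copy is drawn at random from the previous generation's pool, so the expected change in frequency is zero but the frequency still wanders. The function name and parameters are hypothetical.

```python
import random


def drift_trajectory(p0, pop_size, generations, seed=None):
    """Return the allele frequency per generation under pure drift."""
    rng = random.Random(seed)
    n_copies = 2 * pop_size            # diploid population: 2N gene copies
    count = round(p0 * n_copies)       # starting number of copies of the allele
    freqs = [count / n_copies]
    for _ in range(generations):
        p = count / n_copies
        # Each copy in the next generation is sampled independently
        # from the current pool; no allele has a fitness advantage.
        count = sum(1 for _ in range(n_copies) if rng.random() < p)
        freqs.append(count / n_copies)
        if count in (0, n_copies):     # allele lost or fixed: nothing more changes
            break
    return freqs


if __name__ == "__main__":
    # Same starting frequency as the earlier slide (.125), 8 diploid individuals.
    print(drift_trajectory(0.125, pop_size=8, generations=50, seed=1))
```

Running it with different seeds gives different trajectories from the same starting point, which is the behaviour a neutral null model has to account for.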

Introduction

Page 15: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

N.T. & Mutation

New alleles are introduced at a constant rate (at a particular point)

To think about: How will this help us search for selection?

Introduction

Page 16: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

N.T. & Mutation

Introduction

Page 17: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

N.T. & Mutation

Introduction

Page 18: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

N.T. & Mutation

Introduction

Page 19: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

N.T. & Recombination

Recombination occurs at a near-constant rate at a given position

Introduction

Page 20: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Testing the N.T.

How would natural selection differ from these assumptions?

Introduction

Page 21: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

“Positive Natural Selection in the Human Lineage”

P. C. Sabeti, S. F. Schaffner, B. Fry, J. Lohmueller, P. Varilly, Shamovsky, A. Palma, T. S. Mikkelsen, D. Altshuler, E. S. Lander

Page 22: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Testing for Selection

Sabeti et al.

Review of current state of genomic selection

Five statistical tests which use divergence from neutral theory to test for selection

Ideas? Functional Alteration, Decreased Diversity, High Derived Alleles, Population Differences, Long Haplotypes

Page 23: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Sabeti et al.

I. Functional Alteration

Get a section of the genome and compare synonymous vs. non-synonymous mutations between two species

Definition of synonymous mutation

Page 24: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

I. Functional Alteration

Sabeti et al.

Silent/Synonymous vs. Non-Synonymous

Page 25: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

I. Functional Alteration

Sabeti et al.

Long time scale, because it is an interspecies metric

Limited value--only finds ongoing or recurrent selection

Use a Ka/Ks statistical test, or McDonald-Kreitman
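As an illustration of the synonymous vs. non-synonymous comparison, here is a rough sketch that counts codon differences between two aligned coding sequences using the standard genetic code. It is only a raw count: a real Ka/Ks or McDonald-Kreitman analysis also normalizes by the number of synonymous and non-synonymous sites and corrects for multiple hits, which this example does not attempt. The function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def count_substitutions(seq_a, seq_b):
    """Return (synonymous, nonsynonymous) codon differences.

    Both inputs must be aligned, gap-free coding sequences in the same
    reading frame (an assumption of this sketch).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("sequences must be aligned and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(seq_a), 3):
        codon_a = seq_a[i:i + 3].upper()
        codon_b = seq_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1       # same amino acid: silent change
        else:
            nonsyn += 1    # amino acid changed
    return syn, nonsyn


if __name__ == "__main__":
    # CTT -> CTC keeps leucine; GAA -> GAT changes glutamate to aspartate.
    print(count_substitutions("CTTGAA", "CTCGAT"))
```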

Page 26: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

II. Decreased Diversity

Sabeti et al.

Way of detecting a selective sweep

Requires that you know the ancestral gene and the derived genes

A derived gene is one that is a descendant of the ancestral one; which gene is ancestral can be inferred by comparison to other species

Page 27: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

II. Decreased Diversity

Sabeti et al.

The two small bars represent mutations. They are derived genes of the blue ancestor gene.

Page 28: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

II. Decreased Diversity

Sabeti et al.

After the selective sweep the frequency of the derived alleles has jumped vis-a-vis the ancestral gene

Page 29: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

II. Decreased Diversity

Sabeti et al.

A real example: derived alleles in red

Page 30: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

II. Decreased Diversity

Sabeti et al.

Key idea: need to have ancestral genes present

The genes must not have reached fixation!

The pattern will be that of normal diversity of alleles but with skewed distribution of variation

Statistical Tests: Tajima’s D, Fu and Li’s D*
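For reference, a minimal sketch of Tajima's D as it is usually defined: it compares the average number of pairwise differences with the number of segregating sites, scaled by Tajima's variance constants. The input format (a list of equal-length aligned strings) and the function name are assumptions of this example. At least four sequences are required, because the variance term is zero for smaller samples.

```python
from itertools import combinations
from math import sqrt


def tajimas_d(sequences):
    """Tajima's D for a list of aligned, equal-length sequences."""
    n = len(sequences)
    if n < 4:
        raise ValueError("need at least 4 sequences")
    if any(len(s) != len(sequences[0]) for s in sequences):
        raise ValueError("sequences must have equal length")

    # S: number of segregating (polymorphic) sites.
    seg_sites = sum(1 for column in zip(*sequences) if len(set(column)) > 1)
    if seg_sites == 0:
        raise ValueError("no segregating sites")

    # pi: average number of differences over all pairs of sequences.
    pairs = list(combinations(sequences, 2))
    pi = sum(sum(x != y for x, y in zip(a, b)) for a, b in pairs) / len(pairs)

    # Tajima's constants.
    a1 = sum(1 / i for i in range(1, n))
    a2 = sum(1 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n * n + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)

    variance = e1 * seg_sites + e2 * seg_sites * (seg_sites - 1)
    return (pi - seg_sites / a1) / sqrt(variance)


if __name__ == "__main__":
    # Every variant is a singleton: an excess of rare alleles gives D < 0.
    print(tajimas_d(["AAAA", "AAAT", "AATA", "ATAA"]))
```

A negative value indicates an excess of rare variants relative to the neutral expectation; a positive value indicates an excess of intermediate-frequency variants.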

Page 31: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

III. New Alleles (AKA High Frequency of Derived Alleles)

Another technique for detecting a selective sweep

Gene ‘hitch-hiking’

Limited diversity because of fixation

Key idea: low frequency of new genes, but high diversity of rare alleles

Sabeti et al.

Page 32: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Gene has reached fixation

Low diversity in this region compared to other regions

Page 33: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Next mutations slowly increase the diversity

Because they are all new the frequency remains low

Page 34: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

As more time passes, any pre-selective sweep alleles die out, and diversity is replaced by many derived alleles

Page 35: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Real world example: Red dots indicate rare alleles

Page 36: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

III. New Alleles (AKA High Frequency of Derived Alleles)

Sabeti et al.

Key Idea: The genes will have reached fixation and decreased diversity

The diversity will all be in the form of rare alleles (because they are new)

Statistical Test: Fay and Wu’s H
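A small sketch of Fay and Wu's H in its commonly cited form, H = pi - theta_H, computed from the number of derived-allele copies at each segregating site. The input format and function name are assumptions of this example; knowing which allele is derived requires an outgroup, as noted earlier. H becomes negative when derived alleles sit at high frequency.

```python
def fay_wu_h(derived_counts, n):
    """Fay and Wu's H from derived-allele counts.

    derived_counts: for each segregating site, how many of the n sampled
    chromosomes carry the derived allele (1 <= count <= n - 1).
    """
    if n < 2:
        raise ValueError("need at least 2 chromosomes")
    if any(not 1 <= k <= n - 1 for k in derived_counts):
        raise ValueError("each count must be between 1 and n - 1")
    denom = n * (n - 1)
    # pi: average pairwise differences, written per site.
    pi = sum(2 * k * (n - k) for k in derived_counts) / denom
    # theta_H: weights each site by the square of its derived-allele count.
    theta_h = sum(2 * k * k for k in derived_counts) / denom
    return pi - theta_h


if __name__ == "__main__":
    print(fay_wu_h([9, 9, 8], n=10))   # high-frequency derived alleles
    print(fay_wu_h([1, 1, 2], n=10))   # rare derived alleles
```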

Page 37: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Comparing Methods

What is the difference between decreased diversity and increased frequency of new alleles?

Sabeti et al.

Vs.

Page 38: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

IV. Population Differences

Requires population split

Disproportionate shift in gene frequencies

Limited utility

Sabeti et al.

Page 39: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

IV. Population Differences

Sabeti et al.

Page 40: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

IV. Population Differences

Sabeti et al.

Tall Tree Island

Page 41: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

IV. Population Differences

Sabeti et al.

Page 42: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

IV. Population Differences

Sabeti et al.

Two separated populations--specific gene will show disproportionate shift in frequency with respect to the other genes

Limited to cases where there are two populations

Statistical Test: F(st), P(excess)
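A minimal sketch of F(st) for one biallelic gene in two populations, using the heterozygosity form F(st) = (H_T - H_S) / H_T. It assumes the two populations are weighted equally and ignores sample-size corrections, so it is an illustration rather than the estimator used in any of the papers discussed. The function name is hypothetical.

```python
def fst_two_populations(p1, p2):
    """F(st) from the frequency of one allele in each of two populations."""
    for p in (p1, p2):
        if not 0.0 <= p <= 1.0:
            raise ValueError("allele frequencies must lie in [0, 1]")
    p_bar = (p1 + p2) / 2
    h_total = 2 * p_bar * (1 - p_bar)                    # pooled heterozygosity
    h_within = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2
    if h_total == 0:
        return 0.0   # allele fixed or absent everywhere: no differentiation
    return (h_total - h_within) / h_total


if __name__ == "__main__":
    print(fst_two_populations(0.5, 0.5))   # identical populations
    print(fst_two_populations(0.2, 0.8))   # strongly shifted frequencies
```

A gene whose value is far above the values for most other genes is the "disproportionate shift" the slide refers to.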

Page 43: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

V. Long Haplotypes

Based on Linkage Disequilibrium (LD)

Long haplotype block and high frequency

Sabeti et al.

Page 44: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

V. Long Haplotypes

Under neutral conditions, a new allele has low frequency and high linkage disequilibrium

Sabeti et al.

Page 45: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

V. Long Haplotypes

As time goes on and the neutral allele increases in frequency, recombination erodes the L.D.
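The erosion of L.D. can be sketched with the textbook two-locus relation D_t = D_0 (1 - r)^t, where r is the recombination fraction between the two sites and t is the number of generations. The haplotype encoding (pairs of 0/1 alleles) and the function names are assumptions of this example.

```python
def ld_coefficient(haplotypes):
    """D = p_AB - p_A * p_B for two biallelic sites.

    haplotypes: list of (allele_at_site_1, allele_at_site_2) pairs,
    where 1 marks the allele of interest and 0 the alternative.
    """
    if not haplotypes:
        raise ValueError("need at least one haplotype")
    total = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / total
    p_b = sum(b for _, b in haplotypes) / total
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / total
    return p_ab - p_a * p_b


def expected_ld(d0, recomb_rate, generations):
    """Expected D after recombination has acted for some generations."""
    return d0 * (1 - recomb_rate) ** generations


if __name__ == "__main__":
    # A new allele arises on one background, so the two sites start fully linked.
    d0 = ld_coefficient([(1, 1), (1, 1), (0, 0), (0, 0)])
    for t in (0, 10, 100, 1000):
        print(t, expected_ld(d0, recomb_rate=0.01, generations=t))
```

An allele that reaches high frequency quickly has had fewer generations for this decay to act, which is why a long haplotype at high frequency is unexpected under neutrality.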

Sabeti et al.

Page 46: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

V. Long Haplotypes

Sabeti et al.

Page 47: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Genome-Wide Scanning

Better estimation of background rate

Helps to confirm previous studies

Suggests future areas of research

MORE POWER

Sabeti et al.

Page 48: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Genome-Wide Scanning

SNPs: Single Nucleotide Polymorphisms (excludes other types of mutations) that occur at > 1% frequency

SNPs are the basis of many genome wide analyses

Sabeti et al.

Page 49: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

“Forces Shaping the Fastest Evolving Regions in the Human Genome”

K. S. Pollard, S. R. Salama, B. King, A. D. Kern, T. Dreszer, S. Katzman, A. Siepel, J. S. Pedersen, G. Bejerano, R. Baertsch, K. R. Rosenbloom, J. Kent, D. Haussler

Page 50: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Background

Exploits the very recent sequencing of the chimp and human genomes

Uses the rate of allele replacement as test for selection

Assumption is that highly changing parts of the genome have been under selective pressure

Pollard et al.

Page 51: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Idea

Take the chimp and mouse genomes, find common regions

Compare these regions to human genome

Pollard et al.

Page 52: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Method Part I

First half: Find conserved regions. Use sequence tests to look for regions of 100bp with 96% similarity
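One way to picture the conserved-region search is a sliding window over a pairwise alignment that keeps every 100 bp window with at least 96% identical positions. This is a simplified sketch under that assumption, not the pipeline used in the paper; the function name, the gap handling, and the use of a single pairwise alignment are all choices made for this example.

```python
def conserved_windows(seq_a, seq_b, window=100, min_identity=0.96):
    """Return start positions of windows meeting the identity threshold.

    seq_a and seq_b must be aligned (equal length); gap characters ('-')
    never count as matches.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = [a == b and a != "-" for a, b in zip(seq_a, seq_b)]
    starts = []
    if len(matches) < window:
        return starts
    current = sum(matches[:window])          # matches in the first window
    for start in range(len(matches) - window + 1):
        if start > 0:
            # Slide by one: add the position entering, drop the one leaving.
            current += matches[start + window - 1] - matches[start - 1]
        if current / window >= min_identity:
            starts.append(start)
    return starts


if __name__ == "__main__":
    # Tiny example with a 4 bp window so the result is easy to check by eye.
    print(conserved_windows("ACGTACGTAC", "ACGTTCGTAC", window=4, min_identity=1.0))
```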

Pollard et al.

Page 53: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Results Part I

Page 54: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Results Part I

Conclusion: These areas represent genes with deep functionality

Page 55: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Method Part II

Pollard et al.

Search human genome for conserved regions

Page 56: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Method Part II

Pollard et al.

For every region that doesn’t match up, label Human Accelerated Region

Page 57: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Formal Description

Pollard et al.

Page 58: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Results Part II

Found 202 Human Accelerated Regions in total

These were regions where there had been rapid evolution in the past 5 million years

But evolution doesn’t mean selection

Pollard et al.

Page 59: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Possible Explanations

Relaxation of negative selection -- ruled out because the neutral rate is slower than the observed rate for 201 of the 202 HARs

Natural selection

Sudden change in mutation rate

Pollard et al.

Page 60: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

But was it Selection?

Pollard et al.

Page 61: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

A Digression

Biased Gene Conversion (BGC): Tendency to replace mismatched nucleotides with G or C

In all but two of the HARs there was no evidence of a selective sweep, but there was significant evidence of GC-favored replacement

Pollard et al.

Page 62: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

A Digression

A new paper suggests that BGC hotspots shift between species

Conserved areas may suddenly become a BGC hotspot, explaining the HAR’s high BGC rates

Adaptation or biased gene conversion: Extending the null hypothesis of molecular evolution, Galtier & Duret 2007

Pollard et al.

Page 63: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

General Implications

Illustrates the utility of genome-wide approaches--by using the full genome to establish a background rate, signals stand out from the noise

Weakness: the approach did not take into account failures to meet the assumptions of neutral theory (mutation rate)

Pollard et al.

Page 64: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

“Global Landscape of Recent Inferred Darwinian Selection for Homo Sapiens”

E. Wang, G. Kodama, P. Baldi, and R. K. Moyzis

Page 65: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Background

Ever-growing catalog of SNPs for human populations

SNP data can be used to construct haplotype maps

Can screen the whole genome for haplotype outliers

Wang et al.

Page 66: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Idea

Take only homozygotes

Bin the alleles together

Calculate the L.D. for each allele

Wang et al.

Page 67: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Idea

Wang et al.

Page 68: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Formalized Description

Wang et al.

Page 69: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Description of the Formalized Description

Wang et al.

Expected decay of LD for an allele of a specific frequency

Page 70: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Description of the Formalized Description

Wang et al.

Page 71: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Description of the Formalized Description

Wang et al.

Selective sweep will be more resistant to decay

Page 72: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Description of the Formalized Description

Wang et al.

Normalize with respect to the sigmoidal curve

Page 73: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Advantages of Method

By using the whole genome, the method can track not only L.D. but also the exponential decay of L.D. over distance. This helps to distinguish selective sweeps from demographic shifts such as bottlenecks

Wang et al.

Page 74: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Results

Wang et al.

Page 75: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Results

Wang et al.

“Darwin’s Fingerprint”: Using different datasets from different populations, certain areas show consistent evidence of selection

Page 76: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Discussion

Wang et al.

Compare regions to known gene functions

Six groups predominate

Test was well designed

Limited detection: Genes can't be at fixation

Page 77: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Overall Conclusions

It all comes down to statistics. What are the null assumptions? What are the alternate assumptions?

Genome-wide scans improve on earlier work by allowing us to exploit this elegant statistical method in new ways

Improved data for the null hypothesis

Increased volume of potential candidates

Wang et al.

Page 78: Human Evolution: Searching for Selection Andrew Shah Algorithms in Biology 374 Spring 2008.

Thank You!