The Biological ESTEEM Project: Linear Algebra, Population Genetics, and Microsoft Excel
description
Transcript of The Biological ESTEEM Project: Linear Algebra, Population Genetics, and Microsoft Excel
The Biological ESTEEM Project:Linear Algebra, Population Genetics, and Microsoft Excel
0.000
0.100
0.200
0.300
0.400
0.500
0.600
0.700
0.800
0.900
1.000
0 10 20 30 40 50 60 70 80 90 100
Generation
p’ = p (pWAA + qWAS) / W
Anton E. Weisstein, Truman State University
BIO 2010:Transforming Undergraduate Education
for Future Research Biologists
National Research Council (2003)
Recommendation #2:“Concepts, examples, and techniques from mathematics…should be included in biology courses. …Faculty in biology, mathematics, and physical sciences must work collaboratively to find ways of integrating mathematics…into life science courses…”
Recommendation #1:“Those selecting the new approaches should consider the importance of mathematics...”
BIO 2010:Transforming Undergraduate Education
for Future Research Biologists
National Research Council (2003)
Specific strategies:
• A strong interdisciplinary curriculum that includes physical science, information technology, and math.
• Meaningful laboratory experiences.
Biological Topics
Spread of infectious diseases
Tree growth
Enzyme kinetics
Population genetics
Mathematical Topics
0.000
0.100
0.200
0.300
0.400
0.500
0.600
0.700
0.800
0.900
1.000
0 10 20 30 40 50 60 70 80 90 100
Generation
p
Random walks
Optimi-zation
Linear algebra
Graph theory
Unpacking “ESTEEM”
• Excel: ubiquitous, easy, flexible, non-intimidating
• Exploratory: apply to real-world data; extend & improve
• Experiential: students engage directly with the math
Three Boxes
Black box:Hide the model
? y = axb
Glass box:Study the model
y = axb
No box:Build the model!
How do students interact with the mathematical model underlying the biology?
Copyleft
• download• use• modify• share
Users may freely the software, w/proper attribution
More info available at Free Software Foundation website
3. Survival of the Slightly Better: Exploring an Evolutionary Paradox with Linear Algebra
1. Intro to Population Genetics: Hardy-Weinberg Equilibrium and the Binomial Theorem
2. Evolutionary Analysis:Microevolution, Statistics, and Stability Analysis
€
ˆ p =WSS −WAS
WAA − 2WAS + WSS
Synthesizing and Applying Math Concepts Using Biological Cases
DefinitionsAllele:One variant of a specific gene.
Genotype:The set of alleles carried by an individual.
Phenotype:The detectable manifestations of a specific genotype.
Example: ABO blood type
IA
IB i
IAIA Type A
IBIB Type B
ii Type O
IAi Type A
IAIB Type AB
IBi Type B
Life Cycle Gametes(eggs & sperm)
Zygotes(fertilized eggs)
Juveniles(reproductively
immature)
Adults(reproductively
mature)
Life Cycle Gametes(eggs & sperm)
Zygotes(fertilized eggs)
Juveniles(reproductively
immature)
Adults(reproductively
mature)
Life Cycle Gametes(eggs & sperm)
Zygotes(fertilized eggs)
Juveniles(reproductively
immature)
Adults(reproductively
mature)
Life Cycle Gametes(eggs & sperm)
Zygotes(fertilized eggs)
Juveniles(reproductively
immature)
Adults(reproductively
mature)
Recursion EquationsLet
x = # AA adults;y = # Aa adults;z = # aa adults.
Definep = # A gametes = x + y/2 ;q = # a gametes = y/2 + z .
Determine expected # adults of each genotype in next generation.
(For now, feel free to make any
simplifying assumptions.)
Hardy-Weinberg Equilibrium
Genotypes reach ratios
p2 : 2pq : q2
in one generation, then stay there forever!
Assumptions?
• Gametes combine at random• All individuals have equal chance of survival• Each gen. a perfectly representative sample of the previous
3. Survival of the Slightly Better: Exploring an Evolutionary Paradox with Linear Algebra
1. Intro to Population Genetics: Hardy-Weinberg Equilibrium and the Binomial Theorem
2. Evolutionary Analysis:Microevolution, Statistics, and Stability Analysis
€
ˆ p =WSS −WAS
WAA − 2WAS + WSS
Synthesizing and Applying Math Concepts Using Biological Cases
The Case of the Sickled Cell
• The S allele for sickle-cell anemia has a frequency of ~11% in some African populations.
• Why is it so common?
• If it provides a selective advantage, why isn’t its frequency 100%?
DefinitionsReproductive fitness:The average number of offspring produced
by an organism in a specific environment.
Examples:• Antibiotic resistance• Camouflage• Resistance to infectious diseases
Natural selection:An evolutionary mechanism
that tends to increase the freq. of traits that increase an organism’s fitness.
Source: Jeffrey Jeffords, DiveGallery.com
Selection and Sickle-CellAlleles:
A: “normal” hemoglobinS: sickle-cell hemoglobin
Genotype Fitness
AA WAA = 0.9
AS WAS = 1.0
SS WSS = 0.2
Natural selection:
Sickle-cell anemia:~20% survive to reproductive age
Malaria susceptibility: ~90% survive to reproductive age
Recursion Equationsp = # A gametes;q = # S gametes.
Life stage SS(W = 0.2)
AS(W = 1.0)
AA(W = 0.9)
Juvenile q22pqp2
Adult q2WSS2pqWASp2WAA
Zygote q22pqp2
W W W
p’ = p (pWAA + qWAS) / W
W = p2WAA + 2pqWAS + q2WSSNormalization:
Selection and Sickle-Cell
Genotype Fitness
AA WAA = 0.9
AS WAS = 1.0
SS WSS = 0.2
Biological Question:
• How will this population evolve over time?
p’ = p (pWAA + qWAS) / W
Mathematical Question:
What are the equilibria for this recursion equation?
Solving for EquilibriaSet p’ = p and solve:
€
p =p (pWAA + qWAS )
W
€
p = 0
€
pWAA + qWAS = W = p2WAA + 2pqWAS + q2WSSor
Substitute q = 1 – p and factor:
€
0 = (1− p)[WSS −WAS − p(WAA − 2WAS + WSS )]
€
ˆ p =1
€
ˆ p =WSS −WAS
WAA − 2WAS + WSSor
€
ˆ p = 0 or
€
ˆ p =0.2 −1.0
0.9 − 2 ⋅1.0 + 0.2=
−0.8
−0.9= 0.889Nontrivial solution:
Stability Analysis:NatSelDiffEqns (Tim Comar, Benedictine College)
Is q = 0.11 stable or unstable?
The Case of the Protective Protein
• HIV docks with the CCR5 surface protein present on some cells of immune system
• CCR5 32 allele partially protects against HIV infection
Peterson 1999. JYI 2: ?
The Case of the Protective Protein
• Based on genetic evidence, 32 arose ~700 years ago.
• Present in ~10% of Caucasians; largely absent in other groups. Why?
Hypothesis: May also have protected vs. plague and/or smallpox.
Biological Question:How much selective advantage must 32 have given to become so common in only 700 years?
Mathematical Question:For what fitness values does 700 years lie within the 95% CI of 32’s age?
Definitions
Examples:
• Absence of blood type B in Native Americans
• Northern elephant seal: virtually no genetic variation 100 years after near-extinction
Genetic drift:An evolutionary mechanism by which
allele frequencies change due to chance alone, independent of those alleles’ effects on fitness.
Modeling Genetic Drift
Let N = population size (constant).
Assume this pop. produces ∞ gametes:f(A) = p, f(B) =q .
But only 2N of those gametes (chosen at random) combine to form the zygotes that
develop into the next generation!
p’ = B(2N, p)12N
≈ N(p, )pq
2N
Genetic Drift as a Random Walkp’ = B(2N, p)1
2N≈ N(p, )pq
2N
0.000
0.100
0.200
0.300
0.400
0.500
0.600
0.700
0.800
0.900
1.000
0 10 20 30 40 50 60 70 80 90 100Generation
p
N = 2000
0.000
0.100
0.200
0.300
0.400
0.500
0.600
0.700
0.800
0.900
1.000
0 10 20 30 40 50 60 70 80 90 100
Generation
p
N = 200
0.000
0.100
0.200
0.300
0.400
0.500
0.600
0.700
0.800
0.900
1.000
0 10 20 30 40 50 60 70 80 90 100
Generation
p
N = 20
• Largest fluctuations in small pops.
• p = 0 and p = 1 are absorbing states
Modeling Microevolution:Deme
3. Survival of the Slightly Better: Exploring an Evolutionary Paradox with Linear Algebra
1. Intro to Population Genetics: Hardy-Weinberg Equilibrium and the Binomial Theorem
2. Evolutionary Analysis:Microevolution, Statistics, and Stability Analysis
€
ˆ p =WSS −WAS
WAA − 2WAS + WSS
Synthesizing and Applying Math Concepts Using Biological Cases
Sickle Cell Strikes Back!• In addition to the A and S
alleles, there is also a C allele for hemoglobin!
• C confers even stronger malaria resistance than AS but with no anemia!
• But C is found only in a few isolated populations. Why might this happen?
Extend previous analysis to 3 alleles: some surprising results!
Selection and Sickle-CellHemoglobin alleles: A, S, C
Genotype Fitness
AA 0.9
AS 1.0
AC 0.9
SS 0.2
SC 0.7
CC 1.3
Sickle-cell anemia
Malaria susceptibility
Malaria susceptibility
Mild anemia
Strong malaria resistance
C is beneficial only when common!
Selection and Sickle-Cell
p’ = p (pWAA + qWAS + rWAC) / W
q’ = q (pWAS + qWSS + rWSC) / W
r’ = r (pWAC + qWSC + rWCC) / W
Recursion Equations:
Equilibria:p = DA / D, q = DS / D, r = DC / D
where DA = (WAS – WSS)(WAC – WCC) – (WAS – WSC)(WAC – WSC)
DS = (WAS – WAA)(WSC – WCC) – (WAS – WAC)(WSC – WAC)
DC = (WAC – WAA)(WSC – WSS) – (WAC – WAS)(WSC – WAS)
D = DA + DS + DC
Plotting the Adaptive Landscape2 alleles:Landscape W(p) is a curve in R2
3 alleles:Landscape W(p, q, r)
is a sheet in R3
Constraint:p + q + r = 1
Stability Analysis1. Re-express W(p, q, r) as W(x, y)
2. Calculate Hessian matrix:
€
∂2W
∂x 2
∂ 2W
∂x∂y
∂ 2W
∂y∂x
∂ 2W
∂y 2
⎡
⎣
⎢ ⎢ ⎢ ⎢
⎤
⎦
⎥ ⎥ ⎥ ⎥
€
=T U
U V
⎡
⎣ ⎢
⎤
⎦ ⎥
where
€
T = 2(WAA − 2WAS + 2WSS ),
U = 2 33 (WAA −WSS + 2WAC + 2WSC ),
V = 23 (WAA + WSS + 4WCC + 2WAS − 4WAC − 4WSC ).
3. Take the determinant and apply the 2nd derivative test:
TV > U2, T > 0, T+V > 0 Local max
TV > U2, T < 0, T+V < 0 Local min
TV < U2 Saddle point
TV = U2Higher-order tests needed
Survival of the Slightly Better:DeFinetti
Global maximum:only C allele present
Local maximum:C allele eliminatedSaddle point
Cases & Mathematics:Explicit Connections
• Binomial & Normal Distributions • Combinatorics
• Equilibria & Stability Analysis • Normalization
• Recursion & Difference Eqns. • Stochasticity
• Geometry of Curves & Solids • Matrix & Linear Algebra
• Partial Derivatives