MAT 4830 Mathematical Modeling 4.5 Phylogenetic Distances I
http://myhome.spu.edu/lauw
![Page 2: MAT 4830 Mathematical Modeling 4.5 Phylogenetic Distances I .](https://reader035.fdocuments.in/reader035/viewer/2022081603/56649f345503460f94c52411/html5/thumbnails/2.jpg)
Preview
Phylogenetic: of or relating to the evolutionary development of organisms
Estimate the total number of mutations (observed and hidden).
Example from 4.1
S0 : Ancestral sequence
S1 : Descendant of S0
S2 : Descendant of S1
S0 : ATGTCGCCTGATAATGCC
S1 : ATGCCGCTTGACAATGCC
S2 : ATGCCGCGTGATAATGCC
Observed mutations (comparing S0 with S2): 2
Actual mutations: 5 (3 from S0 to S1, 2 from S1 to S2); some are hidden mutations.
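The two counts can be checked with a short Python sketch. The helper name `count_differences` is made up for this illustration, and summing the differences along the lineage assumes that no site mutated more than once within a single step.

```python
def count_differences(a: str, b: str) -> int:
    """Number of sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y)


S0 = "ATGTCGCCTGATAATGCC"  # ancestral sequence
S1 = "ATGCCGCTTGACAATGCC"  # descendant of S0
S2 = "ATGCCGCGTGATAATGCC"  # descendant of S1

# Observed: compare only the first and last sequences.
observed = count_differences(S0, S2)

# Actual: add up the changes along the lineage S0 -> S1 -> S2
# (assumption: at most one mutation per site in each step).
actual = count_differences(S0, S1) + count_differences(S1, S2)

print(observed, actual)
```

The gap between the two numbers comes from sites that changed in one step and then changed again, or changed back, in the next.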
Distance of Two Sequences
We want to define the “distance” between two sequences.
It measures the average number of mutations per site that occurred, including the hidden ones.
S0 : ATGTCGCCTGATAATGCC
S : ATGCCGCGTGATAATGCC
Let d(S0,S) be the distance between sequences S0 and S. What properties “should” it have?
1.
2.
3.
Jukes-Cantor Model
Assume α is small. Mutations per time step are “rare”.
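$$
M = \begin{pmatrix}
1-\alpha & \alpha/3 & \alpha/3 & \alpha/3 \\
\alpha/3 & 1-\alpha & \alpha/3 & \alpha/3 \\
\alpha/3 & \alpha/3 & 1-\alpha & \alpha/3 \\
\alpha/3 & \alpha/3 & \alpha/3 & 1-\alpha
\end{pmatrix},
\qquad
p_0 = \left(\tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}, \tfrac{1}{4}\right)^T
$$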
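q(t)=conditional probability that the base at time t is the same as the base at time 0
$$
M^t = \begin{pmatrix}
\frac{1}{4}+\frac{3}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t \\
\frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}+\frac{3}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t \\
\frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}+\frac{3}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t \\
\frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}-\frac{1}{4}\left(1-\frac{4}{3}\alpha\right)^t & \frac{1}{4}+\frac{3}{4}\left(1-\frac{4}{3}\alpha\right)^t
\end{pmatrix}
$$
Each diagonal entry of $M^t$ is
$$
q(t) = \frac{1}{4}+\frac{3}{4}\left(1-\frac{4}{3}\alpha\right)^t
$$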
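As a numerical sanity check, the closed form for the entries of M^t can be compared with repeated multiplication of M. This is a minimal pure-Python sketch; the function names and the values of `alpha` and `t` are illustrative assumptions, not taken from the slides.

```python
def jc_matrix(alpha):
    """4x4 Jukes-Cantor matrix: 1 - alpha on the diagonal, alpha/3 elsewhere."""
    off = alpha / 3.0
    return [[1.0 - alpha if i == j else off for j in range(4)] for i in range(4)]


def mat_mul(a, b):
    """Plain matrix product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, t):
    """m raised to the non-negative integer power t by repeated multiplication."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(t):
        result = mat_mul(result, m)
    return result


def q_closed_form(alpha, t):
    """Diagonal entry of M^t: 1/4 + 3/4 * (1 - 4*alpha/3)^t."""
    return 0.25 + 0.75 * (1.0 - 4.0 * alpha / 3.0) ** t


alpha, t = 0.01, 50  # illustrative values only
Mt = mat_pow(jc_matrix(alpha), t)

print(Mt[0][0], q_closed_form(alpha, t))             # diagonal entry vs q(t)
print(Mt[0][1], (1.0 - q_closed_form(alpha, t)) / 3)  # off-diagonal vs (1 - q(t))/3
```

The off-diagonal comparison uses the fact that each column of M^t sums to 1, so the three off-diagonal entries share 1 - q(t) equally.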
q(t)=fraction of sites with no observed mutations
p(t)=1-q(t)=fraction of sites with observed mutations
$$
p(t) = \frac{3}{4}-\frac{3}{4}\left(1-\frac{4}{3}\alpha\right)^t
$$
p can be estimated from the two sequences
Example from 4.1
S0 : ATGTCGCCTGATAATGCC
S1 : ATGCCGCTTGACAATGCC
S2 : ATGCCGCGTGATAATGCC
Observed mutations: 2
Fraction of sites with observed mutations:
$$
p = \frac{2}{18} \approx 0.11
$$
Jukes-Cantor Distance
Given p (and t), the J-C distance between two sequences S0 and S1 is defined as
$$
d_{JC}(S_0, S_1) = -\frac{3}{4}\ln\left(1-\frac{4}{3}p\right)
$$
S0 : ATGTCGCCTGATAATGCC
S1 : ATGCCGCGTGATAATGCC
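The definition translates directly into code. The sketch below estimates p from the two sequences on this slide and then applies the formula; the function names are hypothetical, and the guard on p reflects the fact that the logarithm is only defined for p < 3/4.

```python
import math


def observed_fraction(a: str, b: str) -> float:
    """Fraction of sites at which two aligned, equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return sum(1 for x, y in zip(a, b) if x != y) / len(a)


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance: -(3/4) * ln(1 - (4/3) * p)."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must satisfy 0 <= p < 3/4")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


S0 = "ATGTCGCCTGATAATGCC"
S1 = "ATGCCGCGTGATAATGCC"

p = observed_fraction(S0, S1)  # 2 differing sites out of 18
d = jc_distance(p)

print(round(p, 4), round(d, 4))
```

For these sequences the distance comes out slightly larger than p (about 0.12 against 0.11), which is the correction for hidden mutations.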
α = rate of base substitution (substitutions per site per time step)
t = number of time steps
αt = total number of substitutions in t time steps (substitutions per site)
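$$
p = \frac{3}{4}-\frac{3}{4}\left(1-\frac{4}{3}\alpha\right)^t
$$
$$
t = \frac{\ln\left(1-\frac{4}{3}p\right)}{\ln\left(1-\frac{4}{3}\alpha\right)} \approx \frac{\ln\left(1-\frac{4}{3}p\right)}{-\frac{4}{3}\alpha} \quad \text{when } \alpha \text{ is small}
$$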
$$
\alpha t \approx -\frac{3}{4}\ln\left(1-\frac{4}{3}p\right)
$$
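The approximation can be checked numerically: compute p(t) from the model for a small α, feed it into the distance formula, and compare the result with αt. The value of `alpha` and the list of time steps below are illustrative assumptions.

```python
import math


def p_of_t(alpha: float, t: int) -> float:
    """Expected fraction of differing sites after t steps: 3/4 - 3/4 * (1 - 4*alpha/3)^t."""
    return 0.75 - 0.75 * (1.0 - 4.0 * alpha / 3.0) ** t


def jc_distance(p: float) -> float:
    """Jukes-Cantor distance: -(3/4) * ln(1 - (4/3) * p)."""
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


alpha = 0.001  # small substitution rate (illustrative)
results = []
for t in (10, 100, 500):
    d = jc_distance(p_of_t(alpha, t))
    results.append((t, alpha * t, d))
    print(t, alpha * t, d)
```

For each t the last two printed columns should agree closely, since the only error comes from replacing ln(1 - 4α/3) with -4α/3.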
Example from 4.3
Suppose the 40-base ancestral and descendant DNA sequences are
S0 : ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT
S1 : ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC
Frequency table of base pairs (rows: base in S1, columns: base in S0):
S1 \ S0   A   G   C   T
A         7   0   1   1
G         1   9   2   0
C         0   2   7   2
T         1   0   1   6
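$$
p = \frac{11}{40} = 0.275
$$
$$
d_{JC}(S_0, S_1) = -\frac{3}{4}\ln\left(1-\frac{4}{3}p\right) = -\frac{3}{4}\ln\left(1-\frac{4}{3}\cdot\frac{11}{40}\right) \approx 0.3426
$$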
0.275 observed substitutions per site.
0.3426 estimated substitutions per site.
11 observed substitutions.
13.7 estimated substitutions (0.3426 × 40 sites).
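The whole example can be reproduced in a few lines of Python: tabulate the base pairs, read the observed substitutions off the table, and apply the Jukes-Cantor formula. The variable names and the A, G, C, T ordering constant are choices made for this sketch.

```python
import math
from collections import Counter

S0 = "ACTTGTCGGATGATCAGCGGTCCATGCACCTGACAACGGT"  # ancestral sequence
S1 = "ACATGTTGCTTGACGACAGGTCCATGCGCCTGAGAACGGC"  # descendant sequence
BASES = "AGCT"  # row and column order used in the frequency table

# Count each (base in S1, base in S0) pair over the aligned sites.
pairs = Counter(zip(S1, S0))
table = [[pairs[(row, col)] for col in BASES] for row in BASES]

n = len(S0)
# Off-diagonal entries are the sites where the two sequences differ.
observed = n - sum(table[i][i] for i in range(len(BASES)))

p = observed / n
d_jc = -0.75 * math.log(1.0 - 4.0 * p / 3.0)
estimated = d_jc * n  # estimated substitutions over the whole sequence

for base, row in zip(BASES, table):
    print(base, row)
print(observed, p, round(d_jc, 4), round(estimated, 1))
```

Multiplying the per-site distance by the sequence length converts it into an estimated count that can be compared directly with the 11 observed substitutions.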
Performance of JC distance (Homework Problem 4)
Write a program to simulate the mutations of a sequence for t time steps using the Jukes-Cantor model with parameter α.
Count the number of base substitutions that occurred.
Compute the Jukes-Cantor distance between the initial and final sequences.
Compare the actual number of base substitutions with the estimate from the Jukes-Cantor distance.
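One possible shape for such a program is sketched below in Python rather than Maple; it is an illustration under stated assumptions, not the intended homework solution. It assumes that in each time step every site independently mutates with probability α to one of the other three bases chosen uniformly, which matches the matrix M. The sequence length, α, t, and the random seed are arbitrary choices.

```python
import math
import random

BASES = "AGCT"


def simulate(seq, alpha, t, rng):
    """Evolve seq for t time steps; return (final sequence, substitutions that occurred)."""
    current = list(seq)
    substitutions = 0
    for _ in range(t):
        for i, base in enumerate(current):
            if rng.random() < alpha:
                # Replace the base with one of the other three, chosen uniformly.
                current[i] = rng.choice([b for b in BASES if b != base])
                substitutions += 1
    return "".join(current), substitutions


def jc_distance(a, b):
    """Jukes-Cantor distance between two aligned, equal-length sequences."""
    p = sum(1 for x, y in zip(a, b) if x != y) / len(a)
    if p >= 0.75:
        raise ValueError("observed fraction too large for the JC formula")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


rng = random.Random(4830)         # fixed seed so the run is repeatable
n, alpha, t = 5000, 0.001, 200    # illustrative parameters
start = "".join(rng.choice(BASES) for _ in range(n))
end, actual = simulate(start, alpha, t, rng)

estimate = jc_distance(start, end) * n  # estimated substitutions over all sites
print("actual substitutions:   ", actual)
print("JC estimate:            ", round(estimate, 1))
print("expected (alpha * t * n):", alpha * t * n)
```

With these parameters the actual count and the Jukes-Cantor estimate should both land near α·t·n = 1000, while the raw number of differing sites is noticeably smaller.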
Maple: Strings Handling II
Concatenating two strings
However, no “re-assignment”.
Classwork
Work on HW #1, 2