
Coalescence with Mutations

• Towards incorporating greater realism
• Last time we discussed 2 idealized models
  – Infinite Alleles, Infinite Sites
• A realistic model would also incorporate
  – Insertions, deletions, inversions
• Sequences are large, but not infinite
  – Mutations can occur at the same point but on different lineages

04/18/23 Comp 790– Coalescence with Mutations 1


Finite Sites Model

• Jukes-Cantor model: all positions are equally likely to mutate, and mutations to any of the 3 other nucleotides are equally probable
• Kimura model: accommodates the fact that transitions (A<->G and C<->T) occur more frequently than transversions
• Positions evolve independently
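The two substitution models can be sketched as small Python functions. This is a minimal illustration, not code from the lecture: the function names and the `p_transition` value are assumptions chosen for the example.

```python
import random

BASES = "ACGT"
# Transitions swap purines (A<->G) or pyrimidines (C<->T).
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}

def mutate_jukes_cantor(base, rng=random):
    """Replace `base` with one of the 3 other nucleotides, each equally likely."""
    return rng.choice([b for b in BASES if b != base])

def mutate_kimura(base, p_transition=2 / 3, rng=random):
    """Replace `base`, choosing the transition with probability p_transition.

    p_transition is an illustrative value, not one given in the lecture.
    """
    if rng.random() < p_transition:
        return TRANSITION[base]
    # Otherwise pick one of the two transversions uniformly.
    return rng.choice([b for b in BASES if b not in (base, TRANSITION[base])])

def mutate_sequence(seq, mutate, rng=random):
    """Mutate one uniformly chosen position; positions evolve independently."""
    pos = rng.randrange(len(seq))
    return seq[:pos] + mutate(seq[pos], rng=rng) + seq[pos + 1:]
```

For example, `mutate_sequence("ACCTGCAT", mutate_kimura)` returns a sequence that differs from the input at exactly one position.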


[Figure: example 8-base sequences related by point mutations: ACCTGCAT, ACGTGCAT, ACGTGCTT, TCCTGCAT, ACGTGCAA, ACGTGCTA]


Wright-Fisher with Mutations

• Each gene passed on to the next generation is subject to mutation with probability u (with probability 1 - u it is copied without modification)
• Can accommodate any one of Infinite Alleles, Infinite Sites, or Finite Sites
• Overall structure is the same
• Working backwards from the present, the probability that a lineage experiences the first mutation j generations in the past is:


$$P(T_M = j) = u(1-u)^{j-1}$$

[Figure: population of 8 with 4 alleles; one group of 5, and 3 of 1]
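The geometric distribution of the first mutation time can be checked with a short sketch. The function names here are assumptions made for illustration; the sampler simply walks back one generation at a time, which is slow for small u but mirrors the model directly.

```python
import random

def first_mutation_pmf(j, u):
    """P(T_M = j) = u * (1 - u)**(j - 1), for j = 1, 2, ..."""
    return u * (1.0 - u) ** (j - 1)

def sample_first_mutation(u, rng=random):
    """Walk back one generation at a time until a mutation occurs."""
    j = 1
    while rng.random() >= u:   # with probability 1 - u the gene is copied unchanged
        j += 1
    return j
```

Averaging many calls to `sample_first_mutation(u)` should approach 1/u, the mean of a geometric variable with parameter u.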


Return of the Basic Coalescent

• Formula is similar in form to the discrete coalescent
• T_M denotes the number of generations until the first mutation event. It is a geometric variable with parameter u.
• If time is measured in units of 2N generations (j = 2Nt), then:

$$P(T_M \le j) = 1 - (1-u)^j \approx 1 - e^{-\theta t/2} = P(T_M^c \le t)$$

• where θ = 4Nu, the population mutation rate, can be interpreted as the expected number of mutations separating two samples
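A quick numerical comparison shows how close the exponential approximation is when u is small. The values of N and u below are illustrative assumptions, not numbers from the lecture.

```python
import math

def discrete_cdf(j, u):
    """P(T_M <= j) = 1 - (1 - u)**j, with j in generations."""
    return 1.0 - (1.0 - u) ** j

def continuous_cdf(t, theta):
    """P(T_M^c <= t) = 1 - exp(-theta * t / 2), with t in units of 2N generations."""
    return 1.0 - math.exp(-theta * t / 2.0)

N, u = 10_000, 1e-5        # illustrative values only
theta = 4 * N * u          # population mutation rate
for t in (0.5, 1.0, 2.0):
    j = round(2 * N * t)   # convert scaled time back to generations
    print(t, discrete_cdf(j, u), continuous_cdf(t, theta))
```

The two columns agree closely because (1 - u)^(2Nt) approaches e^(-2Nut) = e^(-θt/2) as u becomes small.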


Probabilities of Events

• If we consider n disjoint lineages, then the time to the first mutation along any lineage is exponentially distributed with parameter $n\theta/2$
• If we wait for either a mutation or a coalescence event, then the parameter is the sum of the two parameters (a consequence of min(U,V) ~ Exp(a+b)):

$$\binom{n}{2} + \frac{n\theta}{2} = \frac{n(n-1+\theta)}{2}$$

• Whether the first event is a coalescence or a mutation event is determined by a biased Bernoulli trial, with probabilities

$$\frac{n(n-1)/2}{n(n-1)/2 + n\theta/2} = \frac{n-1}{n-1+\theta}$$

for coalescence, and

$$1 - \frac{n-1}{n-1+\theta} = \frac{\theta}{n-1+\theta}$$

for mutation
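These rates and probabilities translate directly into a small helper. The function name `event_rates` is an assumption introduced for this sketch.

```python
def event_rates(n, theta):
    """Return (total rate, P(coalescence), P(mutation)) for n lineages."""
    coalescence = n * (n - 1) / 2.0   # rate of the first coalescence among n lineages
    mutation = n * theta / 2.0        # rate of the first mutation on any lineage
    total = coalescence + mutation    # rate of min(U, V), i.e. the first event of either kind
    return total, coalescence / total, mutation / total
```

With n = 5 and θ = 2 the rates are 10 and 5, so the total rate is 15 and the first event is a coalescence with probability 2/3.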


Simulating Sequence Evolution

• Simulating a set of genes with mutations


1. Put k = n, where n is the sample size
2. Choose an exponential variable with parameter k(k-1+θ)/2
3. With probability (k-1)/(k-1+θ) the event is a coalescent event, and with probability θ/(k-1+θ) it is a mutation event
4. If a coalescent event, choose a pair randomly to coalesce, set k ← k-1
5. If a mutation event, choose a lineage to mutate
6. Continue until k is one


Coalescent in Python

• Straightforward translation into Python


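    from random import expovariate, randint, random

    N = 8                                  # sample size (example value; the slide leaves N unset)
    T = [[i, 0.0] for i in range(N)]       # gene number and time of merge
    k = N
    theta = 4.0 * N * 0.1                  # theta = 4Nu with u = 0.1, reusing N as the population size
    t = 0.0
    while k > 1:
        # Waiting time to the next event, coalescence or mutation
        t += expovariate(0.5 * k * (k - 1 + theta))
        if random() < theta / (k - 1 + theta):
            # Mutation: mark a random lineage with the mutation time
            i = randint(0, k - 1)          # randint bounds are inclusive
            T[i] = [T[i], t]
        else:
            # Coalescence: merge two distinct random lineages
            i = randint(0, k - 1)
            j = randint(0, k - 1)
            while i == j:
                j = randint(0, k - 1)
            T[i] = [T[i], T[j], t]
            T.pop(j)
            k -= 1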


An Alternate Algorithm

• Waiting time until a mutation along a lineage is exponentially distributed with parameter θ/2 (slide 4)
• Equivalent to distributing mutations along a path of length t with a Poisson distribution having parameter tθ/2:

$$P(M_t = j) = \frac{(t\theta)^j}{j!\,2^j}\, e^{-t\theta/2}$$

• The mean number of mutations is tθ/2
• Given the number of mutations on a branch, their times are uniformly distributed along the branch
• The number and times of mutations on each branch are independent
• Thus, coalescence can be computed first, then mutations can be inserted along each branch
• Placing mutations on branches in this way is called a "Poisson process"
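The equivalence between exponential waiting times and a Poisson count can be illustrated with a short sketch. The function names are assumptions for this example; `count_mutations` counts how many exponential waiting times with parameter θ/2 fit on a branch of length t, and its empirical distribution should match `poisson_pmf`.

```python
import math
import random

def poisson_pmf(j, t, theta):
    """P(M_t = j) = (t*theta/2)**j / j! * exp(-t*theta/2)."""
    lam = t * theta / 2.0
    return lam ** j / math.factorial(j) * math.exp(-lam)

def count_mutations(t, theta, rng=random):
    """Count successive exponential(theta/2) waiting times that fit within length t."""
    count = 0
    elapsed = rng.expovariate(theta / 2.0)
    while elapsed <= t:
        count += 1
        elapsed += rng.expovariate(theta / 2.0)
    return count
```

Averaging `count_mutations(t, theta)` over many runs should approach the Poisson mean tθ/2.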


Poisson Algorithm

• A variant of the continuous-time coalescent with mutations


1. Simulate the genealogy of N sequences according to a coalescent process with rate $\binom{k}{2}$ (algorithm from lecture 4)
2. For each branch generate a random number, M_t, using a Poisson distribution with parameter tθ/2, where t is the length of the branch
3. For each branch, the times of the M_t mutation events are chosen at random
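The three steps can be sketched in Python as follows. This is one possible reading, not the lecture's own code: the function names and the branch representation `(node, start_time, end_time)` are assumptions, and because the standard library has no Poisson sampler, Knuth's multiplication method is swapped in for step 2.

```python
import math
import random

def sample_poisson(lam, rng=random):
    """Knuth's multiplication method; adequate for the small means used here."""
    limit, count, product = math.exp(-lam), 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count

def simulate_branches(n, rng=random):
    """Step 1: coalescent with rate C(k,2); returns (node, start, end) branches."""
    active = [(i, 0.0) for i in range(n)]   # (node id, time the lineage began)
    branches, t, next_id = [], 0.0, n
    while len(active) > 1:
        k = len(active)
        t += rng.expovariate(k * (k - 1) / 2.0)
        # Pick two distinct lineages; pop the larger index first.
        for idx in sorted(rng.sample(range(k), 2), reverse=True):
            node, start = active.pop(idx)
            branches.append((node, start, t))
        active.append((next_id, t))         # the new ancestral lineage
        next_id += 1
    return branches

def add_mutations(branches, theta, rng=random):
    """Steps 2 and 3: Poisson count per branch, then uniform times along it."""
    mutations = {}
    for node, start, end in branches:
        m = sample_poisson((end - start) * theta / 2.0, rng)
        mutations[node] = sorted(rng.uniform(start, end) for _ in range(m))
    return mutations
```

A genealogy of n sequences has 2(n-1) branches, so `add_mutations(simulate_branches(6), theta=2.0)` returns mutation times for 10 branches. Because the genealogy is built first, the same branches can be reused with different values of θ.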


Benefits of New Algorithm

• Mutations can be added onto a genealogy in retrospect

• Tweak mutation and coalescent parameters independently

• Mutation does not affect the fundamental results of coalescence

• Mutations only impact the extant alleles


1. How long ago did N haplotypes diverge?

2. What mutation rate would explain the diversity seen?



Book

• What's ahead
• I will finish chapter 2 next Tuesday
• From there on, one of you will be responsible for a chapter
  – Each chapter in 2 lectures (pick up the pace a bit), a Thursday followed by a Tuesday
  – You'll have a weekend to prepare each lecture
  – I will do chapter 8
