. Learning – EM in The ABO locus Tutorial #9 © Ilan Gronau. Based on original slides of Ydo...
Date posted: 22-Dec-2015
Learning – EM in the ABO locus
Tutorial #9
© Ilan Gronau.
Based on original slides of Ydo Wexler & Dan Geiger
Genotype statistics
Mendelian Genetics:
• locus - a particular location on a chromosome (genome)
  - Each locus has two copies – alleles (one paternal and one maternal)
  - Each copy has several relevant states - genotypes
• locus genotype is determined by the combined genotype of both copies.
• locus genotype yields phenotype (physical features)
We wish to estimate the distribution of all possible genotypes.
Suppose we randomly sample N individuals and find that $N_{s,t}$ of them have genotype s/t.
The MLE is given by: $\theta_{s,t} = N_{s,t} / N$
• Sampling genotypes is costly
• Sampling phenotypes is cheap
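The genotype MLE above is just a relative frequency. A minimal sketch, assuming the genotype counts are held in a dictionary; the function name `genotype_mle` and the sample of ten individuals are made up for illustration and are not part of the original slides.

```python
from collections import Counter

def genotype_mle(genotype_counts):
    """MLE of genotype frequencies: theta[s/t] = N[s/t] / N."""
    total = sum(genotype_counts.values())
    return {genotype: count / total for genotype, count in genotype_counts.items()}

# Hypothetical sample of N = 10 genotyped individuals (illustrative only).
sample = Counter(["a/o", "a/o", "b/o", "o/o", "a/b", "b/o", "b/o", "a/a", "o/o", "b/b"])
theta = genotype_mle(sample)
print(theta["b/o"])  # 3 of the 10 individuals carry b/o
```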
The ABO locus
• ABO locus determines blood-type
• It has six possible genotypes {a/a, a/o, b/o, b/b, a/b, o/o}.
• They lead to four possible phenotypes: {A, B, AB, O}
We wish to estimate the proportion in a population of the 6 genotypes.
- Sample genotype – sequence a genomic region
- Sample phenotype - checking presence of antibodies (simple blood test)
Problem: phenotype doesn’t reveal genotype (in case of A,B)
The ABO locus
Problem: phenotype doesn’t reveal genotype
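The probabilistic model: Allele genotypes are distributed i.i.d. with probabilities $\theta_a, \theta_b, \theta_o$, and determine probabilities for locus genotypes (Hardy-Weinberg equilibrium):
• $\theta_{a/b} = 2\theta_a\theta_b$ ; $\theta_{a/o} = 2\theta_a\theta_o$ ; $\theta_{b/o} = 2\theta_b\theta_o$
• $\theta_{a/a} = \theta_a^2$ ; $\theta_{b/b} = \theta_b^2$ ; $\theta_{o/o} = \theta_o^2$
This implies probabilities for phenotypes:
• $\Pr[P=A \mid \Theta] = \theta_{a/a} + \theta_{a/o} = \theta_a^2 + 2\theta_a\theta_o$
• $\Pr[P=B \mid \Theta] = \theta_{b/b} + \theta_{b/o} = \theta_b^2 + 2\theta_b\theta_o$
• $\Pr[P=AB \mid \Theta] = \theta_{a/b} = 2\theta_a\theta_b$
• $\Pr[P=O \mid \Theta] = \theta_{o/o} = \theta_o^2$
Θ - model parameter set: $\Theta = \{\theta_a, \theta_b, \theta_o\}$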
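The phenotype probabilities can be checked numerically. A small sketch, assuming the three allele frequencies are passed as plain floats; the function name `phenotype_probs` and the example frequencies are illustrative choices.

```python
def phenotype_probs(theta_a, theta_b, theta_o):
    """Phenotype probabilities under Hardy-Weinberg equilibrium."""
    return {
        "A": theta_a**2 + 2 * theta_a * theta_o,   # genotypes a/a and a/o
        "B": theta_b**2 + 2 * theta_b * theta_o,   # genotypes b/b and b/o
        "AB": 2 * theta_a * theta_b,               # genotype a/b
        "O": theta_o**2,                           # genotype o/o
    }

probs = phenotype_probs(0.2, 0.2, 0.6)  # illustrative allele frequencies
print(probs)
print(sum(probs.values()))  # the four phenotypes cover all six genotypes
```

Because the six genotype probabilities expand $(\theta_a + \theta_b + \theta_o)^2$, the four phenotype probabilities sum to 1 whenever the allele frequencies do.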
Likelihood of phenotype data
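Given a population phenotype sample: Data = {B,A,B,B,O,A,B,A,O,B,AB}
the likelihood of our parameter set $\Theta = \{\theta_a, \theta_b, \theta_o\}$ is:
$$\Pr[\mathrm{Data} \mid \Theta] = (\theta_a^2 + 2\theta_a\theta_o)^3 \, (\theta_b^2 + 2\theta_b\theta_o)^5 \, (2\theta_a\theta_b)^1 \, (\theta_o^2)^2$$
(the four factors correspond to phenotypes A, B, AB, O, in that order)
• Maximum of this function yields the MLE
• Use EM to obtain this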
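The exponents in the likelihood are simply the phenotype counts in the sample. A minimal sketch that evaluates the log-likelihood for the eleven observations above; the function name `log_likelihood` and the trial parameter values are assumptions made for illustration.

```python
import math
from collections import Counter

def log_likelihood(data, theta_a, theta_b, theta_o):
    """Log of Pr[Data | Theta] for a list of observed phenotypes."""
    probs = {
        "A": theta_a**2 + 2 * theta_a * theta_o,
        "B": theta_b**2 + 2 * theta_b * theta_o,
        "AB": 2 * theta_a * theta_b,
        "O": theta_o**2,
    }
    counts = Counter(data)  # phenotype -> number of occurrences
    return sum(n * math.log(probs[p]) for p, n in counts.items())

data = ["B", "A", "B", "B", "O", "A", "B", "A", "O", "B", "AB"]
print(Counter(data))                        # counts: B 5, A 3, O 2, AB 1
print(log_likelihood(data, 0.2, 0.2, 0.6))  # trial parameters, not the MLE
```

Working with the logarithm avoids underflow for larger samples and has the same maximizer as the likelihood itself.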
The EM algorithm
The setting for the algorithm:
• Our data is a series of outcomes of experiments.
• Each experiment is conducted identically and independently.
• The outcome of an experiment is a function of values selected for a set of discrete random variables – $X_1, \ldots, X_n$.
• The actual values selected for $X_1, \ldots, X_n$ may be hidden from us.
We wish to find the MLE of the probability distributions of $X_1, \ldots, X_n$.
Examples:
1. Genotyping in the ABO locus:
• Single hidden variable X – a single allele genotype (a, b, or o)
• Model parameters - $\Theta = \{\theta_a, \theta_b, \theta_o\}$
2. Hidden Markov Models:
• Two hidden variables $T_s$, $E_s$ for every state s ($E_s$ – chooses signal ; $T_s$ – chooses next state)
• Model parameters – transition and emission probabilities.
The EM algorithm
Start with some set of parameters Θ.
Iterate until convergence:
• E-step: calculate the expected count for every possible result of every hidden variable in the model, as implied by the data and Θ
• M-step: for every hidden variable, use the expected counts as statistics to yield the new estimate Θ’ = MLE(data, Θ)
In our example:
• Single hidden variable X – a single allele genotype (a, b, or o)
• Model parameters - $\Theta = \{\theta_a, \theta_b, \theta_o\}$
E-step: count the expected number of a, b, o alleles in the population (total number of counts - 2n).
M-step: set $\theta'_a = \#a/2n$ ; $\theta'_b = \#b/2n$ ; $\theta'_o = \#o/2n$.
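E-step calculations – gene counting

Result(s) of hidden variables:

| genotype | prob | #a | #b | #o |
|---|---|---|---|---|
| a/o | $2\theta_a\theta_o$ | 1 | 0 | 1 |
| a/a | $\theta_a^2$ | 2 | 0 | 0 |
| b/o | $2\theta_b\theta_o$ | 0 | 1 | 1 |
| b/b | $\theta_b^2$ | 0 | 2 | 0 |
| a/b | $2\theta_a\theta_b$ | 1 | 1 | 0 |
| o/o | $\theta_o^2$ | 0 | 0 | 2 |

Observed outcome of “experiment” (expected gene count given the phenotype):

| phenotype | #a | #b | #o |
|---|---|---|---|
| A | $1\cdot\frac{2\theta_o}{2\theta_o+\theta_a} + 2\cdot\frac{\theta_a}{2\theta_o+\theta_a}$ | 0 | $1\cdot\frac{2\theta_o}{2\theta_o+\theta_a}$ |
| B | 0 | $1\cdot\frac{2\theta_o}{2\theta_o+\theta_b} + 2\cdot\frac{\theta_b}{2\theta_o+\theta_b}$ | $1\cdot\frac{2\theta_o}{2\theta_o+\theta_b}$ |
| AB | 1 | 1 | 0 |
| O | 0 | 0 | 2 |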
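The gene-counting table can be sketched as a function that weights each compatible genotype by its posterior probability given the phenotype. This is an illustrative sketch; the function name `expected_allele_counts` and the example frequencies are assumptions, not part of the slides.

```python
def expected_allele_counts(phenotype, theta_a, theta_b, theta_o):
    """Expected (a, b, o) allele counts for one individual with the given phenotype."""
    if phenotype == "A":
        denom = 2 * theta_o + theta_a
        p_ao = 2 * theta_o / denom  # posterior weight of genotype a/o
        p_aa = theta_a / denom      # posterior weight of genotype a/a
        return (1 * p_ao + 2 * p_aa, 0.0, 1 * p_ao)
    if phenotype == "B":
        denom = 2 * theta_o + theta_b
        p_bo = 2 * theta_o / denom  # posterior weight of genotype b/o
        p_bb = theta_b / denom      # posterior weight of genotype b/b
        return (0.0, 1 * p_bo + 2 * p_bb, 1 * p_bo)
    if phenotype == "AB":
        return (1.0, 1.0, 0.0)      # genotype a/b is certain
    if phenotype == "O":
        return (0.0, 0.0, 2.0)      # genotype o/o is certain
    raise ValueError(f"unknown phenotype: {phenotype!r}")

for ph in ("A", "B", "AB", "O"):
    print(ph, expected_allele_counts(ph, 0.2, 0.2, 0.6))
```

Each individual carries two alleles, so every row of expected counts sums to 2.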
A numeric example

Data:

| type | #people |
|---|---|
| A | 100 |
| B | 200 |
| AB | 50 |
| O | 50 |

Sufficient statistics: $n_A, n_B, n_{AB}, n_O$
We start with an initial guess: $\Theta_0 = \{\theta_a, \theta_b, \theta_o\} = \{0.2, 0.2, 0.6\}$
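A numeric example - execution of EM
Data (type: #people): A 100, B 200, AB 50, O 50
1st iteration: $\Theta_0 = \{0.2, 0.2, 0.6\}$
E-step (terms for phenotypes A, B, AB, O, in that order):
$$E[\#a] = 100\cdot\frac{2(\theta_o+\theta_a)}{2\theta_o+\theta_a} + 200\cdot 0 + 50\cdot 1 + 50\cdot 0 = 164\tfrac{2}{7}$$
$$E[\#b] = 100\cdot 0 + 200\cdot\frac{2(\theta_o+\theta_b)}{2\theta_o+\theta_b} + 50\cdot 1 + 50\cdot 0 = 278\tfrac{4}{7}$$
$$E[\#o] = 100\cdot\frac{2\theta_o}{2\theta_o+\theta_a} + 200\cdot\frac{2\theta_o}{2\theta_o+\theta_b} + 50\cdot 0 + 50\cdot 2 = 357\tfrac{1}{7}$$
(total: 800 = 2n)
M-step:
$$\theta'_a = \frac{164\frac{2}{7}}{800} = 0.205 \; ; \quad \theta'_b = \frac{278\frac{4}{7}}{800} = 0.348 \; ; \quad \theta'_o = \frac{357\frac{1}{7}}{800} = 0.447$$
$\Theta_1 = \{0.205, 0.348, 0.447\}$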
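The first iteration can be reproduced with exact rational arithmetic, which recovers the sevenths in the expected counts. A sketch under the assumption that the four phenotype counts are passed as integers; the function name `e_step` is an illustrative choice.

```python
from fractions import Fraction

def e_step(n_A, n_B, n_AB, n_O, ta, tb, to):
    """Expected numbers of a, b, o alleles given phenotype counts and current Theta."""
    ea = n_A * 2 * (to + ta) / (2 * to + ta) + n_AB
    eb = n_B * 2 * (to + tb) / (2 * to + tb) + n_AB
    eo = n_A * 2 * to / (2 * to + ta) + n_B * 2 * to / (2 * to + tb) + 2 * n_O
    return ea, eb, eo

# Initial guess {0.2, 0.2, 0.6} written as exact fractions.
counts = e_step(100, 200, 50, 50, Fraction(1, 5), Fraction(1, 5), Fraction(3, 5))
print(counts)                      # 1150/7, 1950/7, 2500/7
print(sum(counts))                 # 800 = 2n
theta_1 = tuple(c / 800 for c in counts)   # M-step
print([float(t) for t in theta_1])
```

The exact values 1150/7, 1950/7 and 2500/7 equal 164 2/7, 278 4/7 and 357 1/7; the resulting frequencies agree with the rounded values on the slide to within rounding.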
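A numeric example - execution of EM
Data (type: #people): A 100, B 200, AB 50, O 50
2nd iteration: $\Theta_1 = \{0.205, 0.348, 0.447\}$
E-step (terms for phenotypes A, B, AB, O, in that order):
$$E[\#a] = 100\cdot\frac{2(\theta_o+\theta_a)}{2\theta_o+\theta_a} + 200\cdot 0 + 50\cdot 1 + 50\cdot 0 = 168.66$$
$$E[\#b] = 100\cdot 0 + 200\cdot\frac{2(\theta_o+\theta_b)}{2\theta_o+\theta_b} + 50\cdot 1 + 50\cdot 0 = 306.04$$
$$E[\#o] = 100\cdot\frac{2\theta_o}{2\theta_o+\theta_a} + 200\cdot\frac{2\theta_o}{2\theta_o+\theta_b} + 50\cdot 0 + 50\cdot 2 = 325.3$$
(total: 800 = 2n)
M-step:
$$\theta'_a = \frac{168.66}{800} = 0.211 \; ; \quad \theta'_b = \frac{306.04}{800} = 0.383 \; ; \quad \theta'_o = \frac{325.3}{800} = 0.406$$
$\Theta_2 = \{0.211, 0.383, 0.406\}$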
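EM algorithm for the ABO locus - summary
Sufficient statistics – $n_A, n_B, n_{AB}, n_O$
E-step:
$$E[\#a] = n_A\cdot\frac{2(\theta_o+\theta_a)}{2\theta_o+\theta_a} + n_{AB}\cdot 1$$
$$E[\#b] = n_B\cdot\frac{2(\theta_o+\theta_b)}{2\theta_o+\theta_b} + n_{AB}\cdot 1$$
$$E[\#o] = n_A\cdot\frac{2\theta_o}{2\theta_o+\theta_a} + n_B\cdot\frac{2\theta_o}{2\theta_o+\theta_b} + n_O\cdot 2$$
M-step:
$$\theta'_a = \frac{E[\#a]}{2n} \; ; \quad \theta'_b = \frac{E[\#b]}{2n} \; ; \quad \theta'_o = \frac{E[\#o]}{2n}$$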
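EM algorithm for the ABO locus - summary
Sufficient statistics – $n_A, n_B, n_{AB}, n_O$
Iteration update formula:
$$\theta'_a = \frac{1}{2n}\left[ n_A\cdot\frac{2(\theta_o+\theta_a)}{2\theta_o+\theta_a} + n_{AB} \right]$$
$$\theta'_b = \frac{1}{2n}\left[ n_B\cdot\frac{2(\theta_o+\theta_b)}{2\theta_o+\theta_b} + n_{AB} \right]$$
$$\theta'_o = \frac{1}{2n}\left[ n_A\cdot\frac{2\theta_o}{2\theta_o+\theta_a} + n_B\cdot\frac{2\theta_o}{2\theta_o+\theta_b} + 2 n_O \right]$$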
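The update formula folds the E-step and M-step into a single map from Θ to Θ’. A minimal sketch, assuming Θ is a tuple of three floats and n is the sum of the four phenotype counts; the function name `em_update` is an illustrative choice.

```python
def em_update(theta, n_A, n_B, n_AB, n_O):
    """One EM iteration (E-step and M-step combined) for the ABO model."""
    ta, tb, to = theta
    two_n = 2 * (n_A + n_B + n_AB + n_O)  # total number of alleles in the sample
    new_a = (n_A * 2 * (to + ta) / (2 * to + ta) + n_AB) / two_n
    new_b = (n_B * 2 * (to + tb) / (2 * to + tb) + n_AB) / two_n
    new_o = (n_A * 2 * to / (2 * to + ta)
             + n_B * 2 * to / (2 * to + tb)
             + 2 * n_O) / two_n
    return (new_a, new_b, new_o)

theta = (0.2, 0.2, 0.6)
for iteration in range(1, 3):
    theta = em_update(theta, 100, 200, 50, 50)
    print(iteration, [round(t, 3) for t in theta])
```

This sketch carries full floating-point precision between iterations, whereas the slides feed the rounded Θ into the second iteration, so the second-iteration values may differ from the slide in the third decimal place.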
EM algorithm – ABO example
Data (type: #people): A 100, B 200, AB 50, O 50
[Figure: $\theta_a$, $\theta_b$, $\theta_o$ plotted against learning iteration; values marked on the plot: 0.20, 0.38, 0.42]
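EM algorithm – ABO example
Data (type: #people): A 100, B 200, AB 50, O 50
[Figure: $\theta_a$, $\theta_b$, $\theta_o$ plotted against learning iteration; values marked on the plot: 0.20, 0.38, 0.42; annotation: good convergence (maybe)]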
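The behaviour shown in the plot can be explored by iterating the update until Θ stops changing and tracking the log-likelihood along the way. This is a sketch under stated assumptions: the stopping tolerance, the iteration cap and the function names `log_likelihood`, `em_update` and `run_em` are all illustrative choices, not part of the slides.

```python
import math

COUNTS = {"A": 100, "B": 200, "AB": 50, "O": 50}

def log_likelihood(theta, counts):
    ta, tb, to = theta
    probs = {"A": ta**2 + 2 * ta * to, "B": tb**2 + 2 * tb * to,
             "AB": 2 * ta * tb, "O": to**2}
    return sum(n * math.log(probs[p]) for p, n in counts.items())

def em_update(theta, counts):
    ta, tb, to = theta
    two_n = 2 * sum(counts.values())
    new_a = (counts["A"] * 2 * (to + ta) / (2 * to + ta) + counts["AB"]) / two_n
    new_b = (counts["B"] * 2 * (to + tb) / (2 * to + tb) + counts["AB"]) / two_n
    new_o = (counts["A"] * 2 * to / (2 * to + ta)
             + counts["B"] * 2 * to / (2 * to + tb)
             + 2 * counts["O"]) / two_n
    return (new_a, new_b, new_o)

def run_em(theta, counts, tol=1e-12, max_iter=1000):
    """Iterate EM until no parameter moves by more than tol (or max_iter is hit)."""
    history = [log_likelihood(theta, counts)]
    for _ in range(max_iter):
        new_theta = em_update(theta, counts)
        history.append(log_likelihood(new_theta, counts))
        converged = max(abs(x - y) for x, y in zip(new_theta, theta)) < tol
        theta = new_theta
        if converged:
            break
    return theta, history

theta_hat, history = run_em((0.2, 0.2, 0.6), COUNTS)
print(theta_hat, len(history) - 1)
```

EM never decreases the likelihood from one iteration to the next, which is what `history` lets one check. That guarantee alone does not establish that the limit is the global maximum, so rerunning from several starting points is a common precaution.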
Alternative solution
Alternative view:
• Single hidden variable X’ – a maternal allele genotype (a, b, or o)
• Model parameters - $\Theta = \{\theta_a, \theta_b, \theta_o\}$
E-step: count the expected number of maternal a, b, o alleles in the population (total number of counts - n).
M-step: set $\theta'_a = \#a/n$ ; $\theta'_b = \#b/n$ ; $\theta'_o = \#o/n$.
Initial view:
• Single hidden variable X – a single allele genotype (a, b, or o)
• Model parameters - $\Theta = \{\theta_a, \theta_b, \theta_o\}$
E-step: count the expected number of a, b, o alleles in the population (total number of counts - 2n).
M-step: set $\theta'_a = \#a/2n$ ; $\theta'_b = \#b/2n$ ; $\theta'_o = \#o/2n$.
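E-step calculations – gene counting

Result(s) of hidden variables:

| phenotype | mat. gen. | prob | #a | #b | #o |
|---|---|---|---|---|---|
| A | o | $\theta_a\theta_o$ | 0 | 0 | 1 |
| A | a | $\theta_a(\theta_a+\theta_o)$ | 1 | 0 | 0 |
| B | o | $\theta_b\theta_o$ | 0 | 0 | 1 |
| B | b | $\theta_b(\theta_b+\theta_o)$ | 0 | 1 | 0 |
| AB | a | $\theta_a\theta_b$ | 1 | 0 | 0 |
| AB | b | $\theta_a\theta_b$ | 0 | 1 | 0 |
| O | o | $\theta_o^2$ | 0 | 0 | 1 |

Observed outcome of “experiment” (expected count given the phenotype):

| phenotype | #a | #b | #o |
|---|---|---|---|
| A | $1\cdot\frac{\theta_o+\theta_a}{2\theta_o+\theta_a}$ | 0 | $1\cdot\frac{\theta_o}{2\theta_o+\theta_a}$ |
| B | 0 | $1\cdot\frac{\theta_o+\theta_b}{2\theta_o+\theta_b}$ | $1\cdot\frac{\theta_o}{2\theta_o+\theta_b}$ |
| AB | 1/2 | 1/2 | 0 |
| O | 0 | 0 | 1 |

Exactly ½ of what we got by gene counting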
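EM algorithm for the ABO locus - summary
Sufficient statistics – $n_A, n_B, n_{AB}, n_O$
Iteration update formula:
$$\theta'_a = \frac{1}{n}\left[ n_A\cdot\frac{\theta_o+\theta_a}{2\theta_o+\theta_a} + \frac{1}{2} n_{AB} \right]$$
$$\theta'_b = \frac{1}{n}\left[ n_B\cdot\frac{\theta_o+\theta_b}{2\theta_o+\theta_b} + \frac{1}{2} n_{AB} \right]$$
$$\theta'_o = \frac{1}{n}\left[ n_A\cdot\frac{\theta_o}{2\theta_o+\theta_a} + n_B\cdot\frac{\theta_o}{2\theta_o+\theta_b} + n_O \right]$$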
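Since the maternal-allele counts are exactly half the gene counts and the normalizer is n instead of 2n, the two views give the same update. A small numerical check, with function names `update_gene_counting` and `update_maternal` and the test points chosen only for illustration:

```python
def update_gene_counting(theta, n_A, n_B, n_AB, n_O):
    """Update from counting both alleles of each individual (normalizer 2n)."""
    ta, tb, to = theta
    n = n_A + n_B + n_AB + n_O
    return (
        (n_A * 2 * (to + ta) / (2 * to + ta) + n_AB) / (2 * n),
        (n_B * 2 * (to + tb) / (2 * to + tb) + n_AB) / (2 * n),
        (n_A * 2 * to / (2 * to + ta) + n_B * 2 * to / (2 * to + tb) + 2 * n_O) / (2 * n),
    )

def update_maternal(theta, n_A, n_B, n_AB, n_O):
    """Update from counting only the maternal allele (normalizer n)."""
    ta, tb, to = theta
    n = n_A + n_B + n_AB + n_O
    return (
        (n_A * (to + ta) / (2 * to + ta) + n_AB / 2) / n,
        (n_B * (to + tb) / (2 * to + tb) + n_AB / 2) / n,
        (n_A * to / (2 * to + ta) + n_B * to / (2 * to + tb) + n_O) / n,
    )

for theta in [(0.2, 0.2, 0.6), (0.1, 0.5, 0.4), (0.3, 0.3, 0.4)]:
    g = update_gene_counting(theta, 100, 200, 50, 50)
    m = update_maternal(theta, 100, 200, 50, 50)
    print(theta, max(abs(x - y) for x, y in zip(g, m)))  # difference up to rounding
```

The two functions differ only by a factor of 2 cancelled between numerator and denominator, so any discrepancy is floating-point rounding.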