Generative Models
Announcements
• Probability Review (Friday, 1:15 Gates B03)
• Late days…
• To be fair…
• Start the p-set early
double late days.
Where we are
Machine Learning
Search
Variable Based
CS221
Where We Left Off
              Loopy                  Not loopy
              Purple   Not Purple    Purple   Not Purple
Drugged       0.108    0.012         0.072    0.008
Not Drugged   0.016    0.064         0.144    0.576
Key Idea
If we have a joint distribution over all variables, then given evidence (which could be multiple variables) E = e, we can find the probability of any query variable X = x.
These are values in our table!
Y is all variables that aren’t in X or E
Y is all variables that aren’t in E
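The formula these annotations point to did not survive in the transcript. A plausible reconstruction, assuming the standard definition of conditional probability, is sketched below. Here y ranges over the variables in neither X nor E, and y' ranges over all variables not in E; the primed symbol is introduced only to keep the two sums apart.

```latex
P(x \mid e) = \frac{P(x, e)}{P(e)}
            = \frac{\sum_{y} P(x, e, y)}{\sum_{y'} P(e, y')}
```

Every term inside both sums is a single entry of the joint table.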
Since we know that the p(x | e)’s must sum to 1, we can compute the unnormalized values p(x, e) and then normalize.
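As a minimal sketch of this idea, the snippet below answers queries by summing rows of the joint table from the slide and then normalizing. The names `JOINT`, `VARS`, and `query` are illustrative choices, not something the lecture defines.

```python
# Joint table from the slide, keyed by (loopy, purple, drugged).
JOINT = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}
VARS = ("loopy", "purple", "drugged")


def query(var, evidence):
    """Return {value: P(var = value | evidence)} by summing joint entries."""
    scores = {True: 0.0, False: 0.0}
    for assignment, prob in JOINT.items():
        world = dict(zip(VARS, assignment))
        # Keep only the table rows consistent with the evidence E = e.
        if all(world[name] == value for name, value in evidence.items()):
            scores[world[var]] += prob
    total = sum(scores.values())  # this sum equals P(e)
    # Normalize so the answers over all values of var sum to 1.
    return {value: score / total for value, score in scores.items()}


posterior = query("drugged", {"loopy": True})
print(posterior[True])
```

Passing an empty evidence dictionary gives the plain marginal, for example `query("drugged", {})`.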
Key Idea
Our joint gets too big
Where We Left Off
              Loopy                  Not loopy
              Purple   Not Purple    Purple   Not Purple
Drugged       0.108    0.012         0.072    0.008
Not Drugged   0.016    0.064         0.144    0.576
Add variable Snowden location: { Hong Kong, Sao Paulo, Moscow, Nairobi, Caracas, Guantanamo}
Size of the table is now 2*2*2*6 = 48
But what does Snowden have to do with drugged out rockstars?
Really are independent…
Joint is exponential in size.
Independence
l = loopy, p = purple, d = drugged, s = snowden
If we have two tables, one over l, p, d and one for s, we could recreate the joint.
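A small sketch of recreating the joint from two tables, assuming s is independent of l, p, and d so that P(l, p, d, s) = P(l, p, d) P(s). The slides give no numbers for Snowden's location, so the uniform distribution in `SNOWDEN` is a hypothetical placeholder; the names `LPD`, `SNOWDEN`, and `full_joint` are also illustrative.

```python
from itertools import product

# Table over (loopy, purple, drugged), taken from the slide.
LPD = {
    (True, True, True): 0.108,
    (True, False, True): 0.012,
    (False, True, True): 0.072,
    (False, False, True): 0.008,
    (True, True, False): 0.016,
    (True, False, False): 0.064,
    (False, True, False): 0.144,
    (False, False, False): 0.576,
}

LOCATIONS = ["Hong Kong", "Sao Paulo", "Moscow", "Nairobi", "Caracas", "Guantanamo"]
# Hypothetical: uniform over locations, purely for illustration.
SNOWDEN = {loc: 1.0 / len(LOCATIONS) for loc in LOCATIONS}

# Independence lets us rebuild every entry of the big table as a product.
full_joint = {
    lpd + (loc,): p_lpd * p_s
    for (lpd, p_lpd), (loc, p_s) in product(LPD.items(), SNOWDEN.items())
}

stored = len(LPD) + len(SNOWDEN)  # numbers we actually keep
print(len(full_joint), stored)
```

The two small tables hold 8 + 6 = 14 numbers, while the rebuilt joint has the 48 entries counted on the earlier slide.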
What else is independent?
[Diagram: Snowden, Drugged, Purple, Loopy]
Purple and loopy?
Both caused by drugged
If you know drugged, purple and loopy are independent!
Conditional Independence
Joint
This is important!
$P(l, p, d) = P(l, p \mid d)\,P(d)$
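The follow-on step is not preserved in the transcript. Assuming the conditional independence stated on these slides (purple and loopy are independent once drugged is known), the joint would factor further as sketched here:

```latex
P(l, p, d) = P(l, p \mid d)\,P(d)
           = P(l \mid d)\,P(p \mid d)\,P(d)
```

The second equality is exactly the conditional independence assumption, P(l, p | d) = P(l | d) P(p | d).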
[Diagram: Drugged → Purple, Drugged → Loopy]
No longer need the full joint.
We only need p(var | causes) for each var.
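To make this concrete, here is a sketch that stores only p(var | causes) for the drugged, purple, loopy example and rebuilds the joint from those pieces. The five numbers below are not given on the slides; they were obtained by marginalizing the joint table shown earlier, and the names `P_D`, `P_L_GIVEN_D`, `P_P_GIVEN_D`, and `joint` are illustrative.

```python
# P(drugged), read off the table as the sum of the "Drugged" row.
P_D = {True: 0.2, False: 0.8}
# P(loopy = True | drugged), keyed by the value of drugged.
P_L_GIVEN_D = {True: 0.6, False: 0.1}
# P(purple = True | drugged), keyed by the value of drugged.
P_P_GIVEN_D = {True: 0.9, False: 0.2}


def bernoulli(p_true, value):
    """Probability of a boolean value given P(value = True)."""
    return p_true if value else 1.0 - p_true


def joint(loopy, purple, drugged):
    """P(l, p, d) = P(l | d) * P(p | d) * P(d)."""
    return (
        bernoulli(P_L_GIVEN_D[drugged], loopy)
        * bernoulli(P_P_GIVEN_D[drugged], purple)
        * P_D[drugged]
    )


print(joint(True, True, True))
```

Five stored numbers are enough here, compared with the seven free entries of the full eight-cell table.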
Model the world with variables
And what causes what
Bayesian Network
[Diagram: Cough (c), Fever (t), Vomit (v), Flu (f), Stomach bug (s)]
Joint
Definition: Bayes Net = DAG
DAG: directed acyclic graph (BN’s structure)
• Nodes: random variables (typically discrete, but methods also exist to handle continuous variables)
• Arcs: indicate probabilistic dependencies between nodes. Go from cause to effect.
• CPDs: conditional probability distribution (BN’s parameters). Conditional probabilities at each node, usually stored as a table (conditional probability table, or CPT): $P(x_i \mid \pi_i)$, where $\pi_i$ is the set of all parent nodes of $x_i$.
Root nodes are a special case: no parents, so just use priors in the CPD: $\pi_i = \emptyset$, so $P(x_i \mid \pi_i) = P(x_i)$.
Formally
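As a rough sketch of the definition, the code below represents a Bayes net as a parent map (the DAG) plus one CPT per node, and computes a joint probability as the product of P(x_i | parents of x_i). The arrows of the flu example are not preserved in the transcript, so the structure in `PARENTS` is an assumption, and every number in `CPT` is a hypothetical value chosen only for illustration.

```python
from itertools import product

# Assumed structure (node -> parents); the slide's actual arrows are unknown.
PARENTS = {
    "flu": (),
    "stomach_bug": (),
    "cough": ("flu",),
    "fever": ("flu", "stomach_bug"),
    "vomit": ("stomach_bug",),
}

# Hypothetical CPTs: P(node = True | parent values), keyed by parent values.
# Root nodes have no parents, so their single entry is just a prior.
CPT = {
    "flu": {(): 0.1},
    "stomach_bug": {(): 0.05},
    "cough": {(True,): 0.8, (False,): 0.1},
    "fever": {
        (True, True): 0.95,
        (True, False): 0.9,
        (False, True): 0.6,
        (False, False): 0.02,
    },
    "vomit": {(True,): 0.7, (False,): 0.01},
}


def joint_probability(world):
    """Multiply P(x_i | parents(x_i)) over all nodes for one full assignment."""
    prob = 1.0
    for node, parents in PARENTS.items():
        parent_values = tuple(world[parent] for parent in parents)
        p_true = CPT[node][parent_values]
        prob *= p_true if world[node] else 1.0 - p_true
    return prob


nodes = list(PARENTS)
# Sum over all 2^5 assignments; a valid joint distribution sums to 1.
total = sum(
    joint_probability(dict(zip(nodes, values)))
    for values in product([True, False], repeat=len(nodes))
)
num_parameters = sum(len(table) for table in CPT.values())
print(total, num_parameters)
```

Under this assumed structure the network stores 10 numbers, whereas a full joint over five binary variables has 32 entries (31 free).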
What does NSA do with our data?
Real World Problem
Formal Problem
Solution
Model the problem
Apply an Algorithm
Evaluate
The AI Pipeline
Live Research
Research Project
[Diagram: nodes g1, g2, g3, t1, t2, t3, e1, e2, e3, b, i, ?]
Research Project
g1 ≃ g1* ?
Modeling Surprise
g1 ≃ g1* ?
Competition
Choose top 5
Test how well they predict grades
Select a finalist (gets +)
TA Review
Actually re-grade
Publish?
On worst pset question
Prize
+ Due Tuesday before class (email staff. Subject: Modeling Regrades)
Novel Science
http://vimeo.com/60381274
What does NSA do with our data?
Research Project
[Diagram: nodes g1, g2, g3, t1, t2, t3, e1, e2, e3, b, i, ?]
Can someone fix this?
Peer Graders