Probabilistic Programming; Ways Forward
Frank Wood
fwood/talks/2015/dali-keynote...
Outline
• What is probabilistic programming?
• What are the goals of the field?
• What are some challenges?
• Where are we now?
• Ways forward…
What is probabilistic programming?
An Emerging Field
[Figure: Venn diagram placing Probabilistic Programming at the intersection of ML (algorithms & applications), STATS (inference & theory), and PL (compilers, semantics, analysis).]
Conceptualization
• CS: Parameters → Program → Output
• Probabilistic Programming / Statistics: Parameters → Program → Observations
[Figure: the program maps parameters $\theta$ to observations $x$ through $p(x \mid \theta)\,p(\theta)$; inference runs the other way, from the observations $x$ back to the posterior $p(\theta \mid x)$.]
Operative Definition
“Probabilistic programs are usual functional or imperative programs with two added constructs:
(1) the ability to draw values at random from distributions, and
(2) the ability to condition values of variables in a program via observations.”
Gordon et al., 2014
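To make the two constructs concrete, here is a minimal Python sketch (not Anglican's API). A program receives hypothetical `sample` and `observe` callbacks and is run forward many times; `sample` draws a fresh value, while `observe` only multiplies the run's weight by a likelihood (likelihood weighting). The coin model and its three flips are assumptions for illustration; with a Uniform(0, 1) prior and flips heads, heads, tails, the exact posterior is Beta(3, 2), whose mean is 0.6.

```python
import random


def coin_program(sample, observe):
    """Hypothetical program: unknown coin bias, conditioned on three observed flips."""
    bias = sample(lambda rng: rng.random())        # draw: bias ~ Uniform(0, 1)
    for heads in (True, True, False):              # observed data (an assumption)
        observe(bias if heads else 1.0 - bias)     # condition: weight by the flip's likelihood
    return bias


def likelihood_weighting(program, num_runs, rng):
    """Run the program forward num_runs times; return (value, weight) pairs."""
    results = []
    for _ in range(num_runs):
        weight = 1.0

        def sample(draw):
            return draw(rng)                       # draw a fresh value from the prior

        def observe(likelihood):
            nonlocal weight
            weight *= likelihood                   # observations only rescale the run's weight

        value = program(sample, observe)
        results.append((value, weight))
    return results


def posterior_mean(results):
    total = sum(w for _, w in results)
    return sum(v * w for v, w in results) / total
```

`posterior_mean(likelihood_weighting(coin_program, 100_000, random.Random(0)))` should land close to 0.6. Keeping `observe` as a weight update rather than a rejection step is what lets the same forward-running program serve as both model and inference input.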
What are the goals of probabilistic programming?
Increased Productivity
[Figure: lines of Anglican code versus lines of Matlab/Java code for the same models, on log-lin axes (plot annotation: (fn [x] (logb 1.04 (+ 1 x)))). Models compared: HPYP [Wood 2007], DDPMO [Neiswanger et al 2014], PDIA [Pfau 2010], collapsed LDA, and a DP conjugate mixture, each targeting p(·|data).]
Automatic Inference
[Figure: Models are written against a Programming Language Representation / Abstraction Layer, which is served by one or more Inference Engine(s).]
[Example model (Caron et al.): a time-varying Pitman-Yor process mixture. Figure 1: a representation of the time-varying Pitman-Yor process mixture as a directed graphical model, representing conditional independencies between variables.]
Gaussian Mixture Model
[Figure: from left to right, graphical models for a finite Gaussian mixture model (GMM), a Bayesian GMM, and an infinite GMM. Source: Wood (University of Oxford), Unsupervised Machine Learning, January 2014.]
$$c_i \mid \vec{\pi} \sim \mathrm{Discrete}(\vec{\pi}), \qquad \vec{y}_i \mid c_i = k; \Theta \sim \mathrm{Gaussian}(\cdot \mid \theta_k),$$
$$\vec{\pi} \mid \alpha \sim \mathrm{Dirichlet}\left(\cdot \mid \tfrac{\alpha}{K}, \ldots, \tfrac{\alpha}{K}\right), \qquad \Theta \sim G_0.$$
Latent Dirichlet Allocation
[Figure 1: Graphical model for the LDA model, with plates d = 1 … D, i = 1 … N_d, and k = 1 … K.]
LDA is a hierarchical model of text documents. Each document is modeled as a mixture of topics, and each topic is defined as a distribution over the words in the vocabulary. We denote by K the number of topics, by D the number of documents, by M the number of words in the vocabulary, and by N_d the number of words in document d. We assume that the words have been mapped to the integers {1, …, M} through a static dictionary; this is for convenience only, and the integer mapping carries no semantic information. The generative model for the D documents first draws a topic mixture $\theta_d$ for each document independently from $\mathrm{Dir}_K(\alpha\vec{1})$, where $\mathrm{Dir}_K(\cdot)$ is a Dirichlet distribution over the K-dimensional simplex. Each of the K topics $\{\phi_k\}_{k=1}^{K}$ is drawn independently from $\mathrm{Dir}_M(\beta\vec{1})$. Then, for each of the words i = 1 … N_d in document d, an assignment variable $z_{di}$ is drawn from $\mathrm{Mult}(\theta_d)$. Conditional on $z_{di}$, word i in document d, denoted $w_{di}$, is drawn independently from $\mathrm{Mult}(\phi_{z_{di}})$. The model is parameterized by the vector-valued parameters $\{\theta_d\}_{d=1}^{D}$ and $\{\phi_k\}_{k=1}^{K}$, the assignments $\{z_{di}\}_{d=1,\ldots,D,\; i=1,\ldots,N_d}$, and the scalar positive parameters $\alpha$ and $\beta$. Formally, writing Discrete for a single-draw Mult:
$$\theta_d \sim \mathrm{Dir}_K(\alpha\vec{1}), \qquad \phi_k \sim \mathrm{Dir}_M(\beta\vec{1}), \qquad z_{di} \sim \mathrm{Discrete}(\theta_d), \qquad w_{di} \sim \mathrm{Discrete}(\phi_{z_{di}})$$
Source: Wood (University of Oxford), Unsupervised Machine Learning, January 2014.
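The LDA generative process can be simulated directly, which is exactly the forward direction a probabilistic program encodes. A minimal Python sketch, assuming 0-based word ids (instead of {1, …, M}) and Dirichlet draws made by normalizing Gamma variates; `sample_lda`, `dirichlet`, and `categorical` are hypothetical helper names, not from the notes.

```python
import random


def dirichlet(alpha, rng):
    """Dirichlet(alpha) via normalized Gamma(alpha_i, 1) draws."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def categorical(probs, rng):
    """One draw from Discrete(probs), returned as a 0-based index."""
    return rng.choices(range(len(probs)), weights=probs)[0]


def sample_lda(doc_lengths, num_topics, vocab_size, alpha, beta, rng):
    """Forward-sample LDA: topics phi_k, mixtures theta_d, assignments z_di, words w_di."""
    phis = [dirichlet([beta] * vocab_size, rng) for _ in range(num_topics)]
    thetas, assignments, docs = [], [], []
    for n_d in doc_lengths:                                 # one entry per document d
        theta = dirichlet([alpha] * num_topics, rng)        # theta_d ~ Dir_K(alpha 1)
        z = [categorical(theta, rng) for _ in range(n_d)]   # z_di ~ Discrete(theta_d)
        w = [categorical(phis[k], rng) for k in z]          # w_di ~ Discrete(phi_{z_di})
        thetas.append(theta)
        assignments.append(z)
        docs.append(w)
    return phis, thetas, assignments, docs
```

For example, `sample_lda([5, 8, 3], 4, 20, 0.5, 0.5, random.Random(0))` produces three synthetic documents of 5, 8, and 3 word ids drawn from 4 topics over a 20-word vocabulary.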
What are some challenges?
Challenges
• Unbounded recursion
• Equality and continuous variables
Unbounded Recursion

    (defn geometric
      "generates geometrically distributed values in {0,1,2,...}"
      ([p] (geometric p 0))
      ([p n] (if (sample (flip p))
               n
               (geometric p (+ n 1)))))

[Figure: recursion tree; each call returns n with probability p and recurses with n + 1 with probability 1 - p, giving the values 0, 1, 2, ….]
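A Python sketch of the same sampler, written as a loop because Python does not eliminate tail calls; each iteration corresponds to one recursive call of `geometric`. The returned value n has probability (1 - p)^n p, so its mean is (1 - p)/p, which is 1 for p = 0.5.

```python
import random


def geometric(p, rng):
    """Number of failed flips before the first success: P(n) = (1 - p)**n * p."""
    n = 0
    while rng.random() >= p:   # (flip p) comes up false with probability 1 - p
        n += 1                 # corresponds to the recursive call with (+ n 1)
    return n
```

The recursion has no fixed depth bound, which is exactly why a system must support unbounded recursion to express this distribution directly.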
Defining Distributions

    (defm pick-a-stick [stick v l k]
      ; picks a stick given a stick generator
      ; given a value v ~ uniform-continuous(0,1)
      ; should be called with l = 0.0, k=1
      (let [u (+ l (stick k))]
        (if (> u v)
          k
          (pick-a-stick stick v u (+ k 1)))))

[Figure: sticks (stick 1), (stick 2), (stick 3), (stick 4), …, (stick 6) laid end to end on [0, 1]; v = (sample (uniform-continuous 0 1)) falls inside the chosen stick.]
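The same search in Python, as a sketch: `stick` is any function from a 1-based index k to a nonnegative length with total length 1 (the lengths 2^-k used in the usage line are an assumption for illustration), and the loop returns the first k whose right end exceeds v.

```python
def pick_a_stick(stick, v):
    """Return the 1-based index k whose stick, laid end to end from 0, covers v."""
    left, k = 0.0, 1
    while True:
        right = left + stick(k)       # right end of stick k
        if right > v:                 # same test as (> u v)
            return k
        left, k = right, k + 1        # same as recursing with l = u and k = k + 1
```

With `halves = lambda k: 0.5 ** k`, the sticks cover [0, 0.5), [0.5, 0.75), [0.75, 0.875), …, so `pick_a_stick(halves, 0.6)` returns 2.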
Semantics and Termination

    (def infinite-loop #(loop [] (recur)))

    (defn p []
      (if (sample (flip 0.5))
        1
        (if (sample (flip 0.5))
          (p)
          (infinite-loop))))

[Figure: execution tree; every flip branches with probability 0.5, and each path ends either in 1 or in a non-terminating loop.]
Is $p(x = 1) = \sum_{n=1}^{\infty} \left(\tfrac{1}{2}\right)^{2n-1} = \tfrac{2}{3}$? The remaining probability mass of 1/3 belongs to executions that never terminate.
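A Python sketch that checks the 2/3 figure two ways: exactly, from the fixed point $P = \tfrac{1}{2} + \tfrac{1}{4}P$, and by simulation. Since a simulator cannot wait forever, `run_p` (a hypothetical name) returns `None` where the Anglican program would enter `infinite-loop`.

```python
import random
from fractions import Fraction


def run_p(rng):
    """Simulate (p); return 1, or None for runs that reach (infinite-loop) and never return."""
    while True:
        if rng.random() < 0.5:       # first (flip 0.5) is true: return 1
            return 1
        if rng.random() >= 0.5:      # second (flip 0.5) is false: (infinite-loop)
            return None
        # second flip is true: the recursive call (p) starts over


# P = 1/2 + (1/4) P, so P = (1/2) / (3/4) = 2/3
exact = Fraction(1, 2) / (1 - Fraction(1, 4))
series = sum(Fraction(1, 2) ** (2 * n - 1) for n in range(1, 40))   # truncated series

rng = random.Random(0)
estimate = sum(run_p(rng) == 1 for _ in range(100_000)) / 100_000
```

The simulated fraction of runs returning 1 sits near 2/3; the missing third is exactly the mass a sampler can never observe terminating.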
Equality and Continuous Variables
Why are your probabilistic programming systems anti-equality?
Equality

    (defquery bayes-net []
      (let [is-cloudy  (sample (flip 0.5))
            is-raining (cond (= is-cloudy true)  (sample (flip 0.8))
                             (= is-cloudy false) (sample (flip 0.2)))
            sprinkler  (cond (= is-cloudy true)  (sample (flip 0.1))
                             (= is-cloudy false) (sample (flip 0.5)))
            wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                             (sample (flip 0.99))
                             (and (= sprinkler false) (= is-raining false))
                             (sample (flip 0.0))
                             (or (= sprinkler true) (= is-raining true))
                             (sample (flip 0.9)))]
        (observe (= wet-grass true))
        (predict :s (hash-map :is-cloudy is-cloudy
                              :is-raining is-raining
                              :sprinkler sprinkler))))
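Because every latent variable in this query is Boolean, the posterior that any of the conditioning styles targets can be computed exactly by enumerating all eight assignments. A Python sketch that mirrors the query's probabilities (`joint` and `posterior_given_wet_grass` are hypothetical names); it gives P(is-raining | wet-grass = true) of about 0.708.

```python
from itertools import product


def flip_prob(p, value):
    return p if value else 1.0 - p


def joint(cloudy, raining, sprinkler, wet):
    """Probability of one full assignment under the bayes-net generative model."""
    p = flip_prob(0.5, cloudy)
    p *= flip_prob(0.8 if cloudy else 0.2, raining)
    p *= flip_prob(0.1 if cloudy else 0.5, sprinkler)
    if sprinkler and raining:
        p_wet = 0.99
    elif not sprinkler and not raining:
        p_wet = 0.0
    else:
        p_wet = 0.9
    return p * flip_prob(p_wet, wet)


def posterior_given_wet_grass():
    """Exact P(variable = true | wet-grass = true), by enumerating the 8 latent assignments."""
    evidence = 0.0
    totals = {"is-cloudy": 0.0, "is-raining": 0.0, "sprinkler": 0.0}
    for c, r, s in product([True, False], repeat=3):
        w = joint(c, r, s, True)
        evidence += w
        for name, value in (("is-cloudy", c), ("is-raining", r), ("sprinkler", s)):
            if value:
                totals[name] += w
    return {name: t / evidence for name, t in totals.items()}, evidence
```

Enumeration only works because the state space is tiny and discrete; it serves here as ground truth for the sampling-based variants.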
Dirac Observe

    (defquery bayes-net []
      (let [is-cloudy  (sample (flip 0.5))
            is-raining (cond (= is-cloudy true)  (sample (flip 0.8))
                             (= is-cloudy false) (sample (flip 0.2)))
            sprinkler  (cond (= is-cloudy true)  (sample (flip 0.1))
                             (= is-cloudy false) (sample (flip 0.5)))
            wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                             (sample (flip 0.99))
                             (and (= sprinkler false) (= is-raining false))
                             (sample (flip 0.0))
                             (or (= sprinkler true) (= is-raining true))
                             (sample (flip 0.9)))]
        (observe (dirac wet-grass) true)
        (predict :s (hash-map :is-cloudy is-cloudy
                              :is-raining is-raining
                              :sprinkler sprinkler))))

$$p(x \mid y = o) \propto \int \delta(y - o)\, p(x, y)\, dy = p(x, y = o)$$
ABC Observe

    ; hypothetical distance used by the observe below: 0.0 when equal, 1.0 otherwise
    (defn d [a b] (if (= a b) 0.0 1.0))

    (defquery bayes-net [tolerance]
      (let [is-cloudy  (sample (flip 0.5))
            is-raining (cond (= is-cloudy true)  (sample (flip 0.8))
                             (= is-cloudy false) (sample (flip 0.2)))
            sprinkler  (cond (= is-cloudy true)  (sample (flip 0.1))
                             (= is-cloudy false) (sample (flip 0.5)))
            wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                             (sample (flip 0.99))
                             (and (= sprinkler false) (= is-raining false))
                             (sample (flip 0.0))
                             (or (= sprinkler true) (= is-raining true))
                             (sample (flip 0.9)))]
        (observe (normal 0.0 tolerance) (d wet-grass true))
        (predict :s (hash-map :is-cloudy is-cloudy
                              :is-raining is-raining
                              :sprinkler sprinkler))))

$$p(x \mid y = o) \propto \int p(d(y, o))\, p(x, y)\, dy$$
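A Python sketch of what the ABC observe computes for this network, assuming (as a hypothetical choice) that d(wet-grass, true) is 0 when they match and 1 otherwise. Each full assignment, including the value of wet-grass, is weighted by an unnormalized normal(0, tolerance) kernel on d. A tiny tolerance recovers the exact answer (about 0.708); a huge one ignores the data and returns the prior P(is-raining) = 0.5.

```python
import math
from itertools import product


def flip_prob(p, value):
    return p if value else 1.0 - p


def joint(cloudy, raining, sprinkler, wet):
    """Probability of one full assignment under the bayes-net generative model."""
    p = flip_prob(0.5, cloudy)
    p *= flip_prob(0.8 if cloudy else 0.2, raining)
    p *= flip_prob(0.1 if cloudy else 0.5, sprinkler)
    if sprinkler and raining:
        p_wet = 0.99
    elif not sprinkler and not raining:
        p_wet = 0.0
    else:
        p_wet = 0.9
    return p * flip_prob(p_wet, wet)


def abc_p_raining(tolerance):
    """P(is-raining | wet-grass) with the hard constraint replaced by a Gaussian kernel on d."""
    num = den = 0.0
    for cloudy, raining, sprinkler, wet in product([True, False], repeat=4):
        d = 0.0 if wet else 1.0                          # hypothetical d(wet-grass, true)
        kernel = math.exp(-0.5 * (d / tolerance) ** 2)   # unnormalized normal(0, tolerance) density
        w = joint(cloudy, raining, sprinkler, wet) * kernel
        den += w
        if raining:
            num += w
    return num / den
```

The tolerance thus trades exactness for the ability to make progress when exact matches are rare.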
Noisy Observe

    (defquery bayes-net []
      (let [is-cloudy  (sample (flip 0.5))
            is-raining (cond (= is-cloudy true)  (sample (flip 0.8))
                             (= is-cloudy false) (sample (flip 0.2)))
            sprinkler  (cond (= is-cloudy true)  (sample (flip 0.1))
                             (= is-cloudy false) (sample (flip 0.5)))
            wet-grass  (cond (and (= sprinkler true) (= is-raining true))
                             (flip 0.99)
                             (and (= sprinkler false) (= is-raining false))
                             (flip 0.0)
                             (or (= sprinkler true) (= is-raining true))
                             (flip 0.9))]
        (observe wet-grass true)
        (predict :s (hash-map :is-cloudy is-cloudy
                              :is-raining is-raining
                              :sprinkler sprinkler))))

$$p(x \mid y = o) \propto p(o \mid x)\, p(x)$$
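Here wet-grass is a distribution rather than a sampled value, so each forward run can be weighted by p(wet-grass = true | parents) instead of being rejected. A Python sketch of that likelihood-weighting scheme (function names are hypothetical); its estimate of P(is-raining | wet-grass = true) should agree with exact enumeration, about 0.708.

```python
import random


def wet_likelihood(raining, sprinkler):
    """p(wet-grass = true | sprinkler, is-raining), as in the query's cond."""
    if sprinkler and raining:
        return 0.99
    if not sprinkler and not raining:
        return 0.0
    return 0.9


def likelihood_weighting_raining(num_samples, rng):
    num = den = 0.0
    for _ in range(num_samples):
        cloudy = rng.random() < 0.5
        raining = rng.random() < (0.8 if cloudy else 0.2)
        sprinkler = rng.random() < (0.1 if cloudy else 0.5)
        w = wet_likelihood(raining, sprinkler)   # observe: weight, do not sample wet-grass
        den += w
        if raining:
            num += w
    return num / den
```

No run is ever discarded outright except those with zero likelihood, which is why the noisy form is usually far more efficient than conditioning on an equality.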
Continuous Variables

    (defquery unknown-mean []
      (let [sigma (sqrt 2)
            mu    (marsaglia-normal 1 5)]
        (observe (normal mu sigma) 9)
        (observe (normal mu sigma) 8)
        (predict :mu mu)))
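Because this model is conjugate, its answer is known in closed form, which makes it a useful check for inference. A Python sketch assuming, as in the `marsaglia-normal` definition later in the deck, that the second argument 5 is a variance, and that `(normal mu sigma)` takes a standard deviation, so each observation has variance 2. The exact posterior over mu is Normal(7.25, 5/6); the second function (a hypothetical name) estimates the posterior mean by sampling mu from the prior and weighting by the likelihood.

```python
import math
import random


def exact_posterior(prior_mean, prior_var, noise_var, observations):
    """Normal prior, normal likelihood with known variance: return posterior (mean, variance)."""
    precision = 1.0 / prior_var + len(observations) / noise_var
    mean = (prior_mean / prior_var + sum(observations) / noise_var) / precision
    return mean, 1.0 / precision


def importance_posterior_mean(prior_mean, prior_var, noise_var, observations, n, rng):
    """Likelihood weighting: mu ~ prior, weight = product of normal likelihoods (unnormalized)."""
    num = den = 0.0
    for _ in range(n):
        mu = rng.gauss(prior_mean, math.sqrt(prior_var))
        w = math.exp(sum(-0.5 * (y - mu) ** 2 / noise_var for y in observations))
        num += w * mu
        den += w
    return num / den
```

With the slide's numbers, `exact_posterior(1.0, 5.0, 2.0, [9.0, 8.0])` gives a mean of 7.25 and a variance of 5/6.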
Measure Theoretic Challenges

    (defquery which-nationality [gpa]
      (let [nationality   (sample (categorical [["USA" 0.25] ["India" 0.75]]))
            simulated_gpa (if (= nationality "USA") (american-gpa) (indian-gpa))]
        (observe (dirac simulated_gpa) gpa)
        (predict :nationality nationality)))

p(nationality = "USA" | gpa = 4.0) = ?
The “Indian GPA problem”
American GPA Distribution [0,4]

    (defn american-gpa []
      (if (sample (flip 0.95))
        (* 4 (sample (beta 8 2)))
        (if (sample (flip 0.85)) 4.0 0.0)))
Indian GPA Distribution [0,10]

    (defn indian-gpa []
      (if (sample (flip 0.99))
        (* 10 (sample (beta 5 5)))
        (if (sample (flip 0.1)) 0.0 10.0)))
Mixed GPA Distribution

    (defn student-gpa []
      (if (sample (flip 0.25))
        (american-gpa)
        (indian-gpa)))
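The difficulty comes from mixing atoms with densities. A Python sketch of the three samplers shows it: `american_gpa` returns exactly 4.0 with probability 0.05 × 0.85 = 0.0425, while the continuous part of `indian_gpa` essentially never hits any particular value. Function names follow the slides with underscores; passing an explicit `rng` is an assumption of the sketch.

```python
import random


def american_gpa(rng):
    if rng.random() < 0.95:
        return 4.0 * rng.betavariate(8, 2)       # continuous part on [0, 4]
    return 4.0 if rng.random() < 0.85 else 0.0   # atoms at 4.0 and 0.0


def indian_gpa(rng):
    if rng.random() < 0.99:
        return 10.0 * rng.betavariate(5, 5)      # continuous part on [0, 10]
    return 0.0 if rng.random() < 0.1 else 10.0   # atoms at 0.0 and 10.0


def student_gpa(rng):
    return american_gpa(rng) if rng.random() < 0.25 else indian_gpa(rng)
```

Counting exact matches to 4.0 in many draws from each sampler makes the asymmetry visible: roughly 4% of American draws, and no Indian draws.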
Measure Theoretic Challenges

    (defquery which-nationality [gpa tolerance]
      (let [nationality   (sample (categorical [["USA" 0.25] ["India" 0.75]]))
            simulated_gpa (if (= nationality "USA") (american-gpa) (indian-gpa))]
        (observe (normal simulated_gpa tolerance) gpa)
        (predict :nationality nationality)))

p(nationality = "USA" | gpa = 4.0) = ?
The “Indian GPA problem” by Russell
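A Python sketch of what this query computes as a function of the tolerance, using midpoint-rule numerical integration over the Beta-distributed part of each GPA model (the grid size is an arbitrary choice). As the tolerance shrinks, the atom at 4.0 in `american-gpa` dominates every density and the posterior for "USA" approaches 1, whereas moderate tolerances give a much smaller value; the answer depends on how the limit is taken.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def smoothed_continuous(gpa, tol, scale, a1, a2, grid=50_000):
    """Midpoint-rule integral over b in (0, 1) of Beta(b; a1, a2) * Normal(gpa; scale * b, tol)."""
    log_norm = math.lgamma(a1 + a2) - math.lgamma(a1) - math.lgamma(a2)
    h = 1.0 / grid
    total = 0.0
    for i in range(grid):
        b = (i + 0.5) * h
        beta_density = math.exp(log_norm + (a1 - 1) * math.log(b) + (a2 - 1) * math.log1p(-b))
        total += beta_density * normal_pdf(gpa, scale * b, tol)
    return total * h


def p_usa(gpa, tol):
    """P(nationality = "USA" | gpa) when observing through (normal simulated_gpa tol)."""
    usa = (0.95 * smoothed_continuous(gpa, tol, 4.0, 8, 2)
           + 0.05 * (0.85 * normal_pdf(gpa, 4.0, tol) + 0.15 * normal_pdf(gpa, 0.0, tol)))
    india = (0.99 * smoothed_continuous(gpa, tol, 10.0, 5, 5)
             + 0.01 * (0.1 * normal_pdf(gpa, 0.0, tol) + 0.9 * normal_pdf(gpa, 10.0, tol)))
    return 0.25 * usa / (0.25 * usa + 0.75 * india)
```

Evaluating `p_usa(4.0, tol)` for tol = 0.1, 0.01, 0.001 shows the posterior climbing toward 1.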
Where are we now?
Systems
[Figure: timeline (1990, 2000, 2010) of probabilistic programming systems from the PL, ML/STATS, and AI communities, with regions marked "Discrete RV's Only" and "Bounded Recursion". Systems shown: Simula, KMP, Prolog, Prism, Problog, IBAL, HANSAI, Figaro, BUGS, WinBUGS, JAGS, STAN, LibBi, infer.NET, Factorie, Blog, Church, webChurch, Venture, Anglican, Probabilistic-C.]
Ways forward...
Trace Probability
• observe data points $y_n$
• internal random choices $x_n$
• simulate from $f(x_n \mid x_{1:n-1})$ by running the program forward
• weight execution traces by $g(y_n \mid x_{1:n})$
$$p(y_{1:N}, x_{1:N}) = \prod_{n=1}^{N} g(y_n \mid x_{1:n})\, f(x_n \mid x_{1:n-1})$$
[Figure: an execution trace drawn as a chain $x_1 \to x_2 \to x_3 \to \cdots$ with an observation $y_n$ attached to each $x_n$; each $x_n$ may bundle several internal random choices ($x_{11}, x_{12}, x_{13}, x_{21}, x_{22}, \ldots$), and the program may also contain global choices $\theta$.]
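A small Python sketch of this factorization, assuming a hypothetical two-state program whose choice at step n depends only on the previous choice and whose observation depends only on the current one, a special case of $f(x_n \mid x_{1:n-1})$ and $g(y_n \mid x_{1:n})$. For the trace x = (0, 1), y = (0, 1) the product is 0.5 × 0.7 × 0.1 × 0.6 = 0.021.

```python
import math

INIT = [0.5, 0.5]                  # f(x_1)
TRANS = [[0.9, 0.1], [0.2, 0.8]]   # f(x_n | x_{n-1}); rows index the previous state
EMIT = [[0.7, 0.3], [0.4, 0.6]]    # g(y_n | x_n); rows index the current state


def f(x_prefix):
    """Probability of the newest choice x_n given the earlier choices x_{1:n-1}."""
    if len(x_prefix) == 1:
        return INIT[x_prefix[0]]
    return TRANS[x_prefix[-2]][x_prefix[-1]]


def g(y, x_prefix):
    """Probability of observation y_n given the choices x_{1:n}."""
    return EMIT[x_prefix[-1]][y]


def trace_log_prob(xs, ys):
    """log p(y_{1:N}, x_{1:N}) = sum over n of log g(y_n | x_{1:n}) + log f(x_n | x_{1:n-1})."""
    total = 0.0
    for n in range(1, len(xs) + 1):
        prefix = xs[:n]
        total += math.log(g(ys[n - 1], prefix)) + math.log(f(prefix))
    return total
```

Working in log space keeps long traces from underflowing, which matters once N grows into the hundreds.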
SMC
Iteratively,
- simulate
- weight
- resample
[Figure: particles (vertical axis) advancing through observes n = 1, n = 2, … (horizontal axis).]
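The simulate / weight / resample loop can be sketched as a bootstrap particle filter in Python, on the same kind of hypothetical two-state model used above. The running product of average weights is an estimate of the marginal likelihood, so the sketch compares it with the exact value from the forward algorithm; multinomial resampling via `random.choices` is one choice among several.

```python
import math
import random

INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.3], [0.4, 0.6]]


def smc_log_evidence(ys, num_particles, rng):
    """Bootstrap SMC: simulate each particle, weight it by g, resample; return log Z-hat."""
    states = list(range(len(INIT)))
    particles = rng.choices(states, weights=INIT, k=num_particles)          # simulate x_1
    log_z = 0.0
    for n, y in enumerate(ys):
        if n > 0:                                                           # simulate x_n | x_{n-1}
            particles = [rng.choices(states, weights=TRANS[x])[0] for x in particles]
        weights = [EMIT[x][y] for x in particles]                           # weight by g(y_n | x_n)
        log_z += math.log(sum(weights) / num_particles)
        particles = rng.choices(particles, weights=weights, k=num_particles)  # resample
    return log_z


def exact_log_evidence(ys):
    """Forward algorithm, for comparison."""
    alpha = [INIT[s] * EMIT[s][ys[0]] for s in range(len(INIT))]
    for y in ys[1:]:
        alpha = [sum(alpha[r] * TRANS[r][s] for r in range(len(INIT))) * EMIT[s][y]
                 for s in range(len(INIT))]
    return math.log(sum(alpha))
```

For `ys = [0, 1, 1, 0]` and 20,000 particles, the two log values typically agree to within a few hundredths.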
SMC for Probabilistic Programming
Intuitively,
- run
- wait
- fork
[Figure: each particle is a thread (a continuation); threads run until they reach an observe, which acts as a delimiter, wait there, and are then forked.]
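The run / wait / fork picture can be sketched in Python with continuation-passing style: calling a continuation runs the program up to its next observe and returns the observe's log-weight together with a new continuation. Because a continuation here is an immutable closure, forking is just copying a reference; each copy draws fresh randomness when resumed. The program, its two-state model, and the `("observe", log_weight, k)` / `("done", value)` protocol are all hypothetical, and the driver assumes every thread hits the same number of observes.

```python
import math
import random

INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.3], [0.4, 0.6]]


def hmm_program(ys):
    """CPS program: each continuation runs to the next observe, then pauses."""
    def step(n, x_prev):
        def run(rng):
            if n == len(ys):
                return ("done", x_prev)
            weights = INIT if n == 0 else TRANS[x_prev]
            x = rng.choices(range(len(weights)), weights=weights)[0]          # sample
            return ("observe", math.log(EMIT[x][ys[n]]), step(n + 1, x))   # pause at observe
        return run
    return step(0, None)


def smc(program, num_particles, rng):
    """Run all threads to the observe delimiter, wait, then fork by weight."""
    threads = [program] * num_particles
    log_z = 0.0
    while True:
        results = [k(rng) for k in threads]        # run every thread to its next observe
        if all(r[0] == "done" for r in results):
            return [r[1] for r in results], log_z
        weights = [math.exp(r[1]) for r in results]
        log_z += math.log(sum(weights) / num_particles)
        chosen = rng.choices(results, weights=weights, k=num_particles)
        threads = [r[2] for r in chosen]           # fork: copies share an immutable continuation
```

Running `smc(hmm_program([0, 1, 1, 0]), 20_000, random.Random(0))` returns the final hidden states of all particles and a log marginal-likelihood estimate.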
SMC Inner Loop
• Sequential Monte Carlo is now a building block for other inference techniques
• Particle MCMC
  - PIMH: “particle independent Metropolis-Hastings”
  - iCSMC: “iterated conditional SMC”
[Figure: repeated SMC sweeps s = 1, 2, 3, each running over n.]
[Andrieu, Doucet, Holenstein 2010]
[W., van de Meent, Mansinghka 2014]
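PIMH uses a whole SMC sweep as an independent Metropolis-Hastings proposal: each iteration runs a fresh sweep, draws one path from it, and accepts that path with probability min(1, Ẑ′/Ẑ), where Ẑ is the sweep's marginal-likelihood estimate. A Python sketch on the same hypothetical two-state model; particle count and iteration count are arbitrary.

```python
import math
import random

INIT = [0.5, 0.5]
TRANS = [[0.9, 0.1], [0.2, 0.8]]
EMIT = [[0.7, 0.3], [0.4, 0.6]]


def smc_sweep(ys, num_particles, rng):
    """One bootstrap SMC sweep over whole paths; return (one sampled path, log Z-hat)."""
    states = range(len(INIT))
    paths = [()] * num_particles
    log_z = 0.0
    for n, y in enumerate(ys):
        paths = [p + (rng.choices(states, weights=INIT if n == 0 else TRANS[p[-1]])[0],)
                 for p in paths]                                      # simulate
        weights = [EMIT[p[-1]][y] for p in paths]                     # weight
        log_z += math.log(sum(weights) / num_particles)
        paths = rng.choices(paths, weights=weights, k=num_particles)  # resample
    return rng.choice(paths), log_z


def pimh(ys, num_particles, num_iters, rng):
    """Particle independent Metropolis-Hastings; return (path samples, acceptance rate)."""
    path, log_z = smc_sweep(ys, num_particles, rng)
    samples, accepted = [], 0
    for _ in range(num_iters):
        new_path, new_log_z = smc_sweep(ys, num_particles, rng)
        if rng.random() < math.exp(min(0.0, new_log_z - log_z)):   # accept w.p. min(1, Z'/Z)
            path, log_z = new_path, new_log_z
            accepted += 1
        samples.append(path)
    return samples, accepted / num_iters
```

The more particles per sweep, the less Ẑ varies and the closer the acceptance rate gets to 1.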
SMC Parallelism Bottleneck
[Animation: SMC, slowed down for clarity.]
Resampling is a synchronization barrier: every particle must reach the same observe before any particle can continue, so the slowest particle sets the pace.
Particle Cascade
Asynchronously,
- simulate
- weight
- branch
[Figure: particles branching independently at observes n = 1, n = 2, ….]
Paige, W., Doucet, Teh; NIPS 2014
Theoretical Properties
The particle cascade provides an unbiased estimator of the marginal likelihood whose variance decreases in proportion to $1/K_0$, where $K_0$ is the number of initial particles:
$$\hat{p}(y_{0:n}) := \frac{1}{K_0} \sum_{k=1}^{K_n} W_n^k$$
where $K_n$ is the number of particles that reach observation n and $W_n^k$ are their weights.
Theorem: For any $K_0 \geq 1$ and $n \geq 0$, $\mathbb{E}[\hat{p}(y_{0:n})] = p(y_{0:n})$.
Theorem: For any $n \geq 0$, there exists a constant $a_n$ such that $\mathbb{V}[\hat{p}(y_{0:n})] < a_n / K_0$.
Conclusion
Bubble Up
[Figure: layered diagram of a Probabilistic Programming System with the layers Applications, Models, Probabilistic Programming Language, and Inference.]
Thank You
• Questions?
• Funding: DARPA, Amazon, Microsoft
Opportunities
• Parallelism: “Asynchronous Anytime Sequential Monte Carlo” [Paige, W., Doucet, Teh; NIPS 2014]
• Backwards passing: “Particle Gibbs with Ancestor Sampling for Probabilistic Programs” [van de Meent, Yang, Mansinghka, W.; AISTATS 2015]
• Search: “Maximum a Posteriori Estimation by Search in Probabilistic Models” [Tolpin, W.; SOCS 2015]
• Adaptation: “Output-Sensitive Adaptive Metropolis-Hastings for Probabilistic Programs” [Tolpin, van de Meent, Paige, W.; in submission]
• Novel proposals: “Adaptive PMCMC” [Paige, W.; in submission]
Probabilistic-C
$$z_0 \sim \mathrm{Discrete}([1/K, \ldots, 1/K]), \qquad z_n \mid z_{n-1} \sim \mathrm{Discrete}(T_{z_{n-1}}), \qquad y_n \mid z_n \sim \mathrm{Normal}(\mu_{z_n}, \sigma^2)$$
Paige & W.; ICML 2014
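When T, the means, and the variance are all known, the marginal likelihood of this HMM can be computed exactly with the forward (sum-product) recursion, which gives a reference against which sampling-based inference can be checked. A Python sketch in log space, with a brute-force sum over every state path as a check on tiny inputs; function names are hypothetical, and every entry of T is assumed to be nonzero so that its logarithm exists.

```python
import math
from itertools import product


def normal_logpdf(y, mean, var):
    return -0.5 * (math.log(2.0 * math.pi * var) + (y - mean) ** 2 / var)


def logsumexp(xs):
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def hmm_log_evidence(ys, T, mus, var):
    """Forward recursion for z_0 ~ uniform, z_n | z_{n-1} ~ T[z_{n-1}], y_n ~ N(mu_{z_n}, var)."""
    K = len(mus)
    log_alpha = [-math.log(K)] * K                      # log p(z_0)
    for y in ys:
        log_alpha = [logsumexp([log_alpha[r] + math.log(T[r][s]) for r in range(K)])
                     + normal_logpdf(y, mus[s], var) for s in range(K)]
    return logsumexp(log_alpha)


def brute_force_log_evidence(ys, T, mus, var):
    """Sum the joint over all K**(N+1) state paths; feasible only for tiny N."""
    K = len(mus)
    total = 0.0
    for path in product(range(K), repeat=len(ys) + 1):
        p = 1.0 / K
        for n, y in enumerate(ys, start=1):
            p *= T[path[n - 1]][path[n]] * math.exp(normal_logpdf(y, mus[path[n]], var))
        total += p
    return math.log(total)
```

The forward recursion costs O(N K²) instead of the O(K^N) of enumeration, but it relies on T being a fixed, known matrix.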
How can you participate?
Ways to Participate
• Contribute applications
• https://bitbucket.org/fwood/anglican-examples
• Contribute inference algorithms
• https://bitbucket.org/dtolpin/embang
An Analogy
Automatic differentiation is to supervised learning as probabilistic programming is to unsupervised learning.
General Purpose Inference

    (defquery sat-solver [N formula]
      ; explores an N-dimensional universe for worlds that satisfy the formula
      (let [state (repeatedly N (fn [] (sample (flip 0.5))))]
        (observe (dirac (formula state)) true)
        (predict :state state)))

    (defdist dirac
      "Dirac distribution"
      [x] []
      (sample [this] x)
      (observe [this value] (if (= x value) 0.0 NegInf)))

    (defm satisfiable-3cnf-formula [state]
      (let [v (fn [i] (nth state i))]
        (and (or (v 0) (not (v 1)) (not (v 2)))
             (or (not (v 0)) (v 1) (v 2))
             (or (not (v 0)) (not (v 1)) (not (v 2))))))
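With a uniform prior over assignments and a Dirac observe on the formula, the posterior is uniform over the satisfying assignments. A Python sketch that computes that support by enumeration and draws from it by rejection sampling (the simplest inference a Dirac observe admits); function names are hypothetical translations of the slide's.

```python
import itertools
import random


def satisfiable_3cnf_formula(state):
    v0, v1, v2 = state
    return ((v0 or not v1 or not v2)
            and (not v0 or v1 or v2)
            and (not v0 or not v1 or not v2))


def sat_posterior_support(n, formula):
    """All assignments with nonzero posterior mass (each has mass 1 / len(result))."""
    return [s for s in itertools.product([False, True], repeat=n) if formula(s)]


def sat_rejection_sample(n, formula, rng):
    """Sample each variable from (flip 0.5); keep the first world that satisfies the formula."""
    while True:
        state = tuple(rng.random() < 0.5 for _ in range(n))
        if formula(state):
            return state
```

For this formula five of the eight assignments survive, so rejection needs about 1.6 proposals per sample; for hard instances the satisfying fraction, and with it rejection sampling, collapses.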
General Purpose Inference

    (defquery md5-inverse [L md5str]
      ; conditional distribution of strings that map to the same MD5 hashed string
      (let [mesg (sample (string-generative-model L))]
        (observe (dirac md5str) (md5 mesg))
        (predict :message mesg)))
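A toy Python version of the same query, using rejection sampling and `hashlib.md5`. A uniform distribution over fixed-length strings from a small alphabet stands in for the hypothetical `string-generative-model`; with a realistic alphabet and length the search is hopeless, which is the point of the example.

```python
import hashlib
import random


def md5_hex(message):
    return hashlib.md5(message.encode("utf-8")).hexdigest()


def md5_inverse(length, md5str, alphabet, rng, max_tries=1_000_000):
    """Draw uniform strings (a stand-in string model) until one hashes to md5str."""
    for _ in range(max_tries):
        mesg = "".join(rng.choice(alphabet) for _ in range(length))
        if md5_hex(mesg) == md5str:        # Dirac observe: exact match or rejection
            return mesg
    return None
```

With alphabet "ab" and length 3 there are only eight candidate strings, so `md5_inverse(3, md5_hex("bab"), "ab", random.Random(0))` recovers "bab" quickly.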
[Figure: diagram with the labels NN, AI, RLPM, and Vision.]
Not Sum-Product: Bayesian HMM
Suppose the transition matrix is unknown:
$$T_k \sim \mathrm{Dirichlet}(\alpha_k)$$
Paige & W.; ICML 2014
Range of Effectiveness
[Figure: timeline (1990, 2000, 2010) of probabilistic programming systems from the PL, ML/STATS, and AI communities. Systems shown: KMP, Prolog, Prism, IBAL, HANSAI, Figaro, BUGS, WinBUGS, JAGS, STAN, LibBi, infer.NET, Factorie, Blog, Church, webChurch, Venture, Anglican, Probabilistic-C.]
Continuous Variables

    (defm marsaglia-normal [mean var]
      (let [d (uniform-continuous -1.0 1.0)
            x (sample d)
            y (sample d)
            s (+ (* x x) (* y y))]
        ; accept only points strictly inside the unit disc; s = 0 would make (log s) undefined
        (if (and (> s 0) (< s 1))
          (+ mean (* (sqrt var)
                     (* x (sqrt (* -2 (/ (log s) s))))))
          (marsaglia-normal mean var))))
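The same Marsaglia polar method in Python, as a sketch: points are drawn uniformly in the square [-1, 1]², those outside the unit disc (or at the origin) are rejected, and the accepted x is scaled by sqrt(-2 ln s / s) to give a standard normal. As in the defm, the second argument is a variance, not a standard deviation.

```python
import math
import random


def marsaglia_normal(mean, var, rng):
    """Marsaglia polar method; var is a variance, matching the Anglican defm."""
    while True:                              # the defm's recursive retry, as a loop
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        s = x * x + y * y
        if 0.0 < s < 1.0:                    # keep points strictly inside the unit disc
            return mean + math.sqrt(var) * x * math.sqrt(-2.0 * math.log(s) / s)
```

About π/4 ≈ 79% of proposals are accepted, so the expected number of retries per draw is small; the unbounded retry loop is another instance of unbounded recursion in a probabilistic program.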
Scalability: Particle Count
• Comparison across particle-based inference approaches: raw speed of drawing samples
Unbounded Recursion
[Figure: trade-off between expressivity and efficiency.]
Credits
• Code highlighting: http://hilite.me
Forward Inference (SMC)
[Figure: particles / continuations (vertical axis) advancing through observes (horizontal axis).]
Bayesian Nonparametrics

    (defm pick-a-stick [stick v l k]
      ; picks a stick given a stick generator
      ; given a value v ~ uniform-continuous(0,1)
      ; should be called with l = 0.0, k=1
      (let [u (+ l (stick k))]
        (if (> u v)
          k
          (pick-a-stick stick v u (+ k 1)))))

    (defm remaining [b k]
      (if (<= k 0)
        1
        (* (- 1 (b k)) (remaining b (- k 1)))))

    (defm polya [stick]
      ; given a stick generating function
      ; polya returns a function that samples
      ; stick indexes from the stick lengths
      (let [uc01 (uniform-continuous 0 1)]
        (fn []
          (let [v (sample uc01)]
            (pick-a-stick stick v 0.0 1)))))

    (defm dirichlet-process-breaking-rule [alpha k]
      (sample (beta 1.0 alpha)))

    (defm stick [breaking-rule]
      ; given a breaking-rule function which
      ; returns a value between 1 and 0 given a
      ; stick index k returns a function that
      ; returns the stick length for index k
      ; (breaking-rule is called with k only, so alpha must be fixed before it is passed in)
      (let [b (mem breaking-rule)]
        (fn [k]
          (if (< 0 k)
            (* (b k) (remaining b (- k 1)))
            0))))

[Figure: sticks (stick 1), (stick 2), (stick 3), (stick 4), …, (stick 6) laid end to end on [0, 1]; v = (sample (uniform-continuous 0 1)) falls inside the chosen stick.]
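A Python sketch of the same construction: break proportions b_k ~ Beta(1, alpha) are drawn lazily and memoized per index (playing the role of `mem`), stick k has length b_k times the product of (1 - b_j) over j < k, and the Polya-style sampler returns the index of the stick that a uniform v falls into. Function names mirror the slide; passing an explicit `rng` is an assumption of the sketch.

```python
import random


def make_stick(alpha, rng):
    """Dirichlet-process stick lengths: stick(k) = b_k * prod_{j<k} (1 - b_j), b_k ~ Beta(1, alpha)."""
    breaks = {}

    def b(k):
        if k not in breaks:                      # memoized, like (mem breaking-rule)
            breaks[k] = rng.betavariate(1.0, alpha)
        return breaks[k]

    def stick(k):
        length = b(k)
        for j in range(1, k):                    # the (remaining b (- k 1)) product
            length *= 1.0 - b(j)
        return length

    return stick


def pick_a_stick(stick, v):
    """Return the 1-based index of the stick, laid end to end from 0, that covers v."""
    left, k = 0.0, 1
    while left + stick(k) <= v:
        left += stick(k)
        k += 1
    return k


def polya(stick, rng):
    """Return a sampler of stick indices with probabilities given by the stick lengths."""
    return lambda: pick_a_stick(stick, rng.random())
```

Because the breaks are memoized, repeated calls to the sampler share one infinite set of sticks, so small alpha concentrates most draws on the first few indices, as expected for a Dirichlet process.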
Syntax & Implementation Considerations
• Embedded vs. Standalone
• Imperative vs. functional
• Lisp vs. Python vs. C vs.