Transcript of: Fast Robustness Quantification with Variational Bayes
Tamara Broderick
ITT Career Development Assistant Professor, MIT
With: Ryan Giordano, Rachael Meager, Jonathan Huggins, Michael I. Jordan
Robustness quantification

• Bayesian inference: complex, modular models; posterior distribution
• Have to express prior beliefs in a distribution: challenges; time-consuming; subjective; complex models

Bayes' Theorem:  p(θ|x) ∝_θ p(x|θ) p(θ)

• Robustness: global & local; rarely used; requires approximation, MCMC
• Our solution: linear response variational Bayes

1
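Bayes' Theorem as written here is directly computable: evaluate likelihood times prior on a grid and normalize. A toy beta-binomial sketch (our own numbers, not from the talk):

```python
import numpy as np
from math import comb

# Bayes' Theorem on a grid: p(theta|x) is proportional to p(x|theta) p(theta).
# Toy setup: Beta(2, 2) prior on a coin's heads-probability theta,
# with 7 heads observed in 10 flips.
theta = np.linspace(0.001, 0.999, 999)
d = theta[1] - theta[0]
prior = theta**(2 - 1) * (1 - theta)**(2 - 1)          # Beta(2,2), unnormalized
likelihood = comb(10, 7) * theta**7 * (1 - theta)**3   # Binomial(10, theta)

posterior = likelihood * prior
posterior /= posterior.sum() * d                       # normalize numerically

# Conjugacy says the posterior is Beta(2+7, 2+3) with mean 9/14.
post_mean = (theta * posterior).sum() * d
assert abs(post_mean - 9 / 14) < 1e-3
```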
Robustness quantification

• Variational Bayes as an alternative to MCMC
• Challenges of VB
• Accurate uncertainties from VB
• Accurate robustness quantification from VB
• Big idea: derivatives/perturbations are easy in VB

2
Variational Bayes

• Variational Bayes (VB): approximation q(θ) for the posterior p(θ|x)
• Minimize the Kullback–Leibler (KL) divergence:  q*(θ) := argmin_q KL(q ‖ p(·|x))
• VB practical success: point estimates and prediction; fast, streaming, distributed

[Broderick, Boyd, Wibisono, Wilson, Jordan 2013]

3
What about uncertainty?

• Variational Bayes:
  KL(q ‖ p(·|x)) = ∫ q(θ) log [ q(θ) / p(θ|x) ] dθ
• Mean-field variational Bayes (MFVB):
  q(θ) = ∏_{j=1}^J q(θ_j)
• Underestimates variance (sometimes severely)
• No covariance estimates

[Figure: contours of p(θ|x) and the MFVB optimum q*(θ) over (θ₁, θ₂)]

[MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
[Dunson 2014; Bardenet, Doucet, Holmes 2015]

4
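The variance underestimation has a classic closed-form illustration on a correlated Gaussian (as in Bishop 2006); the numbers below are our own sketch, not from the talk:

```python
import numpy as np

# Mean-field VB on a correlated bivariate Gaussian posterior. With posterior
# precision Lambda, the MFVB factor for theta_i is Gaussian with variance
# 1/Lambda_ii, while the true marginal variance is (Lambda^{-1})_ii, which is
# always at least 1/Lambda_ii: MFVB never overestimates the marginal variance.
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])   # true posterior covariance
Lam = np.linalg.inv(Sigma)                   # posterior precision

mfvb_var = 1.0 / np.diag(Lam)                # MFVB marginal variances
true_var = np.diag(Sigma)                    # true marginal variances

assert np.all(mfvb_var <= true_var + 1e-12)
print(true_var[0] / mfvb_var[0])             # variance underestimated ~5x here
```

With correlation 0.9, the MFVB marginal variance is 1 − ρ² = 0.19 of the truth, illustrating the "sometimes severely" on the slide.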
Linear response

• Cumulant-generating function:
  C(t) := log E[e^{tᵀθ}],  mean = dC(t)/dt |_{t=0}
• True posterior covariance vs. MFVB covariance:
  Σ := d²C_{p(·|x)}(t)/(dtᵀ dt) |_{t=0},  V := d²C_{q*}(t)/(dtᵀ dt) |_{t=0}
• "Linear response":
  log p_t(θ) := log p(θ|x) + tᵀθ − C(t),  with MFVB approximation q*_t
• The LRVB approximation:
  Σ = d/dtᵀ E_{p_t}[θ] |_{t=0} ≈ d/dtᵀ E_{q*_t}[θ] |_{t=0} =: Σ̂

[Bishop 2006]

5
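As a sanity check on these definitions (our own numeric illustration, not from the talk): for a multivariate normal the cumulant-generating function is available in closed form, and its derivatives at t = 0 recover exactly the mean and covariance, as the slide states.

```python
import numpy as np

# For theta ~ N(mu, Sigma): C(t) = log E[exp(t^T theta)] = t^T mu + t^T Sigma t / 2.
# Gradient at t=0 is the mean; Hessian at t=0 is the covariance.
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])

def C(t):
    return t @ mu + 0.5 * t @ Sigma @ t

# Central finite differences (exact for a quadratic, up to rounding).
eps = 1e-4
grad = np.array([(C(eps * e) - C(-eps * e)) / (2 * eps) for e in np.eye(2)])
hess = np.empty((2, 2))
for i, ei in enumerate(np.eye(2)):
    for j, ej in enumerate(np.eye(2)):
        hess[i, j] = (C(eps*ei + eps*ej) - C(eps*ei - eps*ej)
                      - C(-eps*ei + eps*ej) + C(-eps*ei - eps*ej)) / (4 * eps**2)

assert np.allclose(grad, mu, atol=1e-6)    # dC/dt at 0 = mean
assert np.allclose(hess, Sigma, atol=1e-4) # d^2C/dt dt^T at 0 = covariance
```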
LRVB estimator

• LRVB covariance estimate:
  Σ̂ := d/dtᵀ E_{q*_t}[θ] |_{t=0} = (I − V H)⁻¹ V
• Suppose an exponential family with mean parametrization (q_t with mean m_t):
  Σ̂ = ( ∂²KL/∂m ∂mᵀ |_{m=m*} )⁻¹
• Symmetric and positive definite at a local minimum of KL
• The LRVB assumption: E_{p_t}[θ] ≈ E_{q*_t}[θ]
• The LRVB estimate is exact when MFVB gives the exact mean (e.g., multivariate normal)

[Bishop 2006]

6
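The exactness claim on this slide can be checked numerically. Below is a minimal sketch (our own construction): the target posterior is multivariate normal, V is the diagonal MFVB covariance, and H is taken as the off-diagonal part of the log-posterior Hessian, a choice under which (I − VH)⁻¹V reproduces the exact covariance in this Gaussian case.

```python
import numpy as np

# LRVB on a multivariate normal "posterior" p(theta|x) = N(mu, Lambda^{-1}).
# MFVB yields marginal variances 1/Lambda_ii, so V = diag(Lambda)^{-1}.
# Taking H = off-diagonal part of the log-posterior Hessian (-Lambda),
# the estimator (I - V H)^{-1} V recovers the exact covariance Sigma.
rho = 0.9
Sigma = np.array([[1.0, rho], [rho, 1.0]])
Lam = np.linalg.inv(Sigma)

V = np.diag(1.0 / np.diag(Lam))     # MFVB covariance (diagonal, underestimates)
H = np.diag(np.diag(Lam)) - Lam     # off-diagonal interaction terms

Sigma_hat = np.linalg.solve(np.eye(2) - V @ H, V)
assert np.allclose(Sigma_hat, Sigma, atol=1e-10)
```

Algebraically, I − VH = diag(Λ)⁻¹Λ, so (I − VH)⁻¹V = Λ⁻¹, matching the slide's exactness statement for the multivariate normal.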
Microcredit Experiment

• Simplified from Meager (2015)
• K microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
• N_k businesses in the kth site (~900 to ~17K)
• Profit y_kn of the nth business at the kth site (T_kn = 1 if microcredit):
  y_kn ~ indep. N(μ_k + T_kn τ_k, σ_k²)
• Priors and hyperpriors:
  (μ_k, τ_k)ᵀ ~ iid N((μ, τ)ᵀ, C)
  (μ, τ)ᵀ ~ N((μ₀, τ₀)ᵀ, Λ⁻¹)
  σ_k⁻² ~ iid Γ(a, b)
  C ~ Sep&LKJ(η, c, d)

7
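The generative model above can be simulated in a few lines. The sketch below uses placeholder hyperparameter values (ours, not Meager's) and a smaller N_k, and checks that per-site difference-in-means roughly recovers the site effects τ_k:

```python
import numpy as np

# Minimal simulator for the simplified hierarchical model on the slide.
# Hyperparameter values here are illustrative placeholders only.
rng = np.random.default_rng(0)
K, N_k = 7, 1000
mu, tau = 0.0, 3.0                      # global mean profit and treatment effect
C = np.array([[1.0, 0.2], [0.2, 1.0]])  # covariance of site-level (mu_k, tau_k)
a, b = 2.0, 2.0                         # Gamma(a, b) prior on 1/sigma_k^2

site = rng.multivariate_normal([mu, tau], C, size=K)   # (mu_k, tau_k) per site
sigma2 = 1.0 / rng.gamma(a, 1.0 / b, size=K)           # sigma_k^2
T = rng.integers(0, 2, size=(K, N_k))                  # 1 if microcredit

# y_kn = mu_k + T_kn * tau_k + N(0, sigma_k^2) noise
y = site[:, [0]] + T * site[:, [1]] + rng.normal(
    0.0, np.sqrt(sigma2)[:, None], size=(K, N_k))

# Per-site difference in means roughly recovers each tau_k.
est_tau = np.array([y[k, T[k] == 1].mean() - y[k, T[k] == 0].mean()
                    for k in range(K)])
assert np.corrcoef(est_tau, site[:, 1])[0, 1] > 0.5
```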
Microcredit Experiment

[Figure: posterior summaries under MFVB and LRVB/MFVB]

• One set of 2500 MCMC draws: 45 minutes
• All of MFVB optimization, LRVB uncertainties, all sensitivity measures: 58 seconds!
• Many other models and data sets: mixture models, generalized linear mixed models, etc.

8
Robustness quantification

• Variational Bayes as an alternative to MCMC
• Challenges of VB
• Accurate uncertainties from VB
• Accurate robustness quantification from VB
• Big idea: derivatives/perturbations are easy in VB

9
Robustness quantification

• Bayes' Theorem, with prior hyperparameter α:
  p_α(θ) := p(θ|x, α) ∝_θ p(x|θ) p(θ|α)
• Sensitivity:
  S := dE_{p_α}[g(θ)]/dα |_α · Δα  ≈  dE_{q*_α}[g(θ)]/dα |_α · Δα =: Ŝ  (the LRVB estimator)
• When q*_α is in an exponential family:
  Ŝ = A ( ∂²KL/∂m ∂mᵀ |_{m=m*} )⁻¹ B

10
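The sensitivity S is just a derivative of a posterior expectation with respect to a prior hyperparameter; in a conjugate model it can be checked against a closed form. A toy sketch (our own model and numbers, not the microcredit model):

```python
import numpy as np

# Prior sensitivity in closed form: theta ~ N(alpha, s0^2), x_i | theta ~ N(theta, s^2).
# The posterior mean is a precision-weighted average, so its derivative with
# respect to the prior location alpha is w0 = (1/s0^2) / (1/s0^2 + n/s^2);
# a finite-difference perturbation of the prior matches this analytic sensitivity.
s0, s, n, xbar = 2.0, 1.0, 10, 0.7

def post_mean(alpha):
    prec0, prec_lik = 1 / s0**2, n / s**2
    return (prec0 * alpha + prec_lik * xbar) / (prec0 + prec_lik)

alpha0, d_alpha = 0.0, 1e-6
S_fd = (post_mean(alpha0 + d_alpha) - post_mean(alpha0 - d_alpha)) / (2 * d_alpha)
S_analytic = (1 / s0**2) / (1 / s0**2 + n / s**2)

assert abs(S_fd - S_analytic) < 1e-8
print(S_analytic)   # small: with n = 10 data points, the prior barely matters
```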
Microcredit Experiment (model recap)

• y_kn ~ indep. N(μ_k + T_kn τ_k, σ_k²), with T_kn = 1 if microcredit
• (μ_k, τ_k)ᵀ ~ iid N((μ, τ)ᵀ, C);  (μ, τ)ᵀ ~ N((μ₀, τ₀)ᵀ, Λ⁻¹);  σ_k⁻² ~ iid Γ(a, b);  C ~ Sep&LKJ(η, c, d)

11
Microcredit Experiment

[Figure: sensitivity of posterior summaries, MFVB and LRVB]

• Perturb Λ₁₁: 0.03 → 0.04

12
Microcredit Experiment

• Sensitivity of the expected microcredit effect (τ)
• Normalized to be on the scale of standard deviations in τ
• E.g., StdDev_q τ = 1.8 and E_q τ = 3.7 = 2.06 × StdDev_q τ
• After perturbing Λ₁₂ by +0.03: E_q τ < 1.0 × StdDev_q τ

13
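The normalization on this slide is simple arithmetic; for the record:

```python
# Reproducing the normalization on the slide: the expected effect E_q[tau]
# is reported in units of its posterior standard deviation, so
# E_q[tau] = 3.7 with StdDev_q[tau] = 1.8 is about 2.06 standard deviations.
E_tau, sd_tau = 3.7, 1.8
ratio = E_tau / sd_tau
print(round(ratio, 2))   # 2.06
```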
Conclusion

• We provide linear response variational Bayes (LRVB): supplements MFVB for fast & accurate covariance estimates
• More from LRVB: fast & accurate robustness quantification
• Interested in your data and models: sensitivity to prior perturbations; sensitivity to data perturbations

14
References

T. Broderick, N. Boyd, A. Wibisono, A. C. Wilson, and M. I. Jordan. Streaming variational Bayes. NIPS, 2013.

R. Giordano, T. Broderick, and M. I. Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. NIPS, 2015.

R. Giordano, T. Broderick, and M. I. Jordan. Robust inference with variational Bayes. NIPS AABI Workshop, 2015. arXiv:1512.02578.

https://github.com/rgiordan/MicrocreditLRVB

J. Huggins, T. Campbell, and T. Broderick. Coresets for scalable Bayesian logistic regression. Under review. arXiv:1605.06423.

15
References

R. Bardenet, A. Doucet, and C. Holmes. On Markov chain Monte Carlo methods for tall data. arXiv, 2015.

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

D. Dunson. Robust and scalable approach to Bayesian inference. Talk at ISBA 2014.

D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

R. Meager. Understanding the impact of microcredit expansions: A Bayesian hierarchical analysis of 7 randomised experiments. arXiv:1506.06669, 2015.

R. E. Turner and M. Sahani. Two problems with variational expectation maximisation for time-series models. In D. Barber, A. T. Cemgil, and S. Chiappa, editors, Bayesian Time Series Models, 2011.

B. Wang and M. Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In AISTATS, 2004.

16