Fast Robustness Quantification with Variational Bayes

Tamara Broderick
ITT Career Development Assistant Professor, MIT

With: Ryan Giordano, Rachael Meager, Jonathan Huggins, Michael I. Jordan

• Bayesian inference: complex, modular models; posterior distribution
• Bayes' theorem:

  p(\theta \mid x) \propto_\theta p(x \mid \theta)\, p(\theta)

• Have to express prior beliefs in a distribution: challenges (time-consuming; subjective; complex models)
• Robustness quantification
  • Robustness: global & local; rarely used; approximation, MCMC
  • Our solution: linear response variational Bayes

1
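As a concrete reading of the proportionality above, here is a minimal sketch (my own toy example, not part of the talk) that forms a posterior on a grid by multiplying a likelihood by a prior and normalizing; it recovers the conjugate Beta-Binomial answer.

```python
import numpy as np

# Minimal grid illustration of p(theta | x) propto_theta p(x | theta) p(theta):
# coin-flip likelihood with a Beta(2, 2) prior (toy choices).
theta = np.linspace(0.001, 0.999, 999)           # grid over the heads probability
dtheta = theta[1] - theta[0]
prior = theta * (1 - theta)                      # Beta(2, 2) density, up to a constant
heads, tails = 7, 3                              # observed data x
likelihood = theta**heads * (1 - theta)**tails   # p(x | theta)

unnormalized = likelihood * prior                # proportional to the posterior in theta
posterior = unnormalized / (unnormalized.sum() * dtheta)

# Matches the conjugate result Beta(2 + heads, 2 + tails): mean 9/14 ~ 0.643.
print("posterior mean:", (theta * posterior).sum() * dtheta)
```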

Robustness quantification

• Variational Bayes as an alternative to MCMC
• Challenges of VB
• Accurate uncertainties from VB
• Accurate robustness quantification from VB
• Big idea: derivatives/perturbations are easy in VB

2

Variational Bayes

• Variational Bayes (VB): an approximation q(\theta) to the posterior p(\theta \mid x)
• Minimize the Kullback-Leibler (KL) divergence:

  q^*(\theta) = \arg\min_q \mathrm{KL}\big(q \,\|\, p(\cdot \mid x)\big)

• VB practical success: point estimates and prediction; fast, streaming, distributed
  [Broderick, Boyd, Wibisono, Wilson, Jordan 2013]

[Figure: the posterior p(\theta \mid x), a candidate q(\theta), and the optimum q^*(\theta)]

3
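To make "minimize KL over an approximating family" concrete, here is a small sketch of my own (not from the talk): a Gaussian family q = N(m, s^2) is fit to a toy two-component Gaussian-mixture target by brute-force grid search over (m, s), with the KL integral computed by quadrature. Real VB optimizers are far smarter; the point is only what the objective is.

```python
import numpy as np

# Toy target "posterior": a two-component Gaussian mixture (chosen only for illustration).
def log_p(theta):
    comp1 = 0.7 * np.exp(-0.5 * theta**2) / np.sqrt(2 * np.pi)
    comp2 = 0.3 * np.exp(-0.5 * ((theta - 3.0) / 1.5)**2) / (1.5 * np.sqrt(2 * np.pi))
    return np.log(comp1 + comp2)

# Gaussian variational family q(theta) = N(m, s^2).
def log_q(theta, m, s):
    return -0.5 * ((theta - m) / s)**2 - np.log(s * np.sqrt(2 * np.pi))

grid = np.linspace(-10.0, 15.0, 5001)
dx = grid[1] - grid[0]
logp_grid = log_p(grid)

# KL(q || p(.|x)) = E_q[log q - log p], here by simple quadrature on the grid.
def kl(m, s):
    lq = log_q(grid, m, s)
    return np.sum(np.exp(lq) * (lq - logp_grid)) * dx

# Crude stand-in for a VB optimizer: grid search over the variational parameters (m, s).
ms = np.linspace(-2.0, 5.0, 141)
ss = np.linspace(0.3, 4.0, 75)
best = min((kl(m, s), m, s) for m in ms for s in ss)
print("smallest KL %.3f at m = %.2f, s = %.2f" % best)
```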

What about uncertainty?

• Variational Bayes:

  \mathrm{KL}\big(q \,\|\, p(\cdot \mid x)\big) = \int q(\theta) \log \frac{q(\theta)}{p(\theta \mid x)} \, d\theta

• Mean-field variational Bayes (MFVB):

  q(\theta) = \prod_{j=1}^{J} q(\theta_j)

• Underestimates variance (sometimes severely)
• No covariance estimates

[Figure: contours of p(\theta \mid x) and the MFVB optimum q^*(\theta) in the (\theta_1, \theta_2) plane]

[MacKay 2003; Bishop 2006; Wang, Titterington 2004; Turner, Sahani 2011]
[Dunson 2014; Bardenet, Doucet, Holmes 2015]

4
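The variance-underestimation point can be seen in a few lines. The following is a toy sketch of my own (not from the talk): coordinate-ascent MFVB with Gaussian factors applied to a strongly correlated bivariate Gaussian "posterior". The optimal factor variances are 1/\Lambda_{jj} (reciprocals of the precision diagonal), much smaller than the true marginal variances, and the product form carries no covariance at all.

```python
import numpy as np

# Toy "posterior": a strongly correlated bivariate Gaussian with precision Lam.
rho = 0.95
Sigma_true = np.array([[1.0, rho], [rho, 1.0]])
Lam = np.linalg.inv(Sigma_true)          # precision matrix
mu = np.array([0.5, -0.5])

# Coordinate-ascent MFVB with q(theta) = q1(theta1) q2(theta2), each factor Gaussian.
# For a Gaussian target, the optimal factor j is N(m_j, 1 / Lam_jj) with
#   m_j = mu_j - (1 / Lam_jj) * sum_{i != j} Lam_ji (m_i - mu_i).
m = np.zeros(2)
for _ in range(200):                     # sweep the updates to convergence
    for j in range(2):
        i = 1 - j
        m[j] = mu[j] - Lam[j, i] * (m[i] - mu[i]) / Lam[j, j]

print("MFVB means:          ", m)                     # exact for a Gaussian target
print("MFVB variances:      ", 1.0 / np.diag(Lam))    # 1 - rho^2 = 0.0975: far too small
print("true marginal vars:  ", np.diag(Sigma_true))   # 1.0
print("MFVB covariance: 0.0; true covariance:", Sigma_true[0, 1])
```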

Linear response

• Cumulant-generating function:

  C(t) := \log \mathbb{E}\, e^{t^T \theta}, \qquad \text{mean} = \left.\frac{dC(t)}{dt}\right|_{t=0}

• True posterior covariance vs. MFVB covariance:

  \Sigma := \left.\frac{d^2 C_{p(\cdot \mid x)}(t)}{dt^T\, dt}\right|_{t=0}, \qquad V := \left.\frac{d^2 C_{q^*}(t)}{dt^T\, dt}\right|_{t=0}

• "Linear response": tilt the posterior,

  \log p_t(\theta) := \log p(\theta \mid x) + t^T \theta - C(t), \quad \text{with MFVB solution } q^*_t

• The LRVB approximation:

  \Sigma = \left.\frac{d\, \mathbb{E}_{p_t} \theta}{dt^T}\right|_{t=0} \approx \left.\frac{d\, \mathbb{E}_{q^*_t} \theta}{dt^T}\right|_{t=0} =: \hat\Sigma

[Figure: contours of p(\theta \mid x) and q^*(\theta)]

[Bishop 2006]

5
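As a sanity check on the cumulant-generating-function identities above, here is a short sketch of my own: for samples from a known 2-D Gaussian (standing in for posterior draws), estimate C(t) = log E[exp(t^T theta)] by Monte Carlo and finite-difference it at t = 0; the gradient recovers the mean and the Hessian recovers the covariance.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
theta = rng.multivariate_normal(mu, Sigma, size=2_000_000)   # draws standing in for a posterior

def C(t):
    """Monte Carlo estimate of the cumulant-generating function log E[exp(t^T theta)]."""
    return np.log(np.mean(np.exp(theta @ t)))

# Central finite differences of C at t = 0.
eps = 1e-3
E = np.eye(2)
grad = np.zeros(2)
hess = np.zeros((2, 2))
for i in range(2):
    grad[i] = (C(eps * E[i]) - C(-eps * E[i])) / (2 * eps)
    for j in range(2):
        hess[i, j] = (C(eps * (E[i] + E[j])) - C(eps * (E[i] - E[j]))
                      - C(eps * (E[j] - E[i])) + C(-eps * (E[i] + E[j]))) / (4 * eps**2)

print("dC/dt at 0 (should be close to the mean):\n", grad)
print("d2C/dt dt^T at 0 (should be close to the covariance):\n", hess)
```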

LRVB estimator

• LRVB covariance estimate:

  \hat\Sigma := \left.\frac{d\, \mathbb{E}_{q^*_t} \theta}{dt^T}\right|_{t=0} = (I - VH)^{-1} V

• Suppose q_t is in an exponential family with mean parametrization m_t; then

  \hat\Sigma = \left( \left.\frac{\partial^2 \mathrm{KL}}{\partial m\, \partial m^T}\right|_{m = m^*} \right)^{-1}

• Symmetric and positive definite at a local minimum of the KL
• The LRVB assumption: \mathbb{E}_{p_t} \theta \approx \mathbb{E}_{q^*_t} \theta
• LRVB estimate is exact when MFVB gives the exact mean (e.g., multivariate normal)

[Bishop 2006]

6
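Here is a sketch of the linear-response idea in its "derivative of the tilted VB means" form, \hat\Sigma = d E_{q^*_t}[\theta]/dt^T at t = 0, rather than the closed-form (I - VH)^{-1} V expression. It is my own toy check, reusing the correlated-Gaussian target from the earlier sketch: each tilt t adds t^T theta to the log posterior, MFVB is re-run by coordinate ascent, and the Jacobian of the resulting means is taken by finite differences. For a Gaussian target the MFVB means are exact, so the linear-response covariance matches the full true covariance even though the MFVB variances do not.

```python
import numpy as np

# Same correlated-Gaussian "posterior" as in the earlier MFVB sketch.
rho = 0.95
Sigma_true = np.array([[1.0, rho], [rho, 1.0]])
Lam = np.linalg.inv(Sigma_true)
mu = np.array([0.5, -0.5])

def mfvb_means(t, sweeps=300):
    """Coordinate-ascent MFVB means for the tilted target
    log p_t(theta) = log p(theta | x) + t^T theta - C(t)."""
    m = np.zeros(2)
    for _ in range(sweeps):
        for j in range(2):
            i = 1 - j
            m[j] = mu[j] + (t[j] - Lam[j, i] * (m[i] - mu[i])) / Lam[j, j]
    return m

# Linear-response covariance: finite-difference Jacobian of the tilted MFVB means at t = 0.
eps = 1e-4
Sigma_hat = np.zeros((2, 2))
for j in range(2):
    t = np.zeros(2)
    t[j] = eps
    Sigma_hat[:, j] = (mfvb_means(t) - mfvb_means(-t)) / (2 * eps)

print("MFVB variances (too small):", 1.0 / np.diag(Lam))
print("linear-response covariance:\n", Sigma_hat)     # recovers Sigma_true, off-diagonals too
print("true covariance:\n", Sigma_true)
```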

Microcredit Experiment

• Simplified from Meager (2015)
• K microcredit trials (Mexico, Mongolia, Bosnia, India, Morocco, Philippines, Ethiopia)
• N_k businesses in the kth site (~900 to ~17K)
• Profit of the nth business at the kth site (T_{kn} = 1 if the business received microcredit):

  y_{kn} \overset{\text{indep}}{\sim} \mathcal{N}(\mu_k + T_{kn} \tau_k,\ \sigma_k^2)

• Priors and hyperpriors:

  \begin{pmatrix} \mu_k \\ \tau_k \end{pmatrix} \overset{\text{iid}}{\sim} \mathcal{N}\!\left( \begin{pmatrix} \mu \\ \tau \end{pmatrix}, C \right), \qquad
  \begin{pmatrix} \mu \\ \tau \end{pmatrix} \sim \mathcal{N}\!\left( \begin{pmatrix} \mu_0 \\ \tau_0 \end{pmatrix}, \Lambda^{-1} \right), \qquad
  \sigma_k^{-2} \overset{\text{iid}}{\sim} \Gamma(a, b), \qquad
  C \sim \text{Sep\&LKJ}(\eta, c, d)

7
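For concreteness, here is a data-simulation sketch for (a version of) this hierarchical model. It is not the talk's code: the hyperparameter values, the site sizes, and the fixed covariance matrix standing in for C ~ Sep&LKJ(eta, c, d) are all made up purely so the sketch runs.

```python
import numpy as np

rng = np.random.default_rng(1)

# Made-up values, chosen only so the sketch runs end to end.
K = 7                                                # number of sites
N_k = [900, 1200, 5000, 17000, 3000, 2500, 1500]     # businesses per site (rough orders of magnitude)
mu0, tau0 = 0.0, 0.0                                 # hyperprior means for (mu, tau)
Lam_inv = np.diag([10.0, 10.0])                      # hyperprior covariance Lambda^{-1}
a, b = 2.0, 2.0                                      # Gamma(a, b) prior on the precisions sigma_k^{-2}
C = np.array([[4.0, 1.0], [1.0, 4.0]])               # fixed stand-in for C ~ Sep&LKJ(eta, c, d)

# Top level: (mu, tau) ~ N((mu0, tau0), Lambda^{-1}).
mu, tau = rng.multivariate_normal([mu0, tau0], Lam_inv)

data = []
for k in range(K):
    # Site level: (mu_k, tau_k) ~ N((mu, tau), C); sigma_k^{-2} ~ Gamma(a, b) (rate b, so scale 1/b).
    mu_k, tau_k = rng.multivariate_normal([mu, tau], C)
    sigma_k = 1.0 / np.sqrt(rng.gamma(a, 1.0 / b))
    T = rng.integers(0, 2, size=N_k[k])              # T_kn = 1 if business n at site k got microcredit
    y = rng.normal(mu_k + T * tau_k, sigma_k)        # business profits y_kn
    data.append((y, T))

y0, T0 = data[0]
print("site 0: mean treated profit %.2f, mean control profit %.2f"
      % (y0[T0 == 1].mean(), y0[T0 == 0].mean()))
```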

Microcredit Experiment

• One set of 2,500 MCMC draws: 45 minutes
• All of MFVB optimization, LRVB uncertainties, and all sensitivity measures: 58 seconds!
• Many other models and data sets: mixture models, generalized linear mixed models, etc.

[Figure: MFVB and LRVB posterior summaries compared against MCMC]

8

Robustness quantification

• Variational Bayes as an alternative to MCMC
• Challenges of VB
• Accurate uncertainties from VB
• Accurate robustness quantification from VB
• Big idea: derivatives/perturbations are easy in VB

9

Robustness quantification

• Bayes theorem, with prior hyperparameter \alpha:

  p_\alpha(\theta) := p(\theta \mid x, \alpha) \propto_\theta p(x \mid \theta)\, p(\theta \mid \alpha)

• Sensitivity:

  S := \left.\frac{d\, \mathbb{E}_{p_\alpha}[g(\theta)]}{d\alpha}\right|_{\alpha} \Delta\alpha \;\approx\; \left.\frac{d\, \mathbb{E}_{q^*_\alpha}[g(\theta)]}{d\alpha}\right|_{\alpha} \Delta\alpha \;=:\; \hat S \qquad \text{(LRVB estimator)}

• When q^*_\alpha is in an exponential family:

  \hat S = A \left( \left.\frac{\partial^2 \mathrm{KL}}{\partial m\, \partial m^T}\right|_{m = m^*} \right)^{-1} B

10
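To show what the sensitivity S measures, here is a conjugate toy example of my own: in a normal model with a normal prior, the posterior expectation of theta can be recomputed exactly for any prior mean alpha, so dE_{p_alpha}[theta]/dalpha can be checked by refitting and finite-differencing. In this model a Gaussian variational family contains the exact posterior, so the same number is the VB quantity; the talk's contribution is obtaining this derivative from a single variational fit, which this brute-force sketch does not do.

```python
import numpy as np

# Conjugate toy model: x_i ~ N(theta, sigma^2) with sigma known, prior theta ~ N(alpha, tau^2).
# The posterior is Gaussian, so a Gaussian variational family recovers it exactly, and the
# prior-mean sensitivity d E[theta | x, alpha] / d alpha can be checked by refit + finite differences.
rng = np.random.default_rng(2)
sigma, tau = 2.0, 1.5
n = 50
x = rng.normal(1.0, sigma, size=n)

def posterior_mean(alpha):
    post_prec = n / sigma**2 + 1.0 / tau**2                       # posterior precision
    return (x.sum() / sigma**2 + alpha / tau**2) / post_prec

alpha0, eps = 0.0, 1e-4
S_fd = (posterior_mean(alpha0 + eps) - posterior_mean(alpha0 - eps)) / (2 * eps)
S_exact = (1.0 / tau**2) / (n / sigma**2 + 1.0 / tau**2)          # derivative of the formula above
print("dE[theta | x, alpha]/dalpha: finite difference %.5f, exact %.5f" % (S_fd, S_exact))
# A small prior shift Delta alpha moves the posterior mean by about S * Delta alpha.
```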

Microcredit Experiment

• Model recap (likelihood, priors, and hyperpriors as on slide 7):

  y_{kn} \overset{\text{indep}}{\sim} \mathcal{N}(\mu_k + T_{kn} \tau_k,\ \sigma_k^2), \qquad C \sim \text{Sep\&LKJ}(\eta, c, d)

11

Microcredit Experiment

• Perturb \Lambda_{11}: 0.03 \to 0.04

[Figure: MFVB and LRVB results, with the LRVB sensitivity, under the \Lambda_{11} perturbation]

12

Microcredit Experiment

• Sensitivity of the expected microcredit effect (\tau)
• Normalized to be on the scale of standard deviations in \tau
• E.g., at the original prior: \mathbb{E}_q \tau = 3.7 = 2.06 \cdot \mathrm{StdDev}_q \tau, with \mathrm{StdDev}_q \tau = 1.8;
  after the perturbation \Lambda_{12} \mathrel{+}= 0.03: \mathbb{E}_q \tau < 1.0 \cdot \mathrm{StdDev}_q \tau

13

Conclusion

• We provide linear response variational Bayes: it supplements MFVB with a fast & accurate covariance estimate
• More from LRVB: fast & accurate robustness quantification
• Interested in your data and models:
  • Sensitivity to prior perturbations
  • Sensitivity to data perturbations

14

References

T. Broderick, N. Boyd, A. Wibisono, A. C. Wilson, and M. I. Jordan. Streaming variational Bayes. NIPS, 2013.

R. Giordano, T. Broderick, and M. I. Jordan. Linear response methods for accurate covariance estimates from mean field variational Bayes. NIPS, 2015.

R. Giordano, T. Broderick, and M. I. Jordan. Robust inference with variational Bayes. NIPS AABI Workshop, 2015. arXiv:1512.02578.

Code: https://github.com/rgiordan/MicrocreditLRVB

J. Huggins, T. Campbell, and T. Broderick. Coresets for scalable Bayesian logistic regression. Under review. arXiv:1605.06423.

15

References

R. Bardenet, A. Doucet, and C. Holmes. On Markov chain Monte Carlo methods for tall data. arXiv, 2015.

C. M. Bishop. Pattern Recognition and Machine Learning. Springer, 2006.

D. Dunson. Robust and scalable approach to Bayesian inference. Talk at ISBA, 2014.

D. J. C. MacKay. Information Theory, Inference, and Learning Algorithms. Cambridge University Press, 2003.

R. Meager. Understanding the impact of microcredit expansions: A Bayesian hierarchical analysis of 7 randomised experiments. arXiv:1506.06669, 2015.

R. E. Turner and M. Sahani. Two problems with variational expectation maximisation for time-series models. In D. Barber, A. T. Cemgil, and S. Chiappa, editors, Bayesian Time Series Models, 2011.

B. Wang and M. Titterington. Inadequacy of interval estimates corresponding to variational Bayesian approximations. In AISTATS, 2004.

16