
d-VMP: Distributed Variational Message Passing

Andrés R. Masegosa¹, Ana M. Martínez², Helge Langseth¹, Thomas D. Nielsen², Antonio Salmerón³, Darío Ramos-López³, Anders L. Madsen²,⁴

¹ Department of Computer and Information Science, The Norwegian University of Science and Technology, Norway
² Department of Computer Science, Aalborg University, Denmark
³ Department of Mathematics, University of Almería, Spain
⁴ Hugin Expert A/S, Aalborg, Denmark

Int. Conf. on Probabilistic Graphical Models, Lugano, Sept. 6–9, 2016

Outline

1 Motivation

2 Variational Message Passing

3 d-VMP

4 Experimental results

5 Conclusions



Motivation

- Large
- Imbalanced
- Missing ("?") values
- Complex distributions

[Figure: density of one attribute over its range, comparing the fits obtained by SVI (1%, 5%, and 10% batch sizes) and VMP (1% sample); x-axis: attribute range, y-axis: density.]

- Goal: learn a generative model for a financial dataset, to monitor the customers and make predictions for a single customer.

[Plate model: observed attributes X_i and local hidden variables H_i, i = 1, …, N, with global parameters θ and hyperparameters α.]

Popular existing approach: SVI

- Stochastic Variational Inference (SVI) iteratively updates the model parameters based on subsampled data batches (sketched below).
- It does not estimate all the local hidden variables of the model.
- It does not produce the lower bound.
- It gives a poor fit if a batch of data is not representative of the full data set.
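To make the bullets above concrete, here is a minimal sketch of an SVI-style update for a toy Beta–Bernoulli model; the model, batch schedule, and learning-rate choice are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.binomial(1, 0.3, size=100_000)   # toy i.i.d. Bernoulli data
alpha0, beta0 = 1.0, 1.0                    # Beta prior parameters
lam = np.array([alpha0, beta0])             # global variational parameters
N, batch_size = len(data), 100              # small subsampled batches

for t in range(1, 501):
    batch = rng.choice(data, size=batch_size, replace=False)
    # Intermediate estimate: pretend the batch is the whole data set.
    lam_hat = np.array([alpha0 + N * batch.mean(),
                        beta0 + N * (1.0 - batch.mean())])
    rho = (t + 1.0) ** -0.75                # Robbins-Monro learning rate
    # Stochastic natural-gradient step: convex combination of old and new.
    lam = (1.0 - rho) * lam + rho * lam_hat

print(lam / lam.sum())                      # approximate posterior mean of p
```

With a tiny batch, lam_hat is a noisy full-data estimate; an unrepresentative batch pulls the step far from the optimum, which is the sensitivity the last bullet warns about.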

Our contribution:

- d-VMP: a distributed message passing scheme.
- Defined for a broader class of models than SVI.
- Better and faster convergence compared to SVI.
- Provides the posterior over all local latent variables and the lower bound.

Variational Message Passing

Models:

- Bayesian learning on i.i.d. data using conjugate exponential BN models, whose densities take the exponential-family form

  ln p(X | η) = ln h_X(X) + s_X(X) · η − A_X(η)

  (a concrete instance is sketched below).

[Plate model as before: observed X_i and local hidden H_i, i = 1, …, N, with global parameters θ and hyperparameters α.]

- We want to compute p(θ, H | D).
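As a concrete instance of the exponential-family form above, the sketch below writes a univariate Gaussian with known variance in that notation; the fixed variance and the function names are assumptions made for this example only.

```python
import numpy as np
from scipy.stats import norm

SIGMA2 = 1.0  # known variance (an assumption of this toy example)

def log_h(x):
    """Base measure ln h_X(x)."""
    return -x**2 / (2 * SIGMA2) - 0.5 * np.log(2 * np.pi * SIGMA2)

def suff_stat(x):
    """Sufficient statistic s_X(x)."""
    return x

def log_norm(eta):
    """Log-normalizer A_X(eta), with natural parameter eta = mu / sigma^2."""
    return SIGMA2 * eta**2 / 2

mu, x = 0.7, 1.3
eta = mu / SIGMA2
# The exponential-family form reproduces the ordinary Gaussian log-density:
assert np.isclose(log_h(x) + suff_stat(x) * eta - log_norm(eta),
                  norm.logpdf(x, loc=mu, scale=np.sqrt(SIGMA2)))
```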

Variational Inference:

- Approximate p(θ, H | D) (often intractable) by finding a tractable posterior distribution q ∈ Q that minimizes the KL divergence:

  min_{q(θ,H) ∈ Q} KL(q(θ, H) ‖ p(θ, H | D))

- In the mean-field variational approach, Q is assumed to fully factorize:

  q(θ, H) = ∏_{k=1}^{M} q(θ_k) · ∏_{i=1}^{N} ∏_{j=1}^{J} q(H_{i,j})


Variational Inference:

- Variational inference exploits the decomposition

  ln p(D) = L(q(θ, H)) + KL(q(θ, H) ‖ p(θ, H | D))

  Since ln p(D) is constant in q, maximizing the lower bound L is equivalent to minimizing the KL term (see the derivation below).

- Iterative coordinate ascent over the variational distributions.
- Updating the variational distribution of a variable only involves the variables in its Markov blanket.
- The coordinate ascent algorithm can be formulated as a message passing scheme.
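The decomposition follows from a standard two-line identity, reproduced here for completeness (it is not spelled out on the slide):

```latex
\ln p(\mathcal{D})
  = \mathbb{E}_{q}\!\left[\ln \frac{p(\mathcal{D},\theta,H)}{q(\theta,H)}\right]
  + \mathbb{E}_{q}\!\left[\ln \frac{q(\theta,H)}{p(\theta,H \mid \mathcal{D})}\right]
  = \mathcal{L}\big(q(\theta,H)\big)
  + \mathrm{KL}\!\left(q(\theta,H)\,\middle\|\,p(\theta,H \mid \mathcal{D})\right)
```

Since the left-hand side does not depend on q, any increase in L(q) is exactly matched by a decrease in the KL term.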

Variational Message Passing, VMP:

- Message from parent to child: moment parameters (expectations of the sufficient statistics).
- Message from child to parent: natural parameters (based on the messages received from the co-parents).

A toy instance of both message types is sketched below.
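A minimal sketch of the two message types, assuming the simplest possible model: i.i.d. Gaussian observations with known precision and a Gaussian prior on the mean. The model, names, and numbers are illustrative assumptions; with no hidden co-parents the child-to-parent messages are fixed natural-parameter contributions, so VMP converges in one sweep.

```python
import numpy as np

TAU = 2.0                       # known precision of the observations
mu0, tau0 = 0.0, 1.0            # Gaussian prior on the mean: N(mu0, 1/tau0)
rng = np.random.default_rng(1)
x = rng.normal(1.5, 1 / np.sqrt(TAU), size=50)

# Child -> parent: each observation x_i sends natural parameters to mu.
# For N(x | mu, 1/TAU), the message is (TAU * x_i, -TAU / 2) w.r.t. (mu, mu^2).
child_msgs = np.array([[TAU * xi, -TAU / 2] for xi in x])

# The prior contributes its own natural parameters for mu.
prior_nat = np.array([tau0 * mu0, -tau0 / 2])

# VMP update for q(mu): sum the prior and all child messages.
eta = prior_nat + child_msgs.sum(axis=0)

# Parent -> child: mu would reply with moment parameters E[mu], E[mu^2].
post_tau = -2 * eta[1]
post_mean = eta[0] / post_tau
moments = np.array([post_mean, post_mean**2 + 1 / post_tau])

print(post_mean, 1 / post_tau)  # matches the exact conjugate posterior
```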

d-VMP

Distributed optimization of the lower bound:

Goal: max_{q ∈ Q} L(q(θ, H)), organized as follows.

[Diagram, repeated across the following steps: a master node holds the global parameters θ (with hyperparameters α) and the current q^(t)(θ); each slave node n holds one shard of the data, with its local hidden variables, and maintains q^(t)(H_n).]

1. q^(t)(θ) is broadcast to all the slave nodes.

2. The local hidden variables are updated, q^(t+1)(H) = arg max_{q(H)} L(q(H), q^(t)(θ)), which decomposes over the slaves: each slave n solves, on its own shard,

   q^(t+1)(H_n) = arg max_{q(H_n)} L_n(q(H_n), q^(t)(θ))

3. The master updates the global parameters,

   q^(t+1)(θ) = arg max_{q(θ)} L(q^(t+1)(H), q(θ)),

   where the global parameters may be coupled.

A toy sketch of this master/slave loop is given below.
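A minimal sketch of the master/slave loop, again for the toy Gaussian-mean model used earlier (the sharding and helper names are assumptions for illustration; in this degenerate model there are no local hidden variables, so each slave simply returns its aggregated messages):

```python
import numpy as np

TAU = 2.0
mu0, tau0 = 0.0, 1.0
rng = np.random.default_rng(2)
shards = np.array_split(rng.normal(1.5, 1 / np.sqrt(TAU), 3000), 3)

def slave_step(shard, q_theta):
    """Local step on one shard: update q(H_n) given the broadcast q(theta)
    and return the summed child->parent natural-parameter messages.
    (No local hidden variables exist in this toy model, so q_theta is unused.)"""
    return np.array([TAU * shard.sum(), -TAU / 2 * len(shard)])

q_theta = np.array([tau0 * mu0, -tau0 / 2])       # natural params of q(theta)
for t in range(5):                                 # a few global iterations
    msgs = [slave_step(s, q_theta) for s in shards]  # parallelizable map
    # Global step on the master: prior plus the summed slave messages.
    q_theta = np.array([tau0 * mu0, -tau0 / 2]) + np.sum(msgs, axis=0)

post_tau = -2 * q_theta[1]
print(q_theta[0] / post_tau, 1 / post_tau)         # posterior mean, variance
```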

Candidate solutions:

- Resort to a generalized mean-field approximation, as SVI does: it does not factorize over the global parameters.
  - Prohibitive for models with a large number of global (coupled) parameters, e.g. linear regression.
- Our proposal: VMP as a distributed projected natural gradient ascent algorithm (PNGA).

d-VMP as a projected natural gradient ascent

- Insight 1: VMP can be expressed as a projected natural gradient ascent algorithm,

  η_X^(t+1) = η_X^(t) + ρ_{X,t} [∇̃_η L(η^(t))]⁺_X        (1)

- ∇̃ denotes the natural gradient and [·]⁺ is the projection operator.
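This update connects back to plain VMP through a known closed form of the natural gradient for conjugate models (a standard result from the stochastic variational inference literature, stated here as context rather than taken from the slide): the natural gradient is the optimal coordinate update minus the current value, so a unit step size recovers the exact VMP update.

```latex
\tilde{\nabla}_{\eta_X} \mathcal{L}\big(\eta^{(t)}\big) = \eta_X^{*} - \eta_X^{(t)}
\quad\Longrightarrow\quad
\eta_X^{(t+1)} = \eta_X^{(t)} + 1 \cdot \big(\eta_X^{*} - \eta_X^{(t)}\big) = \eta_X^{*}
```

where η*_X denotes the optimal VMP coordinate update for node X (prior contribution plus the children's messages).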

- Insight 2: the natural gradient of the lower bound with respect to the global parameters can be expressed in terms of the incoming messages:

  ∇̃_{η_θ} L = m_{Pa(θ)→θ} + Σ_i m_{H_i→θ} − η_θ

- The gradient can be computed in parallel: the sum over i distributes across the data shards.

- Insight 3: global parameters are "coupled" only if they belong to each other's Markov blanket.
- Define a disjoint partition of the global parameters accordingly:

  R = {J_1, …, J_S}

- d-VMP performs independent global updates over the global parameters of each partition block:

  η_{J_r}^(t+1) = η_{J_r}^(t) + ρ_{r,t} [∇̃_η L(η^(t))]⁺_{J_r}

- ρ_{r,t} is the learning rate; if |J_r| = 1, then ρ_{r,t} = 1 (the step is exact, as in plain VMP).

d-VMP as a distributed PNGA algorithm:

[Diagram: for every J_r ∈ R, the master computes η_{J_r}^(t+1) = η_{J_r}^(t) + ρ_{r,t} [∇̃_η L(η^(t))]⁺_{J_r}, while each slave n computes its contribution η_n^(t+1) from its own data shard.]

A toy sketch of one such partitioned update is given below.
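A minimal sketch of one partitioned natural-gradient update, extending the toy Gaussian-mean example (the partitioning and all names are illustrative assumptions; with a single uncoupled parameter, |J_r| = 1 and ρ = 1, so the step reproduces the exact VMP update):

```python
import numpy as np

TAU = 2.0
mu0, tau0 = 0.0, 1.0
rng = np.random.default_rng(3)
shards = np.array_split(rng.normal(1.5, 1 / np.sqrt(TAU), 3000), 3)

prior_nat = np.array([tau0 * mu0, -tau0 / 2])
eta = prior_nat.copy()                    # current q(theta), natural params

def shard_messages(shard):
    """Per-shard sum of child->parent messages (computable in parallel)."""
    return np.array([TAU * shard.sum(), -TAU / 2 * len(shard)])

for t in range(5):
    # Natural gradient: optimal coordinate update minus the current value.
    eta_star = prior_nat + sum(shard_messages(s) for s in shards)
    nat_grad = eta_star - eta
    rho = 1.0                             # |J_r| = 1 here, so rho = 1
    eta = eta + rho * nat_grad            # projected step (no projection is
                                          # needed for this parameter)

post_tau = -2 * eta[1]
print(eta[0] / post_tau, 1 / post_tau)    # exact posterior mean and variance
```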

Experimental results

Model fit to the data

[Plate model: observed attributes X_ij with local hidden variables H_ij (plate j = 1, …, J), per-client variables Y_i and H_i (plate i = 1, …, N), and global parameters θ.]

[Figure: density of one attribute over its range for SVI (1%, 5%, and 10% batch sizes) and VMP (1% sample).]

- Representative sample of 55K clients (N) and 33 attributes (J).
- "Unrolled" model of more than 3.5M nodes (75% latent variables).


Model fit to the data

[Figure: global lower bound (y-axis, from −1.5·10⁷ to 0) against time in seconds (x-axis, 0–2000), for SVI with batch sizes 1%, 5%, 10% crossed with learning rates 0.55, 0.75, 0.99, and for d-VMP.]

Test marginal log-likelihood

  Alg.    BS (% data)   LR     Log-Likel.
  SVI      1%           0.55   -180902.87
  SVI      1%           0.75   -298564.03
  SVI      1%           0.99   -426979.52
  SVI      5%           0.55   -177302.24
  SVI      5%           0.75   -333264.16
  SVI      5%           0.99   -628105.70
  SVI     10%           0.55   -347035.22
  SVI     10%           0.75   -397525.45
  SVI     10%           0.99   -538087.13
  d-VMP   100%          1.0     67265.34

Mixtures of learnt posteriors for one attribute

[Figure: density over the attribute range (−10000 to 20000) for SVI 1%, SVI 5%, SVI 10%, and VMP 1%.]

Scalability settings

- Generated data set of 42 million samples (clients) and 12 variables.
- "Unrolled" model of more than 1 billion (10⁹) nodes (75% latent variables).
- AMIDST Toolbox with Apache Flink.
- Amazon Web Services (AWS) as the distributed computing environment.

Scalability results

[Figure: scalability results.]


Conclusions

- Variational methods can be scaled using distributed computation instead of sampling techniques.
- Bayesian learning was demonstrated in a model with more than 1 billion nodes (75% of them hidden).

Thank you for your attention

Questions?

You can download our open source Java toolbox: amidsttoolbox.com

Acknowledgments: This project has received funding from the European Union's Seventh Framework Programme for research, technological development and demonstration under grant agreement no. 619209.