Lecture 12: Variational Inference and Mean Field

CS839: Probabilistic Graphical Models Lecture 12: Variational Inference and Mean Field Theo Rekatsinas 1


Summary

• Variational inference (approximate inference):
  • Loopy BP (Bethe free energy)
  • Mean-field approximation
• What do the two have in common?
• Loopy BP: outer approximation of the marginal polytope
• Mean-field: inner approximation of the marginal polytope

Variational Methods

• "Variational" means an optimization-based formulation:
  • Represent a quantity of interest as the solution to an optimization problem
  • Approximate the desired solution by relaxing/approximating the intractable optimization problem
• Example:

Inference Problems in Graphical Models

• Consider an undirected graphical model (MRF)
• The quantities of interest (for inference):
  • Marginal distributions
  • Normalization constant Z
• How to represent these quantities in a variational form?
• Exponential families and convex analysis

Exponential Families

• Canonical parameterization
• Log normalization constant
• This is a convex function
• Space of canonical parameters
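As a sketch of what these bullets refer to, assuming the usual notation of a sufficient-statistics vector \phi(x), a canonical parameter \theta, and a base measure \nu (symbols chosen here for illustration), the standard definitions are:

```latex
\[
p_\theta(x) = \exp\{\langle \theta, \phi(x)\rangle - A(\theta)\}, \qquad
A(\theta) = \log \int \exp\{\langle \theta, \phi(x)\rangle\}\,\nu(dx),
\]
\[
\Omega = \{\theta \in \mathbb{R}^d : A(\theta) < +\infty\}.
\]
```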

Graphical Models as Exponential Families

• Undirected graphical model (MRF)
• MRF in exponential form:

Example: Gaussian MRF

• Zero-mean multivariate Gaussian distribution that respects the Markov property of a graph
• Gaussian MRF in exponential form

Example: Discrete MRF

Why Exponential Families

• Computing the expectation of the sufficient statistics (mean parameters) given the canonical parameters yields the marginals
• Computing the normalizer yields the log partition function (or log likelihood function)
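Both statements can be checked by brute force on a tiny model. The sketch below assumes a binary pairwise MRF with sufficient statistics x_s and x_s x_t; the names `theta_node`, `theta_edge`, and the parameter values are made up for illustration. It computes the mean parameters E[x_s] (which equal the node marginals P(x_s = 1)) and compares one of them with a finite-difference derivative of the log partition function.

```python
import itertools
import math

def score(x, theta_node, theta_edge):
    """<theta, phi(x)> for a binary pairwise MRF with phi = (x_s, x_s * x_t)."""
    value = sum(th * xs for th, xs in zip(theta_node, x))
    value += sum(w * x[s] * x[t] for (s, t), w in theta_edge.items())
    return value

def log_partition(theta_node, theta_edge):
    """A(theta) = log sum_x exp(<theta, phi(x)>), by enumeration over {0,1}^n."""
    n = len(theta_node)
    return math.log(sum(math.exp(score(x, theta_node, theta_edge))
                        for x in itertools.product((0, 1), repeat=n)))

def node_mean_parameters(theta_node, theta_edge):
    """mu_s = E[x_s] = P(x_s = 1), by enumeration."""
    n = len(theta_node)
    log_z = log_partition(theta_node, theta_edge)
    mu = [0.0] * n
    for x in itertools.product((0, 1), repeat=n):
        p = math.exp(score(x, theta_node, theta_edge) - log_z)
        for s in range(n):
            mu[s] += p * x[s]
    return mu

# Hypothetical 3-node chain.
theta_node = [0.5, -0.3, 0.2]
theta_edge = {(0, 1): 1.0, (1, 2): -0.5}
mu = node_mean_parameters(theta_node, theta_edge)

# Central finite difference of A with respect to theta_0.
eps = 1e-5
plus = [theta_node[0] + eps] + theta_node[1:]
minus = [theta_node[0] - eps] + theta_node[1:]
dA_dtheta0 = (log_partition(plus, theta_edge)
              - log_partition(minus, theta_edge)) / (2 * eps)
```

Enumeration costs 2^n terms, which is exactly why the rest of the lecture looks for variational alternatives.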

Computing Mean Parameter: Bernoulli

• A single Bernoulli random variable
• Inference = computing the mean parameter
• In a variational manner: cast the procedure of computing the mean in an optimization-based formulation

Conjugate Dual Function

• Given any function f(θ), its conjugate dual function is
• The conjugate dual is always a convex function: it is a point-wise supremum of a class of linear functions
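For reference, the standard (Fenchel) definition of the conjugate dual, written with \mu as the dual variable, is:

```latex
\[
f^*(\mu) = \sup_{\theta} \{\langle \theta, \mu\rangle - f(\theta)\}.
\]
```

For each fixed \theta, the map \mu \mapsto \langle \theta, \mu\rangle - f(\theta) is affine in \mu, so f^* is convex as a supremum of affine functions, whether or not f itself is convex.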

Dual of the Dual is the Original

• Under some technical conditions on f (convex and lower semi-continuous), the dual of the dual is f itself.
• For the log partition function

Computing Mean Parameter: Bernoulli

• The conjugate
• Stationary condition
• If
• If
• We have:
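A hedged reconstruction of these steps, assuming the standard Bernoulli parameterization p(x; \theta) = \exp\{\theta x - A(\theta)\} with A(\theta) = \log(1 + e^{\theta}), and assuming the two cases are \mu inside and outside the unit interval:

```latex
\[
A^*(\mu) = \sup_{\theta \in \mathbb{R}} \{\mu\theta - \log(1 + e^{\theta})\},
\qquad \text{stationary condition: } \mu = \frac{e^{\theta}}{1 + e^{\theta}}.
\]
\[
\text{If } \mu \in (0,1): \quad \theta(\mu) = \log\frac{\mu}{1-\mu}, \quad
A^*(\mu) = \mu\log\mu + (1-\mu)\log(1-\mu).
\]
\[
\text{If } \mu \notin [0,1]: \quad A^*(\mu) = +\infty.
\]
```

Substituting \theta(\mu) into \mu\theta - \log(1 + e^{\theta}) gives \mu\log\mu - \mu\log(1-\mu) + \log(1-\mu), which simplifies to the negative entropy shown in the first case.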

Computing Mean Parameter: Bernoulli

• The conjugate
• Stationary condition
• We have:
• The variational form:
• The optimum is achieved at
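The variational form can be checked numerically. The sketch below assumes the standard Bernoulli setup, A(θ) = log(1 + e^θ) and A*(μ) = μ log μ + (1 − μ) log(1 − μ) on [0, 1], and maximizes μθ − A*(μ) with a ternary search (a choice made here for simplicity, valid because the objective is concave in μ). The function names are illustrative.

```python
import math

def neg_entropy(mu):
    """A*(mu) = mu log mu + (1 - mu) log(1 - mu), with 0 log 0 taken as 0."""
    return sum(p * math.log(p) for p in (mu, 1.0 - mu) if p > 0.0)

def variational_bernoulli(theta, iters=200):
    """Maximize mu * theta - A*(mu) over mu in [0, 1] by ternary search."""
    def objective(mu):
        return mu * theta - neg_entropy(mu)

    lo, hi = 0.0, 1.0
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) < objective(m2):
            lo = m1   # the maximizer lies to the right of m1
        else:
            hi = m2   # the maximizer lies to the left of m2
    mu = 0.5 * (lo + hi)
    return mu, objective(mu)

theta = 0.7
mu_hat, A_hat = variational_bernoulli(theta)
mu_exact = 1.0 / (1.0 + math.exp(-theta))   # logistic function of theta
A_exact = math.log(1.0 + math.exp(theta))   # log partition function
```

Under these assumptions the maximizer `mu_hat` should agree with the mean `mu_exact`, and the optimal value `A_hat` with `A_exact`, up to numerical tolerance.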

Remarks

• The last few identities rely on a deep theory of general exponential families:
  • The dual function is the negative entropy function
  • The mean parameter is restricted
  • Solving the optimization returns the mean parameter and the log partition function
• Extend this to general exponential families/graphical models.
• However,
  • Computing the conjugate dual (entropy) is in general intractable
  • The constraint set of mean parameters is hard to characterize
  • We need to approximate

Compute the Conjugate Dual

• Given an exponential family
• The dual function
• Stationary condition
• Derivatives of A yield the mean parameters
• The stationary condition becomes
• For which μ do we have a solution θ(μ)?

Compute the Conjugate Dual

• Let's assume there is a solution θ(μ) such that
• The dual has the form
• The entropy is defined as
• So the dual is the negative entropy when there is a solution θ(μ)
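A sketch of the derivation these bullets outline, assuming the exponential family p(x; \theta) = \exp\{\langle \theta, \phi(x)\rangle - A(\theta)\} and writing H for the (differential or Shannon) entropy:

```latex
\[
A^*(\mu) = \sup_{\theta \in \Omega} \{\langle \mu, \theta\rangle - A(\theta)\},
\qquad \text{stationary condition: } \mu - \nabla A(\theta) = 0.
\]
\[
\nabla A(\theta) = \mathbb{E}_\theta[\phi(X)]
\quad\Longrightarrow\quad
\mu = \mathbb{E}_{\theta}[\phi(X)].
\]
\[
A^*(\mu) = \langle \theta(\mu), \mu\rangle - A(\theta(\mu))
= \mathbb{E}_{\theta(\mu)}\big[\langle \theta(\mu), \phi(X)\rangle - A(\theta(\mu))\big]
= \mathbb{E}_{\theta(\mu)}\big[\log p(X; \theta(\mu))\big]
= -H\big(p_{\theta(\mu)}\big).
\]
```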

Complexity of Computing Conjugate Dual

• The dual function is implicitly defined:
• Solving the inverse mapping is non-trivial
• Evaluating the negative entropy requires high-dimensional integration (summation)
• For which μ does it have a solution θ(μ)? What is the domain of A*(μ)?

Marginal Polytope

• For any distribution p(x) and a set of sufficient statistics φ(x), define a vector of mean parameters
• p(x) is not necessarily an exponential family
• The set of all realizable mean parameters is a convex set
• For discrete exponential families this is called the marginal polytope.

Convex Polytope

• Convex hull representation
• Half-plane representation
• Minkowski-Weyl Theorem: any non-empty convex polytope can be characterized by a finite collection of linear inequality constraints

Example: Two-node Ising Model

• Sufficient statistics
• Mean parameters
• Two-node Ising model
• Convex hull representation
• Half-plane representation
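To make the two representations concrete, the sketch below assumes the usual {0,1} encoding with sufficient statistics φ(x) = (x1, x2, x1·x2) and mean parameters (μ1, μ2, μ12). Under that assumption the convex hull has four vertices, one per configuration, and the half-plane description consists of four inequalities. The function names are illustrative.

```python
import random

# phi(x) for the configurations (0,0), (1,0), (0,1), (1,1): the hull vertices.
VERTICES = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]

def mean_parameters(weights):
    """Convex hull view: mu = sum_x p(x) phi(x) for a distribution p over 4 states."""
    return tuple(sum(w * v[i] for w, v in zip(weights, VERTICES)) for i in range(3))

def in_marginal_polytope(mu1, mu2, mu12, tol=1e-12):
    """Half-plane view: the four inequalities cutting out the same set."""
    return (mu12 >= -tol
            and mu1 - mu12 >= -tol
            and mu2 - mu12 >= -tol
            and 1.0 + mu12 - mu1 - mu2 >= -tol)

# A random distribution over the four configurations (fixed seed for repeatability).
rng = random.Random(0)
raw = [rng.random() for _ in range(4)]
p = [r / sum(raw) for r in raw]
mu = mean_parameters(p)
```

Each inequality corresponds to one configuration having non-negative probability: μ12 = p(1,1), μ1 − μ12 = p(1,0), μ2 − μ12 = p(0,1), and 1 + μ12 − μ1 − μ2 = p(0,0).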

Marginal Polytope for General Graphs

• Still doable for connected binary graphs with 3 nodes: 16 constraints
• For tree graphical models, the number of half-planes grows only linearly in the graph size
• General graphs?
• Extremely hard to characterize the marginal polytope.

Variational Principle

• The dual function takes the form
• The log partition function has the variational form
• For all θ, the optimum of the above optimization problem is attained uniquely at the μ(θ) that satisfies
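In the standard statement of this result (notation assumed here: \mathcal{M} is the set of realizable mean parameters, \mathcal{M}^\circ its interior, \bar{\mathcal{M}} its closure), the three bullets read:

```latex
\[
A^*(\mu) =
\begin{cases}
-H\big(p_{\theta(\mu)}\big) & \text{if } \mu \in \mathcal{M}^\circ, \\
+\infty & \text{if } \mu \notin \bar{\mathcal{M}},
\end{cases}
\qquad
\mathcal{M} = \{\mu \in \mathbb{R}^d : \exists\, p \text{ with } \mathbb{E}_p[\phi(X)] = \mu\}.
\]
\[
A(\theta) = \sup_{\mu \in \mathcal{M}} \{\langle \theta, \mu\rangle - A^*(\mu)\},
\qquad
\mu(\theta) = \mathbb{E}_\theta[\phi(X)].
\]
```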

Example: Two-node Ising Model

• The distribution
• Sufficient statistics
• The marginal polytope is characterized by
• The dual has an explicit form
• The variational problem is
• The optimum is attained at

Variational Principle

• Exact variational formulation
• Mean field method: non-convex inner bound and exact form of entropy
• Bethe approximation and loopy BP: polyhedral outer bound and non-convex Bethe approximation

Belief Propagation Algorithm

• Message passing rule:
• Marginals
• Exact for trees but approximate for loopy graphs
• How does this relate to the variational principle? For trees/generic graphs?
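A minimal sketch of sum-product on a tree, assuming the standard rule M_{t→s}(x_s) = Σ_{x_t} ψ_st(x_s, x_t) ψ_t(x_t) Π_{u ∈ N(t)\s} M_{u→t}(x_t) and node beliefs proportional to ψ_s(x_s) Π_t M_{t→s}(x_s). The potentials and the 4-node tree are made up for illustration, and a brute-force enumeration is included only to confirm exactness on trees.

```python
import itertools
import math

def bp_marginals(node_pot, edge_pot, states=2):
    """Sum-product on a tree. node_pot[s][x_s]; edge_pot[(s, t)][x_s][x_t]."""
    nbrs = {s: [] for s in node_pot}
    for (s, t) in edge_pot:
        nbrs[s].append(t)
        nbrs[t].append(s)

    def psi(s, t, xs, xt):
        """Pairwise potential psi_st(x_s, x_t), whichever way the edge is stored."""
        if (s, t) in edge_pot:
            return edge_pot[(s, t)][xs][xt]
        return edge_pot[(t, s)][xt][xs]

    cache = {}

    def message(t, s):
        """Message from t to s as a list indexed by x_s (recursion ends at leaves)."""
        if (t, s) not in cache:
            msg = []
            for xs in range(states):
                total = 0.0
                for xt in range(states):
                    term = psi(s, t, xs, xt) * node_pot[t][xt]
                    for u in nbrs[t]:
                        if u != s:
                            term *= message(u, t)[xt]
                    total += term
                msg.append(total)
            cache[(t, s)] = msg
        return cache[(t, s)]

    marginals = {}
    for s in node_pot:
        belief = [node_pot[s][x] * math.prod(message(t, s)[x] for t in nbrs[s])
                  for x in range(states)]
        z = sum(belief)
        marginals[s] = [b / z for b in belief]
    return marginals

def brute_marginals(node_pot, edge_pot, states=2):
    """Exact node marginals by enumerating every joint configuration."""
    nodes = sorted(node_pot)
    marg = {s: [0.0] * states for s in nodes}
    z = 0.0
    for x in itertools.product(range(states), repeat=len(nodes)):
        a = dict(zip(nodes, x))
        w = math.prod(node_pot[s][a[s]] for s in nodes)
        w *= math.prod(edge_pot[(s, t)][a[s]][a[t]] for (s, t) in edge_pot)
        z += w
        for s in nodes:
            marg[s][a[s]] += w
    return {s: [v / z for v in marg[s]] for s in nodes}

# Hypothetical star-shaped tree with node 1 at the centre.
node_pot = {0: [1.0, 2.0], 1: [1.5, 0.5], 2: [1.0, 1.0], 3: [0.3, 0.7]}
edge_pot = {(0, 1): [[2.0, 1.0], [1.0, 2.0]],
            (1, 2): [[1.0, 3.0], [3.0, 1.0]],
            (1, 3): [[1.0, 0.5], [0.5, 2.0]]}
bp = bp_marginals(node_pot, edge_pot)
exact = brute_marginals(node_pot, edge_pot)
```

The recursive `message` terminates only on trees; on a loopy graph the same update has to be iterated from an initialization instead, and the resulting beliefs are approximate.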

Tree Graphical Models

• Discrete variables on a tree T = (V, E)
• Sufficient statistics
• Exponential representation of distribution?
• Mean parameters are marginal probabilities:

Marginal Polytope for Trees

• Marginal polytope for general graphs
• By the junction tree theorem we have:
• If then

Decomposition of Entropy for Trees

• For trees the entropy decomposes as (this is also our dual!):
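The standard form of this decomposition, assuming node marginals \mu_s and edge marginals \mu_{st} as the mean parameters, is a sum of single-node entropies minus the mutual information on each edge:

```latex
\[
H(p_\mu) = -A^*(\mu)
= \sum_{s \in V} H_s(\mu_s) - \sum_{(s,t) \in E} I_{st}(\mu_{st}),
\]
\[
H_s(\mu_s) = -\sum_{x_s} \mu_s(x_s)\log\mu_s(x_s),
\qquad
I_{st}(\mu_{st}) = \sum_{x_s, x_t} \mu_{st}(x_s, x_t)
\log\frac{\mu_{st}(x_s, x_t)}{\mu_s(x_s)\,\mu_t(x_t)}.
\]
```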

Exact Variational Principle for Trees

• Variational formulation
• Assign a Lagrange multiplier for the normalization constraint and each marginalization constraint

Lagrangian Derivation

• Taking the derivatives of the Lagrangian with respect to μ_s and μ_st
• Setting them to zero yields

BP on Arbitrary Graphs

• Two main difficulties of the variational formulation
• The marginal polytope is hard to characterize, so let's use the tree-based outer bound
• The exact entropy lacks an explicit form, so let's approximate it using the exact expression for trees

Bethe Variational Problem

• Combining the two gives us the Bethe variational problem
• What is happening?
• Tree-based outer bound
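A sketch of the resulting problem in standard notation (assumed here: \tau denotes pseudo-marginals, \mathbb{L}(G) the locally consistent polytope used as the tree-based outer bound, and H_s, I_{st} the node entropy and edge mutual information evaluated at \tau):

```latex
\[
\mathbb{L}(G) = \Big\{\tau \ge 0 :
\sum_{x_s} \tau_s(x_s) = 1, \;
\sum_{x_t} \tau_{st}(x_s, x_t) = \tau_s(x_s)\Big\}
\supseteq \mathcal{M}(G),
\]
\[
\max_{\tau \in \mathbb{L}(G)}
\Big\{\langle \theta, \tau\rangle
+ \sum_{s \in V} H_s(\tau_s)
- \sum_{(s,t) \in E} I_{st}(\tau_{st})\Big\}.
\]
```

On a tree the outer bound is tight and the entropy expression is exact, so the problem reduces to the exact variational principle; on a graph with cycles both pieces are approximations.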

Mean Field Approximation

Tractable Subgraphs

• For an exponential family with sufficient statistics φ defined on graph G, the set of realizable mean parameters is
• Idea: restrict p to a subset of distributions associated with a tractable subgraph

Mean Field Methods

• For a given tractable subgraph F, a subset of canonical parameters is
• Inner approximation
• Mean field solves the relaxed problem
• is the exact dual function restricted to
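A hedged reconstruction of the objects these bullets name, in standard notation (assumed here: \mathcal{I} indexes the sufficient statistics of G, \mathcal{I}(F) those that belong to the subgraph F, and A_F^* is the restricted dual):

```latex
\[
\Omega(F) = \{\theta \in \Omega : \theta_\alpha = 0 \;\; \forall \alpha \in \mathcal{I} \setminus \mathcal{I}(F)\},
\]
\[
\mathcal{M}_F(G) = \{\tau \in \mathbb{R}^d : \tau = \mathbb{E}_\theta[\phi(X)]
\text{ for some } \theta \in \Omega(F)\} \subseteq \mathcal{M}(G),
\]
\[
\max_{\tau \in \mathcal{M}_F(G)} \{\langle \tau, \theta\rangle - A_F^*(\tau)\},
\qquad
A_F^* = A^*\big|_{\mathcal{M}_F(G)}.
\]
```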

Example: Naïve Mean Field for Ising Model

• Ising model in {0,1} representation
• Mean parameters
• For fully disconnected graph F
• The dual decomposes into a sum, one term for each node

Example: Naïve Mean Field for Ising Model

• Mean field problem
• The same objective function as in the free-energy-based approach
• The naïve mean field update equations
• Lower bound on the log partition function
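A minimal sketch of naïve mean field, assuming the Ising model p(x) ∝ exp(Σ_s θ_s x_s + Σ_(s,t) θ_st x_s x_t) with x_s ∈ {0, 1}, the objective Σ_s θ_s μ_s + Σ_(s,t) θ_st μ_s μ_t + Σ_s H(μ_s), and the coordinate update μ_s ← σ(θ_s + Σ_{t∈N(s)} θ_st μ_t). The parameter values and function names are made up for illustration; the exact log partition function is computed by enumeration only to compare against the bound.

```python
import itertools
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def binary_entropy(mu):
    """H(mu) = -mu log mu - (1 - mu) log(1 - mu), with 0 log 0 taken as 0."""
    return -sum(p * math.log(p) for p in (mu, 1.0 - mu) if p > 0.0)

def mean_field_objective(mu, theta_node, theta_edge):
    """Energy term under the product distribution plus the sum of node entropies."""
    value = sum(th * m for th, m in zip(theta_node, mu))
    value += sum(w * mu[s] * mu[t] for (s, t), w in theta_edge.items())
    return value + sum(binary_entropy(m) for m in mu)

def naive_mean_field(theta_node, theta_edge, sweeps=200):
    """Coordinate ascent: update each mu_s in turn from its neighbours' means."""
    n = len(theta_node)
    mu = [0.5] * n
    for _ in range(sweeps):
        for s in range(n):
            field = theta_node[s]
            for (a, b), w in theta_edge.items():
                if a == s:
                    field += w * mu[b]
                elif b == s:
                    field += w * mu[a]
            mu[s] = sigmoid(field)
    return mu

def exact_log_partition(theta_node, theta_edge):
    """A(theta) by enumeration over {0,1}^n (feasible only for tiny n)."""
    n = len(theta_node)
    total = 0.0
    for x in itertools.product((0, 1), repeat=n):
        e = sum(th * xs for th, xs in zip(theta_node, x))
        e += sum(w * x[s] * x[t] for (s, t), w in theta_edge.items())
        total += math.exp(e)
    return math.log(total)

# Hypothetical 3-node cycle.
theta_node = [0.5, -0.3, 0.2]
theta_edge = {(0, 1): 1.0, (1, 2): -0.5, (0, 2): 0.8}
mu = naive_mean_field(theta_node, theta_edge)
bound = mean_field_objective(mu, theta_node, theta_edge)
log_z = exact_log_partition(theta_node, theta_edge)
```

The objective value is a lower bound on A(θ) for any μ in [0, 1]^n, not only at convergence; coordinate ascent tightens it but may stop at a local optimum because the problem is non-convex.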

Geometry of Mean Field

• Mean field optimization is always non-convex for any exponential family in which the state space is finite
• The marginal polytope is a convex hull
• The inner approximation contains all the extreme points (if it is a strict subset then it must be non-convex)
• Example: two-node Ising
• Parabolic cross section along τ1 = τ2
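The non-convexity is easy to see in the two-node case. The sketch below assumes the naïve mean field set is {(τ1, τ2, τ12) : τ12 = τ1·τ2, 0 ≤ τ1, τ2 ≤ 1}, which along τ1 = τ2 = τ gives the parabola τ12 = τ². The function name is illustrative.

```python
def in_naive_mean_field_set(t1, t2, t12, tol=1e-12):
    """Product distributions on two binary nodes: tau12 must equal tau1 * tau2."""
    return 0.0 <= t1 <= 1.0 and 0.0 <= t2 <= 1.0 and abs(t12 - t1 * t2) <= tol

# Two extreme points of the marginal polytope; both are product distributions.
a = (0.0, 0.0, 0.0)
b = (1.0, 1.0, 1.0)

# Their midpoint lies in the marginal polytope but is not a product distribution.
midpoint = tuple(0.5 * (u + v) for u, v in zip(a, b))
```

Here `midpoint` is (0.5, 0.5, 0.5), while a product distribution with τ1 = τ2 = 0.5 would need τ12 = 0.25, so the set contains both endpoints but not the segment between them.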

Summary

• Variational methods in general turn inference into an optimization problem via exponential families and convex duality
• The exact variational principle is intractable to solve; two approximations:
  • Either inner or outer bound to the marginal polytope
  • Various approximations to the entropy function
• Mean-field: non-convex inner bound and exact form of entropy
• BP: polyhedral outer bound and non-convex Bethe approximation