CS839: Probabilistic Graphical Models
Lecture 12: Variational Inference and Mean Field
Theo Rekatsinas
Summary
• Variational inference (approximate inference):
  • Loopy BP (Bethe free energy)
  • Mean-field approximation
• What is common in the two?
  • Loopy BP: outer approximation of the marginal polytope
  • Mean-field: inner approximation of the marginal polytope
Variational Methods
• "Variational" means: optimization-based formulation
  • Represent a quantity of interest as the solution to an optimization problem
  • Approximate the desired solution by relaxing/approximating the intractable optimization problem
• Example:
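One standard instance of this idea (our choice; the slide's own example is not recoverable from this transcript): for a symmetric positive definite matrix Q, the solution of Qx = b is the unique minimizer of ½xᵀQx − bᵀx, because the gradient Qx − b vanishes exactly there. The sketch below, with a hypothetical helper `solve_by_minimization` and a step size we picked for this particular 2×2 matrix, recovers the solution by gradient descent instead of inverting Q.

```python
def solve_by_minimization(Q, b, step=0.1, iters=2000):
    """Minimize f(x) = 0.5 * x^T Q x - b^T x by plain gradient descent.

    For symmetric positive definite Q the unique minimizer satisfies
    grad f(x) = Q x - b = 0, i.e. it solves the linear system Q x = b.
    """
    n = len(b)
    x = [0.0] * n
    for _ in range(iters):
        # gradient of the quadratic objective: Q x - b
        grad = [sum(Q[i][j] * x[j] for j in range(n)) - b[i] for i in range(n)]
        x = [x[i] - step * grad[i] for i in range(n)]
    return x


Q = [[4.0, 1.0], [1.0, 3.0]]  # symmetric positive definite (example values)
b = [1.0, 2.0]
x = solve_by_minimization(Q, b)
print(x)  # close to the exact solution (1/11, 7/11)
```

The step size has to stay below 2/λ_max(Q) (about 0.43 here) for the iteration to converge.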
Inference Problems in Graphical Models
• Consider an undirected graphical model (MRF): $p(x) = \frac{1}{Z}\prod_{C\in\mathcal{C}}\psi(x_C)$
• The quantities of interest (for inference):
  • Marginal distributions: $p(x_i) = \sum_{x_j:\, j\neq i} p(x)$
  • Normalization constant $Z$
• How can we represent these quantities in a variational form?
  • Exponential families and convex analysis
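Exponential Families
• Canonical parameterization: $p(x_1,\dots,x_m;\theta) = \exp\{\langle\theta,\phi(x)\rangle - A(\theta)\}$
• Log normalization constant: $A(\theta) = \log\int\exp\{\langle\theta,\phi(x)\rangle\}\,\nu(dx)$ (ν is the base measure; counting measure for discrete x)
  • This is a convex function
• Space of canonical parameters: $\Omega := \{\theta\in\mathbb{R}^d : A(\theta) < +\infty\}$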
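To see the convexity claim on a concrete case, the sketch below takes a finite exponential family of our choosing (two binary variables with sufficient statistics (x1, x2, x1x2), i.e. the two-node Ising model used later in the lecture), computes A(θ) by enumerating the four states, and numerically checks the midpoint inequality A((θa + θb)/2) ≤ (A(θa) + A(θb))/2 on random parameter pairs. This is a spot check, not a proof. Because the state space is finite, A(θ) < ∞ everywhere, so here Ω = ℝ³.

```python
import itertools
import math
import random

STATES = list(itertools.product([0, 1], repeat=2))


def phi(x):
    # sufficient statistics of a two-node binary model: (x1, x2, x1*x2)
    return (x[0], x[1], x[0] * x[1])


def log_partition(theta):
    # A(theta) = log sum_x exp(<theta, phi(x)>), via log-sum-exp for stability
    scores = [sum(t * f for t, f in zip(theta, phi(x))) for x in STATES]
    m = max(scores)
    return m + math.log(sum(math.exp(s - m) for s in scores))


random.seed(0)
violations = 0
for _ in range(1000):
    ta = [random.uniform(-5, 5) for _ in range(3)]
    tb = [random.uniform(-5, 5) for _ in range(3)]
    mid = [(a + b) / 2 for a, b in zip(ta, tb)]
    # midpoint convexity: A((ta + tb) / 2) <= (A(ta) + A(tb)) / 2
    if log_partition(mid) > (log_partition(ta) + log_partition(tb)) / 2 + 1e-12:
        violations += 1
print(violations)
```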
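Graphical Models as Exponential Families
• Undirected graphical model (MRF): $p(x;\theta) = \frac{1}{Z(\theta)}\prod_{C\in\mathcal{C}}\psi(x_C;\theta_C)$
• MRF in exponential form: $p(x;\theta) = \exp\{\sum_{C\in\mathcal{C}}\log\psi(x_C;\theta_C) - \log Z(\theta)\}$, where each $\log\psi(x_C;\theta_C)$ can be written in the linear form $\langle\theta_C,\phi_C(x_C)\rangle$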
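Example: Gaussian MRF
• Zero-mean multivariate Gaussian distribution that respects the Markov property of a graph: the precision matrix $\Theta = \Sigma^{-1}$ satisfies $\Theta_{st} = 0$ whenever $(s,t)\notin E$
• Gaussian MRF in exponential form: $p(x;\Theta) = \exp\{-\tfrac{1}{2}\langle\Theta, xx^\top\rangle - A(\Theta)\}$, with sufficient statistics $xx^\top$ and $\langle\Theta, xx^\top\rangle = x^\top\Theta x$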
Example: Discrete MRF
Why Exponential Families
• Computing the expectation of the sufficient statistics (the mean parameters) given the canonical parameters yields the marginals: $\mu = \mathbb{E}_\theta[\phi(X)] = \nabla A(\theta)$
• Computing the normalizer yields the log partition function (or the log-likelihood function)
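A small numerical check of the first bullet, again on the two-node model with φ(x) = (x1, x2, x1x2) (our choice of example; the parameter values are arbitrary): the mean parameters obtained by exact enumeration should match the gradient of A(θ), approximated here by central finite differences.

```python
import itertools
import math

STATES = list(itertools.product([0, 1], repeat=2))


def phi(x):
    # sufficient statistics of the two-node binary model
    return (x[0], x[1], x[0] * x[1])


def score(theta, x):
    return sum(t * f for t, f in zip(theta, phi(x)))


def log_partition(theta):
    return math.log(sum(math.exp(score(theta, x)) for x in STATES))


def mean_parameters(theta):
    # mu = E_theta[phi(X)] by exact enumeration of the four states
    Z = sum(math.exp(score(theta, x)) for x in STATES)
    mu = [0.0, 0.0, 0.0]
    for x in STATES:
        p = math.exp(score(theta, x)) / Z
        for i, f in enumerate(phi(x)):
            mu[i] += p * f
    return mu


def numerical_gradient(f, theta, h=1e-6):
    # central finite differences, one coordinate at a time
    grad = []
    for i in range(len(theta)):
        up, down = list(theta), list(theta)
        up[i] += h
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad


theta = [0.5, -1.0, 2.0]
print(mean_parameters(theta))
print(numerical_gradient(log_partition, theta))
```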
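Computing Mean Parameter: Bernoulli
• A single Bernoulli random variable: $p(x;\theta) = \exp\{\theta x - A(\theta)\}$, $x\in\{0,1\}$, $A(\theta) = \log(1+e^\theta)$
• Inference = computing the mean parameter: $\mu(\theta) = \mathbb{E}_\theta[X] = 1\cdot p(X=1;\theta) + 0\cdot p(X=0;\theta) = \frac{e^\theta}{1+e^\theta}$
• In a variational manner: cast the procedure of computing the mean as an optimization-based formulation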
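Conjugate Dual Function
• Given any function $f(\theta)$, its conjugate dual function is $f^*(\mu) = \sup_\theta\{\langle\theta,\mu\rangle - f(\theta)\}$
• The conjugate dual is always a convex function: it is the pointwise supremum of a class of linear functions of μ (one for each θ)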
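As a numerical illustration of the definition, the sketch below approximates f*(μ) by maximizing θμ − f(θ) over a finite grid of θ values, which makes the "pointwise maximum of linear functions" structure explicit. The test function f(θ) = e^θ is our own choice; its conjugate has the closed form μ log μ − μ for μ > 0, used here as a reference. The grid bounds of the hypothetical helper `conjugate_on_grid` are an assumption and must contain the maximizer θ = log μ.

```python
import math


def conjugate_on_grid(f, mu, lo=-20.0, hi=10.0, n=30001):
    """Approximate f*(mu) = sup_theta { theta * mu - f(theta) } on a theta grid.

    Each grid point theta contributes the linear function mu -> theta*mu - f(theta);
    the result is their pointwise maximum, hence convex in mu.
    """
    step = (hi - lo) / (n - 1)
    return max((lo + k * step) * mu - f(lo + k * step) for k in range(n))


for mu in [0.5, 1.0, 2.0, 5.0]:
    approx = conjugate_on_grid(math.exp, mu)
    exact = mu * math.log(mu) - mu  # closed-form conjugate of e^theta
    print(f"mu={mu}: grid={approx:.6f}, closed form={exact:.6f}")
```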
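Dual of the Dual is the Original
• Under some technical conditions on f (convex and lower semi-continuous), the dual of the dual is f itself: $f = (f^*)^*$, i.e. $f(\theta) = \sup_\mu\{\langle\theta,\mu\rangle - f^*(\mu)\}$
• For the log partition function: $A(\theta) = \sup_\mu\{\langle\theta,\mu\rangle - A^*(\mu)\}$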
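To illustrate f** = f numerically, the sketch below uses f(θ) = θ⁴/4 (our choice: convex and continuous), whose conjugate follows from the stationary condition μ = θ³ as f*(μ) = ¾|μ|^{4/3}. Maximizing θμ − f*(μ) over a grid of μ values should then recover f(θ). The grid range in the hypothetical helper `biconjugate_on_grid` is an assumption and must contain the maximizer μ = θ³.

```python
def f(theta):
    return theta ** 4 / 4


def f_star(mu):
    # closed-form conjugate of theta^4 / 4 (from the stationary condition mu = theta^3)
    return 0.75 * abs(mu) ** (4.0 / 3.0)


def biconjugate_on_grid(theta, lo=-10.0, hi=10.0, n=40001):
    # f**(theta) = sup_mu { theta * mu - f*(mu) }, approximated on a grid of mu values
    step = (hi - lo) / (n - 1)
    return max(theta * (lo + k * step) - f_star(lo + k * step) for k in range(n))


for theta in [-1.5, 0.0, 0.7, 2.0]:
    print(theta, f(theta), biconjugate_on_grid(theta))
```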
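Computing Mean Parameter: Bernoulli
• The conjugate: $A^*(\mu) := \sup_{\theta\in\mathbb{R}}\{\mu\theta - \log(1+e^\theta)\}$
• Stationary condition: $\mu = \frac{e^\theta}{1+e^\theta}$ (i.e. $\mu = \nabla A(\theta)$)
• If $\mu\in(0,1)$: $\theta(\mu) = \log\frac{\mu}{1-\mu}$ and $A^*(\mu) = \mu\log\mu + (1-\mu)\log(1-\mu)$
• If $\mu\notin[0,1]$: $A^*(\mu) = +\infty$
• We have: $A^*(\mu) = \mu\log\mu + (1-\mu)\log(1-\mu)$ if $\mu\in[0,1]$ (with $0\log 0 = 0$), and $A^*(\mu) = +\infty$ otherwise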
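A numerical check of this closed form, assuming a finite grid of θ values in [−30, 30] stands in for the supremum over ℝ (helper names are hypothetical): inside (0, 1) the grid maximum should match the negative entropy, while for μ outside [0, 1] the value keeps growing with the grid range, reflecting A*(μ) = +∞.

```python
import math


def log_partition(theta):
    # A(theta) = log(1 + e^theta), written stably for large |theta|
    return max(theta, 0.0) + math.log1p(math.exp(-abs(theta)))


def bernoulli_conjugate_on_grid(mu, lo=-30.0, hi=30.0, n=60001):
    # A*(mu) = sup_theta { mu * theta - A(theta) }, approximated on a theta grid
    step = (hi - lo) / (n - 1)
    return max(mu * (lo + k * step) - log_partition(lo + k * step) for k in range(n))


def negative_entropy(mu):
    # mu log mu + (1 - mu) log(1 - mu), with the convention 0 log 0 = 0
    return sum(q * math.log(q) for q in (mu, 1.0 - mu) if q > 0)


for mu in [0.1, 0.3, 0.5, 0.9]:
    print(mu, bernoulli_conjugate_on_grid(mu), negative_entropy(mu))
print(1.5, bernoulli_conjugate_on_grid(1.5))  # grows with the grid range: A*(1.5) = +inf
```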
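Computing Mean Parameter: Bernoulli
• The conjugate, the stationary condition, and the closed form of $A^*(\mu)$ are as on the previous slide
• The variational form: $A(\theta) = \max_{\mu\in[0,1]}\{\mu\theta - \mu\log\mu - (1-\mu)\log(1-\mu)\}$
• The optimum is achieved at $\mu(\theta) = \frac{e^\theta}{1+e^\theta}$, which is exactly the mean parameter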
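The variational form can also be checked numerically: maximizing μθ − A*(μ) over a fine uniform grid on [0, 1] (our discretization; helper names are hypothetical) should return log(1 + e^θ) as the optimal value and the logistic function of θ as the maximizer.

```python
import math


def negative_entropy(mu):
    # A*(mu) on [0, 1], with the convention 0 log 0 = 0
    return sum(q * math.log(q) for q in (mu, 1.0 - mu) if q > 0)


def solve_variational_form(theta, n=100001):
    # maximize mu * theta - A*(mu) over a uniform grid of mu in [0, 1]
    best_value, best_mu = -math.inf, None
    for k in range(n):
        mu = k / (n - 1)
        value = mu * theta - negative_entropy(mu)
        if value > best_value:
            best_value, best_mu = value, mu
    return best_value, best_mu


for theta in [-2.0, 0.0, 1.5]:
    value, mu = solve_variational_form(theta)
    print(theta, value, math.log1p(math.exp(theta)), mu, 1 / (1 + math.exp(-theta)))
```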
Remarks
• The last few identities rely on a deep theory for general exponential families:
  • The dual function is the negative entropy function
  • The mean parameter is restricted (for the Bernoulli, to $[0,1]$)
  • Solving the optimization returns both the mean parameter and the log partition function
• Extend this to general exponential families/graphical models
• However:
  • Computing the conjugate dual (entropy) is in general intractable
  • The constraint set of mean parameters is hard to characterize
  • Hence we need to approximate
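Compute the Conjugate Dual
• Given an exponential family: $p(x_1,\dots,x_m;\theta) = \exp\{\sum_{i=1}^d\theta_i\phi_i(x) - A(\theta)\}$
• The dual function: $A^*(\mu) := \sup_{\theta\in\Omega}\{\langle\mu,\theta\rangle - A(\theta)\}$
• Stationary condition: $\mu - \nabla A(\theta) = 0$
• Derivatives of A yield the mean parameters: $\frac{\partial A}{\partial\theta_i}(\theta) = \mathbb{E}_\theta[\phi_i(X)] = \int\phi_i(x)\,p(x;\theta)\,dx$
• The stationary condition becomes $\mu = \mathbb{E}_\theta[\phi(X)]$
• For which μ do we have a solution θ(μ)?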
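Compute the Conjugate Dual
• Let's assume there is a solution θ(μ) such that $\mu = \mathbb{E}_{\theta(\mu)}[\phi(X)]$
• The dual has the form
  $A^*(\mu) = \langle\theta(\mu),\mu\rangle - A(\theta(\mu)) = \mathbb{E}_{\theta(\mu)}[\langle\theta(\mu),\phi(X)\rangle - A(\theta(\mu))] = \mathbb{E}_{\theta(\mu)}[\log p(X;\theta(\mu))]$
• The entropy is defined as $H(p(x)) = -\int p(x)\log p(x)\,dx$
• So the dual is $A^*(\mu) = -H(p(x;\theta(\mu)))$ when there is a solution θ(μ)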
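A quick check of A*(μ) = −H on the two-node model with φ(x) = (x1, x2, x1x2) (our example; the θ values are arbitrary): when μ is produced from θ as μ = E_θ[φ(X)], θ itself is a solution θ(μ), so ⟨θ, μ⟩ − A(θ) should equal the negative entropy of p(x; θ).

```python
import itertools
import math

STATES = list(itertools.product([0, 1], repeat=2))


def phi(x):
    return (x[0], x[1], x[0] * x[1])


def distribution(theta):
    # p(x; theta) = exp{<theta, phi(x)> - A(theta)} over the four states; also returns A(theta)
    weights = {x: math.exp(sum(t * f for t, f in zip(theta, phi(x)))) for x in STATES}
    Z = sum(weights.values())
    return {x: w / Z for x, w in weights.items()}, math.log(Z)


theta = [0.7, -1.2, 1.5]
p, A = distribution(theta)
mu = [sum(p[x] * phi(x)[i] for x in STATES) for i in range(3)]  # mu = E_theta[phi(X)]

dual_value = sum(t * m for t, m in zip(theta, mu)) - A  # <theta(mu), mu> - A(theta(mu))
neg_entropy = sum(q * math.log(q) for q in p.values())  # -H(p(x; theta(mu)))
print(dual_value, neg_entropy)
```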
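Complexity of Computing Conjugate Dual
• The dual function is implicitly defined: $A^*(\mu) = -H(p(x;\theta(\mu)))$, where θ(μ) solves $\mathbb{E}_\theta[\phi(X)] = \mu$
• Solving the inverse mapping $\mu = \mathbb{E}_\theta[\phi(X)]$ for θ(μ) is non-trivial
• Evaluating the negative entropy requires high-dimensional integration (summation)
• For which μ does it have a solution θ(μ)? That is, what is the domain of A*(μ)?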
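Marginal Polytope
• For any distribution p(x) and a set of sufficient statistics φ(x), define a vector of mean parameters $\mu_i = \mathbb{E}_p[\phi_i(X)] = \int\phi_i(x)\,p(x)\,dx$
  • p(x) is not necessarily an exponential family
• The set of all realizable mean parameters is a convex set: $\mathcal{M} := \{\mu\in\mathbb{R}^d : \exists p \text{ s.t. } \mathbb{E}_p[\phi(X)] = \mu\}$ (a mixture of two distributions realizes the corresponding convex combination of their mean parameters)
• For discrete exponential families, this is called the marginal polytope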
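Convex Polytope
• Convex hull representation: $\mathcal{M} = \mathrm{conv}\{\phi(x) : x\in\mathcal{X}^m\}$
• Half-space representation: $\mathcal{M} = \{\mu\in\mathbb{R}^d : \langle a_j,\mu\rangle\ge b_j,\ \forall j\in\mathcal{J}\}$
  • Minkowski-Weyl theorem: any non-empty convex polytope can be characterized by a finite collection of linear inequality constraints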
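Example: Two-node Ising Model
• Sufficient statistics: $\phi(x) = (x_1, x_2, x_1x_2)$, $x_1,x_2\in\{0,1\}$
• Mean parameters: $\mu_1 = p(X_1=1)$, $\mu_2 = p(X_2=1)$, $\mu_{12} = p(X_1=1, X_2=1)$
• Two-node Ising model:
  • Convex hull representation: $\mathcal{M} = \mathrm{conv}\{(0,0,0), (1,0,0), (0,1,0), (1,1,1)\}$
  • Half-space representation: $\mu_{12}\ge 0$, $\mu_1-\mu_{12}\ge 0$, $\mu_2-\mu_{12}\ge 0$, $1-\mu_1-\mu_2+\mu_{12}\ge 0$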
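The two representations can be compared directly. In the sketch below (function names are hypothetical), the four half-space inequalities are exactly the statements that the four joint probabilities p(1,1), p(1,0), p(0,1), p(0,0) recovered from μ are nonnegative, which is why they characterize realizability for this model.

```python
def joint_from_mean_parameters(mu):
    # invert mu = (p(X1=1), p(X2=1), p(X1=1, X2=1)) to the four joint probabilities
    mu1, mu2, mu12 = mu
    return {(1, 1): mu12, (1, 0): mu1 - mu12, (0, 1): mu2 - mu12, (0, 0): 1 - mu1 - mu2 + mu12}


def in_marginal_polytope(mu, tol=1e-12):
    # half-space representation: all four recovered probabilities are nonnegative
    return all(v >= -tol for v in joint_from_mean_parameters(mu).values())


vertices = [(0, 0, 0), (1, 0, 0), (0, 1, 0), (1, 1, 1)]  # phi(x) for the four states
print([in_marginal_polytope(v) for v in vertices])  # every vertex is realizable
print(in_marginal_polytope((1, 1, 0)))  # not realizable: p(0,0) would be -1
print(in_marginal_polytope((0.5, 0.5, 0.6)))  # not realizable: mu12 > mu1
```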
Marginal Polytope for General Graphs
• Still doable for connected binary graphs with 3 nodes: 16 constraints
• For tree graphical models, the number of half-spaces grows only linearly in the graph size
• General graphs?
  • Extremely hard to characterize the marginal polytope
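Variational Principle
• The dual function takes the form $A^*(\mu) = -H(p(x;\theta(\mu)))$ if $\mu\in\mathcal{M}^\circ$ (the interior of $\mathcal{M}$), and $A^*(\mu) = +\infty$ if $\mu\notin\overline{\mathcal{M}}$
• The log partition function has the variational form $A(\theta) = \sup_{\mu\in\mathcal{M}}\{\theta^\top\mu - A^*(\mu)\}$
• For all $\theta\in\Omega$, the above optimization problem is attained uniquely at $\mu(\theta)\in\mathcal{M}^\circ$ that satisfies $\mu(\theta) = \mathbb{E}_\theta[\phi(X)]$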
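Example: Two-node Ising Model
• The distribution: $p(x;\theta)\propto\exp\{\theta_1x_1 + \theta_2x_2 + \theta_{12}x_1x_2\}$
• Sufficient statistics: $\phi(x) = (x_1, x_2, x_1x_2)$
• The marginal polytope is characterized by $\mu_1\ge\mu_{12}$, $\mu_2\ge\mu_{12}$, $\mu_{12}\ge 0$, $1+\mu_{12}-\mu_1-\mu_2\ge 0$
• The dual has an explicit form:
  $A^*(\mu) = \mu_{12}\log\mu_{12} + (\mu_1-\mu_{12})\log(\mu_1-\mu_{12}) + (\mu_2-\mu_{12})\log(\mu_2-\mu_{12}) + (1+\mu_{12}-\mu_1-\mu_2)\log(1+\mu_{12}-\mu_1-\mu_2)$
• The variational problem is
  $A(\theta) = \max_{\mu\in\mathcal{M}}\{\theta_1\mu_1 + \theta_2\mu_2 + \theta_{12}\mu_{12} - A^*(\mu)\}$
• The optimum is attained at
  $\mu_1(\theta) = \frac{\exp\{\theta_1\} + \exp\{\theta_1+\theta_2+\theta_{12}\}}{1 + \exp\{\theta_1\} + \exp\{\theta_2\} + \exp\{\theta_1+\theta_2+\theta_{12}\}}$ (and analogously for $\mu_2$ and $\mu_{12}$)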
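The sketch below checks the variational principle for this model numerically (θ values and helper names are ours): the objective ⟨θ, μ⟩ − A*(μ) evaluated at μ(θ) should equal A(θ) = log Z, and at any other realizable μ (here generated from random joint distributions) it should be no larger.

```python
import math
import random


def neg_entropy(mu):
    # A*(mu) for the two-node model: sum of q log q over the four joint probabilities
    mu1, mu2, mu12 = mu
    probs = (mu12, mu1 - mu12, mu2 - mu12, 1 - mu1 - mu2 + mu12)
    return sum(q * math.log(q) for q in probs if q > 0)


def objective(theta, mu):
    return sum(t * m for t, m in zip(theta, mu)) - neg_entropy(mu)


theta1, theta2, theta12 = 0.4, -0.8, 1.3
theta = (theta1, theta2, theta12)
w11 = math.exp(theta1 + theta2 + theta12)
Z = 1 + math.exp(theta1) + math.exp(theta2) + w11
mu_star = ((math.exp(theta1) + w11) / Z, (math.exp(theta2) + w11) / Z, w11 / Z)  # mu(theta)

random.seed(0)
worst_gap = math.inf
for _ in range(1000):
    # a random joint distribution over (0,0), (0,1), (1,0), (1,1) gives a realizable mu
    w = [random.random() for _ in range(4)]
    total = sum(w)
    p00, p01, p10, p11 = (v / total for v in w)
    mu = (p10 + p11, p01 + p11, p11)
    worst_gap = min(worst_gap, math.log(Z) - objective(theta, mu))
print(objective(theta, mu_star), math.log(Z), worst_gap)
```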
Variational Principle
• Exact variational formulation: $A(\theta) = \sup_{\mu\in\mathcal{M}}\{\theta^\top\mu - A^*(\mu)\}$
• Mean field method: non-convex inner bound and exact form of entropy
• Bethe approximation and loopy BP: polyhedral outer bound and non-convex Bethe approximation
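Belief Propagation Algorithm
• Message passing rule: $M_{ts}(x_s)\leftarrow\kappa\sum_{x_t'}\Big\{\exp\{\theta_{st}(x_s,x_t') + \theta_t(x_t')\}\prod_{u\in N(t)\setminus s}M_{ut}(x_t')\Big\}$
• Marginals: $\mu_s(x_s) = \kappa\exp\{\theta_s(x_s)\}\prod_{t\in N(s)}M_{ts}(x_s)$ (κ denotes a normalization constant)
• Exact for trees, but approximate for loopy graphs
• How does this relate to the variational principle? For trees? For generic graphs?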
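A compact sketch of the two update rules on a small binary tree (the tree, the parameter values, and the helper names are all hypothetical). Messages are updated in parallel and normalized at every step (the κ above); on a tree they reach a fixed point after a number of rounds equal to the diameter, after which the BP marginals should coincide with brute-force marginals.

```python
import itertools
import math

NODES = [0, 1, 2, 3]
EDGES = [(0, 1), (0, 2), (2, 3)]  # a tree
THETA_NODE = {0: [0.0, 0.5], 1: [0.2, -0.3], 2: [0.0, 1.0], 3: [-0.4, 0.1]}
THETA_EDGE = {(0, 1): [[0.8, -0.2], [-0.2, 0.8]],
              (0, 2): [[0.0, 0.5], [0.5, -1.0]],
              (2, 3): [[1.2, 0.0], [0.0, 1.2]]}
NEIGHBORS = {s: [] for s in NODES}
for s, t in EDGES:
    NEIGHBORS[s].append(t)
    NEIGHBORS[t].append(s)


def theta_edge(s, t, xs, xt):
    # edge parameters are stored once per undirected edge; look up either orientation
    if (s, t) in THETA_EDGE:
        return THETA_EDGE[(s, t)][xs][xt]
    return THETA_EDGE[(t, s)][xt][xs]


def belief_propagation(num_iters=10):
    # messages[(t, s)][x_s] = M_ts(x_s), initialised to 1
    messages = {(t, s): [1.0, 1.0] for s in NODES for t in NEIGHBORS[s]}
    for _ in range(num_iters):
        new = {}
        for (t, s) in messages:
            m = []
            for xs in (0, 1):
                # sum_{x_t} exp(theta_st + theta_t) * prod_{u in N(t)\s} M_ut(x_t)
                m.append(sum(
                    math.exp(theta_edge(s, t, xs, xt) + THETA_NODE[t][xt])
                    * math.prod(messages[(u, t)][xt] for u in NEIGHBORS[t] if u != s)
                    for xt in (0, 1)))
            new[(t, s)] = [v / sum(m) for v in m]  # normalization (kappa)
        messages = new
    marginals = {}
    for s in NODES:
        b = [math.exp(THETA_NODE[s][xs]) * math.prod(messages[(t, s)][xs] for t in NEIGHBORS[s])
             for xs in (0, 1)]
        marginals[s] = [v / sum(b) for v in b]
    return marginals


def brute_force_marginals():
    weights = {}
    for x in itertools.product((0, 1), repeat=len(NODES)):
        score = sum(THETA_NODE[s][x[s]] for s in NODES)
        score += sum(THETA_EDGE[(s, t)][x[s]][x[t]] for s, t in EDGES)
        weights[x] = math.exp(score)
    Z = sum(weights.values())
    return {s: [sum(w for x, w in weights.items() if x[s] == v) / Z for v in (0, 1)] for s in NODES}


print(belief_propagation())
print(brute_force_marginals())
```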
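Tree Graphical Models
• Discrete variables $X_s\in\{0,1,\dots,m_s-1\}$ on a tree $T = (V,E)$
• Sufficient statistics: $\mathbb{I}_j(x_s)$ for $s\in V$, $j\in\mathcal{X}_s$, and $\mathbb{I}_{jk}(x_s,x_t)$ for $(s,t)\in E$, $(j,k)\in\mathcal{X}_s\times\mathcal{X}_t$
• Exponential representation of the distribution: $p(x;\theta)\propto\exp\{\sum_{s\in V}\theta_s(x_s) + \sum_{(s,t)\in E}\theta_{st}(x_s,x_t)\}$, where $\theta_s(x_s) := \sum_{j\in\mathcal{X}_s}\theta_{s;j}\mathbb{I}_j(x_s)$ and similarly for $\theta_{st}(x_s,x_t)$
• Mean parameters are marginal probabilities: $\mu_{s;j} = \mathbb{E}_p[\mathbb{I}_j(X_s)] = p(X_s=j)$ and $\mu_{st;jk} = \mathbb{E}_p[\mathbb{I}_{jk}(X_s,X_t)] = p(X_s=j, X_t=k)$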
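Marginal Polytope for Trees
• Marginal polytope for general graphs: $\mathcal{M}(G) = \{\mu\in\mathbb{R}^d : \exists p \text{ with marginals } \mu_{s;j}, \mu_{st;jk}\}$
• By the junction tree theorem, for a tree T we have: $\mathcal{M}(T) = \{\mu\ge 0 : \sum_{x_s}\mu_s(x_s) = 1,\ \sum_{x_t}\mu_{st}(x_s,x_t) = \mu_s(x_s)\}$
• If $\mu\in\mathcal{M}(T)$, then $p_\mu(x) := \prod_{s\in V}\mu_s(x_s)\prod_{(s,t)\in E}\frac{\mu_{st}(x_s,x_t)}{\mu_s(x_s)\mu_t(x_t)}$ is a distribution whose marginals are μ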
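Decomposition of Entropy for Trees
• For trees, the entropy decomposes as follows (its negative is our dual $A^*(\mu)$):
  $H(p(x;\mu)) = -\sum_x p(x;\mu)\log p(x;\mu) = \sum_{s\in V}H_s(\mu_s) - \sum_{(s,t)\in E}I_{st}(\mu_{st})$
  where $H_s(\mu_s) = -\sum_{x_s}\mu_s(x_s)\log\mu_s(x_s)$ is the node entropy and $I_{st}(\mu_{st}) = \sum_{x_s,x_t}\mu_{st}(x_s,x_t)\log\frac{\mu_{st}(x_s,x_t)}{\mu_s(x_s)\mu_t(x_t)}$ is the mutual information along edge (s,t)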
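The decomposition can be verified numerically on a small chain, which is a tree (parameters drawn at random with a fixed seed; all names are ours): the joint entropy computed by brute force should equal Σ_s H_s − Σ_(s,t) I_st computed from the node and edge marginals alone.

```python
import itertools
import math
import random

random.seed(1)
NODES = [0, 1, 2]
EDGES = [(0, 1), (1, 2)]  # a chain, i.e. a tree
theta_node = {s: [random.uniform(-1, 1) for _ in range(2)] for s in NODES}
theta_edge = {e: [[random.uniform(-1, 1) for _ in range(2)] for _ in range(2)] for e in EDGES}

# joint p(x) ∝ exp{sum_s theta_s(x_s) + sum_(s,t) theta_st(x_s, x_t)}, Markov w.r.t. the chain
weights = {}
for x in itertools.product((0, 1), repeat=3):
    score = sum(theta_node[s][x[s]] for s in NODES)
    score += sum(theta_edge[(s, t)][x[s]][x[t]] for s, t in EDGES)
    weights[x] = math.exp(score)
Z = sum(weights.values())
p = {x: w / Z for x, w in weights.items()}


def node_marginal(s):
    return [sum(q for x, q in p.items() if x[s] == v) for v in (0, 1)]


def edge_marginal(s, t):
    return [[sum(q for x, q in p.items() if x[s] == a and x[t] == b) for b in (0, 1)] for a in (0, 1)]


def entropy(probs):
    return -sum(q * math.log(q) for q in probs if q > 0)


def mutual_information(s, t):
    mu_s, mu_t, mu_st = node_marginal(s), node_marginal(t), edge_marginal(s, t)
    return sum(mu_st[a][b] * math.log(mu_st[a][b] / (mu_s[a] * mu_t[b]))
               for a in (0, 1) for b in (0, 1) if mu_st[a][b] > 0)


joint_entropy = entropy(p.values())
tree_entropy = (sum(entropy(node_marginal(s)) for s in NODES)
                - sum(mutual_information(s, t) for s, t in EDGES))
print(joint_entropy, tree_entropy)
```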
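Exact Variational Principle for Trees
• Variational formulation: $A(\theta) = \max_{\mu\in\mathcal{M}(T)}\{\langle\theta,\mu\rangle + \sum_{s\in V}H_s(\mu_s) - \sum_{(s,t)\in E}I_{st}(\mu_{st})\}$
• Assign a Lagrange multiplier $\lambda_{ss}$ for the normalization constraint $C_{ss}(\mu) := 1 - \sum_{x_s}\mu_s(x_s) = 0$, and $\lambda_{ts}(x_s)$ for each marginalization constraint $C_{ts}(x_s;\mu) := \mu_s(x_s) - \sum_{x_t}\mu_{st}(x_s,x_t) = 0$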
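Lagrangian Derivation
• Taking the derivatives of the Lagrangian $\mathcal{L}(\mu,\lambda)$ w.r.t. $\mu_s$ and $\mu_{st}$ (C and C′ are constants):
  $\frac{\partial\mathcal{L}}{\partial\mu_s(x_s)} = \theta_s(x_s) - \log\mu_s(x_s) + \sum_{t\in N(s)}\lambda_{ts}(x_s) + C$
  $\frac{\partial\mathcal{L}}{\partial\mu_{st}(x_s,x_t)} = \theta_{st}(x_s,x_t) - \log\frac{\mu_{st}(x_s,x_t)}{\mu_s(x_s)\mu_t(x_t)} - \lambda_{ts}(x_s) - \lambda_{st}(x_t) + C'$
• Setting them to zero yields
  $\mu_s(x_s)\propto\exp\{\theta_s(x_s)\}\prod_{t\in N(s)}\exp\{\lambda_{ts}(x_s)\}$
  $\mu_{st}(x_s,x_t)\propto\exp\{\theta_s(x_s) + \theta_t(x_t) + \theta_{st}(x_s,x_t)\}\prod_{u\in N(s)\setminus t}\exp\{\lambda_{us}(x_s)\}\prod_{v\in N(t)\setminus s}\exp\{\lambda_{vt}(x_t)\}$
  With messages $M_{ts}(x_s) := \exp\{\lambda_{ts}(x_s)\}$, these are the BP marginal equations, and enforcing the marginalization constraint recovers the BP message update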
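BP on Arbitrary Graphs
• Two main difficulties of the variational formulation $A(\theta) = \sup_{\mu\in\mathcal{M}}\{\theta^\top\mu - A^*(\mu)\}$:
  • The marginal polytope $\mathcal{M}$ is hard to characterize, so let's use the tree-based outer bound $\mathbb{L}(G) = \{\tau\ge 0 : \sum_{x_s}\tau_s(x_s) = 1,\ \sum_{x_t}\tau_{st}(x_s,x_t) = \tau_s(x_s)\}$
  • The exact entropy lacks an explicit form, so let's approximate it using the exact expression for trees: $-A^*(\tau)\approx H_{\mathrm{Bethe}}(\tau) := \sum_{s\in V}H_s(\tau_s) - \sum_{(s,t)\in E}I_{st}(\tau_{st})$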
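Bethe Variational Problem
• Combining the two gives us the Bethe variational problem: $\max_{\tau\in\mathbb{L}(G)}\{\langle\theta,\tau\rangle + \sum_{s\in V}H_s(\tau_s) - \sum_{(s,t)\in E}I_{st}(\tau_{st})\}$
• What is happening?
  • Tree-based outer bound: $\mathcal{M}(G)\subseteq\mathbb{L}(G)$, with equality for trees; on graphs with cycles, $\mathbb{L}(G)$ also contains locally consistent pseudo-marginals that no joint distribution realizes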
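A standard "frustrated triangle" construction (our choice of example; all names are hypothetical) shows that the outer bound is strict on a cycle. Every edge marginal says its two endpoints always disagree, which is locally consistent with uniform node marginals, so τ ∈ 𝕃(G). But on a triangle at most two of the three edges can disagree in any single configuration, so every joint distribution has expected number of disagreeing edges at most 2, while τ implies 3. Hence τ ∉ 𝓜(G).

```python
import itertools

NODES = [0, 1, 2]
EDGES = [(0, 1), (1, 2), (0, 2)]  # a single cycle

# pseudo-marginals: uniform nodes, and each edge puts all mass on disagreeing pairs
tau_node = {s: [0.5, 0.5] for s in NODES}
tau_edge = {e: [[0.0, 0.5], [0.5, 0.0]] for e in EDGES}


def locally_consistent(tau_node, tau_edge, tol=1e-12):
    # membership in L(G): nonnegativity, normalization, edge-to-node marginalization
    for s, t in EDGES:
        table = tau_edge[(s, t)]
        if any(v < 0 for row in table for v in row):
            return False
        for a in (0, 1):  # summing out x_t must give tau_s
            if abs(sum(table[a]) - tau_node[s][a]) > tol:
                return False
        for b in (0, 1):  # summing out x_s must give tau_t
            if abs(sum(table[a][b] for a in (0, 1)) - tau_node[t][b]) > tol:
                return False
    return all(abs(sum(tau_node[s]) - 1) < tol and min(tau_node[s]) >= 0 for s in NODES)


# expected number of disagreeing edges implied by the pseudo-marginals
implied = sum(tau_edge[e][0][1] + tau_edge[e][1][0] for e in EDGES)
# largest number of disagreeing edges in any single configuration
max_disagreements = max(sum(x[s] != x[t] for s, t in EDGES)
                        for x in itertools.product((0, 1), repeat=3))
print(locally_consistent(tau_node, tau_edge), implied, max_disagreements)
```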
Mean Field Approximation
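Tractable Subgraphs
• For an exponential family with sufficient statistics φ defined on graph G, the set of realizable mean parameters is $\mathcal{M}(G;\phi) := \{\mu\in\mathbb{R}^d : \exists p \text{ s.t. } \mathbb{E}_p[\phi(X)] = \mu\}$
• Idea: restrict p to a subset of distributions associated with a tractable subgraph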
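Mean Field Methods
• For a given tractable subgraph F, a subset of canonical parameters is $\Omega(F) := \{\theta\in\Omega : \theta_\alpha = 0\ \ \forall\alpha\notin\mathcal{I}(F)\}$, where $\mathcal{I}(F)$ indexes the sufficient statistics on cliques of F
• Inner approximation: $\mathcal{M}_F^\circ(G;\phi) := \{\tau\in\mathbb{R}^d : \tau = \mathbb{E}_\theta[\phi(X)] \text{ for some } \theta\in\Omega(F)\}\subseteq\mathcal{M}^\circ(G;\phi)$
• Mean field solves the relaxed problem: $\max_{\tau\in\mathcal{M}_F^\circ(G)}\{\langle\tau,\theta\rangle - A_F^*(\tau)\}$
• $A_F^* = A^*|_{\mathcal{M}_F^\circ(G)}$ is the exact dual function restricted to $\mathcal{M}_F^\circ(G)$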
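Example: Naïve Mean Field for Ising Model
• Ising model in {0,1} representation: $p(x)\propto\exp\{\sum_{s\in V}x_s\theta_s + \sum_{(s,t)\in E}x_sx_t\theta_{st}\}$
• Mean parameters: $\mu_s = \mathbb{E}_p[X_s] = P(X_s=1)$ for all $s\in V$, and $\mu_{st} = \mathbb{E}_p[X_sX_t] = P[(X_s,X_t)=(1,1)]$ for all $(s,t)\in E$
• For the fully disconnected graph F: $\mathcal{M}_F^\circ(G) := \{\tau\in\mathbb{R}^{|V|+|E|} : 0\le\tau_s\le 1\ \forall s\in V,\ \tau_{st} = \tau_s\tau_t\ \forall(s,t)\in E\}$
• The dual decomposes into a sum, one term for each node: $A_F^*(\tau) = \sum_{s\in V}[\tau_s\log\tau_s + (1-\tau_s)\log(1-\tau_s)]$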
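Example: Naïve Mean Field for Ising Model
• Mean field problem: $A(\theta)\ge\max_{(\tau_1,\dots,\tau_m)\in[0,1]^m}\{\sum_{s\in V}\theta_s\tau_s + \sum_{(s,t)\in E}\theta_{st}\tau_s\tau_t - A_F^*(\tau)\}$
• The same objective function as in the free-energy-based approach
• The naïve mean field update equations: $\tau_s\leftarrow\sigma\big(\theta_s + \sum_{t\in N(s)}\theta_{st}\tau_t\big)$, where $\sigma(z) = 1/(1+e^{-z})$
• The optimal value is a lower bound on the log partition function $A(\theta)$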
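A minimal sketch of naïve mean field by coordinate ascent with the update above, on a hypothetical Ising model over a 4-cycle (graph, parameters, and helper names are ours). It compares the resulting objective value with the exact log partition function obtained by enumeration; the mean field value should never exceed it.

```python
import itertools
import math

NODES = [0, 1, 2, 3]
theta_s = [0.3, -0.2, 0.5, 0.1]
theta_st = {(0, 1): 0.8, (1, 2): -0.6, (2, 3): 0.7, (3, 0): 0.4}  # a 4-cycle


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def exact_log_partition():
    total = 0.0
    for x in itertools.product((0, 1), repeat=len(NODES)):
        score = sum(theta_s[s] * x[s] for s in NODES)
        score += sum(w * x[s] * x[t] for (s, t), w in theta_st.items())
        total += math.exp(score)
    return math.log(total)


def binary_entropy(m):
    return -sum(q * math.log(q) for q in (m, 1.0 - m) if q > 0)


def mean_field_objective(tau):
    # sum_s theta_s tau_s + sum_st theta_st tau_s tau_t - A_F*(tau)
    linear = sum(theta_s[s] * tau[s] for s in NODES)
    linear += sum(w * tau[s] * tau[t] for (s, t), w in theta_st.items())
    return linear + sum(binary_entropy(m) for m in tau)


def local_field(tau, s):
    # theta_s + sum_{t in N(s)} theta_st tau_t
    field = theta_s[s]
    for (a, b), w in theta_st.items():
        if a == s:
            field += w * tau[b]
        elif b == s:
            field += w * tau[a]
    return field


def naive_mean_field(num_sweeps=200):
    tau = [0.5] * len(NODES)
    for _ in range(num_sweeps):
        for s in NODES:  # coordinate ascent: update one node at a time
            tau[s] = sigmoid(local_field(tau, s))
    return tau


tau = naive_mean_field()
lower_bound = mean_field_objective(tau)
log_Z = exact_log_partition()
print(tau, lower_bound, log_Z)
```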
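Geometry of Mean Field
• Mean field optimization is always non-convex for any exponential family in which the state space $\mathcal{X}^m$ is finite
• The marginal polytope is a convex hull: $\mathcal{M}(G) = \mathrm{conv}\{\phi(e) : e\in\mathcal{X}^m\}$
• $\mathcal{M}_F(G)$ contains all the extreme points; if it is a strict subset of $\mathcal{M}(G)$, then it must be non-convex
• Example: two-node Ising, where $\mathcal{M}_F(G) = \{\tau : 0\le\tau_1\le 1,\ 0\le\tau_2\le 1,\ \tau_{12} = \tau_1\tau_2\}$
  • Parabolic cross-section along $\tau_1 = \tau_2$ (there $\tau_{12} = \tau_1^2$)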
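The non-convexity argument can be made concrete for the two-node example (function names are hypothetical). The extreme points (0, 0, 0) and (1, 1, 1) both correspond to product distributions, so they lie in 𝓜_F(G); their midpoint (0.5, 0.5, 0.5) lies in the marginal polytope but not in 𝓜_F(G), since 0.5 ≠ 0.5 · 0.5.

```python
def in_product_set(tau, tol=1e-12):
    # M_F(G) for the fully disconnected F: tau_12 must equal tau_1 * tau_2
    tau1, tau2, tau12 = tau
    return 0 <= tau1 <= 1 and 0 <= tau2 <= 1 and abs(tau12 - tau1 * tau2) < tol


def in_marginal_polytope(mu, tol=1e-12):
    # half-space representation of M(G) for the two-node Ising model
    mu1, mu2, mu12 = mu
    return min(mu12, mu1 - mu12, mu2 - mu12, 1 - mu1 - mu2 + mu12) >= -tol


a, b = (0.0, 0.0, 0.0), (1.0, 1.0, 1.0)  # two extreme points, both product distributions
mid = tuple((u + v) / 2 for u, v in zip(a, b))
print(in_product_set(a), in_product_set(b))  # both in M_F(G)
print(in_marginal_polytope(mid), in_product_set(mid))  # in M(G) but not in M_F(G)
```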
Summary
• Variational methods in general turn inference into an optimization problem via exponential families and convex duality
• The exact variational principle is intractable to solve; two approximations:
  • Either an inner or an outer bound to the marginal polytope
  • Various approximations to the entropy function
• Mean field: non-convex inner bound and exact form of entropy
• BP: polyhedral outer bound and non-convex Bethe approximation