Page 1: Probabilistic Inference by Hashing and Optimization

Probabilistic Inference by Hashing and

Optimization

Stefano Ermon

Stanford University

Page 2

Challenges in reasoning about complex systems

(surrounding fields: Operations Research, Management Science, Economics, Statistics, Computer Science, Engineering, Physics)

Three major challenges:

1. High-dimensional spaces: need to consider many variables

2. Uncertainty: limited information, need to use stochastic models

3. Preferences and utilities: need to consider optimization criteria

Page 3

Decision Making and Inference under Limited Information and Large Dimensionality

(diagram: the three axes High-dimensional Spaces, Lack of Information/Uncertainty, and Preferences and Utilities span Optimization, Combinatorial Reasoning, Combinatorial Optimization, Stochastic Optimization, Probabilistic Reasoning, Decision-making, and Statistics)

Page 4

Decision Making and Inference for Sustainability

(diagram mapping the three challenges — Lack of Information/Uncertainty, High-dimensional Spaces, Preferences and Utilities — to applications)

• fuel cell materials [SAT-12, AAAI-15, IJCAI-15]

• EV batteries [ECML-12]

• poverty traps [AAAI-15]

• poverty mapping [ongoing]

• natural resources management [UAI-10, IJCAI-11]

• habitat monitoring [ongoing]

Page 5

Research Agenda

Foundations:

• inference by hashing and optimization [ICML-13, UAI-13, ICML-14, ICML-15, UAI-15, AAAI-16]

• bridging statistical physics and computer science [CP-10, IJCAI-11, NIPS-11, NIPS-12]

• sampling [NIPS-13, UAI-12]

Applications:

• high-throughput data mining and catalyst discovery [SAT-12, AAAI-15, IJCAI-15]

• poverty mapping [AAAI-16]

• natural resources management [UAI-10, IJCAI-11]

Page 6

Outline

1. Introduction

2. WISH: probabilistic inference by hashing and optimization

I. Formalism

II. Empirical validation

3. Families of hash functions

4. Random projections and hashing

5. Conclusions

Page 7

Statistical Modeling and Probabilistic Inference

Statistical Modeling:

• Stochastic variables

• Models are probability distributions

• Goal is to make inferences/predictions. Examples:

– What is the probability that an object is in a certain location, given some noisy sensor measurements?

– How well will this policy perform on average?

• Inference is also needed to learn models from data

Probabilistic inference is a cornerstone of statistical machine learning and statistical applications

Page 8

Progress So Far

Fundamental problem: studied since the early days of computing (Manhattan Project – diffusion behavior of nuclear particles)

Much progress has been made (contributions from CS, physics, statistics, etc.). To date, two main approaches:

– Markov Chain Monte Carlo sampling (ranked among the top 10 algorithms of all time)

– Variational methods

This work: a fundamentally different approach based on hashing and optimization [Chakraborty, Fremont, Meel, Seshia, and Vardi; Achlioptas et al.; Belle et al.; Gomes et al., …]

Page 9

A Worst-Case Perspective

Complexity hierarchy, from easy to hard: P ⊆ NP ⊆ PH ⊆ P^#P ⊆ PSPACE ⊆ EXP

NP-complete/hard: combinatorial reasoning (SAT, puzzles, Integer Programming (MIP)) – high-dimensional spaces

#P-complete/hard: probabilistic reasoning (#SAT, sampling) – high-dimensional spaces plus uncertainty

Note: this is the widely believed hierarchy; we only know P ≠ EXP for sure

Page 10

Inference by Hashing and Optimization

Main technical contribution: if we allow

– a small failure probability (e.g., 0.1%)

– a small error (e.g., 1% relative error)

we can reduce probabilistic reasoning (#P) to a small number of NP-equivalent queries (combinatorial search/optimization).

Search: Is there a way to set the variables such that some property holds?

Integration: How many ways are there to set the variables such that a certain property holds?

Page 11

A Practical Perspective

• Progress in combinatorial search since the 1990s:

– Boolean Satisfiability (SAT) / SMT

– Mixed Integer Programming (MIP)

• Problem size: from 100 variables and 200 constraints (early 90s) to 1,000,000 variables and 5,000,000 constraints in 20 years

• The time is ripe for probabilistic inference applications: techniques traditionally used for deductive reasoning (verification, synthesis) are brought to the realm of probability and inductive reasoning

Page 12

Probabilistic Inference

For any configuration (or state), defined by an assignment of values to the random variables, we can compute the weight/probability of that configuration.

(Bayes net: RAIN → SPRINKLER; RAIN and SPRINKLER → GRASS WET)

RAIN: True 0.2, False 0.8

SPRINKLER given RAIN:
  Rain=True:  True 0.01, False 0.99
  Rain=False: True 0.4,  False 0.6

WET given SPRINKLER and RAIN (factor; only one row shown):
  True 0.99, False 0.01 (remaining rows elided)

Example: Pr[Rain=T, Sprinkler=T, Wet=T] ∝ 0.01 * 0.2 * 0.99
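The product-of-factors computation above fits in a few lines of Python. A minimal sketch using only the CPT entries shown on this slide:

```python
# Unnormalized weight of the configuration (Rain=T, Sprinkler=T, Wet=T),
# computed as the product of the relevant factor entries from the slide.
p_rain_T = 0.2        # Pr[Rain = T]
p_sprinkler_T = 0.01  # Pr[Sprinkler = T | Rain = T]
p_wet_T = 0.99        # Pr[Wet = T | Sprinkler = T, Rain = T]

weight = p_rain_T * p_sprinkler_T * p_wet_T
print(weight)  # 0.01 * 0.2 * 0.99 = 0.00198
```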

Page 13

Probabilistic Inference

What is the (conditional) probability of an event? For example,

Pr[Wet=T] = ∑x={T,F} ∑y={T,F} Pr[Rain=x, Sprinkler=y, Wet=T]

involves expectations/nested sums (sum up contributions from all possible variable assignments)

Page 14

Probabilistic Inference in High Dimensions

• We are given:

– a set of 2^n configurations (= assignments)

– non-negative weights w (from a Bayes net, factor graph, weighted CNF, …)

• Goal: compute the total weight

• Example 1: n=2 binary variables, sum over 4 configurations:

P[Wet=T] = ∑x={T,F} ∑y={T,F} P[Rain=x, Sprinkler=y, Wet=T] = 0 + 0.15 + 0.29 + 0.002 = 0.442

(bar chart over the four configurations: Rain=F,Sprinkler=F: 0; Rain=T,Sprinkler=F: 0.15; Rain=F,Sprinkler=T: 0.29; Rain=T,Sprinkler=T: 0.002)

• Example 2: n=100 variables, sum over 2^100 ≈ 10^30 terms (curse of dimensionality; generally #P-complete)
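A quick numeric check of Example 1, using the rounded per-configuration weights given on the slide:

```python
# The four configuration weights for Wet=T, as shown on the slide
# (rounded values, one per Rain/Sprinkler combination).
weights = {
    ("Rain=F", "Sprinkler=F"): 0.0,
    ("Rain=T", "Sprinkler=F"): 0.15,
    ("Rain=F", "Sprinkler=T"): 0.29,
    ("Rain=T", "Sprinkler=T"): 0.002,
}
total = sum(weights.values())
print(total)  # ~0.442, matching the slide
```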

Page 15

A New Approach: WISH

Probabilistic model with 10 binary variables (x1, x2, …, x10); 1024 configurations sorted by increasing weight.

(chart: every weight is unknown: ? ? … ?)

Page 16

A New Approach: WISH

Probabilistic model with 10 binary variables; 1024 configurations sorted by increasing weight.

Use optimization to find the most likely configuration (NP-equivalent): its weight is 50.

(chart, increasing weight: ? ? … ? 50)

Page 17

A New Approach: WISH

Use optimization to find the least likely configuration: its weight is 0.

(chart, increasing weight: 0 ? ? … ? 50)

Page 18

A New Approach: WISH

Each of the remaining configurations has weight between 0 and 50.

(chart, increasing weight: 0, 0≤?≤50 … 0≤?≤50, 50)

Page 19

A New Approach: WISH

The weight distribution could be like this: almost all configurations have weight 0.

(chart: 0 0 … 0 50)

Total weight = 1023 * 0 + 50 = 50

Page 20

A New Approach: WISH

Or it could be like this: almost all configurations have weight 50.

(chart: 0 50 50 … 50)

Total weight = 1023 * 50 + 0 = 51150

So far we only know: 50 <= total weight <= 1023 * 50

Page 21

A New Approach: WISH

Find the weight of the 2nd most likely configuration (how: explained later): it is 40. Each remaining configuration now has weight between 0 and 40.

(chart, increasing weight: 0, 0≤?≤40 … 0≤?≤40, 40, 50)

90 <= total weight <= 50 + 40*1022

Page 22

A New Approach: WISH

Find the weight of the 4th most likely configuration (how: explained later): it is 9.

(chart, increasing weight: 0, 0≤?≤9 … 0≤?≤9, 9, 9≤?≤40, 40, 50)

90 + 9*2 <= total weight <= 50 + 40*2 + 9*1020

Page 23

A New Approach: WISH

Find the weight of the 8th most likely configuration (how: explained later): it is 5.

(chart, increasing weight: 0, 0≤?≤5 … 0≤?≤5, 5, 5≤?≤9, 5≤?≤9, 9, 9≤?≤40, 40, 50)

90 + 9*2 + 5*4 <= total weight <= 50 + 40*2 + 9*4 + 5*1016

Page 24

A New Approach: WISH

If we have the

• weight of the most likely configuration,

• weight of the 2nd most likely configuration,

• weight of the 4th most likely configuration,

• …

• weight of the 512th most likely configuration

(in general, the weight of the 2^i-th most likely configuration), we can show:

Factor-of-2 approximation of the total weight
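The factor-of-2 claim can be checked numerically. In the sketch below the weight table is an explicit toy array and the 2^i-th largest weights b_i are read off exactly (in WISH they are estimated via hashing); bracketing each rank block between its two bounding quantiles gives lower and upper bounds that differ by at most a factor of 2:

```python
import random

random.seed(0)
n = 10
N = 2 ** n                                   # 1024 configurations
w = sorted((random.random() ** 4 for _ in range(N)), reverse=True)
Z = sum(w)                                   # true total weight

# b[i] = weight of the 2^i-th most likely configuration.
b = [w[2 ** i - 1] for i in range(n + 1)]

# Configurations with rank in (2^(i-1), 2^i] each weigh between b[i]
# and b[i-1], and there are 2^(i-1) of them.
lower = b[0] + sum(b[i] * 2 ** (i - 1) for i in range(1, n + 1))
upper = b[0] + sum(b[i - 1] * 2 ** (i - 1) for i in range(1, n + 1))

assert lower <= Z <= upper
assert upper <= 2 * lower                    # bounds differ by at most 2x
print(lower, Z, upper)
```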

How can we compute the weight of the 2^i-th most likely configuration when i is large (say i = 50)?

Page 25

Random parity constraints

• XOR/parity constraints. Example: a ⊕ b ⊕ c ⊕ d = 1, satisfied if an odd number of a, b, c, d are set to 1

• For every configuration, set its weight to zero if the parity constraint is not satisfied

Randomly generated parity constraint X (each variable added with probability 0.5), e.g.: x1 ⊕ x3 ⊕ x4 ⊕ x7 ⊕ x10 = 1

Page 26

Strongly Universal Hashing

• XOR/parity constraints to the rescue: a ⊕ b ⊕ c ⊕ d = 1 is satisfied if an odd number of a, b, c, d are set to 1

Two crucial properties (strongly universal hashing [Valiant-Vazirani, Stockmeyer]):

1) Uniformity: for every configuration A, Pr[A satisfies X] = 0.5

2) Pairwise independence: for every two configurations A and B, "A satisfies X" and "B satisfies X" are independent events

Randomly generated parity constraint X (each variable added with probability 0.5), e.g.: x1 ⊕ x3 ⊕ x4 ⊕ x7 ⊕ x10 = 1
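Both properties are easy to verify empirically. A small Monte Carlo sketch (two fixed, distinct configurations A and B; each constraint includes each variable with probability ½ plus a random right-hand-side parity bit):

```python
import random

random.seed(0)
n = 10
trials = 4000

A = (0,) * n                   # two fixed, distinct configurations
B = (1,) + (0,) * (n - 1)

def satisfies(x, S, parity):
    """x satisfies the parity constraint XOR_{i in S} x_i = parity."""
    return sum(x[i] for i in S) % 2 == parity

hits_A = hits_B = hits_both = 0
for _ in range(trials):
    S = [i for i in range(n) if random.random() < 0.5]
    parity = random.randint(0, 1)
    a = satisfies(A, S, parity)
    b = satisfies(B, S, parity)
    hits_A += a
    hits_B += b
    hits_both += a and b

print(hits_A / trials, hits_B / trials, hits_both / trials)
# marginals ~0.5 (uniformity); joint ~0.25 = 0.5 * 0.5 (pairwise independence)
```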

Page 27

Universal Hashing and Optimization

• Example: suppose we want to compute the weight of the 64th most likely configuration

• After adding 6 random parity constraints we are left with 1024/64 = 1024/2^6 = 16 configurations on average

• The weight of the mode (most likely configuration) of the randomly projected distribution is "often" "close enough" to the weight of the 64th most likely configuration of the original distribution

• Use optimization to find the most likely configuration of the projected distribution

Page 28

Visual working of the algorithm

• Start from the weight function over x1, …, xn; optimization finds the mode M0

• Add 1 random parity constraint; optimization finds the mode of the projected distribution; repeat Log(n) times and take the median M1 (≈ weight of the 2nd most likely configuration)

• With 2 random parity constraints: median M2 ≈ weight of the 4th most likely configuration

• With 3 random parity constraints: median M3 ≈ weight of the 8th most likely configuration, and so on for n levels

• Final estimate for the total weight: M0 + M1×1 + M2×2 + M3×4 + …
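Putting the pieces together, the whole procedure can be sketched in Python. This is a toy version: the weight table is explicit, the NP optimization oracle is replaced by brute force, and T = 7 repetitions per level are assumed, so it only illustrates the control flow, not the scalability:

```python
import random

random.seed(0)
n = 10
N = 2 ** n
w = [random.random() ** 3 for _ in range(N)]   # toy weight function

def mode_weight(constraints):
    """Brute-force stand-in for the optimization oracle: max weight over
    configurations satisfying every parity constraint (S, parity)."""
    best = 0.0
    for x in range(N):
        bits = [(x >> i) & 1 for i in range(n)]
        if all(sum(bits[i] for i in S) % 2 == p for S, p in constraints):
            best = max(best, w[x])
    return best

T = 7                                          # repetitions per level
M = []
for i in range(n + 1):                         # level i: i parity constraints
    optima = []
    for _ in range(T):
        cons = [([v for v in range(n) if random.random() < 0.5],
                 random.randint(0, 1)) for _ in range(i)]
        optima.append(mode_weight(cons))
    M.append(sorted(optima)[T // 2])           # median M_i

# Final estimate: M0 + M1*1 + M2*2 + M3*4 + ...
estimate = M[0] + sum(M[i] * 2 ** (i - 1) for i in range(1, n + 1))
Z = sum(w)
print(estimate, Z)   # the estimate tracks Z up to a constant factor w.h.p.
```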

Page 29

Accuracy Guarantees

Main Theorem (stated informally):

With probability at least 1 - δ (e.g., 99.9%), WISH (Weighted-Sums-Hashing) computes a sum defined over 2^n configurations (probabilistic inference, #P-hard) with a relative error that can be made as small as desired, and it requires solving Θ(n log n) optimization instances (NP-equivalent problems).

Page 30

Key Features

• Strong accuracy guarantees

• Can leverage recent advances in optimization

• Decoding techniques for error correcting codes (LDPC)

• Even without optimality guarantees, bounds/approximations on the optimization translate to bounds/approximations on the weighted sum

• Straightforward parallelization (independent optimizations)

– We experimented with >600 cores

Page 31

Outline

1. Introduction

2. WISH: probabilistic inference by hashing and optimization

I. Formalism

II. Empirical validation

3. Families of hash functions

4. Random projections and hashing

5. Conclusions

Page 32

Counting via NP-Queries: Sudoku

Filling a Sudoku puzzle is a combinatorial search problem.

We are reasonably good at it.

Computers are extremely good at it.

Page 33

Counting via NP-Queries: Sudoku

• How many ways are there to fill a valid Sudoku square? There are ≈10^77 possible squares over the 81 variables x1, x2, …, x81; the weight function is 1 if the square is valid and 0 otherwise.

• WISH estimate: 1.634×10^21. A specialized proof gives 6.671×10^21.

• Traditional probabilistic inference methods fail: they cannot reason about such complex constraints.
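The 9×9 count is far beyond brute force, but the same counting question can be sanity-checked on the scaled-down 4×4 variant ("Shidoku"), which is small enough to enumerate exhaustively; its count is known to be 288:

```python
from itertools import permutations

# Exhaustive count of valid 4x4 Sudoku ("Shidoku") grids: each row is a
# permutation of 1..4; columns and the four 2x2 boxes must also contain
# each digit exactly once.
rows = list(permutations((1, 2, 3, 4)))

def cols_ok(*rs):
    # every column has pairwise-distinct entries across the given rows
    return all(len({r[c] for r in rs}) == len(rs) for c in range(4))

def boxes_ok(grid):
    # each 2x2 box contains all four digits
    return all(len({grid[r][c] for r in (br, br + 1)
                               for c in (bc, bc + 1)}) == 4
               for br in (0, 2) for bc in (0, 2))

count = 0
for r0 in rows:
    for r1 in rows:
        if not cols_ok(r0, r1):
            continue
        for r2 in rows:
            if not cols_ok(r0, r1, r2):
                continue
            for r3 in rows:
                if cols_ok(r0, r1, r2, r3) and boxes_ok((r0, r1, r2, r3)):
                    count += 1
print(count)  # 288
```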

Page 34

Inference benchmark: Ising models

(grid-structured model over variables x1, x2, x3, x4, …, xn; applications in statistical physics and computer vision)

Page 35

Inference benchmark: Ising models from physics

(plot: log normalization constant vs. number of variables, comparing WISH, Belief Propagation, a BP variant, Mean Field, and the ground truth)

Guaranteed relative error: the ground truth is provably within a small error band (too small to see at this scale).

Variational methods provide inaccurate estimates (well outside the error band).

Page 36

Inference benchmark: Ising models from physics

Page 37

Inference benchmark: Ising models


(plot: Gibbs Sampling)

Page 38

Inference benchmark: Ising models


(plot: Gibbs Sampling, Belief Propagation)

Page 39

Inference benchmark: Ising models


(plot: Gibbs Sampling, Belief Propagation, WISH)

Page 40

Outline

1. Introduction

2. WISH: probabilistic inference by hashing and optimization

I. Formalism

II. Empirical validation

3. Families of hash functions

4. Random projections and hashing

5. Conclusions

Page 41

Hashing as coin flipping

• Hashing as a distributed coin flipping mechanism

• Ideally, each configuration flips a coin independently

(figure: configurations 0000, 0001, …, 1101, 1110, 1111, each assigned a coin flip: Heads or Tails)

Page 42

Pairwise Independent Hashing

• Issue: we cannot simulate so many independent coin flips (one for each possible variable assignment)

• Solution: each configuration flips a coin pairwise independently

– Any two coin flips are independent

– Three or more might not be independent

Still works! Pairwise independence guarantees that configurations do not cluster too much in a single bucket.

Can be simulated using parity constraints: add each variable with probability ½.

However, "long" parity constraints are difficult to deal with!

Page 43

Average Universal Hashing

• New class of average universal hash functions (coin-flip generation mechanism) [ICML-14]

• Pairs of coin flips are NOT guaranteed to be independent anymore

• Key idea: look at large enough sets. If we look at all pairs, on average they are "independent enough". Main results:

1. These coin flips are good enough for probabilistic inference (applies to all previous techniques based on hashing; same theoretical guarantees)

2. Can be implemented with short parity constraints

Page 44

Short Parity Constraints for Probabilistic Inference and Model Counting

Main Theorem (stated informally) [AAAI-16]: for large enough n (= number of variables),

– parity constraints of length log(n) are sufficient;

– parity constraints of length log(n) are also necessary.

The proof borrows ideas from the theory of low-density parity check codes. Short constraints are much easier to deal with in practice.

Page 45

Model Selection

• Model selection: we have several competing statistical models. Which one fits the data better?

– Pick the one that is most likely to have generated the data

– In the majority of models of interest, probabilities are defined up to a normalization constant

– Ranking requires computing the normalization constant

• Model selection for RBM models of handwritten digits: using short constraints we get state-of-the-art estimates on Restricted Boltzmann Machines (vs. Annealed Importance Sampling) [UAI-15]

Page 46

Outline

1. Introduction

2. WISH: probabilistic inference by hashing and optimization

I. Formalism

II. Empirical validation

3. Families of hash functions

4. Random projections and hashing

5. Conclusions

Page 47

Hashing as a random projection

• XOR/parity constraints. Example: a ⊕ b ⊕ c ⊕ d = 1, satisfied if an odd number of a, b, c, d are set to 1

• Randomly generated parity constraint X (each variable added with probability 0.5), e.g.: x1 ⊕ x3 ⊕ x4 ⊕ x7 ⊕ x10 = 1

• Set the weight of a configuration to zero if the constraint is not satisfied

This random projection:

1. "simplifies" the model

2. preserves its "key properties"
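A sketch of the projection on a toy weight table: zeroing out the configurations that violate a random parity constraint halves the total weight in expectation (by uniformity), one sense in which the model's key properties are preserved:

```python
import random

random.seed(0)
n = 10
N = 2 ** n
w = [random.random() for _ in range(N)]    # toy weight table
Z = sum(w)

def project(weights, S, parity):
    """Zero out every configuration violating XOR_{i in S} x_i = parity."""
    return [wx if sum((x >> i) & 1 for i in S) % 2 == parity else 0.0
            for x, wx in enumerate(weights)]

trials = 200
avg = 0.0
for _ in range(trials):
    S = [i for i in range(n) if random.random() < 0.5]
    parity = random.randint(0, 1)
    avg += sum(project(w, S, parity)) / trials

print(avg, Z / 2)   # the average projected total weight is close to Z / 2
```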

Page 48

Variational Methods and Random Projections

Key issue with all variational methods: the I-projection of a complex, intractable model p onto a family of "simple" distributions Q gives the tractable distribution that is as close as possible to p, but this distance cannot be bounded, so the approximation can be arbitrarily bad.

Idea: take a random projection of p first, then an I-projection [AISTATS-16]. The random projection:

1. preserves the properties of p

2. yields a p' that is "simpler", and therefore guaranteed to be "close" to Q

Main result: provably good approximation with high probability

Page 49

Mean Field with Random Projections

(experiment: deep generative model of handwritten digits; standard variational approach (I-projection) vs. I-projection augmented with random projections)

Adding random projections leads to an optimization problem:

• with fewer variables

• with additional non-convex constraints

• still convex coordinate-wise

Page 50

Summary

Probabilistic inference by hashing and optimization:

– Probabilistic inference is reduced to a small number of combinatorial optimization problems via universal hashing

– Hashing acts as a random projection: it simplifies a model while preserving some of its key properties

– Can leverage existing combinatorial solvers and variational methods

– Provides formal accuracy guarantees

– Works well in practice on a range of domains

Page 51

Conclusions

Inductive reasoning: machine learning, probabilistic inference, data.

Deductive reasoning: logic, constraint optimization, background theories (physical laws).

• Foundational AI/machine learning work:

– new probabilistic inference by hashing and optimization

– learning

– decision making under uncertainty

– …

• Applications:

– incorporating background knowledge into probabilistic models to aid scientific discovery

– …

Three challenges:

1. High-dimensional spaces

2. Uncertainty

3. Preferences and utilities