Page 1:

Ahsanul Haque*, Swarup Chandra*, Latifur Khan* and Michael Baron+

* Department of Computer Science, University of Texas at Dallas
+ Department of Mathematical Sciences, University of Texas at Dallas

MapReduce Guided Approximate Inference Over Graphical Models

This material is based upon work supported by the University of Texas at Dallas.

Page 2:


Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

Page 3:


Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

Page 4:


Graphical Models

A probabilistic graphical model G is a collection of functions over a set of random variables.

Generally represented as a network of nodes:
- Each node denotes a random variable (e.g., a data feature).
- Each edge denotes a relationship between two random variables.

Two types of representations (see the sketch below):
- A Bayesian network is represented by a directed graph.
- A Markov network is represented by an undirected graph.
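As a concrete illustration (a minimal sketch with made-up numbers, not taken from the slides), the two representations can be stored directly as data structures:

# A minimal sketch (assumed toy numbers, illustration only) of the two
# representations over binary variables A and B.

# Bayesian network: directed edges; each node carries a conditional
# probability table (CPT) giving P(node = 1 | parents).
bayes_net = {
    "A": {"parents": [],    "cpt": {(): 0.3}},               # P(A=1)
    "B": {"parents": ["A"], "cpt": {(0,): 0.9, (1,): 0.2}},  # P(B=1 | A)
}

# Markov network: undirected edges; each edge carries a non-negative
# potential table (no normalization required).
markov_net = {
    frozenset({"A", "B"}): {(0, 0): 5.0, (0, 1): 1.0,
                            (1, 0): 1.0, (1, 1): 5.0},
}

print(bayes_net["B"]["cpt"][(1,)])   # P(B=1 | A=1) -> 0.2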

Page 5:


Example Graphical Model

Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE) and Maximum a Posteriori (MAP) queries.

Probability of Evidence needs to be evaluated in classification problems.

Sample factor φ(A, C):

A  C  φ(A,C)
0  0    5
0  1  100
1  0   15
1  1   20

[Figure: an example Markov network over six variables A, B, C, D, E, F, with factors φ(A,B), φ(A,C), φ(B,D), φ(C,D), φ(C,E), φ(D,F), φ(E,F).]
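For a network this small, the probability of evidence can be computed by brute-force enumeration, which makes the query concrete. The sketch below is assumed code, not the authors'; only φ(A,C) comes from the slide, and the other factor tables are uniform placeholders:

# Brute-force probability of evidence on the example Markov network.
# Assumed sketch: only phi(A, C) is from the slide; the remaining
# factor tables are placeholders set to 1 everywhere.

from itertools import product

VARS = ["A", "B", "C", "D", "E", "F"]            # all binary

def placeholder():
    return {vals: 1.0 for vals in product([0, 1], repeat=2)}

factors = {
    ("A", "C"): {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20},
    ("C", "E"): placeholder(), ("D", "F"): placeholder(),
    ("B", "D"): placeholder(), ("C", "D"): placeholder(),
    ("A", "B"): placeholder(), ("E", "F"): placeholder(),
}

def prob_evidence(evidence):
    """P(E = e): mass of assignments consistent with e, divided by Z."""
    num = den = 0.0
    for vals in product([0, 1], repeat=len(VARS)):
        x = dict(zip(VARS, vals))
        w = 1.0
        for scope, table in factors.items():
            w *= table[tuple(x[v] for v in scope)]
        den += w                                  # contributes to Z
        if all(x[v] == val for v, val in evidence.items()):
            num += w
    return num / den

print(prob_evidence({"F": 1}))                    # e.g., evidence F = 1

Exact inference (next slide) computes the same quantity without enumerating all 2^6 assignments.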

Page 6:


Exact Inference

Exact inference algorithms, e.g., Variable Elimination (sketched below), provide accurate results for probability of evidence.

Challenges:
- Exponential time and space complexity.
- Computationally intractable on large graphs.

Approximate inference algorithms are widely used in practice to evaluate queries within a resource limit:
- Sampling based, e.g., Gibbs Sampling, Importance Sampling.
- Propagation based, e.g., Iterative Join Graph Propagation.
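For orientation, here is a hedged sketch (assumed code, not the paper's; the φ(C,E) numbers are placeholders) of the core Variable Elimination step: multiply the factors that mention a variable, then sum that variable out.

# Variable Elimination core operations on table-based factors.
# A factor is (scope, table): an ordered tuple of binary variables
# plus a dict mapping each assignment tuple to a value.

from itertools import product

def multiply(f1, f2):
    """Pointwise product of two factors over the union of their scopes."""
    s1, t1 = f1
    s2, t2 = f2
    scope = s1 + tuple(v for v in s2 if v not in s1)
    table = {}
    for vals in product([0, 1], repeat=len(scope)):
        a = dict(zip(scope, vals))
        table[vals] = (t1[tuple(a[v] for v in s1)] *
                       t2[tuple(a[v] for v in s2)])
    return scope, table

def sum_out(var, factor):
    """Eliminate `var` from a factor by summing over its two values."""
    scope, table = factor
    i = scope.index(var)
    new_scope = scope[:i] + scope[i + 1:]
    new_table = {}
    for vals, val in table.items():
        key = vals[:i] + vals[i + 1:]
        new_table[key] = new_table.get(key, 0.0) + val
    return new_scope, new_table

phi_AC = (("A", "C"), {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20})
phi_CE = (("C", "E"), {(0, 0): 1, (0, 1): 2, (1, 0): 3, (1, 1): 4})

# Eliminating C: multiply all factors mentioning C, then sum C out.
print(sum_out("C", multiply(phi_AC, phi_CE)))    # factor over (A, E)

Repeating this per variable yields exact answers, but the intermediate tables grow exponentially with the graph's treewidth, which is exactly the intractability noted above.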

Page 7:


Adaptive Importance Sampling (AIS)

Adaptive Importance Sampling (AIS) is an approximate inference algorithm where:
- Samples are generated from a known distribution Q, called the proposal distribution.
- Q is updated periodically based on the sample weights.
- Probability of evidence is evaluated from the samples drawn from the proposal distribution, by calculating the following expected value with respect to Q:

P(E = e) = E_Q[ P(R = r, E = e) / Q(R = r) ], where R = X \ E

Weighting each sample reduces the variance of the estimate caused by the occurrence of rare events.
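The estimator can be written down directly. Below is a hedged sketch of plain (non-adaptive) importance sampling with assumed toy distributions; the adaptive update of Q is shown later in the RB-AIS loop:

# Importance-sampling estimate of P(E = e).
# Assumed toy model: one binary non-evidence variable R, so that
# P(E = e) = sum_r P(r, e) can be checked by hand.

import random

def p_joint(r, e):
    """Stand-in for the model's P(R = r, E = e)."""
    return 0.9 if r == e else 0.1

def q_sample():
    """Draw r from the proposal Q (uniform here)."""
    return 1 if random.random() < 0.5 else 0

def q_prob(r):
    return 0.5

def estimate_evidence(e, n=100_000):
    total = 0.0
    for _ in range(n):
        r = q_sample()
        total += p_joint(r, e) / q_prob(r)   # importance weight of sample
    return total / n                         # Monte Carlo E_Q[weight]

print(estimate_evidence(e=1))                # -> about 0.9 + 0.1 = 1.0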

Page 8:


RB-AIS

We focus on a special type of AIS in this paper, called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).

In RB-AIS, only a set of variables Xw ⊂ X \ Xe (called w-cutset variables) is sampled.

Xw is chosen in such a way that exact inference over X \ (Xw ∪ Xe) is tractable:
- A large |Xw| results in quicker evaluation of the query, but a more erroneous result.
- A small |Xw| results in a more accurate result, but takes more time.
- Trade-off! (A cutset-selection sketch follows the reference below.)

V. Gogate and R. Dechter, "Approximate inference algorithms for hybrid Bayesian networks with discrete constraints," in UAI, AUAI Press, 2005, pp. 209–216.
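As an aside, here is one simple way such a cutset could be chosen; this greedy heuristic is an assumption for illustration, not the selection rule used in the paper:

# Greedy w-cutset selection sketch (assumed heuristic): move the
# highest-degree variable into the cutset until a crude width bound
# on the remaining graph is at most w.

def min_degree_width(adj):
    """Upper-bound the induced width via a min-degree elimination order."""
    adj = {v: set(ns) for v, ns in adj.items()}
    width = 0
    while adj:
        v = min(adj, key=lambda u: len(adj[u]))
        width = max(width, len(adj[v]))
        ns = adj.pop(v)
        for a in ns:                       # connect v's neighbors
            adj[a] |= ns - {a}
            adj[a].discard(v)
    return width

def choose_cutset(adj, w):
    adj = {v: set(ns) for v, ns in adj.items()}
    cutset = []
    while min_degree_width(adj) > w:
        v = max(adj, key=lambda u: len(adj[u]))    # highest degree
        for a in adj.pop(v):
            adj[a].discard(v)
        cutset.append(v)
    return cutset

# The example network from slide 5:
adj = {"A": {"B", "C"}, "B": {"A", "D"}, "C": {"A", "D", "E"},
       "D": {"B", "C", "F"}, "E": {"C", "F"}, "F": {"D", "E"}}
print(choose_cutset(adj, w=1))             # -> ['C']: sampling C leaves a tree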

Page 9:


RB-AIS: Steps

1. Start: initialize the proposal Q on Xw.
2. Generate samples from Q.
3. Calculate sample weights: Ψ(x) = VE(G | E = e, Xw = x) / Q(Xw = x), i.e., exact inference (variable elimination) over the remaining variables, divided by the proposal probability of the sampled cutset assignment.
4. Update Q and Z.
5. Converged? If no, go back to step 2; if yes, end.
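In code, the loop might look like the following sketch (assumed structure; exact_inference is a hypothetical stand-in for the VE call over X \ (Xw ∪ Xe), and the proposal family and iteration count are simplifying assumptions):

# RB-AIS main loop sketch (assumed code, not the authors').

import random

def rb_ais(cutset_vars, exact_inference, n_samples=1000, n_iters=20):
    Q = {v: 0.5 for v in cutset_vars}   # initial proposal on Xw
    Z = 0.0
    for _ in range(n_iters):
        samples, weights = [], []
        for _ in range(n_samples):
            # Generate one cutset sample x from Q.
            x = {v: (1 if random.random() < Q[v] else 0)
                 for v in cutset_vars}
            q_x = 1.0
            for v in cutset_vars:
                q_x *= Q[v] if x[v] == 1 else 1.0 - Q[v]
            # Sample weight Psi(x) = VE(G | E=e, Xw=x) / Q(Xw=x).
            w = exact_inference(x) / q_x
            samples.append(x)
            weights.append(w)
        Z = sum(weights) / n_samples    # running estimate of P(E = e)
        # Adapt Q toward the weighted marginals of the samples.
        total = sum(weights) or 1.0
        for v in cutset_vars:
            Q[v] = sum(w for x, w in zip(samples, weights)
                       if x[v] == 1) / total
    return Z

A real implementation would test convergence of Q or Z rather than run a fixed number of iterations.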

Page 10:


Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

Page 11:


Problem

Real-world applications require good-quality results within a time constraint.

Typically, real-world networks are large and complex (i.e., have large treewidth). For instance, a graphical model of Facebook users would have billions of nodes!

Even RB-AIS may run out of time to provide a quality estimate within the time limit. For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network with only 67 nodes and 271 factors.

Page 12:


Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

Page 13:


Challenges

To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed (see the driver-loop sketch below):
- RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on their values at iteration i-1, a proper synchronization mechanism is needed.
- The task of sample generation on Xw must be distributed over the worker nodes.
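One natural way to satisfy the synchronization requirement (an assumed sketch, not necessarily the authors' driver) is a driver loop that launches one MapReduce job unit per iteration and blocks until it completes, so iteration i+1 never reads a stale Q:

# Iteration-level synchronization sketch (assumed): the driver runs
# MapReduce job units (MJUs) strictly one after another; each MJU
# consumes Q_i and produces (Q_{i+1}, Z), the next MJU's input.

def driver(Q0, launch_mju, converged, max_iters=50):
    """launch_mju(Q) is a stand-in for submitting one MJU to the
    cluster and blocking until it finishes; it returns (Q_next, Z)."""
    Q, Z = Q0, None
    for _ in range(max_iters):
        Q_next, Z = launch_mju(Q)       # barrier: wait for this MJU
        if converged(Q, Q_next):
            break
        Q = Q_next                      # only now may the next MJU start
    return Q, Z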

Page 14:


Proposed Approaches

We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS.

Distributed Sampling in Mappers (DSM):
- Parallel sampling.
- Sequential weight calculation.
- Each MapReduce Job Unit (MJU) contains only one MapReduce job.

Distributed Weight Calculation in Reducers (DWCR):
- Parallel sampling.
- Parallel weight calculation.
- Each MJU contains two MapReduce jobs.

Page 15:


Distributed Sampling in Mappers (DSM)

[Diagram: DSM data flow within the i-th MJU.]

Input to the i-th MJU: Xw and Qi, stored as per-variable records (Xj, Qi[Xj]) plus a special record (-1, Z) carrying the current normalization constant.
- Map j (one mapper per cutset variable Xj, j = 1, ..., m): draws n samples xj1, ..., xjn of Xj and emits (s, (Xj, xjs, Qi[Xj])) for each sample index s = 1, ..., n.
- Shuffle and sort: aggregates values by sample index, so all partial samples x1s, x2s, ..., xms for sample s arrive at the same key.
- Single Reducer: combines x1s, x2s, ..., xms into a full sample xs for each s ∈ {1, 2, ..., n}, calculates the weights Ψ1, Ψ2, ..., Ψn, and updates Z and Qi to Qi+1.

Output: records (Xj, Qi+1[Xj]) for j = 1, ..., m, plus (-1, Z).
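A pure-Python simulation of one DSM job unit (an assumed sketch; real Hadoop mappers and the reducer would exchange exactly these key-value pairs):

# One DSM MJU simulated in plain Python (assumed sketch): mappers
# sample each cutset variable in parallel; the single reducer combines
# partial samples and computes the weights sequentially.

import random
from collections import defaultdict

def dsm_map(var, q_var, n):
    """Mapper for variable `var`: emit (sample_index, partial sample)."""
    return [(s, (var, 1 if random.random() < q_var else 0, q_var))
            for s in range(n)]

def dsm_reduce(grouped, cutset_vars, psi):
    """Single reducer: build full samples, weigh them, update Q and Z."""
    samples, weights = [], []
    for s in sorted(grouped):
        x = {var: val for var, val, _ in grouped[s]}
        q_x = 1.0
        for _, val, q_var in grouped[s]:
            q_x *= q_var if val == 1 else 1.0 - q_var
        samples.append(x)
        weights.append(psi(x, q_x))     # sequential weight calculation
    Z = sum(weights) / len(weights)
    total = sum(weights) or 1.0
    Q_next = {v: sum(w for x, w in zip(samples, weights)
                     if x[v] == 1) / total for v in cutset_vars}
    return Q_next, Z

Q_i = {"C": 0.5, "D": 0.5}                       # proposal on Xw
grouped = defaultdict(list)
for var, q_var in Q_i.items():                   # "parallel" mappers
    for key, value in dsm_map(var, q_var, n=4):
        grouped[key].append(value)               # shuffle and sort
psi = lambda x, q_x: 1.0 / q_x                   # stand-in for VE(...)/Q
print(dsm_reduce(grouped, list(Q_i), psi))       # -> (Q_{i+1}, Z)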

Page 16:


Distributed Weight Calculation in Reducers (DWCR)

[Diagram: DWCR data flow within the i-th MJU, which contains two MapReduce jobs.]

Input to the i-th MJU: Xw and Qi.

First MapReduce job:
- Map j (one mapper per partition Xj ⊂ Xw, j = 1, ..., m): outputs partial samples xj ∈ Xj.
- Reducers 1, ..., r (in parallel): each combines partial samples (xi → x, i ∈ {1, ..., m}) into full cutset samples and calculates the weights Ψx.

Second MapReduce job:
- Maps 1, ..., j: output the weights Ψx.
- Single Reducer: updates Z, and Qi to Qi+1.
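A matching simulation of one DWCR job unit (assumed sketch); the essential difference from DSM is that the first job's reducers compute the weights Ψx in parallel across sample groups:

# One DWCR MJU simulated in plain Python (assumed sketch): two chained
# jobs, with weight calculation parallelized over job 1's reducers.

import random
from collections import defaultdict

def job1_map(part_vars, Q, n):
    """Mapper for one partition of Xw: emit (sample_index, partial sample)."""
    return [(s, {v: (1 if random.random() < Q[v] else 0) for v in part_vars})
            for s in range(n)]

def job1_reduce(partials, Q, psi):
    """One of r parallel reducers: combine partials, compute Psi(x)."""
    x = {}
    for xj in partials:
        x.update(xj)                    # combine x_i -> x, i in {1..m}
    q_x = 1.0
    for v, val in x.items():
        q_x *= Q[v] if val == 1 else 1.0 - Q[v]
    return x, psi(x, q_x)

def job2_reduce(weighted, cutset_vars):
    """Single reducer of job 2: update Z, and Q_i to Q_{i+1}."""
    weights = [w for _, w in weighted]
    Z = sum(weights) / len(weights)
    total = sum(weights) or 1.0
    Q_next = {v: sum(w for x, w in weighted if x[v] == 1) / total
              for v in cutset_vars}
    return Q_next, Z

Q_i = {"C": 0.5, "D": 0.5}
partitions = [["C"], ["D"]]                      # X1, X2 subsets of Xw
grouped = defaultdict(list)
for part in partitions:                          # parallel mappers, job 1
    for s, xj in job1_map(part, Q_i, n=4):
        grouped[s].append(xj)                    # shuffle by sample index
psi = lambda x, q_x: 1.0 / q_x                   # stand-in for VE(...)/Q
weighted = [job1_reduce(p, Q_i, psi) for p in grouped.values()]
print(job2_reduce(weighted, list(Q_i)))          # -> (Q_{i+1}, Z)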

Page 17:


Agenda
- Brief overview of inference techniques
- Problem
- Proposed approaches
- Experiments
- Discussion

Page 18:


Setup

Performance metrics:

Speedup = Tsq / Td
- Tsq = execution time of the sequential approach.
- Td = execution time of the distributed approach.

Scaleup = Ts / Tp
- Ts = execution time using a single machine.
- Tp = execution time using multiple machines.

Hadoop version 1.2.1; 8 data nodes, 1 name node. Each machine has a 2.2 GHz processor and 4 GB of RAM.

Network        Number of Nodes   Number of Factors
54.wcsp [1]          67                 271
29.wcsp [1]          82                 462
404.wcsp [1]        100                 710

[1] "The probabilistic inference challenge (PIC2011)," http://www.cs.huji.ac.il/project/PASCAL/showNet.php, 2011, last updated 10/23/2014.

Page 19:


Speedup

[Figure: speedup of the distributed approaches over the sequential execution.]

Page 20:


Scaleup

[Figure: scaleup of the distributed approaches as machines are added.]

Page 21:


Discussion

- Both approaches achieve substantial speedup and scaleup compared with the sequential execution.
- DWCR has better speedup and scalability than DSM:
  - Weight calculation is computationally more expensive than sample generation.
  - DWCR parallelizes both weight calculation and sampling, so it outperforms DSM.
- Both approaches asymptotically show accuracy similar to that of the sequential execution.

Page 22:


Questions?