Ahsanul Haque*, Swarup Chandra*, Latifur Khan* and Michael Baron+
* Department of Computer Science, University of Texas at Dallas
+ Department of Mathematical Sciences, University of Texas at Dallas
MapReduce Guided Approximate Inference Over Graphical Models
This material is based upon work supported by
University Of Texas at Dallas
Agenda
Brief overview on Inference techniques
Problem
Proposed Approaches
Experiments
Discussion
Graphical Models
A probabilistic graphical model G is a collection of functions over a set of random variables.
Generally represented as a network of nodes:
  Each node denotes a random variable (e.g., a data feature).
  Each edge denotes a relationship between two random variables.
Two types of representations:
  A Bayesian network is represented by a directed graph.
  A Markov network is represented by an undirected graph.
Example Graphical Model
Inference is needed to evaluate Probability of Evidence, Prior and Posterior Marginal, Most Probable Explanation (MPE) and Maximum a Posteriori (MAP) queries.
Probability of Evidence needs to be evaluated in classification problems.
[Figure: example graphical model]
Nodes: A, B, C, D, E, F
Factors: (A,B), (A,C), (B,D), (C,D), (C,E), (D,F), (E,F)
Sample Factor:
A   C   (A,C)
0   0   5
0   1   100
1   0   15
1   1   20
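As a minimal sketch, the sample factor over A and C can be stored as a mapping from assignments to values and summed over one of its variables, which is the basic operation behind marginal queries. The dict layout and the helper name `sum_out` are illustrative assumptions, not something taken from the slides.

```python
# Sample factor over (A, C) from the slide, keyed by (a, c) assignments.
factor_ac = {(0, 0): 5, (0, 1): 100, (1, 0): 15, (1, 1): 20}

def sum_out(factor, position):
    """Sum a factor over the variable stored at index `position` of each key.

    `sum_out` is a hypothetical helper name used only for this sketch.
    """
    result = {}
    for assignment, value in factor.items():
        # Drop the summed-out variable from the assignment tuple.
        reduced = assignment[:position] + assignment[position + 1:]
        result[reduced] = result.get(reduced, 0) + value
    return result

# Summing out C leaves an unnormalized function of A alone.
factor_a = sum_out(factor_ac, 1)
print(factor_a)  # {(0,): 105, (1,): 35}
```

Variable Elimination repeats this kind of step (multiply the factors that mention a variable, then sum it out), which is where the exponential cost on densely connected graphs comes from.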
Exact Inference
Exact Inference algorithms, e.g., Variable Elimination, provide accurate results for Probability of Evidence.
Challenges:
  Exponential time and space complexity.
  Computationally intractable on large graphs.
Approximate Inference algorithms are widely used in practice to evaluate queries within resource limits.
  Sampling based, e.g., Gibbs Sampling, Importance Sampling.
  Propagation based, e.g., Iterative Join Graph Propagation.
Adaptive Importance Sampling (AIS)
Adaptive Importance Sampling (AIS) is an approximate Inference algorithm where:
  Samples are generated from a known distribution Q, called the proposal distribution.
  Q is updated periodically based on the sample weights.
  Probability of evidence is evaluated using the samples generated from the proposal distribution, by calculating the following expected value with respect to Q:
  $P(E = e) = \mathbb{E}_Q\left[\frac{P(R, E = e)}{Q(R)}\right]$, where $R = X \setminus E$.
Considering the weight of each sample reduces the variance in the expected value due to the occurrence of rare events.
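The importance sampling estimate of the probability of evidence can be sketched on a toy model: draw the non-evidence variable R from Q, weight each sample by P(R, E = e) / Q(R), and average the weights. The joint table, the proposal, and the function name `estimate_evidence` are hypothetical values chosen for illustration; Q is kept fixed here, so the adaptive update is not shown.

```python
import random

# Hypothetical joint distribution P(R, E) over two binary variables.
joint = {(0, 0): 0.50, (0, 1): 0.10, (1, 0): 0.10, (1, 1): 0.30}
# Hypothetical fixed proposal distribution Q over the non-evidence variable R.
proposal = {0: 0.5, 1: 0.5}

def estimate_evidence(joint, proposal, evidence, num_samples, rng):
    """Importance sampling estimate of P(E = evidence)."""
    values = list(proposal)
    probs = [proposal[v] for v in values]
    total = 0.0
    for _ in range(num_samples):
        r = rng.choices(values, weights=probs)[0]      # sample R from Q
        total += joint[(r, evidence)] / proposal[r]    # sample weight
    return total / num_samples

rng = random.Random(0)
estimate = estimate_evidence(joint, proposal, 1, 20000, rng)
exact = joint[(0, 1)] + joint[(1, 1)]   # direct summation, for comparison
print(estimate, exact)
```

On a model this small the exact value is available by direct summation, so the sketch only serves to show that the average weight approaches it.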
RB-AIS
We focus on a special type of AIS in this paper, called Rao-Blackwellized Adaptive Importance Sampling (RB-AIS).
In RB-AIS, a set of variables Xw ⊂ X \ Xe (called w-cutset variables) is sampled.
Xw is chosen in such a way that Exact Inference over X \ Xw, Xe is tractable.
  A large |Xw| results in quicker evaluation of the query but a more erroneous result.
  A small |Xw| results in a more accurate result but takes more time.
Trade-off!
V. Gogate and R. Dechter, “Approximate inference algorithms for hybrid Bayesian networks with discrete constraints,” in UAI. AUAI Press, 2005, pp. 209–216.
RB-AIS: Steps
Start
  1. Set the initial Q on Xw.
  2. Generate samples.
  3. Calculate sample weights: $\Psi_x = \frac{VE(G_{ew})}{Q(X = x)}$
  4. Update Q and Z.
  5. Converged? If no, return to step 2; if yes, end.
End
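The loop above can be sketched on a toy model with one cutset variable W and one remaining variable R. This is only an illustration under stated assumptions: the model table is hypothetical, an exact sum over R stands in for Variable Elimination, the Q update rule (an even mix of the old Q and the weighted sample distribution) is an assumption since the slides do not specify one, and a fixed iteration count replaces the convergence test.

```python
import random

# Hypothetical model with the evidence already fixed:
# model[(w, r)] = P(W = w, R = r, E = e).
model = {(0, 0): 0.02, (0, 1): 0.08, (1, 0): 0.25, (1, 1): 0.05}

def exact_over_rest(w):
    """Stand-in for VE: sum out R exactly for a sampled cutset value w."""
    return model[(w, 0)] + model[(w, 1)]

def rb_ais(num_iterations, samples_per_iteration, rng):
    q = {0: 0.5, 1: 0.5}              # initial Q on the cutset variable
    weight_sum, count, z = 0.0, 0, 0.0
    for _ in range(num_iterations):   # fixed count instead of a convergence test
        mass = {0: 0.0, 1: 0.0}
        for _ in range(samples_per_iteration):
            w = rng.choices([0, 1], weights=[q[0], q[1]])[0]
            psi = exact_over_rest(w) / q[w]   # sample weight
            mass[w] += psi
            weight_sum += psi
            count += 1
        # Assumed update: move Q halfway toward the weighted sample
        # distribution, which keeps every probability strictly positive.
        total = mass[0] + mass[1]
        q = {v: 0.5 * q[v] + 0.5 * mass[v] / total for v in q}
        z = weight_sum / count        # Z as the running mean of all weights
    return z, q

z, q = rb_ais(10, 1000, random.Random(0))
print(z, q)
```

In this toy model the exact probability of evidence is 0.4 and the zero-variance proposal puts probability 0.75 on W = 1, so both returned values can be checked by hand.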
Problem
Real-world applications require good-quality results within a time constraint.
Typically, real-world networks are large and complex (i.e., they have large treewidth).
  For instance, a graphical model of Facebook users would have billions of nodes!
Even RB-AIS may run out of time before providing a quality estimate.
  For instance, RB-AIS takes more than 6 hours to compute a single probability of evidence on a network having only 67 nodes and 271 factors.
Challenges
To design a parallel and distributed approach for RB-AIS, the following challenges need to be addressed:
  RB-AIS updates Q periodically. Since the values of Q and Z at iteration i depend on their values at iteration i-1, a proper synchronization mechanism is needed.
  The task of sample generation on Xw must be distributed over the worker nodes.
Proposed Approaches
We design and implement two MapReduce-based approaches for distributed and parallel computation of inference queries using RB-AIS.
Distributed Sampling in Mappers (DSM)
  Parallel sampling.
  Sequential weight calculation.
  Each MapReduce Job Unit (MJU) contains only one MapReduce Job.
Distributed Weight Calculation in Reducers (DWCR)
  Parallel sampling.
  Parallel weight calculation.
  Each MapReduce Job Unit (MJU) contains two MapReduce Jobs.
Distributed Sampling in Mappers (DSM)
[Figure: data flow of the ith MJU in DSM]
Input to ith MJU: Xw, Qi, given as the records X1 Qi[x1], X2 Qi[x2], X3 Qi[x3], …, Xm Qi[xm], and -1 Z.
Map 1, Map 2, Map 3, …, Map m: mapper j emits n key-value pairs, s → (Xj, xjs, Qi[Xj]) for s = 1, …, n.
Shuffle and Sort: aggregate values by keys, so that key s collects (X1, x1s, Q[X1]), (X2, x2s, Q[X2]), (X3, x3s, Q[X3]), …, (Xm, xms, Q[Xm]).
Reducer: combine x1s, x2s, …, xms to form xs, where s ∈ {1, 2, …, n}; calculate the weights of samples 1, 2, …, n; update Z, and Qi to Qi+1.
Output: X1 Qi+1[x1], X2 Qi+1[x2], X3 Qi+1[x3], …, Xm Qi+1[xm], and -1 Z.
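The DSM data flow can be mimicked in plain Python, without Hadoop, to show how keying partial samples by the sample index s lets the shuffle phase assemble full samples. This is a sketch under assumptions: Q is taken to factorize into one independent table per cutset variable, the function names `mapper`, `shuffle`, and `combine` are hypothetical, and the weight calculation and the update of Z and Q in the reducer are left out.

```python
import random
from collections import defaultdict

def mapper(var_name, q_var, num_samples, rng):
    """Emit (sample index, (variable, sampled value, Q[variable])) pairs."""
    values = list(q_var)
    probs = [q_var[v] for v in values]
    for s in range(1, num_samples + 1):
        x = rng.choices(values, weights=probs)[0]
        yield s, (var_name, x, q_var)

def shuffle(pairs):
    """Group mapper output by key, as the shuffle-and-sort phase would."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def combine(groups):
    """Reducer side: merge partial samples into full samples x_s with Q(x_s)."""
    samples = {}
    for s, parts in sorted(groups.items()):
        assignment = {name: x for name, x, _ in parts}
        q_value = 1.0
        for _, x, q_var in parts:
            q_value *= q_var[x]   # Q(x_s) under the assumed factorized proposal
        samples[s] = (assignment, q_value)
    return samples

rng = random.Random(1)
# Hypothetical proposal Q_i, one table per cutset variable.
q_i = {"X1": {0: 0.5, 1: 0.5}, "X2": {0: 0.2, 1: 0.8}, "X3": {0: 0.9, 1: 0.1}}
pairs = [kv for name, q_var in q_i.items() for kv in mapper(name, q_var, 4, rng)]
samples = combine(shuffle(pairs))
for s, (assignment, q_value) in samples.items():
    print(s, assignment, q_value)
```

Each mapper call is independent of the others, which is what allows sampling to run in parallel, while everything in `combine` happens in the single reducer.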
Distributed Weight Calculation in Reducers (DWCR)
[Figure: data flow of the ith MJU in DWCR]
Input to ith MJU: Xw, Qi
First MapReduce Job:
  Map 1: Input: X1 ⊂ Xw. Output: partial samples x1 ∈ X1.
  Map 2: Input: X2 ⊂ Xw. Output: partial samples x2 ∈ X2.
  Map m: Input: Xm ⊂ Xw. Output: partial samples xm ∈ Xm.
  Reducer 1, Reducer 2, …, Reducer r: combine partial samples s: xi → x; i ∈ {1, …, m}; calculate weight Ψx.
Second MapReduce Job:
  Map 1, Map 2, …, Map j: Output Ψx.
  Reducer: update Z, and Qi to Qi+1.
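The difference from DSM is that the combined samples are spread over r reducers, each of which computes weights for its own share, and a second job gathers the weights. The sketch below simulates that in plain Python. The modulo partitioning, the placeholder weight function, the sample data, and all function names are assumptions made for illustration; the Z update is reduced to a simple mean over one batch and the Q update is omitted.

```python
def partition(sample_ids, num_reducers):
    """Assign each sample key to a reducer, here simply by key modulo r."""
    buckets = [[] for _ in range(num_reducers)]
    for s in sample_ids:
        buckets[s % num_reducers].append(s)
    return buckets

def weight_reducer(bucket, samples, weight_fn):
    """First job, reducer side: one weight per combined sample."""
    return [(s, weight_fn(samples[s])) for s in bucket]

def final_reducer(weighted):
    """Second job: aggregate all weights into an estimate of Z."""
    weights = [psi for _, psi in weighted]
    return sum(weights) / len(weights)

def weight_fn(x):
    """Placeholder weight; a real run would divide an exact-inference result by Q(x)."""
    return 0.1 + 0.2 * x["X1"] + 0.1 * x["X2"]

# Hypothetical combined samples keyed by sample index.
samples = {s: {"X1": s % 2, "X2": (s // 2) % 2} for s in range(8)}

buckets = partition(samples, 3)
weighted = [pair for b in buckets for pair in weight_reducer(b, samples, weight_fn)]
z = final_reducer(weighted)
print(buckets, z)
```

Because each bucket is processed independently, the expensive weight calculation is what gets parallelized here, at the cost of a second job to bring the weights back together.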
Setup
Performance Metrics:
  Speedup = Tsq/Td
    Tsq = Execution time of the sequential approach.
    Td = Execution time of the distributed approach.
  Scaleup = Ts/Tp
    Ts = Execution time using a single machine.
    Tp = Execution time using multiple machines.
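Both metrics are plain ratios of execution times, as the short sketch below shows. The timings are hypothetical placeholders chosen for easy arithmetic, not measurements from the experiments.

```python
def speedup(t_sequential, t_distributed):
    """Speedup = Tsq / Td."""
    return t_sequential / t_distributed

def scaleup(t_single_machine, t_multiple_machines):
    """Scaleup = Ts / Tp."""
    return t_single_machine / t_multiple_machines

# Hypothetical timings in minutes, for illustration only.
print(speedup(120.0, 30.0))   # 4.0
print(scaleup(90.0, 45.0))    # 2.0
```

A value above 1 means the distributed or multi-machine run finished faster than its baseline.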
Hadoop version 1.2.1, with 8 data nodes and 1 name node.
Each machine has a 2.2 GHz processor and 4 GB of RAM.

Network        Number of Nodes   Number of Factors
54.wcsp [1]    67                271
29.wcsp [1]    82                462
404.wcsp [1]   100               710

[1] “The probabilistic inference challenge (pic2011),” http://www.cs.huji.ac.il/project/PASCAL/showNet.php, 2011, last updated on 10.23.2014.
Speedup
Scaleup
Discussion
Both approaches achieve substantial speedup and scaleup compared with sequential execution.
DWCR has better speedup and scalability than DSM.
  Weight calculation is computationally more expensive than sample generation.
  DWCR performs both weight calculation and sampling in parallel, so it outperforms DSM.
Asymptotically, both approaches show accuracy similar to that of sequential execution.
Questions?