PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2,...
![Page 1: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/1.jpg)
PREDIcT: Towards Predicting the Runtime of Iterative Analytics
Adrian Popescu1, Andrey Balmin2, Vuk Ercegovac3, Anastasia Ailamaki1
![Page 2: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/2.jpg)
Predicting Runtime of Iterative Analytics
Requirements:
• # of iterations
• per-iteration resources (key features), i.e., for Bulk Synchronous Parallel (BSP)
• cost model
Challenges:
• dependence on prior iterations
• variable resource requirements
[Figure: partitioned input processed by workers over time (Iteration 1, Iteration 2); phases per iteration: computation, messaging, synch]
![Page 3: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/3.jpg)
PREDIcT at a Glance
• Cost model for BSP Execution Model
[Figure: resources vs. iterations, shown for the sample run and for the actual run]
• Transformations:
  • Input dataset: sampling
  • Parameters: transform function
• Prediction methodology for iterative analytics on graphs: proportionality for resources, similarity for # of iterations
![Page 4: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/4.jpg)
Supported Analytics
Global convergence metric: e.g., an average, a ratio, fix point

Ranking (e.g., PageRank)
Graph processing (e.g., neighborhood estimation)
Graph clustering (e.g., semi-clustering)
![Page 5: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/5.jpg)
Example: PageRank
• PageRank of a page: given by the rank of its inbound pages
• Rank computation: iterative
• Convergence: RankChange < τG
1. graph structure: connectivity, degree ratio, diameter ⇒ Sampling technique
2. parameters: N, τG ⇒ Transform function
[Figure: example graph G with vertices 1 to 8]
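As a rough illustration of the convergence test on this slide, the Python sketch below counts how many iterations PageRank needs before the average rank change falls below a threshold. The damping factor of 0.85, the handling of dangling vertices, the toy graph, and the name `pagerank_iterations` are assumptions made for this example; the slides do not specify them.

```python
def pagerank_iterations(out_links, tau, damping=0.85, max_iters=1000):
    """Return (ranks, iterations) once the average rank change is below tau."""
    n = len(out_links)
    ranks = {v: 1.0 / n for v in out_links}
    for iteration in range(1, max_iters + 1):
        # Every vertex starts the iteration with the teleport share.
        new_ranks = {v: (1.0 - damping) / n for v in out_links}
        for v, targets in out_links.items():
            if targets:
                share = damping * ranks[v] / len(targets)
                for t in targets:
                    new_ranks[t] += share
            else:
                # Dangling vertex: spread its rank uniformly (assumed convention).
                for t in new_ranks:
                    new_ranks[t] += damping * ranks[v] / n
        avg_change = sum(abs(new_ranks[v] - ranks[v]) for v in out_links) / n
        ranks = new_ranks
        if avg_change < tau:
            return ranks, iteration
    return ranks, max_iters


# Hypothetical 4-vertex graph: vertex -> list of out-neighbors.
toy = {1: [2, 3], 2: [3], 3: [1], 4: [3]}
ranks, iters = pagerank_iterations(toy, tau=1e-6)
```

A looser threshold can only stop the same computation earlier, which is why the number of iterations depends on the parameter τ as well as on the graph structure.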
![Page 6: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/6.jpg)
Sampling: Biased Random Jump
• Variation of Random Jump (RJ) / random walk
• Sampling scale-free graphs: e.g., web graphs
• Seed vertices: k high out-degree nodes (hubs)
[Figure: graph G with its RJ sample (disconnected) and its BRJ sample (connected)]
BRJ: Improving connectivity at the same sampling ratio
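The sampling idea on this slide can be sketched in Python as a random walk that starts from hub vertices and occasionally jumps back to one of them. This is a minimal sketch, not the authors' implementation: the jump probability, the rule that jumps always land on a seed, the step limit, and the name `biased_random_jump` are all assumptions for illustration.

```python
import random


def biased_random_jump(adj, ratio, k=2, jump_prob=0.15, seed=0, max_steps=100000):
    """Sample roughly ratio * |V| vertices from the adjacency dict `adj`."""
    rng = random.Random(seed)
    target = max(1, int(ratio * len(adj)))
    # Seed vertices: the k vertices with the highest out-degree (hubs).
    seeds = sorted(adj, key=lambda v: len(adj[v]), reverse=True)[:k]
    current = rng.choice(seeds)
    sampled = {current}
    for _ in range(max_steps):
        if len(sampled) >= target:
            break
        neighbors = adj[current]
        if not neighbors or rng.random() < jump_prob:
            # Assumed bias: jump back to a hub instead of a uniform vertex.
            current = rng.choice(seeds)
        else:
            current = rng.choice(neighbors)
        sampled.add(current)
    return sampled


# Hypothetical graph: vertex -> list of out-neighbors; vertex 0 is the hub.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0, 3], 3: [0], 4: [5], 5: [0], 6: [0], 7: [0]}
sample = biased_random_jump(adj, ratio=0.5, k=1)
```

Because every walk restarts from a hub, each sampled vertex is reachable from a seed, which is the intuition behind the better connectivity claimed for BRJ.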
![Page 7: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/7.jpg)
Transformations: Preserving Iterations
[Figure: graph G (vertices 1 to 8) and sample S (vertices 1, 3, 5, 8); Sampling Ratio (SR) = 50%]
Convergence: RankChange(G) < τG
Average rank change: RankChange(S) proportional to RankChange(G)
Transform function T: τS = τG / SR
S maintains: connectivity, in/out degree ratio, effective diameter
Sample and transform function preserve iterations
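The transform function above is a single division, shown here as a small Python sketch. The function name and the example threshold are hypothetical; only the formula τS = τG / SR comes from the slide.

```python
def transform_threshold(tau_g, sampling_ratio):
    """Scale the convergence threshold of the full graph G for the sample S."""
    if not 0.0 < sampling_ratio <= 1.0:
        raise ValueError("sampling ratio must be in (0, 1]")
    return tau_g / sampling_ratio


# With SR = 50% as on the slide, the threshold used on the sample doubles.
tau_s = transform_threshold(0.001, 0.5)
```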
![Page 8: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/8.jpg)
Prediction
[Figure: Sample run → Profiled features → Extrapolator → Scaled features → Cost Model F(X1,…,Xk) → Runtime of the estimated actual run]
Two extrapolation factors:
• on edges
• on vertices
Customized cost model for the Bulk Synchronous Parallel execution model: i.e., Giraph BSP
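The extrapolation step can be sketched as follows, assuming each profiled feature is multiplied by either the edge ratio or the vertex ratio between the full graph and the sample. Which feature uses which factor, the feature names, and the graph sizes are assumptions for illustration and are not stated on the slide.

```python
def extrapolate_features(profiled, sample_size, full_size, scale_by):
    """Scale per-iteration features from the sample run to the full graph."""
    factors = {
        "edges": full_size["edges"] / sample_size["edges"],
        "vertices": full_size["vertices"] / sample_size["vertices"],
    }
    return {name: value * factors[scale_by[name]]
            for name, value in profiled.items()}


# Hypothetical profile of one iteration of the sample run.
profiled = {"active_vertices": 1000, "message_count": 8000}
scaled = extrapolate_features(
    profiled,
    sample_size={"vertices": 1000, "edges": 4000},
    full_size={"vertices": 10000, "edges": 60000},
    # Assumed mapping: vertex-bound features scale on vertices,
    # message-bound features scale on edges.
    scale_by={"active_vertices": "vertices", "message_count": "edges"},
)
```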

![Page 9: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/9.jpg)
Cost Model: Translating Features into Time
Bulk Synchronous Parallel Model
[Figure: partitioned input processed by workers during Iteration 1, with phases computation, messaging, synch]
Key features per phase:
• computation: active vertices, message counts
• messaging: message counts / sizes, locality of messages
• synch: skew
• Each phase but synch: multivariate linear regression
• Synchronization: identifying critical path
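One way to read the two bullets above: compute and messaging time per worker come from linear models over the key features, and the iteration ends when the slowest worker finishes. The Python sketch below follows that reading. The coefficients, feature names, and time unit are hypothetical, and treating the critical path as the maximum over workers is an assumption, not a detail confirmed by the slides.

```python
def iteration_time(worker_features, compute_coef, message_coef):
    """Estimate one BSP iteration; the slowest worker is the critical path."""
    per_worker = []
    for f in worker_features:
        compute = (compute_coef["active_vertices"] * f["active_vertices"]
                   + compute_coef["messages"] * f["messages"])
        messaging = (message_coef["messages"] * f["messages"]
                     + message_coef["remote_bytes"] * f["remote_bytes"])
        per_worker.append(compute + messaging)
    # All workers wait at the barrier for the slowest one.
    return max(per_worker)


# Hypothetical coefficients (time units per feature unit) and two skewed workers.
compute_coef = {"active_vertices": 2.0, "messages": 1.0}
message_coef = {"messages": 0.5, "remote_bytes": 0.25}
workers = [
    {"active_vertices": 100, "messages": 400, "remote_bytes": 1000},
    {"active_vertices": 300, "messages": 1500, "remote_bytes": 8000},
]
estimate = iteration_time(workers, compute_coef, message_coef)
```

With skewed partitions the second worker dominates, so the estimate equals its cost rather than the average over workers.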
![Page 10: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/10.jpg)
Experimental Evaluation
• Setup: 10 machines, 6-core Intel X5660 CPUs, 48 GB RAM, 1 Gbps
• Datasets: real graph datasets: Wikipedia (Wiki), Twitter (TW), UK-2002 (UK), LiveJournal (LJ), with sizes in [1, 25] GB
• Representative algorithms: PageRank (PR), top-k ranking, and semi-clustering (SC)
• Default transformations: BRJ and Tr = (IDConf, τS = τG / SR)
• Metrics: signed relative error: RE = (Predicted - Actual) * 100% / Actual (i.e., "+" = over-prediction, "-" = under-prediction)
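The error metric can be written directly from its definition. The function name and the example values below are illustrative only.

```python
def signed_relative_error(predicted, actual):
    """Signed relative error in percent: positive means over-prediction."""
    return (predicted - actual) * 100.0 / actual


over = signed_relative_error(110, 100)   # prediction above the actual value
under = signed_relative_error(90, 100)   # prediction below the actual value
```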
![Page 11: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/11.jpg)
Predicting Features (Iterations)
Giraph BSP, 10 machines, real datasets in [1, 25] GB

![Page 12: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/12.jpg)
Predicting Features (Iterations)
Predicting iterations for semi-clustering: τ = 0.01 (left), and τ = 0.001 (right).
![Page 13: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/13.jpg)
Predicting Features (Iterations)
Predicting key features for top-k ranking: Predicting iterations (left), and predicting remote message bytes (right).
![Page 14: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/14.jpg)
Predicting Features (Iterations)
[Figure: PageRank, Sampling Ratio = 0.1. Number of iterations (axis 0 to 50) for LJ, Wiki, UK-2002, Twitter; series: Actual, UpBound, PREDIcT]
PREDIcT reduces relative error from [104, 168]% to [0, 11]%
![Page 15: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/15.jpg)
Predicting Time
[Figure: relative error of predicted time vs. sampling ratio (0.01 to 0.25) for LJ, Wiki, UK-2002; panels: Semi-clustering, Neighborhood estimation]
[10, 30]% relative error for 15% sample
Algorithms with variable work/iteration
• Cumulated impact of: # of iterations and per-iteration resources
![Page 16: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/16.jpg)
Impact
• PREDIcT: Experimental methodology for estimating key features and runtime for iterative analytics on graphs
• Enables key feature prediction (via pluggable transformations) and runtime prediction (via a cost model)
• Accurate empirical solution:
  • Iterations: [0, 11]% (opposed to [104, 168]%)
  • Time: [10, 30]%
http://dias.epfl.ch/predict
Thank you!
![Page 17: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/17.jpg)
Backup slides
![Page 18: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/18.jpg)
Cost Model: Model Fitting
[Figure: Sample run and Historical runs → Pool of BSP features → Model Fitting (multivariate regression) → F(X1,…,Xk)]
• Training data: sample run + historical runs (if such runs exist)
• Customizable cost model (per input algorithm)
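Model fitting by multivariate regression can be sketched with an ordinary least-squares fit. The version below solves the normal equations in plain Python so it needs no external library; the training rows, feature choice, and function names are hypothetical, and the slides do not say how the regression is actually computed.

```python
def fit_linear(rows, targets):
    """Least-squares fit of targets ~ c0 + c1*x1 + ... + ck*xk."""
    xs = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(xs[0])
    # Augmented normal equations [X^T X | X^T y].
    a = [[sum(x[i] * x[j] for x in xs) for j in range(k)]
         + [sum(x[i] * y for x, y in zip(xs, targets))]
         for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(k):
            if r != col:
                factor = a[r][col] / a[col][col]
                a[r] = [u - factor * v for u, v in zip(a[r], a[col])]
    return [a[i][k] / a[i][i] for i in range(k)]


def predict(coefs, features):
    """Apply the fitted cost function F(X1, ..., Xk)."""
    return coefs[0] + sum(c * x for c, x in zip(coefs[1:], features))


# Hypothetical training data: (active vertices, message count) -> phase time.
rows = [(10, 40), (20, 50), (30, 200), (40, 100), (50, 300)]
times = [45.0, 70.0, 165.0, 135.0, 255.0]
coefs = fit_linear(rows, times)
estimate = predict(coefs, (60, 120))
```

Rows from historical runs, when available, would simply be appended to the rows from the sample run before fitting.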
![Page 19: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/19.jpg)
Cost Model
Customized Cost Model for Bulk Synchronous Parallel Execution Model
[Figure: one iteration across workers W1, W2, W3, with phases compute, message, sync]
Key features per phase:
• compute: active vertices, message counts
• message: message counts, message sizes, locality of messages
• sync: partitioning scheme / skew
• Bulk Synchronous Parallel execution model
• Specialized for network intensive algorithms
• Each phase but sync: multivariate regression
• Synchronization modeled implicitly
![Page 20: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/20.jpg)
Feasibility Analysis
[Figure: runtime (sec, axis 0 to 5000) of the actual run vs. the sample run for PR (UK), PR (TW), SC (UK), CC (UK), CC (TW)]
Feasible for algorithms dominated by iteration time
![Page 21: PREDIcT: Towards Predicting the Runtime of Iterative Analytics Adrian Popescu 1, Andrey Balmin 2, Vuk Ercegovac 3, Anastasia Ailamaki 1 1 2 3.](https://reader036.fdocuments.in/reader036/viewer/2022062517/56649f145503460f94c28642/html5/thumbnails/21.jpg)
Context: BSP Processing Model
Bulk Synchronous Parallel (BSP)
[Figure: Giraph BSP with workers W1 to W4; one iteration consists of the phases compute, message, sync]
Vertex centric model: Each vertex performs local processing, then messaging
Algorithms in BSP are inherently iterative