Computer Science and Engineering Predicting Performance for Grid-Based...


Computer Science and Engineering

IPDPS’07 Predicting Performance for Grid-Based Datamining

A Performance Prediction Framework for Grid-Based Data Mining Applications

Leonid Glimcher

Gagan Agrawal




Motivating Scenario

Data Repository Clusters

Compute Clusters

User?

3 stages:
• Disk I/O
• Network
• Compute




Remote Data Analysis

• Remote data analysis
  – Grid is a good fit
  – Details can be very tedious
• Middleware abstracts away many development details
• Resource selection is crucial to performance
• Performance prediction facilitates resource selection




Presentation Road Map

• Problem statement and motivation
• Middleware background
• Our performance prediction approach
• Experimental evaluation
• Related work
• Conclusions




Problem Statement

Given:
• A parallel data processing application
• Its execution time breakdown (profile)
• Configurations of available computing resources
• Dataset replicas in repositories of different sizes

Predict the application's execution time in order to select the right dataset replica and resource configuration.




FREERIDE-G Design




FREERIDE-G Processing

Key observation: most data mining algorithms follow a canonical loop.

Middleware API:
• Subset of data to be processed
• Reduction object
• Local and global reduction operations
• Iterator

while ( ) {
    forall (data instances d) {
        I = process(d)
        R(I) = R(I) op d
    }
    …….
}
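The canonical loop can be illustrated with a small Python sketch. This is not FREERIDE-G's actual API: the function names `process`, `local_reduction`, and `global_reduction`, the bin-based reduction object, and the choice of `+` as the `op` are all hypothetical, picked only to show the local-then-global reduction pattern.

```python
from collections import defaultdict

def process(d, num_bins=4):
    """Pick the reduction-object element that data instance d updates (hypothetical rule)."""
    return int(d) % num_bins

def local_reduction(chunk):
    """Fold one node's chunk into a local reduction object: R[I] = R[I] op d, with op = +."""
    R = defaultdict(float)
    for d in chunk:
        I = process(d)
        R[I] += d
    return R

def global_reduction(local_objects):
    """Combine the per-node reduction objects into a single global result."""
    G = defaultdict(float)
    for R in local_objects:
        for I, value in R.items():
            G[I] += value
    return dict(G)

# One chunk per simulated compute node.
chunks = [[1, 2, 3, 4], [5, 6, 7, 8]]
result = global_reduction(local_reduction(c) for c in chunks)
```

Because each node only touches its own reduction object during the inner loop, the only data exchanged between nodes is the reduction object itself, which is what the later prediction model accounts for.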




Performance Prediction Approach

• 3 phases of execution:
  – Retrieval at data server
  – Data delivery to compute node
  – Parallel processing at compute node
• Special processing structure:
  – Generalized reduction

$T_{exec} = T_{disk} + T_{network} + T_{compute}$
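As a rough illustration of how this decomposition supports resource selection, the sketch below sums the three predicted components for each candidate and picks the smallest total. The `Candidate` class, the candidate names, and all timing values are made up for the example and do not come from the slides.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    """A (replica, compute configuration) pair with predicted per-stage times in seconds."""
    name: str
    t_disk: float
    t_network: float
    t_compute: float

    @property
    def t_exec(self) -> float:
        # Predicted execution time is the sum of the three stages.
        return self.t_disk + self.t_network + self.t_compute

# Hypothetical predictions for two candidates.
candidates = [
    Candidate("replica-A / 4 compute nodes", 30.0, 50.0, 120.0),
    Candidate("replica-B / 8 compute nodes", 45.0, 70.0, 60.0),
]

# Select the candidate with the lowest predicted execution time.
best = min(candidates, key=lambda cand: cand.t_exec)
```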




Needed profile information

• Number of storage nodes ($n$) and compute nodes ($c$)
• Available bandwidth between these ($b$) in the profile configuration
• Execution time breakdown: data retrieval ($t_d$), network communication ($t_n$), and data processing ($t_c$) components
• Dataset size ($s$)
• Reduction object information: maximum size and communication time
• Global reduction time




Data Retrieval and Communication Time

Data retrieval:

Dataset size ($s$) and number of data hosts ($n$) for the base profile and for the predicted configuration ($s'$ and $n'$).

Used to scale $t_d$.

Data communication:

Also needs dataset size and number of data hosts, as well as bandwidth ($b$ and $b'$).

Used to scale $t_n$.

$T_{network} = \frac{s'}{s} \times \frac{n}{n'} \times \frac{b}{b'} \times t_n$
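A minimal Python sketch of the scaling described on this slide. The network formula scales the profiled time with dataset size and inversely with the number of data hosts and the bandwidth. The disk formula is an assumption: the slide only says that dataset size and number of data hosts are used to scale the retrieval time, so the sketch uses the same form without the bandwidth term. Function and parameter names are illustrative.

```python
def predict_disk_time(t_d, s, n, s_new, n_new):
    """Scale profiled retrieval time t_d (assumed form: linear in size, inverse in data hosts)."""
    return (s_new / s) * (n / n_new) * t_d

def predict_network_time(t_n, s, n, b, s_new, n_new, b_new):
    """Scale profiled communication time t_n by size, data hosts, and bandwidth."""
    return (s_new / s) * (n / n_new) * (b / b_new) * t_n

# Hypothetical profile: 100 MB on 1 data host at 500 Kbps.
# Predicted configuration: 400 MB on 2 data hosts at 250 Kbps.
disk = predict_disk_time(t_d=10.0, s=100.0, n=1, s_new=400.0, n_new=2)
network = predict_network_time(t_n=8.0, s=100.0, n=1, b=500.0,
                               s_new=400.0, n_new=2, b_new=250.0)
```

In this example, quadrupling the data while doubling the hosts doubles the retrieval time, and halving the bandwidth doubles the communication time again.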




Initial Data Processing Time Prediction

Dataset size ($s$) and number of compute nodes ($c$):
• Base profile ($s$, $c$)
• Predicted profile ($s'$, $c'$)

Used to scale up $t_c$.

Limitations (not modeled):
• Inter-processor communication time
• Global reduction time

$T_{compute} = \frac{s'}{s} \times \frac{c}{c'} \times t_c$
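The initial estimate treats all of the profiled processing time as perfectly parallel work. A short Python sketch, with illustrative names and made-up numbers, shows what that implies:

```python
def predict_compute_time_initial(t_c, s, c, s_new, c_new):
    """Initial estimate: processing time scales with data size and inversely with compute nodes."""
    return (s_new / s) * (c / c_new) * t_c

# Hypothetical profile: 120 s of processing for 100 MB on 1 compute node.
# Predicted configuration: 200 MB on 4 compute nodes.
estimate = predict_compute_time_initial(t_c=120.0, s=100.0, c=1, s_new=200.0, c_new=4)
```

Because the whole of the profiled time is divided by the node ratio, any serialized portion (reduction-object communication, global reduction) is scaled down as well, which is the limitation listed on this slide.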




Modeling Interprocessor Communication

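• Parallel computation involves communication of the reduction object
• Communication time ($T_{ro}$)
• Reduction object size ($r$)
• Interprocessor bandwidth ($w$)
• Latency ($l$)
• Reduction object size either remains constant or scales linearly

$T'_c = t_c - T_{ro}$

$\hat{T}_{ro} = w \times r + l$

$T_{compute} = \frac{s'}{s} \times \frac{c}{c'} \times T'_c + \hat{T}_{ro}$

Compare with the initial estimate:

$T_{compute} = \frac{s'}{s} \times \frac{c}{c'} \times t_c$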
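A minimal Python sketch of the refined model on this slide: remove the profiled reduction-object communication time from the profiled processing time, scale only the remainder, then add the predicted communication time for the new configuration. One assumption is labeled in the code: `w` is treated as a per-unit transfer cost (time per unit of reduction-object size) so that `w * r` is a time. All names and numbers are illustrative.

```python
def predict_ro_time(w, r, l):
    """Predicted reduction-object communication time.

    Assumption: w is a per-unit transfer cost (seconds per unit of size), l is latency.
    """
    return w * r + l

def predict_compute_time_ro(t_c, t_ro, s, c, s_new, c_new, w, r_new, l):
    """Scale only the parallel part of t_c, then add predicted communication time."""
    t_c_parallel = t_c - t_ro                 # profiled time without communication
    t_ro_new = predict_ro_time(w, r_new, l)   # communication time in the new configuration
    return (s_new / s) * (c / c_new) * t_c_parallel + t_ro_new

# Hypothetical profile: 100 s processing, 4 s of it communication, 100 MB on 2 nodes.
# Predicted configuration: 200 MB on 4 nodes, reduction object of size 2000.
estimate = predict_compute_time_ro(t_c=100.0, t_ro=4.0, s=100.0, c=2,
                                   s_new=200.0, c_new=4,
                                   w=0.001, r_new=2000.0, l=0.5)
```

If the reduction object scales linearly with data size rather than staying constant, `r_new` would be the scaled size; the sketch leaves that choice to the caller.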




Modeling Global Reduction

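• Global reduction time ($T_g$) is also serialized
• Depending on the application, global reduction time either:
  – Scales linearly with the number of nodes but is independent of dataset size
  – Stays constant regardless of the number of nodes but scales linearly with dataset size

$T''_c = t_c - T_{ro} - T_g$

$T_{compute} = \frac{s'}{s} \times \frac{c}{c'} \times T''_c + \hat{T}_{ro} + \hat{T}_g$

Compare with the model that accounts only for reduction-object communication:

$T_{compute} = \frac{s'}{s} \times \frac{c}{c'} \times T'_c + \hat{T}_{ro}$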
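A Python sketch of the model with global reduction, covering the two scaling behaviours listed on this slide. The `mode` parameter and the function names are hypothetical, and the exact scaling rules used by the authors may differ from this simple linear form.

```python
def scale_global_reduction(t_g, s, c, s_new, c_new, mode):
    """Scale profiled global reduction time t_g to a new configuration.

    mode="nodes": linear in the number of compute nodes, independent of data size.
    mode="size":  linear in data size, independent of the number of compute nodes.
    """
    if mode == "nodes":
        return t_g * (c_new / c)
    if mode == "size":
        return t_g * (s_new / s)
    raise ValueError(f"unknown scaling mode: {mode!r}")

def predict_compute_time_global(t_c, t_ro, t_g, s, c, s_new, c_new, t_ro_new, mode):
    """Scale the parallel part of t_c, then add predicted communication and global reduction."""
    t_c_parallel = t_c - t_ro - t_g
    t_g_new = scale_global_reduction(t_g, s, c, s_new, c_new, mode)
    return (s_new / s) * (c / c_new) * t_c_parallel + t_ro_new + t_g_new

# Hypothetical profile: 100 s processing (4 s communication, 6 s global reduction),
# 100 MB on 2 nodes; predicted configuration: 200 MB on 4 nodes.
estimate = predict_compute_time_global(t_c=100.0, t_ro=4.0, t_g=6.0, s=100.0, c=2,
                                       s_new=200.0, c_new=4, t_ro_new=2.5, mode="nodes")
```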




Modeling Across Heterogeneous Clusters

Need scaling factors for all 3 stages of computation (from a set of representative applications).

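$s_d = \left( \frac{T^{B}_{disk,1}}{T^{A}_{disk,1}} + \frac{T^{B}_{disk,2}}{T^{A}_{disk,2}} + \frac{T^{B}_{disk,3}}{T^{A}_{disk,3}} \right) / 3$

$\hat{T}^{B}_{exec} = s_d \times \hat{T}^{A}_{disk} + s_n \times \hat{T}^{A}_{network} + s_c \times \hat{T}^{A}_{compute}$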
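A small Python sketch of the cross-cluster step: each scaling factor is the average ratio of a stage's time on cluster B to its time on cluster A over a set of representative applications, and the prediction for B applies one factor per stage to the times predicted for A. The ratio direction (B over A), the function names, and all numbers are assumptions made for the example.

```python
def scaling_factor(times_a, times_b):
    """Average per-application ratio of stage time on cluster B to stage time on cluster A."""
    ratios = [t_b / t_a for t_a, t_b in zip(times_a, times_b)]
    return sum(ratios) / len(ratios)

def predict_exec_on_b(t_disk_a, t_network_a, t_compute_a, s_d, s_n, s_c):
    """Apply one scaling factor per stage to the times predicted for cluster A."""
    return s_d * t_disk_a + s_n * t_network_a + s_c * t_compute_a

# Hypothetical disk times (seconds) of three representative applications on each cluster.
s_d = scaling_factor(times_a=[10.0, 20.0, 40.0], times_b=[5.0, 10.0, 20.0])

# Hypothetical network and compute factors, obtained the same way.
total_b = predict_exec_on_b(t_disk_a=8.0, t_network_a=4.0, t_compute_a=40.0,
                            s_d=s_d, s_n=1.0, s_c=0.25)
```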




FREERIDE-G Applications

Data mining:
• K-means clustering
• KNN search
• EM clustering

Scientific data processing:
• Vortex extraction (right)
• Molecular defect detection and categorization




Experimental Setup

Base: 700 MHz Pentiums connected through Myrinet LANai 7.0

Heterogeneous prediction: 2.4 GHz Opteron 250s connected through InfiniBand (1 Gb)

Goal: to correctly model changes in:
1. Parallel configuration
2. Dataset size
3. Network bandwidth
4. Underlying resources

$Error = \frac{|T_{exact} - T_{predicted}|}{T_{exact}}$
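The error metric used in the evaluation is the relative difference between measured and predicted execution time. A one-function Python sketch, with made-up timings:

```python
def relative_error(t_exact, t_predicted):
    """Relative prediction error: |T_exact - T_predicted| / T_exact."""
    return abs(t_exact - t_predicted) / t_exact

# Hypothetical run: measured 200 s, predicted 190 s.
err = relative_error(200.0, 190.0)
err_percent = 100.0 * err  # the plots report this value as a percentage
```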




Modeling Parallel Performance

Errors for 3 approaches for:
1. Vortex detection, base:
   • 1-1 configuration
   • 710 MB dataset
2. Defect detection, base:
   • 1-1 configuration
   • 130 MB dataset

Results:
• Modeling reduction pays off
• Accurate predictions

[Figure: Vortex Detection (base: 1-1 configuration, 710 MB dataset). Relative prediction error (%) for 1 to 16 compute nodes (cn), grouped by number of data nodes (1, 2, 4, 8). Series: no communication, reduction communication, global reduction.]

[Figure: Molecular Defect Detection (base: 1-1 configuration, 130 MB dataset). Relative prediction error (%). Series: no communication, reduction communication, global reduction.]




Modeling Dataset Size

[Figure: EM clustering (base: 1-1 configuration/350 MB; predicted: 1.4 GB dataset). Relative prediction error (%). Series: global reduction.]

[Figure: Molecular Defect Detection (base: 1-1 configuration/130 MB dataset; predicting: 1.8 GB dataset). Relative prediction error (%). Series: global reduction.]

Errors for the 1 (best) approach for:
1. EM clustering (1.4 GB), base:
   • 1-1 configuration
   • 350 MB dataset
2. Defect detection (1.8 GB), base:
   • 1-1 configuration
   • 130 MB dataset

Results:
• Biggest error when the number of data nodes is the same as the number of compute nodes
• Accurate predictions




Impact of Network Bandwidth

Errors for the 1 (best) approach for:
1. EM clustering (250 Kbps), base:
   • 1-1 configuration
   • 500 Kbps
2. Defect detection (250 Kbps), base:
   • 1-1 configuration
   • 500 Kbps

Results:
• Biggest error when the number of data nodes is the same as the number of compute nodes
• Modeling reduction is most accurate

[Figure: EM clustering (base: 4-4 configuration/1.4 GB dataset; predicting: 130 MB dataset). Relative prediction error (%). Series: global reduction.]

[Figure: Molecular Defect Detection (base: 4-4 configuration/1.8 GB dataset; predicting: 350 dataset). Relative prediction error (%). Series: global reduction.]




Predictions for different type of cluster

Errors for the 1 (best) approach for:
1. Defect detection (1.8 GB), base:
   • 1-1 configuration
   • 710 MB dataset
2. EM clustering (700 MB), base:
   • 8-8 configuration
   • 350 MB dataset

Results:
• Scaling factors differ
• Largest error when the predicted configuration has the same number of compute nodes as the base

[Figure: Molecular Defect Detection (base: 4-4 configuration, 130 MB dataset; prediction: 1.8 GB dataset). Relative prediction error (%). Series: global reduction.]

[Figure: EM clustering (base: 8-8 configuration, 350 MB dataset; prediction: 700 MB dataset). Relative prediction error (%). Series: global reduction.]




Existing Work

3 broad categories for resource allocation:
• Heuristic approach to mapping
• Prediction through modeling:
  – Statistical estimation/prediction
  – Analytical modeling of parallel application
• Simulation-based performance prediction




Summary

• Performance prediction approach
• Exploits similarities in application processing structure to come up with very accurate results
• Approach accurately models changes in:
  – Computing configuration
  – Dataset size
  – Network bandwidth
  – Underlying compute resources
