Grid parallelization and tests
Transcript of Grid parallelization and tests
Grid parallelization and tests
CERN
GRACE Final Review, Amsterdam, 15-16 February 2005
GRACE Review February 2005 - Amsterdam
Contents
1. Two GRACE Grid integration models: M1, M2
2. Pre-conditions for the tests
3. Work performed
4. General test results
5. Model 1 test results
6. Simulation of Model 2
7. Model 2 test results
8. Comparison
9. Conclusions
Approach used: M1 - M2
[Diagram] Application workflow and single-search Grid workflow for M1 and M2.
Pre-conditions
• Adopted Content and Categorization Engines release 4.45; these components were later improved and optimized by the partners
• A convenient testing corpus of documents was selected (English documents, correct PDF-to-text conversion, small and large sizes)
• Configuration problems of the GILDA replica manager were solved (with the intervention of site administrators)
• The search result set size is considered to be, on average, between 0.1 and 4 MB of text
• The use of DAGs for the job model in GILDA was discarded
Work performed
• Preparation of a test plan and report template
• Creation of the testing corpus of documents
• Verification of the testing pre-conditions
• Creation of test scripts for semi-automatic testing
• Testing on the GILDA testbed
• Creation of scripts for validation of output and parsing of logs
• Collection and analysis of the results
Testing: job submission

Total number of jobs submitted:
  M1: 58   M2: 727   General: 395   Total: 1180

• General tests (RM, RB, functional, etc.) started in October 2004
• Main testing period: November 2004
• More than 1000 jobs submitted
Variable Parameters

V1 - Input Data Size:
  ID:   0       1       2       3       4       5       6
  Size: 0.1 MB  0.5 MB  1.0 MB  1.5 MB  2.0 MB  3.0 MB  4.0 MB

V2 - Worker Node Specifications:
  ID 0 / "Spec1": PIV 2.4 GHz, 512 MB RAM - the fastest machine in the GILDA testbed
  ID 1 / "Spec2": PIII 800 MHz, 1 GB RAM - the slowest machine in the GILDA testbed
  ID 2 / "Spec3": PIII 1000 MHz, 2 GB RAM - the most common machine in the GILDA testbed

V3 - Number of parallel jobs:
  ID:   0  1  2  3  4  5  6  7  8  9  10  11  12  13
  Jobs: 1  2  3  4  5  6  7  8  9  10 11  12  14  16
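The full test matrix is the Cartesian product of the three variable parameters above. As a minimal sketch (the dictionary field names are my own, not from the test plan), the combinations can be enumerated like this:

```python
from itertools import product

# Parameter values transcribed from the tables above
V1_INPUT_SIZE_MB = [0.1, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0]              # V1: input data size
V2_WORKER_SPECS = ["Spec1", "Spec2", "Spec3"]                        # V2: worker node spec
V3_PARALLEL_JOBS = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16]  # V3: parallel jobs

def test_matrix():
    """Enumerate every (V1, V2, V3) combination as one test case."""
    return [
        {"input_mb": size, "spec": spec, "jobs": jobs}
        for size, spec, jobs in product(V1_INPUT_SIZE_MB, V2_WORKER_SPECS, V3_PARALLEL_JOBS)
    ]

print(len(test_matrix()))  # 7 * 3 * 14 = 294 combinations
```

In practice only a subset of these 294 combinations was exercised (some graphs fix V1 or V2), but the enumeration shows the size of the parameter space.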
Graphs
G1 Total Execution Time: execution time (P4) as a function of input data size (V1) on worker nodes with different specifications (V2).
G2 Detailed Execution Time: execution time (P4) as a function of input data size (V1), split by text normalization and categorization; V2 is fixed.
G3 Output Size: output size (P9) as a function of input data size (V1).
G4 UI Waiting Time: UI waiting time (P7) as a function of the number of sub-jobs (V3) with fixed input data size (V1).
G5 Spent Computing Time: spent computing time (P8) as a function of the number of sub-jobs (V3) with fixed input size (V1).
G6 Optimal Number of Jobs: optimal number of jobs (FN1, FN2) as a function of the input size (V1).
G7 Optimal UI Waiting Time: UI waiting time (P7) as a function of the input size (V1) when applying the optimal splitting (FN1, FN2).
G8 Optimal Spent Computing Time: spent computing time (P8) as a function of the input size (V1) when applying the optimal splitting (FN1, FN2).
[Chart] M1 - G1: execution time in hours vs. input size in MB, for Spec1, Spec2, and Spec3.
[Chart] M1 - G2: execution time in hours vs. input size in MB, split by categorization and normalization.
[Chart] M1 - G3: output size in KB vs. input size in MB, for categories, OutputSandbox (compressed), and index files.
[Chart] Jobs per day (triggered, probably executed later): number of jobs vs. day (5-26 November), for M1 jobs, M2 jobs, and general jobs.
[Chart] M2 - G2 - V2=Spec3 (TT3B): time in hours vs. input size in MB, split by CategorizationEngine and ContentEngine.
[Chart] M2 - G6 - V2=Spec3 (TT3B): optimal number of jobs vs. input size in MB.
[Chart] M2 - G7: UI waiting time in hours vs. input size in MB, for V2=Spec3, V2=Spec2, and V2=* (any).
[Chart] M2 - spent computing time with optimal number of jobs: spent computing time in hours vs. input size in MB, for V2=Spec3, V2=Spec2, and V2=* (any).
[Chart] Comparing P8: M1 vs. M2, V2=Spec3: spent computing time in hours vs. input size in MB.
Results
The results were collected and published in a study and test report.
General tests
Functional tests
F1 Job Submission: submission of a job to the Grid
F2 Job Status Check: status checking while the job is running
F3 Results Retrieval: retrieving the output sandbox after successful execution
F4 Results Validation: validating that the results are complete; the output files (indexes, NDF, categories) exist and are not empty
F5 Error Testing: testing that error conditions return the proper error messages: input data not available, GRACE application not available, ContentEngine failure, CategorizationEngine failure

The functional tests were successful. Problems related to the configuration of the Grid nodes were experienced and fixed:
• RB configuration problems
• RM/SE configuration problems
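The F4 check reduces to "every expected output file exists and is non-empty". A minimal sketch of such a validation script (the file names are hypothetical; the slides only name indexes, NDF, and categories as output types):

```python
import os

# Hypothetical output file names standing in for the real index/NDF/category files
REQUIRED_OUTPUTS = ["index.dat", "documents.ndf", "categories.txt"]

def validate_results(output_dir, required=REQUIRED_OUTPUTS):
    """F4: a retrieved output sandbox is valid when every required
    file exists and is non-empty. Returns a list of error strings;
    an empty list means the job output passed validation."""
    errors = []
    for name in required:
        path = os.path.join(output_dir, name)
        if not os.path.isfile(path):
            errors.append("missing: " + name)
        elif os.path.getsize(path) == 0:
            errors.append("empty: " + name)
    return errors
```

Run against each retrieved output sandbox, this gives a pass/fail per job plus a reason string suitable for the log-parsing scripts mentioned under "Work performed".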
Performance tests (I)

                          M1 [sec]        M2 [sec]       Note
P1 Job submission time    66.9 ± 23.9     34.9 ± 7.3     depends on input data size
P2 Job brokering time     28.5 ± 5.1      26.6 ± 3.9
P3 Job queuing time       72.5 ± 19.0     68.6 ± 21.8    on empty queues
P4 Job execution time     0.69 + 7.88·I   see graphs     variable; depends on GRACE performance (I = input size in MB)
P5 Job retrieving time    18.1 ± 5.9      17.6 ± 1.4     depends on output data size
Average Grid overhead     3.1 min         2.5 min
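The P4 linear fit for Model 1 can be evaluated directly. A small worked example (the fit yields time in hours, as read off the G1 execution-time graphs, not in seconds like the other rows):

```python
def m1_execution_time_hours(input_size_mb):
    """P4 fit for Model 1: T = 0.69 + 7.88 * I, with I the input
    size in MB and T in hours (units as on the G1 graphs)."""
    return 0.69 + 7.88 * input_size_mb

# e.g. a 2 MB result set: 0.69 + 7.88 * 2 = 16.45 hours on a single node
print(round(m1_execution_time_hours(2.0), 2))  # -> 16.45
```

The steep slope (almost 8 hours per extra MB) is what makes the M2 splitting strategy worthwhile for larger result sets.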
Grid overhead
The Grid overhead is about 3 minutes on average, split among submission, brokering, queuing, and retrieving.
Performance tests (II)

                      M1                      M2
P6 Job failure rate   19.0 %                  15.3 %
                      (11 failed out of 58)   (22 failed out of 144)

The failed jobs were aborted at the broker; failures due to one CE which broke are not considered. The main cause of failure was identified as misbehavior of the resource broker (RB), which needed re-initialization (performed by the GILDA team). After re-initialization, 23 jobs were executed, all successfully.
The Grid performed well: job success rate > 80%.
Model 1
M1 performances: execution time / input size
[Chart] Execution time in hours vs. input size in MB, for Spec1, Spec2, and Spec3, split by categorization and normalization.
• Tests performed on machines with different specifications
• The normalization job is the most demanding
Model 2
M2 description
• Search results are split outside the Grid
• Grid parallel jobs execute text normalization
• Jobs are monitored for status
• Results are stored on the Grid (Replica Manager)
• The Grid categorization job executes:
  – merging of the normalized documents from the SEs
  – categorization processing
• The job is monitored and the results retrieved
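The first step above, splitting the search result set outside the Grid, can be sketched as a simple round-robin partition (a minimal illustration; the project's actual splitter and its chunking policy are not described in the slides):

```python
def split_result_set(documents, n_jobs):
    """M2 step 1: split the search result set, outside the Grid, into
    n_jobs roughly equal chunks, one per parallel normalization job."""
    chunks = [[] for _ in range(n_jobs)]
    for i, doc in enumerate(documents):
        chunks[i % n_jobs].append(doc)  # round-robin keeps chunk sizes balanced
    return chunks
```

Each chunk then becomes the input sandbox of one normalization job; balanced chunks matter because the slowest sub-job determines the UI waiting time.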
M2 Simulation
Kopt - ideal optimal splitting number: assumes an infinite-worker-nodes Grid where any splitting is possible; it is the value minimizing the UI waiting time (computing time plus the increase due to the job submission overhead), with a resource-saving parameter α.
Keff - real optimal splitting number: Kopt subject to the constraints of the available worker nodes, the input data file size, and the allowed splitting sequence.
[Diagram] UI waiting time vs. number of jobs: computing time falls with splitting while the submission overhead grows, giving optima Kopt1 and Kopt2 for two values of α.
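A minimal sketch of this trade-off, assuming the simplest cost model consistent with the diagram, T(k) = work/k + α·overhead·k; this model and the exact role of α are my assumptions, not the project's published formulas:

```python
import math

ALLOWED_SPLITS = (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16)  # the V3 sequence

def k_opt(work_hours, overhead_hours, alpha=1.0):
    """Ideal Kopt on an infinite Grid: minimise
    T(k) = work/k + alpha * overhead * k over continuous k.
    alpha > 1 penalises wide splitting (resource saving)."""
    return math.sqrt(work_hours / (alpha * overhead_hours))

def k_eff(work_hours, overhead_hours, available_nodes, alpha=1.0):
    """Real Keff: the allowed split value closest to Kopt that fits
    within the worker nodes actually available on the testbed."""
    ideal = k_opt(work_hours, overhead_hours, alpha)
    feasible = [k for k in ALLOWED_SPLITS if k <= available_nodes]
    return min(feasible, key=lambda k: abs(k - ideal))
```

For example, 16 hours of normalization work with 15 minutes of per-job overhead gives Kopt = sqrt(16 / 0.25) = 8 sub-jobs, while a testbed with only 6 free nodes clamps Keff to 6, as the constraint rule describes.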
M2 performances
[Chart] Execution time vs. number of parallel jobs (input size = 2 MB): Grid overhead and UI waiting time.
[Chart] Execution time vs. input size (splitting parameter = 9): categorization and normalization.
Comparison M1 and M2
[Chart] Execution time vs. input size: Model 1 vs. Model 2.
[Chart] Computing time vs. input size: Model 1 vs. Model 2.
Conclusions
• The Grid performed well: low failure rate, prompt replies from the Grid administrators to problems, good coordination with the GILDA team
• Parallelization proved to improve application performance and lower the query failure rate