Euro-Par 2007, Rennes, 29th August
The Characteristics and Performance of Groups of Jobs in Grids
Alexandru Iosup, Mathieu Jan*, Ozan Sonmez and Dick Epema
PDS Group, Delft University of Technology
The Netherlands
*: now postdoc LRI/INRIA Futurs, Orsay (Paris South), France
Outline
• Why look at groups of jobs?
• Grid traces and environment summary
• Definitions of groups of jobs
• The characteristics of job groupings
  • Workload-level analysis
  • Group-level analysis
  • Job-level analysis
• Conclusion and future work
Why look at groups of jobs?
• Current grids run almost exclusively single-node jobs [Grid2006]
  • Trace analysis: LCG, Grid3, TeraGrid, DAS-2
• How are jobs related, then? What is their structure?
  • Batches of identical jobs?
  • Something else?
• No such analysis using long-term data from production and research grid environments
• No analysis of the impact of groups of jobs on the performance of grids
Our research questions
• What are the dependencies among the jobs submitted by a single user?
• What is the physical structure of such groupings?
• What is the impact of the job groupings on the performance of grids?
Grid traces: Grid’5000 (1/3)
• Experimental platform
  • Grid’5000: 9 sites, 15 clusters
  • All clusters managed by OAR
• Trace period: 05/2004 - 11/2006
  • CPUs: ~2500
  • Jobs: 951 K
  • Users: 473
  • Groups: 10
  • Consumed CPU time: 651 years
Grid traces: NorduGrid (2/3)
• Large-scale production grid
  • NorduGrid: ~75 sites
  • Handled via ARC middleware
    • Advanced Resource Connector
• Trace period: 05/2004 - 02/2006
  • CPUs: ~2000
  • Jobs: 781 K
  • Users: 387
  • Groups: 106
  • Consumed CPU time: 2443 years
Grid traces: GLOW (3/3)
• Grid Laboratory Of Wisconsin
  • Campus-wide distributed computing environment
  • Condor based
• Trace period: 09/2006 - 01/2007
  • CPUs: ~1400
  • Jobs: 216 K
  • Users: 18
  • Groups: 1
  • Consumed CPU time: 55 years
Grid traces summary
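| | Grid’5000 | NorduGrid | GLOW |
|---|---|---|---|
| Period | 05/2004 - 11/2006 | 05/2004 - 02/2006 | 09/2006 - 01/2007 |
| Sites | 15 | ~75 | 1 |
| CPUs | ~2500 | ~2000 | ~1400 |
| Jobs | 951 K | 781 K | 216 K |
| Groups | 10 | 106 | 1 |
| Users | 473 | 387 | 18 |
| Consumed CPU time | 651 years | 2443 years | 55 years |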
Groups of jobs: definitions (1/2)
• Batch submission
  • Maximal contiguous subsequence G of a user's jobs, ordered by submission time, such that for any two successive jobs J, J’ in G, J’ is submitted at most Δ after J
• Parameter Sweep Application (PSA)
  • Batch submission + jobs execute the same application
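The batch definition can be read as a simple grouping rule over each user's jobs. The following is a minimal sketch under stated assumptions: the `Job` fields (`user`, `submit_time`, `app`), the function names, and the reading of the threshold Δ as a maximum gap between successive submit times are all illustrative and not taken from the slides. Groups of size 1 correspond to single jobs.

```python
from dataclasses import dataclass
from itertools import groupby
from operator import attrgetter

DELTA = 120.0  # seconds; the threshold value quoted later in the talk


@dataclass(frozen=True)
class Job:
    """Hypothetical trace record; the field names are assumptions."""
    user: str
    submit_time: float  # seconds since the start of the trace
    app: str = ""       # application name, used only for the PSA check


def split_into_groups(jobs, delta=DELTA):
    """Split each user's jobs into maximal runs whose successive
    submit times are at most `delta` seconds apart."""
    groups = []
    ordered = sorted(jobs, key=attrgetter("user", "submit_time"))
    for _, user_jobs in groupby(ordered, key=attrgetter("user")):
        current = []
        for job in user_jobs:
            # A gap larger than delta closes the current group.
            if current and job.submit_time - current[-1].submit_time > delta:
                groups.append(current)
                current = []
            current.append(job)
        groups.append(current)
    return groups


def is_psa(group):
    """A batch whose jobs all run the same application."""
    return len(group) > 1 and len({job.app for job in group}) == 1


if __name__ == "__main__":
    trace = [
        Job("alice", 0, "sim"), Job("alice", 60, "sim"),
        Job("alice", 170, "sim"), Job("alice", 400, "post"),
        Job("bob", 10, "render"),
    ]
    for group in split_into_groups(trace):
        kind = "batch" if len(group) > 1 else "single job"
        print(group[0].user, len(group), kind, "PSA" if is_psa(group) else "")
```

Sorting by user first and then by submit time keeps the grouping per user, which matches the research question about jobs submitted by a single user.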
Groups of jobs: definitions (2/2)
• In this talk, we focus on batch submissions
Characteristics of jobs groupings
• In our analysis, Δ = 120 seconds
Workload-level analysis
• Batches (trace totals in parentheses)

| | Grid’5000 | NorduGrid | GLOW |
|---|---|---|---|
| Submissions | 26k | 50k | 13k |
| Jobs | 808k (951k) | 738k (781k) | 205k (216k) |
| CPU time | 193y (651y) | 2192y (2443y) | 53y (55y) |

• Continued
  • NorduGrid & GLOW: identical to batches
  • Grid’5000: 14k submissions, 910k jobs, 462y
• Bursty: fewer submissions, more jobs
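The share of each trace covered by batches follows directly from the figures on this slide. The short sketch below only redoes that arithmetic; the dictionary layout and the `share` helper are illustrative names, and the inputs are the rounded values shown on the slide, so the percentages are approximate.

```python
# (in batches, whole trace) pairs, copied from the workload-level slide.
# Jobs are in thousands, CPU time in years.
WORKLOAD = {
    "Grid'5000": {"jobs_k": (808, 951), "cpu_years": (193, 651)},
    "NorduGrid": {"jobs_k": (738, 781), "cpu_years": (2192, 2443)},
    "GLOW":      {"jobs_k": (205, 216), "cpu_years": (53, 55)},
}


def share(part, whole):
    """Percentage of `whole` accounted for by `part`."""
    return 100.0 * part / whole


if __name__ == "__main__":
    for grid, figures in WORKLOAD.items():
        jobs = share(*figures["jobs_k"])
        cpu = share(*figures["cpu_years"])
        print(f"{grid}: {jobs:.1f}% of jobs, {cpu:.1f}% of CPU time in batches")
```

With these inputs, GLOW comes out at roughly 96% of consumed CPU time in batches, while Grid’5000 is the outlier at roughly 30%.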
Group-level analysis: size of batches
• 75% of batches are of size 15-20 (Grid’5000 and NorduGrid) or <10 (GLOW)
• Average: 31+/-110 (Grid’5000), 15+/-33 (NorduGrid) and 15+/-38 (GLOW)
• Heavy-tail distribution
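As a rough cross-check, the average batch sizes can be approximated by dividing the number of jobs in batches by the number of batch submissions from the workload-level slide. This is only a sketch: the inputs are rounded to thousands, the function name is illustrative, and the result can only approximate the averages quoted above.

```python
# (jobs in batches, batch submissions), both in thousands,
# taken from the workload-level slide.
BATCH_COUNTS = {
    "Grid'5000": (808, 26),
    "NorduGrid": (738, 50),
    "GLOW": (205, 13),
}


def mean_batch_size(jobs_k, submissions_k):
    """Average number of jobs per batch submission."""
    return jobs_k / submissions_k


if __name__ == "__main__":
    for grid, (jobs_k, subs_k) in BATCH_COUNTS.items():
        print(f"{grid}: about {mean_batch_size(jobs_k, subs_k):.1f} jobs per batch")
```

The results are close to the reported averages of 31, 15 and 15; the small differences are consistent with the rounding of the inputs.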
Group-level analysis: inter-arrival time (seconds)
• Expected: high inter-arrival time for batches
• 50% of the values are between 400 and 700 seconds
• Reminder: Δ = 120 seconds
Group-level analysis: duration (seconds)
• Batches last longer than single jobs
• For NorduGrid, the average duration of batches is 1.5 days vs. 1 day for single jobs
Group-level analysis: consumed CPU time (KCPUs)
• Consumed CPU time is much higher for batches than for single jobs!
Job-level analysis: run time (seconds)
• Average run time for batches
  • Grid’5000: 0.66+/-6.65 days
  • GLOW: 1.04+/-3.18 days
  • NorduGrid: 2.27+/-5.59 days
Job-level analysis: wait time (seconds)
• NorduGrid: no wait time information in the trace
• Average wait times of batches are higher than
  • The runtime of batches
  • The wait time of single jobs
Job-level analysis: consumed CPU time (KCPUs)
• No clear distinction between batches and single jobs
Other analyses
• Do parallel jobs exist inside batches?
  • Average parallelism: 1+/-1 (Grid’5000), 2+/-7 (NorduGrid) and 1 (GLOW)
  • Grid’5000: 37% of batches are of size 2, 9% of size >2, max. = 325
• To what extent are batches PSAs?
  • In Grid’5000, 75% of batches are PSAs
  • PSAs compared to batches:
    • Group size increased by 9 on average
    • Average duration divided by 5.7
Performance impact of grouped submissions
• Batches display a high AIT value
  • Over 4000% of the ART!
• Research direction for designing scheduling policies for batches: minimization of the AIT of batches
• Performance metrics
  • Group runtime (RT)
  • Group duration (DT)
  • Group idle time: IT = DT - RT

Grid’5000:

| | ART (s) | AIT (s) |
|---|---|---|
| Batches | 14,181 | 568,483 |
| Single jobs | 4,127 | 4,233 |
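The "over 4000%" figure can be reproduced from the Grid’5000 table. The sketch below assumes that ART and AIT denote the averages of the group runtime RT and the group idle time IT, which the slide does not spell out; the function names are illustrative.

```python
def idle_time(duration_s, runtime_s):
    """Group idle time as defined on the slide: IT = DT - RT."""
    return duration_s - runtime_s


def idle_to_runtime_percent(ait_s, art_s):
    """Average idle time expressed as a percentage of average runtime."""
    return 100.0 * ait_s / art_s


# Grid'5000 values from the table, in seconds: (ART, AIT).
GRID5000 = {
    "batches": (14_181, 568_483),
    "single jobs": (4_127, 4_233),
}

if __name__ == "__main__":
    for kind, (art, ait) in GRID5000.items():
        ratio = idle_to_runtime_percent(ait, art)
        # Since IT = DT - RT for every group, the average duration
        # over the same groups is ART + AIT.
        avg_duration = art + ait
        print(f"{kind}: AIT is {ratio:.0f}% of ART, "
              f"average duration {avg_duration} s")
```

For batches the ratio comes out slightly above 4000%, while for single jobs idle time and runtime are of the same order, which is what motivates idle-time minimization as a scheduling goal for batches.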
Conclusion & future work
• Formally defined 3 types of groups of jobs
  • Batch (and PSAs), continued and bursty
• Analysis of 3 long-term traces from large and different platforms
  • Up to 96% of CPU time consumed by batch submissions
• Performance analysis of batches compared to single jobs
• Future work
  • Deeper analysis (Grid Workloads Archive)
  • Research direction: minimization of idle time in groups
  • Trace-driven simulations
  • Dynamic resource availability [Grid2007]
Thank you! Questions? Remarks? Observations?
Help build our community’s Grid Workloads Archive:
http://gwa.ewi.tudelft.nl/