New Seaborg Queue Configuration Results May 29, 2003 David Turner NERSC User Services Group
New Seaborg Queue Configuration
Results
May 29, 2003
David Turner
NERSC User Services Group
510-486-4027
Introduction
• Review of LoadLeveler Class Structure
  - NERSC-3 classes
  - Proposed NERSC-3 Extended classes
  - Current NERSC-3 Extended classes
  - Objectives of current class structure
• Effects of Current Structure
  - Connect time: wallclock * nodes * 16
  - Wait time: start time - submit time
  - Connect time / Wait time
• Conclusions
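The three metrics defined above can be sketched in a few lines of Python. This is a minimal illustration, not the script used to produce the results: the function names, the choice of hours as the unit, and the sample job are assumptions. The factor of 16 is the number of processors per node implied by the class tables (for example, 8 nodes = 128 processors).

```python
from datetime import datetime

PROCS_PER_NODE = 16  # the factor of 16 in the connect-time formula


def connect_time_hours(wallclock_hours: float, nodes: int) -> float:
    """Connect time = wallclock * nodes * 16 (processor-hours, assuming hours)."""
    return wallclock_hours * nodes * PROCS_PER_NODE


def wait_time_hours(submit: datetime, start: datetime) -> float:
    """Wait time = start time - submit time, expressed in hours."""
    return (start - submit).total_seconds() / 3600.0


# Hypothetical job: 32 nodes, 6 hours of wallclock, queued for 3 hours.
submit = datetime(2003, 4, 1, 9, 0)
start = datetime(2003, 4, 1, 12, 0)

connect = connect_time_hours(6.0, 32)
wait = wait_time_hours(submit, start)
ratio = connect / wait  # connect time delivered per hour spent waiting

print(connect, wait, ratio)
```

A higher ratio means a job received more processor time for each hour it spent in the queue.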
NERSC-3 Class Structure
Class          Nodes (Procs)   Time Limit
interactive      8 (128)       30 min
debug           16 (256)       30 min
premium        128 (2,048)     8 hrs
regular        128 (2,048)     8 hrs
low            128 (2,048)     8 hrs
regular_long    32 (512)       24 hrs
Proposed Class Structure
Class          Nodes (Procs)              Time     Priority
interactive    8 (128)                    30 min   1
debug          24 (384)                   30 min   1
premium        256 (4,096)                12 hrs   2
large          32 – 256 (512 – 4,096)     48 hrs   3
regular        256 (4,096)                12 hrs   4
regular_long   32 (512)                   24 hrs   4
low            256 (4,096)                12 hrs   5
Other Proposed Changes
• Various limit adjustments
  - Increase user run limit from 4 to 6
  - Eliminate class limit of 7 in regular_long
  - Retain 1 running, 1 queued limit in regular_long
• Eliminate aging
  - Incompatible with class priorities
• Schedule lowest load average and smallest memory nodes first
• Tune scheduling parameters to maintain responsiveness
Current Class Structure
Submit Class   LL Class      Max Nodes   Max Procs    Max Hours   Relative Priority
interactive    interactive   1–8         1–128        0.5         1
debug          debug         1–24        1–384        0.5         2
premium        pre_1         1–31        1–496        12          7
               pre_32        32–127      497–2032     48          5
               pre_128       128–380     2033–6080    48          3
regular        reg_1         1–31        1–496        12          8
               reg_1l        1–31        1–496        24          8
               reg_32        32–127      497–2032     48          6
               reg_128       128–380     2033–6080    48          4
low            low           1–128       1–2048       12          9
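The table above maps each submit class to one or more LoadLeveler classes by node count and time limit. The sketch below shows one way to express that lookup. The slides do not describe how the routing is actually implemented, so the first-fit rule, the function name resolve_class, and the tuple layout are assumptions made for illustration; only the table values come from the slide.

```python
# (submit class, LoadLeveler class, min nodes, max nodes, max hours, relative priority)
CLASS_TABLE = [
    ("interactive", "interactive", 1, 8, 0.5, 1),
    ("debug", "debug", 1, 24, 0.5, 2),
    ("premium", "pre_1", 1, 31, 12, 7),
    ("premium", "pre_32", 32, 127, 48, 5),
    ("premium", "pre_128", 128, 380, 48, 3),
    ("regular", "reg_1", 1, 31, 12, 8),
    ("regular", "reg_1l", 1, 31, 24, 8),
    ("regular", "reg_32", 32, 127, 48, 6),
    ("regular", "reg_128", 128, 380, 48, 4),
    ("low", "low", 1, 128, 12, 9),
]


def resolve_class(submit_class, nodes, hours):
    """Return (LL class, relative priority) for the first row that fits.

    First-fit over the table is an assumed rule, not the documented one.
    Returns None when no row accepts the request.
    """
    for sub, ll_class, lo, hi, max_hours, priority in CLASS_TABLE:
        if sub == submit_class and lo <= nodes <= hi and hours <= max_hours:
            return ll_class, priority
    return None


print(resolve_class("regular", 16, 8))    # fits reg_1
print(resolve_class("regular", 16, 20))   # too long for reg_1, fits reg_1l
print(resolve_class("premium", 200, 40))  # fits pre_128
print(resolve_class("low", 200, 1))       # exceeds the 128-node limit for low
```

Under this rule a 16-node regular job lands in reg_1 unless it requests more than 12 hours, in which case it falls through to reg_1l.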
General Batch Policies
• Each user may have:
  - 6 jobs running
  - 10 jobs considered for scheduling (idle state)
  - 30 jobs submitted
• The class run limit for reg_1l is 15 jobs
• Jobs requesting 8 hours or less will complete before scheduled outages
• Jobs placed on “user hold” (status HU) will be removed after one week
Objectives of Class Structure
• Allow 4096-way jobs
  - Current MPI maximum
• Favor “large” jobs
• Provide longer time limit for “regular” jobs
• Provide more resources to “long” jobs
  - Allow greater access
• Provide more resources to interactive and debug jobs
  - As needed
All while maintaining system responsiveness
N3 vs. N3E
• N3
  - October 1, 2002 – March 2, 2003
  - 153 days
• N3E
  - March 3, 2003 – May 20, 2003
  - 79 days
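The two period lengths can be checked with a short date calculation. This is only a sanity check on the figures above; the helper name inclusive_days is an assumption, and both endpoints are counted as part of the period.

```python
from datetime import date


def inclusive_days(first: date, last: date) -> int:
    """Number of calendar days from first to last, counting both endpoints."""
    return (last - first).days + 1


n3_days = inclusive_days(date(2002, 10, 1), date(2003, 3, 2))
n3e_days = inclusive_days(date(2003, 3, 3), date(2003, 5, 20))

print(n3_days, n3e_days)
```

Because the two periods differ in length, per-week rates give a fairer comparison between them than raw job counts.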
Jobs Per Week
Nodes        N3        N3E
1 - 15       9208.6    8534.7
16 - 31       143.9     253.3
32 - 63        13.7      88.9
64 - 127       12.4      48.0
128+            3.6      17.8
Total        9382.2    8942.7
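To make the shift toward larger jobs easier to see, the table above can be reduced to an N3E-to-N3 ratio for each size range. The sketch below uses only the values from the table; the dictionary layout and variable names are assumptions.

```python
# Jobs per week by node-count range: (N3, N3E), copied from the table.
JOBS_PER_WEEK = {
    "1-15": (9208.6, 8534.7),
    "16-31": (143.9, 253.3),
    "32-63": (13.7, 88.9),
    "64-127": (12.4, 48.0),
    "128+": (3.6, 17.8),
}

# Ratio above 1 means more jobs per week of that size under N3E.
ratios = {size: n3e / n3 for size, (n3, n3e) in JOBS_PER_WEEK.items()}

n3_total = sum(n3 for n3, _ in JOBS_PER_WEEK.values())
n3e_total = sum(n3e for _, n3e in JOBS_PER_WEEK.values())

for size, ratio in ratios.items():
    print(f"{size:>7}  {ratio:5.2f}x")
print(f"totals: N3 {n3_total:.1f}, N3E {n3e_total:.1f}")
```

The ratio is below 1 only for the 1 - 15 node range; every range of 16 nodes or more shows more jobs per week under N3E, with the 32 - 63 node range increasing roughly 6.5 times.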
Connect Time vs. Class
[Bar chart: percentage of connect time (axis 0 to 90%) by charge class (Low, Regular, Premium), with N3 and N3E series.]
Connect Time vs. Size
[Bar chart: percentage of connect time (axis 0 to 80%) by number of nodes (1-15, 16-31, 32-63, 64-127, 128+), with N3 and N3E series.]
Wait Time vs. Size
[Bar chart: wait time (axis 0:00:00 to 24:00:00) by number of nodes (1-15, 16-31, 32-63, 64-127, 128+), with N3 and N3E series.]
Connect Time / Wait Time
[Bar chart: connect time / wait time (axis 0 to 900) by number of nodes (1-15, 16-31, 32-63, 64-127, 128+), with N3 and N3E series.]
Conclusions
• Users running larger jobs
• Users running longer jobs
• Interactive and debug throughput maintained
Resources I
http://hpcf.nersc.gov/running_jobs/ibm/llsum/summary.php
Resources II
http://hpcf.nersc.gov/running_jobs/ibm/llsum/
End of Talk
This slide intentionally left blank.