Resource Management on Blue Waters · 2020. 7. 13.
Agenda
• Job Scheduling Goals
• Understanding the Needs of the Users
• Configuration Parameterization
• Incentivizing User Behavior
• Topology Awareness
• Weekly Resource Management Discussion
Managing HPC Systems and Centers
Job Scheduling Goals
• Know and prioritize goals to measure against
• Scheduling goals are frequently in conflict: it’s a balancing act
• If there is dissatisfaction with the scheduler:
  1. Identify if goals are being met by configuration
  2. Question whether the dissatisfaction is tolerable, or if goals need adjustment
  3. If goals are adjusted, then adjust configuration to match and then monitor
Job Scheduling Goals (on Blue Waters)
• No user or project is favored by policy
• Large jobs have higher priority
• System is highly utilized
• Job turnaround time is minimized
• Debug queue with fastest turnaround time
• Users can attain higher priority with higher charge to allocation
• Scheduler commands are responsive
• Predictable job start times
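Several of these goals can be read as terms in a job priority function. The following is a minimal sketch of that idea, not the Blue Waters configuration: the weights, the QoS names, and the `Job` fields are all hypothetical and chosen only to show how job size, queue wait, and a higher-charge QoS might combine. User and project identity are deliberately absent from the formula, in line with the first goal.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int          # requested node count
    wait_hours: float   # time spent waiting in the queue so far
    qos: str            # hypothetical QoS names: "normal", "high", "debug"


# Hypothetical weights; a real site would tune these against its own goals.
SIZE_WEIGHT = 1.0       # larger jobs get higher priority
WAIT_WEIGHT = 10.0      # waiting jobs age upward, bounding turnaround time
QOS_BOOST = {"normal": 0.0, "high": 5000.0, "debug": 20000.0}


def priority(job: Job) -> float:
    """Combine size, wait time, and QoS boost; no user or project term."""
    return (SIZE_WEIGHT * job.nodes
            + WAIT_WEIGHT * job.wait_hours
            + QOS_BOOST[job.qos])


def order_queue(jobs):
    """Return jobs sorted from highest to lowest priority."""
    return sorted(jobs, key=priority, reverse=True)


jobs = [
    Job("small", 32, 2.0, "normal"),
    Job("large", 8192, 2.0, "normal"),
    Job("small_high", 32, 2.0, "high"),
    Job("debug", 4, 0.1, "debug"),
]
ordered = [job.name for job in order_queue(jobs)]
print(ordered)
```

With these made-up weights the debug job runs first, the large job outranks the boosted small job, and the unboosted small job comes last. Changing one weight and re-sorting is a quick way to see which goals start to conflict.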
Understanding the Needs of the Users
• Evaluate requirements of users
  • Wall clock time
  • Job turnaround time
  • System availability
  • Multitenancy
• Variables Beyond Control
  • Job geometry (requested resources, such as walltime or nodes)
  • Job volume submitted
  • Walltime accuracy
  • Application stability
Configuration Parameterization
• Identify the tools to manipulate scheduling behavior
  • QoS, Queues, Reservations, Fairshare
• Avoid unnecessarily complex configurations
• Queues might be configured for varying:
  • Priority
  • Time
  • Job size
  • Resource type
Incentivizing User Behavior
• Discounts give users an incentive to adopt a desired submission behavior
  • This can be as easy as changing the charge factor for a specific queue
  • Examples: seasonal submission lulls, specific job sizes, preemptible queues, backfillable jobs
• Scheduler product built-ins will vary – custom efforts are sometimes necessary
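To make the charge-factor idea concrete, the sketch below charges an allocation in node-hours scaled by a per-queue factor. The queue names and factor values are hypothetical and are not taken from the slides; the point is only that a discount or premium can be expressed as a single multiplier.

```python
# Hypothetical per-queue charge factors: a discount for the "low" queue
# (e.g. preemptible or backfill work) and a premium for "high" priority.
CHARGE_FACTOR = {"normal": 1.0, "high": 2.0, "low": 0.5}


def charge(nodes: int, wall_hours: float, queue: str) -> float:
    """Return the node-hours charged to the allocation for one job."""
    return nodes * wall_hours * CHARGE_FACTOR[queue]


# The same 100-node, 10-hour job costs a different amount in each queue.
normal_cost = charge(100, 10.0, "normal")
discounted_cost = charge(100, 10.0, "low")
premium_cost = charge(100, 10.0, "high")
print(normal_cost, discounted_cost, premium_cost)
```

A seasonal discount would then be a temporary edit to one entry in the table, with no change to how jobs are scheduled.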
Topology Awareness
• Placing jobs in network locations optimal for tightly coupled communication
  • Can be beneficial to some applications by improving performance and runtime consistency
• Represents a constraint and can affect turnaround time
• May reduce utilization, but increase overall throughput through average performance enhancement
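The trade-off in the last bullet can be checked with simple arithmetic. The sketch below assumes a deliberately simple model in which work completed per unit time is proportional to utilization multiplied by average application speed. Both the model and the numbers are illustrative assumptions, not measurements from Blue Waters.

```python
def relative_throughput(utilization: float, avg_speedup: float) -> float:
    """Work completed per unit time, relative to a fully used baseline system.

    utilization: fraction of node-hours occupied by jobs (0.0 to 1.0)
    avg_speedup: average application speed relative to the baseline placement
    """
    return utilization * avg_speedup


# Hypothetical numbers: topology-aware placement costs 5 points of
# utilization but makes applications 10% faster on average.
without_topology = relative_throughput(utilization=0.95, avg_speedup=1.00)
with_topology = relative_throughput(utilization=0.90, avg_speedup=1.10)

print(f"without: {without_topology:.3f}, with: {with_topology:.3f}")
```

Under these assumptions the topology-aware configuration completes more work despite lower utilization. With a smaller average speedup the comparison would go the other way, so the measured speedup is what decides it.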
Weekly Resource Management Discussion
• Review tickets submitted that are scheduling related
• View storage utilization and usage
• View system utilization
• Look at wait times per queue and per user
• Look at scheduler performance (response time)
• Review for any user behavior that could potentially affect system procedures and policy
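The per-queue and per-user wait-time review can be prepared from job accounting records. The following is a minimal sketch that assumes each record is a dictionary with hypothetical `user`, `queue`, `submit`, and `start` fields holding ISO 8601 timestamps; real accounting logs will use different field names and formats.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean


def mean_wait_hours(records, key):
    """Group job records by `key` ("queue" or "user") and average the wait."""
    waits = defaultdict(list)
    for rec in records:
        submitted = datetime.fromisoformat(rec["submit"])
        started = datetime.fromisoformat(rec["start"])
        waits[rec[key]].append((started - submitted).total_seconds() / 3600.0)
    return {group: mean(values) for group, values in waits.items()}


# Made-up records for illustration only.
records = [
    {"user": "alice", "queue": "normal",
     "submit": "2020-07-06T08:00:00", "start": "2020-07-06T12:00:00"},
    {"user": "bob", "queue": "normal",
     "submit": "2020-07-06T09:00:00", "start": "2020-07-06T11:00:00"},
    {"user": "alice", "queue": "debug",
     "submit": "2020-07-06T10:00:00", "start": "2020-07-06T10:30:00"},
]

by_queue = mean_wait_hours(records, "queue")
by_user = mean_wait_hours(records, "user")
print(by_queue)
print(by_user)
```

Means hide outliers, so a weekly review would likely also look at the maximum or a high percentile for each group.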
Filesystem Activity
Filesystem Load and Response Time
Wait Times (XDMoD) and Historical Utilization
Scheduler Statistics and Iteration Time