Technical Computing Suite Job Management Software

Copyright 2012 FUJITSU LIMITED Technical Computing Suite Job Management Software PRIMERGY x86 cluster Supercomputer PRIMEHPC FX10 Toshiaki Mikamo Fujitsu Limited

Copyright 2012 FUJITSU LIMITED

Technical Computing Suite Job Management Software

PRIMERGY x86 cluster

Supercomputer PRIMEHPC FX10

Toshiaki Mikamo

Fujitsu Limited

Outline

System Configuration and Software Stack

Features

The major functions of the job scheduler
• Efficient Resource Usage
• Fair Share Scheduling
• System-optimal Resource Assignment

Summary and Future

Hybrid System Configuration

[Figure: hybrid system configuration combining both machine types]

Compute systems
• Supercomputer PRIMEHPC FX10: 6D mesh/torus interconnect (Tofu)
• PRIMERGY x86 cluster: fat-tree interconnect (InfiniBand)

Service nodes
• Login nodes (User): login, compilation, job submission
• Management nodes (Administrator): system operations management, job operations management
• Control nodes
• Job management nodes
• File management nodes

Storage and networks
• Global file system (data storage area)
• Local file system (temporary area occupied by jobs)
• IO network (IB), management network (GbE)

System Software Stack

User/ISV Applications

HPC Portal / System Management Portal

Technical Computing Suite
• System operations management: system configuration management, system control, system monitoring, system installation & operation
• Job operations management: job manager, job scheduler, resource management, parallel execution environment
• High-performance file system: Lustre-based distributed file system, high scalability, IO bandwidth guarantee, high reliability & availability
• VISIMPACT™: shared L2 cache on a chip, hardware intra-processor synchronization
• Compilers: hybrid parallel programming, sector cache support, SIMD / register file extensions
• MPI Library: Scalability of High-Func. Barrier Comm.
• Support Tools: IDE, profiler & tuning tools, interactive debugger

Operating system
• Supercomputer PRIMEHPC FX10: Linux-based enhanced Operating System
• PRIMERGY x86 cluster: Red Hat Enterprise Linux

Features

Same job operations on FX10 and PRIMERGY

Efficient, fair and system-optimal job scheduling
• See the following slides for details

Resource / Access control
• Elapsed time limit / CPU time limit / Physical memory limit
• Enable / Disable execute permission of job operation commands
• Reduce OS jitter / Power saving control

Job statistical information
• The amount of CPU time / Memory / IO
• SIMD rate / MIPS / MFLOPS

Job Scheduler

We renewed our job scheduler for large-scale systems.

Our job scheduler features:
• Multi-process: enables multiple schedulers to coexist in a cluster.
• Multi-thread: enables the scheduling load to be balanced.

Efficient Resource Usage

Backfill scheduling keeps the resources busy.
• Our scheduler manages both space (compute nodes) and time.
• It backfills low-priority jobs into idle slots only where doing so does not delay high-priority jobs.

[Figure: two node-versus-time charts (Now, t1, t2, t3) showing a running job and Jobs A to D, with the labels "Not backfilled" and "Backfilled".]
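The idea of managing both nodes and time can be sketched in a few lines of Python. This is only an illustrative model, not the scheduler's actual algorithm: it assumes a conservative backfill in which every queued job receives a reservation in priority order, and it uses each job's elapsed time limit as its planned run time. The names `Job`, `nodes_in_use`, `fits` and `plan_with_backfill` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    nodes: int   # number of compute nodes requested
    limit: int   # elapsed time limit, used here as the planned run time


def nodes_in_use(reservations, t):
    """Nodes held at time t by reservations given as (start, end, nodes)."""
    return sum(n for (start, end, n) in reservations if start <= t < end)


def fits(reservations, total_nodes, start, job):
    """True if job can hold its nodes for the whole window [start, start + limit)."""
    end = start + job.limit
    # Usage only rises where a reservation begins, so checking those points suffices.
    points = [start] + [s for (s, _, _) in reservations if start < s < end]
    return all(nodes_in_use(reservations, t) + job.nodes <= total_nodes
               for t in points)


def plan_with_backfill(total_nodes, running, queue):
    """Plan start times for the queue, which is ordered highest priority first.

    Assumption: conservative backfill. Reservations of higher-priority jobs
    are fixed before lower-priority jobs are placed, so a backfilled job
    can never delay them.
    """
    reservations = list(running)
    plan = {}
    for job in queue:
        if job.nodes > total_nodes:
            raise ValueError(f"{job.name} requests more nodes than exist")
        # A job can only start now (t = 0) or when some reservation ends.
        candidates = sorted({0} | {end for (_, end, _) in reservations})
        start = next(t for t in candidates
                     if fits(reservations, total_nodes, t, job))
        reservations.append((start, start + job.limit, job.nodes))
        plan[job.name] = start
    return plan


running = [(0, 2, 4)]  # one running job holds 4 of 8 nodes until t = 2
queue = [Job("A", 8, 3), Job("B", 8, 2), Job("C", 4, 2), Job("D", 4, 3)]
print(plan_with_backfill(8, running, queue))
```

With these made-up numbers, low-priority Job C (4 nodes, limit 2) fits into the idle nodes before Job A starts and is moved forward to t = 0. Job D (limit 3) would overlap Job A's reservation, so it is not backfilled and starts at t = 7.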

Fair Share Scheduling

Resources are shared fairly between users/groups based on past usage.

① A fair share value is issued in advance for each user/group.

② The value changes according to actual resource usage.

③ The job execution priority is determined dynamically according to the value.

The fair share value works like money.

[Figure: fair share value (money) plotted against time, with the events Payment, Return of overpaid, and Deposit.]

Payment [P] = (#Nodes allocated) x (Elapsed time limit of job)
Deposit [D] = (Elapsed time) x (Recovery rate)
Return of overpaid [R] = P - ((#Nodes allocated) x (Actual elapsed time of job))
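The three formulas can be followed with a small worked example. The sketch below is an assumption-laden illustration, not the product's implementation: the class name `FairShareAccount`, its methods, the starting value and the recovery rate are all hypothetical. It also assumes that a larger fair share value means a higher priority, which the slide does not state explicitly.

```python
from dataclasses import dataclass


@dataclass
class FairShareAccount:
    value: float          # current fair share value ("money")
    recovery_rate: float  # value regained per unit of elapsed time

    def pay(self, nodes, elapsed_limit):
        """Payment P = nodes x elapsed time limit, charged up front."""
        payment = nodes * elapsed_limit
        self.value -= payment
        return payment

    def refund(self, payment, nodes, actual_elapsed):
        """Return of overpaid R = P - nodes x actual elapsed time."""
        returned = payment - nodes * actual_elapsed
        self.value += returned
        return returned

    def deposit(self, elapsed):
        """Deposit D = elapsed time x recovery rate."""
        amount = elapsed * self.recovery_rate
        self.value += amount
        return amount


def order_by_priority(accounts):
    """Assumption: the user/group with the largest value is served first."""
    return sorted(accounts, key=lambda name: accounts[name].value, reverse=True)


alice = FairShareAccount(value=1000, recovery_rate=2)
p = alice.pay(nodes=4, elapsed_limit=100)           # P = 400, value 600
r = alice.refund(p, nodes=4, actual_elapsed=60)     # R = 160, value 760
d = alice.deposit(elapsed=60)                       # D = 120, value 880
print(p, r, d, alice.value)
```

A job that finishes well before its elapsed time limit gets most of its payment back, so under this model the limit only costs the user what was actually consumed.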

Optimal Job Scheduling for FX10

Interconnect topology-aware resource assignment
• One interconnect unit: 12 nodes (2 x 3 x 2)
• Job assignment rule: rectangular solid shape, which guarantees neighbor communication and avoids interfering with other jobs
• Rotates the rectangular solid of interconnect units to reduce fragmentation

[Figure: 3D node grid with x, y and z axes, marking in-use and unoccupied regions; edge lengths of 4, 6 and 8 are labeled.]
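Why rotation reduces fragmentation can be seen in a tiny example. The sketch below is an assumption, not the scheduler's placement logic: it only checks which axis-aligned orientations of a requested rectangular solid fit inside one free rectangular region. The function name `fitting_orientations` and the sizes used are hypothetical.

```python
from itertools import permutations


def fitting_orientations(shape, free_space):
    """Return the rotations of shape (x, y, z) that fit inside free_space.

    A rotation is modeled as a permutation of the three edge lengths.
    """
    orientations = sorted(set(permutations(shape)))
    return [o for o in orientations
            if all(edge <= room for edge, room in zip(o, free_space))]


# A 4 x 6 x 8 request does not fit an 8 x 4 x 6 free region as given,
# but it does fit once it is rotated to 8 x 4 x 6.
print(fitting_orientations((4, 6, 8), (8, 4, 6)))
```

Without rotation the request above would have to wait for a differently shaped free region even though enough unoccupied nodes exist.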

Optimal Job Scheduling for FX10

Asynchronous file staging
• Stage IN: asynchronously transfer files from the global file system to the local file system before the job starts.
• Stage OUT: asynchronously transfer files from the local file system to the global file system after the job ends.
• Computation and file transfer are co-scheduled.

[Figure: login nodes and the global file system (data storage area) connected over the IO network (IB) and management network (GbE) to PRIMEHPC FX10, which contains IO nodes, compute nodes, the interconnect and the local file system. A timeline (Now, t1, t2, t3) shows asynchronous Stage IN and Stage OUT around a running job and Jobs A, B and C.]
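The benefit of overlapping file transfer with computation can be estimated with simple arithmetic. The model below is a rough sketch under stated assumptions, not a description of the real staging mechanism: jobs run one after another on the same nodes, stage-in for a job may begin when the previous job starts running, stage-out overlaps the next job, and transfers do not contend with each other. The names `StagedJob`, `sync_makespan` and `async_makespan`, and all durations, are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StagedJob:
    name: str
    stage_in: int   # time to copy input from the global to the local file system
    run: int        # computation time
    stage_out: int  # time to copy results back to the global file system


def sync_makespan(jobs):
    """Every job stages in, runs and stages out before the next one begins."""
    return sum(j.stage_in + j.run + j.stage_out for j in jobs)


def async_makespan(jobs):
    """Staging overlaps with the computation of neighboring jobs."""
    prev_run_start = 0
    prev_run_end = 0
    finish = 0
    for j in jobs:
        staged = prev_run_start + j.stage_in       # stage-in done in the background
        run_start = max(prev_run_end, staged)      # wait for nodes and for input
        run_end = run_start + j.run
        finish = max(finish, run_end + j.stage_out)  # stage-out overlaps next run
        prev_run_start, prev_run_end = run_start, run_end
    return finish


jobs = [StagedJob("A", 2, 10, 3), StagedJob("B", 4, 5, 1), StagedJob("C", 1, 6, 2)]
print(sync_makespan(jobs), async_makespan(jobs))
```

With these example durations the synchronous schedule takes 34 time units and the overlapped one 25, because only the first stage-in and the last stage-out remain on the critical path.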

Optimal Job Scheduling for PRIMERGY

Fine-grained node assignment
• Node selection method: balancing / concentration
• Rank placement policy: pack / unpack
• Priority control of allocated nodes
• Execution mode: whether or not a node is occupied exclusively by a job

Strict core assignment
• Processes are bound to cores in the job territory.
• No process can move to cores in another job's territory.

[Figure: examples of node concentration (Jobs A to D on Node#0 to Node#2), rank pack and rank unpack (ranks R0 and R1 on Node#0 to Node#2), and core assignment (processes of Job A and Job B on cores 0 to 7 of a node).]
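The slide names the pack and unpack rank placement policies without defining them. The sketch below shows one plausible reading, labeled here as an assumption: "pack" fills each node's cores before moving to the next node, while "unpack" spreads consecutive ranks across nodes in round-robin order. The function `place_ranks` and its parameters are hypothetical and are not part of the product's interface.

```python
def place_ranks(n_ranks, n_nodes, cores_per_node, policy):
    """Return a list whose i-th entry is the node index assigned to rank i.

    Assumed meanings: "pack" fills a node before using the next one,
    "unpack" distributes ranks round-robin over all nodes.
    """
    if n_ranks > n_nodes * cores_per_node:
        raise ValueError("not enough cores for the requested ranks")
    if policy == "pack":
        return [rank // cores_per_node for rank in range(n_ranks)]
    if policy == "unpack":
        return [rank % n_nodes for rank in range(n_ranks)]
    raise ValueError(f"unknown policy: {policy}")


print(place_ranks(4, 2, 2, "pack"))    # ranks 0 and 1 share node 0
print(place_ranks(4, 2, 2, "unpack"))  # ranks 0 and 1 land on different nodes
```

Under this reading, pack favors communication between neighboring ranks inside a node, and unpack gives each rank more memory bandwidth when a job does not fill its nodes.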

Summary and Future

We developed the job management software.
• Unified operability on PRIMEHPC FX10 and PRIMERGY
• New job scheduler: efficiency, fairness and system optimization
• Practical resource control and job statistical information

Future Work
• Operation simulator: administrators will be able to simulate how the system would operate after operation parameters are changed.
