Towards a Realistic Scheduling Model
Oliver Sinnen, Leonel Sousa, Frode Eika Sandnes
IEEE TPDS, Vol. 17, No. 3, pp. 263-275, 2006.
Parallel processing is the oldest discipline in computer science –
yet the general problem is far from solved
Why is parallel processing difficult?
• "The more cooks, the more mess" (Norwegian proverb: "Jo flere kokker, jo mere søl")
– Partitioning and transforming problems
– Load balancing
– Inter-processor communication
– Granularity
– Architecture
Implementing parallel systems
• Manually
– MPI
– PVM
– Linda
• Automatically
– Parallelising compilers (Fortran)
– Static scheduling
Taskgraph scheduling: Representing static computations
Modelling computations
A=B+C
Data dependencies
[Figure: taskgraph with edges B → A and C → A]
Valid sequences: CBA, BCA
Invalid sequences: ABC, ACB, CAB, BAC
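The valid and invalid sequences for A = B + C can be checked mechanically. The following is a minimal Python sketch, assuming the data dependencies are encoded as (predecessor, successor) pairs; the names `DEPENDENCIES` and `is_valid` are illustrative and do not come from the slides.

```python
from itertools import permutations

# A = B + C: both B and C must be available before A can be computed.
DEPENDENCIES = [("B", "A"), ("C", "A")]


def is_valid(sequence, dependencies):
    """Return True if every predecessor appears before its successor."""
    position = {task: index for index, task in enumerate(sequence)}
    return all(position[pred] < position[succ] for pred, succ in dependencies)


valid = sorted("".join(p) for p in permutations("ABC") if is_valid(p, DEPENDENCIES))
invalid = sorted("".join(p) for p in permutations("ABC") if not is_valid(p, DEPENDENCIES))

print("Valid sequences:", valid)
print("Invalid sequences:", invalid)
```

Brute-force enumeration is only practical for tiny graphs, since the number of permutations grows factorially with the number of tasks.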
Another example
A = (B-C)/D
F = B+G
[Figure: taskgraph with edges B → A, C → A, D → A, B → F and G → F]
Scheduling
Static taskgraph scheduling techniques
The scheduling process
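[Figure: the scheduling process in three stages. Taskgraph: tasks A to E connected by communication edges c1 to c5. Allocation: tasks assigned to processors. Schedule: tasks placed over time on processors p1 and p2.]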
Topological sorting
• Topological sorting
– to order the vertices of a graph such that the precedence constraints are not violated
• All valid schedules represent a topological sort
• Scheduling algorithms differ in how they topologically sort the graph
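One common way to produce a topological sort is Kahn's algorithm, sketched below in Python. This is an illustration, not the method used in the paper. The FIFO `ready` queue is an arbitrary choice, and replacing it with a priority rule is one point at which scheduling algorithms can differ. The taskgraph is the one for A = (B-C)/D and F = B+G.

```python
from collections import deque


def topological_sort(tasks, edges):
    """Kahn's algorithm: repeatedly emit a task whose predecessors are all emitted."""
    successors = {task: [] for task in tasks}
    indegree = {task: 0 for task in tasks}
    for pred, succ in edges:
        successors[pred].append(succ)
        indegree[succ] += 1

    # Tasks without predecessors can start immediately.
    ready = deque(task for task in tasks if indegree[task] == 0)
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for succ in successors[task]:
            indegree[succ] -= 1
            if indegree[succ] == 0:
                ready.append(succ)

    if len(order) != len(tasks):
        raise ValueError("graph contains a cycle")
    return order


# Taskgraph for A = (B - C) / D and F = B + G.
tasks = ["A", "B", "C", "D", "F", "G"]
edges = [("B", "A"), ("C", "A"), ("D", "A"), ("B", "F"), ("G", "F")]
order = topological_sort(tasks, edges)
print(order)
```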
The importance of abstraction
• Abstraction is important to preserve generality
• Too specific
    float sum = 0;
    for (int i = 0; i < 8; i++) {
        sum += a[i];
    }
• General and flexible
    float sum = sumArray(a);
Communication
Communication is a major bottleneck
• The cost ratio between computation and communication typically ranges from 1:50 to 1:10,000.
• Communication cost is not very dependent on data size.
• The interconnection network topology affects the overall time.
Scheduling work prior to 1995
• Assumptions
– Zero interprocessor communication costs
– Fully connected processor interconnection networks
Amounts of data transfer
Public transport is a good thing?
Data size is not a major factor
• Multiple single messages: connect, send, connect, send, connect, send
• Single compound message: connect, send, send, send
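The difference between the two message patterns can be sketched with a simple linear cost model in which every message pays a fixed connection cost plus a per-byte cost. The model and all numbers below are assumptions chosen for illustration, not measurements from the paper.

```python
def transfer_time(num_messages, bytes_per_message, connect_cost, cost_per_byte):
    """Total time when each message pays the connection setup cost once."""
    return num_messages * (connect_cost + bytes_per_message * cost_per_byte)


# Hypothetical costs in arbitrary time units.
CONNECT_COST = 1000
COST_PER_BYTE = 1

# Three single messages of 100 bytes: connect, send, connect, send, connect, send.
separate = transfer_time(3, 100, CONNECT_COST, COST_PER_BYTE)

# One compound message of 300 bytes: connect, send, send, send.
compound = transfer_time(1, 300, CONNECT_COST, COST_PER_BYTE)

print("separate:", separate)
print("compound:", compound)
```

When the connection cost dominates, as assumed here, the total time depends mostly on the number of messages rather than on the amount of data sent.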
Interconnection topology
Fully connected
The ring
To send something from here..
…to here
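The effect of topology on distance can be sketched by counting the links a message must cross. The sketch below assumes processors numbered 0 to n-1 and a bidirectional ring; neither detail is stated on the slides.

```python
def hops_fully_connected(source, destination):
    """Every pair of distinct processors shares a direct link."""
    return 0 if source == destination else 1


def hops_ring(source, destination, num_processors):
    """Shortest path on a ring, assuming links can be used in both directions."""
    clockwise = (destination - source) % num_processors
    return min(clockwise, num_processors - clockwise)


n = 8
direct = hops_fully_connected(0, 4)
across_ring = hops_ring(0, 4, n)
worst_case = max(hops_ring(0, d, n) for d in range(n))

print(direct, across_ring, worst_case)
```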
Interprocessor communication
• Zero vs non-zero communication overheads
• Direct links vs connecting nodes
[Figure: shared memory, bus-based multiprocessor: processors P1 to P4 and RAM on a bus]
[Figure: distributed memory, mesh multiprocessor: 4 × 4 grid of processors P11 to P44]
Avoiding communication overheads
Duplication
[Figure: taskgraph a → b, a → c with unit weights, allocated to processors p1 and p2. Without duplication the time axis runs from t=1 to t=3; with task a duplicated on both processors it runs from t=1 to t=2.]
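The benefit of duplication can be reproduced with a small calculation. The sketch below assumes a fork graph a → b, a → c in which every task and every remote communication costs one time unit and local communication is free; the function names are illustrative.

```python
TASK_COST = 1  # every task takes one time unit
COMM_COST = 1  # remote communication cost; local communication is free


def makespan_without_duplication():
    """a and b run on p1, c runs on p2 and must wait for a's result."""
    finish_a = TASK_COST
    finish_b = finish_a + TASK_COST   # b reads a's result locally on p1
    start_c = finish_a + COMM_COST    # a's result travels from p1 to p2
    finish_c = start_c + TASK_COST
    return max(finish_b, finish_c)


def makespan_with_duplication():
    """a runs on both processors, so no remote communication is needed."""
    finish_a = TASK_COST
    finish_b = finish_a + TASK_COST   # b on p1
    finish_c = finish_a + TASK_COST   # c on p2, using its own copy of a
    return max(finish_b, finish_c)


print(makespan_without_duplication(), makespan_with_duplication())
```

Duplication trades redundant computation for avoided communication, so it pays off only when recomputing a task is cheaper than transferring its result.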
When considering communication overheads
Classic communication model: Assumptions
• Local communications have zero communication costs
• Communication is conducted by a communication subsystem.
• Communication can be performed concurrently
• The network is fully connected
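Under these four assumptions, the start time of a task depends only on when its processor becomes free and when the data from its predecessors arrives. The following Python sketch computes start and finish times for a given allocation. The function `schedule`, the four-task graph and all costs are hypothetical and serve only to illustrate the model.

```python
def schedule(order, allocation, cost, edges):
    """Compute start and finish times under the classic communication model.

    order      -- tasks in a topological order
    allocation -- task -> processor
    cost       -- task -> computation cost
    edges      -- (pred, succ) -> communication cost
    """
    processor_free = {}
    start, finish = {}, {}
    for task in order:
        proc = allocation[task]
        data_ready = 0
        for (pred, succ), comm in edges.items():
            if succ != task:
                continue
            # Local communication is free. Remote communication costs `comm`
            # but never occupies a processor or delays another message.
            delay = 0 if allocation[pred] == proc else comm
            data_ready = max(data_ready, finish[pred] + delay)
        start[task] = max(data_ready, processor_free.get(proc, 0))
        finish[task] = start[task] + cost[task]
        processor_free[proc] = finish[task]
    return start, finish


# Hypothetical example: a fork-join graph on two processors.
order = ["a", "b", "c", "d"]
allocation = {"a": "p1", "b": "p1", "c": "p2", "d": "p1"}
cost = {"a": 2, "b": 2, "c": 2, "d": 2}
edges = {("a", "b"): 2, ("a", "c"): 2, ("b", "d"): 1, ("c", "d"): 1}

start, finish = schedule(order, allocation, cost, edges)
print(start)
print(finish)
```

The sketch never checks whether a link or a processor is busy with another transfer, which is exactly what the concurrency and subsystem assumptions permit.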
Implications
• Network contention (not modelled)
– Tasks compete for communication resources
• Contention can be modelled:
– Different types of edges
– Switch vertices (in addition to processor vertices)
Processor involvement in communication I
Two-sided involvement
(TCP/IP, PC cluster)
Processor involvement in communication II
One-sided involvement
(Shared memory, Cray T3E)
Processor involvement in communication III
Third party involvement
(Dedicated DMA hardware, Meiko CS-2)
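The three involvement types differ in how much processor time a message consumes. The sketch below is a deliberate simplification: it assumes a single overhead value per message, and for one-sided involvement it takes the active side as a parameter. The function name, the parameters and the numbers are hypothetical.

```python
def processor_overhead(involvement, overhead, active_side="sender"):
    """Return (sender_busy, receiver_busy) processor time for one message."""
    if involvement == "third-party":
        # Dedicated hardware moves the data; neither processor is occupied.
        return (0, 0)
    if involvement == "one-sided":
        # Only one processor takes part in the transfer.
        return (overhead, 0) if active_side == "sender" else (0, overhead)
    if involvement == "two-sided":
        # Both sender and receiver are occupied.
        return (overhead, overhead)
    raise ValueError(f"unknown involvement type: {involvement}")


# A sender finishes a task at time 5, sends one message with overhead 3,
# and then wants to start its next task.
TASK_FINISH = 5
OVERHEAD = 3
next_start = {}
for mode in ("third-party", "one-sided", "two-sided"):
    sender_busy, _ = processor_overhead(mode, OVERHEAD)
    next_start[mode] = TASK_FINISH + sender_busy

print(next_start)
```

A model that assumes third-party involvement would predict a start time of 5 for the next task in all three cases, which underestimates the time on systems where the sending processor is occupied by the transfer.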
Problems
• All classic scheduling models assume third-party involvement.
• Very few systems are equipped with dedicated hardware supporting third-party involvement.
• Estimated finish times for tasks are hugely inaccurate.
• Scheduling algorithms are very sub-optimal.
Even more problems
Results
[Figure: results for bobcat, Sun E3500 and T3E-900]
The End