Unit 8

Chapter 8 Shared Memory Multiprocessors


Transcript of Unit 8

Page 1: Unit 8

Chapter 8: Shared Memory Multiprocessors

Page 2: Unit 8

A program consists of a collection of executable sub-program units.

These units, which we refer to as tasks, are also sometimes called programming grains. They must be defined, scheduled, and coordinated by hardware and software before or during program execution.

Page 3: Unit 8

Basic Issues

Multiprocessors are usually designed for two reasons: fault tolerance and program speedup.

Page 4: Unit 8

The basic issues are as follows:

1. Partitioning. This is the process whereby the original program is decomposed into basic sub-program units or tasks, each of which can be assigned to a separate processor. Partitioning is performed either by programmer directives in the original source program or by the compiler at compile time.

2. Scheduling of tasks. Associated with each program is a flow of control among the sub-program units or tasks.

Certain tasks must be completed before others can be initiated (i.e., one is dependent on the other). Other tasks represent functions that can be executed independently of the main program execution. The scheduler's run-time function is to arrange the task order of execution in such a way as to minimize overall program execution time.

Page 5: Unit 8

3. Communication and synchronization. It does the system no good to merely schedule the initiation of various tasks in the proper order, unless the data that the tasks require is made available in an efficient way. Thus, communication time has to be minimized and the receiver task must be aware of the synchronization protocol being used. An issue associated with communications is memory coherency. This property ensures that the transmitting and receiving elements have the same, or a coherent, picture of the contents of memory, at least for data which is communicated between the two tasks.
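To make the communication and synchronization requirement concrete, here is a minimal sketch (my own illustration, not taken from the text) of a synchronization protocol between a transmitting task and a receiving task, written in C with POSIX threads. The names shared_value and ready are assumed for the example; the mutex/condition-variable pair is the agreed protocol, and it also gives both tasks a coherent view of the communicated data.

/* Illustrative sketch only: one producer task hands a value to one
 * consumer task over shared memory.  Compile with -pthread. */
#include <pthread.h>
#include <stdio.h>

static int shared_value;             /* data communicated between the tasks */
static int ready = 0;                /* synchronization flag                */
static pthread_mutex_t m = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  c = PTHREAD_COND_INITIALIZER;

static void *producer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    shared_value = 42;               /* write the communicated data        */
    ready = 1;                       /* then announce that it is available */
    pthread_cond_signal(&c);
    pthread_mutex_unlock(&m);
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&m);
    while (!ready)                   /* wait until the producer signals    */
        pthread_cond_wait(&c, &m);
    printf("received %d\n", shared_value);
    pthread_mutex_unlock(&m);
    return NULL;
}

int main(void)
{
    pthread_t p, q;
    pthread_create(&p, NULL, producer, NULL);
    pthread_create(&q, NULL, consumer, NULL);
    pthread_join(p, NULL);
    pthread_join(q, NULL);
    return 0;
}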

Page 6: Unit 8

Suppose a program p is converted into a parallel form, pp. This conversion consists of partitioning pp into a set of tasks, Ti. pp (as partitioned ...

Page 7: Unit 8
Page 8: Unit 8

Partitioning

Partitioning is the process of dividing a program into tasks, each of which can be assigned to an individual processor for execution at run time.

Each of these tasks is represented as a node. Partitioning occurs at compile time, well before execution. Program overhead (o) is the added time a task takes to be loaded into a processor prior to beginning execution.

Page 9: Unit 8

Overhead affects speedup. For each task Ti, there is an associated number of overhead operations oi, so that if Ti takes Oi operations without overhead, then the total number of operations in the parallel program is:

Op = Σ (Oi + oi), summed over all tasks Ti.

Page 10: Unit 8

In order to achieve speedup over a uniprocessor, a multiprocessor system must achieve the maximum degree of parallelism among executing subtasks or control nodes. On the other hand, if we increase the amount of parallelism by using finer- and finer-grain task sizes, we necessarily increase the amount of overhead. This defines the well-known "U"-shaped curve for grain size.
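The curve can be reproduced with a small numerical model (assumed parameters, not from the text): N useful operations are cut into tasks of g operations each, every task pays a fixed overhead of o operations, and at most P tasks execute at once. Very fine grains multiply the overhead; very coarse grains leave processors idle.

/* Illustrative model of the "U"-shaped grain-size curve.
 * N, o, and P are assumed numbers; time is counted in operations.
 * Compile with -lm for ceil(). */
#include <math.h>
#include <stdio.h>

int main(void)
{
    const double N = 1e6;    /* total useful operations      */
    const double o = 100.0;  /* overhead operations per task */
    const double P = 16.0;   /* number of processors         */

    for (double g = 10.0; g <= N; g *= 10.0) {  /* grain-size sweep         */
        double tasks  = N / g;                  /* number of tasks          */
        double rounds = ceil(tasks / P);        /* parallel "waves"         */
        double time   = rounds * (g + o);       /* critical-path operations */
        printf("grain %9.0f   parallel time %11.0f\n", g, time);
    }
    return 0;
}

With these assumed numbers the printed time first falls and then rises again as the grain size grows, tracing the "U" shape described above.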

Page 11: Unit 8

Figure: The effects of grain size.

Page 12: Unit 8

If the uniprocessor program P1 performs O1 operations, then the parallel version of P1 performs Op operations, where Op ≥ O1.

For each task Ti, there is an associated number of overhead operations oi, so that if Ti takes Oi operations without overhead, then:

Op = Σ (Oi + oi), summed over all tasks Ti.

Page 13: Unit 8

Clustering

Clustering is the grouping together of sub-tasks into a single assignable task. Clustering is usually performed both at partitioning time and at run time during scheduling.
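As a sketch of one possible clustering policy (my own example, with assumed task sizes and an assumed threshold), fine-grain tasks can be packed greedily into clusters until each assignable cluster carries enough useful work to amortize a single per-cluster overhead:

/* Illustrative greedy clustering: pack fine-grain tasks into clusters
 * until each cluster holds at least MIN_WORK useful operations.
 * Task sizes and MIN_WORK are assumed numbers. */
#include <stdio.h>

#define NTASKS   8
#define MIN_WORK 100     /* assumed work needed to amortize one overhead */

int main(void)
{
    int work[NTASKS] = { 30, 20, 60, 10, 90, 40, 25, 35 };  /* ops per task */
    int cluster = 0, acc = 0;

    for (int i = 0; i < NTASKS; i++) {
        printf("task %d -> cluster %d\n", i, cluster);
        acc += work[i];
        if (acc >= MIN_WORK) {   /* cluster is now large enough: close it */
            acc = 0;
            cluster++;
        }
    }
    return 0;
}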

Page 14: Unit 8

The reasons for clustering at partitioning time might include:

Page 15: Unit 8

Moreover, the overhead time is:

1. Configuration dependent. Different shared memory multiprocessors may have significantly different task overheads associated with them, depending on cache size, organization, and the way caches are shared.

2. Assignment dependent. Overhead may be significantly different depending on how tasks are actually assigned (scheduled) at run time.

Page 16: Unit 8

The detection of parallelism itself in the program is achieved by one of three methods:

1. Explicit statement of concurrency in the higher-level language, as in the use of languages such as CSP (communicating sequential processes) [131] or Occam [75], which allow programmers to delineate the boundaries among tasks that can be executed in parallel, and to specify communications between such tasks.

Page 17: Unit 8

2. The use of programmer's hints in the source statement, which the compiler may choose to use or ignore.

Page 18: Unit 8

Dependency matrix for the task list T1, T2, T3. A 'one' entry indicates a dependency; e.g., in this figure T2 depends on T1 and T3 depends on T2.
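A dependency matrix of this kind can be stored and scanned directly to decide which tasks are ready to execute. The sketch below is an illustration (not from the text) using the three-task example above: a task is released only after every task it depends on has completed.

/* Illustrative use of a dependency matrix: dep[i][j] = 1 means task
 * T(i+1) depends on task T(j+1).  Entries follow the example above:
 * T2 depends on T1, and T3 depends on T2. */
#include <stdio.h>

#define N 3

int main(void)
{
    int dep[N][N] = {
        { 0, 0, 0 },   /* T1 depends on nothing */
        { 1, 0, 0 },   /* T2 depends on T1      */
        { 0, 1, 0 },   /* T3 depends on T2      */
    };
    int done[N] = { 0 };

    /* Repeatedly release any task whose predecessors are all done.
     * For this matrix the legal order is T1, T2, T3. */
    for (int scheduled = 0; scheduled < N; ) {
        for (int i = 0; i < N; i++) {
            if (done[i]) continue;
            int ready = 1;
            for (int j = 0; j < N; j++)
                if (dep[i][j] && !done[j]) ready = 0;
            if (ready) {
                printf("run T%d\n", i + 1);
                done[i] = 1;
                scheduled++;
            }
        }
    }
    return 0;
}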

Page 19: Unit 8

8.3 Scheduling

Scheduling can be done either statically (at compile time) or dynamically (at run time).

Usually, it is performed at both times. Static scheduling information can be derived on the basis of the probable critical paths. This alone is insufficient to ensure optimum speedup or even fault tolerance.

Page 20: Unit 8

Run time scheduling

Run-time scheduling can be performed in a number of different ways. The scheduler itself may run on a particular processor, or it may run on any processor.

Page 21: Unit 8

Typical run-time information includes information about the dynamic state of the program and the state of the system. The program state may include details provided by the compiler, such as information about the control structure and identification of critical paths or dependencies. Dynamic information includes information about resource availability and work-load distribution. Program information must be generated by the program itself, and then gathered by a run-time routine to centralize this information.

The major run-time overheads in run-time scheduling include:

1. Information gathering.

2. Scheduling.

Page 22: Unit 8

Table 8.2: Scheduling.

When: Scheduling can be performed at:

Compile time. (+) Advantage: less run-time overhead. (-) Disadvantage: compiler lacks stall information; may not be fault tolerant.

Run time. (+) Advantage: more efficient execution. (-) Disadvantage: higher overhead.

Page 23: Unit 8

How: Scheduling can be performed by:

Arrangement / Comment:

Designated single processor: simplest, least effort.

Any single processor.

Multiple processors: most complex, potentially most difficult.

3. Dynamic execution control.

4. Dynamic data management.

Page 24: Unit 8

Dynamic execution control is a provision for dynamic clustering or process creation at run time. Dynamic data management provides for the assignment of tasks to processors in such a way as to minimize the memory overhead, i.e., the delay in accessing data.

Page 25: Unit 8

The overhead during scheduling is primarily a function of two specific program characteristics:

1. Program dynamicity

2. Granularity

Page 26: Unit 8

8.4 Synchronization and Coherency

In practice, a program obeys the synchronization model if and only if:

1. All synchronization operations must be performed before any subsequent memory operation can be performed.

2. All pending memory operations are performed before any synchronization operation is performed.

3. Synchronization operations are sequentially consistent.
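As a rough illustration of these three conditions (my own sketch, not the text's example), C11 atomic operations on a shared flag can play the role of the synchronization operations: the writer's ordinary store is performed before its synchronizing store, the reader issues no subsequent memory operation until its synchronizing load has completed, and memory_order_seq_cst makes the synchronization operations sequentially consistent with one another.

/* Illustrative sketch of the synchronization model with C11 atomics.
 * data is ordinary shared memory; flag is the synchronization variable.
 * Compile with -pthread. */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

static int data;                    /* ordinary (non-synchronizing) memory */
static atomic_int flag;             /* synchronization variable            */

static void *writer(void *arg)
{
    (void)arg;
    data = 123;                                   /* pending memory operation ... */
    atomic_store_explicit(&flag, 1,               /* ... performed before this    */
                          memory_order_seq_cst);  /* synchronization operation    */
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_seq_cst) == 0)
        ;                                         /* synchronization completes ...    */
    printf("%d\n", data);                         /* ... before this memory operation */
    return NULL;
}

int main(void)
{
    pthread_t w, r;
    pthread_create(&w, NULL, writer, NULL);
    pthread_create(&r, NULL, reader, NULL);
    pthread_join(w, NULL);
    pthread_join(r, NULL);
    return 0;
}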

Page 27: Unit 8

8.5 The Effects of Partitioning and Scheduling Overhead

When a program is partitioned into tasks, the maximum number of concurrent tasks can be determined. This is simply the maximum number of tasks that can be executed at any one time. It is sometimes called the degree of parallelism that exists in the program. Even if a program has a high degree of parallelism, a corresponding degree of speedup may not be achieved. Recall the definition of speedup:

Sp = T1 / Tp.

Page 28: Unit 8

T1 represents the time required for a uniprocessor to execute the program using the best uniprocessor algorithm. Tp is the time it takes for p processors to execute the program.
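A small worked sketch (assumed operation counts, not from the text) combining this definition with the overhead count Op = Σ (Oi + oi) from earlier in the unit:

/* Illustrative speedup calculation Sp = T1 / Tp, with time counted in
 * operations.  The task sizes Oi and overheads oi are assumed numbers. */
#include <stdio.h>

int main(void)
{
    const int p = 4;                        /* number of processors         */
    int Oi[] = { 250, 250, 250, 250 };      /* useful operations per task   */
    int oi[] = {  50,  50,  50,  50 };      /* overhead operations per task */
    int n = 4;

    int O1 = 0, Op = 0;
    for (int i = 0; i < n; i++) {
        O1 += Oi[i];                        /* uniprocessor does only useful work  */
        Op += Oi[i] + oi[i];                /* parallel version also pays overhead */
    }

    double T1 = O1;                         /* 1000 operations            */
    double Tp = (double)Op / p;             /* 1200 / 4 = 300 operations  */
    printf("Sp = %.2f (ideal would be %d)\n", T1 / Tp, p);
    return 0;
}

With four tasks of 250 useful operations and 50 overhead operations each, T1 = 1000 and Tp = 300, so Sp is about 3.33 rather than the ideal 4.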
