faculty.cse.tamu.edufaculty.cse.tamu.edu/slupoli/notes/CSOverview/Parallel Computing… · Web...


Parallel Computing

Need for Parallel Processing

Every generation of processor technology is faster, with more transistors in a smaller space.

Newer processor technologies often require less voltage.

These gains are based on advances in transistor manufacturing.

Regardless of how fast our computers are, we always seem to want more power and speed.

CPU Cycles

Consider a personal computer (PC).

When you are typing in a word processor, the CPU doesn’t have much to do.

A CPU runs at millions or billions of cycles per second, which is MUCH faster than a human can perceive.

A PC CPU typically wastes most of its processing time.

BUT, when we have lots of processing for the CPU to do, we want it done ASAP.

Processing cycles are so cheap that we can buy fast processors even when most of their power is wasted.

Recent PC technologies

Multicore technologies
o Each core is almost a separate processor.
o A 4-core processor is like having the power of 4 CPUs in one.
o More processors mean more power and speed (and more wasted CPU cycles).

Multiple cores or multiple CPUs allow faster completion of work by dividing the work among more than one CPU.
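
The idea of dividing work between CPUs can be sketched in Python. This is a minimal illustration, not something from the original notes: the helper names `chunked` and `parallel_sum`, the worker count of 4, and the choice of summing numbers are all assumptions. It uses threads from the standard library for simplicity; in CPython, CPU-bound work usually needs processes (for example `concurrent.futures.ProcessPoolExecutor`) to actually run on several cores at once.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, n_chunks):
    """Split items into n_chunks nearly equal contiguous slices."""
    size, extra = divmod(len(items), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def parallel_sum(items, n_workers=4):
    """Sum each chunk on its own worker, then combine the partial sums."""
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunked(items, n_workers)))
    return sum(partial_sums)  # the final combine step is serial


data = list(range(1_000))          # hypothetical workload
total = parallel_sum(data, n_workers=4)
```

Note that the final `sum(partial_sums)` runs on one worker only: even this tiny example has a serial portion that extra processors cannot shorten.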

Classic Model: Parallel Processing

Execution in Parallel


Multiple processors available (4)

A process can be divided into serial and parallel portions.

The parallel parts are executed concurrently.

Serial Time: 10 time units
Parallel Time: 4 time units
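
The 10-versus-4 comparison can be reproduced with a few lines of arithmetic. The split used here, 2 time units of serial work and 8 time units of parallelizable work, is an assumption chosen because it matches the totals above with 4 processors; the original figure is not reproduced in this text.

```python
def parallel_time(serial_units, parallel_units, n_processors):
    """Elapsed time when only the parallel portion is spread over n processors."""
    return serial_units + parallel_units / n_processors


# Assumed split (hypothetical): 2 serial units + 8 parallelizable units = 10 total.
serial_units, parallel_units = 2, 8

serial_time = parallel_time(serial_units, parallel_units, 1)    # one processor
four_way_time = parallel_time(serial_units, parallel_units, 4)  # four processors
speedup = serial_time / four_way_time

print(serial_time, four_way_time, speedup)  # 10.0 4.0 2.5
```

Under that assumed split, four processors give a speedup of 2.5 rather than 4, because the 2 serial units are not shortened at all.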

Amdahl’s Law (Analytical Model)

Analytical model of parallel speedup from the 1960s.

The parallel fraction ($\alpha$) is run over n processors, taking $\alpha/n$ time.

The part that must be executed in serial ($1 - \alpha$) gets no speedup.

$$\text{Speedup}(n) = \frac{1}{(1 - \alpha) + \alpha/n}$$

Overall performance is limited by the fraction of the work that cannot be done in parallel ($1 - \alpha$), giving diminishing returns with increasing processors (n).

Consider the denominator of Amdahl’s model.
o Two parts are added.
o Only one part changes as the number of processors (n) increases.
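
Amdahl's model is easy to evaluate directly. The sketch below writes the parallel fraction as `alpha` and the processor count as `n`; the function name `amdahl_speedup` and the 90% parallel fraction used in the sweep are illustrative choices, not taken from the notes.

```python
def amdahl_speedup(alpha, n):
    """Speedup predicted by Amdahl's law for parallel fraction alpha on n processors."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_part = 1.0 - alpha      # unaffected by n
    parallel_part = alpha / n      # shrinks as n grows
    return 1.0 / (serial_part + parallel_part)


# Illustrative sweep with an assumed parallel fraction of 90%.
for n in (1, 2, 4, 8, 16, 1024):
    print(n, round(amdahl_speedup(0.9, n), 2))
```

Only `parallel_part` depends on `n`, so each doubling of the processor count buys less than the one before it.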

Break-point (Break-even point)

Break-even equation (focus on denominator): $1 - \alpha = \alpha/n$, where $\alpha$ is the parallel fraction and n is the number of processors.

The break-point is when the two halves of the denominator are equal.

At this point, increasing n has a declining effect on speedup.

Very large n can drive $\alpha/n$ toward zero, but relative to the break-point this can at best only double the speedup (and halve the time).

Beyond this break-point, increasing the number of processors assigned to the task will produce only limited speedup savings.

This effect is called “diminishing returns.”
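
The break-point and the "at best double" claim follow from a short derivation. Here $\alpha$ stands for the parallel fraction and $n$ for the number of processors; the labels $n_{\text{break}}$ and $S(n)$ are notation introduced for this sketch, and the `align*` environment assumes the `amsmath` package.

```latex
\begin{align*}
1 - \alpha &= \frac{\alpha}{n}
  && \text{(break-even condition)} \\
n_{\text{break}} &= \frac{\alpha}{1 - \alpha}
  && \text{(solve for } n\text{)} \\
S(n_{\text{break}}) &= \frac{1}{(1 - \alpha) + (1 - \alpha)} = \frac{1}{2(1 - \alpha)}
  && \text{(speedup at the break-point)} \\
\lim_{n \to \infty} S(n) &= \frac{1}{1 - \alpha} = 2\,S(n_{\text{break}})
  && \text{(best case with unlimited processors)}
\end{align*}
```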

Example Break-point #1

For a process with a parallel fraction of 90% (90% can be done in parallel, 10% must be serial).

The break-point solved for n yields 9 processors.

The speedup with 9 processors is 5.

For this example, increasing the number of processors will drive the denominator toward the serial fraction $1 - \alpha$ (10%).

Maximum possible speedup for this workload (even with infinite processors) is 10. Discouraging?

Example Break-point #2


Process with a parallel fraction of 99%.

Solving for n yields 99 processors, the break-point.

Example Break-point #3

Process with a parallel fraction of 99.9%.

Solving for n yields 999 processors, the break-point.

Very large parallel fractions of work are actually not uncommon.
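
The three examples can be checked together with a short script. The function names `break_point` and `amdahl_speedup` are illustrative, and the speedup and ceiling columns for Examples #2 and #3 are computed from Amdahl's formula rather than quoted from the notes.

```python
def break_point(alpha):
    """Processor count at which the serial term (1 - alpha) equals alpha / n."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    return alpha / (1.0 - alpha)


def amdahl_speedup(alpha, n):
    """Amdahl's law: speedup for parallel fraction alpha on n processors."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


for alpha in (0.90, 0.99, 0.999):
    n_star = round(break_point(alpha))  # round away floating-point noise
    at_break = amdahl_speedup(alpha, n_star)
    ceiling = 1.0 / (1.0 - alpha)       # limit as n grows without bound
    print(f"alpha={alpha}: break-point n={n_star}, "
          f"speedup there={at_break:.1f}, ceiling={ceiling:.1f}")
```

In each case the speedup at the break-point is half of the ceiling, which matches the observation that going beyond the break-point can at best double the speedup.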

Limitations of Parallel Processing

When to Add More Processors?

When below the break-point: it may be reasonable to add more processors.

When above the break-point: it is less likely to be worthwhile to add more processors.
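
One way to make this rule of thumb concrete is to compare what doubling the processor count buys on either side of the break-point. The helper `gain_from_doubling`, the 90% parallel fraction, and the specific processor counts are assumptions picked for illustration.

```python
def amdahl_speedup(alpha, n):
    """Amdahl's law: speedup for parallel fraction alpha on n processors."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


def gain_from_doubling(alpha, n):
    """Ratio of speedup with 2n processors to speedup with n processors."""
    return amdahl_speedup(alpha, 2 * n) / amdahl_speedup(alpha, n)


alpha = 0.9                             # assumed workload: break-point at n = 9
below = gain_from_doubling(alpha, 2)    # 2 -> 4 processors, below the break-point
above = gain_from_doubling(alpha, 32)   # 32 -> 64 processors, above the break-point
print(f"2 -> 4: x{below:.2f}   32 -> 64: x{above:.2f}")
```

Doubling from 2 to 4 processors costs 2 extra processors, while doubling from 32 to 64 costs 32, and the second step yields the smaller improvement.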

Multiple Levels of Parallelism


Parallelism suffers from diminishing returns, resulting in limited scalability.

Allocate hardware resources to capture multiple levels of parallelism, so that each level operates at the efficient end of its speedup curve.

Manufacturers of CPUs are integrating multiple levels of parallelism on a single chip.

Levels of Parallelism

Multiple levels of parallelism mitigate the diminishing-returns problem of Amdahl’s law.

Each level starts at the most efficient end of the equation.

Optimal use of hardware resources spreads the processing components across levels.
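
One way to see why spreading processors across levels can help is a toy model in which the speedups of independent levels multiply. This multiplicative model, the 90% parallel fraction at each level, and the budget of 64 processing elements are all assumptions made for illustration; they do not come from the notes, and real workloads rarely separate this cleanly.

```python
def amdahl_speedup(alpha, n):
    """Amdahl's law: speedup for parallel fraction alpha on n processors."""
    return 1.0 / ((1.0 - alpha) + alpha / n)


def multilevel_speedup(levels):
    """Toy model (assumption): overall speedup is the product of per-level speedups.

    levels is a list of (alpha, n) pairs, one per level of parallelism.
    """
    total = 1.0
    for alpha, n in levels:
        total *= amdahl_speedup(alpha, n)
    return total


# Same hypothetical budget of 64 processing elements, arranged two ways.
one_level = multilevel_speedup([(0.9, 64)])            # far past the break-point of 9
two_levels = multilevel_speedup([(0.9, 8), (0.9, 8)])  # each level just below it
print(f"one level: {one_level:.1f}x   two levels: {two_levels:.1f}x")
```

Under these assumptions the two-level arrangement comes out well ahead, because neither level is pushed into the flat part of its own speedup curve.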

Areas of Exploration & Grid Computing

Clustered Computer Architecture

Multiple clusters

Each cluster has an internal interconnection.
o Intracluster (within the cluster)

A 2nd-level interconnect connects the clusters.
o Intercluster (between clusters)

Blade-Cluster Architecture

Each blade has a cluster of processors plus its own memory and disk.

Intercluster network links blades together.

Grid Architecture

Multiple-cluster computers linked together

Three levels of interconnection.

Interconnect Performance

Interconnection network performance is critical to realizing the possible speedup.

L is used to represent the average time delay of communicating over the network.
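
A minimal sketch of how communication delay can erode speedup. It assumes, purely for illustration, that the delay L is expressed as a fraction of the single-processor run time and is simply added to the parallel run time once more than one processor is used. The notes do not give a formula, so this is not necessarily the model behind the original figure, and the function name `speedup_with_delay` is hypothetical.

```python
def speedup_with_delay(alpha, n, delay):
    """Amdahl-style speedup with an added communication term (assumed model).

    delay plays the role of L, as a fraction of single-processor run time.
    """
    overhead = delay if n > 1 else 0.0   # a lone processor does no communicating
    return 1.0 / ((1.0 - alpha) + alpha / n + overhead)


alpha, n = 0.99, 99   # parallel fraction and processor count from Example #2
for delay in (0.0, 0.01, 0.05):
    print(f"L={delay}: speedup={speedup_with_delay(alpha, n, delay):.1f}")
```

In this model even a delay equal to 1% of the original run time noticeably lowers the speedup at the break-point, since it is the same size as the serial fraction itself.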


Effect of Interconnect Performance on Parallel Speedup
