Principles of Parallel Architecture

Page 1: Principles of Parallel Architecture

8/30/2006 eleg652-010-06F 1

Principles of Parallel Architecture

Fall 2006

Keys to a happy Life: Diversity and Variety. Diversity in the people that you meet. Variety in the things that you do

Page 2: Principles of Parallel Architecture

Contact Information

Instructor

Name: Joseph B. Manzano

Office: 137 Evans Hall

Phone: N/A

Email: [email protected]

Teaching Assistants

Name: Juergen Ributzka

Office: 326 Dupont Hall

Phone: (302) 831 0327

Email: [email protected]

Name: Eunjung Park

Office: 326 Dupont Hall

Phone: (302) 831 0327

Email: [email protected]

Course Webpage: http://www.capsl.udel.edu/courses/eleg652/2006/

Page 3: Principles of Parallel Architecture

Important Course Information

Activities

Four homeworks, a comprehensive final examination, and a class project assigned by the instructor with a mentor

Grade Distribution

Activity: Weight

Homeworks: 33.00%

Class Project: 33.00%

Final Exam: 33.00%

Participation: 1.00%

Final Quiz: Wednesday, December 6th, 2006

Final Project Due Date: Friday, December 8th, 2006

Page 4: Principles of Parallel Architecture

Reference Material

Reference Books

1. John Hennessy and David Patterson, Computer Architecture: A Quantitative Approach, Third Edition, Morgan Kaufmann Publishers, Inc., 2003

2. D. E. Culler, J. P. Singh, and A. Gupta, Parallel Computer Architecture, Morgan Kaufmann Publishers, Inc., 1999

Page 5: Principles of Parallel Architecture

Supporting Materials

Selected publications from

Journals

IEEE Transactions on Parallel and Distributed Systems

IEEE Computer

IEEE Transactions on Computers

Conference Proceedings

PACT: Parallel Architectures and Compilation Techniques

MICRO: ACM/IEEE Symposium on Micro-Architectures

ISCA: International Symposium on Computer Architecture

HPCA: ACM/IEEE Symposium on High Performance Computer Architecture

PLDI: ACM SIGPLAN Conference on Programming Language Design and Implementation

Page 6: Principles of Parallel Architecture

Course Contents

Provides an overview of technologies that apply to almost all aspects of computers and that will soon be part of consumer electronics in general.

Shows the principles on which parallel machines are built and how these concepts have infiltrated other parts of the computer and entertainment industries.

Provides an understanding of how these concepts, and their different implementations, affect both the hardware and the software of a target machine.

Page 7: Principles of Parallel Architecture

Expectations about this Course

You should learn:

A basic idea about the lingo that is used in today's supercomputer/parallel machine market

Vector Processing and its place in consumer electronics

Different forms of parallelism and their current implementations

Shared memory models

Parallel Programming Models and Synchronization

Multithreaded Architectures

Page 8: Principles of Parallel Architecture

Why Study Parallel Architectures?

Concepts that soon should become ubiquitous

Productively write software that takes advantage of new features of upcoming or existing hardware

Understand how current technologies have evolved and how they can be improved

Page 9: Principles of Parallel Architecture

Course Overview

Terminology and General Knowledge

Vector Processing and its Legacy

Instruction Level Parallelism: a brief overview

Multicore and Cellular architectures

Parallel (shared) memory models and synchronization primitives

Advanced Topics such as Dataflow and Transactional Memory

Page 10: Principles of Parallel Architecture

Course Introduction

The Role of a Computer Architect

Maximize Productivity and Performance

Productivity = Programmability and a reduction in development time

Performance = “Reasonable” throughput given technology and cost limitations

Parallelism

Two or more tasks may execute at the same time

Alternative to higher frequency clocks

Applies to all levels of computer design

Its importance has been rising steadily since several “walls” were hit

In the near future, it will become the dominant paradigm in all aspects of computing
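The definition of parallelism on this slide (two or more tasks executing at the same time) can be illustrated with a minimal sketch. The task names, the durations, and the use of threads are assumptions made for illustration; they are not part of the course material. The tasks only sleep, which stands in for waiting on I/O or memory, so in CPython the threads overlap; CPU-bound work would typically need separate processes instead.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def simulated_task(name, seconds):
    """Hypothetical unit of work; the sleep stands in for waiting on I/O or memory."""
    time.sleep(seconds)
    return name


def run_serial(tasks):
    """Run the tasks one after another and return (results, elapsed seconds)."""
    start = time.perf_counter()
    results = [simulated_task(name, seconds) for name, seconds in tasks]
    return results, time.perf_counter() - start


def run_parallel(tasks):
    """Run every task on its own thread and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(tasks)) as pool:
        futures = [pool.submit(simulated_task, name, seconds) for name, seconds in tasks]
        results = [future.result() for future in futures]
    return results, time.perf_counter() - start


TASKS = [("task_a", 0.2), ("task_b", 0.2)]  # illustrative names and durations

serial_results, serial_time = run_serial(TASKS)
parallel_results, parallel_time = run_parallel(TASKS)
print(f"serial:   {serial_time:.2f} s")
print(f"parallel: {parallel_time:.2f} s")
```

Both runs return the same results; the parallel run should finish in roughly the time of the longest single task rather than the sum of all of them.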

Page 11: Principles of Parallel Architecture

The Transition

Most consumer electronics will have some form of parallel architecture inside them by next year (2007)

Reasons for the Change

An evolutionary change in computing due to:

Technology: Decrease in feature size, allowing more components on a chip

Applications: More and more performance- and power-hungry applications

Architecture: Effectively organizing components to maximize the use of resources and minimize damaging size effects

Economics: Finding cost-effective ways to get the desired performance out of a given hardware/software combination

Page 12: Principles of Parallel Architecture

Application Requirements

Demand for more cycles = More sophisticated Hardware

Wide Range of Performance Demands

Audio Processing = Real-time response with an allowed threshold of error

Business Loads = A given quantum of time with no error allowed

Application and parallel computer: Obtain a speed up in application runtime

Productive Parallel Systems

Current systems work on parallel concepts and designs (e.g., desktop systems are multithreaded)

Parallel Computing and computers are becoming ubiquitous as we speak

Page 13: Principles of Parallel Architecture

Technology: An Overview

Decrease in Feature Size (Lambda)

Clock rate improves roughly in proportion to the improvement in Lambda

Number of transistors improves like Lambda squared (or faster)

Performance: An increase of roughly 1000x in the last decade

The fastest supercomputer in June 1996 (Tokyo's SR2201) was 220 GFLOPS

The fastest supercomputer now is 280 TFLOPS (IBM's eServer Blue Gene Solution)

and an increase of roughly 20x in the same decade with respect to clock frequency

Intel Pentium Pro at 150 ~ 200 MHz in 1996

Intel Pentium D at 3.2 GHz in 2006

Extra components: Parallelism vs. Data Locality: Fighting for Real Estate
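The growth figures quoted on this slide can be checked with a few lines of arithmetic. The sketch below uses only the numbers given above; the helper names and the 2x shrink in the last example are illustrative assumptions. The scaling helper encodes the rule of thumb from this slide: clock rate gains roughly the shrink factor, transistor count gains at least its square.

```python
# Figures quoted on the slide, converted to common units.
SR2201_GFLOPS_1996 = 220.0
BLUE_GENE_GFLOPS_2006 = 280_000.0      # 280 TFLOPS
PENTIUM_PRO_MHZ_1996 = (150.0, 200.0)  # low and high end of the quoted range
PENTIUM_D_MHZ_2006 = 3200.0            # 3.2 GHz


def growth_factor(old, new):
    """Total improvement between two measurements."""
    return new / old


def annual_rate(factor, years):
    """Compound annual growth rate implied by a total growth factor."""
    return factor ** (1.0 / years) - 1.0


def feature_size_scaling(shrink):
    """Rule of thumb: clock gain ~ shrink, transistor gain >= shrink squared."""
    return shrink, shrink ** 2


flops_factor = growth_factor(SR2201_GFLOPS_1996, BLUE_GENE_GFLOPS_2006)
clock_factor_low = growth_factor(PENTIUM_PRO_MHZ_1996[1], PENTIUM_D_MHZ_2006)
clock_factor_high = growth_factor(PENTIUM_PRO_MHZ_1996[0], PENTIUM_D_MHZ_2006)

print(f"supercomputer FLOPS: {flops_factor:.0f}x over the decade")
print(f"clock frequency: {clock_factor_low:.0f}x to {clock_factor_high:.1f}x")
print(f"implied yearly FLOPS growth: {annual_rate(flops_factor, 10):.0%}")
print(f"implied yearly clock growth: {annual_rate(clock_factor_low, 10):.0%}")
print("a 2x shrink gives (clock gain, minimum transistor gain):",
      feature_size_scaling(2.0))
```

The gap between the two factors is the point of the slide: most of the supercomputer gain over the decade came from using more components in parallel, not from faster clocks.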

Page 14: Principles of Parallel Architecture

Intel: An Example of Clock Frequency Growth

[Figure: Clock Growth from 1971 to 2004. X axis: Intel Microprocessor Family. Y axis: Frequency in KHz, 0 to 4,000,000.]

Growth has been steady until now!

Page 15: Principles of Parallel Architecture

Pentium M

Thermal maps of the Pentium M obtained from simulated power density (left) and IREM measurement (right). Heat levels go from black (lowest) through red, orange, and yellow to white (highest)

Figures courtesy of Dani Genossar and Nachum Shamir, from their paper "Intel® Pentium® M Processor Power Estimation, Budgeting, Optimization and Validation", published in the Intel Technology Journal, May 21, 2003

Page 16: Principles of Parallel Architecture

Storage and Transistor Count Growth

Transistor Count

Expected to reach one billion during this decade (2000s)

Grows faster than clock rate: 40% per year

Storage

Gap between storage capacity and speed is becoming more pronounced

Larger memories = slower = larger memory hierarchies (e.g., caches, write/read buffers, etc.)

Parallelism and locality inside memory systems: multi-port memory, parallel caches, RAIDs, parallel disks with caching, etc.
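The 40% per year figure for transistor count is a compound growth rate, so it can be turned into a doubling time and a rough projection. In the sketch below the starting count of 42 million transistors in 2000 is an illustrative assumption (roughly the scale of a desktop processor of that year), not a number from the slide.

```python
import math


def doubling_time(rate_per_year):
    """Years needed to double at a fixed compound growth rate."""
    return math.log(2.0) / math.log(1.0 + rate_per_year)


def years_to_reach(start, target, rate_per_year):
    """Years needed to grow from start to target at a fixed compound rate."""
    return math.log(target / start) / math.log(1.0 + rate_per_year)


GROWTH_RATE = 0.40              # 40% per year, from the slide
START_TRANSISTORS = 42e6        # assumed starting point for the year 2000
TARGET_TRANSISTORS = 1e9        # one billion, from the slide

print(f"doubling time: {doubling_time(GROWTH_RATE):.2f} years")
print(f"years to one billion: "
      f"{years_to_reach(START_TRANSISTORS, TARGET_TRANSISTORS, GROWTH_RATE):.1f}")
```

Under these assumptions the count doubles about every two years and reaches one billion a little under ten years after 2000, which is consistent with the "during this decade" estimate.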

Page 17: Principles of Parallel Architecture

Moore's Law

The complexity for minimum component costs has increased at a rate of roughly a factor of two per year ... Certainly over the short term this rate can be expected to continue, if not to increase. Over the longer term, the rate of increase is a bit more uncertain, although there is no reason to believe it will not remain nearly constant for at least 10 years. That means by 1975, the number of components per integrated circuit for minimum cost will be 65,000. I believe that such a large circuit can be built on a single wafer.

Gordon Moore's original statement. "Cramming more components onto integrated circuits", Electronics Magazine 19 April 1965

In layman's terms: the number of components on integrated circuits will roughly double every 18 months. With that, the complexity (effort) and the headcount should increase proportionally.
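The two doubling periods mentioned on this slide (one year in the 1965 statement, 18 months in the popular version) lead to very different ten-year projections. The sketch below assumes a starting count of 64 components in 1965; that value is chosen only because doubling it ten times gives 65,536, close to the 65,000 quoted for 1975, and it is not stated on the slide.

```python
def projected_components(start_count, years, doubling_period_years):
    """Component count after `years`, doubling once per doubling period."""
    return start_count * 2 ** (years / doubling_period_years)


START_1965 = 64   # assumed starting count, chosen to match the quoted 65,000
YEARS = 10        # 1965 to 1975

yearly = projected_components(START_1965, YEARS, 1.0)
every_18_months = projected_components(START_1965, YEARS, 1.5)

print(f"doubling every year:      {yearly:,.0f} components by 1975")
print(f"doubling every 18 months: {every_18_months:,.0f} components by 1975")
```

With yearly doubling the count grows about a thousandfold in ten years; with an 18-month period it grows only about a hundredfold, so the choice of period matters a great deal over a decade.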

Page 18: Principles of Parallel Architecture

Architectural Trends

Designed for performance

Higher Frequency == Higher Performance?

Memory vs. Processor

Architectural Trends: Hide latencies at all costs!

Overlap computation with memory accesses [DMA]

Multithreaded execution and sharing of resources [SMT and HT technologies, MTA]

Give more chip real estate to speculative execution [branch prediction and prefetching]

Bring frequently used data closer to the processor [memory hierarchies]

Power Problem? Go Multicore!

Single core: takes N time to finish an M-size problem using T amount of power

Two cores: takes N/2 + 2X time to finish an M-size problem using T/2 amount of power per unit
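The multicore comparison at the end of this slide can be worked through numerically. The slide does not define X; the sketch below assumes it is a per-core overhead (for example communication or synchronization), that there are two units, and that energy is simply time multiplied by total power. All three are assumptions, and the sample values are arbitrary.

```python
def single_core(n, t):
    """One unit: time N at power T."""
    return {"time": n, "total_power": t, "energy": n * t}


def dual_core(n, t, x):
    """Two units: time N/2 + 2X, each unit drawing T/2 (assumed model)."""
    time_taken = n / 2 + 2 * x
    total_power = 2 * (t / 2)
    return {"time": time_taken, "total_power": total_power,
            "energy": time_taken * total_power}


def break_even_overhead(n):
    """Largest X for which the dual-core time does not exceed N."""
    return n / 4


N, T, X = 100.0, 80.0, 5.0  # arbitrary illustrative values

print("single core:", single_core(N, T))
print("dual core:  ", dual_core(N, T, X))
print("dual core wins while X is below", break_even_overhead(N))
```

Under this model the two-unit design draws the same total power but finishes sooner, and therefore uses less energy, as long as the overhead X stays below N/4.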

Page 19: Principles of Parallel Architecture

Technology Progress Overview

Processor speeds = much faster (around 1000x)

Memory (RAM) speeds are increasing too but at a slower rate (around 10x)

But memory (RAM) capacity has grown even faster than processor speed (around 1,000,000x)

Computation is almost free but bandwidth is very expensive

Page 20: Principles of Parallel Architecture

The Pentium Chip

Page 21: Principles of Parallel Architecture

Intel Pentium 4

Nine Years and Millions of Dollars Later

Page 22: Principles of Parallel Architecture

Next Gen

The Cell Chip Layout

Many of them, simpler and cheaper!!!

Page 23: Principles of Parallel Architecture

The Dawn of Parallelism

• Parallel architectures are becoming more attractive

• Milestone: the introduction of Pentium D (2005) and Centrino Duo (2006)

• Future Projects: IBM PERCS project, Cray Eldorado, Sun Hero, IBM Cell project, etc ...

• All the factors listed contributed to this “epiphany” in computing technology.

• Parallelism can be exploited at many levels in many ways

Page 24: Principles of Parallel Architecture

The World's Fastest

Japan Dominance

[Figure: Top SuperComputers. X axis: Years, 1993 to 1996. Y axis: GFLOPS, 0 to 400. Labeled systems: Numerical Wind Tunnel (192 GFLOPS), CP PACS (368 GFLOPS).]

Page 25: Principles of Parallel Architecture

The World's Fastest

USA Takes the Lead

[Figure: Top SuperComputers. X axis: Years, 1993 to 2001. Y axis: GFLOPS, 0 to 8000. Labeled systems: ASCI Red (1.3 TFLOPS), ASCI White SP Power3 375 MHz (7.3 TFLOPS).]

Page 26: Principles of Parallel Architecture

The World's Fastest

Japan Second Wind

[Figure: Top SuperComputers. X axis: Years, 1993 to 2003. Y axis: GFLOPS, 0 to 40000. Labeled system: EARTH Simulator (35 TFLOPS).]

Page 27: Principles of Parallel Architecture

The World's Fastest

and again...

[Figure: Top SuperComputers. X axis: Years, 1993 to 2006. Y axis: GFLOPS, 0 to 300000. Labeled systems: BlueGene L Beta (70 TFLOPS), BlueGene L eServer Solution (280 TFLOPS).]

Page 28: Principles of Parallel Architecture

The World's Fastest

BlueGeneL eServer Solution

• 65536 dual processors arranged in a 32 x 32 x 64 3D torus network

• Global tree structure for fast reduction and broadcast operations over all nodes

• An I/O node per 64 nodes

– Inside a group of 64: tree-structured connections between the I/O node and the computation nodes, with an aggregate bandwidth of 2.1 GB/s

– Across groups of 64: torus-like connections

• Total Memory: 32 Terabytes

• Total Power Consumption: 1.5 Megawatts
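The figures on this slide fit together arithmetically, and the torus topology is easy to sketch. The code below is an illustration only: the (x, y, z) coordinate scheme and the function names are assumptions, memory is treated in binary units, and the peak performance of 280 TFLOPS is taken from the earlier slides. In a 3D torus every node has six neighbors, because each dimension wraps around at its edges.

```python
DIMS = (32, 32, 64)  # torus dimensions from the slide


def torus_neighbors(coord, dims=DIMS):
    """Return the six neighbors of a node, wrapping around in each dimension."""
    neighbors = []
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            neighbor = list(coord)
            neighbor[axis] = (coord[axis] + step) % size
            neighbors.append(tuple(neighbor))
    return neighbors


def torus_hops(a, b, dims=DIMS):
    """Minimum hop count between two nodes, taking the shorter way around."""
    return sum(min(abs(p - q), size - abs(p - q))
               for p, q, size in zip(a, b, dims))


NODE_COUNT = DIMS[0] * DIMS[1] * DIMS[2]             # nodes in the torus
IO_NODES = NODE_COUNT // 64                          # one I/O node per 64 nodes
MEM_PER_NODE_MIB = (32 * 2**40) // NODE_COUNT // 2**20  # assumes binary units
MFLOPS_PER_WATT = 280e12 / 1.5e6 / 1e6               # 280 TFLOPS at 1.5 MW

print("nodes:", NODE_COUNT)
print("I/O nodes:", IO_NODES)
print("memory per node (MiB):", MEM_PER_NODE_MIB)
print(f"MFLOPS per watt: {MFLOPS_PER_WATT:.0f}")
print("neighbors of (0, 0, 0):", torus_neighbors((0, 0, 0)))
print("hops to the farthest node:", torus_hops((0, 0, 0), (16, 16, 32)))
```

The wraparound links are what keep the worst-case distance at half of each dimension (16 + 16 + 32 hops) instead of the full length a plain mesh would require.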

Page 29: Principles of Parallel Architecture

The World's Fastest

BlueGeneL eServer Solution

Page 30: Principles of Parallel Architecture

The Next Step

• So what is next?

• Multicore, System on a chip, PIM, etc.

– Simpler, cooler, cheaper...

• Intel Pentium D and Centrino Duo

• AMD Opteron

• The DARPA HPCS Project

• IBM, Cray and SUN multicore chips: CELL, Cyclops, BlueGene

• Alternatives: Clearspeed [Programmable Co-Processors], etc...