Transcript of Lecture4

Page 1: Lecture4

High Performance Computing

Jawwad Shamsi
Lecture #4

25th January 2010

Page 2: Lecture4

Recap

• Pipelining
• Superscalar Execution
  – Dependencies
    • Branch
    • Data
    • Resource
• Effect of Latency and Memory
• Effect of Parallelism

Page 3: Lecture4

Flynn’s Taxonomy

It classifies computer architectures according to their instruction and data streams.

• The dimensions of instruction and data
• SISD: Single Instruction, Single Data (uniprocessor)
• SIMD: Single Instruction, Multiple Data (vector processing)
• MISD: Multiple Instruction, Single Data
• MIMD: Multiple Instruction, Multiple Data (SMP, cluster, NUMA) – see the sketch below
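The SIMD/MIMD distinction can be seen directly in code. Below is a minimal C sketch, assuming a compiler with auto-vectorization and OpenMP support (e.g. gcc -O3 -fopenmp); the function names are illustrative only:

/* Sketch: the same array operation expressed in SIMD and MIMD styles. */
#include <stdio.h>

#define N 1000000
static float a[N], b[N], c[N];

/* SIMD: one instruction stream applied to many data elements.
 * A vectorizing compiler maps this loop onto vector instructions. */
void add_simd(void) {
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* MIMD: several independent instruction streams (threads),
 * each working on its own slice of the data. */
void add_mimd(void) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

int main(void) {
    for (int i = 0; i < N; i++) { a[i] = (float)i; b[i] = 2.0f * i; }
    add_simd();
    add_mimd();
    printf("c[10] = %f\n", c[10]);
    return 0;
}

Both functions compute the same result; they differ only in how many instruction and data streams are active at once.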

Page 4: Lecture4

MIMD

• MIMD
  – Shared Memory (tightly coupled)
    • SMP (Symmetric Multiprocessing)
    • NUMA (Non-Uniform Memory Access)
  – Distributed Memory (loosely coupled)
    • Clusters

Page 5: Lecture4

Taxonomy of Parallel Processor Architectures

Page 6: Lecture4

Shared Address Space

• Shared Memory
• Distributed Memory

Page 7: Lecture4

SMP

• Two or more similar processors
• Same main memory and I/O – see the shared-memory sketch below
• Can perform similar operations
• Share access to I/O devices
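A minimal sketch of the shared-address-space model an SMP presents to software, using POSIX threads (compile with -pthread; the thread and iteration counts are arbitrary):

/* All threads run on processors that see the same main memory,
 * so they can update one shared variable directly. */
#include <pthread.h>
#include <stdio.h>

#define NTHREADS 4
static long counter = 0;                        /* lives in shared memory */
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg) {
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);              /* coordinate, don't copy */
        counter++;
        pthread_mutex_unlock(&lock);
    }
    return NULL;
}

int main(void) {
    pthread_t t[NTHREADS];
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);         /* expect 400000 */
    return 0;
}

No data is sent anywhere: every thread reads and writes the same memory location, which is exactly what "same main memory" means on an SMP.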

Page 8: Lecture4

Multiprogramming and Multiprocessing

Page 9: Lecture4

SMP Advantages

• Performance
• Availability
• Incremental growth
• Scaling

Page 10: Lecture4

Block Diagram of Tightly Coupled Multiprocessor

Page 11: Lecture4

Cache Coherence

• Multiple caches can hold different copies of the same data
  – Coherence protocols? (see the false-sharing sketch below)
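One way the coherence machinery becomes visible to software is false sharing. A toy C sketch, assuming a 64-byte cache line and POSIX threads (the sizes and iteration counts are illustrative):

/* Two threads update two *different* counters. With the padding in
 * place each counter occupies its own cache line; remove the padding
 * and both counters share one line, so every increment forces the
 * coherence protocol to bounce that line between the two cores. */
#include <pthread.h>
#include <stdio.h>

#define ITERS 10000000L

struct counter {
    volatile long value;
    char pad[64 - sizeof(long)];   /* keep each counter on its own line */
};
static struct counter ctr[2];

static void *bump(void *arg) {
    struct counter *c = arg;
    for (long i = 0; i < ITERS; i++)
        c->value++;
    return NULL;
}

int main(void) {
    pthread_t t[2];
    pthread_create(&t[0], NULL, bump, &ctr[0]);
    pthread_create(&t[1], NULL, bump, &ctr[1]);
    pthread_join(t[0], NULL);
    pthread_join(t[1], NULL);
    printf("%ld %ld\n", ctr[0].value, ctr[1].value);
    return 0;
}

The numeric result is the same either way; what changes is how much coherence traffic the hardware generates to keep the two caches consistent.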

Page 12: Lecture4

Processor Design: Modes of Parallelism

• Two ways to increase parallelism
  – Superscaling
    • Instruction-level parallelism
  – Threading
    • Thread-level parallelism
    • Concept of multithreaded processors
      » May or may not differ from OS-level multithreading
• Temporal multithreading (also called implicit)
  – Instructions from only one thread are issued in a given cycle
• Simultaneous multithreading (explicit)
  – Instructions from more than one thread can be executed in a given cycle

Page 13: Lecture4

Scalar Processor Approaches

• Single-threaded scalar
  – Simple pipeline
  – No multithreading
• Interleaved multithreaded scalar
  – Easiest multithreading to implement
  – Switch threads at each clock cycle
  – Pipeline stages kept close to fully occupied
  – Hardware needs to switch thread context between cycles
• Blocked multithreaded scalar
  – Thread executed until a latency event occurs, which would stall the pipeline
  – Processor then switches to another thread (see the toy simulation below)
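A toy C simulation of the last two policies (purely illustrative; real hardware makes this decision per pipeline stage, and the latency pattern below is made up):

#include <stdio.h>
#include <stdbool.h>

#define NTHREADS 2
#define CYCLES   12

/* Pretend a running thread hits a long-latency event
 * (e.g. a cache miss) on every fourth cycle. */
static bool latency_event(int cycle) {
    return cycle % 4 == 3;
}

int main(void) {
    /* Interleaved multithreading: switch threads every clock cycle. */
    printf("interleaved: ");
    for (int cycle = 0; cycle < CYCLES; cycle++)
        printf("T%d ", cycle % NTHREADS);
    printf("\n");

    /* Blocked multithreading: run one thread until a latency event
     * would stall the pipeline, then switch to another thread. */
    printf("blocked:     ");
    int current = 0;
    for (int cycle = 0; cycle < CYCLES; cycle++) {
        printf("T%d ", current);
        if (latency_event(cycle))
            current = (current + 1) % NTHREADS;
    }
    printf("\n");
    return 0;
}

The output is a schedule like "T0 T1 T0 T1 ..." versus "T0 T0 T0 T0 T1 T1 T1 T1 ...", matching the interleaved and blocked descriptions above.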

Page 14: Lecture4

Clusters

• Alternative to SMP
• High performance
• High availability
• Server applications
• A group of interconnected whole computers
• Working together as a unified resource
• Illusion of being one machine
• Each computer called a node (see the message-passing sketch below)
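A minimal message-passing sketch of the cluster model, in which each node is a separate process with its own private memory. This assumes an MPI installation (compile with mpicc, run with e.g. mpirun -np 4 ./a.out):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which node/process am I? */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many nodes in total? */

    /* Each node computes a partial result in its own private memory... */
    long local = (long)rank + 1;
    long total = 0;

    /* ...and the partial results are combined by exchanging messages. */
    MPI_Reduce(&local, &total, 1, MPI_LONG, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum over %d nodes = %ld\n", size, total);

    MPI_Finalize();
    return 0;
}

Unlike the SMP example, no variable is shared; node 0 only sees the other nodes' contributions because MPI delivers them as messages over the interconnect.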

Page 15: Lecture4

Cluster Benefits

• Absolute scalability
• Incremental scalability
• High availability
• Superior price/performance

Page 16: Lecture4

Cluster Configurations - Standby Server, No Shared Disk

Page 17: Lecture4

Cluster vs. SMP

• Both provide multiprocessor support to high-demand applications
• Both are available commercially
  – SMP for longer
• SMP:
  – Easier to manage and control
  – Closer to single-processor systems
    • Scheduling is the main difference
  – Less physical space
  – Lower power consumption
• Clustering:
  – Superior incremental & absolute scalability
  – Superior availability
    • Redundancy