Frank Casilio Computer Engineering May 15, 1997 Multithreaded Processors.


Frank Casilio

Computer Engineering

May 15, 1997

Multithreaded Processors


Problems With Multiprocessors

• Memory Latency

• Context Switching Time

• Communication/Synchronization Latency

• Cache Coherence

  • Writes To Memory

• Poor Programming Model


Motivation

• Reduce/Tolerate Memory Latency

• General Purpose Machine

• Scalability

• Shared Memory

• Simpler Programming Model


Typical Ways To Reduce Latency

• On-Chip Cache

• Shortens Round Trip To Memory

• Fast Buses & Networks

• Hardware Synchronization

• Prefetching


Multi-Threading: The Concept

• Support For Multiple Concurrent Hardware Contexts

• Tolerates Latency Instead of Reducing It

• Swap Contexts During Latencies

• Experimental Systems Have Existed Since The 1950s

• Only 2 Commercial Systems Ever Produced

  • HEP

  • Tera MTA


Parameters That Affect Efficiency

• Number Of Contexts Supported

• Switching Overhead

• Run Length (Granularity)

• Average Latency To Be Hidden
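The four parameters above can be tied together with a simple deterministic model that is common in the multithreading literature; it is an assumption here, not something the slides state. Each context runs for R cycles (the run length), then waits L cycles for a long-latency reference, every switch costs C cycles, and N is the number of contexts. With too few contexts some latency shows through; with enough contexts only the switching overhead is lost. The function and parameter names are hypothetical.

```python
def efficiency(n_contexts: int, run_length: float, switch_overhead: float, latency: float) -> float:
    """Fraction of cycles doing useful work in a simple deterministic model.

    Assumptions (not from the slides): every context alternates between
    `run_length` cycles of work and a `latency`-cycle wait, and each
    context switch costs `switch_overhead` cycles.
    """
    R, C, L = run_length, switch_overhead, latency
    # Linear region: each context cycles through R + C + L cycles, so N
    # contexts supply N * R useful cycles per period and latency is exposed.
    linear = n_contexts * R / (R + C + L)
    # Saturated region: a ready context is always waiting, so only the
    # switching overhead is lost.
    saturated = R / (R + C)
    return min(linear, saturated)


if __name__ == "__main__":
    # Hypothetical numbers: 10-cycle runs and a 100-cycle memory latency,
    # with and without a 5-cycle switching overhead.
    for n in (1, 4, 8, 16):
        print(n, round(efficiency(n, 10, 0, 100), 3), round(efficiency(n, 10, 5, 100), 3))
```

Setting `switch_overhead` to 0, as the slides claim for the Tera MTA, makes the saturated efficiency 1.0, so in this model the remaining question is only whether there are enough contexts to cover the latency.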


Switching Theory

• Determines How Often Contexts Switch

• Two Different Types

  • Fine Grained

  • Coarse Grained

• Directly Related to Cost


Fine Grained Switching

• Switches Contexts Every Cycle

• Many Long-Latency Operations Tolerated

• Requires More Contexts

  • Workload Must Supply Enough Threads

• Can Simplify Overall Processor Complexity


Coarse Grained Switching

• Switches Contexts After A Couple Of Cycles

  • Has Trouble Hiding Sporadic Latencies

• Requires Fewer Contexts

• Requires More Complex Processors
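The fine- versus coarse-grained trade-off can be seen in a toy cycle-by-cycle simulation. Everything in it is an assumption for illustration: each thread issues `run_length` instructions, the last of which is a memory reference that blocks it for `latency` cycles. The fine-grained policy issues from a different ready thread every cycle (round-robin, no switch cost); the coarse-grained policy runs one thread until it blocks and then pays `switch_cost` cycles to change threads. Pipeline dependences, caches, and network contention are ignored, and `simulate` is a hypothetical name.

```python
def simulate(policy: str, n_threads: int, run_length: int, latency: int,
             switch_cost: int = 0, cycles: int = 10_000) -> float:
    """Return the fraction of cycles in which some thread issues an instruction.

    Toy workload (assumed, not Tera data): each thread issues `run_length`
    instructions, the last of which is a memory reference that blocks the
    thread for `latency` cycles.
      policy="fine":   issue from a different ready thread every cycle
                       (round-robin), with no switch cost.
      policy="coarse": run one thread until it blocks, then spend
                       `switch_cost` cycles switching to another ready thread.
    """
    if policy not in ("fine", "coarse"):
        raise ValueError(f"unknown policy: {policy}")
    ready_at = [0] * n_threads       # first cycle each thread may issue again
    left = [run_length] * n_threads  # instructions left in each thread's current run
    busy = 0
    rr = 0                           # fine-grained: round-robin starting point
    current = 0                      # coarse-grained: thread holding the processor
    stall_until = 0                  # coarse-grained: end of the switch penalty
    for t in range(cycles):
        if policy == "fine":
            order = [(rr + k) % n_threads for k in range(n_threads)]
            ready = [i for i in order if ready_at[i] <= t]
            if not ready:
                continue             # every thread is waiting on memory
            i = ready[0]
            rr = (i + 1) % n_threads
        else:
            if t < stall_until:
                continue             # cycle lost to a context switch
            if ready_at[current] > t:
                order = [(current + 1 + k) % n_threads for k in range(n_threads)]
                ready = [j for j in order if ready_at[j] <= t]
                if not ready:
                    continue         # nothing to switch to, so the processor idles
                current = ready[0]
                if switch_cost:
                    stall_until = t + switch_cost
                    continue         # this cycle starts the switch penalty
            i = current
        busy += 1
        left[i] -= 1
        if left[i] == 0:             # the memory reference: block the thread
            ready_at[i] = t + 1 + latency
            left[i] = run_length
    return busy / cycles


if __name__ == "__main__":
    for n in (1, 6, 8, 21):
        fine = simulate("fine", n, run_length=4, latency=20)
        coarse = simulate("coarse", n, run_length=4, latency=20)
        costly = simulate("coarse", n, run_length=4, latency=20, switch_cost=4)
        print(f"{n:2d} threads: fine {fine:.2f}  coarse {coarse:.2f}  coarse+4-cycle switch {costly:.2f}")
```

With `run_length=4` and `latency=20`, this model keeps the processor fully busy with 21 threads under round-robin interleaving but needs only 6 under zero-cost coarse-grained switching, mirroring the point that fine-grained switching requires more contexts. A 4-cycle switch cost caps the coarse-grained machine at half utilization no matter how many threads are added.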


The TERA MTA

• First Commercial Multithreaded Machine Since 1978

• Uniform Shared Memory

• Scalable

• Direct Relationship Between PEs And Throughput

• Fine Grained Architecture


The Tera MTA Cont’d

• Toroidal Interconnection

• 12 Million Dollar Base System

• 16-256 Processor Versions


Processor Characteristics

• Support For 128 Threads

• 16 Protection Domains

• 333 MHz Nominal Speed

• 0 Context Switching Overhead!!!

• 1 GFLOP Peak Performance


Processor Characteristics Cont’d

• Load-Store Architecture

  • 3 Addressing Modes

• 31 64-bit GPRs

• 3 Operations Per Instruction

  • 1 Memory Reference

  • 1 Arithmetic Operation

  • 1 Control Operation (e.g., Branch)

• 6 kW Of Power Dissipation Per Processor
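The three-operation instruction format can be pictured as a small record with one slot per operation class. This sketch is hypothetical: the field names, operation strings, and the greedy packer are illustrations, not the Tera instruction encoding, and the packer ignores data dependences that a real compiler would have to respect.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class InstructionWord:
    """Hypothetical model of one instruction word with three operation slots."""
    memory: Optional[str] = None      # at most one memory reference
    arithmetic: Optional[str] = None  # at most one arithmetic operation
    control: Optional[str] = None     # at most one control operation, e.g. a branch

    def ops(self) -> List[str]:
        return [op for op in (self.memory, self.arithmetic, self.control) if op is not None]


def pack(ops: List[Tuple[str, str]]) -> List[InstructionWord]:
    """Greedily pack (slot, operation) pairs into words without reordering across words.

    A new word is started whenever the next operation's slot is already taken.
    Data dependences between operations are deliberately ignored here.
    """
    words: List[InstructionWord] = []
    slots: Dict[str, str] = {}
    for slot, op in ops:
        if slot not in ("memory", "arithmetic", "control"):
            raise ValueError(f"unknown slot: {slot}")
        if slot in slots:
            words.append(InstructionWord(**slots))
            slots = {}
        slots[slot] = op
    if slots:
        words.append(InstructionWord(**slots))
    return words


if __name__ == "__main__":
    program = [
        ("memory", "load r1"),
        ("arithmetic", "fadd r2, r3, r4"),
        ("memory", "store r2"),
        ("arithmetic", "fmul r5, r6, r7"),
        ("control", "branch loop"),
    ]
    for word in pack(program):
        print(word.ops())
```

Five operations fit into two words here because the second memory reference cannot share a word with the first.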


Interconnection Network

• 3-D Torus Containing p^(3/2) Nodes

• Packet Switching

• 3 Cycles of Latency Per Node

• Messages Are Assigned Random Priorities

• 164-Bit Packets

  • 64 Bits Are Data

• 2.67 GB/s Bandwidth In Each Direction

• 2 HIPPI Channels Per Processor For Network Connections
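The network figures can be turned into a rough latency estimate. This sketch reads the node count as p^(3/2), so a system with p processors is a k × k × k torus with k = √p (a 256-processor system would be 16 × 16 × 16), and it assumes the "3 cycles of latency per node" is charged for every hop a packet takes. It counts minimal hops with wraparound and ignores contention, so it is a lower bound rather than a measured figure. The function names are hypothetical.

```python
import math
from typing import Sequence

CYCLES_PER_HOP = 3  # assumption: the slide's "3 cycles of latency per node", charged per hop


def torus_side(p: int) -> int:
    """Side k of a k x k x k torus holding p**1.5 nodes, i.e. k = sqrt(p)."""
    k = math.isqrt(p)
    if k * k != p:
        raise ValueError("this sketch assumes p is a perfect square")
    return k


def hops(a: Sequence[int], b: Sequence[int], k: int) -> int:
    """Minimal hop count between two (x, y, z) nodes on a k-ary 3-D torus."""
    return sum(min(abs(x - y), k - abs(x - y)) for x, y in zip(a, b))


def mean_hops(k: int) -> float:
    """Average hop count over all source/destination pairs under uniform traffic."""
    # In one dimension the wraparound offset d takes every value 0..k-1 once.
    per_dimension = sum(min(d, k - d) for d in range(k)) / k
    return 3 * per_dimension


if __name__ == "__main__":
    p = 256
    k = torus_side(p)
    print(f"{p} processors -> {k}x{k}x{k} torus, {k ** 3} nodes")
    avg = mean_hops(k)
    print(f"mean one-way hops {avg}, about {avg * CYCLES_PER_HOP} cycles one way")
    print(f"payload fraction of a 164-bit packet: {64 / 164:.2f}")
```

For p = 256 this gives 4,096 nodes and an average of 12 hops, or roughly 36 cycles each way before any memory access time or queueing, which is the kind of latency the 128 hardware threads per processor are meant to cover.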


Memory

• 8, 16, 32 and 64 Bit Addressable

• 4 Bits per Word Of Access State For Synchronization

• Memory Units Equipped With Error Correcting Code

• Memory References Are Distributed Randomly Across All Banks

• Either 2p or 4p Units, Interleaved 64 Ways

• 16 MB DRAM Chips
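The slide does not say how the 4 access-state bits are used. A classic use of per-word access state in the HEP and Tera designs is a full/empty bit, which lets threads synchronize on individual memory words: a synchronizing write waits until the word is empty and marks it full, and a synchronizing read waits until it is full and marks it empty. The sketch emulates that behavior in software with a condition variable; the class and method names are hypothetical, and real hardware does this as part of each memory reference rather than with locks.

```python
import threading
from typing import Any, List


class SyncWord:
    """Software emulation of one memory word guarded by a full/empty bit."""

    def __init__(self) -> None:
        self._value: Any = None
        self._full = False
        self._cond = threading.Condition()

    def write_ef(self, value: Any) -> None:
        """Wait until the word is empty, store the value, and mark it full."""
        with self._cond:
            self._cond.wait_for(lambda: not self._full)
            self._value = value
            self._full = True
            self._cond.notify_all()

    def read_fe(self) -> Any:
        """Wait until the word is full, mark it empty, and return the value."""
        with self._cond:
            self._cond.wait_for(lambda: self._full)
            self._full = False
            self._cond.notify_all()
            return self._value


if __name__ == "__main__":
    word = SyncWord()
    received: List[int] = []

    def consumer() -> None:
        for _ in range(10):
            received.append(word.read_fe())

    reader = threading.Thread(target=consumer)
    reader.start()
    for i in range(10):
        word.write_ef(i)  # blocks until the consumer has emptied the word
    reader.join()
    print(received)
```

Because each write waits for the previous value to be consumed, the single word behaves like a one-slot queue between producer and consumer without any separate lock in the program itself.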


Input / Output

• Maximum Strategy Gen5 XL RAID

• Sustained Bandwidth of 130 MB/s

• At Least p/16 Disk Arrays Are Required

• System Capacity of 300p GB

• 20p MB/s In Each Direction


Operating System

• Distributed Parallel Version Of Unix

  • Highly Concurrent Version Of Berkeley Unix

• Allows The System To Run p Tasks Truly In Parallel

• Streams Are Created Dynamically Without OS Intervention

• Processes Are Broken Up Into Tasks By The OS

• Two-Tier Scheduler Provides Better Resource Allocation

  • PL Scheduler

  • PB Scheduler


Software / Languages

• Both Implicit And Explicit Parallelism Are Supported

• Automatic Parallelization By The Compiler

  • C, C++ & Fortran

• High Degree Of Cray Compatibility

• Easy To Program Because Of The Architecture


System Performance

• 3.84 to 12.8 Times The Performance Of A Cray T90/32

• 1K x 1K Matrix Multiply In 50 ms

• Integer Sort Of 100M Keys In 36 ms
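The timings can be converted into rates, which makes them easier to compare with the 1 GFLOP per-processor peak quoted earlier. This is plain arithmetic on the slide's numbers under stated assumptions: the 1K x 1K matrix multiply is taken as 1024 x 1024 using the classical 2n³ floating-point operations, and the slides do not say how many processors produced these results. The function names are hypothetical.

```python
import math


def matmul_gflops(n: int, seconds: float) -> float:
    """Sustained GFLOP/s for an n x n matrix multiply, assuming 2 * n**3 flops."""
    return 2 * n ** 3 / seconds / 1e9


def keys_per_second(keys: float, seconds: float) -> float:
    """Sorting throughput in keys per second."""
    return keys / seconds


def min_processors(gflops: float, peak_per_processor: float = 1.0) -> int:
    """Fewest processors that could deliver `gflops` at the quoted per-processor peak."""
    return math.ceil(gflops / peak_per_processor)


if __name__ == "__main__":
    rate = matmul_gflops(1024, 0.050)
    print(f"matrix multiply: {rate:.1f} GFLOP/s, "
          f"needs at least {min_processors(rate)} processors at 1 GFLOP peak")
    print(f"integer sort: {keys_per_second(100e6, 0.036):.3g} keys/s")
```

At about 43 GFLOP/s, the matrix-multiply figure implies a configuration of at least 43 processors even if every one ran at its peak rate.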


Conclusion

• Proven Effectiveness

• Logical Step For Multiprocessor Computers

• Still Very Pricey

• Allows General-Purpose Workloads

• Scalable

• Shared Memory


Questions?


Instruction Pipeline


Breakdown Of A Task

[Figure: a Task divided into four Teams, with eight VPs (virtual processors) shown beneath them]


Deciding The Number Of Contexts
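One common way to size the number of contexts is to find the point where adding contexts stops helping. Under a simple deterministic model (an assumption here; the original slide's content did not survive), each of N contexts cycles through R cycles of work (the run length), a switch costing C cycles, and an L-cycle wait (the latency to be hidden). Efficiency E then grows linearly with N until it reaches the ceiling set by the switching overhead, and the crossover N_d is where the two expressions meet:

```latex
E(N) = \min\!\left( \frac{N R}{R + C + L},\; \frac{R}{R + C} \right),
\qquad
\frac{N_d R}{R + C + L} = \frac{R}{R + C}
\;\Longrightarrow\;
N_d = \frac{R + C + L}{R + C} = 1 + \frac{L}{R + C}.
```

Beyond N_d, extra contexts add hardware cost but no throughput. For example, with zero switching overhead, R = 4 and L = 20 give N_d = 1 + 20/4 = 6. If a machine switches every cycle so that each context effectively runs for a single cycle (R = 1, C = 0), the same formula gives N_d = 1 + L, roughly one context per cycle of latency, which is consistent with fine-grained designs needing many contexts.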