CS 339 -Topics in Computer Architecture: Instruction Level …narahari/cs339/intro.pdf ·...

Bhagi Narahari, GWU CS 339 Spring 2000 CS 339 -Topics in Computer Architecture: Instruction Level Parallel Processors and Embedded Systems


Page 1: CS 339 -Topics in Computer Architecture: Instruction Level …narahari/cs339/intro.pdf · 2000-02-03 · Application Domain Real-time groupware, video conferencing high quality video


CS 339 - Topics in Computer Architecture: Instruction Level Parallel Processors and Embedded Systems


Course Outline: Part I

• ILP Architectures
• Overview of Technology Trends - Narahari 1/27/00
  - fabrication technology and implications on processors
• Instruction Level Parallel (ILP) Processors 2/3/00
  - Overview of ILP: superscalar, VLIW/EPIC - Narahari 2/3
  - The HPL-PD architecture - Yul Williams 2/10
  - Intel Itanium Processor - Yogesh Chobe
• Compiler Optimization for ILP Processors 2/17
  - Introduction to Compiler Optimization - Narahari 2/17
  - The Trimaran Infrastructure - Ajay Jayaray 2/24
  - ILP Compiler techniques - 3/2/00
    - Overview - Narahari 3/9/00
    - Region Formation - Yul Williams/Ghita 3/16/00
    - Scheduling - Narahari/Yogesh Chobe 3/30/00
    - Parallelizing C compilers - Jason Mader


Course Outline: Part II

• Embedded Systems
• Introduction - Narahari 3/30/00
• Software support for Embedded Systems 3/30/00
  - Issues and Requirements: Narahari 3/30/00
  - Compiler Support Challenges: Narahari 4/6/00
  - Power Optimization: Brian Crilly 4/13/00
  - Validation: Brad Taylor 4/20/00
  - Compiler optimization for power: Renato Levy 4/13/00
• Embedded devices and processors - ? 4/27/00
• Future architectures 4/27/00
  - Reconfigurable processors - Brian Schott
  - Software support challenges

• Project Reports: 5/4/2000!!!


Course Requirements

• Completely Project Based
  - Term paper and paper presentations
  - Term research project
• Readings
• Materials on Web
• Lab resources provided by HACC lab
• Additional resources by SEAS CF
  - some by NCAC/Jason!


Projects

• EPIC Architectures: Intel’s IA-64 Project
  - Optimizing assembler for EPIC Processors
  - Learn and Use Trimaran
  - Implement specific components and procedures
  - Enhance optimizing compiler
  - If all works, then use Intel’s Itanium Software Development Kit
• Power Aware Computing Toolset
  - modelling power consumption on a processor
  - building a simulator that models power consumption
  - compiler techniques for optimizing power consumption
• Validation techniques in compilers?
• Video on demand/Set top boxes?
• Unified Parallel C (UPC)?
• More details next class!!!


Lecture 1: Hardware Vs. Software

Hardware
  - Medium to compute functions

Software
  - Functions to compute


Functions to compute

• Programming language

• Turing Machines, Recursive Functions


Hardware Media

[Figure: hardware media, with levels labelled gates; net lists; assembly; instruction sets (Add, Multiply, Branch, ...); more complex functions (DSP, ASIC, network processors)]


(Micro) Architecture Challenges

[Figure: layered view labelled Algorithmic Strategies for compile-time optimizations, Cache Optimizations, ISA, and Micro-architecture Hardware Support]

Micro-architecture Hardware Support
• Hardware support must scale
  - (e.g. HPL-PD)
• E.g. clock dilation
• Sensitive to hidden hardware costs


Focusing on the ISA slice

• Can have any combination of instructions
• CISC
  - Instructions are short programs
• RISC (of interest to this course)
  - Instructions take only a few cycles
  - Composed to get the same effect as single CISC instructions
  - Why bother?
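The RISC bullets say that a few simple instructions, composed, reproduce the effect of one complex instruction. A minimal Python sketch of that idea, assuming a hypothetical toy machine (the LOAD/ADD/STORE opcodes and the memory-to-memory add are illustrative, not taken from any real ISA):

```python
# Hypothetical toy machine: memory maps address -> value, registers map name -> value.

def cisc_add_mem(mem, dst, src1, src2):
    """One CISC-style memory-to-memory instruction: mem[dst] = mem[src1] + mem[src2]."""
    mem[dst] = mem[src1] + mem[src2]

def run_risc(mem, program):
    """Execute a list of simple load/store (RISC-style) instructions; returns registers."""
    regs = {}
    for op, *args in program:
        if op == "LOAD":        # LOAD rd, addr
            rd, addr = args
            regs[rd] = mem[addr]
        elif op == "ADD":       # ADD rd, rs1, rs2 (register operands only)
            rd, rs1, rs2 = args
            regs[rd] = regs[rs1] + regs[rs2]
        elif op == "STORE":     # STORE rs, addr
            rs, addr = args
            mem[addr] = regs[rs]
        else:
            raise ValueError(f"unknown opcode {op}")
    return regs

# Four simple instructions with the same effect as cisc_add_mem(mem, 0x30, 0x10, 0x20).
risc_sequence = [
    ("LOAD", "r1", 0x10),
    ("LOAD", "r2", 0x20),
    ("ADD", "r3", "r1", "r2"),
    ("STORE", "r3", 0x30),
]

mem_cisc = {0x10: 7, 0x20: 35, 0x30: 0}
mem_risc = dict(mem_cisc)
cisc_add_mem(mem_cisc, 0x30, 0x10, 0x20)
run_risc(mem_risc, risc_sequence)
print(mem_cisc == mem_risc, mem_risc[0x30])  # True 42
```

Each RISC-style step does one simple thing, which makes it easy to pipeline and easy for a compiler to reorder; the cost is more instructions per task.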


The Backdrop

• Who will program these machines?
  - Programmers
• What do they expect?
  - Performance (till now)
• How?
  - Write HLL program and compile
• Automatic Compilation is key
  - Short prototyping cycles
  - “Assembly-like” performance


A study of mainframes in the 1970s at IBM revealed that (even) sophisticated optimizing compilers typically used only about 10% of the machine's instruction set.
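One simple way to measure a figure like this is static opcode coverage: the number of distinct opcodes appearing in compiled code divided by the number the ISA defines. A Python sketch; the ISA and the listing are hypothetical toy data and do not reproduce the IBM study's method or numbers (a dynamic count, weighted by execution frequency, is another option):

```python
# Hypothetical toy ISA: a few simple opcodes plus several "complex" ones.
ISA_OPCODES = {
    "LOAD", "STORE", "ADD", "SUB", "MUL", "DIV", "BRANCH", "CALL", "RET", "SHIFT",
    "AND", "OR", "XOR", "MOVE_STRING", "EDIT", "TRANSLATE", "PACK", "UNPACK",
    "CONVERT_DECIMAL", "COMPARE_STRING",
}

def static_opcode_coverage(listing, isa):
    """Fraction of the ISA's distinct opcodes that appear at least once in a listing."""
    used = {line.split()[0] for line in listing if line.strip()}
    unknown = used - isa
    if unknown:
        raise ValueError(f"opcodes not in ISA: {sorted(unknown)}")
    return len(used) / len(isa)

# Hypothetical compiler output for a small loop body.
compiled = [
    "LOAD r1, x",
    "LOAD r2, y",
    "ADD r3, r1, r2",
    "STORE r3, z",
    "BRANCH loop",
]

print(f"{static_opcode_coverage(compiled, ISA_OPCODES):.0%}")  # 20%
```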


Question

• Why have complex instructions?
  - Automatic usage is key
  - Limits to proliferation if we depend on hand-coded assembly


Reduced Instruction Set Computing

• Build this intuition into an ISA that a compiler can use
• Reinvest silicon in
  - easy to engineer designs
  - performance
    - e.g. pipeline registers
• Captured in ISA
  - examples will be discussed as case studies


By Contrast

Traditional DSPs
  - Engineering complexity
  - Very difficult (CISC-like) to compile for


This Course

• Philosophy: today, hardware is designed hand in hand with the software used to compile for it (automatically)
• Gives a snapshot of the current state of the art
• Compiler / ISA sweet-spots
• Discusses issues in building both sides
• More of an architecture and compilers course


Furthermore

• Extrapolates beyond current technology to where processors and their compilers are headed next

• Next major step is a dynamically redefinable ISA

• Will study reconfigurable processors towards the end of this course


Technology Trends

• Fabrication
• Architecture
• Application
• Compilation/Software Support


Motivation

• Demands of Embedded Computing impacting desiderata
  - Faster, cheaper processors
  - Shorter times to market
• Poor scalability of superscalars
  - complex control units
• EPIC / VLIW
  - Simpler architectures
  - Known compilation technology
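The "complex control units" point can be made concrete by counting the register comparisons a superscalar's issue logic needs just to find read-after-write dependences among the instructions it issues together. A back-of-the-envelope Python sketch, assuming each instruction has two source registers and one destination, and ignoring checks against instructions already in flight (which real designs also need). For issue width n the count is 2 · n(n-1)/2 = n(n-1), so it grows quadratically:

```python
def intra_group_raw_checks(width, sources_per_instr=2):
    """Register comparisons needed to find RAW hazards inside one issue group.

    Every instruction's source registers are compared with the destination
    register of each earlier instruction in the same group.
    """
    earlier_later_pairs = width * (width - 1) // 2
    return sources_per_instr * earlier_later_pairs

for width in (2, 4, 8, 16):
    print(f"issue width {width:2}: {intra_group_raw_checks(width)} comparisons per cycle")
```

Going from 4-wide to 16-wide multiplies the comparisons by 20 (12 to 240). In an EPIC/VLIW design this analysis is done once, ahead of time, by the compiler.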


More Motivation

• FPGA / Reconfigurable logic
  - Fine grained parallelism
  - Explicit control over micro-architectural features
  - Fast static communication


Trends In Technology, Applications, Architectures


Technology and Application Trends

• Feature size and the effect on signal delay

• Cost of verification and test of new designs

• The new media application shift

• Chip density and ILP


[Figure: Delay vs. Feature Size (1999). Delay (ps, 0-40) against feature size (650, 500, 350, 250, 180, 130, 100 nm) for gate delay, interconnect delay with Al & SiO2, and interconnect delay with Cu & low k]

Bohr, M. T., “Interconnect Scaling - The Real Limiter To High Performance ULSI”, Proceedings of the IEEE International Electron Devices Meeting, pages 241-242.


Impact Of Decreasing Feature Size

• Interconnect delay has a greater impact than gate delay

• “...wires are not keeping pace with scaling of other features. … In fact, for CMOS processes below 0.25 micron ... an unacceptably small percentage of the die will be reachable during a single clock cycle.”

• “Architectures that require long-distance, rapid interaction will not scale well ...”
  - “Will Physical Scalability Sabotage Performance Gains?”

D. Matzke, Chief architect TI, IEEE Computer (9/97)
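The quoted warning follows from a first-order model: an unrepeated wire's delay grows with the square of its length, and shrinking its cross-section raises its resistance. A Python sketch, assuming Elmore delay for a distributed RC wire (t ≈ r·c·L²/2, with r and c per unit length), normalized hypothetical units, ideal scaling, a fixed die edge, and no repeaters, copper, or low-k dielectric. It is not the model used by Bohr or Matzke:

```python
import math

def elmore_wire_delay(r_per_len, c_per_len, length):
    """Elmore delay of an unrepeated, distributed RC wire: r * c * L^2 / 2."""
    return 0.5 * r_per_len * c_per_len * length ** 2

def reachable_length(cycle_time, r_per_len, c_per_len):
    """Longest unrepeated wire whose Elmore delay fits in one clock cycle."""
    return math.sqrt(2.0 * cycle_time / (r_per_len * c_per_len))

def scale_generations(generations, shrink=0.7, die_edge=4.0):
    """Reachable length per cycle, and that length as a fraction of a fixed die edge.

    Normalized, hypothetical start: r = c = cycle = 1 at generation 0. Each
    generation, wire width and thickness shrink by `shrink` (so resistance per
    unit length grows by 1/shrink^2), capacitance per unit length stays
    constant, and the clock period shrinks along with gate delay.
    """
    r, c, cycle = 1.0, 1.0, 1.0
    results = []
    for _ in range(generations):
        length = reachable_length(cycle, r, c)
        results.append((length, length / die_edge))
        r /= shrink ** 2
        cycle *= shrink
    return results

for gen, (length, frac) in enumerate(scale_generations(4)):
    print(f"generation {gen}: reachable length {length:.2f} ({frac:.0%} of die edge)")
```

Under these assumptions the reachable length shrinks by a factor of shrink^1.5 per generation (about 0.59 for a 0.7x shrink), faster than the features themselves, while the die edge is held fixed.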


As Wire Delays Become Significant...

• Focus on architectures that
  - do not involve long distance communication
  - distribute control and data processing logic


Technology and Application Trends

• Feature size and the effect on signal delay

• Cost of verification and test of new designs

• The new media application shift

• Chip density and ILP


Verification And Test

• With increasing chip complexity, verification and test costs form a significant component of the overall cost
  - Based on trends in previous slide
• Scaling current superscalar architectural techniques is likely to exacerbate the test and verification cost factor
• Long testing process will also affect time to market


Impact of Rising Verification And Test Costs

• Keep the architecture simple and regular
  - move complex decision making logic from processor to higher level tools (compiler)


Technology and Application Trends

• Feature size and the effect on signal delay

• Cost of verification and test of new designs

• The new media application shift

• Chip density and ILP


Today’s Computational Requirements

[Dubey, IBM HotChips’97 Tutorial]

Application domain: equivalent # of Pentiums
• Real-time groupware, video conferencing: 2
• High quality video for online interactive catalogs, streaming a/v: 4
• Workgroup collaboration with 3-d graphics: 10
• Video authoring, telegames with video/3-d graphics: 15
• Spontaneous speech recognition: 20
• Digital library and media mining: 20-30
• Broadband conferencing: 30+
• Electronic commerce with strong encryption: 50+

• “…media processing will become the dominant force in computer architecture and microprocessor design.”
  - “How Multimedia Workloads Will Change Processor Design”, Diefendorff & Dubey, IEEE Computer (9/97)



Application Trends Summary

• Real-time processing
• Packed 8-, 16-, and 32-bit integer data
• Continuous data streams
• Fine grain parallelism
• Long integer arithmetic, table look-ups
• Common kernels (small code size)
• Low temporal reuse
• High spatial locality
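The "packed 8-, 16-, and 32-bit integer data" and "fine grain parallelism" bullets are what media ISA extensions such as MMX exploit: one wide register holds several narrow values that are processed together. A Python sketch of the idea done in software (SIMD within a register), assuming four unsigned 8-bit lanes in a 32-bit word and wrap-around arithmetic; real media instructions also offer saturating variants:

```python
LOW7 = 0x7F7F7F7F   # low 7 bits of every 8-bit lane
HIGH1 = 0x80808080  # top bit of every 8-bit lane

def pack_u8(lanes):
    """Pack four unsigned bytes into one 32-bit word (lane 0 in the low byte)."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & 0xFF) << (8 * i)
    return word

def unpack_u8(word):
    """Split a 32-bit word back into its four unsigned byte lanes."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def packed_add_u8(a, b):
    """Add four 8-bit lanes with one wide add, wrapping modulo 256 per lane.

    Adding only the low 7 bits of each lane can never carry into the next lane;
    each lane's top bit is then restored with XOR, which drops the lane's carry-out.
    """
    low = (a & LOW7) + (b & LOW7)
    return low ^ ((a ^ b) & HIGH1)

a = pack_u8([10, 200, 255, 1])
b = pack_u8([5, 100, 1, 2])
print(unpack_u8(packed_add_u8(a, b)))  # [15, 44, 0, 3]
```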


The Impact of Media Application Trends

• Simple, regular architectures are desirable
  - scope for lots of MIMD processing
  - tuned for media kernels
  - need newer caching technology
    - requirements of predictability, high throughput
    - low temporal reuse, high spatial reuse


Technology and Application Trends

• Feature size and the effect on signal delay

• Cost of verification and test of new designs

• The new media application shift

• Chip density and ILP


[Figure: Transistors / Chip, 1997-2012. MPU transistors/chip (M) and DRAM bits/chip (G), vertical axis 0-1600; annotated "50 pentiums"]


Available instruction-level parallelism [Wall’93, DECWRL]

[Figure: ILP (0-100) per application (egre, sedd, yacc, eco, grr, met, alvi, comp, dodu, espr, fppp, gcc1, hydr, li, mdlj, ora, swm, tomc) under the Perfect, Superb, and Good models]


From Previous Two Slides...

• Lots of hardware parallelism available
  - can accommodate approx. 50 pentiums on one die in 6 years

However,

• Conventional architectures and compilation
  - cannot expose enough parallelism in applications
  - even the “superb” model yields an ILP < 10 on average

• Need for new architectures and compilation techniques!
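ILP figures like these come from measuring how much parallelism a program's dependences allow: the instruction count divided by the length of the longest dependence chain. A Python sketch of that ratio for a small dependence graph, assuming the most idealized setting (unit latencies, unlimited function units, perfect prediction). The trace is hypothetical, and this is not Wall's simulator, whose models vary assumptions such as branch prediction, register renaming, and alias analysis:

```python
def ideal_ilp(deps):
    """ILP of a dependence DAG under an idealized machine model.

    Assumes unit latency, unlimited function units, and perfect branch and
    alias prediction, so only true data dependences limit parallelism.
    deps maps each instruction to the instructions whose results it uses.
    Returns (instruction count / critical path length, critical path length).
    """
    depth = {}

    def level(instr):
        # Earliest cycle (1-based) in which instr can execute.
        if instr not in depth:
            depth[instr] = 1 + max((level(p) for p in deps[instr]), default=0)
        return depth[instr]

    critical_path = max(level(i) for i in deps)
    return len(deps) / critical_path, critical_path

# Hypothetical 8-instruction trace fragment.
trace = {
    "a": [], "b": [], "c": ["a", "b"], "d": ["c"],
    "e": [], "f": ["e"], "g": ["d", "f"], "h": [],
}
print(ideal_ilp(trace))  # (2.0, 4)
```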


What Is The Response Elsewhere To All This?


Architecture Research Approaches

• Past approaches
  - better instruction fetch/issue
  - improved instruction processing
  - better prediction (branches, aliases)
  - statically scheduled variants of VLIW
• Novel (different) approaches
  - Reconfigurable processors
  - IRAM and variants
  - Simultaneous multi-threading
  - On-chip multi-processing


Two Noteworthy Directions

• Reconfigurable Processors
  - let compiler handle everything
  - no commitment to a particular architecture
  - compiler generates architecture and code for it
• Explicitly Controlled Architectures
  - simplify architectures as much as possible
  - architectural template is a known, conventional one
  - compiler handles a lot of processor’s decision making
    - explicitly control issue, scheduling, allocation
  - Explicitly Parallel Instruction Computing (EPIC)
    - subset of explicitly controlled architectures
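In an explicitly controlled (EPIC/VLIW) design, the decision about which operations issue together is made by the compiler instead of the processor's control logic. A greedy list-scheduling sketch in Python, assuming unit latencies, identical function units, and a fixed issue width; the operation names are hypothetical, and real EPIC compilers also have to model operation latencies, function-unit types, and more:

```python
def schedule_bundles(ops, deps, width):
    """Greedy list scheduling of operations into VLIW-style bundles.

    ops:   operation names in priority order (here, simply program order).
    deps:  maps an operation to the operations it must wait for.
    width: maximum number of operations the compiler may place in one bundle.
    With unit latency, an operation is ready once all its predecessors sit in
    earlier bundles, so operations within one bundle are mutually independent.
    """
    done = set()
    remaining = list(ops)
    bundles = []
    while remaining:
        ready = [op for op in remaining if all(p in done for p in deps.get(op, []))]
        if not ready:
            raise ValueError("dependence cycle among remaining operations")
        bundle = ready[:width]
        bundles.append(bundle)
        done.update(bundle)
        remaining = [op for op in remaining if op not in bundle]
    return bundles

deps = {"c": ["a", "b"], "d": ["c"], "f": ["e"], "g": ["d", "f"]}
ops = ["a", "b", "c", "d", "e", "f", "g", "h"]
for cycle, bundle in enumerate(schedule_bundles(ops, deps, width=2)):
    print(cycle, bundle)
# 0 ['a', 'b']
# 1 ['c', 'e']
# 2 ['d', 'f']
# 3 ['g', 'h']
```

Because the packing is fixed at compile time, the hardware needs no run-time dependence checking among the operations of a bundle.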


Compiler vs. Processor

[Figure: the phases Frontend and Optimizer, Determine Dependences, Determine Independences, Bind Operations to Function Units, Bind Transports to Busses, and Execute, with the compiler/hardware boundary falling progressively later for Superscalar, Dataflow, Independence Architectures, VLIW, and TTA]

B. Ramakrishna Rau and Joseph A. Fisher. Instruction-level parallel processing: History, overview, and perspective. The Journal of Supercomputing, 7(1-2):9-50, May 1993.
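The compiler vs. processor split can also be written down as data: each architecture class is characterized by how many of the phases, in order, the compiler performs before the hardware takes over. A Python sketch; the assignment below is our reading of the Rau and Fisher taxonomy with TTA at the far end, so treat it as illustrative:

```python
PHASES = [
    "Frontend and Optimizer",
    "Determine Dependences",
    "Determine Independences",
    "Bind Operations to Function Units",
    "Bind Transports to Busses",
    "Execute",
]

# Number of leading phases performed by the compiler; the hardware does the rest.
COMPILER_PHASES = {
    "Superscalar": 1,
    "Dataflow": 2,
    "Independence architecture": 3,
    "VLIW": 4,
    "TTA": 5,
}

def split_phases(arch):
    """Return (compiler phases, hardware phases) for an architecture class."""
    k = COMPILER_PHASES[arch]
    return PHASES[:k], PHASES[k:]

for arch in COMPILER_PHASES:
    compiler, hardware = split_phases(arch)
    print(f"{arch}: hardware starts at '{hardware[0]}'")
```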


Reconfigurable Computing: A Summary of Achievements

• Fastest RSA decryption
  - 600Kb/s, 512b keys [DEC PRL]
• DNA sequence matching
  - 100x faster than MPP’s, Cray3 [Splash, SRC]
• Filters on FPGA’s 10x faster than DSP’s
  - [Xilinx, Altera application notes]
• Processor emulation
  - [Butts IEEE CICC’95, Varghese IEEE Trans. VLSI’93]
• Hardware reuse
  - [Multifunction PCMCIA, Wireless Video Coding UCLA]


Where can reconfigurability make a difference?

• Applications requiring non-standard data-paths
  - FFT, DCT, CORDIC
• Static data, adaptive precision
  - constant coefficient filters, encryption-decryption
• Fault tolerance, real-time threat sensitive adaptation
  - defense communication systems
• High-performance multifunction portables
  - PDA’s, cellular phones, wearable computers
• Regular, fine-grained parallel processing
  - signal/image processing (e.g. pattern recognition)
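CORDIC, listed above as needing a non-standard data-path, computes rotations (and hence sine and cosine) with shifts, additions, and a small constant table instead of multipliers, which is why it maps well onto custom or reconfigurable logic. A Python sketch of rotation-mode CORDIC in fixed point, assuming 16 fractional bits and 16 iterations (both chosen only for illustration; a hardware version would unroll or pipeline the loop):

```python
import math

FRAC_BITS = 16            # fixed-point format: 16 fractional bits (assumption)
ONE = 1 << FRAC_BITS
ITERATIONS = 16

# Small constant table: atan(2^-i) in fixed point.
ANGLES = [round(math.atan(2.0 ** -i) * ONE) for i in range(ITERATIONS)]
# Each micro-rotation stretches the vector; starting from 1/gain cancels that.
GAIN = math.prod(math.sqrt(1.0 + 2.0 ** (-2 * i)) for i in range(ITERATIONS))
X_START = round(ONE / GAIN)

def cordic_cos_sin(theta):
    """Rotation-mode CORDIC for |theta| <= pi/2 radians; returns (cos, sin).

    Inside the loop there are no multiplications: only shifts, additions,
    subtractions, and a table lookup.
    """
    x, y, z = X_START, 0, round(theta * ONE)
    for i in range(ITERATIONS):
        if z >= 0:   # rotate counter-clockwise by atan(2^-i)
            x, y, z = x - (y >> i), y + (x >> i), z - ANGLES[i]
        else:        # rotate clockwise by atan(2^-i)
            x, y, z = x + (y >> i), y - (x >> i), z + ANGLES[i]
    return x / ONE, y / ONE

print(cordic_cos_sin(math.pi / 6))  # approximately (0.866, 0.5)
```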


What are the hurdles?

• Poor compilation times
  - lack of correspondence between standard IR and final configurations
  - place and route inherently complex
• Additional runtime overheads
  - large configuration size implies high reconfiguration costs
  - this also implies context switches are very costly
• Lack of convenient abstract models, language support
  - models for algorithm development (e.g. RMESH, USC-MAARC)
  - models for compiler targets (ReaCT-ILP)
  - language support for hardware structural information
    - but not as complex as HDL’s
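The runtime-overhead hurdle can be quantified with a break-even count: loading a configuration pays off only if the kernel then runs enough times, faster than in software, to recover the reconfiguration time. Reconfiguration wins when n · t_sw > t_reconfig + n · t_hw, i.e. n > t_reconfig / (t_sw - t_hw). A Python sketch, assuming hypothetical timings and a reconfiguration time proportional to configuration size:

```python
import math

def break_even_runs(reconfig_time, sw_time, hw_time):
    """Smallest number of kernel runs for which reconfiguring beats software.

    Reconfiguration wins when n * sw_time > reconfig_time + n * hw_time,
    i.e. n > reconfig_time / (sw_time - hw_time).
    """
    saving = sw_time - hw_time
    if saving <= 0:
        return None  # configured logic is no faster, so it never pays off
    return math.floor(reconfig_time / saving) + 1

# Hypothetical timings in microseconds.
print(break_even_runs(reconfig_time=10_000, sw_time=50, hw_time=5))  # 223
print(break_even_runs(reconfig_time=20_000, sw_time=50, hw_time=5))  # 445
```

Doubling the configuration size (and so, under this assumption, the reconfiguration time) roughly doubles the number of runs needed, which is also why a context switch that forces a reload is expensive.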


State Of Compilation Technology

• Compilation for incremental architectures
  - well known technology
  - but bottleneck of “conventional compilation”

• Compilation for radically different architectures
  - no known efficient and automatic compilation
  - potential for breaking through the bottleneck


[Figure: What can be efficiently compiled for today? Architectures arranged by parallelism against approximate instruction packet size (0, 4, 16, 32, 64, 128-512, 1K-10K, 100K-1M, >1M). Architectures shown: Early x86, Simple Pipelined/Embedded, SuperScalar, EPIC/VLIW, Adaptive EPIC, TTA, SMT, VECTOR, MultiChip CVH, TRACE (Multiscalar), SuperSpeculative, RAW, Dataflow, DPGA, GARP, FPGA, RaPiD, Simple ASIC, Complex ASIC]


Trends In Technology, Applications, Architectures

(What can we infer?)

• Design/verification costs
  - Simple, regular architectures
• Signal delays
  - Shorter connections; local interactions
• Media processing
  - High throughput, highly compute-intensive processing, many integer types
  - Not enough ILP through standard compilation
  - Customized/special purpose compilation?
  - Adaptive architectures?

Adaptive Explicitly Parallel Instruction Computing?