ECE 5745 Complex Digital ASIC Design Course Overview · ECE 5745 Complex Digital ASIC Design Course...

53
ECE 5745 Complex Digital ASIC Design Course Overview Christopher Batten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece5745

Transcript of ECE 5745 Complex Digital ASIC Design Course Overview · ECE 5745 Complex Digital ASIC Design Course...

ECE 5745 Complex Digital ASIC DesignCourse OverviewChristopher Batten

School of Electrical and Computer EngineeringCornell University

http://www.csl.cornell.edu/courses/ece5745

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

�Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 2 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

The Computer Engineering Stack

Technology

Application

Gap too large to bridge in one step(but there are exceptions e.g., magnetic compass)

In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to

execute information processing applications efficientlyusing available manufacturing technologies

ECE 5745 Course Overview 3 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

The Computer Engineering Stack

In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to

execute information processing applications efficientlyusing available manufacturing technologies

CircuitsDevices

Programming LanguageAlgorithm

Com

pute

r Arc

hite

ctur

e

Technology

Application

Register-Transfer Level

Instruction Set ArchitectureMicroarchitecture

Operating System

Gate Level

ECE 5745 Course Overview 3 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

What is Computer Architecture?

In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to

execute information processing applications efficientlyusing available manufacturing technologies

CircuitsDevices

Programming LanguageAlgorithm

Com

pute

r Arc

hite

ctur

e

Technology

Application

Register-Transfer Level

Instruction Set ArchitectureMicroarchitecture

Operating System

Gate Level

Operating System

Gate LevelRegister-Transfer Level

Instruction Set ArchitectureMicroarchitecture

Application Requirements • Suggest how to improve architecture • Provide revenue to fund development

Technology Constraints • Restrict what can be done efficiently • New technologies make new arch possible

Architecture provides feedback to guideapplication and technology research directions

ECE 5745 Course Overview 4 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Key Metrics in Computer Architecture

P

I$

D$

Network

Network

Network

P

I$

P

I$

P

I$

D$ D$ D$

I Primary Metrics. Execution time (cycles/task). Energy (Joules/task). Cycle time (ns/cycle). Area (µm2)

I Secondary Metrics. Performance (ns/task). Average power (Watts). Peak power (Watts). Cost ($). Design complexity. Reliability. Flexibility

Discuss qualitative first-order analysisfrom ECE 4750 on board

ECE 5745 Course Overview 5 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Unanswered Questions from ECE 4750

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

I How can we quantitatively evaluatearea, cycle time, and energy?

I How do we actually implementprocessors, memories, andnetworks in a real chip?

I How should we implement/analyzeapplication-specific accelerators?

. Very loosely coupledmemory-mapped accelerators

. More tightly coupled co-processoraccelerators

. Specialized instructions andfunctional units

ECE 5745 Course Overview 6 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

ASIC: Application-Specific Integrated Circuit

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

Performance (Tasks per Second)

SimpleProc

Processor PowerConstraint

Ener

gy (J

oule

s pe

r Tas

k)

C

B

AAAA

Superscalarw/ Deeper

Pipelines

Out-of-Order Superscalar

Superpipelined

D

EEMulticore FFMulticore E

AAAA

Specialized Accelerators

ECE 5745 Course Overview 7 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

ASIC: Application-Specific Integrated Circuit

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

Performance (Tasks per Second)

Ener

gy E

ffici

ency

(Tas

ks p

er J

oule

)

SimpleProcessor

Design PowerConstraint

High-PerformanceArchitectures

EmbeddedArchitectures

DesignPerformanceConstraint

Flexibil

ity vs.

Specia

lizatio

nCustom

ASICLess FlexibleAccelerator

More FlexibleAccelerator

ECE 5745 Course Overview 8 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Goal for ECE 5745 is to answer these questions!

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

I How can we quantitatively evaluatearea, cycle time, and energy?

I How do we actually implementprocessors, memories, andnetworks in a real chip?

I How should we implement/analyzeapplication-specific accelerators?

. Very loosely coupledmemory-mapped accelerators

. More tightly coupled co-processoraccelerators

. Specialized instructions andfunctional units

ECE 5745 Course Overview 9 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Full Custom Design vs. Standard-Cell Design

L01 –Introduction 22

6.884 –S

pring 20052 Feb 2005

Full Custom D

esignDesigner is free to do anything, anywhere

–though each design team

usually imposes som

e disciplineM

ost time consum

ing design style–

Reserved for very high performance or very high volum

e devices (Intel m

icroprocessors, RF power amps for cellphones)

Requires complete custom

ization of all layers of wafer

Piece of full-custom m

ultiplier array, 1.0Pm

2-metal

Full-custom layoutin 1.0µm w/ 2 metal

layers

I Full-Custom Design (ECE 4740)

. Designer is free to do anything, anywhere; thoughteam usually imposes some design discipline

. Most time consuming design style; reserved forvery high performance or very high volume chips(Intel microprocessors, RF power amps forcellphones)

I Standard-Cell Design (ECE 5745)

. Fixed library of “standard cells” and SRAMmemory generators

. Register-transfer-level description is automaticallymapped to this library of standard cells, then thesecells are placed and routed automatically

. Enables agile hardware design methodology

ECE 5745 Course Overview 10 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Standard-Cell Design Methodology

L01 – Introduction 246.884 – Spring 2005 2 Feb 2005

Standard Cell ASICs• Also called Cell-Based ICs (CBICs)• Fixed library of cells plus memory generators• Cells can be synthesized from HDL, or entered in schematics• Cells placed and routed automatically• Requires complete set of custom masks for each design• Currently most popular hard-wired ASIC type (6.884 will use this)

Mem1 Mem

2

Cells arranged in rows

Generated memory arrays

L01 – Introduction 256.884 – Spring 2005 2 Feb 2005

Standard Cell DesignCells have standard height but vary in widthDesigned to connect power, ground, and wells by abutment

VDD Rail

GND Rail

Clock Rail

Cell I/O on M2Power

Rails in M1

Clock Rail (not typical)

NAND2 Flip-flop

Well Contact under Power Rail

Ripple carry adder with carrychain highlighted

ECE 5745 Course Overview 11 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Standard-Cell Design Methodology

Design in HDL

HDLSimulator

Place&Route

Switching Activity

PowerAnalysis

Synthesis

Standard Cells

Gate-Level Model

Layout

Area (�m2)

Execution Time(cycles/task)

Cycle Time (ns)Energy (J/task)

Area (�m2)Cycle Time (ns)Energy (J/task)

Energy (J/task)

ECE 5745 Course Overview 12 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Example Standard-Cell Chip PlotMotivation Architectural Patterns Scale VT Core Maven VT Core Evaluation

Single-Lane Vector-Thread Unit w/ 256 Registers

MIT CSAIL Christopher Batten 32 / 42ECE 5745 Course Overview 13 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

What is Complex Digital ASIC Design?

Circuits

Devices

Programming Language

Algorithm

Com

pute

r A

rch

itect

ure

Technology

Application

Register-Transfer Level

Instruction Set Architecture

Microarchitecture

Operating System

Gate Level

Circuits

Register-Transfer Level

Microarchitecture

Gate Level

layout ready for fabrication

Complex digital ASIC design isthe process of

quantitatively exploring thearea, cycle time, execution time, and

energy trade-offs

of various

automated standard-cell CAD tools

and then to transform the mostpromising design to

using

application-specific accelerators(and general-purpose proc+mem+net)

ECE 5745 Course Overview 14 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

�Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 15 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Course Structure

Part 1ASIC Design

Overview

Part 2Digital CMOS

Circuits

Part 3CAD Algorithms

P P

MM

PrereqComputer

Architecture

ECE 5745 Course Overview 16 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Part 1: ASIC Design Overview

P P

MM

Topic 1Hardware

DescriptionLanguages

Topic 2CMOS Devices

Topic 3CMOS Circuits

Topic 4Full-Custom

DesignMethodology

Topic 5Automated

DesignMethodologies

Topic 7Clocking, Power Distribution,

Packaging, and I/O

Topic 8Testing and Verification

Topic 6Closing

theGap

ECE 5745 Course Overview 17 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Part 2: Digital CMOS Circuits

Topic 10

Sequential

StateTopic

11

Interconnect

Topic 9

Combinational

Logic

ECE 5745 Course Overview 18 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Part 3: CAD Algorithms

RTL to LogicSynthesis

TechnologyIndependent

Synthesis

TechnologyDependentSynthesis

x = a'bc + a'bc'y = b'c' + ab' + ac

x = a'by = b'c' + ac

Placement

DetailedRouting

GlobalRouting

Topic 12Synthesis Algorithms

Topic 13Physical Design Automation

ECE 5745 Course Overview 19 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Five-Week Design Project

P

I$

D$

Network

Network

Network

Xcel P

I$

D$ D$ D$

Xcel

AcceleratedInstructions

Performance (Tasks per Second)

Ener

gy E

ffici

ency

(Tas

ks p

er J

oule

)

SimpleProcessor

Design PowerConstraint

High-PerformanceArchitectures

EmbeddedArchitectures

DesignPerformanceConstraint

Flexibil

ity vs.

Specia

lizatio

nCustom

ASICLess FlexibleAccelerator

More FlexibleAccelerator

ECE 5745 Course Overview 20 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

�Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 21 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Design starts have been decreasing ...

1995 2000 2005 2010 20140

2

4

6

8

10

12

Num

ber o

f ASI

C S

tarts

(in

thou

sand

s)

3% / year decline

Todd Austin, “Preparing for a Post Moore’s Law World,” MICRO-48 Keynote

ECE 5745 Course Overview 22 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

... and design costs are increasing ...Costs of Latest Nodes Are Skyrocketing

0

20

40

60

80

100

120

140

0.5u 0.35u 0.25u 0.18u 0.13u 90nm 65nm 45nm 28nm 20nm

Cost&to

&Market&($&million)

Silicon&Technology&Node

Mask&CostsS/W&Development&and&TestingH/W&Design&and&Verification

Source: International Business Strategies, T. Austin

22

$88M

$120M

$500K

Todd Austin, “Preparing for a Post Moore’s Law World,” MICRO-48 Keynote

ECE 5745 Course Overview 23 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

... but a renaissance in chip design is coming!

I End of Moore’s law = chip design in established tech nodes can flourishI “If silicon is the canvas we paint on, engineers can do more than just shrink

the canvas.” David Brooks, HarvardZvi Or-Bach, “FPGAs as ASIC Alternatives: Past & Future,” EE Times, April 2014.

ECE 5745 Course Overview 24 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Plenty of companies are making chips ...

ECE 5745 Course Overview 25 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

... even some surprising ones!

Google’s TPUI Custom chip for accelerating

Google’s TensorFlow library

I Tightly integrated into Google’sdata centers

SiFive’s HiFive 1I First commercial RISC-V SoC

I Basic RV32IMAC microcontroller

ECE 5745 Course Overview 26 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Cross-Layer Interaction is Critical

CircuitsDevices

Programming LanguageAlgorithm

Com

pute

r Arc

hite

ctur

e

Technology

Application

Register-Transfer Level

Instruction Set ArchitectureMicroarchitecture

Operating System

Gate LevelCircuits

Register-Transfer LevelMicroarchitecture

Gate Level

Most significant impact on area,cycle time, execution time, andenergy is usually at architecturaland microarchitectural levels

ECE 5745 Course Overview 27 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Course Motivation: Comp Arch Research Perspective

Transistors(Thousands)

MIPSR2K

IntelP4

1975 1980 1985 1990 1995 2000 2005 2010 2015

100

101

102

103

104

105

106

107

DECAlpha 21264

TypicalPower (W)

Frequency(MHz)

SPECintPerformance

~9%/year

~15%/year

Numberof Cores

Intel 48-Core Prototype

?

AMD 4-Core Opteron

ECE 5745 Course Overview 28 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Cross-Layer Interaction is Critical

CircuitsDevices

Programming LanguageAlgorithm

Com

pute

r Arc

hite

ctur

e

Technology

Application

Register-Transfer Level

Instruction Set ArchitectureMicroarchitecture

Operating System

Gate LevelCircuits

Register-Transfer LevelMicroarchitecture

Gate Level

Need to quantitatively understandarea, cycle time, and energytrade-offs to be able to createnew revolutionary architectures that tackle the most challenging physical design issues

ECE 5745 Course Overview 29 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Course Motivation: Circuits Research Perspective

Your Digital Circuit Here

Fighter Airplane~100,000 parts

Your Analog Circuit Here

Intel Sandy Bridge E2.27 Billion transistors

ECE 5745 Course Overview 30 / 48

• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

Cross-Layer Interaction is Critical

CircuitsDevices

Programming LanguageAlgorithm

Com

pute

r Arc

hite

ctur

e

Technology

Application

Register-Transfer Level

Instruction Set ArchitectureMicroarchitecture

Operating System

Gate LevelCircuits

Register-Transfer LevelMicroarchitecture

Gate Level

Circuit-level researchers need toappreciate the system-levelcontext for their subsystems;cross-layer interaction can generate some of the most exciting research ideas

ECE 5745 Course Overview 31 / 48

Complex Digital ASIC Design • Activity 1 • Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

�Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 32 / 48

Complex Digital ASIC Design • Activity 1 • Case Study: Scalar vs. Vector Processors Activity 2

Fixed-Latency Iterative Multiplier Datapath

b_regb_mux_sel

32b

a_rega_mux_sel

32b

result_reg

result_mux_sel

result_en

32b

32b

32b

>>

<<

add_mux_sel

req_msg.areq_msg

32b resp_msg

req_msg.b

b_lsb

0

ECE 5745 Course Overview 33 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

RTL

Devices

ISA

PL

Algorithm

�Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 34 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scalar Processors with Multithreading

Programmer'sLogicalView

�T0

Memory

�T1 �T2 �T3 �T4 �Ti

ExecutionResources

ControlInstructions

MemoryInstructions

ArchitecturalRegisters

ECE 5745 Course Overview 35 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scalar Processors with Multithreading

Programmer'sLogicalView

�T0

Memory

�T1 �T2 �T3 �T4 �Ti

loop: load a, a_ptr load b, b_ptr add c, a, b store c, c_ptr a_ptr++ b_ptr++ c_ptr++ branch

Vector-Vector Add Masked Filterloop: load a, a_ptr a_ptr++ branch a = 0 op0 op1 ... branch

ECE 5745 Course Overview 35 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scalar Processors with Multithreading

Programmer'sLogicalView

�T0

Memory

�T1 �T2 �T3 �T4 �Ti

TypicalCoreMicro-Architecture

Ener

gy

Performance

MIMD�T0�T1

�T2�T3

Instr Memory

Data Memory

Multi-threadedCores

ECE 5745 Course Overview 35 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Vector-SIMD Processors

Programmer'sLogicalView

CT0elm 0 elm 1 elm 2 elm 3

Memory

CTjelm i

ArchitecturalVector Registerwith 4 Elements

Vector-SIMDArithmetic

Instructions

Vector-SIMDMemory

Instructions

ECE 5745 Course Overview 36 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Vector-SIMD Processors

Programmer'sLogicalView

CT0elm 0 elm 1 elm 2 elm 3

Memory

CTjelm i

loop: vload A, a_ptr vload B, b_ptr vadd C, A, B vstore C, c_ptr a_ptr++ b_ptr++ c_ptr++ branch

loop: vload A, a_ptr vset F, A = 0 vop0 under flag F vop1 under flag F ... a_ptr++ branch

Vector-Vector Add Masked Filter

ECE 5745 Course Overview 36 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Vector-SIMD Processors

Programmer'sLogicalView

CT0elm 0 elm 1 elm 2 elm 3

Memory

CTjelm i

TypicalCoreMicro-Architecture 0

213

Instruction Memory

CP VIU

VMU

Data Memory

Vect

or L

anes

Ener

gy

Performance

MIMD

Vector-SIMD

ECE 5745 Course Overview 36 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Quantitative Area EvaluationMotivation Architectural Patterns Scale VT Core Maven VT Core Evaluation

Single-Lane Vector-Thread Unit w/ 256 Registers

MIT CSAIL Christopher Batten 32 / 42ECE 5745 Course Overview 37 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Quantitative Area Evaluation

Quad-Corew/ Vertical

Multithreading

Vector-SIMD

(8 elm/lane)

1 thre

ad

2 thre

ads

4 thre

ads

8 thre

ads

1 c

ore

w/

4 lanes

4 c

ore

s w

/ 1

lane e

a

data cache

instruction cache

control processor

integer functional units

floating-point functional units

memory units

register file

control logic

Single-core multi-lane designreduces area by 15%

Multi-core single-lane designincreases area by 20%

1.75

1.25

1.00

0.50

0.00

0.25

0.75

1.50

Norm

alized A

rea

ECE 5745 Course Overview 38 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Quantitative Performance and Energy Evaluation

0.8 1.0 1.2 1.4 1.6Normalized Tasks / SecondNo

rmal

ized

Ener

gy /

Task

1.61.41.21.00.80.60.4

25

20

15

10

5

0Ener

gy /

Task

(�J)

Multithreaded Multicore(increasing number ofthreads per core)

Quad-Corew/ Vertical

Multithreading

1 th

read

2 th

read

s4

thre

ads

8 th

read

s

Performance reduction with increasingthreads due to increased cycle time and

thread management overhead onfine-grain loops

Single-Lane Vector(increasing vlen)

Vector-SIMD

(4 core w/ 1 lane)

data cacheinst cachecontrol procint func unitsmemory unitsregister filecontrol logic

1 el

m/la

ne2

elm

/lane

4 el

m/la

ne8

elm

/lane

leakage

ECE 5745 Course Overview 39 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Simple RISC Processor ASIC

SP Datapath SP Regfile

RAMSubbank

(2KB)

RAMSubbank

(2KB)

RAMSubbank

(2KB)

RAMSubbank

(2KB)

AH

IPC

on

tro

ller

SP Control

RAM Interface

VCO

Figure 2: STC1 chip plot and die photo. On the chip plot, the Exec Unit is colored blue, and theFetch Unit is colored green and purple.

HostPC

PLX9050

ATB0Controller

PowerSupplies

ProbePoints

PCI

SDRAM

STC1

Host ATB0 Daughter Card

Figure 3: Test System Block Diagram.The test system includes the ATB0test baseboard, a daughter card withthe STC1 chip, and a host computerto control the test system.

5

ECE 5745 Course Overview 40 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Simple RISC Processor ASIC

1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 41 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0

750 ------------- . . . . . . . . . . . . . . . . . . . . . . . . .740 . . . . . . . . . . . . . . . . . . . . . . . . .730 . . . . . . . . . . . . . . . . . . . . . . . . .720 . . . . . . . . . . . . . . . . . . . . . . . . .710 . . . . . . . . . . . . . . . . . . . . . . * * .700 ------------- . . . . . . . . . . . . . . . . . . . . * * * * .690 . . . . . . . . . . . . . . . . . . . * * * * * .680 . . . . . . . . . . . . . . . . . * * * * * * . .670 . . . . . . . . . . . . . . . . * * * * * * * . .660 . . . . . . . . . . . . . . . . * * * * * * * . .650 ------------- . . . . . . . . . . . . . . * * * * * * * * * * .640 . . . . . . . . . . . . . . * * * * * * * * * * .630 . . . . . . . . . . . . . * * * * * * * * * * . .620 . . . . . . . . . . . . * * * * * * * * * * * * .610 . . . . . . . . . . . * . * * * * * * * * * * . .600 ------------- . . . . . . . . . . . * * * * * * * * * * * * . .590 . . . . . . . . . . * * * * * * * * * * * * * * .580 . . . . . . . . . . * * * * * * * * * * * * * . .570 . . . . . . . . . * * * * * * * * * * * * * * * *560 . . . . . . . . . * * * * * * * * * * * * * * * *550 ------------- . . . . . . . * * * * * * * * * * * * * * * * * .540 . . . . . . . * * * * * * * * * * * * * * * * . *530 . . . . . . . * * * * * * * * * * * * * * * * * *520 . . . . . . * * * * * * * * * * * * * * * * * * *510 . . . . . . * * * * * * * * * * * * * * * * * * *500 ------------- . . . . . * * * * * * * * * * * * * * * * * * * *490 . . . . * * * * * * * * * * * * * * * * * * * * *480 . . . . * * * * * * * * * * * * * * * * * * * * *470 . . . * * * * * * * * * * * * * * * * * * * * * *460 . . . * * * * * * * * * * * * * * * * * . * * * *450 ------------- . . . * * * * * * * * * * * * * * * * * * * * * *440 . . * * * * * * * * * * * * * * * * * * * * * * *430 . . * * * * * * * * * * * * * * * * * * * * * * *420 . . * * * * * * * * * * * * * * * * * * * * * * *410 . * * * * * * * * * * * * * * * * * * * * * * * *400 ------------- . * * * * * * * * * * * * * * * * * * * * * * * *390 . . . . . * * * * *380 . . . . . * * * * *370 . . . . . * * * * *360 . . . . . * * * * *350 --- . . . . * * * * * *340 . . . . * * * * * *330 . . . . * * * * * *320 . . . * * * * * * *310 . . . * * * * * * *300 --- . . . * * * * * * *290 . . . * * * * * * *280 . . * * * * * * * *270 . . * * * * * * * *260 . . * * * * * * * *250 --- . . * * * * * * * *240 . * * * * * * * * *230 . * * * * * * * * *220 . * * * * * * * * *210 . * * * * * * * * *200 --- * * * * * * * * * *190 * * * * * * * * * *180 * * * * * * * * * *

Figure 5: Shmoo plot for awide voltage range. The hori-zontal axis plots supply volt-age in mV, and the verticalaxis plots frequency in MHz.A * indicates correct opera-tion at that operating pointand a dot indicates failure.The shmoo plot shows com-bined results from two di�er-ent chips.

la t9, input_endforever:

la t8, input_startla t7, output

loop:lw t0, 0(t8)addu t8, 4bgez t0, possubu t0, $0, t0

pos:sw t0, 0(t7)addu t7, 4bne t8, t9, loopb forever

Figure 6: Test program for measuringtypical power consumption. The innerloop calculates the absolute values ofintegers in an input array and storesthe results to an output array. The in-put array had 10 integers during test-ing, and the program repeatedly runsthis inner loop forever.

7

I RISC processor w/ 8 KB SRAMI TSMC 0.18 µm processI 1.7⇥ 2.1 mmI 610K TransistorsI 450 MHz at 1.8 V

ECE 5745 Course Overview 41 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scale Vector-Thread Processor ASIC

ECE 5745 Course Overview 42 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scale Vector-Thread Processor ASIC

TSMC 0.18µm • 7.14 Million Transistors • 16.6 mm2 Core Area

CP

Vector Lane 0

Vector Lane 1

Vector Lane 2

Cache Control

Cache Tag CAMs

CacheDataRAM

Banks VMU

VIU

XBAR

Vector Lane 3

ECE 5745 Course Overview 43 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

Scale Energy vs. Performance Results

Motivation Vector-Threading Scale VT Processor Maven VT Processor Future Research

Energy-Efficiency of the Scale VT Processor

MIT CSAIL Christopher Batten 24 / 39ECE 5745 Course Overview 44 / 48

Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2

BRG Test Chip 1

Taped-out Layout for BRGTC1

HostInterface

debu

g

RISCCore

SortAccel

Memory Arbitration Unit

SRAMBank(2KB)

SRAMBank(2KB)

SRAMBank(2KB)

SRAMBank(2KB)

diff clk (+)diff clk (�)

singleended clk

rese

t

CtrlReghost2chip

chip2host

LVDSRecv

clkdiv

clk treeresettree

clk

out

The testing platform enables running smalltest programs on BRGTC1 to compare the

performance and energy of pure-software kernelsversus the HLS-generated sorting accelerator

2x2mm 1.3M transistors in IBM 130nmRISC processor, 16KB SRAMHLS-generated accelerators

Static Timing Analysis Freq. @ 246 MHz

Post-Silicon Evaluation Strategy

Taped out in March 2016, returned in Fall 2016Currently packaging and testingLVDS

divi

ded

clk

out

LVDS

clk

out

ECE 5745 Course Overview 45 / 48

Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors • Activity 2 •

RTL

Devices

ISA

PL

Algorithm

�Arch

Technology

Application

OS

Gates

Circuits

Complex Digital ASIC Design

I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?

I Activity 1: Evaluation of Integer Multiplier

I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips

I Activity 2: Brainstorming for Sorting Accelerator

ECE 5745 Course Overview 46 / 48

Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors • Activity 2 •

Brainstorming for Sorting Accelerator

I$

D$

Arbiter

XcelP

Performance (Tasks per Second)

Ener

gy E

ffici

ency

(Tas

ks p

er J

oule

)

SoftwareBaseline

Sort array of integers stored in memoryProc sends Xcel base address and sizeProc waits for Xcel to finish sorting

I Software Baseline. Insertion sort O(n2). 20% of dynamic

instructions are lw/sw

I Hardware Accelerator

. What is ideal speedupassuming we still useinsertion sort?

. What if we use adifferent algorithm?

. Sketch hardware impl ofa sorting accelerator

. Where will acceleratorbe in the energy/perfspace?

ECE 5745 Course Overview 47 / 48

Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors Activity 2

RTL

Devices

ISA

PL

Algorithm

�Arch

Technology

Application

OS

Gates

Circuits

Take-Away Points

I Complex digital ASIC design is the process ofquantitatively exploring the area, cycle time,execution time, and energy trade-offs ofgeneral-purpose and application-specific designsusing automated standard-cell CAD tools and thento transform the most promising design to layoutready for fabrication

I Course can provide useful experience withcross-layer interaction for students interested inpursuing research in computer architecture orcircuits or students interested in pursuing a careerin in industry development of ASICs

ECE 5745 Course Overview 48 / 48