Post on 25-May-2018
ECE 5745 Complex Digital ASIC DesignCourse OverviewChristopher Batten
School of Electrical and Computer EngineeringCornell University
http://www.csl.cornell.edu/courses/ece5745
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
RTL
Devices
ISA
PL
Algorithm
�Arch
Technology
Application
OS
Gates
Circuits
Complex Digital ASIC Design
I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?
I Activity 1: Evaluation of Integer Multiplier
I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips
I Activity 2: Brainstorming for Sorting Accelerator
ECE 5745 Course Overview 2 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
The Computer Engineering Stack
Technology
Application
Gap too large to bridge in one step(but there are exceptions e.g., magnetic compass)
In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to
execute information processing applications efficientlyusing available manufacturing technologies
ECE 5745 Course Overview 3 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
The Computer Engineering Stack
In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to
execute information processing applications efficientlyusing available manufacturing technologies
CircuitsDevices
Programming LanguageAlgorithm
Com
pute
r Arc
hite
ctur
e
Technology
Application
Register-Transfer Level
Instruction Set ArchitectureMicroarchitecture
Operating System
Gate Level
ECE 5745 Course Overview 3 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
What is Computer Architecture?
In its broadest definition, computer architecture is thedesign of the abstraction/implementation layers that allow us to
execute information processing applications efficientlyusing available manufacturing technologies
CircuitsDevices
Programming LanguageAlgorithm
Com
pute
r Arc
hite
ctur
e
Technology
Application
Register-Transfer Level
Instruction Set ArchitectureMicroarchitecture
Operating System
Gate Level
Operating System
Gate LevelRegister-Transfer Level
Instruction Set ArchitectureMicroarchitecture
Application Requirements • Suggest how to improve architecture • Provide revenue to fund development
Technology Constraints • Restrict what can be done efficiently • New technologies make new arch possible
Architecture provides feedback to guideapplication and technology research directions
ECE 5745 Course Overview 4 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Key Metrics in Computer Architecture
P
I$
D$
Network
Network
Network
P
I$
P
I$
P
I$
D$ D$ D$
I Primary Metrics. Execution time (cycles/task). Energy (Joules/task). Cycle time (ns/cycle). Area (µm2)
I Secondary Metrics. Performance (ns/task). Average power (Watts). Peak power (Watts). Cost ($). Design complexity. Reliability. Flexibility
Discuss qualitative first-order analysisfrom ECE 4750 on board
ECE 5745 Course Overview 5 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Unanswered Questions from ECE 4750
P
I$
D$
Network
Network
Network
Xcel P
I$
D$ D$ D$
Xcel
AcceleratedInstructions
I How can we quantitatively evaluatearea, cycle time, and energy?
I How do we actually implementprocessors, memories, andnetworks in a real chip?
I How should we implement/analyzeapplication-specific accelerators?
. Very loosely coupledmemory-mapped accelerators
. More tightly coupled co-processoraccelerators
. Specialized instructions andfunctional units
ECE 5745 Course Overview 6 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
ASIC: Application-Specific Integrated Circuit
P
I$
D$
Network
Network
Network
Xcel P
I$
D$ D$ D$
Xcel
AcceleratedInstructions
Performance (Tasks per Second)
SimpleProc
Processor PowerConstraint
Ener
gy (J
oule
s pe
r Tas
k)
C
B
AAAA
Superscalarw/ Deeper
Pipelines
Out-of-Order Superscalar
Superpipelined
D
EEMulticore FFMulticore E
AAAA
Specialized Accelerators
ECE 5745 Course Overview 7 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
ASIC: Application-Specific Integrated Circuit
P
I$
D$
Network
Network
Network
Xcel P
I$
D$ D$ D$
Xcel
AcceleratedInstructions
Performance (Tasks per Second)
Ener
gy E
ffici
ency
(Tas
ks p
er J
oule
)
SimpleProcessor
Design PowerConstraint
High-PerformanceArchitectures
EmbeddedArchitectures
DesignPerformanceConstraint
Flexibil
ity vs.
Specia
lizatio
nCustom
ASICLess FlexibleAccelerator
More FlexibleAccelerator
ECE 5745 Course Overview 8 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Goal for ECE 5745 is to answer these questions!
P
I$
D$
Network
Network
Network
Xcel P
I$
D$ D$ D$
Xcel
AcceleratedInstructions
I How can we quantitatively evaluatearea, cycle time, and energy?
I How do we actually implementprocessors, memories, andnetworks in a real chip?
I How should we implement/analyzeapplication-specific accelerators?
. Very loosely coupledmemory-mapped accelerators
. More tightly coupled co-processoraccelerators
. Specialized instructions andfunctional units
ECE 5745 Course Overview 9 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Full Custom Design vs. Standard-Cell Design
L01 –Introduction 22
6.884 –S
pring 20052 Feb 2005
Full Custom D
esignDesigner is free to do anything, anywhere
–though each design team
usually imposes som
e disciplineM
ost time consum
ing design style–
Reserved for very high performance or very high volum
e devices (Intel m
icroprocessors, RF power amps for cellphones)
Requires complete custom
ization of all layers of wafer
Piece of full-custom m
ultiplier array, 1.0Pm
2-metal
Full-custom layoutin 1.0µm w/ 2 metal
layers
I Full-Custom Design (ECE 4740)
. Designer is free to do anything, anywhere; thoughteam usually imposes some design discipline
. Most time consuming design style; reserved forvery high performance or very high volume chips(Intel microprocessors, RF power amps forcellphones)
I Standard-Cell Design (ECE 5745)
. Fixed library of “standard cells” and SRAMmemory generators
. Register-transfer-level description is automaticallymapped to this library of standard cells, then thesecells are placed and routed automatically
. Enables agile hardware design methodology
ECE 5745 Course Overview 10 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Standard-Cell Design Methodology
L01 – Introduction 246.884 – Spring 2005 2 Feb 2005
Standard Cell ASICs• Also called Cell-Based ICs (CBICs)• Fixed library of cells plus memory generators• Cells can be synthesized from HDL, or entered in schematics• Cells placed and routed automatically• Requires complete set of custom masks for each design• Currently most popular hard-wired ASIC type (6.884 will use this)
Mem1 Mem
2
Cells arranged in rows
Generated memory arrays
L01 – Introduction 256.884 – Spring 2005 2 Feb 2005
Standard Cell DesignCells have standard height but vary in widthDesigned to connect power, ground, and wells by abutment
VDD Rail
GND Rail
Clock Rail
Cell I/O on M2Power
Rails in M1
Clock Rail (not typical)
NAND2 Flip-flop
Well Contact under Power Rail
Ripple carry adder with carrychain highlighted
ECE 5745 Course Overview 11 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Standard-Cell Design Methodology
Design in HDL
HDLSimulator
Place&Route
Switching Activity
PowerAnalysis
Synthesis
Standard Cells
Gate-Level Model
Layout
Area (�m2)
Execution Time(cycles/task)
Cycle Time (ns)Energy (J/task)
Area (�m2)Cycle Time (ns)Energy (J/task)
Energy (J/task)
ECE 5745 Course Overview 12 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Example Standard-Cell Chip PlotMotivation Architectural Patterns Scale VT Core Maven VT Core Evaluation
Single-Lane Vector-Thread Unit w/ 256 Registers
MIT CSAIL Christopher Batten 32 / 42ECE 5745 Course Overview 13 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
What is Complex Digital ASIC Design?
Circuits
Devices
Programming Language
Algorithm
Com
pute
r A
rch
itect
ure
Technology
Application
Register-Transfer Level
Instruction Set Architecture
Microarchitecture
Operating System
Gate Level
Circuits
Register-Transfer Level
Microarchitecture
Gate Level
layout ready for fabrication
Complex digital ASIC design isthe process of
quantitatively exploring thearea, cycle time, execution time, and
energy trade-offs
of various
automated standard-cell CAD tools
and then to transform the mostpromising design to
using
application-specific accelerators(and general-purpose proc+mem+net)
ECE 5745 Course Overview 14 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
RTL
Devices
ISA
PL
Algorithm
�Arch
Technology
Application
OS
Gates
Circuits
Complex Digital ASIC Design
I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?
I Activity 1: Evaluation of Integer Multiplier
I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips
I Activity 2: Brainstorming for Sorting Accelerator
ECE 5745 Course Overview 15 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Course Structure
Part 1ASIC Design
Overview
Part 2Digital CMOS
Circuits
Part 3CAD Algorithms
P P
MM
PrereqComputer
Architecture
ECE 5745 Course Overview 16 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Part 1: ASIC Design Overview
P P
MM
Topic 1Hardware
DescriptionLanguages
Topic 2CMOS Devices
Topic 3CMOS Circuits
Topic 4Full-Custom
DesignMethodology
Topic 5Automated
DesignMethodologies
Topic 7Clocking, Power Distribution,
Packaging, and I/O
Topic 8Testing and Verification
Topic 6Closing
theGap
ECE 5745 Course Overview 17 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Part 2: Digital CMOS Circuits
Topic 10
Sequential
StateTopic
11
Interconnect
Topic 9
Combinational
Logic
ECE 5745 Course Overview 18 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Part 3: CAD Algorithms
RTL to LogicSynthesis
TechnologyIndependent
Synthesis
TechnologyDependentSynthesis
x = a'bc + a'bc'y = b'c' + ab' + ac
x = a'by = b'c' + ac
Placement
DetailedRouting
GlobalRouting
Topic 12Synthesis Algorithms
Topic 13Physical Design Automation
ECE 5745 Course Overview 19 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Five-Week Design Project
P
I$
D$
Network
Network
Network
Xcel P
I$
D$ D$ D$
Xcel
AcceleratedInstructions
Performance (Tasks per Second)
Ener
gy E
ffici
ency
(Tas
ks p
er J
oule
)
SimpleProcessor
Design PowerConstraint
High-PerformanceArchitectures
EmbeddedArchitectures
DesignPerformanceConstraint
Flexibil
ity vs.
Specia
lizatio
nCustom
ASICLess FlexibleAccelerator
More FlexibleAccelerator
ECE 5745 Course Overview 20 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
RTL
Devices
ISA
PL
Algorithm
�Arch
Technology
Application
OS
Gates
Circuits
Complex Digital ASIC Design
I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?
I Activity 1: Evaluation of Integer Multiplier
I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips
I Activity 2: Brainstorming for Sorting Accelerator
ECE 5745 Course Overview 21 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Design starts have been decreasing ...
1995 2000 2005 2010 20140
2
4
6
8
10
12
Num
ber o
f ASI
C S
tarts
(in
thou
sand
s)
3% / year decline
Todd Austin, “Preparing for a Post Moore’s Law World,” MICRO-48 Keynote
ECE 5745 Course Overview 22 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
... and design costs are increasing ...Costs of Latest Nodes Are Skyrocketing
0
20
40
60
80
100
120
140
0.5u 0.35u 0.25u 0.18u 0.13u 90nm 65nm 45nm 28nm 20nm
Cost&to
&Market&($&million)
Silicon&Technology&Node
Mask&CostsS/W&Development&and&TestingH/W&Design&and&Verification
Source: International Business Strategies, T. Austin
22
$88M
$120M
$500K
Todd Austin, “Preparing for a Post Moore’s Law World,” MICRO-48 Keynote
ECE 5745 Course Overview 23 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
... but a renaissance in chip design is coming!
I End of Moore’s law = chip design in established tech nodes can flourishI “If silicon is the canvas we paint on, engineers can do more than just shrink
the canvas.” David Brooks, HarvardZvi Or-Bach, “FPGAs as ASIC Alternatives: Past & Future,” EE Times, April 2014.
ECE 5745 Course Overview 24 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Plenty of companies are making chips ...
ECE 5745 Course Overview 25 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
... even some surprising ones!
Google’s TPUI Custom chip for accelerating
Google’s TensorFlow library
I Tightly integrated into Google’sdata centers
SiFive’s HiFive 1I First commercial RISC-V SoC
I Basic RV32IMAC microcontroller
ECE 5745 Course Overview 26 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Cross-Layer Interaction is Critical
CircuitsDevices
Programming LanguageAlgorithm
Com
pute
r Arc
hite
ctur
e
Technology
Application
Register-Transfer Level
Instruction Set ArchitectureMicroarchitecture
Operating System
Gate LevelCircuits
Register-Transfer LevelMicroarchitecture
Gate Level
Most significant impact on area,cycle time, execution time, andenergy is usually at architecturaland microarchitectural levels
ECE 5745 Course Overview 27 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Course Motivation: Comp Arch Research Perspective
Transistors(Thousands)
MIPSR2K
IntelP4
1975 1980 1985 1990 1995 2000 2005 2010 2015
100
101
102
103
104
105
106
107
DECAlpha 21264
TypicalPower (W)
Frequency(MHz)
SPECintPerformance
~9%/year
~15%/year
Numberof Cores
Intel 48-Core Prototype
?
AMD 4-Core Opteron
ECE 5745 Course Overview 28 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Cross-Layer Interaction is Critical
CircuitsDevices
Programming LanguageAlgorithm
Com
pute
r Arc
hite
ctur
e
Technology
Application
Register-Transfer Level
Instruction Set ArchitectureMicroarchitecture
Operating System
Gate LevelCircuits
Register-Transfer LevelMicroarchitecture
Gate Level
Need to quantitatively understandarea, cycle time, and energytrade-offs to be able to createnew revolutionary architectures that tackle the most challenging physical design issues
ECE 5745 Course Overview 29 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Course Motivation: Circuits Research Perspective
Your Digital Circuit Here
Fighter Airplane~100,000 parts
Your Analog Circuit Here
Intel Sandy Bridge E2.27 Billion transistors
ECE 5745 Course Overview 30 / 48
• Complex Digital ASIC Design • Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
Cross-Layer Interaction is Critical
CircuitsDevices
Programming LanguageAlgorithm
Com
pute
r Arc
hite
ctur
e
Technology
Application
Register-Transfer Level
Instruction Set ArchitectureMicroarchitecture
Operating System
Gate LevelCircuits
Register-Transfer LevelMicroarchitecture
Gate Level
Circuit-level researchers need toappreciate the system-levelcontext for their subsystems;cross-layer interaction can generate some of the most exciting research ideas
ECE 5745 Course Overview 31 / 48
Complex Digital ASIC Design • Activity 1 • Case Study: Scalar vs. Vector Processors Activity 2
RTL
Devices
ISA
PL
Algorithm
�Arch
Technology
Application
OS
Gates
Circuits
Complex Digital ASIC Design
I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?
I Activity 1: Evaluation of Integer Multiplier
I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips
I Activity 2: Brainstorming for Sorting Accelerator
ECE 5745 Course Overview 32 / 48
Complex Digital ASIC Design • Activity 1 • Case Study: Scalar vs. Vector Processors Activity 2
Fixed-Latency Iterative Multiplier Datapath
b_regb_mux_sel
32b
a_rega_mux_sel
32b
result_reg
result_mux_sel
result_en
32b
32b
32b
>>
<<
add_mux_sel
req_msg.areq_msg
32b resp_msg
req_msg.b
b_lsb
0
ECE 5745 Course Overview 33 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
RTL
Devices
ISA
PL
Algorithm
�Arch
Technology
Application
OS
Gates
Circuits
Complex Digital ASIC Design
I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?
I Activity 1: Evaluation of Integer Multiplier
I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips
I Activity 2: Brainstorming for Sorting Accelerator
ECE 5745 Course Overview 34 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Scalar Processors with Multithreading
Programmer'sLogicalView
�T0
Memory
�T1 �T2 �T3 �T4 �Ti
ExecutionResources
ControlInstructions
MemoryInstructions
ArchitecturalRegisters
ECE 5745 Course Overview 35 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Scalar Processors with Multithreading
Programmer'sLogicalView
�T0
Memory
�T1 �T2 �T3 �T4 �Ti
loop: load a, a_ptr load b, b_ptr add c, a, b store c, c_ptr a_ptr++ b_ptr++ c_ptr++ branch
Vector-Vector Add Masked Filterloop: load a, a_ptr a_ptr++ branch a = 0 op0 op1 ... branch
ECE 5745 Course Overview 35 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Scalar Processors with Multithreading
Programmer'sLogicalView
�T0
Memory
�T1 �T2 �T3 �T4 �Ti
TypicalCoreMicro-Architecture
Ener
gy
Performance
MIMD�T0�T1
�T2�T3
Instr Memory
Data Memory
Multi-threadedCores
ECE 5745 Course Overview 35 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Vector-SIMD Processors
Programmer'sLogicalView
CT0elm 0 elm 1 elm 2 elm 3
Memory
CTjelm i
ArchitecturalVector Registerwith 4 Elements
Vector-SIMDArithmetic
Instructions
Vector-SIMDMemory
Instructions
ECE 5745 Course Overview 36 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Vector-SIMD Processors
Programmer'sLogicalView
CT0elm 0 elm 1 elm 2 elm 3
Memory
CTjelm i
loop: vload A, a_ptr vload B, b_ptr vadd C, A, B vstore C, c_ptr a_ptr++ b_ptr++ c_ptr++ branch
loop: vload A, a_ptr vset F, A = 0 vop0 under flag F vop1 under flag F ... a_ptr++ branch
Vector-Vector Add Masked Filter
ECE 5745 Course Overview 36 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Vector-SIMD Processors
Programmer'sLogicalView
CT0elm 0 elm 1 elm 2 elm 3
Memory
CTjelm i
TypicalCoreMicro-Architecture 0
213
Instruction Memory
CP VIU
VMU
Data Memory
Vect
or L
anes
Ener
gy
Performance
MIMD
Vector-SIMD
ECE 5745 Course Overview 36 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Quantitative Area EvaluationMotivation Architectural Patterns Scale VT Core Maven VT Core Evaluation
Single-Lane Vector-Thread Unit w/ 256 Registers
MIT CSAIL Christopher Batten 32 / 42ECE 5745 Course Overview 37 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Quantitative Area Evaluation
Quad-Corew/ Vertical
Multithreading
Vector-SIMD
(8 elm/lane)
1 thre
ad
2 thre
ads
4 thre
ads
8 thre
ads
1 c
ore
w/
4 lanes
4 c
ore
s w
/ 1
lane e
a
data cache
instruction cache
control processor
integer functional units
floating-point functional units
memory units
register file
control logic
Single-core multi-lane designreduces area by 15%
Multi-core single-lane designincreases area by 20%
1.75
1.25
1.00
0.50
0.00
0.25
0.75
1.50
Norm
alized A
rea
ECE 5745 Course Overview 38 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Quantitative Performance and Energy Evaluation
0.8 1.0 1.2 1.4 1.6Normalized Tasks / SecondNo
rmal
ized
Ener
gy /
Task
1.61.41.21.00.80.60.4
25
20
15
10
5
0Ener
gy /
Task
(�J)
Multithreaded Multicore(increasing number ofthreads per core)
Quad-Corew/ Vertical
Multithreading
1 th
read
2 th
read
s4
thre
ads
8 th
read
s
Performance reduction with increasingthreads due to increased cycle time and
thread management overhead onfine-grain loops
Single-Lane Vector(increasing vlen)
Vector-SIMD
(4 core w/ 1 lane)
data cacheinst cachecontrol procint func unitsmemory unitsregister filecontrol logic
1 el
m/la
ne2
elm
/lane
4 el
m/la
ne8
elm
/lane
leakage
ECE 5745 Course Overview 39 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Simple RISC Processor ASIC
SP Datapath SP Regfile
RAMSubbank
(2KB)
RAMSubbank
(2KB)
RAMSubbank
(2KB)
RAMSubbank
(2KB)
AH
IPC
on
tro
ller
SP Control
RAM Interface
VCO
Figure 2: STC1 chip plot and die photo. On the chip plot, the Exec Unit is colored blue, and theFetch Unit is colored green and purple.
HostPC
PLX9050
ATB0Controller
PowerSupplies
ProbePoints
PCI
SDRAM
STC1
Host ATB0 Daughter Card
Figure 3: Test System Block Diagram.The test system includes the ATB0test baseboard, a daughter card withthe STC1 chip, and a host computerto control the test system.
5
ECE 5745 Course Overview 40 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Simple RISC Processor ASIC
1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 3 3 3 3 3 3 3 3 3 3 41 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 00 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
750 ------------- . . . . . . . . . . . . . . . . . . . . . . . . .740 . . . . . . . . . . . . . . . . . . . . . . . . .730 . . . . . . . . . . . . . . . . . . . . . . . . .720 . . . . . . . . . . . . . . . . . . . . . . . . .710 . . . . . . . . . . . . . . . . . . . . . . * * .700 ------------- . . . . . . . . . . . . . . . . . . . . * * * * .690 . . . . . . . . . . . . . . . . . . . * * * * * .680 . . . . . . . . . . . . . . . . . * * * * * * . .670 . . . . . . . . . . . . . . . . * * * * * * * . .660 . . . . . . . . . . . . . . . . * * * * * * * . .650 ------------- . . . . . . . . . . . . . . * * * * * * * * * * .640 . . . . . . . . . . . . . . * * * * * * * * * * .630 . . . . . . . . . . . . . * * * * * * * * * * . .620 . . . . . . . . . . . . * * * * * * * * * * * * .610 . . . . . . . . . . . * . * * * * * * * * * * . .600 ------------- . . . . . . . . . . . * * * * * * * * * * * * . .590 . . . . . . . . . . * * * * * * * * * * * * * * .580 . . . . . . . . . . * * * * * * * * * * * * * . .570 . . . . . . . . . * * * * * * * * * * * * * * * *560 . . . . . . . . . * * * * * * * * * * * * * * * *550 ------------- . . . . . . . * * * * * * * * * * * * * * * * * .540 . . . . . . . * * * * * * * * * * * * * * * * . *530 . . . . . . . * * * * * * * * * * * * * * * * * *520 . . . . . . * * * * * * * * * * * * * * * * * * *510 . . . . . . * * * * * * * * * * * * * * * * * * *500 ------------- . . . . . * * * * * * * * * * * * * * * * * * * *490 . . . . * * * * * * * * * * * * * * * * * * * * *480 . . . . * * * * * * * * * * * * * * * * * * * * *470 . . . * * * * * * * * * * * * * * * * * * * * * *460 . . . * * * * * * * * * * * * * * * * * . * * * *450 ------------- . . . * * * * * * * * * * * * * * * * * * * * * *440 . . * * * * * * * * * * * * * * * * * * * * * * *430 . . * * * * * * * * * * * * * * * * * * * * * * *420 . . * * * * * * * * * * * * * * * * * * * * * * *410 . * * * * * * * * * * * * * * * * * * * * * * * *400 ------------- . * * * * * * * * * * * * * * * * * * * * * * * *390 . . . . . * * * * *380 . . . . . * * * * *370 . . . . . * * * * *360 . . . . . * * * * *350 --- . . . . * * * * * *340 . . . . * * * * * *330 . . . . * * * * * *320 . . . * * * * * * *310 . . . * * * * * * *300 --- . . . * * * * * * *290 . . . * * * * * * *280 . . * * * * * * * *270 . . * * * * * * * *260 . . * * * * * * * *250 --- . . * * * * * * * *240 . * * * * * * * * *230 . * * * * * * * * *220 . * * * * * * * * *210 . * * * * * * * * *200 --- * * * * * * * * * *190 * * * * * * * * * *180 * * * * * * * * * *
Figure 5: Shmoo plot for awide voltage range. The hori-zontal axis plots supply volt-age in mV, and the verticalaxis plots frequency in MHz.A * indicates correct opera-tion at that operating pointand a dot indicates failure.The shmoo plot shows com-bined results from two di�er-ent chips.
la t9, input_endforever:
la t8, input_startla t7, output
loop:lw t0, 0(t8)addu t8, 4bgez t0, possubu t0, $0, t0
pos:sw t0, 0(t7)addu t7, 4bne t8, t9, loopb forever
Figure 6: Test program for measuringtypical power consumption. The innerloop calculates the absolute values ofintegers in an input array and storesthe results to an output array. The in-put array had 10 integers during test-ing, and the program repeatedly runsthis inner loop forever.
7
I RISC processor w/ 8 KB SRAMI TSMC 0.18 µm processI 1.7⇥ 2.1 mmI 610K TransistorsI 450 MHz at 1.8 V
ECE 5745 Course Overview 41 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Scale Vector-Thread Processor ASIC
ECE 5745 Course Overview 42 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Scale Vector-Thread Processor ASIC
TSMC 0.18µm • 7.14 Million Transistors • 16.6 mm2 Core Area
CP
Vector Lane 0
Vector Lane 1
Vector Lane 2
Cache Control
Cache Tag CAMs
CacheDataRAM
Banks VMU
VIU
XBAR
Vector Lane 3
ECE 5745 Course Overview 43 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
Scale Energy vs. Performance Results
Motivation Vector-Threading Scale VT Processor Maven VT Processor Future Research
Energy-Efficiency of the Scale VT Processor
MIT CSAIL Christopher Batten 24 / 39ECE 5745 Course Overview 44 / 48
Complex Digital ASIC Design Activity 1 • Case Study: Scalar vs. Vector Processors • Activity 2
BRG Test Chip 1
Taped-out Layout for BRGTC1
HostInterface
debu
g
RISCCore
SortAccel
Memory Arbitration Unit
SRAMBank(2KB)
SRAMBank(2KB)
SRAMBank(2KB)
SRAMBank(2KB)
diff clk (+)diff clk (�)
singleended clk
rese
t
CtrlReghost2chip
chip2host
LVDSRecv
clkdiv
clk treeresettree
clk
out
The testing platform enables running smalltest programs on BRGTC1 to compare the
performance and energy of pure-software kernelsversus the HLS-generated sorting accelerator
2x2mm 1.3M transistors in IBM 130nmRISC processor, 16KB SRAMHLS-generated accelerators
Static Timing Analysis Freq. @ 246 MHz
Post-Silicon Evaluation Strategy
Taped out in March 2016, returned in Fall 2016Currently packaging and testingLVDS
divi
ded
clk
out
LVDS
clk
out
ECE 5745 Course Overview 45 / 48
Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors • Activity 2 •
RTL
Devices
ISA
PL
Algorithm
�Arch
Technology
Application
OS
Gates
Circuits
Complex Digital ASIC Design
I Course goal, structure, motivation. What is the goal of the course?. How is the course structured?. Why should students want to take this course?
I Activity 1: Evaluation of Integer Multiplier
I Case Study: Scalar vs. Vector Processors. Example design-space exploration. Example real ASIC chips
I Activity 2: Brainstorming for Sorting Accelerator
ECE 5745 Course Overview 46 / 48
Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors • Activity 2 •
Brainstorming for Sorting Accelerator
I$
D$
Arbiter
XcelP
Performance (Tasks per Second)
Ener
gy E
ffici
ency
(Tas
ks p
er J
oule
)
SoftwareBaseline
Sort array of integers stored in memoryProc sends Xcel base address and sizeProc waits for Xcel to finish sorting
I Software Baseline. Insertion sort O(n2). 20% of dynamic
instructions are lw/sw
I Hardware Accelerator
. What is ideal speedupassuming we still useinsertion sort?
. What if we use adifferent algorithm?
. Sketch hardware impl ofa sorting accelerator
. Where will acceleratorbe in the energy/perfspace?
ECE 5745 Course Overview 47 / 48
Complex Digital ASIC Design Activity 1 Case Study: Scalar vs. Vector Processors Activity 2
RTL
Devices
ISA
PL
Algorithm
�Arch
Technology
Application
OS
Gates
Circuits
Take-Away Points
I Complex digital ASIC design is the process ofquantitatively exploring the area, cycle time,execution time, and energy trade-offs ofgeneral-purpose and application-specific designsusing automated standard-cell CAD tools and thento transform the most promising design to layoutready for fabrication
I Course can provide useful experience withcross-layer interaction for students interested inpursuing research in computer architecture orcircuits or students interested in pursuing a careerin in industry development of ASICs
ECE 5745 Course Overview 48 / 48