Burleson, UMASS 1
Using System-on-a-Chip as a Vehicle for VLSI Design Education
Andrew Laffely and Wayne Burleson
Electrical and Computer Engineering
University of Massachusetts Amherst
{alaffely,burleson}@ecs.umass.edu
This material is based upon work supported by the National Science Foundation under Grant No. 9988238 and SRC Tasks 766 and 1075
Burleson/UMASS 2
Challenges in VLSI Education
• Advancing processing technology
• Higher-level design tools
• Realistic yet tractable design projects
• Preparation for jobs in semiconductor and other sectors
• Making best use of faculty/student time and university resources
Burleson/UMASS 3
ECE 559/659: VLSI Design Project (10 grads, 20 seniors)
Course Objectives:
• Learn the design process for a complex VLSI system in deep sub-micron CMOS
• Learn VLSI design skills and tools, including working in teams
• Learn about a particular application component and its VLSI implementation
• Learn to present formal design reviews using oral, written, graphical and web-based techniques
Burleson/UMASS 4
Key Aspects of the Course
• aSoC (home-grown SoC platform)
  • Provides a unifying framework for the class
  • Allows for subdivision but inter-relation of projects
  • Interesting cutting-edge architecture based on NSF- and SRC-funded research at UMASS and elsewhere
  • Covers many aspects of VLSI design
  • Realistic constraints on area, timing, power and I/O
• Graduate and undergraduate teamwork
  • Graduate students provide leadership, motivation and experience
• Commercial tools and design flow
• Review-based evaluation
  • Oral and web-based reports for 4 different reviews: proposal, feasibility, implementation, integration
Burleson/UMASS 5
Adaptive System-on-a-Chip (aSoC)
• Tiled architecture with mesh interconnect
  • Point-to-point communication pipeline
• Allows for heterogeneous cores
  • Differing sizes, clock rates, voltages
• Low-overhead core interface
  • On-chip bus substitute for streaming applications
• Based on static scheduling
  • Fast and predictable
[Figure: aSoC tile diagram. Labels: Tile, Proc, FPGA, Multiplier, Core, ctrl, Communication Interface, North, South, East, West]
Burleson/UMASS 6
Communication Interface
• Custom design to maximize speed and reduce power
  • Core-ports
  • Crossbar
  • Controller
  • Instruction memory
  • Local frequency and voltage supply
[Figure: communication interface block diagram. Labels: Core, Core-ports, Decoder, Local Frequency & Voltage, Instruction Memory (example entry "North to South & East"), PC, Controller, Local Config., Crossbar with North, South, East and West inputs and outputs]
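The slide lists a crossbar, a controller with a PC, and an instruction memory holding entries such as "North to South & East". One way to picture static scheduling with these parts is the minimal sketch below. It is an illustrative model only, not the aSoC implementation: the class name `CommInterface`, the port names, and the dictionary encoding of an instruction are all assumptions made for this example.

```python
# Hypothetical model of a statically scheduled crossbar.
# All names and the instruction encoding are illustrative assumptions.

class CommInterface:
    def __init__(self, schedule):
        # schedule: list of instructions; each instruction maps one input
        # port to the tuple of output ports it drives in that cycle.
        self.schedule = schedule
        self.pc = 0  # program counter into the instruction memory

    def step(self, inputs):
        """Route one cycle of input words according to the current instruction."""
        instruction = self.schedule[self.pc]
        outputs = {}
        for src, destinations in instruction.items():
            if src in inputs:
                for dst in destinations:
                    outputs[dst] = inputs[src]
        # The schedule is fixed ahead of time and simply repeats.
        self.pc = (self.pc + 1) % len(self.schedule)
        return outputs


schedule = [
    {"north": ("south", "east")},  # "North to South & East"
    {"west": ("core",)},           # deliver a word from West to the local core
]
iface = CommInterface(schedule)
print(iface.step({"north": 0xA5}))
print(iface.step({"west": 0x3C}))
```

Because the routing for every cycle is decided before run time, no arbitration is needed in this model, which is one reading of the "fast and predictable" claim on the previous slide.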
Burleson/UMASS 7
Class Projects
• SoC Infrastructure [1,3]
  • Communication Interface
  • Interconnect [3]
  • Power Distribution
  • Clock System
  • Power Management
• Cores
  • Motion estimation for video encoding [2,3]
  • AES Cryptography [3]
  • Cache [2,3]
  • Huffman Coding
  • 3D Graphics [1,2,3]
  • Discrete Cosine Transform [2,3]
  • Smart Card [2,3]
[1] Used in PhD dissertation
[2] Used in Master's thesis
[3] Used in publications
Burleson/UMASS 8
Design Flow
http://vsp2.ecs.umass.edu/vspg/658/TA_Tools/design_flow.html
• Architecture to Layout
  • Architecture: Block diagram of system and behavioral description
  • Logic: Gate level or schematic description
  • Circuit: Transistor sizing
  • Layout: Floorplanning, clock and power distribution
• Tools
  • Verilog-XL: behavioral representation
  • VTVT: standard cell library
  • Synopsys: standard cell gate level netlist generation
  • Silicon Ensemble: standard cell netlist to layout
  • Cadence LayoutPlus: schematic and layout design
  • NCSU CDK: design and extraction rules
  • Cadence Layout vs. Schematic: layout verification
  • HSPICE: circuit simulator
Burleson/UMASS 10
Advanced Signaling Techniques (building on SRC-funded work)
Differential current sensing
Booster Insertion
Multi-level current signaling
Phase coding
Burleson/UMASS 11
Circuit Level Simulation (HSPICE)
Evaluating subsystems with realistic models
• Capacitance, resistance and inductance
• Process variations
• Process generations
Burleson/UMASS 12
Interconnect Characterization: Comparing delay and power of signaling techniques for different tile sizes at 250nm, 180nm, 130nm, 100nm
Burleson/UMASS 13
Voltage Scaling Approach
• Core-ports
  • Single buffer for each stream to cross clock/voltage barrier between core and interface
  • Reading/writing success rates indicate core utilization
    • Input blocked: core too slow
    • Output blocked: core too fast
• Controller
  • Interprets core-port success rates to adjust local clock and voltage
[Figure: core-port block diagram. Labels: Interconnect, Buffer, Input Core-port, Output Core-port, Core, Processing Pipeline, Clock and Supply Controller, Local Vdd, Local Clock, Blocked (on each core-port)]
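The control rule on this slide (input blocked means the core is too slow, output blocked means it is too fast) can be sketched as a small step-up/step-down policy. This is a sketch under stated assumptions, not the controller used in aSoC: the function name `next_level`, the 0.2 threshold, the tie-breaking order, and the pairing of speed fractions with supply voltages (taken from the values that appear on the later Vdd and power distribution slides) are all hypothetical.

```python
# Hypothetical clock/supply controller policy; thresholds and names are assumptions.

# (speed fraction, Vdd in volts), slowest first. The pairing is assumed for
# illustration from the operating points quoted elsewhere in these slides.
LEVELS = [(0.125, 0.6), (0.25, 0.73), (0.5, 1.16), (1.0, 1.8)]


def next_level(level, input_blocked_rate, output_blocked_rate, threshold=0.2):
    """Return the index into LEVELS to use for the next control interval.

    input_blocked_rate:  fraction of cycles the input core-port was blocked
                         (core is not consuming fast enough -> speed up).
    output_blocked_rate: fraction of cycles the output core-port was blocked
                         (core is producing faster than needed -> slow down).
    """
    # Speeding up takes priority so that throughput is never sacrificed.
    if input_blocked_rate > threshold and level < len(LEVELS) - 1:
        return level + 1
    if output_blocked_rate > threshold and level > 0:
        return level - 1
    return level


level = 1
level = next_level(level, input_blocked_rate=0.5, output_blocked_rate=0.0)
speed, vdd = LEVELS[level]
print(f"new operating point: {speed} x max clock at {vdd} V")
```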
Burleson/UMASS 14
Vdd Selection Criteria
[Figure: Normalized Core Critical Path Delay vs. Vdd. x-axis: Voltage (0.4 to 2); y-axis: Normalized Delay (0 to 12). Marked operating points: Max Speed, 1/2 Speed, 1/4 Speed, 1/8 Speed; marked voltages: 1.16 and 0.73]
• As Vdd decreases, delay increases exponentially
• Use the curve to match available clock frequencies to voltages
• The voltage and frequency change reduces power by 79%, 96%, and 98.7%
  • P = C(Vdd)^2 f
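The power reductions quoted above can be checked against the relation P = C(Vdd)^2 f. The sketch below assumes the 1/2, 1/4 and 1/8 speed points run at 1.16 V, 0.73 V and 0.6 V against a 1.8 V full-speed baseline; that pairing is inferred from the voltages listed on the power distribution slide and is an assumption, as is the function name `power_reduction`.

```python
# Worked check of P = C * Vdd^2 * f with assumed (speed, Vdd) pairings.

V_MAX = 1.8  # assumed full-speed supply voltage, in volts

# (label, clock speed as a fraction of maximum, assumed Vdd in volts)
POINTS = [
    ("1/2 speed", 0.5, 1.16),
    ("1/4 speed", 0.25, 0.73),
    ("1/8 speed", 0.125, 0.6),
]


def power_reduction(vdd, speed, v_max=V_MAX):
    """Fractional dynamic-power saving relative to full speed at v_max.

    The capacitance C cancels because it is the same in both cases.
    """
    return 1.0 - (vdd / v_max) ** 2 * speed


for label, speed, vdd in POINTS:
    saving = 100.0 * power_reduction(vdd, speed)
    print(f"{label}: Vdd = {vdd} V, power reduced by {saving:.1f}%")
```

Under these assumptions the first two results round to the quoted 79% and 96%, and the third comes out within about a tenth of a percentage point of the quoted 98.7%.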
Burleson/UMASS 15
Clock Distribution
64-tile aSoC    70nm         100nm        130nm        180nm
Chip Area       (9.24mm)^2   (13.3mm)^2   (17.2mm)^2   (23.8mm)^2
Frequency       5 GHz        2 GHz        1 GHz        0.5 GHz
Power           126 mW       240 mW       445 mW       784 mW
Mean Skew       41 ps        50 ps        92 ps        70.6 ps
Percent Skew    21%          10%          9%           4%

• Tiled architecture extends life of globally synchronous systems
• Precise H-tree implementation
  • Load is small and equal at each branch
  • Skew can be reduced by 70% with advanced deskew circuits [1]
[1] S. Tam et al., "Clock Generation and Distribution for the First IA-64 Microprocessor," IEEE JSSC, Nov. 2000
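The "Percent Skew" row appears to be the mean skew expressed as a fraction of the clock period. The following sketch recomputes it from the frequency and mean skew rows; the function name `percent_skew` is an assumption, and the interpretation of the row is inferred from the numbers rather than stated in the slide.

```python
# Recompute percent skew as mean skew / clock period (interpretation assumed).

def percent_skew(freq_ghz, skew_ps):
    """Mean skew as a percentage of the clock period."""
    period_ps = 1000.0 / freq_ghz  # 1 GHz corresponds to a 1000 ps period
    return 100.0 * skew_ps / period_ps


# (technology node, frequency in GHz, mean skew in ps) from the table
ROWS = [("70nm", 5.0, 41.0), ("100nm", 2.0, 50.0),
        ("130nm", 1.0, 92.0), ("180nm", 0.5, 70.6)]

for node, freq, skew in ROWS:
    print(f"{node}: {percent_skew(freq, skew):.1f}% of the clock period")
```

The results agree with the table to within rounding, and they show why skew becomes a larger share of the cycle as the clock period shrinks at smaller nodes.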
Burleson/UMASS 16
Power Distribution
64-tile aSoC        Vh       Vmh      Vml      Vl
Voltage             1.8V     1.16V    0.73V    0.6V
Current per Core    110mA    25mA     13mA     7mA
Total Power         12.1 W   1.86 W   607 mW   269 mW

• Heterogeneous cores may require multiple power supply voltages
• Tile structure enables uniform interwoven grid
• Larger grid for higher current demands
  • Reduced resistance
  • Higher capacitance
[Figure: interwoven power grid with rails Gnd, Vh, Vmh, Vml, Vl]
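The "Total Power" row is consistent with multiplying voltage by per-core current across all 64 tiles. The sketch below assumes that relation (it is not stated in the slide) and uses a hypothetical helper named `total_power_w`.

```python
# Total power assumed to be tiles * Vdd * current per core.

TILES = 64


def total_power_w(vdd_v, current_a, tiles=TILES):
    """Total power in watts if every tile draws current_a at vdd_v."""
    return tiles * vdd_v * current_a


# (rail, voltage in V, current per core in A) from the table
RAILS = [("Vh", 1.8, 0.110), ("Vmh", 1.16, 0.025),
         ("Vml", 0.73, 0.013), ("Vl", 0.6, 0.007)]

for rail, vdd, current in RAILS:
    print(f"{rail}: {total_power_w(vdd, current):.3f} W")
```

With the currents as printed, the Vmh, Vml and Vl columns reproduce the table values, while the Vh column comes out somewhat above the quoted 12.1 W.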
Burleson/UMASS 17
Architecture Evaluation (Motion Estimation)
• Array-based architecture
  • Pipelined ME
• Parameterized search window size
  • Full search
  • Choose 16x16 or 8x8 windows
  • Reduce power
[Figure: motion estimation block diagram. Labels: Address Generation Unit, Processing Element Array, Memory, FIFOs]
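For readers unfamiliar with full-search motion estimation, a software reference model helps show what the processing element array computes. The sketch below is a plain software illustration, not the class hardware design: the sum of absolute differences (SAD) cost, the function names, and the parameter names `n` (block size, e.g. 8 or 16) and `search` (maximum displacement) are assumptions chosen for this example.

```python
# Software sketch of full-search block matching with a SAD cost (assumed metric).

def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between an n x n block of the current
    frame at (bx, by) and the reference block displaced by (dx, dy)."""
    total = 0
    for y in range(n):
        for x in range(n):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(cur, ref, bx, by, n=8, search=4):
    """Try every displacement within +/- search pixels and return
    (best_cost, dx, dy) for the block at (bx, by)."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            # Skip candidates that fall outside the reference frame.
            if by + dy < 0 or by + dy + n > height:
                continue
            if bx + dx < 0 or bx + dx + n > width:
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, dx, dy)
    return best
```

The cost of this loop grows with both the block size and the square of the search range, which is one reason a parameterized window (16x16 or 8x8) gives a handle on power.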
Burleson/UMASS 18
Modify Existing Designs
• Take existing Verilog code or hardware and improve or change functionality (e.g. add motion estimation algorithms, provide AES key-length flexibility)
• Evaluate changes in performance and overhead
[Figure: Old PE Layout and New PE Layout]
Burleson/UMASS 19
Conclusions
• Advancing process technology
  • Target 0.18 µm for affordable fab but also do scaling studies
• Higher-level design tools
  • Combine synthesis and custom techniques
• Realistic yet tractable design projects
  • Re-use existing projects and provide unifying themes
• Preparation for jobs in semiconductor and other sectors
  • Focus on system design and appropriate levels of abstraction
  • Teach how to learn new tools
• Making best use of faculty/student time and university resources
  • Leverage research
  • Combine grad and undergrad
  • Re-use materials, tools