Burleson, UMASS 1

Using System-on-a-Chip as a Vehicle for VLSI Design Education

Andrew Laffely and Wayne Burleson
Electrical and Computer Engineering
University of Massachusetts Amherst
{alaffely,burleson}@ecs.umass.edu

This material is based upon work supported by the National Science Foundation under Grant No. 9988238 and SRC Tasks 766 and 1075

Challenges in VLSI Education

• Advancing processing technology
• Higher-level design tools
• Realistic yet tractable design projects
• Preparation for jobs in the semiconductor and other sectors
• Making the best use of faculty/student time and university resources

ECE 559/659: VLSI Design Project (10 grads, 20 seniors)

Course Objectives:

• Learn the design process for a complex VLSI chip in deep-submicron CMOS
• Learn VLSI design skills and tools, including working in teams
• Learn about a particular application component and its VLSI implementation
• Learn to present formal design reviews using oral, written, graphical and web-based techniques

Key Aspects of the Course

• aSoC (home-grown SoC platform)
  - Provides a unifying framework for the class
  - Allows projects to be subdivided yet remain interrelated
  - Interesting, cutting-edge architecture based on NSF- and SRC-funded research at UMASS and elsewhere
  - Covers many aspects of VLSI design
  - Realistic constraints on area, timing, power and I/O
• Graduate and undergraduate teamwork
  - Graduate students provide leadership, motivation and experience
• Commercial tools and design flow
• Review-based evaluation
  - Oral and web-based reports for 4 different reviews: proposal, feasibility, implementation, integration

Adaptive System-on-a-Chip (aSoC)

• Tiled architecture with mesh interconnect
  - Point-to-point communication pipeline
• Allows for heterogeneous cores
  - Differing sizes, clock rates, voltages
• Low-overhead core interface for an on-chip bus substitute for streaming applications
• Based on static scheduling
  - Fast and predictable

[Figure: aSoC tile array with heterogeneous cores (Proc, Multiplier, FPGA, ctrl); each tile pairs a core with a communication interface linked to its North, South, East and West neighbors]

Communication Interface

• Custom design to maximize speed and reduce power
  - Core-ports
  - Crossbar
  - Controller
  - Instruction memory
  - Local frequency and voltage supply

[Figure: communication interface block diagram: core and core-ports; crossbar with North, South, East and West inputs and outputs; controller with PC, instruction memory and decoder; local config.; local frequency & voltage; example route "North to South & East"]
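The controller, PC and instruction memory on this slide suggest a statically scheduled crossbar: each cycle the PC selects one instruction that says which input port drives which output port (for example, "North to South & East"). The Python sketch below models only that idea. The `CommInterface` class, the dict-based instruction encoding, treating the core-port as a fifth port named "Core", and the simple wrap-around PC are all illustrative assumptions, not the aSoC hardware design.

```python
from dataclasses import dataclass

# Port names from the slide's diagram; "Core" (assumed) stands for the local core-port.
PORTS = ("North", "South", "East", "West", "Core")


@dataclass
class CommInterface:
    """Hypothetical model of a statically scheduled communication interface.

    program plays the role of the instruction memory: entry k maps each output
    port to the input port that drives it on the k-th cycle of the schedule.
    """
    program: list[dict[str, str]]
    pc: int = 0

    def __post_init__(self) -> None:
        for routes in self.program:
            for out, src in routes.items():
                if out not in PORTS or src not in PORTS:
                    raise ValueError(f"unknown port in route {src} -> {out}")

    def step(self, inputs: dict[str, int]) -> dict[str, int]:
        """Route one cycle's input words through the crossbar, then advance the PC."""
        routes = self.program[self.pc]
        outputs = {out: inputs[src] for out, src in routes.items() if src in inputs}
        # The schedule is fixed at compile time, so the PC simply wraps around.
        self.pc = (self.pc + 1) % len(self.program)
        return outputs


# Cycle 0 sends North to South and East; cycle 1 swaps data between West and the core.
iface = CommInterface(program=[
    {"South": "North", "East": "North"},
    {"Core": "West", "West": "Core"},
])
print(iface.step({"North": 7, "West": 3, "Core": 9}))  # {'South': 7, 'East': 7}
print(iface.step({"North": 1, "West": 3, "Core": 9}))  # {'Core': 3, 'West': 9}
```

Because every route is known ahead of time, no arbitration logic is needed at run time, which is the "fast and predictable" property the aSoC slide attributes to static scheduling.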

Class Projects

• SoC Infrastructure¹,³
  - Communication Interface
  - Interconnect³
  - Power Distribution
  - Clock System
  - Power Management
• Cores
  - Motion Estimation for video encoding²,³
  - AES Cryptography³
  - Cache²,³
  - Huffman Coding
  - 3D Graphics¹,²,³
  - Discrete Cosine Transform²,³
  - Smart Card²,³

¹ Used in PhD dissertation  ² Used in Master's thesis  ³ Used in publications

Design Flow (http://vsp2.ecs.umass.edu/vspg/658/TA_Tools/design_flow.html)

• Architecture to layout
  - Architecture: block diagram of the system and behavioral description
  - Logic: gate-level or schematic description
  - Circuit: transistor sizing
  - Layout: floorplanning, clock and power distribution
• Tools
  - Verilog-XL: behavioral representation
  - VTVT: standard-cell library
  - Synopsys: standard-cell gate-level netlist generation
  - Silicon Ensemble: standard-cell netlist to layout
  - Cadence LayoutPlus: schematic and layout design
  - NCSU CDK: design and extraction rules
  - Cadence Layout vs. Schematic: layout verification
  - HSPICE: circuit simulator

aSoC Implementation and Integration

[Figure: aSoC implementation and integration layout; 0.18 µm TSMC technology, full custom]

Advanced Signaling Techniques (building on SRC-funded work)

• Differential current sensing
• Booster insertion
• Multi-level current signaling
• Phase coding

Circuit-Level Simulation (HSPICE)

Evaluating subsystems with realistic models:
• Capacitance, resistance and inductance
• Process variations
• Process generations

Interconnect Characterization

Comparing delay and power of signaling techniques for different tile sizes at 250nm, 180nm, 130nm, 100n

Voltage Scaling Approach

• Core-ports
  - A single buffer for each stream crosses the clock/voltage boundary between core and interface
  - Read/write success rates indicate core utilization
    - Input blocked: core too slow
    - Output blocked: core too fast
• Controller
  - Interprets core-port success rates to adjust the local clock and voltage

[Figure: input and output core-port buffers connect the interconnect to the core's processing pipeline; "blocked" signals from both core-ports feed the clock and supply controller, which sets the core's local Vdd and local clock]
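The controller rule above can be sketched as a simple per-interval policy: count how often each core-port was blocked and step the tile's clock/voltage setting up or down. In this Python sketch the four operating points pair the max, 1/2, 1/4 and 1/8 speed settings with the 1.8 V, 1.16 V, 0.73 V and 0.6 V rails listed elsewhere in these slides; that pairing, the `threshold` value and the one-step-per-interval policy are assumptions for illustration, not the published controller.

```python
# Operating points (fraction of max clock, Vdd in volts), fastest first.
# Assumed pairing of speed settings with the supply rails from the slides.
OPERATING_POINTS = [(1.0, 1.8), (0.5, 1.16), (0.25, 0.73), (0.125, 0.6)]


def next_level(level: int, input_blocked: int, output_blocked: int,
               threshold: int = 4) -> int:
    """Return the new operating-point index after one observation interval.

    level 0 is the fastest setting. input_blocked / output_blocked count the
    cycles in the interval when that core-port was blocked. threshold is a
    hypothetical tuning parameter: blocked cycles needed to trigger a change.
    """
    if input_blocked >= threshold and input_blocked > output_blocked:
        # Input backing up: the core is too slow, so move to a faster point.
        return max(level - 1, 0)
    if output_blocked >= threshold and output_blocked > input_blocked:
        # Output backing up: the core is too fast, so move to a slower point.
        return min(level + 1, len(OPERATING_POINTS) - 1)
    return level  # Roughly balanced: keep the current clock and voltage.


level = 0
for inp, out in [(0, 9), (0, 7), (6, 1), (0, 0)]:
    level = next_level(level, inp, out)
    freq, vdd = OPERATING_POINTS[level]
    print(f"level={level} clock={freq:g}x Vdd={vdd} V")
```

Stepping one level per interval is a deliberately conservative choice; a real controller would also have to account for the settling time of the local supply and clock before re-sampling the core-ports.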

Vdd Selection Criteria

• As Vdd decreases, delay increases exponentially
• Use the curve to match available clock frequencies to voltages
• Lowering voltage and frequency together reduces power by 79%, 96% and 98.7% at 1/2, 1/4 and 1/8 speed, respectively
• $P = C V_{dd}^2 f$

[Figure: Normalized Core Critical Path Delay vs. Vdd (Vdd from 0.4 V to 2 V, normalized delay from 0 to 12); operating points marked at max, 1/2, 1/4 and 1/8 speed, with 1.16 V and 0.73 V labeled]
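The quoted savings follow from $P = C V_{dd}^2 f$ with the switched capacitance $C$ held fixed, so power relative to full speed is $(V_{dd}/1.8\,\mathrm{V})^2 \cdot (f/f_{max})$. The Python check below assumes 1/2, 1/4 and 1/8 speed run at 1.16 V, 0.73 V and 0.6 V (the rails in the power-distribution table); it gives about 79.2%, 95.9% and 98.6%, close to the slide's figures, with the 1/8-speed result slightly below the quoted 98.7%, so the slide may have used slightly different inputs than these rounded values.

```python
def relative_power(vdd: float, freq_ratio: float, vdd_max: float = 1.8) -> float:
    """Dynamic power relative to full speed, from P = C * Vdd^2 * f with C fixed."""
    return (vdd / vdd_max) ** 2 * freq_ratio


# Assumed pairing of speed settings with supply voltages (see lead-in).
for label, freq_ratio, vdd in [("1/2", 0.5, 1.16), ("1/4", 0.25, 0.73), ("1/8", 0.125, 0.6)]:
    saving = 1.0 - relative_power(vdd, freq_ratio)
    print(f"{label} speed at {vdd} V: power reduced by {saving:.1%}")
```

Most of the saving comes from the quadratic voltage term: halving the clock alone would save only 50%, while dropping to 1.16 V at the same time pushes it to roughly 79%.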

Clock Distribution

64-tile aSoC           70nm        100nm       130nm       180nm
Chip area              (9.24mm)²   (13.3mm)²   (17.2mm)²   (23.8mm)²
Frequency              5 GHz       2 GHz       1 GHz       0.5 GHz
Power                  126 mW      240 mW      445 mW      784 mW
Mean skew              41 ps       50 ps       92 ps       70.6 ps
Skew (% of period)     21%         10%         9%          4%

• Tiled architecture extends the life of globally synchronous systems
• Precise H-tree implementation
  - Load is small and equal at each branch
  - Skew can be reduced by 70% with advanced deskew circuits¹

¹ S. Tam et al., "Clock Generation and Distribution for the First IA-64 Microprocessor," IEEE JSSC, Nov. 2000.
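The percent-skew row is consistent with mean skew divided by the clock period (equivalently, skew times frequency). The short Python check below recomputes it from the table's frequency and mean-skew rows; it yields 20.5%, 10.0%, 9.2% and 3.5%, which round to the listed values. The function name is illustrative.

```python
# (process node, clock frequency in GHz, mean skew in ps) from the table above.
CLOCK_DATA = [("70nm", 5.0, 41.0), ("100nm", 2.0, 50.0), ("130nm", 1.0, 92.0), ("180nm", 0.5, 70.6)]


def skew_percent(freq_ghz: float, skew_ps: float) -> float:
    """Skew as a percentage of the clock period (period in ps = 1000 / f in GHz)."""
    period_ps = 1000.0 / freq_ghz
    return 100.0 * skew_ps / period_ps


for node, f, skew in CLOCK_DATA:
    print(f"{node}: period {1000 / f:g} ps, skew {skew_percent(f, skew):.1f}% of period")
```

The check makes the scaling trend explicit: absolute skew stays in the same tens-of-picoseconds range across nodes, but the shrinking period at higher frequency makes it a much larger fraction of each cycle.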

Power Distribution

64-tile aSoC        Vh        Vmh       Vml       Vl
Voltage             1.8 V     1.16 V    0.73 V    0.6 V
Current per core    110 mA    25 mA     13 mA     7 mA
Total power         12.1 W    1.86 W    607 mW    269 mW

• Heterogeneous cores may require multiple power-supply voltages
• Tile structure enables a uniform, interwoven grid
• Larger grid for higher current demands
  - Reduced resistance
  - Higher capacitance

[Figure: interwoven power grid with Gnd, Vh, Vmh, Vml and Vl lines]
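The total-power row appears to be the supply voltage times the per-core current times the 64 tiles. The Python sketch below assumes exactly that (every tile drawing the listed current from one rail). It reproduces 1.86 W, 607 mW and 269 mW for Vmh, Vml and Vl; for Vh it gives about 12.7 W rather than 12.1 W, so that entry may correspond to a per-core current nearer 105 mA (12.1 W / (64 × 1.8 V)).

```python
NUM_TILES = 64
# (rail, supply voltage in V, current per core in A) from the table above.
RAILS = [("Vh", 1.8, 0.110), ("Vmh", 1.16, 0.025), ("Vml", 0.73, 0.013), ("Vl", 0.6, 0.007)]


def total_power_w(vdd: float, current_per_core_a: float, tiles: int = NUM_TILES) -> float:
    """Total supply power, assuming every tile draws the same current: P = N * Vdd * I."""
    return tiles * vdd * current_per_core_a


for rail, vdd, amps in RAILS:
    print(f"{rail}: {total_power_w(vdd, amps):.3f} W")
```

The same arithmetic shows why the grid for the high rail must be wider: 64 cores at 110 mA each put roughly 7 A through the Vh grid, against under 0.5 A on the Vl grid.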

Architecture Evaluation (Motion Estimation)

• Array-based architecture
• Pipelined ME
• Parameterized search window size
• Full search
• Choose 16x16 or 8x8 windows
• Reduce power

[Figure: ME architecture with address generation unit, processing element array, memory and FIFOs]
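Full-search motion estimation compares a block of the current frame against every candidate position inside a search window of the reference frame and keeps the displacement with the lowest matching cost. The PE array on this slide performs that work in hardware; the plain-Python sketch below shows only the arithmetic, with block size `n` (16 or 8, mirroring the 16x16/8x8 option) and `search_range` as parameters. Using the sum of absolute differences (SAD) as the cost, and the function and helper names, are assumptions; the slide does not state the cost function.

```python
def sad(cur, ref, bx, by, dx, dy, n):
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + i][bx + j] - ref[by + dy + i][bx + dx + j])
        for i in range(n)
        for j in range(n)
    )


def full_search(cur, ref, bx, by, n=16, search_range=7):
    """Test every displacement in [-search_range, search_range]^2 that keeps the
    candidate block inside ref; return (best_sad, (dx, dy))."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-search_range, search_range + 1):
        for dx in range(-search_range, search_range + 1):
            x, y = bx + dx, by + dy
            if x < 0 or y < 0 or x + n > width or y + n > height:
                continue  # candidate block falls outside the reference frame
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[0]:
                best = (cost, (dx, dy))
    return best


def shifted(frame, dx, dy):
    """Hypothetical test helper: result[y][x] = frame[y + dy][x + dx], edges clamped."""
    h, w = len(frame), len(frame[0])
    return [[frame[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
             for x in range(w)] for y in range(h)]


ref = [[(7 * x + 13 * y) % 256 for x in range(32)] for y in range(32)]
cur = shifted(ref, 2, -1)
print(full_search(cur, ref, bx=12, by=12, n=8))  # (0, (2, -1))
```

Shrinking the block from 16x16 to 8x8 cuts the absolute-difference operations per candidate by a factor of four, which is one way the block-size parameter can trade matching quality for power.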

Modify Existing Designs

• Take existing Verilog code or hardware and improve or change its functionality (e.g., add motion estimation algorithms, provide AES key-length flexibility)
• Evaluate changes in performance and overhead

[Figure: old PE layout vs. new PE layout]

Conclusions

• Advancing process technology
  - Target 0.18 µm for affordable fabrication, but also do scaling studies
• Higher-level design tools
  - Combine synthesis and custom techniques
• Realistic yet tractable design projects
  - Re-use existing projects and provide unifying themes
• Preparation for jobs in the semiconductor and other sectors
  - Focus on system design and appropriate levels of abstraction
  - Teach how to learn new tools
• Making the best use of faculty/student time and university resources
  - Leverage research
  - Combine grad and undergrad students
  - Re-use materials and tools