Experiences Using a Novel Python-Based Hardware Modeling...

1
Experiences Using a Novel Python-Based Hardware Modeling Framework for Computer Architecture Test Chips Christopher Torng, Moyang Wang, Bharath Sudheendra, Nagaraj Murali, Suren Jayasuriya, Shreesha Srinath, Taylor Pritchard, Robin Ying, and Christopher Batten School of Electrical and Computer Engineering Cornell University 1 Abstract This poster will describe a taped-out 2×2 mm 1.3 M-transistor test chip in IBM 130 nm designed using our new Python-based hardware modeling framework. The goal of our tapeout was to demonstrate the ability of this framework to enable Agile hardware design flows. Specifically, our approach has two pieces: I Unify all simulation (behavioral, cycle-level timing, RTL, and gate-level) within a single-language development framework I Integrate this framework with highly automated standard-cell design flows For small teams working on small computer architecture test chips for research or as part of an Agile hardware design flow, such an approach can enable rapid design iteration from RTL to layout, shortening the time to tapeout despite limited manpower. 2 The PyMTL Hardware Modeling Framework Computer architects have long traded off simulation time and accuracy by leveraging multiple modeling abstractions including functional-level (FL), cycle-level (CL), and register-transfer-level (RTL) modeling. However, each of these uses distinct modeling languages, modeling patterns, and modeling tools, creating a large impediment to vertically integrated, iterative refinement of a design from algorithm, to exploration, to implementation. FL CL RTL Modeling Productivity Efficiency Hardware Languages Level Level Description (PLL) (ELL) (HDL) MATLAB/Python C/C++ Verilog/VHDL Modeling Functional: Object Oriented: Concurrent-Structural: Patterns Data Structures, Classes, Methods, Combinational Logic, Algorithms Ticks and/or Clocked Logic, Events Port Interfaces Modeling Third-party Computer Simulator Tools Algorithm Architecture Generators, Packages and Simulation Synthesis Tools, Toolboxes Frameworks Verification Tools PyMTL is a unified and highly productive framework for FL, CL, and RTL modeling based on a common high-productivity language (Python2.7). What is PyMTL? Functional-Level Modeling (FL) - Behavior Cycle-Level Modeling (CL) - Behavior - Cycle-Approximate - Analytical Area, Energy, Timing Register-Transfer-Level Modeling (RTL) - Behavior - Cycle-Accurate Timing - Gate-Level Area, Energy, Timing PyMTL Hardware Modeling Framework [*] [*] D. Lockhart, G. Zibrat, and C. Batten. PyMTL: A Unified Framework for Vertically Integrated Computer Architecture Research. Int’l Symp. on Microarchitecture, Dec 2014. What Does PyMTL Enable? Verilog' RTL' Model' FL Model CL Model RTL Model Verilog RTL Model Test Harness Test Harness Test Harness Test Harness FL Model CL Model RTL Model Verilog RTL Model 1. Incremental refinement from algorithm to hardware implementation 2. Automated testing and integration of PyMTL-generated Verilog 3. Multi-level co-simulation of FL, CL, and RTL models 4. Construction of highly parameterized RTL chip generators 3 PyMTL for Computer Architecture Test Chips Why Build Computer Architecture Test Chips? There are many reasons to build computer architecture test chips in the contexts of both academia and industry. Key Aspect in Agile Hardware Design I Rapid design iteration I Reduces cost of validation I ”Building the right thing” Benefits Research I Builds research credibility I Highly reliable power and energy estimates for new architecture techniques C++ FPGA ASIC flow Tape-in Small tape-out Big tape-out * Adapted from Yunsup Lee IEEE Micro 2016 Design Methodologies: Large Chips vs. Small Chips Large-scale commercial chips and small computer architecture test chips have different design methodologies with different production targets. Large-Scale Commercial Chips I High-volume and high-yield production I Overcome design challenges with large teams Computer Architecture Test Chips I Low-volume and reasonable-yield production I Overcome design challenges despite small teams and limited resources Small teams that build computer architecture test chips with highly productive development frameworks can shorten the time to tapeout. PyMTL in Agile Hardware Design One such framework can be created by combining the PyMTL framework with a highly automated standard-cell design flow, potentially enabling small teams to push RTL changes to layout with a validated gate-level netlist within a day. FL Simulation CL Simulation RTL Simulation Post-Synthesis Gate-Level Simulation Post-Place-and-Route Gate-Level Simulation Synthesis Floorplanning Power Routing Placement Clock Tree Synthesis Routing Power Analysis DRC RCX LVS Transistor-Level Sim Tape Out PyMTL Framework Highly Automated Standard-Cell Design Flow PyMTL provides the unified design and testing environment for not only FL, CL, and RTL models but also for post-synthesis and post-PAR gate-level models. Combining with a highly automated standard-cell design flow allows small teams to meet low-volume and reasonable-yield production targets while shortening the time to tapeout. 4 PyMTL-Integrated Tapeout Methodology IBM 130nm SRAM Compiler PyMTL RTL of Full Design Verilog RTL of Full Design C-Based Accelerator Tested in C Verilog Gate- Level Simulator PyMTL-Driven Testing Framework Standard Cell Front-End Views SRAM Specification Post-PAR Gate-Level Netlist Post-Synthesis Gate-Level Netlist IBM 130nm PDK Synopsys vcat utility GDS of SRAM Macros SRAM Macros Front-End Views Full-Custom LVDS Receiver GDS & LEF Standard / Pad Cell Back-End Views GDS of Full Design DRC-Clean GDS of Full Design FPGA Logic w/ Full Design Verilog RTL of Accelerator PyMTL Verilog Import Tapeout-Ready GDS of Full Design PyMTL to Verilog Translator Synopsys Design Compiler Synopsys IC Compiler Calibre DRC Commercial Xilinx-Based FPGA Tools Commercial HLS Tool HLS FPGA ASIC PyMTL Calibre LVS PyMTL Simulator w/ Unit Tests and Assembly Test Suite VCD Traces Mature full-featured software testing tools PyMTL FL / CL Models Verilog RTL Modules Specially Annotated for FPGA Synthesis PyMTL Verilog Import Detailed Methodology Using PyMTL for Agile Hardware Design 5 PyMTL in Practice: BRG Test Chip 1 BRG Test Chip 1 (BRGTC1) was designed using the PyMTL hardware modeling framework. Our methodology uses PyMTL for: I Design (FL/CL/RTL) I Composition with a commercial high-level synthesis (HLS) toolflow I As a unified testing environment The testing environment drives: I FL/CL/RTL simulation I FPGA emulation I Post-synthesis/PAR gate-level simulation Host Interface debug RISC Core Sort Accel Memory Arbitration Unit SRAM Bank (2KB) SRAM Bank (2KB) SRAM Bank (2KB) SRAM Bank (2KB) diff clk (+) diff clk () single ended clk reset Ctrl Reg host2chip chip2host LVDS Recv clk div clk tree reset tree clk out LVDS divided clk out LVDS clk out Testing Plans After Fabrication I Xilinx ZC706 FPGA development board for FPGA prototyping I Custom-designed FMC mezzanine card for ASIC test chips The testing platform enables running test programs on BRGTC1 to compare the perfor- mance/energy of pure-software kernels versus the HLS-generated sorting accelerator. Chip Plot Taped Out: March 2016 Expected Return: Fall 2016 A2×2 mm 1.3 M-transistor RISC processor in IBM 130 nm, 16 KB SRAM, and HLS-generated accelerators. Static Timing Analysis Freq. @ 246 MHz 6 Acknowledgments Chip fabrication was made possible by the MOSIS Educational Program. This work was supported in part by NSF CAREER Award #1149464, NSF XPS Award #1337240, NSF CRI Award #1512937, and equipment/tool/IP donations from Intel, Synopsys, Cadence, Mentor Graphics, Xilinx, and ARM. We acknowledge and thank Mark Buckler for EDA toolflow development and Ivan Bukreyev for advice on full-custom design. Publication: Student Poster at the 28th Symposium on High Performance Chips (HotChips-28), Aug. 2016. URL: http://www.csl.cornell.edu/ ~ cbatten/pdfs/torng-brgtc1-hotchips2016.pdf Contact Author: Christopher Torng, , [email protected]

Transcript of Experiences Using a Novel Python-Based Hardware Modeling...

Page 1: Experiences Using a Novel Python-Based Hardware Modeling ...moyang/pdfs/torng-brgtc1-poster-hotchi… · modeling tools, creating a large impediment to vertically integrated, iterative

Experiences Using a NovelPython-Based Hardware Modeling Frameworkfor Computer Architecture Test Chips

Christopher Torng, Moyang Wang, Bharath Sudheendra,Nagaraj Murali, Suren Jayasuriya, Shreesha Srinath,Taylor Pritchard, Robin Ying, and Christopher Batten

School of Electrical and Computer EngineeringCornell University

1 Abstract

This poster will describe a taped-out 2×2 mm 1.3 M-transistor test chip in IBM130 nm designed using our new Python-based hardware modelingframework. The goal of our tapeout was to demonstrate the ability of thisframework to enable Agile hardware design flows.

Specifically, our approach has two pieces:I Unify all simulation (behavioral, cycle-level timing, RTL, and gate-level)

within a single-language development frameworkI Integrate this framework with highly automated standard-cell design flows

For small teams working on small computer architecture test chips forresearch or as part of an Agile hardware design flow, such an approach canenable rapid design iteration from RTL to layout, shortening the time totapeout despite limited manpower.

2 The PyMTL Hardware Modeling Framework

Computer architects have long traded off simulation time and accuracy byleveraging multiple modeling abstractions including functional-level (FL),cycle-level (CL), and register-transfer-level (RTL) modeling. However,each of these uses distinct modeling languages, modeling patterns, andmodeling tools, creating a large impediment to vertically integrated, iterativerefinement of a design from algorithm, to exploration, to implementation.

FL CL RTLModeling Productivity Efficiency HardwareLanguages Level Level Description

(PLL) (ELL) (HDL)MATLAB/Python C/C++ Verilog/VHDL

Modeling Functional: Object Oriented: Concurrent-Structural:Patterns Data Structures, Classes, Methods, Combinational Logic,

Algorithms Ticks and/or Clocked Logic,Events Port Interfaces

Modeling Third-party Computer SimulatorTools Algorithm Architecture Generators,

Packages and Simulation Synthesis Tools,Toolboxes Frameworks Verification Tools

PyMTL is a unified and highly productive framework for FL, CL, and RTLmodeling based on a common high-productivity language (Python2.7).

What is PyMTL?

Functional-Level Modeling (FL)

- Behavior

Cycle-Level Modeling (CL)

- Behavior

- Cycle-Approximate

- Analytical Area, Energy, Timing

Register-Transfer-Level Modeling (RTL)

- Behavior

- Cycle-Accurate Timing

- Gate-Level Area, Energy, Timing

PyMTL

Hardware Modeling

Framework [*]

[*] D. Lockhart, G. Zibrat, and C. Batten.

PyMTL: A Unified Framework

for Vertically Integrated

Computer Architecture Research.

Int’l Symp. on Microarchitecture, Dec 2014.

What Does PyMTL Enable?

Verilog'RTL'

Model'

FL

ModelCL

Model

RTL

Model

Verilog

RTL

Model

Test

Harness

Test

Harness

Test

Harness

Test

Harness

FL

Model

CL

Model

RTL

Model

Verilog

RTL

Model

1. Incremental refinement from algorithm

to hardware implementation

2. Automated testing and integration of

PyMTL-generated Verilog

3. Multi-level co-simulation of

FL, CL, and RTL models

4. Construction of highly parameterized

RTL chip generators

3 PyMTL for Computer Architecture Test Chips

Why Build Computer Architecture Test Chips?

There are many reasons to build computer architecture test chips in thecontexts of both academia and industry.

Key Aspect in Agile Hardware Design

I Rapid design iterationI Reduces cost of validationI ”Building the right thing”

Benefits Research

I Builds research credibilityI Highly reliable power and energy

estimates for new architecturetechniques

C++

FPGA

ASIC flow

Tape-in

Small tape-out

Big tape-out

* Adapted from Yunsup Lee

IEEE Micro 2016

Design Methodologies: Large Chips vs. Small Chips

Large-scale commercial chips and small computer architecture test chipshave different design methodologies with different production targets.

Large-Scale Commercial Chips

I High-volume and high-yieldproduction

I Overcome design challenges withlarge teams

Computer Architecture Test Chips

I Low-volume and reasonable-yieldproduction

I Overcome design challengesdespite small teams and limitedresources

Small teams that build computer architecture test chips with highlyproductive development frameworks can shorten the time to tapeout.

PyMTL in Agile Hardware Design

One such framework can be created by combining the PyMTL framework witha highly automated standard-cell design flow, potentially enabling small teamsto push RTL changes to layout with a validated gate-level netlist within a day.

FL Simulation

CL Simulation

RTL Simulation

Post-Synthesis

Gate-Level Simulation

Post-Place-and-Route

Gate-Level Simulation

Synthesis

Floorplanning

Power Routing

Placement

Clock Tree Synthesis

Routing

Power Analysis

DRC RCXLVS

Transistor-Level Sim

Tape Out

PyMTL

Framework

Highly Automated

Standard-Cell Design Flow

PyMTL provides the unified design and testing environment for not onlyFL, CL, and RTL models but also for post-synthesis and post-PAR gate-levelmodels. Combining with a highly automated standard-cell design flowallows small teams to meet low-volume and reasonable-yield productiontargets while shortening the time to tapeout.

4 PyMTL-Integrated Tapeout Methodology

IBM 130nm

SRAM Compiler

PyMTL RTL of Full Design

Verilog RTL

of Full Design

C-Based

Accelerator

Tested in C

Verilog Gate-

Level Simulator

PyMTL-Driven

Testing Framework

Standard Cell

Front-End Views

SRAM Specification

Post-PAR

Gate-Level Netlist

Post-Synthesis

Gate-Level Netlist

IBM 130nm PDK

Synopsys vcat utilityGDS of

SRAM Macros

SRAM Macros

Front-End Views

Full-Custom

LVDS Receiver

GDS & LEF

Standard / Pad Cell

Back-End Views

GDS of Full Design

DRC-Clean GDS

of Full Design

FPGA Logic

w/ Full Design

Verilog RTL of

Accelerator

PyMTL Verilog Import

Tapeout-Ready GDS

of Full Design

PyMTL to

Verilog Translator

Synopsys

Design Compiler

Synopsys

IC Compiler

Calibre DRC

Commercial

Xilinx-Based

FPGA Tools

Commercial

HLS Tool

HL

SF

PG

AA

SIC

PyMTL

Calibre LVS

PyMTL Simulator

w/ Unit Tests and

Assembly Test Suite

VCD

Traces

Mature full-featured

software testing tools

PyMTL FL / CL Models

Verilog RTL Modules

Specially Annotated

for FPGA Synthesis

PyMTL Verilog Import

Detailed Methodology Using PyMTL for Agile Hardware Design

5 PyMTL in Practice: BRG Test Chip 1

BRG Test Chip 1 (BRGTC1) wasdesigned using the PyMTL hardwaremodeling framework.

Our methodology uses PyMTL for:

I Design (FL/CL/RTL)I Composition with a commercial

high-level synthesis (HLS) toolflowI As a unified testing environment

The testing environment drives:

I FL/CL/RTL simulationI FPGA emulationI Post-synthesis/PAR gate-level

simulation

HostInterface

deb

ug

RISCCore

SortAccel

Memory Arbitration Unit

SRAMBank(2KB)

SRAMBank(2KB)

SRAMBank(2KB)

SRAMBank(2KB)

diff clk (+)

diff clk (−)

singleended clk

rese

t

CtrlReghost2chip

chip2host

LVDSRecv

clkdiv

clk treeresettree

clk

ou

t

LVD

S

div

ided

clk

ou

t

LVD

Scl

k o

ut

Testing Plans After Fabrication

I Xilinx ZC706 FPGA development board forFPGA prototyping

I Custom-designed FMC mezzanine card forASIC test chips

The testing platform enables running test programs on BRGTC1 to compare the perfor-mance/energy of pure-software kernels versus the HLS-generated sorting accelerator.

Chip Plot

Taped Out: March 2016 Expected Return: Fall 2016

A 2×2 mm 1.3 M-transistor RISC processor in IBM 130 nm,16 KB SRAM, and HLS-generated accelerators.

Static Timing Analysis Freq. @ 246 MHz

6 Acknowledgments

Chip fabrication was made possible by the MOSIS Educational Program. This work was supported in part by NSF CAREER Award #1149464, NSF XPS Award#1337240, NSF CRI Award #1512937, and equipment/tool/IP donations from Intel, Synopsys, Cadence, Mentor Graphics, Xilinx, and ARM. We acknowledgeand thank Mark Buckler for EDA toolflow development and Ivan Bukreyev for advice on full-custom design.

Publication: Student Poster at the 28th Symposium on High Performance Chips (HotChips-28), Aug. 2016. URL: http://www.csl.cornell.edu/~cbatten/pdfs/torng-brgtc1-hotchips2016.pdf Contact Author: Christopher Torng, , [email protected]