An Integrated Debugging Environment for
Reprogrammable Hardware Systems
Kevin Camera
Hayden So
Bob Brodersen
Berkeley Wireless Research Center
University of California, Berkeley
AADEBUG 2005
Outline
Motivation
Existing platform
Existing design/verification flow
Proposed solution
Environment features
Walkthrough
Implementation strategy
Application Domain
Direct-mapped, reprogrammable hardware systems
FPGA-based signal processing and supercomputing arrays
FPGA Computing Benefits
Power, computation, and cost efficiency superior to any processor-based solution, due to direct mapping of algorithms
Metric                        XC2VP70-7   C6415T-1G
Computation rate (Gop/s)      72          4
Power efficiency (Gop/s/W)    2.72        1.84
Price/performance (Mop/s/$)   31.0        14.81
Chang, Wawrzynek, Brodersen; ISCA ‘05
BEE2: 2nd Berkeley Emulation Engine
(5) Xilinx V2P100 per board
  ~100K logic cells
  2 PowerPC 405 cores
  444 dedicated multipliers
  1 MB on-chip SRAM
  3.125 Gb/s duplex links
(4) DDR2 banks per FPGA
  72 bits per bank with ECC
  Up to 12.8 GB/s (DDR400) or 17 GB/s (DDR533) bandwidth
  Up to 4 GB capacity
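The quoted bandwidth figures can be reproduced with a short calculation. This is only a sketch: it assumes that each 72-bit bank carries 64 data bits plus 8 ECC bits and that the figures are aggregated over the four banks of one FPGA. The function name is illustrative.

```python
def peak_bandwidth_gb_s(banks: int, data_bits: int, mega_transfers_per_s: float) -> float:
    """Peak bandwidth in GB/s (1 GB = 1e9 bytes) summed over all banks."""
    bytes_per_transfer = data_bits / 8
    return banks * bytes_per_transfer * mega_transfers_per_s * 1e6 / 1e9

BANKS = 4        # DDR2 banks per FPGA, from the slide
DATA_BITS = 64   # assumption: 72-bit bank = 64 data bits + 8 ECC bits

ddr400 = peak_bandwidth_gb_s(BANKS, DATA_BITS, 400)
ddr533 = peak_bandwidth_gb_s(BANKS, DATA_BITS, 533)
print(f"DDR400: {ddr400:.1f} GB/s, DDR533: {ddr533:.1f} GB/s")
```

Under these assumptions the results land on the 12.8 and roughly 17 GB/s figures listed above.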
BEE Design Flow
Design entry is in the Matlab/Simulink environment
  Graphical, library based; also allows custom HDL
Typical FPGA path to physical implementation
  HDL synthesis and place and route
  Hierarchy is flattened in each pass (non-modular flow)
[Figure: flow diagram with stages Design, Netlist, Place and Route, Verify Hardware]
Design Verification Methods
High-level functional simulation
HDL/RTL simulation
Native FPGA execution
[Figure: the three methods placed along an axis labeled "Complexity, Accuracy"]
High-level Functional Simulation
Design execution in Matlab/Simulink
Intended to be correct by construction
Fastest software-based simulation
Powerful and convenient algorithm exploration
Drawbacks of High-level Simulation
Even with high level of abstraction, vastly slower than hardware
  Trend is worsening with increased FPGA capacity
Doesn’t cover any side-effects or requirements of the backend tool chain
[Figure: bar chart of execution time in seconds. HW: 2E-06, SW: 38.8]
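The gap between the two chart values can be quantified directly. A minimal sketch, assuming the 2E-06 value belongs to the HW bar and the 38.8 value to the SW bar, both in seconds:

```python
import math

hw_seconds = 2e-6   # hardware execution time read from the chart
sw_seconds = 38.8   # software simulation time read from the chart

speedup = sw_seconds / hw_seconds
orders_of_magnitude = math.log10(speedup)
print(f"hardware is about {speedup:.2e}x faster "
      f"({orders_of_magnitude:.1f} orders of magnitude)")
```

The ratio comes out at a little over seven orders of magnitude for this one example.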
HDL/RTL Simulation
Varying levels of accuracy
Access to arbitrary internal signals
But, simulation speed is even slower
  Parameterization/iteration is much harder
Native FPGA Execution
Runs at full speed of hardware
Three tools for on-FPGA testing:
Xilinx ChipScope Pro
System Generator HW-in-the-loop
Good old-fashioned signal probing
Xilinx ChipScope Pro
Inserts BRAM cores into design and binds to JTAG
Captures selected signals and provides trigger conditions
Signals of interest must be chosen in advance
Captured state is limited by available BRAM
Any changes require tool flow re-iteration
System Generator HW-in-the-loop
Allows hardware itself to accept and process data from Simulink via JTAG
Arbitrary number of data elements can be accessed as “ports”
Very powerful tool, but features limited process control
Hands-on Hardware Debugging
Most accurate method for finding timing-related bugs in a “production” system
Tradeoffs are all too well-known:
  Complex equipment
  Limited probing pins
  A priori signal output
  Limited input options
Drawback of On-FPGA Execution
Place and route time is a major bottleneck
Complete run is needed for every design change
Increasingly problematic due to larger FPGA capacity
[Figure: bar chart of tool run time in minutes for three design sizes]
Design (size)      Synthesis (min)   Place and Route (min)
PFB (3,805)        0.28333           3.85
PFB x4 (15,301)    1.26667           35.4
PFB x8 (30,601)    3.4               90.46667
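A quick calculation on the run times listed for this slide shows how place and route scales with design size. The numbers are copied from the slide; the unit of the size figure in parentheses is not stated there, so the sketch only uses ratios.

```python
# (design label, size, synthesis minutes, place-and-route minutes), from the slide
runs = [
    ("PFB", 3805, 0.28333, 3.85),
    ("PFB x4", 15301, 1.26667, 35.4),
    ("PFB x8", 30601, 3.4, 90.46667),
]

base_size, base_par = runs[0][1], runs[0][3]
growth = []
for label, size, synth, par in runs:
    size_ratio = size / base_size   # growth in design size relative to PFB
    par_ratio = par / base_par      # growth in place-and-route time relative to PFB
    growth.append((label, size_ratio, par_ratio))
    print(f"{label:7s} size x{size_ratio:4.1f}  P&R x{par_ratio:4.1f}  "
          f"P&R share of flow {par / (par + synth):.0%}")
```

In this data set the place-and-route time grows faster than the design size, which is the trend the slide describes as increasingly problematic.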
Proposed Solution
Enable extensive debugging and design exploration functionality directly on the hardware platform
  Vastly superior execution time for today’s large-scale computing challenges
  Exploit the spatial resources of the hardware to assist in debugging
Essentially a -g switch to the hardware design flow
  Minimize or eliminate iterations through implementation flow
Caveats
Final timing of design will not be preserved
  Critical path will definitely be increased, but 10^6 is a lot of headroom
  Timing-driven implementation still needed once verification is complete
Significantly more FPGA capacity and memory will be needed
  Acceptable for scalable BEE-like platforms and for modular, tiled algorithms
Essential Features of Environment
1. Robustly parameterized library components with soft configuration
  Design exploration without tool iterations
2. Readily accessible variable contents
  Reading and writing of any values by user
3. Complete user-driven control over process execution
  Single-step, bursts, breakpoints, assertions
1: Parameterized Library
Library components provide configuration parameters as inputs, which can be set by variables
  Allows runtime modification of function properties, including precision, range, and latency
  Enables design-space exploration at hardware speed, plus correction of configuration errors without re-implementation
• Number of bits
• Saturate / Wrap
• Binary point position
• Microarchitecture
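The idea of soft configuration can be illustrated in software. The sketch below is a hypothetical model, not the library described on the slide: the `FixedPointConfig` record and `quantize` function are invented names, and the block models only number of bits, binary point position, and saturate/wrap behavior for a signed two's-complement value.

```python
from dataclasses import dataclass

@dataclass
class FixedPointConfig:
    """Hypothetical soft-configuration record for one library block."""
    n_bits: int = 16         # total word length, including the sign bit
    binary_point: int = 8    # number of fractional bits
    saturate: bool = True    # True: clamp on overflow; False: wrap around

def quantize(x: float, cfg: FixedPointConfig) -> float:
    """Quantize x to signed fixed point using the current configuration."""
    scale = 1 << cfg.binary_point
    lo = -(1 << (cfg.n_bits - 1))
    hi = (1 << (cfg.n_bits - 1)) - 1
    raw = round(x * scale)                       # nearest integer code (ties to even)
    if cfg.saturate:
        raw = max(lo, min(hi, raw))              # clamp to representable range
    else:
        raw = (raw - lo) % (1 << cfg.n_bits) + lo  # two's-complement wraparound
    return raw / scale

cfg = FixedPointConfig(n_bits=8, binary_point=4)
saturated = quantize(9.3, cfg)   # out of range, clamps to the largest code
cfg.saturate = False             # parameter changed at run time, nothing is rebuilt
wrapped = quantize(9.3, cfg)     # same input now wraps to a negative value
in_range = quantize(1.5, cfg)    # representable values pass through unchanged
```

Because the configuration is data rather than structure, switching from saturation to wraparound only updates a field; in hardware the analogous change would be a register write rather than a new pass through the tool flow.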
2: Data Management
Ability to dynamically observe any variable’s value at the user’s request
Ability to overwrite a variable’s value at runtime and continue operation
Ability to rewind system state within the bounds of buffer capacity
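These three abilities can be modeled with a bounded history buffer. This is a software stand-in under stated assumptions, not the hardware mechanism: the class and method names are hypothetical, and one sample is recorded per cycle.

```python
from collections import deque

class VariableHistory:
    """Bounded per-variable history; a software stand-in for a hardware buffer."""

    def __init__(self, capacity: int):
        self.samples = deque(maxlen=capacity)  # oldest samples are dropped first

    def record(self, cycle: int, value: int) -> None:
        self.samples.append((cycle, value))

    def view(self) -> int:
        """Observe the most recent value."""
        return self.samples[-1][1]

    def overwrite(self, value: int) -> None:
        """Replace the most recent value; execution would continue from it."""
        cycle, _ = self.samples[-1]
        self.samples[-1] = (cycle, value)

    def rewind(self, n: int) -> int:
        """Discard the last n samples and return the cycle now at the head."""
        if n >= len(self.samples):
            raise ValueError("rewind exceeds buffered history")
        for _ in range(n):
            self.samples.pop()
        return self.samples[-1][0]

h = VariableHistory(capacity=4)
for cycle in range(10):
    h.record(cycle, cycle * cycle)

latest = h.view()          # value recorded at cycle 9
h.overwrite(0)
after_write = h.view()
cycle_now = h.rewind(2)    # step back two cycles
restored = h.view()
```

The capacity limit is what bounds how far back a rewind can go: after the calls above only two samples remain, so a further `rewind(3)` would raise an error.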
2: Data Management Requirements
Too expensive to re-implement the hardware to expose new data
  All variables are streamed into local and off-chip storage, such as DRAM and disks
Unlike software, hardware is highly parallel, and often deeply pipelined
  Memory requirements could be extreme
  Can be offset by hierarchical memory architecture and/or periodic sampling
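A rough estimate shows why memory requirements could be extreme and how periodic sampling offsets them. The variable count, word width, clock rate, and sampling period below are illustrative assumptions, not measurements from the talk.

```python
def stream_rate_gb_s(n_vars: int, bits_per_var: int, clock_hz: float,
                     sample_period: int = 1) -> float:
    """Raw data rate when every variable is captured once per sample_period cycles."""
    bytes_per_capture = n_vars * bits_per_var / 8
    return bytes_per_capture * clock_hz / sample_period / 1e9

# Illustrative numbers only: 1000 tagged variables, 32 bits each, 10 MHz clock.
full = stream_rate_gb_s(1000, 32, 10e6)                       # capture every cycle
sampled = stream_rate_gb_s(1000, 32, 10e6, sample_period=16)  # capture every 16th cycle
print(f"every cycle: {full:.1f} GB/s, every 16th cycle: {sampled:.1f} GB/s")
```

With these assumed numbers, capturing every cycle would need tens of GB/s, while sampling every 16th cycle reduces the rate by the same factor of 16.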
3: Process Control
Inherit the most useful features of software debuggers like GDB
  Cycle-by-cycle (single-step) execution
  Breakpoints (either state dependent, or fixed cycle count)
Implemented using multiple clock domains and clock buffer control
  Already available for use on BEE2
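The control features above can be sketched as a toy model in which the design only advances when a controller pulses its clock. Everything here is hypothetical (class name, attributes, the counter "design"); it illustrates single-stepping and the two breakpoint kinds, not the BEE2 clock buffer logic.

```python
from typing import Callable, Optional

class GatedClock:
    """Toy model: the design advances only when the controller pulses its clock."""

    def __init__(self, step: Callable[[dict], None], state: dict):
        self.step, self.state, self.cycle = step, state, 0
        self.break_cycle: Optional[int] = None                    # fixed cycle-count breakpoint
        self.break_when: Optional[Callable[[dict], bool]] = None  # state-dependent breakpoint

    def single_step(self) -> None:
        self.step(self.state)   # one clock pulse
        self.cycle += 1

    def _hit(self) -> bool:
        return (self.cycle == self.break_cycle or
                (self.break_when is not None and self.break_when(self.state)))

    def run(self, max_cycles: int) -> str:
        for _ in range(max_cycles):
            self.single_step()
            if self._hit():
                return "breakpoint"   # clock stays gated until the user resumes
        return "done"

def counter(state: dict) -> None:
    state["count"] += 3

clk = GatedClock(counter, {"count": 0})
clk.break_when = lambda s: s["count"] >= 10   # state-dependent breakpoint
first = clk.run(100)
first_cycle, first_count = clk.cycle, clk.state["count"]

clk.break_when = None
clk.break_cycle = 6                           # fixed cycle-count breakpoint
second = clk.run(100)
second_cycle = clk.cycle

clk.single_step()                             # advance exactly one more cycle
```

The run stops in the cycle where a condition first holds, and `single_step` is the same primitive the longer run is built from.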
Walkthrough: Design
Use specialized libraries to provide soft configuration
Integrates directly into the existing BEE2 tool flow
Walkthrough: Tagging
User tags signals of interest with debugging testpoints
  Defines a variable name
  Defines other parameters of interest for data observation
Also includes breakpoints and assertions
Walkthrough: Stitching
“Stitcher” updates the design before entering back-end tool flow
  Inserts logic as needed for debug functions
  Instantiates PowerPC core and master controller
  Adds underlying connections to route data
Walkthrough: Runtime
User can monitor variables and control process execution from remote client
  Embedded PowerPC software provides a thin service layer
  Client is fully integrated with Matlab and Simulink input description
Control Architecture on BEE2
[Figure: block diagram. Blocks: Control FPGA, User FPGA, PPC, User Design, Inserted Logic, Clock Buffer Logic, DRAM. Clock domains: Network (100 MHz), User Defined (~1-10 MHz). Signals: Control, Breakpoint interrupt, Single-step]
Stitching
Stitcher traverses the design hierarchy and:
  Replaces debugging component placeholders with necessary logic
  Creates a simple route from all variables to off-chip storage devices
During execution, the stitcher records:
  A mapping between variable names and their physical variable unit in hardware
  The latency within the variable routing network
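The traversal and bookkeeping described above can be sketched on a nested dictionary standing in for the design hierarchy. The data layout, the `stitch` function, and the latency model (one unit of latency per hierarchy level) are all assumptions made for illustration; the slides do not specify them.

```python
def stitch(block: dict, path: str = "", table: dict = None, hops: int = 0) -> dict:
    """Walk a nested design description and replace testpoint placeholders.

    Returns a table mapping each variable name to its unit id, hierarchical
    path, and a latency estimate (assumed here to equal hierarchy depth).
    """
    if table is None:
        table = {}
    name = f"{path}/{block['name']}" if path else block["name"]
    if block.get("type") == "testpoint":
        block["type"] = "vcu"   # swap the placeholder for debug logic
        table[block["var"]] = {"unit": len(table), "path": name, "latency": hops}
    for child in block.get("children", []):
        stitch(child, name, table, hops + 1)
    return table

# Hypothetical two-level design with two tagged signals.
design = {"name": "top", "children": [
    {"name": "fft", "children": [
        {"name": "tp0", "type": "testpoint", "var": "fft_out"}]},
    {"name": "tp1", "type": "testpoint", "var": "agc_gain"},
]}

table = stitch(design)
for var, info in table.items():
    print(var, info)
```

The recorded table is what would let a runtime client translate a variable name into a physical unit and compensate for routing latency when aligning samples.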
Variable Control Unit (VCU)
Inserted in place of each variable block in design
Automatically implied for every state variable in a state machine
Combination of local buffers and off-chip DRAM
  Exact memory allocation is subject to experimentation
Debug Controller (DC)
Interface between all variable and assertion instances, the runtime user shell, and process control “services”
Regulates the system clock both for exceptions and to prevent variable storage overflows
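Regulating the clock to prevent storage overflow amounts to backpressure. The sketch below is a simplified model with invented parameters: the design produces one sample per enabled cycle, the storage path accepts one sample every `drain_every` ticks, and the design clock is gated whenever the buffer is full.

```python
def run_with_backpressure(cycles: int, capacity: int, drain_every: int) -> tuple:
    """Run the design for `cycles` enabled cycles without overflowing the buffer.

    Returns (design cycles executed, ticks spent stalled, final buffer fill).
    """
    fill = executed = stalls = 0
    tick = 0
    while executed < cycles:
        tick += 1
        if tick % drain_every == 0 and fill > 0:
            fill -= 1        # off-chip storage accepts one sample
        if fill < capacity:
            fill += 1        # design clock enabled: one new sample is buffered
            executed += 1
        else:
            stalls += 1      # design clock gated to avoid an overflow
    return executed, stalls, fill

slow_drain = run_with_backpressure(cycles=6, capacity=2, drain_every=2)
fast_drain = run_with_backpressure(cycles=6, capacity=2, drain_every=1)
print(slow_drain, fast_drain)
```

When the drain rate matches the production rate no stalls occur; when it is half the production rate, the design is held for some ticks once the buffer fills.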
Runtime Shell Examples
load     Initialize or reset a design
halt     Stop the design as soon as possible
runfor   Run the design for a number of cycles
cont     Run the design until the next exception
break    View, enable, or change a breakpoint
view     View a variable’s value or history
set      Override a variable’s value or source
rewind   Rewind the system state by n cycles
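A command dispatcher for a few of these commands might look like the sketch below. The slide gives only command names and descriptions, so the argument syntax, the `DebugShell` class, and the counter that stands in for the design are all assumptions.

```python
class DebugShell:
    """Toy dispatcher; the 'design' is a cycle counter plus a variable dictionary."""

    def __init__(self):
        self.cycle = 0
        self.variables = {}
        self.commands = {"load": self.load, "runfor": self.runfor,
                         "view": self.view, "set": self.set}

    def load(self) -> str:
        self.cycle, self.variables = 0, {"acc": 0}   # reset design state
        return "loaded"

    def runfor(self, n: str) -> str:
        for _ in range(int(n)):
            self.variables["acc"] += 1               # stand-in for one hardware cycle
            self.cycle += 1
        return f"cycle {self.cycle}"

    def view(self, name: str) -> str:
        return f"{name} = {self.variables[name]}"

    def set(self, name: str, value: str) -> str:
        self.variables[name] = int(value)            # override, then keep running
        return self.view(name)

    def execute(self, line: str) -> str:
        cmd, *args = line.split()
        if cmd not in self.commands:
            return f"unknown command: {cmd}"
        return self.commands[cmd](*args)

shell = DebugShell()
session = ["load", "runfor 5", "view acc", "set acc 100", "runfor 2", "view acc"]
outputs = [shell.execute(line) for line in session]
```

The session overrides a variable mid-run and continues, which mirrors the `set` description above; the remaining commands would follow the same dispatch pattern.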
Future Work
Complete infrastructure for BEE2
Extensive experiments with variable memory
  Efficient methods for variable routing
  Storage requirements and hierarchy
  Time/space tradeoffs for periodic sampling
Generalize framework to define concepts such as variable priorities, multiple debug levels, and extensions to text-based languages
Questions?