New Strategies for System Level Design Daniel Gajski Center for Embedded Computer Systems (CECS)...
-
Upload
steven-mcdonald -
Category
Documents
-
view
236 -
download
0
Transcript of New Strategies for System Level Design Daniel Gajski Center for Embedded Computer Systems (CECS)...
New Strategies for
System Level Design
Daniel Gajski
Center for Embedded Computer Systems (CECS)
University of California, Irvine
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Overview
• Introduction
• Issues
• Models
• Platforms
• Tools
• Benefits
• Conclusion
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Closing the System Gap
Specs
Algorithms
Capture &Simulate
Specs
Algorithms
Describe &Synthesize
ExecutableSpec
Algorithms
Specify, Explore& Refine
Architecture
Network
SW/HW
Logic
Physical
SW? SW?
Design
Logic
Physical
Design
Logic
Physical
Manufacturing Manufacturing Manufacturing
1960's 1980's 2000's
Functionality
Simulate Simulate
Describe
Algorithms
Connectivity
Protocols
Performance
Timing
System Gap
Real gap: behavior and structure (semantics and syntax)
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Simulation Based Methodology
case X is when X1=> . . . when X2=>
Finite state machine
Controller
3.415
2.715
--
--
Look-up table
Memory
Simuletable but not synthesizable or verifiable
Ambiguous semantics of hardware/system level languages
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
In Search of a Solution
a*(b+c) = a*b + a*c
Algebra: < objects, operations>
Arithmetic algebra allows creation of expressions and equivalences
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Model Algebra
B1
B2 B3
B1
B2 B3
PE1 PE2
=
Model algebra: <objects, compositions>
Model algebra allows creation of models and model equivalences
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Specify-Explore-Refine Methodology
System specificationmodel
Cycle accurateimplementation model
SER
Intermediate models
Design decisions
Model refinement
Replacement or re-composition
FPGA board
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
How many models?
Minimal set for any methodology
(3 is enough)
• System specification model (application designers)
• Transaction-level model (system designers)
• Pin&Cycle accurate model (implementation designers)
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Three Models (with Respect to OSI)
Pin / Cycle Accurate Model
Transaction Level Model
Specification Model
7. Application
6. Presentation
5. Session
4. Transport
3. Network
2b. Link + Stream
2a. Media Access Ctrl
2a. Protocol
1. Physical
7. Application
6. Presentation
5. Session
4. Transport
3. Network
2b. Link + Stream
2a. Media Access Ctrl
2a. Protocol
1. Physical
Address lines
Data lines
Control lines
TLM
Spec
P/CAM
Source: G Schirner
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
System Specification
Bri
dg
e
v1C
1B1 B2
CPU Mem
HW
B3
IP
B4
C2
System Definition = (Partial) Platform + (Partial) Spec
Arb
ite
r
Communication• Channels (in C)• Variables (in C)
Computation • Behaviors (in C)
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Transaction-Level Model (TLM)
CPU Bus
B1 B2
OS
B4
CP
UMem
IP
B3
HW
HALDrivers
IP Bus
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Pin/Cycle Accurate Model (P/CAM)C
PU
Mem
Bridge
HW IP
Arbiter
HALRTOS
EXE
P/CAM is downloaded automatically for fast prototyping
with FPGAs or ASIC design
IC
Program
Source: D. Gajski et al.
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
How many components?
Minimal set for any design
(4 is enough?)
• Processing element (PE)
• Memory
• Transducer / Bridge
• Arbiter
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
General System Model
Bus2Bus1 Bus3
Arbiter 1
PE 1.1
Transducer2-3
Arbiter 2
Arbiter 3
PE 1.2
Memory 1
PE 2.1(Master)
PE 3.1
Memory 3
Interrupt1.1
Interrupt2.1
PE 2.2(Slave)
Transducer1-2 Interrupt2.2
Interrupt3.1
Interrupt3.2
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Transducer Model
Processor1<clk1>
Processor2<clk2>
FSMD1<clk1>
FSMD2<clk2>
Transducer
Ready1Ack_ready1
Ready2Ack_ready2
Memory1 Memory2
PE1 PE2
Addr bus1 Addr bus2Data bus1 Data bus2
Interrupt2Interrupt1
Data1 Data2
Queue<clk3>
Source: H. Cho
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Processing Element: NISC technology
RF / Scratch pad
ALU MUL
Status
PC
AG
CW
offset
status
address
const
CMem
Memory
B2
B1
B3
Data forwarding
Datapath pipelining
Controller pipelining
Programmable controller
Multi-cycle units
Datapath Pipelined units
• Direct compilation of C to HW (fastest possible execution)• Statically and dynamically reconfigurable (anytime, anywhere)• Designed for manufacturability (solving timing closure)
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
General System Design Environment
Model A
Model B
Estimationtool
Synthesistool
Componentlibrary
Simulationtool
GUI Verifytool
ti
Refinementtool
Transforms:t1t2 . . .tn
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
How many tools?
Minimal set for any methodology
(2 is enough?)
• Front-End (for application developers)
– Input: C, C++, Mathlab, UML, …
– Output: TLM
• Back-End (for SW/HW system designers)
– Input : TLM
– Output: Pin/Cycle accurate Verilog/VHDL
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
DecisionUser
Interface (DUI)
ValidationUser
Interface (VUI)
TIMED
Create
Map
Compile
Replace
Select
Partition
Compiler
Debugger
Stimulate
Verify
ES Environment
Application Tools : Compilers/Debuggers Commercial Tools : FPGA, ASIC
CYCLE ACCURATE
Verify
Simulate
Check
Compile
ESE Front – End
System Capture + Platform Development
SW Development + HW Development
ESE Back – End
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Benefit: Spec-to-Prototype in 1 Week
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Does it work?
• Intuitively it does– Well defined models, rules, transformations, refinements– Worked in the past: layout, logic, RTL?– System level complexity simplified
• Proof of concept demonstrated– Embedded System Environment (ESE)– Automatic model generation– Model synthesis and verification– Universal IP technology (NISC)– Productivity gains greater then 1000
• Benefits– Large productivity gains– Easy design management – Easy derivatives– Shorter TTM
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Design flow with NISC technology
offset
status
const
address
CW
PC CMem
AG
statu
s
B1B2
ALU Memory
RF
MUL
B3
const
status
RF
OR
ALUAR
MemDR
offset
CMem
CW
PC
AG P
bL
Sum
Add
aL
Mul
for(int i=0; i<8; i++) for(int j=0; j<8; j++){ sum=0; for(int k=0; k<8; k++) sum = sum + A[i][k] ×B[k][j]; C[i][j] = sum;
}
for(int i=0; i<8; i++) for(int j=0; j<8; j++){ sum=0; for(int k=0; k<8; k++) sum = sum + A[i][k] ×B[k][j]; C[i][j] = sum;
}
for(int i=0; i<8; i++) for(int j=0; j<8; j++){ i8 = i × 8; sum = *(A + i8) × *(B + j); sum += *(A + i8 + 1) × *(B + 8 + j); ... C[i][j] = sum;
}
for(int i=0; i<8; i++) for(int j=0; j<8; j++){ i8 = i × 8; sum = *(A + i8) × *(B + j); sum += *(A + i8 + 1) × *(B + 8 + j); ... C[i][j] = sum;
}
NISC Compiler
Results
Code Refinement
NISC Refinement
NISC
Application
NISC
Application NISC
CompilerResults
Iterative design & refinement
Source: M. Reshadi
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
5.3X 2.1X 11.6X 2.5X
0.83X 1.3X 0 2.1X
1.25X NA NA NA
DCT with NISC technology
0
0.2
0.4
0.6
0.8
1
1.2
1.4
MIPS NMIPS CDCT1 CDCT2 CDCT3 CDCT4 CDCT5 CDCT6 CDCT7 Manual
Execution Time Power Energy Area
10X 1.3X 12.8X 3X
Performance Power saving Energy saving Area reduction
NMIPS vs. MIPS
CDCT3 vs. NMIPS
CDCT7 vs. NMIPS
CDCT7 vs. ManualSource: B. Gorjiara
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
MP3 on Xillinx with ESE
• Area• % of FPGA slices and BRAMS
• Performance• Time to decode 1 frame of MP3 data
0102030405060708090
100
SW+0 SW+1 SW+2 SW+4
Design Points
% c
hip
uti
lizati
on
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
seco
nd
s %Slices
%BRAMs
Exec. time
Source: S. Abdi
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
MP3 on Xillinx with ESE using NISC
• Area• NISC uses fewer FPGA slices and more BRAMs than manual HW
• Performance• NISC comparable to manual HW and much faster than SW
0102030405060708090
100
SW+0 SW+1 SW+2 SW+4 NISC+0
Design Points
% c
hip
uti
liza
tio
n
0
0.05
0.1
0.15
0.2
0.25
0.3
0.35
seco
nd
s %Slices
%BRAMs
Exec. time
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Development Time with ESE
0
10
20
30
40
50
60
70
Spec. TLM RTL Board
models
per
son
-day
s SW+0
SW+1
SW+2
SW+4
Manual Development Time
• Model Development time• Includes time for C, TLM and RTL Verilog coding and debugging
• ESE drastically cuts RTL and Board development time
ESE
0
10
20
30
40
50
60
70
Spec. TLM RTL Board
models
per
son
-day
s SW+0
SW+1
SW+2
SW+4
Source: S. Abdi
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Validation Time with ESE
0
1
2
3
4
5
6
7
8
9
10
Spec. TLM RTL Boardmodels
seco
nds SW+0
SW+1
SW+2
SW+4
Validation Time
• Simulation time measured on 3.3 GHz processor• Emulation time measured on board with Timer• ESE cuts validation time from hours to seconds
ESE
0
1
2
3
4
56
7
8
9
10
Spec. TLM RTL Boardmodels
seco
nds
SW+0
SW+1
SW+2
SW+4
X
ho
urs
18.06 hrs17.71 hrs17.56 hrs15.93 hrs
Source: S. Abdi
Copyright 2006 Daniel D. GajskiVLSIDAT 2006
Conclusions
• Extreme makeover is necessary for a new paradigm, where
– SW = HW = SOC = Embedded Systems– Simulation based flow is not acceptable– Design methodology is based on scientific principles
• Model algebra is enabling technology for– System design, modeling and simulation– System synthesis, verification, and test
• What is next?– Change of mind– Application oriented EDA– Looking for early adapters
Thank You
Daniel Gajski
Center for Embedded Computer Systems (CECS)
www.cecs.uci.edu