EE382V: Embedded System Design and...
Transcript of EE382V: Embedded System Design and...
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 1
EE382V:Embedded System Design and Modeling
Andreas GerstlauerElectrical and Computer Engineering
University of Texas at [email protected]
Lecture 6 – System-Level Synthesis & Refinement
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 2
Lecture 6: Outline
• Synthesis
• System-level synthesis process
• Refinement
• Modeling flow
• The SpecC methodology
• System-on-Chip Environment (SCE)
• Tool flow
• Setup and tutorial
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 2
Synthesis
• Gajski’s Y-Chart
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 3
Behavior StructureSystem Synthesis
Processor
Logic
• Gajski’s Y-Chart
Synthesis
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 4
Behavior
Structure
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 3
• Gajski’s Y-Chart• Platform-based design (the other Y Chart)
Synthesis
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 5
Behavior
Structure
ArchitectureConstraints
Behavior /Function
• X-Chart
Synthesis
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 6
Behavior
Structure
ArchitectureConstraints
Behavior /Function
Quality
Specification
Implementation
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 4
• X-Chart• System-Level Synthesis
Synthesis
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 7
Synthesis
Specification Model
RefinementDecisionMaking
Hardware/Software Synthesis
ArchitectureBehavior
Platforms, IP Databases
Implementation Model
QualityStructure Performance Estimates
Transaction-Level Model
(TLM)
Model of Computation
(MoC)
Source: A. Gerstlauer, C. Haubelt, A. Pimentel, et al., “Electronic System-Level Synthesis Methodologies,“ TCAD, 2009.
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 8
Application Specification (MoC)
v1
C1
P1 P2
P3 P4
C2
Computation • Processes
Communication• Channels
• Variables
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 5
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 9
Platform Architecture Template
Components:• Processors
• Memories
• IPs, custom HW
• Buses, bridges
CPU1 Mem
HW CPU2
Arb
iter
Bri
dg
e
IP
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 10
OS
OS
System Definition
Bri
dg
e
v1
C1
P1 P2
CPU Mem
HW
P3
CPU2
P4
C2
System Definition = Application + Platform + Mapping
Arb
iter
Mapping decisions:• Allocation
• Partitioning
• Scheduling
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 6
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 11
Refined Transaction-Level Model
Bus1
P1 P2
OSC
PU
1
Mem
CPU2
P3
HW
Bus2
TX
P4
OS
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 12
Backend Hardware/Software Synthesis
Bus1
CP
U1
HW IP CPU2
Bus2
Compile
RTOS/ Driver
Synthesis
HALRTOS
EXEP2
OS
HAL TX
Program
P1
HALRTOS
EXE
Program
Processes in C
C-to-RTLSynthesis
P3
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 7
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 13
Pin-/Cycle-Accurate Implementation Model
CP
U1
Mem
Interface
HW IP
Arbiter
HALRTOS
EXEIC
Program
HALRTOS
EXE
Program
CPU2
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 14
Lecture 6: Outline
Synthesis
System-level synthesis process
• Refinement
• Modeling flow
• The SpecC methodology
• System-on-Chip Environment (SCE)
• Tool flow
• Setup and tutorial
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 8
Hardware vs. Software
• Double-roof model
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 15
system
component
logic
task
instructionarchitecture
RTL
gate
Hardware
Arch
ISA
Software
Implementation
Specification
Source: A. Gerstlauer, C. Haubelt, A. Pimentel, et al., “Electronic System-Level Synthesis Methodologies,“ TCAD, 2009.
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 16
Computation vs. Communication
Computation
Co
mm
un
icat
ion
A B
C
D F
Un-timed
Approximate-timed
Cycle-timed
Un-timed
Approximate-timed
A. System specification modelB. Timed functional modelC. Transaction-level model (TLM)D. Pin-/bus cycle-accurate model (P/BCAM)E. Computation cycle-accurate model (CCAM)F. Cycle-accurate model (CAM)
E
Cycle-timed
System design flow Path from model A to model F
Design methodology and modeling flow Set of models and transformations between models
Source: L. Cai, D. Gajski. “Transaction level modeling: An overview”, ISSS 2003
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 9
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 17
Abstraction Levels
Temporal orderLow abstraction
High abstraction
Implementation Detail
Spatial order
physical layout
unstructured
Structure
real time
untimed
Timing
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 18
Top-Down Design Flow
Implementation
Architecture
Specification
Logic Design
Product planning
Structure
pure functional
bus functional
RTL / ISA
gates
requirements
Timing
untimed
timing accurate
cycle accurate
gate delays
constraints
System Design
Processor Design
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 10
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 19
SpecC Design Methodology
Implementation model
Communication model
Specification model
Logic design
Product planning
pure functional
partitioned
bus functional
RTL / IS
requirements
untimed
scheduled
timing accurate
cycle accurate
constraints
Computation model
Processor design
Communication design
Computation design
Structure Timing
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 20
SpecC Design Methodology
untimed
estimated timing
timing accurate
cycle accurate
constraints
pure functional
transaction level
bus functional
RTL / IS
requirements
Specification model
Algor.IP
Proto.IP
Computation model
Communication refinement
Comp.IP
Implementation model
Softwaresynthesis
Interfacesynthesis
Hardwaresynthesis
RTOSIP
RTLIP
Computation refinement
Capture
Communication model
Product planning
Logic designStructure Timing
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 11
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 21
SpecC Design Methodology
System design Validation flow
Specification model
Algor.IP
Proto.IP
Computation model
Communication refinement
Communication model
Comp.IP
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Implementation model
Softwaresynthesis
Interfacesynthesis
Hardwaresynthesis
Backend Estimation
ValidationAnalysis
Compilation Simulation model
RTOSIP
RTLIP
Computation refinement
Capture
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 22
Specification Model
• High-level, abstract model
• Pure system functionality
• Algorithmic behavior
• No implementation details
• No implicit structure / architecture
• Behavioral hierarchy
• Untimed
• Executes in zero (logical) time
• Causal ordering
• Events only for synchronization
Specification model
Computation refinement
Computation model
Communication model
Implementation model
Communication refinement
Processor refinement
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 12
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 23
Specification Model Example
B2 B3
c2
B1
v1
B1
• Synthesizable specification model
• Hierarchical parallel-serial composition
• Communication through variables and standard channels
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 24
Computation Refinement
• PE allocation / selection
• Behavior partitioning
• Variable partitioning
• Scheduling
Specification model
Computation refinement
Computation model
Communication model
Implementation model
Communication refinement
Processor refinement
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 13
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 25
PE Allocation, Behavior Partitioning
• Allocate PEs
• Partition behaviors
• Globalize communicationB2 B3
c2
B1
v1
B1
Additional level of hierarchy to model PE structure
PE1
PE2
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 26
Model after Behavior Partitioning
B3B2
B1B1 PE1
c2
v1
B13rcv
B34snd
B13snd
B34rcv
cb13
cb34
PE2
Synchronization to preserve execution order/semantics
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 14
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 27
• Map global variables to local memories
Variable Partitioning
Shared memory vs. message passing implementation
B3
B13rcv
B34snd
B2
B1B1
B13snd
B34rcv
PE1
c2
v1
cb13
cb34
PE2
v1 v1
• Communicate data over message-passing channels
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 28
Model after Variable Partitioning
B3
B13rcv
B34snd
B2
B1B1
B13snd
B34rcv
PE1
c2
v1
cb13
cb34
PE2
v1
Keep local variable copies in sync• Communicate updated values at synchronization points• Transfer control & data over message-passing channel
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 15
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 29
Timed Computation
• Execution time of behaviors
• Estimated target delay / timing budget
• Granularity
• Behavior / function / basic-block level
Annotate behaviors
• Simulation feedback
• Synthesis constraints
behavior B2( in int v1, ISend c2 ) {
void main(void) { waitfor( delay1 );
c2.send( );
}};
waitfor( B2_DELAY1 );
waitfor( B2_DELAY2 );
1
5
10
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 30
Scheduling
• Static scheduling– Fixed behavior execution order
– Flattened behavior hierarchy
Serialize behavior execution on components
B2
B1B1
B13snd
B34rcv
PE1
• Dynamic scheduling
– Pool of tasks
– Scheduler, abstracted OS
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 16
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 31
Computation Model Example
B3
B13rcv
B34snd
B2
B1B1
B13snd
B34rcv
PE1
c2
v1
cb13
cb34
PE2
v1
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 32
Computation Model
• Component structure/architecture
• Top level of behavior hierarchy
• Behavioral/functional component view
• Behaviors grouped under top-level component behaviors
• Sequential behavior execution
• Timed
• Estimated execution delays
Specification model
Computation model
Communication model
Implementation model
Processor refinement
Computation refinement
Communication refinement
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 17
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 33
Communication Refinement
• Network allocation / protocol selection
• Channel partitioning
• Protocol stack insertion
• Inlining
Specification model
Computation model
Communication model
Implementation model
Processor refinement
Communication refinement
Computation refinement
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 34
Network Allocation / Channel Partitioning
B3
B34snd
B2
B1B1
B13snd
B34rcv
PE1
c2
v1
cb13
cb34
PE2
v1
B13rcv
• Allocate busses
• Partition channels
• Update communication
Additional level of hierarchy to model bus structure
Bus1
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 18
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 35
Model after Channel Partitioning
B3
B34snd
B2
B1B1
B13snd
B34rcv
PE1
v1
PE2
v1
B13rcv
c2
cb13
cb34
Bus1
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 36
Protocol Insertion
c2
cb13
cb34
Bus1
ProtocolLayer
Network
Layers
Bus1
• Insert protocol layer
• Bus protocol channel from database
• Create network layers
• Implement message-passing over bus protocol
• Replace bus channel
• Hierarchical combination of complete protocol stack
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 19
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 37
Model after Protocol Insertion
B2
B1B1
B13snd
B34rcv
PE1
v1
B3
B34snd
PE2
v1
B13rcv
Bus1
IBu
sSla
ve
IBu
sMas
ter
ready
addr[16]
data[32]
BusProtocol
Master Slave
IPro
toco
lSla
ve
IPro
toco
lMa
ste
r
ack
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 38
Inlining: Transaction-Level Model (TLM)
Bus1
IBu
sSla
ve
IBu
sMas
ter BusProtocol
PE1 PE2
ready
addr[16]
data[32]
IPro
toco
lSla
ve
IPro
toco
lMa
ste
r
ack
PE2PE2Bus
IBu
sSla
ve
PE1Bus
IBu
sMas
ter
PE1
• Create bus interfaces and drivers
read()/write() methods in C,
functionality + timing
IPro
toco
lSla
ve
IPro
toco
lMa
ste
r
BusProtocolTLM
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 20
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 39
BusProtocolTLM
Inlining: Pin-Accurate Model (PAM)
Bus1
IBu
sSla
ve
IBu
sMas
ter
PE1 PE2
PE2PE2BusPE1Bus
PE1
Transaction-LevelModel (TLM)
IPro
toco
lSla
ve
IPro
toco
lMa
ste
r
• Create bus interfaces and drivers
• Refine communication
BusProtocol
ready
addr[16]
data[32]
IPro
toco
lSla
ve
IPro
toco
lMa
ste
r
ackIB
usM
aste
r
IBu
sSla
ve
PE
2Pro
toco
l
IPro
toco
lSla
ve
PE
1Pro
toco
l
IPro
toco
lMa
ste
r
control
address[15:0]
data[31:0]
IBu
sSla
ve
IBu
sMas
ter
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 40
Communication Model Example
control
address[15:0]
data[31:0]
B3
B34snd
v1
B13rcv
B2
B1B1
B13snd
B34rcv
PE1
v1
PE2
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 21
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 41
Communication Model
• Component & bus structure/architecture
• Top level of hierarchy
• Bus-functional component models
• Timing-accurate bus protocols
• Behavioral component description
• Timed
• Estimated component delays
• Timing-accurate communication
Transaction-level model (TLM)
Pin-accurate model (PAM)
Bus cycle-accurate model (BCAM)
Specification model
Computation model
Communication model
Implementation model
Processor refinement
Communication refinement
Computation refinement
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 42
Processor Refinement
• Cycle-accurate implementation of PEs
• Hardware synthesis down to RTL
• Software synthesis down to IS
• Interface synthesis down to RTL/IS
Specification model
Computation model
Communication model
Implementation model
Processor refinement
Communication refinement
Computation refinement
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 22
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 43
Hardware Synthesis
• Schedule operations into clock cycles
• Define clock boundaries in leaf behavior C code
• Create FSMD model from scheduled C code– Controller + datapath
B3
B34snd
v1
B13rcv
PE2
PE2_CLK
PE2_CLK
PE2_CLK
Clock boundaries
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 44
Software Synthesis
• Implement behavior on processor instruction-set
• Code generation
• Compilation
B2
B1B1
B13snd
B34rcv
PE1
v1
Ff2MOVE r0, r1
SHL r3ADD r2, r3, r4INC r2
PUSH r1CALL Ff3POP r0
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 23
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 45
Interface Synthesis
• Implement communication on components
• Hardware bus interface logic
• Software bus drivers
PE2Bus
IBu
sSla
ve
PE
2Pro
toco
l
IPro
toco
lSla
ve
ready
ack
addr[15:0]
data[31:0]
PE1Bus
IBu
sMas
ter
PE
1Pro
toco
l
IPro
toco
lMa
ste
r
ready
ack
addr[15:0]
data[31:0]
S0
S1
S2
S3
S4
DRV
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 46
Implementation Model
Software processor Custom hardware
ready
ack
address[15:0]
data[31:0]
PE2
PE2_CLKPE1_CLK
OBJ
PORTA
PORTB
INTA
PORTC
PE1
Instruction Set Simulator (ISS)
S0
S1
S2
S3
S4
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 24
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 47
Implementation Model
• Cycle-accurate system description
• RTL description of hardware
– Behavioral/structural FSMD view
• Object code for processors
– Instruction-set co-simulation
• Clocked bus communication
– Bus interface timing based on PE clock
Specification model
Computation model
Communication model
Implementation model
Processor refinement
Communication refinement
Computation refinement
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 48
Lecture 6: Outline
Synthesis
System-level synthesis process
Refinement
Modeling flow
The SpecC methodology
• System-on-Chip Environment (SCE)
• Tool flow
• Setup and tutorial
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 25
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 49
Design Automation
• Synthesis = Decision making + model refinement
Successive, stepwise model refinement Layers of implementation detail
Refinement
Model n
DB
Model n+1
Specification model
Implementation model
Optim. algorithm
GUI
Design decisions
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 50
System-On-Chip Environment (SCE)
ArchnArchnTLMn
Impln
Spec
ImplnImpln
ISS
CP
U
Mem
IPHW
Bri
dg
e
Arb
iter CPU Bus DSP Bus
B5
DS
P
HALRTOS
B1
ISS
HALRTOS
B2,B3
B4
Synthesize target HW/SW
Compile onto MPSoC platform
Mem
IPHW
Bri
dg
e
CPU Bus DSP Bus
B3v1v2
B5B4
DSP
C4C2C1
OS + Drv
CPU
OS + Drv
Coren Coren
Coren Coren
Core1 Coren
B2B1
C3
Specification
System Design
SWDB
Systemmodels
CPUn.bin
Implementation Model
CE/BusModels
TLMnTLMnTLMi
Hardware Synthesis
Software Synthesis
RTLDB
RTLnRTLnRTLnISSnISSnISSn CPUn.binCPUn.bin
HWn.vHWn.vHWn.v
Design Decisions
Architecture Exploration
Scheduling Exploration
Network Exploration
Communication Synthesis
PE/OSModels
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 26
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 51
Abstract Programming Model
• Hierarchical process graph• Sequential processes
– ANSI C code
• Parallel-serial composition– Dependencies, Fork-join
• Abstract inter-process communication• Communication channels
– Message-passing, queues, etc.
• Shared variables
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 52
System Transaction-Level Model (TLM)
Generate MPSoC TLM simulation model
Fast and accurate for exploration and verification
• Compile onto multi-core/multi-processor platform
• Implement computation on processors/cores and busses
• Generate code for communication over bus network
Mem
IPHW
Bri
dg
e
CPU Bus DSP Bus
B3v1v2
B5B4
DSP
C4C2C1
OS + Drv
CPU
OS + Drv
Coren Coren
Coren Coren
Core1 Coren
B2B1
C3
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 27
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 53
System Implementation
• Synthesize hardware and software for each processor• High-level/behavioral RTL and interface synthesis
– Allocation, scheduling, binding
• Software synthesis and RTOS targeting– Code generation, firmware synthesis, cross-compile & link
CP
U
IPHW
Bri
dg
e
Arb
iter
DS
P
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 54
Cellphone Example: Hardware Platform
• 2 Subsystems• ARM7TDMI• Motorola DSP 56600k
• 4 Busses• AMBA AHB• DSP bus• IP & memory busses
• 2 Accelerator HW/IP blocks• DCT IP• Custom
codebook HW• 10 I/O HW blocks
DCT(IP)
TX
CPU(ARM7)
M1(SRAM)
M1Ctrl
I/O4(HW)
CoPro(HW)
DSP(DSP56k)
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
S
SS
S
M
S
M2/S
M1 M
Arb
ite
r1
Arb
IP BridgeM
S
DCTBus
S
I/O3(HW)
S
I/O2(HW)
S
I/O1(HW)
S
S
DMA(2753A)
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 28
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 55
Cellphone Example: Software Platform
• Operating systems• ARM7
• uCOS-II• DSP
• Custom, interrupt-driven multi-tasking kernel
DCT
TX
ARM7
M1Ctrl
I/O4
HW
DSP56k
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
Arb
ite
r1
IP Bridge
DCTBusI/O3I/O2I/O1
DMA
M1
uCOS-II Custom RTOS
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 56
Cellphone Example: Application
• Computation• ARM7
• MP3 Decoding• Jpeg Encoding
• DSP• GSM Transcoding
DCT
TX
ARM7
M1Ctrl
I/O4
HW
DSP56k
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
Arb
ite
r1
IP Bridge
DCTBusI/O3I/O2I/O1
DMA
M1
Enc DecJpeg
Codebk
SI BO BI SO
DCT
MP3
uCOS-II Custom RTOS
Application
Test code
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 29
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 57
Cellphone Example: Application
DCT
TX
ARM7
M1Ctrl
I/O4
HW
DSP56k
MBUS
BUS1 (AMBA AHB) BUS2 (DSP)
Arb
ite
r1
IP Bridge
DCTBusI/O3I/O2I/O1
DMA
M1
Enc DecJpeg
Codebk
SI BO BI SO
DCT
v1
C2
C1
C3
C4
C5
C6
C7
C8
C0
MP3
• Computation• ARM7
• MP3 Decoding• Jpeg Encoding
• DSP• GSM Transcoding
• Communication• Shared memory
– Camera frames• Message-passing
– MP3 and speech streaming
uCOS-II Custom RTOS
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 58
ComputationDesign
Design Methodology
Specification model
Algor.IP
Proto.IP
Computation model
Communication refinement
Communication model
Comp.IP
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Implementation model
Softwarecompilation
Interfacesynthesis
Hardwaresynthesis
Estimation
ValidationAnalysis
Compilation Simulation model
RTOSIP
RTLIP
Computation refinement
Capture
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 30
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 59
Computation Design (1)
• Architecture exploration• Allocation of Processing Elements (PE)
– Type and number of processors
– Type and number of custom hardware blocks
– Type and number of system memories
• Mapping to PEs– Map each behavior to a PE
– Map each complex channel to a PE
– Map each variable to a PE
Architecture model
Concurrent PEs
Abstract channels andmemory interfaces
IP1M1
B2. . . . . .. . . . . .
c2
c1
Mem
v1v3
PE2
B3
PE1
B1
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 60
Cellpone Example: Architecture Model
RcvData
Co-process
SpchOut
DCT
stripe[]
Decoder
JPEG Coder
SerOut
SpchIn
SerIn
DSP
Mem
DMA
HW
DCT_IP
BI
BO
SO
SICtrl
ARM7
DC
TA
dap
ter
Vocoder
MP3
Partitioning, synchronization, message-passing, RPC
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 31
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 61
Computation Design (2)
• Scheduling exploration
• Static scheduling of behaviors into sequential tasks– Group (flatten) behaviors into tasks
– Determine fixed execution order of behaviors in each task
• Dynamic scheduling of concurrent tasks by RTOS– Choose scheduling policy, i.e. round-robin or priority-based
– For each set of tasks, determine task priorities
Scheduled model
Abstract OS model insoftware PEs
IP1M1
B2. . . . . .. . . . . .
c2
c1
Mem
v1v3
PE2_OS
B3
OS Model
PE1
B1
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 62
Cellphone Example: Scheduled Model
RcvData
Co-process
SpchOut
DCT
stripe[]
Decoder
JPEG Coder
OSModel
SerOut
SpchIn
SerIn
DSPARM_OS
Mem
DMA
HW
DCT_IP
BI
BO
SO
SICtrl
ARM7
DSP_OS
DC
TA
dap
ter
Vocoder
MP3
OSModel
Scheduling, task refinement, OS model insertion
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 32
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 63
CommunicationDesign
Design Methodology
Specification model
Algor.IP
Proto.IP
Computation model
Communication refinement
Communication model
Comp.IP
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Estimation
ValidationAnalysis
Compilation Simulation model
Implementation model
Softwarecompilation
Interfacesynthesis
Hardwaresynthesis
Estimation
ValidationAnalysis
Compilation Simulation model
RTOSIP
RTLIP
Computation refinement
Capture
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 64
Communication Design (1)
• Network exploration
• Allocation of system network– Type (protocols) and number of system busses
– Type and number of CEs (bridges and transducers, if applicable)
– System connectivity
• Routing of channels over busses– Map each communication channel to a system bus
(or an ordered list of busses, if applicable)
Network model
PEs + CEsMiddleware stacks
Point-to-point linksUntyped packet transfers
Untyped memory interfaces
M1Ctrl
TX
IP_TXMAC
intprotocol
IP1M1_LK
B2
PE2_OS
B3
OS Model
PE1
B1
L1 L2
Mem
char[512]
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 33
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 65
Cellphone Example: Network Model
ARM_OS
ARM7
DMA
DMA_HW
DCT
DCT_IP
M
S
DSP_OS
DSP
OSModel
HW
SI
SI_HW
BI
BI_HW
SO
SO_HW
BO
BO_HW
T
MSlinkBri
l
DC
TA
da
pte
r
link
DM
A
linkBri
linkBI
linkHW
linkSI
linkBO
linkSO
Mem
OSModel
Data conversion, channel merging
CE insertion, packeting, routing
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 66
Communication Design (2)
Maste
rPro
to Sla
veP
roto
mac m
ac
Mem
Pro
toco
l
IPP
roto
col
• Communication synthesis
• Assignment of bus parameters– Address mapping for each channel and memory interface
– Synchronization mechanism for each channel(dedicated or shared interrupts, polling)
– Transfer mode for each channel/interface (regular, burst, DMA)
Transaction-level model (TLM)
PEs + CEs + BussesProtocol stacks
Abstract bus channelsBus transactions + interrupts
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 34
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 67
Maste
rPro
to Sla
veP
roto
mac m
ac
Mem
Pro
toco
l
IPP
roto
col
Ma
sterP
roto S
lave
Pro
to
ma
c ma
c
Communication Design (2)
• Communication synthesis
• Assignment of bus parameters– Address mapping for each channel and memory interface
– Synchronization mechanism for each channel(dedicated or shared interrupts, polling)
– Transfer mode for each channel/interface (regular, burst, DMA)
Transaction-level model (TLM)
PEs + CEs + BussesProtocol stacks
Abstract bus channelsBus transactions + interrupts
Pin-accurate model (PAM)
Physical bus structureBit-accurate pins and wires
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 68
Cellphone Example: TLM
Synchronization, addressing, media acces
Arbitration, data slicing, interrupt handling
me
mm
ac
ARM_OS
ARM_HAL
AD
DR ARM7
DMA
ma
cm
em
DMA_HW
AD
DR
Mem
mem
Mem_HW
AD
DR
DCT
DCT_IP
cfMaster
cfSlave
ah
bP
roto
co
l
Bri
dg
e
DSP_OS
DS
P_
HA
L
DSP
AD
DR
OSModel
ma
c
intA intB intC intD
HW
ma
c
AD
DR
HW_HW
SI
ma
c
AD
DR
SI_HW
BI
mac
AD
DR
BI_HW
SO
ma
c
AD
DR
SO_HW
BO
ma
c
AD
DR
BO_HW
ma
c
AD
DR
T
dsp
Master
ds
pS
lave
dspProtocol
ma
c
AD
DR
pollPOLL_ADDR
pollPOLL_ADDR
pollPOLL_ADDR
poll POLL_ADDR
OSModel
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 35
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 69
Cellphone Example: Pin-Accurate Model
Implementation synthesis in backend tools
Interface and high-level synthesis on hardware side
Firmware, RTOS and C synthesis on the software side
ARM_BF
ISR
lPIC
ARM_OS
ARM_HALARM_HW
AD
DR ARM7
DMA
DMA_BF
AD
DR
AD
DR
Mem
Mem_BF
AD
DR
DCT
DCT_IP
Arbiter
T_BF
DS
P_
BF
PIC
DSP_OS
DS
P_H
AL DS
P_H
W
DSP
l
AD
DR
OSModel
ISR
HW
AD
DR
HW_BF
SI
SI_BF
BI
ADDR,POLL_ADDR
BI_BF
SO
SO_BF
BO
BO_BF
Bridge
AD
DR
AD
DR
ADDR,POLL_ADDR
ADDR,POLL_ADDR
ADDR,POLL_ADDR
OSModel
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 70
Cellphone Example: Single-Processor Results
• MP3 decoder on ARM
• 55 MP3 frames
• JPEG encoder on ARM
• 30 116x96 pictures
• GSM vocoder on DSP
• 163 speech frames encoded/decoded
TLM speed
• 2000 Mcycles/s peak
• 300-600 Mcycl/s sustained
TLM accuracy
• <3% average frame timing error 0
5
10
15
20
25
Spec. N-TLM P-TLM BFM ISS/RTL
Ave
rag
e E
rro
r [%
]
MP3
JPEG
GSM
0.01
0.1
1
10
100
1000
10000
100000
Spec. N-TLM P-TLM BFM ISS/RTL
Sim
ula
tio
n T
ime
[s]
MP3
JPEG
GSM
Spec. Net. TLM PAM Impl.
Spec. Net. TLM PAM Impl.
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 36
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 71
Cellphone Example: Multi-Processor Results
• Experimental setup
• 1.5 second MP3
• 640x480 picture
• 1.5 speech GSM
3s / 300M ARM cycles / 180M DSP cycles
0
5
10
15
20
25
30
35
Appl. Task FW TLM BFM ISS
Ave
rag
e E
rro
r [%
]
1
10
100
1000
10000
100000
Sim
ula
tio
n T
ime [
s]Avg. Error
Sim. Time
0
5
10
15
20
25
30
35
Appl. Task FW TLM BFM ISS
Ave
rag
e E
rro
r [%
]
1
10
100
1000
10000
100000
Sim
ula
tio
n T
ime [
s]Avg. Error
Sim. Time
• TLM simulation speed
• 300 Mcycles/s
• TLM accuracy
• <3% error
Prototyping and exploration with 100% fidelity
Spec Arch Net TLM PAM ISSSpec. Sched. Net. TLM PAM Impl.
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 72
SCE Exploration Results
• Suite of industrial size examples
ExampleSystem Architecture(masters → slaves)
Model size (LOC) Generation timeSpec Arch Net PAM
JPEG A1 CF→HW 1806 2732 2780 4642 1.01 s
Vocoder
A1 DSP→HW
7385
9594 9775 10679 4.77s
A2 DSP→HW1,HW2 9632 9913 10989 5.21s
A3 DSP→HW1,HW2,HW3 9659 9949 11041 5.76s
Mp3float
A1 CF→HW1
6900
28190 28204 29807 5.86s
A2CF→HW1,HW2,HW3HW1↔HW3HW2↔HW3
28275 28633 31172 6.18s
A3CF →HW1, HW2, HW3, HW4HW1↔HW3↔HW5HW2↔HW4↔HW5
28736 30202 32795 17.69s
Mp3fix
A1 ARM→2 I/O
13363
17131 17270 21593 3.85s
A2 ARM→2 I/O, LDCT, RDCT 18300 18564 23228 7.01s
A3ARM→2 I/O, LDCT, RDCTLDCT↔I/ORDCT↔I/O
18748 19079 24471 7.40s
Baseband A1DSP→HW, 4 I/O, TCF, DMA→Mem, BR, T, DMABR→DCT_IP
11481 17020 17685 21711 8.99s
Cellphone A1ARM→4 I/O, 2 DCT, TLDCT, RDCT→I/ODSP→HW, 4 I/O, T
16441 21936 22570 30072 9.49s
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 37
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 73
Lecture 6: Outline
Synthesis
System-level synthesis process
Refinement
Modeling flow
The SpecC methodology
• System-on-Chip Environment (SCE)
Tool flow
• Setup and tutorial
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 74
System-on-Chip Environment (SCE)
• Server and accounts
• ECE LRC Linux servers– Labs (ENS 507) or remote access (ssh)
• SpecC software (© by CECS, UCI)– /usr/local/packages/sce-20100908
– module load sce
• SCE tool set
• GUI– sce, sced, scchart
• Scripting– sce_allocate / sce_map / sce_schedule / …
• Documentation ($SPECC/doc)– SCE Manual (online via Help->Manual)
– SCE Tutorial (PDF or html)– SCE Specification Reference Manual (SpecRM.pdf)
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 38
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 75
SCE Tool Flow
• SCE Components:• Graphical frontend
(sce, scchart)
• Editor (sced)
• Compiler and simulator (scc)
• Profiling and analysis (scprof)
• Architecture refinement (scar)
• RTOS refinement (scos)
• Network refinement (scnr)
• Communication refinement (sccr)
• RTL refinement (scrtl)
• Software refinement (sc2c)
• Scripting interface (scsh)
• Tools and utilities ...
Architecture model
Specification model
Architecture Exploration
PE Allocation
Beh/Var/Ch Partitioning
Scheduled model
Scheduling Exploration
Static Scheduling
OS Task Scheduling
Network model
Network Exploration
Bus Network Allocation
Channel Mapping
Communication model
Communication Synth.
Bus Addressing
Bus Synchronization
PEDatabase
CEDatabase
BusDatabase
OSDatabase
GU
I / S
crip
ting
TLM
RTL Synthesis
Datapath Allocation
Scheduling/Binding
SW Synthesis
Code Generation
Compile and Link
RTLDB
SWDB
Implementation model
VerilogVerilog
VerilogBinary
BinaryBinary
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 76
SCE Main Window
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 39
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 77
SCE Source Editor
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 78
SCE Hierarchy Displays
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 40
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 79
SCE Compiler and Simulator
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 80
SCE Profiling and Analysis
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 41
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 81
GSM Vocoder Tutorial
• Enhanced Full Rate (EFR) standard (GSM 06.10, ETSI)• Lossy voice encoding/decoding for mobile telephony
– Speech synthesis model
– Input: speech samples @ 104 kbit/s
– Frames of 4 x 40 = 160 samples (4 x 5ms = 20ms of speech)
– Output: encoded bit stream @ 12.2 kbit/s (244 bits / frame)
• Timing constraint– 20ms per frame (total of 3.26s for sample speech file)
SpecC specification model
• 73 leaf, 29 hierarchical behaviors (9 par, 10 seq, 10 fsm)
• 9139 lines of SpecC code (~13000 lines of original C)
Short-termsynthesis filter
+Delay/ Adaptive codebook
10th-order LP filter
Speech
Fixed codebook
Long-termpitch filter
Residual pulses
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 82
Vocoder Specification Model
Filter memory
update
Closed-loop
pitch search
Algebraic (fixed)
pitch search
Linear prediction
(LP) analysis
Open loop
codebook search����
����
����
����
��
���
���
�����
�����
��
����
��
��������
��������
��
decode_12k2
Post_Filter
Bits2prm_12k2
Decode
LP parameters
4 subframes
bits
speech[160]
A(z)
synth[40]
synth[40]
prm[57]
prm[13]
decoder
��
��
��
��
��
����
����
���
���
����
����
����
�� ��
����
����
����
��
��
��
��
��������
��������
��������
��������
��������
��������
mem
ory
2x per frame
A(z)
2 subframes
prm2bits_12k2
pre_process
sample
prm[57]
bits
speech[160]
coder
vocoderspeech_in bits_in
speech_outbits_out
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 42
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 83
Vocoder Computation Model
res
Motorola DSP56600
Custom hardware
data
����������������
speech_in
Codebook
HW
prm_in���
���
���
��� Bits2PrmSpeech_In
cdbk_res
cdbk_data
���
���
��������
���
���
��������
��������
speech_out
��������
Speech_Out���
���
��������
���
���
��������
Prm2Bits���
���
���
���
prm_out
Post_Filter
Decode_12k2
D_lsp
Pre_process
LP_analysis
Open_loop
Closed_loop
Start_codebook
Wait_codebook
Update
res
data
speech
prm_in
prm
speechprm
prm_in
synth_out
speech_in
bits_out
DSP
bits_in
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 84
Vocoder Communication Model
nWR
Data[23:0]
intC
Addr[15:0]
MCS
nRD
Bus InterfaceBus Driver
Bus InterfaceBus Interface Bus Interface Bus Interface
HWCodebook
Speech_In Bits2Prm Prm2Bits Speech_Out
DSP
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 43
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 85
Vocoder Implementation Model
Cycle-accurate co-simulation
• DSP instruction-set simulator (ISS)• 70,500 lines of assembly code (running @ 60MHz)
• RTL SpecC for hardware• 45,000 lines of VHDL RTL code (running @ 100MHz)
nWR
Data[23:0]
intC
Addr[15:0]
MCS
nRD
CodebookFSMD
MotorolaDSP56600
ISSOBJ
DSP HW
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 86
Vocoder Results
• Productivity gain: 12 months vs. 1 hour = 2000x
Comm Impl
AutomatedUser / Refine
ManualModified
lines
Spec Arch
Arch Comm
3,275
914
30 min / < 2 min5~6 month
5 min / < 0.5 min
3~4 month
6,146
15 min / < 1 min
1~2 month
Refinement Effort
Total 50 min / < 4 min9~12 month10,355
Simulation Speed
1
10
100
1000
10000
100000
Spec Arch Comm Impl
Norm
aliz
ed T
im
Code Size
0
5000
10000
15000
20000
Spec Arch Comm Impl
Num
ber
of Lin
e
EE382V: Embedded Sys Dsgn and Modeling
Lecture 6
© 2014 A. Gerstlauer 44
EE382V: Embedded Sys Dsgn and Modeling, Lecture 6 © 2014 A. Gerstlauer 87
Lecture 6: Summary
• System-level synthesis
• X-chart
• System-level refinement
• SpecC methodology
• Flow of models at increasing level of detail
• System-on-Chip Environment (SCE)
• Automatic model generation