High-Level Synthesis, TLM Power State Machines, and advanced tracing for Virtual Platforms
Philipp A. Hartmann
OFFIS Institute for Information Technology, R&D Division Transportation
16 March 2012
Quo Vadis, Virtual Platforms? QVVP’2012, Dresden
Philipp A. Hartmann QVVP’2012 16 March 2012
1 Motivation

Problem
- Power consumption becomes increasingly important for (mobile) embedded systems

Background
- Limited increase in battery capacity
- Improved power management needed

[Figure: technology trends over the mobile generations 1G, 2G, 2.5G, 3G, 4G; x-axis 1980 to 2020, logarithmic y-axis 1 to 10,000,000. Annotated trends: Shannon's law (2x in 8.5 months), Moore's law (2x in 18 months), memory access time (2x in 12 years), Eveready's law (battery energy density, 2x in 10 years). Further labels: performance, algorithmic complexity, CPU-memory bandwidth, power reduction. Source: Jan M. Rabaey]

Challenges
- Complex, distributed HW/SW systems
- Heterogeneous hardware platforms
- Optimization needs real usage scenarios

Virtual Prototype based solution
- Full system simulation at high speed
- Block level power tracing
- Deterministic, repeatable scenarios
2 Outline
1 COMPLEX Virtual Platform Estimation Flow
2 Power-Aware High-Level Synthesis
3 Non-invasive TLM Power State Machines
4 Advanced tracing for TLM Virtual Platforms
5 Conclusion
3 Outline: COMPLEX Virtual Platform Estimation Flow
1 COMPLEX Virtual Platform Estimation Flow
   - Hardware/Software Task separation
   - Component-level Power & Timing Estimation
   - Virtual System Generation
2 Power-Aware High-Level Synthesis
3 Non-invasive TLM Power State Machines
4 Advanced tracing for TLM Virtual Platforms
5 Conclusion
4 Basic Approach / Enabling Technologies (COMPLEX Virtual Platform Estimation Flow)

- Extra-functional model for timing and power
  - Explicit separation of functional and extra-functional model
  - Activity model for power
  - Scalable physical/technology power model for frequency, supply voltage, and temperature
- Automatic timing and power annotation techniques
  - Embedded Software: Timing & Power annotation based on cross-compiled binary
  - Custom Hardware: Timing & Power annotation from power-aware HL-synthesis
  - Black-Box Hardware IP: Power State Machines instead of power annotation
- Scalable Timing and Power Tracing infrastructure
  - Timing and Power Tracing Streams per observable VP component
  - Processing through filters (e.g. aggregation, averaging, selection)
  - Dynamic granularity (e.g. area of interest)
5 Virtual Platform Power and Timing Annotation Flow (General Overview)

[Figure: estimation flow. Executable specification: parallel application description (a), application scenario input stimuli (b), user-constrained HW/SW separation & mapping (c), architecture/platform description (d). Hardware/Software task separation (e). Estimation & model generation: hardware & software estimation, quick synthesis, and functional, power & timing model generation for HW tasks and SW tasks (f), each producing BAC++; pre-existing IP & virtual component models with PSM (g). Virtual system generator with TLM2 interface synthesis, using the mapping information (h). Timing & power aware executable virtual system prototype in SystemC (i). Simulation output: dynamic power over time (j).]

- Executable specification
  a) Task graph model of application
  b) Application scenario stimuli
  c) Task to platform resource mapping
  d) Processing, Communication and Memory blocks
- Estimation & model generation
  e) Extraction of task's behaviour
  f) Power & Timing Estimation and back-annotation to input model
  g) HW IP components with power and timing information
  h) Assemble Virtual Platform
- Simulation
  i) Executable power-aware VP
  j) Configurable tracing
6 Hardware/Software Task Mapping and Separation (COMPLEX Virtual Platform Estimation Flow)

[Figure: executable task model (a) and architecture/resource model (d) with Bus, ASIC, Arbiter, CPU, IP-core and MEM; task mapping produces the internal mapped task model (c), which is followed by task separation (e).]

- Starting from
  - Executable task model
  - Platform / resource model
- Task mapping
  - Assigning application tasks to platform resources
  - Custom Hardware, Software, IP blocks
- Automatic extraction of task behaviour for estimation
  - Prepare input component models for power and timing estimation point-tools
  - Manual specification of Power State Machines for IP components
  - Back-annotation for Virtual Platform simulation
7 Hardware/Software Task Estimation and Back-Annotation (COMPLEX Virtual Platform Estimation Flow)

[Figure: the executable task model (a) and the architecture/resource model (d) with Bus, ASIC, Arbiter, CPU, IP-core and MEM are combined by task mapping into the internal mapped task model (c); after task separation (e), tasks are passed to power and timing estimation tools with back-annotation (Tool A, Tool B, Tool C). A task is shown as a graph of communication (C) and computation nodes; a computation node is a control data flow graph of basic blocks and branches. A non-functional timing & power model (number of cycles, switched capacitance) is attached through a trigger/link between the functional and the non-functional model.]

- Forward individual task behaviour to the appropriate estimation tool for timing and power back-annotation
8 Outline: Power-Aware High-Level Synthesis
1 COMPLEX Virtual Platform Estimation Flow
2 Power-Aware High-Level Synthesis
   - Hardware Basic Blocks
   - Back-Annotation and Model Generation
3 Non-invasive TLM Power State Machines
4 Advanced tracing for TLM Virtual Platforms
5 Conclusion
9 Custom Hardware Estimation Flow (Power-Aware High-Level Synthesis)

[Figure: existing flow with HW task, CDFG, SystemC frontend, high-level synthesis, controller synthesis and RT data path. New approach: design characterisation (leakage, clock-tree power, controller power, …), hardware basic block identification (basic block information), hardware basic block characterisation (dyn. power, timing, …), and a block annotated C++ writer producing augmented SystemC (HW-BAC++).]

- Integration into PowerOpt HLS technology (now provided by OFFIS)
  1 Perform "classical" High-Level synthesis
  2 Power characterisation of design properties
  3 Identification and characterisation of individual control-steps
  4 Generation of fast and accurate functional models with power and timing
10 Characterisation of design properties (Power-Aware High-Level Synthesis)

Design characterisation (leakage, clock-tree power, controller power, …)

Leakage
- Leakage estimate is provided by PowerOpt base technology, after the data path is fixed
- Modelled as apparent resistance to enable voltage scaling

Clock-tree power
- Assumed to be a constant offset to the power dissipation
- Modelled as an additional switched capacitance offset per cycle

Controller power
- Estimation provided by PowerOpt as average activity
- Assumed to be uniformly distributed per (active) control-step
11 Characterisation of individual control-steps (I) (Hardware Basic Blocks)

- Power-simulation of each individual RT component would be prohibitively slow
- Idea: Combine and characterise all activated RT components in each control-step into a Hardware Basic Block (HBB)
- Identification of active RT components
  - Determine target registers, enabled for storing updated inputs
  - Traverse backwards towards source registers, tracking multiplexer selects
  - Traverse forward to collect extra functionality that is active, but not needed for the result

[Figure: RT data path with registers R1 to R8, functional units (x, -, +, +), a multiplexer (inputs 0 and 1) and the controller; components are marked as active, inactive, or extra functionality.]

- Strictly sequential control-steps can optionally be combined into multi-cycle HBBs, further improving simulation performance
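
The backward traversal described above can be sketched as a small graph search. This is an illustrative sketch under assumptions, not the OFFIS implementation: the netlist representation (`Component`, `Kind`) is hypothetical, only the backward step is shown, and the forward pass that collects extra functionality is omitted.

```cpp
#include <set>
#include <string>
#include <vector>

enum class Kind { Register, Mux, FunctionalUnit };

// Hypothetical RT netlist node for one control-step.
struct Component {
    std::string name;
    Kind kind;
    std::vector<int> inputs;  // indices of the driving components
    int mux_select;           // for Kind::Mux: selected entry of `inputs`
};

// Returns the indices of all components that contribute to the values stored
// in the enabled target registers during this control-step.
inline std::set<int> active_components(const std::vector<Component>& netlist,
                                       const std::vector<int>& enabled_targets) {
    std::set<int> active;
    std::vector<int> work;
    for (int t : enabled_targets) {           // start at the target registers
        active.insert(t);
        for (int in : netlist[t].inputs) work.push_back(in);
    }
    while (!work.empty()) {
        const int c = work.back();
        work.pop_back();
        if (!active.insert(c).second) continue;        // already visited
        const Component& comp = netlist[c];
        if (comp.kind == Kind::Register) continue;     // source register reached
        if (comp.kind == Kind::Mux) {
            work.push_back(comp.inputs[comp.mux_select]);  // follow selected input only
        } else {
            for (int in : comp.inputs) work.push_back(in);
        }
    }
    return active;
}
```

Following only the selected multiplexer input is what keeps unselected functional units out of the active set; in the slides these would then be candidates for the "extra functionality" category if they still toggle.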
12 Characterisation of individual control-steps (II) (Hardware Basic Blocks)

Dynamic power model
- Simplest (and fastest) dynamic HBB power model uses average activity
- Based on stimuli provided during synthesis

[Figure: RT data path with registers R1 to R8, functional units (x, -, +, +), a multiplexer (inputs 0 and 1) and the controller; components are marked as active, inactive, or extra functionality.]

- Control-step activity is assumed to be the sum of the average switched capacitance of each activated component:

  \[ A = \sum_{n=1}^{N} \frac{1}{M_n} \sum_{m=1}^{M_n-1} \alpha(\nu_n, \mathrm{pattern}_{m-1}, \mathrm{pattern}_m) \]

  with
  - active components $\nu_1, \ldots, \nu_N$
  - $M_n$ is the number of stimuli applied to $\nu_n$
  - $\alpha(\nu_n, \ldots)$ is the capacitance switched by applying the given patterns consecutively
- On-going research is evaluating probabilistic models to address internal correlations (data dependencies, inter-block dependencies)
- Multi-cycle HBBs can be combined by averaging (with loss of resolution)
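
As an illustration of the activity formula, the sketch below evaluates A for a set of active components. The slides do not say how α is obtained; here it is assumed, purely for illustration, to be the Hamming distance between consecutive 32-bit patterns times a per-bit capacitance. `ComponentStimuli` and `cap_per_toggle` are hypothetical names.

```cpp
#include <bitset>
#include <cstddef>
#include <cstdint>
#include <vector>

// Stimuli recorded for one active component nu_n (hypothetical layout).
struct ComponentStimuli {
    double cap_per_toggle;                // assumed capacitance switched per toggled bit
    std::vector<std::uint32_t> patterns;  // pattern_0 ... pattern_{M_n - 1}
};

// Assumed alpha: capacitance switched when `cur` is applied after `prev`.
inline double alpha(const ComponentStimuli& c, std::uint32_t prev, std::uint32_t cur) {
    return c.cap_per_toggle * static_cast<double>(std::bitset<32>(prev ^ cur).count());
}

// A = sum over n of (1 / M_n) * sum over m = 1 .. M_n - 1 of alpha(nu_n, pattern_{m-1}, pattern_m)
inline double control_step_activity(const std::vector<ComponentStimuli>& active) {
    double a = 0.0;
    for (const ComponentStimuli& c : active) {
        const std::size_t M = c.patterns.size();
        if (M == 0) continue;  // no stimuli, no contribution
        double sum = 0.0;
        for (std::size_t m = 1; m < M; ++m) {
            sum += alpha(c, c.patterns[m - 1], c.patterns[m]);
        }
        a += sum / static_cast<double>(M);
    }
    return a;
}
```

For example, a component with unit capacitance per toggle and the patterns 0x0, 0xF, 0xE, 0x0 toggles 4 + 1 + 3 = 8 bits over M = 4 stimuli and contributes 8 / 4 = 2 to A.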
13 Back-Annotation and Model Generation (Power-Aware High-Level Synthesis)

- Generated module container for generic white-box IP blocks
  - Functional and power models
  - Communication & Tracing
- The functional model implements the function-call of the original module
  - Execute the behaviour
  - Perform (extra-functional) simulation steps, until functional model finishes
- The functional model consists of hierarchical, plain C++ "processes"
  - RT data path (hardware basic blocks)
  - Controller (switch statement)
  - Possibly sub-processes

[Figure: BAC++ HW module container with TLM2 interface. It contains the functional model<reg_map_t>, built from Process<…> instances with RT data path and controller, the extra-functional model, an observer, LRM and Call register maps with their interfaces (LRM IF, Call IF), and global trace file generation.]
14 Outline: Non-invasive TLM Power State Machines
1 COMPLEX Virtual Platform Estimation Flow
2 Power-Aware High-Level Synthesis
3 Non-invasive TLM Power State Machines
   - TLM Observable Sockets
   - Protocol State Machine
   - Power State Machine
4 Advanced tracing for TLM Virtual Platforms
5 Conclusion
15 Non-invasive TLM Power State Machines (General overview)

Goal: Add extra-functional power model to black-box IP components (memories, interconnects, accelerators, ...)

Ingredients: Power State Machine (PSM), Protocol State Machine (PrSM), Transaction Snooping

[Figure: the IP component is described by a datasheet and a TLM-2.0 functional model. Transaction information is observed at the functional model; together with the register interface description it triggers the PrSM, which maintains extended state variables and triggers the PSM. The PSM maps operations (Op1, ..., Opn) to power information from the datasheet (e.g. 5 mW, 7 mW, ...) and feeds the tracing.]
16 Observation of TLM-2 communication (Non-invasive TLM Power State Machines)

- Approximate internal state by observing the interaction with environment
- Generation of transparent PSM wrapper with special observable sockets
  - Transaction forwarding to/from component
  - Bookkeeping and protocol handling (LT/AT)

[Figure: wrapper around the IP component, with four elements labelled BP and a transaction management block (Txn Mgmnt).]

- Two new convenience socket types
  - tlm_utils::tlm_observable_initiator_socket<BUSWIDTH>
  - tlm_utils::tlm_observable_target_socket<BUSWIDTH>
- Observer infrastructure to register and trigger Protocol State Machine transition conditions
  - Other use-cases supported as well
  - Details presented at ESCUG24 @ FDL’2011
17 Protocol State Machine (Non-invasive TLM Power State Machines)

- Following the register interface description, a Protocol State Machine is defined
- Abstracts from TLM-2 artefacts towards application
  - PrSM is triggered by transaction conditions, matching specific transaction properties
    - Address, data, command, phase, ...
    - User-defined conditions supported
- Triggers PSM and may update extended state
  - Separate address ranges into categories
    - Configuration data
    - Control data
    - Payload data

[Figure: the PrSM is derived from the register interface description in the datasheet; it triggers the PSM and shares extended state variables with it.]
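
A Protocol State Machine of this kind can be sketched in plain C++ without the TLM-2.0 library. Everything below is an assumption made for illustration: `Transaction` is a simplified stand-in for a generic payload, the register map (0x00 configuration, 0x04 control, everything else payload) is invented, and the event names are arbitrary.

```cpp
#include <cstdint>
#include <string>
#include <vector>

enum class Command { Read, Write };

// Simplified stand-in for an observed TLM-2.0 transaction.
struct Transaction {
    Command cmd;
    std::uint64_t address;
    std::uint32_t data;
};

enum class Category { Configuration, Control, Payload };

// Extended state variables shared with the PSM (hypothetical content).
struct ExtendedState {
    std::uint32_t clock_divider = 1;
};

class ProtocolStateMachine {
public:
    // Hypothetical register map used to separate the address ranges.
    static Category classify(std::uint64_t address) {
        if (address == 0x00) return Category::Configuration;
        if (address == 0x04) return Category::Control;
        return Category::Payload;
    }

    // Evaluates the transaction conditions and returns the PSM events to trigger.
    std::vector<std::string> observe(const Transaction& t) {
        std::vector<std::string> events;
        switch (classify(t.address)) {
        case Category::Configuration:  // only updates the extended state
            if (t.cmd == Command::Write) state_.clock_divider = t.data;
            break;
        case Category::Control:        // bit 0 enables or disables the component
            if (t.cmd == Command::Write)
                events.push_back((t.data & 1u) ? "enable" : "disable");
            break;
        case Category::Payload:        // data accesses map to operations
            events.push_back(t.cmd == Command::Read ? "read" : "write");
            break;
        }
        return events;
    }

    const ExtendedState& state() const { return state_; }

private:
    ExtendedState state_;
};
```

In a real platform the `observe` call would be driven by the observer registered at an observable socket, so the wrapped IP model itself stays untouched.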
18 Power State Machine (Non-invasive TLM Power State Machines)

- The current state of the Power State Machine provides the average switched capacitance per cycle
- States (and transitions) report updates to the tracing, according to the current power model

[Figure: the PSM maps operations (Op1, ..., Opn) to power information from the datasheet (e.g. 5 mW, 7 mW, ...).]

- Determine relevant power states from the vendor's datasheet
  - Or by using low-level power simulations
  - Needs to be done manually
- Transitions are triggered
  - Externally via explicit PSM events
  - Internally via timeouts
- Extended state variables can be read to influence
  - Current activity
  - Timeout expressions
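
The two trigger mechanisms, explicit events and timeouts, can be sketched as follows. This is a simplified illustration rather than the presented library: states carry a fixed power in mW (as in the datasheet figure) instead of a switched capacitance per cycle, extended state variables are left out, and the class and method names are hypothetical.

```cpp
#include <map>
#include <string>
#include <utility>

class PowerStateMachine {
public:
    struct State {
        double power_mw;             // average power while in this state
        double timeout_ns;           // 0 means no timeout
        std::string timeout_target;  // state entered when the timeout expires
    };

    void add_state(const std::string& name, State s) { states_[name] = s; }

    void add_transition(const std::string& from, const std::string& event,
                        const std::string& to) {
        transitions_[std::make_pair(from, event)] = to;
    }

    void reset(const std::string& initial) { enter(initial); }

    // External trigger: explicit PSM event, ignored if no transition matches.
    void notify(const std::string& event) {
        auto it = transitions_.find(std::make_pair(current_, event));
        if (it != transitions_.end()) enter(it->second);
    }

    // Advance local time, accounting energy and firing internal timeouts.
    void advance(double ns) {
        while (ns > 0.0) {
            const State& s = states_.at(current_);
            const double left = s.timeout_ns - elapsed_ns_;
            if (s.timeout_ns > 0.0 && ns >= left) {
                energy_pj_ += s.power_mw * left;  // mW * ns = pJ
                ns -= left;
                enter(s.timeout_target);
            } else {
                energy_pj_ += s.power_mw * ns;
                elapsed_ns_ += ns;
                ns = 0.0;
            }
        }
    }

    const std::string& current() const { return current_; }
    double energy_pj() const { return energy_pj_; }

private:
    void enter(const std::string& to) { current_ = to; elapsed_ns_ = 0.0; }

    std::map<std::string, State> states_;
    std::map<std::pair<std::string, std::string>, std::string> transitions_;
    std::string current_;
    double elapsed_ns_ = 0.0;
    double energy_pj_ = 0.0;
};
```

With an "idle" state at 5 mW and an "active" state at 7 mW that times out after 100 ns, spending 10 ns idle, triggering "start" and then advancing 150 ns yields 50 + 700 + 250 = 1000 pJ.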
19 Outline: Advanced tracing for TLM Virtual Platforms
1 COMPLEX Virtual Platform Estimation Flow
2 Power-Aware High-Level Synthesis
3 Non-invasive TLM Power State Machines
4 Advanced tracing for TLM Virtual Platforms
5 Conclusion
20 Advanced tracing for TLM Virtual Platforms (General overview)

- Problem: Flexible tracing of physical quantities not directly possible in SystemC
  - sc_core::sc_trace not flexible enough (tied to simulation time)
  - sca_util::sca_trace is SystemC AMS-specific and not widely supported
  - SCV transaction recording not really appropriate
- Goal: Enable flexible and configurable tracing of extra-functional properties in TLM-2-based virtual platforms
- Integration with temporal decoupling
  → Independence of current simulation time, backwards and forwards
- Hierarchical pre-processing
  - Filtering and data reduction (aggregation, averaging, selection)
  - Collection of user-defined performance metrics
  - Run-time configurable granularity (Region of Interest)
Philipp A. Hartmann QVVP’2012 16 March 2012
I 21 Stream-based tracing of extra-functional properties (Advanced tracing for TLM Virtual Platforms)
I Tracing is based on (time, value) streams per component
  I Streams are hierarchically named SystemC objects
  I Strongly typed values for type safety, with support for physical units
  I Fine-grained control over (local) time offset, synchronisation with a local clock
  I push APIs for both absolute (start, value, [duration]) and relative (duration, value) tuples
  I Non-overlapping tuples enforced by MergePolicy
I Hierarchy of (user-defined) stream preprocessors (and sinks)
  I Streams can be processed by an extensible set of (pre-)processors
  I Hierarchically connected during simulation
  I Sinks can write streams to storage backends for offline analysis
I Automatic separation and merging of multiple source processes (initiators), optionally supported for temporal decoupling with overlaps
  I Example: Accumulate overlapping power consumptions due to loss of time resolution
    → retains total energy consumption
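The accumulating merge from the example above can be sketched in plain C++ (no SystemC dependency). The sketch accepts absolute and relative pushes and produces non-overlapping tuples whose value is the sum of all overlapping contributions, so the total energy is unchanged. The names `PowerStream`, `push_absolute`, `push_relative`, `merged` and `energy` are hypothetical and chosen for this illustration; the actual stream and MergePolicy interfaces are not shown on the slides.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <map>
#include <vector>

// Illustrative only: hypothetical names, not the COMPLEX API.
using Time = std::int64_t;  // integer ticks (e.g. ns) keep interval bounds exact

struct Tuple {
    Time start;
    Time duration;
    double value;  // power in watts
};

class PowerStream {
public:
    // Absolute push: the tuple may lie before or after the current local time.
    void push_absolute(Time start, double value, Time duration) {
        assert(duration > 0);
        Edge& begin = edges_[start];
        begin.power += value;
        begin.active += 1;
        Edge& end = edges_[start + duration];
        end.power -= value;
        end.active -= 1;
    }

    // Relative push: the tuple starts at the local time, which then advances.
    void push_relative(Time duration, double value) {
        push_absolute(local_time_, value, duration);
        local_time_ += duration;
    }

    // Accumulating merge: sweep over all interval edges and emit one tuple
    // per span in which at least one contribution is active.
    std::vector<Tuple> merged() const {
        std::vector<Tuple> out;
        double level = 0.0;  // summed power of all active contributions
        int active = 0;      // number of active contributions
        for (auto it = edges_.begin(); it != edges_.end(); ++it) {
            level += it->second.power;
            active += it->second.active;
            auto next = std::next(it);
            if (active > 0 && next != edges_.end())
                out.push_back(Tuple{it->first, next->first - it->first, level});
        }
        return out;
    }

private:
    struct Edge {
        double power = 0.0;
        int active = 0;
    };
    std::map<Time, Edge> edges_;
    Time local_time_ = 0;
};

// Energy of a tuple list: sum of power times duration (watt-ticks).
double energy(const std::vector<Tuple>& tuples) {
    double sum = 0.0;
    for (const Tuple& t : tuples) sum += t.value * static_cast<double>(t.duration);
    return sum;
}
```

For instance, two initiators pushing 2 W over ticks [0, 10) and 3 W over ticks [5, 15) yield the three tuples (0, 5, 2 W), (5, 5, 5 W) and (10, 5, 3 W). Their energy is 50 watt-ticks, the same as the sum of the two inputs.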
I 22 Example: Variable granularity (Advanced tracing for TLM Virtual Platforms)
I Preprocessors enable configurable tracing granularity
[Figure: three plots of dynamic power over time: fine-grained (at basic blocks), coarse-grained (at sync. points), variable-grained (at deviation)]
I Trade-off between resolution and simulation speed
I Selectable for each task/stream individually
I Adjustable during simulation time
[Figure: Communication Graph and Control Data Flow Graph; legend: Communication, Computation, Basic Block, Branch]
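One possible reading of "variable-grained (at deviation)" is sketched below in plain C++. It starts a new output tuple only when an incoming value deviates from the currently open segment by more than a threshold, and otherwise folds the sample into a duration-weighted average, which preserves energy. The class name `DeviationFilter`, its methods, and the choice of comparing against the running average are assumptions for illustration, not the preprocessor shown in the talk.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Illustrative only: hypothetical names, not the COMPLEX API.
struct Sample {
    double duration;  // seconds
    double value;     // dynamic power in watts
};

class DeviationFilter {
public:
    explicit DeviationFilter(double threshold) : threshold_(threshold) {}

    // Granularity can be adjusted while the simulation is running.
    void set_threshold(double threshold) { threshold_ = threshold; }

    // Relative push of a fine-grained sample (e.g. one basic block).
    void push(double duration, double value) {
        assert(duration > 0.0);
        if (open_ && std::fabs(value - current_.value) <= threshold_) {
            // Small deviation: extend the open segment, keeping its energy.
            const double total = current_.duration + duration;
            current_.value =
                (current_.value * current_.duration + value * duration) / total;
            current_.duration = total;
        } else {
            // Large deviation (or first sample): close the segment, start anew.
            flush();
            current_ = Sample{duration, value};
            open_ = true;
        }
    }

    // Emit the open segment, e.g. at a synchronisation point.
    void flush() {
        if (open_) {
            out_.push_back(current_);
            open_ = false;
        }
    }

    const std::vector<Sample>& output() const { return out_; }

private:
    double threshold_;
    Sample current_{0.0, 0.0};
    bool open_ = false;
    std::vector<Sample> out_;
};
```

In this sketch a threshold of zero forwards every change in value, which approximates the fine-grained trace, while a very large threshold combined with `flush()` at synchronisation points gives the coarse-grained one. Using one filter instance per stream matches the point that granularity is selectable for each task or stream individually.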
I 23 Outline (Conclusion)
1 COMPLEX Virtual Platform Estimation Flow
2 Power-Aware High-Level Synthesis
3 Non-invasive TLM Power State Machines
4 Advanced tracing for TLM Virtual Platforms
5 Conclusion
I 24 Conclusion
I COMPLEX partners work on advanced Virtual Platform technologies enabling
  I Component-level power state tracing
  I Co-analysis with software
  I Debugging with power and timing information
  I Fast design-space exploration
  I Derivation of power management strategies
I through
  I Automatic timing and power annotation techniques and tools for Virtual Platform Components (Embedded Software, Custom Hardware, and Hardware IP)
  I Extra-functional scalable executable model for timing and power
  I Scalable timing and power tracing infrastructure
I 25 Thanks for your attention!
For more information visit: http://complex.offis.de
[Partner logos: Industry Partners, EDA Tool Partners, Research Partners, Dissemination Partner]