Post on 02-Jan-2016
Hardware-Software Co-partitioning for Distributed
Embedded Systems
2
Outline
1. Introduction
2. Related Work
3. Distributed Embedded System and System Model
4. Multi-Level Partitioning
5. Case Study
3
1. Introduction
• Hardware-Software Codesign
• Distributed Embedded System
• Motivation
    Task Graph
    Physical Restrictions
• Distributed Embedded System Codesign (DESC)
    Object Modeling Technique (OMT)
    Linear Hybrid Automata (LHA)
    SES Models
4
1. Introduction (cont’)
• Multi-Level Partitioning
    Partitioning Algorithm
    Sharing, Clustering
• Case Studies
5
2. Related Work
• Target Embedded System
    1-CPU and 1-ASIC Topology
    n-CPU and m-ASIC Topology
        Optimal Codesign
        Heuristic Codesign
6
2. Related Work (cont’)
• Codesign of 1-CPU and 1-ASIC Topology
    Kumar et al. 1993
    Kalavade and Lee 1993
    Thomas et al. 1993
    Gupta and De Micheli 1993
    Barros et al. 1994
7
2. Related Work (cont’)
• Codesign of n-CPU and m-ASIC Topology
    Optimal Codesign Approaches:
        Mixed integer linear programming
            Prakash and Parker 1992
        Exhaustive search
            Wolf 1994, Haworth et al. 1993
            D’Ambrosio and Hu 1994
8
2. Related Work (cont’)
    Heuristic Codesign Approaches: Iterative and Constructive
        Iterative:
            Dick and Jha 1998 --- MOGAC, CORDS
            Dick and Jha 1999 --- MOCYN
9
2. Related Work (cont’)
        Constructive:
            Wolf 1996 --- object-oriented
            Yen and Wolf 1996 --- sensitivity-driven
            Dave, Lakshminarayana, and Jha 1999 --- COSYN
            Dave and Jha 1999 --- COFTA
            Dave and Jha 1998 --- COHRA
    Our proposed approach: Distributed Embedded System Codesign (DESC)
10
3. Distributed Embedded Systems and System Models
• An embedded computer system is a system that uses computers but is not a general-purpose computer.
• In 1971, there were about 142,000 computers worldwide.
• By 1999, there were some 350 to 400 million personal computers alone, and at least an order of magnitude more embedded devices.
11
3. Distributed Embedded Systems and System Models (cont’)
• There are several reasons to build a distributed hardware engine for an embedded system:
    Cheaper
    Faster response time
    The devices under control may be physically distributed
12
3. Distributed Embedded Systems and System Models (cont’)
• System Models
    Object Modeling Technique (OMT) Models
        Object Model
        Dynamic Model
        Functional Model
13
3. Distributed Embedded Systems and System Models (cont’)
    Linear Hybrid Automata (LHA) Models
        Internal system model
        For verifying systems
    SES Models
        SES/workbench is a popular modeling and simulation tool for system performance evaluation
14
4. Multi-Level Partitioning
• Multi-Level Partitioning (MLP)
    Three Main Phases
        Codesign Space Exploration (CSE)
        System Structural Partitioning (SSP)
        Binary Search Copartitioning (BSC)
Overall Flow Chart of Multi-Level Partitioning (figure)
    Levels: Initialization, CSE level, SSP level, BSC level
    Explore Design Space: number of CPU and hardware cost
    Generate Structural Partition: CPU allocation to distributed subsystems
    Copartitioning
    Other blocks: CPU Sharing, ASIC Sharing, Hardware Clustering, Software Grouping
    Last Structural Partition? No: next structural partition
    Last Design Alternative? Yes: Output Heuristically Optimal Partition
Detailed Flow Diagram of Multi-Level Partitioning (figure)
    Inputs: OMT Models, LHA Models
    Initialization
        Place all objects of hardware parts into ILA and all other objects into MLA
    CSE level
        Select number of CPU and hardware cost (Explore Design Space)
    SSP level
        Allocate CPU to distributed subsystem (Generate Structural Partition)
    BSC level (Copartitioning)
        1. Calculate CPD ratios of each object in MLA
        2. Sort all MLA objects in ascending order of their CPD ratios
        3. Select an object with median CPD ratio
        4. Use software to implement all objects with CPD ratios not less than that of the selected median object
        5. Use hardware to implement all objects with CPD ratios less than that of the selected median object
        6. Check if the partition result satisfies system constraints
            Cost and performance constraints are satisfied: check if the partition is a heuristically optimal solution; if yes, store structural partition result and perform sharing and clustering
            Cost constraint is not satisfied, but performance constraints are satisfied: increase software objects
            Cost constraint is satisfied, but performance constraints are not satisfied: increase hardware objects
            No satisfactory partition
        Branch labels: "Performance is more important", "Cost is more important"
    Loop control
        Last structural partition? No: next structural partition
        Last design alternative?
        Partition found? Yes: output least costly partition. No: print “No partition”
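The BSC level of the flow diagram can be sketched as a binary search over the CPD-sorted object list. This is a minimal sketch, not the DESC implementation: the names `Outcome`, `binary_search_copartition`, and `evaluate_vpms` are hypothetical, and the rule "after a feasible split, keep trying more software to lower the cost" is an assumption standing in for the "heuristically optimal" check. The usage data are the A, B, D, and F rows and the constraints of the VPMS case study in Section 5, looked up rather than computed.

```python
from typing import Callable, List, NamedTuple, Optional, Tuple


class Outcome(NamedTuple):
    cost: float
    cost_ok: bool   # cost constraint satisfied
    perf_ok: bool   # all performance constraints satisfied


def binary_search_copartition(
    objects_by_cpd: List[str],
    evaluate: Callable[[int], Outcome],
) -> Optional[Tuple[List[str], List[str]]]:
    """Return (hardware, software) object lists, or None if no partition is found.

    objects_by_cpd must be sorted in ascending CPD order. A split k maps the
    first k objects (lowest CPD) to hardware and the remaining ones to software.
    """
    lo, hi = 0, len(objects_by_cpd)
    best_k: Optional[int] = None
    best_cost = float("inf")
    while lo <= hi:
        k = (lo + hi) // 2      # first pass: objects below the median go to hardware
        outcome = evaluate(k)
        if outcome.cost_ok and outcome.perf_ok:
            if outcome.cost < best_cost:
                best_k, best_cost = k, outcome.cost
            hi = k - 1          # assumption: look for a cheaper split with more software
        elif outcome.perf_ok:   # only cost violated: increase software objects
            hi = k - 1
        elif outcome.cost_ok:   # only performance violated: increase hardware objects
            lo = k + 1
        else:                   # both violated: no satisfactory partition
            break
    if best_k is None:
        return None
    return objects_by_cpd[:best_k], objects_by_cpd[best_k:]


# VPMS rows F, D, B, A keyed by number of hardware objects:
# (cost in $, sensor-to-display time in µs, sensor-to-gate time in µs)
VPMS_ROWS = {
    0: (1225, 13200, 1030.0),
    1: (1250, 13100, 215.0),
    2: (1280, 190, 215.0),
    3: (1450, 190, 0.2),
}


def evaluate_vpms(k: int) -> Outcome:
    cost, display_us, gate_us = VPMS_ROWS[k]
    return Outcome(cost, cost <= 1300, display_us <= 14000 and gate_us <= 250)


result = binary_search_copartition(
    ["Sensor Driver", "Counter", "Motor Driver"], evaluate_vpms
)
print(result)
```

Under these assumptions the search first tries the median split (Sensor Driver in hardware), finds it feasible, then rejects the all-software split on gate response time, so it settles on the split that corresponds to partition D(SC, HS, SM).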
17
4. Multi-Level Partitioning (cont’)
CPD: Cost-Performance Difference
\[
\mathrm{CPD}(x) = \frac{\mathrm{Cost\_Hardware}(x) - \mathrm{Cost\_Software}(x)}{\left|\mathrm{Perf\_Hardware}(x) - \mathrm{Perf\_Software}(x)\right| \,/\, \mathrm{Perf\_Constraint}(x)}
\]
where x is an object
18
4. Multi-Level Partitioning (cont’)
• CPU/ASIC Sharing
    Sharing Threshold Distance (STD)
    SLI: Subsystem Location Inter-distance
    Sharing when 0 ≤ SLI ≤ STD; no sharing when SLI > STD
19
4. Multi-Level Partitioning (cont’)
Interconnect Cost (IC) Model
    IC(X1, X2) = α × SLI(S1, S2) × #Link(X1, S2) × BW(X1, S2) + EC(X1)
    SLI: Subsystem Location Inter-distance
    S1 and S2: subsystems
    X1 and X2: components (PE or ASIC)
    α: a parameter that depends on the interconnection technology
    #Link(X1, S2): the number of links between X1 and S2
    BW(X1, S2): the communication bandwidth between X1 and S2
    EC(X1): the cost for enhancing X1 such that both S1 and S2 can use X1
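The interconnect cost model is a single product plus an enhancement term, so it can be written directly as a function. This is a minimal sketch: the function name, parameter names, and all numeric values are hypothetical and chosen only to exercise the formula; the slides do not give units or values for α, #Link, BW, or EC.

```python
def interconnect_cost(alpha: float, sli: float, num_links: int,
                      bandwidth: float, enhancement_cost: float) -> float:
    """IC(X1, X2) = alpha * SLI(S1, S2) * #Link(X1, S2) * BW(X1, S2) + EC(X1)."""
    return alpha * sli * num_links * bandwidth + enhancement_cost


# Hypothetical inputs: two subsystems 0.5 m apart, three links to the shared component.
ic = interconnect_cost(alpha=2.0, sli=0.5, num_links=3,
                       bandwidth=10.0, enhancement_cost=40.0)
print(ic)  # 70.0
```

The distance-dependent term grows linearly with SLI, which is why sharing is only considered for subsystems closer than the sharing threshold distance.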
20
4. Multi-Level Partitioning (cont’)
Algorithm 5.2 Share Components Algorithm
Share_Components(s) {
    /* s = <s1, s2, …>, one entry per subsystem; |s| is the number of subsystems.
       si = (si1, si2), where si1 is the number of PE in subsystem Si and si2 is
       the number of ASIC in subsystem Si; si1, si2 ∈ {0, 1, …} */
    for (i = 1; i <= |s|; i++) {
        for (j = i + 1; j <= |s|; j++) {
            if (SLI(si, sj) <= STD) {   /* subsystems are close enough to share */
                if (si1 > 0 && sj1 > 0)
                    Share_PE(Si, Sj);   /* Refer to Algorithm 5.3 */
                if (si2 > 0 && sj2 > 0)
                    Share_ASIC(Si, Sj); /* Refer to Algorithm 5.4 */
            }
        }
    }
}
21
4. Multi-Level Partitioning (cont’)
• Hardware Clustering and Software Grouping
    In DESC, hardware clustering is based on the basic Kernighan-Lin graph partitioning algorithm, enhanced to include DEMS characteristics.
    Software grouping uses a technique similar to load balancing on multiple processors.
22
4. Multi-Level Partitioning (cont’)
• Analysis and Validation of MLP
    Complexity analysis
r: the number of objects : the number of subsystems
,...,0
)]__()([ _p
MLP timeClustertimeSharepspBSCtimeInit
)])(2()([loglog 2
,...,0 12 max rkkpCpsprrrr
p pkMLP
23
5. Case Studies
• Vehicle Parking Management System (VPMS)
• Examples of Sharing and Clustering in MLP
• Application of MLP to Coal Mine System
24
5. Case Studies (cont’)
• Vehicle Parking Management System (VPMS)
    VPMS Specifications
        A VPMS consists of three subsystems: ENTRY management, EXIT management, and DISPLAY.
        An ENTRY (or an EXIT) subsystem consists of three parts: a ticket facility, a gate controlled by a gate-motor, and a pair of sensors.
        A DISPLAY subsystem
25
5. Case Study (cont’)
    Constraints for the VPMS system
        A maximum cost of $1,300,
        A maximum display response time of 14,000 µs, and
        A maximum ENTRY (EXIT) gate response time of 250 µs.
26
5. Case Study (cont’)
    Specification and Mapping of VPMS
        VPMS is described using OMT models consisting of Object, Dynamic, and Functional models.
Object Model of VPMS (figure)
    Top level: Vehicle Parking Management System, with ENTRY Management System, EXIT Management System, and Display System
    Gate side: Gate Controller, Ticket Checker, Motor, Control Unit, ENTRY Gate, EXIT Gate (isa relations)
    Sensor side: Sensor, Send/Receive Device, Control Unit, ENTRY Sensor, EXIT Sensor (isa relations)
    Display side: Display Device, Control System, Counter, Display Interface, 7-Segment LCD, Dot Matrix, Time Stamp
Dynamic Model of a DISPLAY Subsystem (figure)
    States: Idle, Increment counter, Decrement counter, Update Display, Read count
    Events and conditions: Car in; Car out; Push time stamp button; Count > 0, send ACK!; Count = 0, out of space
Functional Model of a DISPLAY Subsystem (figure)
    Elements: ENTRY Sensor, EXIT Sensor, Counter
    Processes: Increment Counter, Decrement Counter, Update Display
    Data flows: Car in signal, Car out signal, Counter Data
30
5. Case Study (cont’)
• LHA Model of VPMS
    Hardware LHA Model
    Software LHA Model
Hardware LHA of a DISPLAY Subsystem (figure)
    Locations: Idle, Increment Counter, Decrement Counter, Update Display, Read Count
    Initialization: Count := 500, t := 0
    Events (each with t := 0): Car in, Car out, Push time stamp button
    Counter updates: Count := Count + 1 and Count := Count - 1, each with t = 42ns, t := 0
    Other timing labels: t = 100ns, t = 18ns
Software LHA of a DISPLAY Subsystem (figure)
    Locations: Polling, Increment Counter, Decrement Counter, Update Display, Read Count
    Initialization: Count := 500, t := 0, x := 0
    Events (each with t := 0): Car in, Car out, Push time stamp button
    Counter updates: Count := Count + 1 and Count := Count - 1, each with t = 3.2 µs, t := 0
    Other timing labels: t = 10 ms, t := 0, x := 0; t = 10 µs, t := 0; t = 5.12 µs, t := 0
    Guards on clock x against 33 ms, each with reset x := 0
33
5. Case Study (cont’)
• SES Models
    Using SES/workbench Model
        A car-simulator
        An ENTRY management subsystem
        An EXIT management subsystem
        A DISPLAY subsystem
34
5. Case Study (cont’)
SES Model of a DISPLAY Subsystem (figure)
35
5. Case Study (cont’)
• Applying MLP to VPMS
    Calculation of CPD for VPMS
    Parts           Hardware Cost   Software Cost   Hardware Performance   Software Performance   CPD
    Sensor Driver   115             90              210                    1,030                  7.622
    Counter         120             90              290                    13,200                 32.533
    Motor Driver    260             90              820                    1,030                  202.381
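The CPD column can be reproduced from the other four columns and the VPMS constraints. This is a minimal sketch that reads CPD as the hardware-software cost difference divided by the performance difference normalized by the applicable performance constraint. Which constraint applies to which part is an assumption (gate response time for the Sensor Driver and Motor Driver, display response time for the Counter); the function and variable names are hypothetical.

```python
def cpd(cost_hw: float, cost_sw: float, perf_hw: float,
        perf_sw: float, perf_constraint: float) -> float:
    """Cost difference divided by the constraint-normalized performance difference."""
    normalized_perf_gap = abs(perf_hw - perf_sw) / perf_constraint
    return (cost_hw - cost_sw) / normalized_perf_gap


GATE_LIMIT_US = 250        # maximum ENTRY (EXIT) gate response time
DISPLAY_LIMIT_US = 14_000  # maximum display response time

# (hardware cost, software cost, hardware performance, software performance, constraint)
parts = {
    "Sensor Driver": (115, 90, 210, 1030, GATE_LIMIT_US),      # constraint choice assumed
    "Counter":       (120, 90, 290, 13200, DISPLAY_LIMIT_US),  # constraint choice assumed
    "Motor Driver":  (260, 90, 820, 1030, GATE_LIMIT_US),      # constraint choice assumed
}

ratios = {name: round(cpd(*row), 3) for name, row in parts.items()}
ranking = sorted(ratios, key=ratios.get)  # ascending CPD, the order used by BSC

print(ratios)
print(ranking)
```

With these inputs the three ratios round to the values in the table, and the ascending order is Sensor Driver, Counter, Motor Driver, so the Sensor Driver is the first candidate for hardware.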
36
5. Case Study (cont’)
    Applying MLP to the VPMS Example
    Columns: Codesign Space Exploration (CSE) gives the number of CPU, SSP gives the partitions, and Binary Search Copartitioning (BSC) gives cost, response times, and feasibility
    Number of CPU   Partition         Cost ($)   Response time (µs)     Response time (µs)   Feasibility
                                                 (sensor to display)    (sensor to gate)
    0               A(HC, HS, HM)     1,450      190                    0.2                  No
    1               B(HC, HS, SM)     1,280      190                    215.0                Yes
                    C(HC, HS, S²M)    1,370      13,200                 820.0                No
    2               D(SC, HS, SM)     1,250      13,100                 215.0                Yes
                    E(S²C, HS, SM)    1,340      13,100                 210.0                No
    3               F(SC, SS, SM)     1,225      13,200                 1,030.0              No
    H: hardware, S: software; subscripts: C = Counter, S = Sensor Driver, M = Motor Driver;
    superscripts: 1 = one CPU, 2 = two CPUs, 3 = three CPUs
37
5. Case Study (cont’)
• VPMS Emulation
    Block Diagram for Prototype D(SC, HS, SM) (figure)
        Blocks: Single-chip Processor (8751) (shown twice), Time Stamp Machine, M Interface Entry gate, M Interface Exit gate, Display Device, Entry Sensor & Driver with Signal Processing, Exit Sensor & Driver with Signal Processing, Ticket Checker
        Input signals (i): Car in, Car out, Parking fees paid, Push time stamp button, Ticket taken
        Output signals (o): Display scan data, Open or Close (one per gate), Acknowledgment
38
5. Case Study (cont’)
    VPMS Emulation Results
    Partitions                                 B(HC, HS, SM)   D(SC, HS, SM)
    Cost ($)                                   1278            1240
    Power Consumption (W)                      4.76            4.20
    Response time (µs) (sensor to display)     180             13,000
    Response time (µs) (sensor to gate)        210             210
39
5. Case Study (cont’)
• Examples of Sharing and Clustering in MLP
    Sharing and clustering techniques in MLP, illustrated on several variants of the VPMS case study
    How object-oriented modeling can be advantageous in hierarchical partitioning
    Coal mine control and monitoring system
Advantage of Sharing in MLP
Partitioning Results for three VPMS Specifications with and without Sharing
    Specification                  VPMS-1    VPMS-2    VPMS-3
    STD (m)                        1.0       1.0       1.0
    SLI(ENTRY, EXIT) (m)           6.0       0.5       0.8
    SLI(Display, EXIT) (m)         7.0       3.0       0.5
    SLI(Display, ENTRY) (m)        2.0       3.0       0.5
    Number of PE                   3         2         1
    Number of ASIC                 2         1         1
    System Cost ($)                1,430     1,250     1,180
    Display response time (µs)     13,200    13,200    14,020
    Gate response time (µs)        210       210       1,030
    MLP Execution Time (sec)       0.602     3.857     14.789
    Locations of PE
        VPMS-1: (1) ENTRY gate control, (2) EXIT gate control, (3) Display
        VPMS-2: (1) ENTRY/EXIT gate control, (2) Display
        VPMS-3: (1) ENTRY/EXIT/Display Subsystem
    Locations of ASIC
        VPMS-1: (1) ENTRY sensor control, (2) EXIT sensor control
        VPMS-2: (1) ENTRY/EXIT sensor
        VPMS-3: (1) ENTRY/EXIT/Display Subsystem Interface
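The PE counts in the sharing results follow from comparing each pairwise SLI with the STD. This is a minimal sketch, not the Share_PE or Share_ASIC algorithms: it only groups subsystems whose SLI is at most STD, and it merges groups transitively with a small union-find, which is an assumption beyond the pairwise check in Algorithm 5.2. The names `sharing_groups`, `sli_table`, and `SPECS` are hypothetical; the distances are the ones listed for VPMS-1, VPMS-2, and VPMS-3.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def sharing_groups(subsystems: List[str],
                   sli: Dict[FrozenSet[str], float],
                   std: float) -> List[List[str]]:
    """Group subsystems that may share components (pairwise SLI <= STD)."""
    parent = {s: s for s in subsystems}

    def find(s: str) -> str:
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path halving
            s = parent[s]
        return s

    for a, b in combinations(subsystems, 2):
        if sli[frozenset((a, b))] <= std:
            parent[find(a)] = find(b)      # assumption: sharing merges groups transitively

    groups: Dict[str, List[str]] = {}
    for s in subsystems:
        groups.setdefault(find(s), []).append(s)
    return sorted(groups.values())


def sli_table(entry_exit: float, display_exit: float,
              display_entry: float) -> Dict[FrozenSet[str], float]:
    return {
        frozenset(("ENTRY", "EXIT")): entry_exit,
        frozenset(("Display", "EXIT")): display_exit,
        frozenset(("Display", "ENTRY")): display_entry,
    }


SUBSYSTEMS = ["ENTRY", "EXIT", "Display"]
SPECS = {
    "VPMS-1": sli_table(6.0, 7.0, 2.0),
    "VPMS-2": sli_table(0.5, 3.0, 3.0),
    "VPMS-3": sli_table(0.8, 0.5, 0.5),
}

group_counts = {name: len(sharing_groups(SUBSYSTEMS, table, std=1.0))
                for name, table in SPECS.items()}
print(group_counts)
```

With STD = 1.0 m the sketch yields three, two, and one group for VPMS-1, VPMS-2, and VPMS-3, matching the reported number of PE in each specification.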
Advantage of Clustering in MLP
Partitioning Results for five VPMS Specifications with and without Clustering
    Specification                  VPMS-A    VPMS-B    VPMS-C    VPMS-D    VPMS-E
    Number of Subsystems           1         2         2         2         3
    Number of PE                   1         2         2         2         3
    Number of ASIC                 1         1         2         2         2
    System Cost ($)                1,180     1,250     1,340     1,340     1,430
    Display response time (µs)     14,020    13,200    13,100    13,100    13,200
    Gate response time (µs)        1,030     210       110       110       110
    Subsystems
        VPMS-A: (1) ENTRY/EXIT/Display Subsystem
        VPMS-B: (1) ENTRY/EXIT Subsystem, (2) Display Subsystem
        VPMS-C: (1) ENTRY/Display Subsystem, (2) EXIT Subsystem
        VPMS-D: (1) ENTRY Subsystem, (2) EXIT/Display Subsystem
        VPMS-E: (1) ENTRY Subsystem, (2) EXIT Subsystem, (3) Display Subsystem
    Locations of PE
        VPMS-A: (1) Motor Driver/Counter
        VPMS-B: (1) Motor Driver, (2) Counter
        VPMS-C: (1) ENTRY Motor Driver/Counter, (2) EXIT Motor Driver
        VPMS-D: (1) ENTRY Motor Driver, (2) EXIT Motor Driver/Counter
        VPMS-E: (1) ENTRY Motor Driver, (2) EXIT Motor Driver, (3) Counter
    Locations of ASIC
        VPMS-A: (1) Sensor Driver
        VPMS-B: (1) Sensor Driver
        VPMS-C: (1) ENTRY Sensor, (2) EXIT Sensor
        VPMS-D: (1) ENTRY Sensor, (2) EXIT Sensor
        VPMS-E: (1) ENTRY Sensor, (2) EXIT Sensor