May 2004, Department of Electrical and Computer Engineering
A NEW GRAPH STRUCTURE FOR HARDWARE-SOFTWARE PARTITIONING OF HETEROGENEOUS SYSTEMS
G. N. Khan and M. Jin
System-on-Chip Research Group
Electrical & Computer Engineering
Ryerson University, Toronto ON M5B 2K3
Hardware-Software (HW/SW) Co-design
Objective:
To design hardware and software together early in the design cycle, producing a more reliable, efficient, first-time-right design within a reasonable time.
Hardware Software Partitioning
• Assignment of system parts to heterogeneous implementation units (hardware and software)
• Meet constraints (timing) and minimize cost (area, time to market)
• Directly affects the cost and performance of the final system
Specification
• Traditionally written in plain English
• Specification languages such as MSC, SDL, and SystemC have since been developed
• Both textual and graphical representations, such as a DAG (Directed Acyclic Graph), are used to describe the system.
What is DADGP?
• Directed Acyclic Data dependency Graph with Precedence (DADGP) is an extension of DAG
• DADGP is a superset of DAG
• Two types of edges:
1) Weighted dependency edge
2) Precedence edge
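The two edge types can be held in one small container. The sketch below is only an illustration: the class layout, the method names, and the reading of the edge weight as a data-transfer cost are assumptions, not the authors' implementation.

```python
from dataclasses import dataclass, field


@dataclass
class DADGP:
    """Hypothetical container for a DADGP: nodes plus two kinds of edges."""
    exec_time: dict = field(default_factory=dict)   # node -> execution time
    dep_edges: dict = field(default_factory=dict)   # (src, dst) -> weight
    prec_edges: set = field(default_factory=set)    # {(src, dst)}

    def add_node(self, name, exec_time):
        self.exec_time[name] = exec_time

    def add_dependency(self, src, dst, weight):
        # Weighted dependency edge: dst consumes data produced by src.
        # The weight is assumed here to model the transfer cost.
        self.dep_edges[(src, dst)] = weight

    def add_precedence(self, src, dst):
        # Precedence edge: src is ordered before dst, no weight attached.
        self.prec_edges.add((src, dst))

    def successors(self, node):
        deps = [d for (s, d) in self.dep_edges if s == node]
        precs = sorted(d for (s, d) in self.prec_edges if s == node)
        return deps + precs

    def is_plain_dag(self):
        # Without precedence edges the structure is an ordinary weighted DAG.
        return not self.prec_edges


# Illustrative values only.
g = DADGP()
for name, t in [("A", 2.0), ("B", 1.0), ("C", 4.0)]:
    g.add_node(name, t)
g.add_dependency("A", "B", weight=3)
g.add_precedence("B", "C")
```

A graph built without any call to `add_precedence` reports `is_plain_dag()` as true, which mirrors the statement that DADGP is a superset of DAG.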
DADGP Example
• Arrow represents dependence relationship
• Precedence edge is represented with a line
• Precedence dependency captures the order of execution between nodes, and such nodes can be executed in parallel.
• Only necessary parallelism is exposed
[Figure: example DADGP with nodes A, B, C, D and edge weights 1, 3, 10, 5]
Overall System Partitioning Structure
[Flowchart: Specification -> Profiling -> LD Path Search -> Mapping -> Scheduling -> Valid Mapping? (No: back to Mapping; Yes: continue) -> Constraint Satisfied? (No: back to LD Path Search; Yes: Finish)]
System Partitioning Algorithm
i. Profile the system and build an initial DADGP
ii. Find the LD_path (longest delay path) in the DADGP
iii. Map LD_path nodes to hardware
iv. Schedule; if the mapping is invalid, go to Step iii
v. Update the DADGP and calculate the total execution time of the target system
vi. If the system constraints (specified by the user) are not met, go to Step ii; otherwise quit.
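The loop formed by Steps ii to vi can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors' algorithm: the scheduling check of Step iv is omitted (every mapping is treated as valid, as if processing elements were unlimited), edge weights stay fixed after mapping, and all function and parameter names are hypothetical.

```python
def longest_delay(nodes, edges, time):
    """Return (delay, path); `nodes` must already be in topological order."""
    best = {n: (time[n], [n]) for n in nodes}
    for n in nodes:
        for (src, dst), weight in edges.items():
            if src == n:
                cand = best[n][0] + weight + time[dst]
                if cand > best[dst][0]:
                    best[dst] = (cand, best[n][1] + [dst])
    return max(best.values(), key=lambda b: b[0])


def delay_if_moved(node, nodes, edges, time, hw_time):
    """Longest delay if `node` alone were moved to hardware."""
    trial = dict(time)
    trial[node] = hw_time[node]
    return longest_delay(nodes, edges, trial)[0]


def partition(nodes, edges, sw_time, hw_time, hw_area, deadline):
    mapping = {n: "SW" for n in nodes}      # Step i: start all-software
    time = dict(sw_time)
    area = 0
    while True:
        delay, path = longest_delay(nodes, edges, time)        # Step ii
        if delay <= deadline:                                  # Step vi
            return mapping, delay, area
        candidates = [n for n in path if mapping[n] == "SW"]
        if not candidates:
            return None       # no solution at this granularity
        # Step iii: move the node that shortens the longest delay path most.
        chosen = min(candidates,
                     key=lambda n: delay_if_moved(n, nodes, edges, time, hw_time))
        mapping[chosen] = "HW"                                 # Step v: update
        time[chosen] = hw_time[chosen]
        area += hw_area[chosen]


# Illustrative numbers only.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 1, ("A", "C"): 1, ("B", "D"): 1, ("C", "D"): 1}
sw = {"A": 4, "B": 6, "C": 2, "D": 3}
hw = {"A": 1, "B": 1, "C": 1, "D": 1}
gates = {"A": 300, "B": 500, "C": 200, "D": 100}
result = partition(nodes, edges, sw, hw, gates, deadline=10)
```

With these numbers the sketch first moves B and then A to hardware, because each move in turn gives the shortest remaining longest delay path.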
Profiling
Profiler collects the following data:
• Execution time
• Amount of data transfer
• Execution order
• Data dependencies between nodes
Longest Delay Path Search
• Finding the longest delay path in the DADGP is like finding the bottleneck of the system
• Minimizes the search space for mapping
• The longest delay path is the path with the longest execution time
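One common way to find such a path in an acyclic graph is a single pass in topological order. The sketch below assumes that the delay of a path is the sum of node execution times and edge weights along it; the function name and data layout are hypothetical, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter


def longest_delay_path(exec_time, edges):
    """Return (total_delay, path) for the longest delay path.

    exec_time: node -> execution time
    edges:     (src, dst) -> edge weight (0 for a precedence-only edge)
    """
    preds = {n: set() for n in exec_time}
    for (src, dst) in edges:
        preds[dst].add(src)

    finish = {}   # node -> longest delay of any path ending at that node
    parent = {}   # node -> predecessor on that longest path
    for node in TopologicalSorter(preds).static_order():
        finish[node], parent[node] = exec_time[node], None
        for p in preds[node]:
            cand = finish[p] + edges[(p, node)] + exec_time[node]
            if cand > finish[node]:
                finish[node], parent[node] = cand, p

    # Walk back from the node with the largest accumulated delay.
    end = max(finish, key=finish.get)
    path = []
    while end is not None:
        path.append(end)
        end = parent[end]
    return finish[path[0]], path[::-1]


# Illustrative numbers only.
exec_time = {"A": 2, "B": 3, "C": 1, "D": 4}
edges = {("A", "B"): 1, ("A", "C"): 5, ("B", "D"): 1, ("C", "D"): 0}
delay, path = longest_delay_path(exec_time, edges)
```

Only the nodes on the returned path need to be considered as mapping candidates, which is how the search space shrinks.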
Mapping
• Maps a node to hardware
• Mapping can change the longest delay path, as well as the DADGP
• A mapping is valid if moving that node to hardware gives the shortest longest delay path
Scheduling
• Very simple List Scheduling approach.
• Schedules the earliest node first without violating the resource limit.
• Exposes parallelism and changes the DADGP accordingly.
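A minimal list scheduler in the spirit of these bullets might look like the sketch below. It assumes identical processing elements (PEs), ignores communication cost, and breaks ties alphabetically; these choices and all names are illustrative assumptions, not details taken from the slides.

```python
def list_schedule(duration, preds, num_pes):
    """Return node -> (pe_index, start, finish) for a simple list schedule."""
    pe_free = [0] * num_pes          # time at which each PE becomes free
    schedule = {}
    remaining = set(duration)

    def earliest_start(node):
        # A node can start once its predecessors are done and a PE is free.
        data_ready = max((schedule[p][2] for p in preds.get(node, ())), default=0)
        return max(data_ready, min(pe_free))

    while remaining:
        # Ready nodes: every predecessor has already been scheduled.
        ready = sorted(n for n in remaining
                       if all(p in schedule for p in preds.get(n, ())))
        node = min(ready, key=earliest_start)     # earliest node first
        pe = pe_free.index(min(pe_free))          # respects the resource limit
        start = earliest_start(node)
        finish = start + duration[node]
        schedule[node] = (pe, start, finish)
        pe_free[pe] = finish
        remaining.remove(node)
    return schedule


# Illustrative numbers only.
duration = {"A": 2, "B": 3, "C": 1, "D": 2}
preds = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}
plan = list_schedule(duration, preds, num_pes=2)
makespan = max(finish for (_, _, finish) in plan.values())
```

With two PEs, B and C run side by side after A; with a single PE the same call serializes them, which shows how the resource limit shapes the exposed parallelism.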
Summary of DADGP Scheduling
• Start scheduling from the root of the DADGP
• Traverse down the tree and schedule the node with the earliest starting time
• If the node is connected by a precedence dependency edge, check whether exposing parallelism can eliminate that edge. When an edge is eliminated, the DADGP may split into two DADGPs. The roots of the two DADGPs are combined to form a single DADGP with a dummy root node.
• In case of multiple descendants, schedule them forcibly by adding PEs
• Update the PE resource (HW-SW) library
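The dummy-root step can be illustrated in a few lines. The sketch assumes the dummy root costs nothing and is joined to each real root by a zero-weight edge; the function name and the `ROOT` label are hypothetical.

```python
def merge_roots(nodes, edges, dummy="ROOT"):
    """Join a multi-rooted graph under one zero-cost dummy root."""
    has_pred = {dst for (_, dst) in edges}
    roots = [n for n in nodes if n not in has_pred]
    if len(roots) <= 1:
        return list(nodes), dict(edges)       # already a single DADGP
    new_edges = dict(edges)
    for r in roots:
        new_edges[(dummy, r)] = 0             # zero weight: adds no delay
    return [dummy] + list(nodes), new_edges


# Illustrative graph: removing the precedence edge B -> C splits it in two.
nodes = ["A", "B", "C", "D"]
edges = {("A", "B"): 1, ("B", "C"): 0, ("C", "D"): 2}
del edges[("B", "C")]                         # edge eliminated by the scheduler
merged_nodes, merged_edges = merge_roots(nodes, edges)
```

After the merge, A and C both hang from the dummy root, so later passes can again start from a single root.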
Constraints
• Deadline and cost constraints are given by the designer.
• Hardware cost is calculated by gate count.
• A different granularity level should be explored if no solution is found.
Edge Detection Example
A pair of 3x3 masks is convolved to estimate the gradients (Gx and Gy) in the x and y directions.
[Figure: DADGP for edge detection with nodes Gx, Gy, Gx², Gy², and Add; legend: data dependency, precedence dependency]
HW-SW Library
Operation              SW EXE (ms)   HW EXE (ms)   HW Area (gates)
Gradient (Gx or Gy)    9.4           1.4           1200
Square                 5.2           0.9           500
Add                    3.88          0.3           100
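The library figures can be combined into a few quick estimates. The sketch below assumes five nodes (two gradients, two squares, one add) and, for the hardware case, that the Gx and Gy chains run fully in parallel with communication cost ignored; the dictionary layout and function names are illustrative.

```python
LIBRARY = {
    "Gradient": {"sw_ms": 9.4, "hw_ms": 1.4, "gates": 1200},
    "Square": {"sw_ms": 5.2, "hw_ms": 0.9, "gates": 500},
    "Add": {"sw_ms": 3.88, "hw_ms": 0.3, "gates": 100},
}

# Node -> library operation (node names follow the edge detection example).
NODES = {"Gx": "Gradient", "Gy": "Gradient",
         "SqX": "Square", "SqY": "Square", "Add": "Add"}


def all_software_time():
    # One processor runs every node in sequence.
    return sum(LIBRARY[op]["sw_ms"] for op in NODES.values())


def all_hardware_area():
    # Every node gets its own hardware unit.
    return sum(LIBRARY[op]["gates"] for op in NODES.values())


def hardware_critical_path():
    # Assumed structure: Gradient -> Square -> Add, two chains in parallel.
    return sum(LIBRARY[op]["hw_ms"] for op in ("Gradient", "Square", "Add"))


print(round(all_software_time(), 2), "ms all-software")
print(all_hardware_area(), "gates all-hardware")
print(round(hardware_critical_path(), 2), "ms hardware critical path")
```

Under these assumptions the all-software run takes about 33.08 ms, the all-hardware design needs 3500 gates, and the hardware critical path is about 2.6 ms before communication is added.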
Edge Detection Solutions
[Figure: five partitioning solutions for the edge detection DADGP, each drawn with nodes Gx, Gy, SqX, SqY, and Add and with edges weighted 0.1]
Performance improvement vs. HW area
[Figure: execution time (seconds) plotted against HW area]
HW area   Seconds
0         33.8
1200      23.68
2400      15.88
2900      10.68
3400      6.38
3500      2.8
Conclusion
• HW-SW partitioning is an NP-hard problem
• Finding the optimal hardware-software partition is very difficult because many factors affect the partitioning decision.
• The DADGP structure exposes parallelism
• The complexity of the DADGP partitioning algorithm is approximately n² log(n).