Partitioning of Discrete Signal Transforms for Distributed Hardware Architectures

R. Arce-Nazario, M. Jimenez, and D. Rodriguez
Electrical and Computer Engineering
University of Puerto Rico – Mayagüez

WALSAIP


Motivation and Objective

Discrete Signal Transforms (DSTs)
  DFT, DCT; many applications
  Hardware-accelerated, but at a high area cost

Distributed (dedicated) hardware architectures (DHAs)
  Cost-effective
  Partitioning plays a key role

Objective: Use inherent properties of DSTs to improve their hardware partitioning to distributed hardware architectures.

[Diagram: DST, Partitioning, DHA]


Previous Work

Automated partitioning of DSTs to DHAs
  DSTs treated as any other algorithm/benchmark [Srinivasan01], [Bringmann00]
  Converted to a high-level or structural DFG and treated as such

Manual partitioning and automated code generation
  DST-specific properties exploited [Kumhom01]
  New formulations developed to exploit architectural features [VanLoan92]
  SPIRAL and FFTW: code generation platforms that explore the space of equivalent algorithms [Pueschel05], [Frigo05]

[Arce05] – Automated partitioning methodology that incorporates DST features and formulation exploration


Partitioning Methodology

[Block diagram. Inputs: KPA DST formulation, architectural description. Blocks: Formulation Manipulator, Formulation to DFG, Partition/Placement, Estimators, Heuristic Control. Data exchanged: KPA formulation, DFG, hypergraph representation, cost and indicators, rule selection. Output: high-level partition solution.]


DSTs – General Concepts

General formula for a d-dimensional DST:

$$X[k_1, \ldots, k_d] = \sum_{n_1} \cdots \sum_{n_d} x[n_1, \ldots, n_d] \, \alpha_1(n_1, k_1) \cdots \alpha_d(n_d, k_d)$$

Essentially a vector-matrix multiplication
Fast versions exist, based on divide-and-conquer techniques

Highly regular
Highly connected
Rules can be applied at the formulation level: permutation, index-set, ...

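The α's determine the type of transform, e.g. for the DFT: $\alpha_i(n_i, k_i) = e^{-j 2\pi n_i k_i / N_i}$

$$F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$$

[Figure: the stages of this factorization, labeled $R_8$, $(I_4 \otimes F_2)$, $(I_2 \otimes F_2 \otimes I_2)\, T_0$, and $(F_2 \otimes I_4)\, T_1$.]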
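The general d-dimensional formula on this slide can be evaluated by direct summation. The sketch below is illustrative only: the function names `dft_kernel` and `dst` are hypothetical, the kernel uses the common forward-DFT sign convention (an assumption), and nothing here is optimized; it simply mirrors the sum over all index tuples.

```python
import cmath
from itertools import product


def dft_kernel(n, k, size):
    """alpha(n, k) for the DFT, assuming the exp(-2*pi*j*n*k/N) convention."""
    return cmath.exp(-2j * cmath.pi * n * k / size)


def dst(x, shape, kernels):
    """Evaluate a d-dimensional separable transform by direct summation.

    x maps index tuples (n_1, ..., n_d) to samples, and kernels[i](n, k, N_i)
    plays the role of alpha_i. Returns a dict keyed by (k_1, ..., k_d).
    """
    indices = list(product(*(range(s) for s in shape)))
    out = {}
    for k in indices:
        total = 0j
        for n in indices:
            term = x[n]
            for i, kernel in enumerate(kernels):
                term *= kernel(n[i], k[i], shape[i])
            total += term
        out[k] = total
    return out


# 1-D example: the DFT of a unit impulse is constant.
impulse = {(n,): (1.0 if n == 0 else 0.0) for n in range(4)}
spectrum = dst(impulse, (4,), [dft_kernel])

# 2-D example: a constant 2x3 input concentrates all energy at k = (0, 0).
flat = {(a, b): 1.0 for a in range(2) for b in range(3)}
spectrum_2d = dst(flat, (2, 3), [dft_kernel, dft_kernel])
```

Swapping `dft_kernel` for another kernel function changes the transform type without touching the summation, which is the point the slide makes about the α's.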


Kronecker Algebra

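$$F_{4 \times 4} = F_4 \otimes F_4$$

$$F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$$

[Figure: data flow of the factorization, built from F4 blocks, F2 blocks, and multipliers labeled W.]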
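A factorization of the form F_n = (F_p ⊗ I_m) T (I_p ⊗ F_m) P, with n = pm, can be checked numerically with small dense matrices. The sketch below is only a sanity check under stated assumptions: the twiddle matrix `twiddle(n, p)` and the stride permutation `stride_perm(n, p)` follow one common convention, which may differ from the one used in the slides, and all function names are hypothetical.

```python
import cmath


def dft(n):
    """Dense n-point DFT matrix with entries exp(-2*pi*j*r*c/n)."""
    return [[cmath.exp(-2j * cmath.pi * r * c / n) for c in range(n)]
            for r in range(n)]


def identity(n):
    return [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]


def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def twiddle(n, p):
    """Diagonal matrix with w_n**(i*k) at position i*m + k (assumed convention)."""
    m = n // p
    diag = [cmath.exp(-2j * cmath.pi * i * k / n)
            for i in range(p) for k in range(m)]
    return [[diag[r] if r == c else 0.0 for c in range(n)] for r in range(n)]


def stride_perm(n, p):
    """Permutation that gathers the p subsequences of stride p (assumed convention)."""
    m = n // p
    perm = [[0.0] * n for _ in range(n)]
    for i in range(p):
        for j in range(m):
            perm[i * m + j][j * p + i] = 1.0
    return perm


def cooley_tukey(p, m):
    """Multiply out (F_p (x) I_m) T (I_p (x) F_m) P for n = p * m."""
    n = p * m
    result = kron(dft(p), identity(m))
    for stage in (twiddle(n, p), kron(identity(p), dft(m)), stride_perm(n, p)):
        result = matmul(result, stage)
    return result


def max_abs_diff(a, b):
    return max(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))


error_4_2 = max_abs_diff(cooley_tukey(4, 2), dft(8))
error_2_4 = max_abs_diff(cooley_tukey(2, 4), dft(8))
```

Both choices of (p, m) reproduce the same 8-point DFT matrix up to rounding, which is what makes them equivalent formulations with different graph structure.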


Target topology

Similar to existing platforms in the market and academia:
  Annapolis Micro Systems (Wildforce)
  Gidel (PROC20KE)
  Berkeley Emulation Engine (BEE), being proposed as a cost-effective alternative to traditional high-performance computing systems

[Figure: target topology with memories M0, M1, ..., Mk-1, devices D0, D1, ..., Dk-1, and a crossbar.]


Partitioning Methodology

[Block diagram repeated from the earlier Partitioning Methodology slide.]


DST properties in our methodology

Graph considerations incorporated into the partitioning/placement process

Exploration of equivalent formulations

[Block diagram excerpt: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement, Estimators; data exchanged: KPA formulation, DFG, cost and indicators, rule selection.]


Graph partitioning considerations

Focus on horizontal partitioning schemes (SIMD-like implementation)

Initial solution = balanced horizontal linear partitioning

Scheduling consideration: swap only nodes that belong to the same computational stage.

[Figure: target topology with memories M0, M1, ..., Mk-1, devices D0, D1, ..., Dk-1, and a crossbar.]

Kernighan-Lin bipartitioning
Heterogeneous-channel k-way partitioning
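A balanced horizontal linear partition can be illustrated on a toy graph. The sketch below assumes a radix-2 butterfly data flow graph as a stand-in for a DST DFG (the graph model, the function names, and the per-pair traffic count are all assumptions for illustration, not the authors' implementation). Each row of the graph stays on one device across all stages, which is the SIMD-like arrangement the slide describes.

```python
from collections import Counter


def butterfly_edges(n):
    """Edges of a radix-2 butterfly graph on n = 2**t rows (illustrative model).

    Nodes are (stage, row); each node feeds its own row and one partner row
    in the next stage.
    """
    stages = n.bit_length() - 1
    edges = []
    for s in range(stages):
        for r in range(n):
            edges.append(((s, r), (s + 1, r)))             # straight edge
            edges.append(((s, r), (s + 1, r ^ (1 << s))))  # butterfly partner
    return edges


def horizontal_linear_partition(n, k):
    """Assign rows to k devices in contiguous, equally sized blocks."""
    return {r: r * k // n for r in range(n)}


def cut_traffic(edges, device_of_row):
    """Count edges whose endpoints sit on different devices, per device pair."""
    traffic = Counter()
    for (_, r_src), (_, r_dst) in edges:
        a, b = device_of_row[r_src], device_of_row[r_dst]
        if a != b:
            traffic[(min(a, b), max(a, b))] += 1
    return traffic


two_way = dict(cut_traffic(butterfly_edges(8), horizontal_linear_partition(8, 2)))
four_way = dict(cut_traffic(butterfly_edges(8), horizontal_linear_partition(8, 4)))
```

In this toy model only the stages whose butterfly span exceeds the block size generate inter-device traffic, so the initial linear solution already keeps the early stages local. An iterative improvement step (for example Kernighan-Lin style swaps restricted to nodes of the same stage) would start from this assignment.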


Formulation exploration

Rule (with n = pm):

$$F_n = (F_p \otimes I_m)\, T_{n,p}\, (I_p \otimes F_m)\, P_{n,p}$$

[Block diagram excerpt: Formulation Manipulator, Formulation to DFG, Heuristic Control, Partition/Placement; data exchanged: KPA formulation, DFG, cost and indicators, rule selection.]

Formulation Manipulator
  Applies permutation and factorization rules to the Kronecker formulation of a DST to obtain equivalent formulations

Example of applying the rule to the $F_8$ factor:

$$(F_8 \otimes I_2)\, T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$$

$$\left( \left[ (F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2} \right] \otimes I_2 \right) T_{16,8}\, (I_8 \otimes F_2)\, P_{16,8}$$

The number of possible reformulations grows exponentially with DST size.

Before designing a heuristic control method, first answer two questions:
  Do reformulations have an effect on solution quality?
  How can the space of equivalent formulations be explored effectively to find more apt formulations?

Experiments: gain an understanding of algorithmic-level effects on solution quality and convergence.
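The rule application performed by the Formulation Manipulator can be mimicked with a small symbolic rewriter. This is a minimal sketch, assuming formulations are kept as plain strings and that `(x)` stands for the Kronecker product; the names `ct_rule` and `expand` are hypothetical and the real tool presumably works on a richer representation.

```python
import re


def ct_rule(n, p):
    """Right-hand side of F_n = (F_p (x) I_m) T_n,p (I_p (x) F_m) P_n,p."""
    if p <= 1 or p >= n or n % p:
        raise ValueError("p must be a proper divisor of n")
    m = n // p
    return f"(F_{p} (x) I_{m}) T_{n},{p} (I_{p} (x) F_{m}) P_{n},{p}"


def expand(formula, n, p):
    """Rewrite the first standalone F_n factor in `formula` using the rule."""
    pattern = rf"\bF_{n}\b"
    return re.sub(pattern, lambda _match: "(" + ct_rule(n, p) + ")",
                  formula, count=1)


coarse = ct_rule(16, 8)
finer = expand(coarse, 8, 2)
```

Here `coarse` is the 16-point formulation with an `F_8` factor, and `finer` replaces that factor by its own factorization with p = 2, matching the example on this slide. Because every composite factor admits several choices of p, repeated expansion quickly produces a large set of equivalent formulations.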


Measuring quality of solution

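$$\mathrm{Cost} = \langle c_0, c_1, \ldots, c_{m-1} \rangle, \quad \text{where } c_i = W_i R_i$$

$W_i$: 'weight' of channel $i$
$R_i$: required communications through channel $i$

[Figure: two diagrams of devices D0, D1, D2, D3.]

Example: $W_{01} = W_{12} = W_{23} = 1$, $W_{XBAR} = 2$

$$\mathrm{Cost} = \langle 4, 4, 4, 8 \rangle$$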
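The per-channel cost of the example can be reproduced with a few lines. This sketch assumes that each cost entry is the product of a channel's weight and its required communications, and that every channel in the example carries 4 communications; that traffic value is inferred from the example's numbers rather than stated on the slide. The function name `channel_costs` is hypothetical.

```python
def channel_costs(weights, requirements):
    """Return one cost entry per channel: weight times required communications."""
    return tuple(weights[ch] * requirements[ch] for ch in weights)


# Channel weights from the slide's example.
weights = {"01": 1, "12": 1, "23": 1, "XBAR": 2}

# Assumed traffic: 4 required communications through every channel.
requirements = {"01": 4, "12": 4, "23": 4, "XBAR": 4}

cost = channel_costs(weights, requirements)
```

Keeping the cost as a tuple rather than a single number preserves which channel is the bottleneck; how the entries are reduced to the scalar costs reported later is not specified here.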


Experiment #1 – Inter-stage permutations

Since Cooley-Tukey's FFT, several common formulations have become available.

$$F_8 = (F_2 \otimes I_4)\, T_1\, (I_2 \otimes F_2 \otimes I_2)\, T_0\, (I_4 \otimes F_2)\, R_8$$

[Pease formulation here]

Experiment: several sizes of 5 common formulations were partitioned.

Inter-stage permutations (ISPs) have an effect on solution quality, yet no formulation is a clear winner.

Formulations: Stockham, Transposed Stockham, Cooley-Tukey, Gentleman-Sande, Pease


Experiment #2 – Granularity

Granularity: the weight of the nodes for the various computational stages of the transform.

Coarser:

$$F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, (I_4 \otimes F_4)\, P_{16,4}$$

Finer:

$$F_{16} = (F_4 \otimes I_4)\, T_{16,4}\, \left( I_4 \otimes \left[ (F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2} \right] \right) P_{16,4}$$

[Figure: graphs of the two formulations; the coarser one uses only F4 nodes, the finer one mixes F4 and F2 nodes.]


Experiment #2 – Granularity

Decomposition rules: Large DST = combinations of smaller DSTs analogous to node clustering

Size  Array 4                 Ring 4                  Array 8                 Ring 8
      Cost  Formulation       Cost  Formulation       Cost  Formulation       Cost  Formulation
32    11    2/2/2/4*          7     2/2/2/4           32    8/2/2*            16    2/4/2/2
64    22    8/2/4*            14    2/2/8*            48    2/2/2/2/4         20    4/2/2/4
128   43    8/2/8*            26    16/2/2/2*         92    2/2/2/2/2/4       32    2/2/2/2/2/4
256   86    4/2/32*           55    16/8/2*           132   4/2/2/2/2/4       58    2/2/2/2/2/2/4
512   171   4/2/64*           106   64/4/2*           276   2/2/2/2/2/2/4/2   116   2/2/2/2/2/2/8

* Multiple formulations achieved the best cost. The coarsest granularity is shown.

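Effect of topology: ring vs. linear, 57% cost reduction
Finest granularity is not necessarily best.

$$F_8 = (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4}$$

$$F_8 = (F_2 \otimes I_4)\, T_{8,2}\, (I_2 \otimes F_4)\, P_{8,2}$$

$$F_8 = (F_2 \otimes I_4)\, T_{8,2}\, \left( I_2 \otimes \left[ (F_2 \otimes I_2)\, T_{4,2}\, (I_2 \otimes F_2)\, P_{4,2} \right] \right) P_{8,2}$$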
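The cost reduction from the linear array to the ring can be recomputed from the table's cost columns. The sketch below is a plain arithmetic check; treating the reported percentage as a mean of per-size relative reductions is an assumption, and the variable names are hypothetical.

```python
# Costs copied from the table, ordered by size 32, 64, 128, 256, 512.
array_4 = [11, 22, 43, 86, 171]
ring_4 = [7, 14, 26, 55, 106]
array_8 = [32, 48, 92, 132, 276]
ring_8 = [16, 20, 32, 58, 116]


def mean_reduction(linear_costs, ring_costs):
    """Mean relative cost reduction of the ring over the linear array."""
    reductions = [1 - r / a for a, r in zip(linear_costs, ring_costs)]
    return sum(reductions) / len(reductions)


reduction_4 = mean_reduction(array_4, ring_4)
reduction_8 = mean_reduction(array_8, ring_8)
```

Under this reading the 8-device topologies give a mean reduction of roughly 57.5%, close to the figure quoted on the slide, while the 4-device topologies give roughly 37%.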


Experiment #3 – Breakdown strategy

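Breakdown strategy: the order and divisors with which a transform is decomposed.
Split trees: a common graphical representation of a breakdown strategy.
Example: two split trees for a DFT of size 64.

(a)

$$F_{64} = \left( \left[ (F_4 \otimes I_2)\, T_{8,4}\, (I_4 \otimes F_2)\, P_{8,4} \right] \otimes I_8 \right) T_{64,8}\, (I_8 \otimes F_8)\, P_{64,8}$$

(b)

$$F_{64} = (F_2 \otimes I_{32})\, T_{64,2}\, \left( I_2 \otimes \left[ (F_2 \otimes I_{16})\, T_{32,2}\, (I_2 \otimes F_{16})\, P_{32,2} \right] \right) P_{64,2}$$

[Figure: split trees (a) and (b), with nodes labeled by exponents of 2. Tree (a) splits 6 into 3 and 3, then one 3 into 2 and 1. Tree (b) splits 6 into 1 and 5, then 5 into 4 and 1.]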
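Split trees for power-of-two sizes can be enumerated recursively on the exponent. The sketch below assumes that any node may be left unsplit as a leaf and that left and right children are distinguished; both are modeling assumptions, and the function name `split_trees` is hypothetical.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def split_trees(e):
    """All split trees for a transform of size 2**e.

    A tree is either the leaf e (the block of size 2**e is left unfactored)
    or a tuple (e, left, right) whose children have exponents summing to e.
    """
    trees = [e]
    for a in range(1, e):
        for left in split_trees(a):
            for right in split_trees(e - a):
                trees.append((e, left, right))
    return tuple(trees)


# Number of split trees for sizes 2, 4, 8, 16, 32, 64.
counts = [len(split_trees(e)) for e in range(1, 7)]

# The two trees of the size-64 example, written in this encoding.
tree_a = (6, (3, 2, 1), 3)
tree_b = (6, 1, (5, 4, 1))
```

Under these assumptions the counts grow quickly with the exponent, which is why exhaustive generation is practical only for moderate sizes and larger sizes need rules derived from the observed trees.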


Experiment #3 – Results

Procedure
  Exhaustive generation of split trees for DFT sizes n = 16 to 256
  Formulations partitioned for various topologies
  Observation of split-tree decisions that lead to ‘partition-friendly’ formulations
  Generation of formulations for n > 256 using rules


Conclusions and Future Work

Methodology for partitioning DSTs to DHAs:
  DST graph considerations
  Formulation exploration

Graph considerations
  Generating a linear initial partition provides better results than a random one.
  Limiting node moves gives faster convergence.

Exploration at the algorithmic level: experiments
  Isolated features such as permutations and granularity
    An effect was evidenced, but a relation to solution quality is hard to establish.
    Coarse granularity = better convergence, good solution quality
  Breakdown strategy: ‘partition-friendly’ formulations generated.

Current work:
  Experimentation with DCTs
  Experimentation with other properties to define an overall exploration strategy


Acknowledgements

Puerto Rico Experimental Program to Stimulate Competitive Research (PR-EPSCoR)

WALSAIP - Wide-Area Large Scale Automated Information Project

Puerto Rico NASA Space Grant

QUESTIONS?