Optimizations for a Simulator Construction System Supporting Reusable Components


Page 1: Optimizations for a Simulator Construction System Supporting Reusable Components

Optimizations for a Simulator Construction System Supporting Reusable Components

David A. Penry and David I. August

The Liberty Architecture Research Group

Princeton University

Page 2: Optimizations for a Simulator Construction System Supporting Reusable Components

Architectural Exploration

Architectural options are studied using simulators

More iterations = better decisions

Need fast path to simulator

Need fast simulator

[Figure: Architecture Options feeding an Architectural Simulator]

Page 3: Optimizations for a Simulator Construction System Supporting Reusable Components

Simulator Construction Systems

Reuse simulator infrastructure

[Figure: an Architecture Description is fed to a Simulator Builder, which produces an Architectural Simulator Instance]

But still must be able to reuse descriptions
  Structural composition
  Medium-grained components
  Standard communication contracts
  High parameterizability
  Separation of concerns

Page 4: Optimizations for a Simulator Construction System Supporting Reusable Components

The Reuse Penalty

Reusability leads to a speed penalty:
  more component instances
  more signals
  more general code

Therefore: reusable systems are often slower

How can we mitigate the reuse penalty?

Page 5: Optimizations for a Simulator Construction System Supporting Reusable Components

Liberty Simulation Environment

Simulator construction system for high reuse

Two-tiered specifications
  Leaf module templates in C
  Netlisting language for instantiation and customization

Three-signal standard communications contract with overrides (control functions)

Code is generated

[Figure: the three signals of the contract: Data, Enable, Ack]

Page 6: Optimizations for a Simulator Construction System Supporting Reusable Components

Contrast: SystemC

Simulator construction libraries (C++)

Partially supports reuse:
  + Structural composition
  + Module granularity varies
  ? Communications contracts by convention
  - Low parameterizability
  - Separation of concerns

Description is a C++ program

Page 7: Optimizations for a Simulator Construction System Supporting Reusable Components

Models of Computation

[Figure: four blocks A, B, C, D, shown over a sequence of animation steps]

SystemC uses Discrete Event (DE)

LSE uses Heterogeneous Synchronous Reactive (HSR)
  Edwards (1997)
  Unparsed code blocks (black boxes)
  Values begin unresolved and resolve monotonically
  Chaotic scheduling
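The HSR bullets above (values start unresolved, resolve monotonically, and blocks may run in any order) can be illustrated with a small sketch. It is not LSE code: the `UNRESOLVED` encoding, the `and_block` function, and the `run_cycle` driver are hypothetical names chosen for this example, and the two-block loop is made up.

```python
UNRESOLVED = None  # assumed encoding of the "unresolved" (not yet known) value


def and_block(a, b):
    """Monotonic AND: a False input resolves the output even if the other is unknown."""
    if a is False or b is False:
        return False
    if a is UNRESOLVED or b is UNRESOLVED:
        return UNRESOLVED
    return True


def run_cycle(blocks, inputs):
    """Re-run blocks in arbitrary order until no signal changes (chaotic iteration)."""
    signals = {out: UNRESOLVED for out in blocks}
    signals.update(inputs)
    changed = True
    while changed:
        changed = False
        for out, (fn, ins) in blocks.items():
            if signals[out] is not UNRESOLVED:
                continue  # a resolved value never changes again within the cycle
            value = fn(*(signals[name] for name in ins))
            if value is not UNRESOLVED:
                signals[out] = value
                changed = True
    return signals


# Two blocks in a combinational loop: x = en AND y, y = x AND t.
blocks = {"x": (and_block, ("en", "y")), "y": (and_block, ("x", "t"))}
resolved = run_cycle(blocks, {"en": False, "t": True})
stuck = run_cycle(blocks, {"en": True, "t": True})
```

With `en` False the loop resolves, because `x` no longer depends on `y`. With `en` True neither block can make progress and both outputs stay unresolved, which is the fixed point for that input.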

Page 8: Optimizations for a Simulator Construction System Supporting Reusable Components

Potential HSR Benefits vs. DE

Static schedules possible

Lower per-signal overhead

Use of unresolved value to avoid redundant computation

[Figure: blocks A, B, C, D]

Page 9: Optimizations for a Simulator Construction System Supporting Reusable Components

Experimental methodology

Three models of a 4-way out-of-order microprocessor
  SystemC using custom speed-optimized components
  LSE model using custom speed-optimized components
  LSE model using standard reusable components

9 benchmarks (CPU 2000/MediaBench)
  See paper for compiler, etc.

Non-edge signals / Signals / Instances    Model
481383                                    Custom LSE
42348911                                  Reusable LSE
32714                                     Custom SystemC

Page 10: Optimizations for a Simulator Construction System Supporting Reusable Components

Custom LSE vs. SystemC

Custom LSE outperforms custom SystemC
  Reduction in overhead
  Use of unresolved signal value
  Static instantiation and code specialization

Dynamic schedule for both

Model             Cycles/sec   Speedup
Custom SystemC         53722         -
Custom LSE            155111      2.88

Page 11: Optimizations for a Simulator Construction System Supporting Reusable Components

Reuse Penalty

Reusable model suffers a large reuse penalty (0.26 of the custom LSE speed)
  Many more signals
  Many more non-edge signals
  More components

All dynamic schedules

Model             Cycles/sec   Speedup
Custom SystemC         53722         -
Custom LSE            155111      2.88
Reusable LSE           40649      0.76

Page 12: Optimizations for a Simulator Construction System Supporting Reusable Components

Creating Static Schedules

Edwards' algorithm (1997)
  Construct a signal dependency graph
  Break into strongly-connected components (SCC). Schedule in topological order
  Partition each SCC into a head and tail
  Schedule tail recursively, then repeat head (any order) and tail's schedule
  Coalesce

[Figure: blocks A, B, C, D]

Page 13: Optimizations for a Simulator Construction System Supporting Reusable Components

Creating Static Schedules

[Figure: blocks A, B, C, D connected by signals 1, 2, 3, 4, alongside the signal dependency graph with nodes 1, 2, 3, 4]

Page 14: Optimizations for a Simulator Construction System Supporting Reusable Components

Creating Static Schedules

[Figure: signal dependency graph with nodes 1, 2, 3, 4 grouped into strongly-connected components a, b, c; blocks A, B, C, D with signals 1, 2, 3, 4]

Schedule: a b c

Page 15: Optimizations for a Simulator Construction System Supporting Reusable Components

Creating Static Schedules

[Figure: signal dependency graph with nodes 1, 2, 3, 4 and components a, b, c, with head (H) and tail (T) marked; blocks A, B, C, D with signals 1, 2, 3, 4]

Schedule: 1 b 4

Page 16: Optimizations for a Simulator Construction System Supporting Reusable Components

Creating Static Schedules

[Figure: signal dependency graph with nodes 1, 2, 3, 4 and components a, b, c, with head (H) and tail (T) marked; blocks A, B, C, D with signals 1, 2, 3, 4]

Schedule: 1 2 3 2 4

Page 17: Optimizations for a Simulator Construction System Supporting Reusable Components

Creating Static Schedules

[Figure: signal dependency graph with nodes 1, 2, 3, 4, labels A, B, C, and head (H) and tail (T) marked; blocks A, B, C, D with signals 1, 2, 3, 4]

Choosing an optimal partition is exponential

Schedule: 1 2 3 2 4
          A B C B (D)
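The steps listed on the Creating Static Schedules slides can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the LSE scheduler: the edge list is a guess chosen to be consistent with the 1 2 3 2 4 schedule on the slides, the partition heuristic (the highest-numbered signal becomes a one-element head) is arbitrary, the head/tail sequence is repeated once per head element as one reading of the slide wording, and coalescing is omitted.

```python
from collections import defaultdict


def sccs(nodes, edges):
    """Tarjan's algorithm; returns SCCs in topological order (sources first)."""
    succ = defaultdict(list)
    for u, v in edges:
        succ[u].append(v)
    index, low, on_stack, stack, out = {}, {}, set(), [], []

    def visit(u):
        index[u] = low[u] = len(index)
        stack.append(u)
        on_stack.add(u)
        for v in succ[u]:
            if v not in index:
                visit(v)
                low[u] = min(low[u], low[v])
            elif v in on_stack:
                low[u] = min(low[u], index[v])
        if low[u] == index[u]:  # u is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == u:
                    break
            out.append(sorted(comp))

    for n in nodes:
        if n not in index:
            visit(n)
    return out[::-1]  # Tarjan emits sink components first


def schedule(nodes, edges):
    """Static schedule: SCCs in topological order, each expanded as tail, head, tail."""
    inside = set(nodes)
    local = [(u, v) for u, v in edges if u in inside and v in inside]
    result = []
    for comp in sccs(list(nodes), local):
        head, tail = comp[-1:], comp[:-1]  # assumed heuristic partition
        tail_sched = schedule(tail, local)  # removing the head may break the cycle
        result += tail_sched + (head + tail_sched) * len(head)
    return result


# Hypothetical dependency graph: signal 1 feeds the 2 <-> 3 loop, which feeds 4.
order = schedule([1, 2, 3, 4], [(1, 2), (2, 3), (3, 2), (3, 4)])
print(order)
```

A different head choice gives a different but still valid schedule; finding the partition that minimizes schedule length is the step the slide describes as exponential.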

Page 18: Optimizations for a Simulator Construction System Supporting Reusable Components

Dynamic sub-schedule embedding

SCCs arise due to incomplete information

“Optimal” schedules are optimal w.r.t. information

“Optimal” schedule may be worse than dynamic

[Figure: blocks A, B, C]

When an SCC is “too big”, just schedule that section dynamically
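One way to picture the embedding is a plan that mixes static entries with dynamically scheduled sections. The sketch below is illustrative only: `embed_dynamic`, `tail_head_tail`, the `max_static_size` threshold, and the component list are all hypothetical, and the SCCs are assumed to arrive already in topological order.

```python
def embed_dynamic(components, static_schedule, max_static_size):
    """Build a mixed plan from SCCs given in topological order.

    components       -- list of SCCs (each a list of signal names), sources first
    static_schedule  -- callable returning a static firing order for one SCC
    max_static_size  -- hypothetical threshold above which an SCC is run dynamically
    """
    plan = []
    for comp in components:
        if len(comp) > max_static_size:
            # Too big: defer to a run-time (dynamic) scheduler for this section only.
            plan.append(("dynamic", tuple(comp)))
        else:
            plan.extend(("static", signal) for signal in static_schedule(comp))
    return plan


def tail_head_tail(comp):
    """Static order for an SCC of at most two signals: tail, head, tail."""
    head, tail = comp[-1:], comp[:-1]
    return tail + head + tail


plan = embed_dynamic(
    [["1"], ["2", "3"], ["4", "5", "6", "7"]], tail_head_tail, max_static_size=2
)
print(plan)
```

Only the four-signal component is handed to the dynamic scheduler; everything before it keeps its fixed firing order.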

Page 19: Optimizations for a Simulator Construction System Supporting Reusable Components

Dependency information enhancement

In practice, we see big SCCs

Peek in the black box
  Simple parsing of communication overrides (control functions)
  Can ask user to tell about internal dependencies
  Not too painful because it is reused

[Figure: blocks A, B, C]

Page 20: Optimizations for a Simulator Construction System Supporting Reusable Components

Evaluation of Information Enhancement

Control function parsing more useful alone
  Not principally through scheduling

It is important to have both kinds of enhancement

Optimization                     Cycles/sec   Speedup
No static scheduling                  40649         -
With control function parsing         47850      1.18
With internal dependencies            41306      1.02
With both                             57046      1.40

Page 21: Optimizations for a Simulator Construction System Supporting Reusable Components

Reuse Penalty Revisited

Reuse penalty mitigated in part

Model                             Cycles/sec   Speedup   Build time (s)
Custom SystemC                         53722         -             49.1
Custom LSE                            155111      2.88             15.4
Reusable LSE w/o optimization          40649      0.76             33.9
Reusable LSE with optimization         57046      1.06             34.4

Reusable LSE model 6% faster than custom SystemC
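The speedup figures can be cross-checked from the cycles/sec column. The short script below is just that arithmetic; the dictionary and helper names are made up for the example, and the throughput numbers are copied from the slides.

```python
# Cycles/sec figures copied from the tables on the slides.
cycles_per_sec = {
    "Custom SystemC": 53722,
    "Custom LSE": 155111,
    "Reusable LSE w/o optimization": 40649,
    "Reusable LSE with optimization": 57046,
}


def ratio(model, baseline):
    """Throughput of `model` divided by throughput of `baseline`."""
    return cycles_per_sec[model] / cycles_per_sec[baseline]


# Speedup column: every model relative to Custom SystemC.
speedups = {m: ratio(m, "Custom SystemC") for m in cycles_per_sec}
# Reuse penalty: unoptimized reusable model relative to the custom LSE model.
reuse_penalty = ratio("Reusable LSE w/o optimization", "Custom LSE")
# Effect of the optimizations on the reusable model alone.
gain_from_optimization = ratio(
    "Reusable LSE with optimization", "Reusable LSE w/o optimization"
)

for model, s in speedups.items():
    print(f"{model:32s} {s:.2f}")
print(f"reuse penalty vs. custom LSE: {reuse_penalty:.2f}")
print(f"gain from optimization:       {gain_from_optimization:.2f}")
```

The ratios reproduce the 0.76, 1.06, 0.26 and 1.40 figures on the slides. The Custom LSE ratio comes out near 2.89 rather than the 2.88 shown, which may reflect truncation or a different averaging over benchmarks.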

Page 22: Optimizations for a Simulator Construction System Supporting Reusable Components

Conclusions

A tradeoff exists between speed and reuse

The simulator construction system can help
  Higher base speed makes reuse penalty less painful

Optimizations are possible with HSR model
  Ability of scheduler to adapt to information available is powerful
  This adaptation is not possible with DE

You can have high reuse at reasonable speeds

Page 23: Optimizations for a Simulator Construction System Supporting Reusable Components

Future Work

Release of LSE
  Fall 2003
  http://liberty.princeton.edu

Hybrid model of computation
  Embed HSR in DE, DE in HSR
  Automatic extraction of HSR portions from DE

Page 24: Optimizations for a Simulator Construction System Supporting Reusable Components

Other optimizations

Improved block coalescing
  See paper

Code specialization
  Implementation of APIs depends upon environment