Managing Multi-Configuration Hardware via Dynamic Working Set Analysis By Ashutosh S.Dhodapkar and James E.Smith Presented by Kyriakos Yioutanis

Managing Multi-Configuration Hardware via Dynamic Working Set Analysis

By Ashutosh S.Dhodapkar and James E.Smith

Presented by Kyriakos Yioutanis

Introduction

• As a program executes, it passes through phases in which its performance characteristics and hardware resource requirements may vary.

• If these phase changes can be detected, optimizations for performance and/or power can be applied and dynamic reconfiguration can be invoked.

• Configurable units can have a fixed number of configurations e.g. four different cache sizes.

• This paper proposes several configuration algorithms that can be applied to any multi-configuration unit.

Dynamically configurable hardware

• Configurable caches and TLBs

• Allocation of memory hierarchy resources

• Allocation of memory buffer resources

• Configurable branch predictors

• Configurable instruction windows

• Configurable pipelines

These methods can be combined, but the optimization problem then becomes more complex, especially if the methods interact with one another.

Dynamic reconfiguration algorithm

Dynamic reconfiguration algorithm(2)

• Referred to as the Rochester algorithm; used to control a multi-configuration data cache.

• Interval: 100K instructions.

• Two states: Stable and Unstable.

• Tuning and reconfiguration take place in the Unstable state.

• The thresholds used to detect a phase change are adjusted dynamically.

• Detection efficiency: the ability to detect a phase change.

• Reconfiguration overhead: depends on the amount of state (10 to 1000 cycles, e.g. for a data cache).

• Tuning overhead: the time needed to find the optimal configuration.

• Each tuning can lead to multiple reconfigurations.

• Why do we study working sets? Because phase changes are manifestations of working set changes.

Working with working set

• The working set W(ti,τ), for i=1,2,…, is the set of distinct segments {s1,s2,…,sω} touched over the ith window (interval) of size τ.

• The working set (w.s.) size is ω, the cardinality of the set.

• Segments are memory regions of some fixed size (e.g. a page).

• General model: the phase transition model.

• A phase is defined as the maximal interval over which the working set remains more or less constant.

• The program follows a series of steady-state phases with abrupt transitions in between.

• The focus is on the instruction working set (data and instruction working sets are distinguished).

• The window size is important for capturing a working set.

• In this paper, working set elements have cache-line granularity (32-256 bytes), and the configurable units are caches and predictors.

• Non-overlapping windows are used.
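The definition above can be sketched in a few lines of Python. This is only an illustration, assuming a trace given as a list of addresses; the function name `working_sets`, the `window` parameter (counted in references here) and the 64-byte `line_bytes` default are assumptions made for the example, not values fixed by the slides.

```python
def working_sets(addresses, window, line_bytes=64):
    """Yield the working set of each non-overlapping window.

    Each working set is the set of distinct cache-line-sized segments
    touched during `window` consecutive references.
    """
    current = set()
    for count, addr in enumerate(addresses, start=1):
        current.add(addr // line_bytes)  # segment index of this address
        if count % window == 0:          # window boundary: emit and start over
            yield current
            current = set()
    if current:                          # trailing partial window, if any
        yield current


trace = [0, 4, 64, 128, 132, 256]
for ws in working_sets(trace, window=3):
    print(sorted(ws), "size =", len(ws))
```

The size ω of each working set is simply `len(ws)`.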

Working with working set(2)

• Main goals: identify the w.s., measure its size, and detect changes in the w.s.

• A measure is needed to compare two phases with working sets W(ti,τ) and W(tj,τ): the relative working set distance, δ.

• A large δ value indicates a w.s. change; a small value indicates no change.

• If δ=0 the sets are identical; if δ=1 the sets are totally different.

• Define a threshold δth; there is a w.s. change if δ > δth.
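The formula for δ did not survive in this transcript. A definition consistent with the properties listed (δ=0 for identical sets, δ=1 for disjoint sets) is the number of elements in the union that are not in the intersection, divided by the size of the union. The sketch below assumes that definition; the function names and the 0.5 default threshold are illustrative.

```python
def relative_ws_distance(ws_a, ws_b):
    """Relative working set distance: (|A ∪ B| - |A ∩ B|) / |A ∪ B|."""
    union = ws_a | ws_b
    if not union:            # two empty sets are treated as identical
        return 0.0
    return (len(union) - len(ws_a & ws_b)) / len(union)


def ws_changed(ws_a, ws_b, delta_th=0.5):
    """True when the distance exceeds the threshold δth (assumed value)."""
    return relative_ws_distance(ws_a, ws_b) > delta_th


print(relative_ws_distance({1, 2, 3}, {2, 3, 4}))  # 2 of 4 elements differ
```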

Working set signature

• Complete working sets are difficult to manipulate.

• Instead, a lossy-compressed w.s. representation is used: the w.s. signature (w.s.s.).

• The w.s.s. is an n-bit vector formed by mapping w.s. elements into n buckets.

• A random hash function is used (srand and rand).

• The low-order b address bits are ignored in the hash (w.s. elements have cache-line granularity).

• Bit-vector sizes range from 32 to 128 bytes.

• The bit-vector is cleared at the beginning of every interval.
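A minimal software sketch of signature formation follows. It stands in for the srand/rand hash by seeding Python's `random.Random` with the line address, so the same line always lands in the same bucket; `line_hash`, `signature`, the 1024-bit size and the 6-bit shift (64-byte lines) are assumptions for illustration, not the hardware hash itself.

```python
import random


def line_hash(addr, n_bits, line_shift):
    """Map an address to a bucket in [0, n_bits).

    The low-order `line_shift` bits are dropped first, so all addresses
    in the same cache line hash to the same bucket.
    """
    rng = random.Random(addr >> line_shift)  # seeded, hence deterministic
    return rng.randrange(n_bits)


def signature(addresses, n_bits=1024, line_shift=6):
    """Build the n-bit signature for one interval.

    A Python int serves as the bit vector; it starts at 0, which mirrors
    clearing the vector at the beginning of every interval.
    """
    sig = 0
    for addr in addresses:
        sig |= 1 << line_hash(addr, n_bits, line_shift)
    return sig


sig = signature([0x1000, 0x1004, 0x2000])
print(bin(sig).count("1"), "bucket(s) filled")
```

Because several lines can share a bucket, the signature is lossy: the number of filled buckets never exceeds, and may be smaller than, the number of distinct lines touched.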

Working set signature(2)

Working set signature(3)

• The size of the signature is related to the w.s. size.

• If K random keys are hashed into n buckets, the fraction of buckets filled, f, is a function of K and n.

• The w.s. size can therefore be estimated from f.

• A 90% filled table corresponds to a w.s. size about 2.5 times larger than the number of filled entries.

• The similarity of two signatures S1 and S2 is measured by the relative signature distance, Δ.

• A threshold value Δth is used to detect phase changes.
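The expressions on this slide were lost in the transcript. The sketch below uses the standard result for random hashing, f = 1 - (1 - 1/n)^K, inverts it to K = log(1 - f) / log(1 - 1/n), and assumes the relative signature distance is the number of bits set in S1 XOR S2 divided by the number of bits set in S1 OR S2. That last definition is an assumption chosen to mirror the relative working set distance; all function names are illustrative.

```python
import math


def popcount(sig):
    """Number of bits set in a signature stored as a Python int."""
    return bin(sig).count("1")


def expected_fill(k, n):
    """Expected fraction of n buckets filled after hashing k random keys."""
    return 1.0 - (1.0 - 1.0 / n) ** k


def estimate_ws_size(sig, n):
    """Invert the fill formula to estimate the number of distinct keys."""
    f = popcount(sig) / n
    if f >= 1.0:
        return math.inf      # a saturated signature carries no size information
    return math.log(1.0 - f) / math.log(1.0 - 1.0 / n)


def signature_distance(s1, s2):
    """Relative signature distance: bits that differ / bits set in either."""
    union = popcount(s1 | s2)
    if union == 0:
        return 0.0
    return popcount(s1 ^ s2) / union


# A table that is 90% full: the estimate is roughly 2.5x the filled entries.
n = 1000
sig = (1 << 900) - 1
print(estimate_ws_size(sig, n) / 900)
```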

3. Methodology

• A modified version of SimpleScalar and the SPEC2000 benchmarks are used.

• Benchmarks are compiled using base-level optimization.

• Choice of benchmarks:

1) long- and short-term phases with differing performance

2) recurring phases (to test w.s. identification)

3) different working sets in a benchmark that lead to similar behavior for certain cache/predictor configurations and completely different behavior for others (to test reconfiguration)

• 100K instructions per interval, 20,000 intervals (2 billion instructions).

• The signature vector size is 1024 bits (128 bytes).

Signature accuracy

• Evaluate the accuracy of the w.s.s. distance by comparing it with the distance computed on the full w.s.

• Measure the relative distances between pairs of consecutive windows.

Signature accuracy (2)

• The Rochester algorithm uses the dynamic count of conditional branches to measure w.s. changes.

• Relative distance metric for conditional branches

Signature accuracy (3)

• There is some correlation, but a high level of dispersion.

• Several significant working set changes are associated with very small relative branch distances.

• Define Δth = 0.5, which filters out most of the noise and still detects significant phase changes.

• Phase-change detection is relatively insensitive to Δth because phase changes tend to be abrupt.

Evaluation: managing configurable hardware

Signature based algorithm

Evaluation: managing configurable hardware (2)

• Three states: Stable (the program w.s. is stable and the configuration is optimal), Unstable (the w.s. is in transition), and Tuning (the w.s. is stable and different configurations are being explored).

• Similar to the Rochester algorithm, except that the signature-based algorithm does not tune while the w.s. is in transition.

• The Icache size can be configured to 2KB, 8KB, 32KB or 128KB.

• Parameters for the Rochester algorithm: base_br_noise = 4500, br_dec = 50, br_inc = 1000, base_perf_noise = 450, perf_dec = 5, perf_inc = 100, and threshold = 2%.

Evaluation: managing configurable hardware (3)

Evaluation: managing configurable hardware (4)

Evaluation: managing configurable hardware (5)

Evaluation: managing configurable hardware (6)

Evaluation: managing configurable hardware (7)

• To reduce unnecessary tunings, we extend the signature-based algorithm to wait for 4 stable intervals before tuning.

• If the state is UNSTABLE for more than 10 intervals and performance is below threshold, the cache size is increased to the maximum. This acts as a backup strategy in cases where the working set does not stabilize, so tuning is never performed.
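The extended algorithm can be pictured as a small state machine driven by one signature per interval. The sketch below is a loose illustration under several assumptions: "stable" is taken to mean that consecutive signatures differ by at most Δth = 0.5, the signature distance is XOR-count over OR-count, and the class name `PhaseTracker`, the `perf_ok` flag and the returned action strings are all hypothetical. The tuning procedure itself is left to the caller.

```python
STABLE, UNSTABLE, TUNING = "STABLE", "UNSTABLE", "TUNING"


def signature_distance(s1, s2):
    """Bits that differ divided by bits set in either signature (assumed metric)."""
    union = bin(s1 | s2).count("1")
    return bin(s1 ^ s2).count("1") / union if union else 0.0


class PhaseTracker:
    def __init__(self, delta_th=0.5, wait=4, max_unstable=10):
        self.delta_th = delta_th
        self.wait = wait                  # stable intervals required before tuning
        self.max_unstable = max_unstable  # backup strategy kicks in after this many
        self.state = UNSTABLE
        self.prev_sig = None
        self.stable_run = 0
        self.unstable_intervals = 0

    def step(self, sig, perf_ok=True):
        """Consume one interval's signature and return an action string."""
        changed = (self.prev_sig is None or
                   signature_distance(self.prev_sig, sig) > self.delta_th)
        self.prev_sig = sig
        if changed:                       # working set in transition
            self.state = UNSTABLE
            self.stable_run = 0
        elif self.state == UNSTABLE:      # quiet interval while waiting to tune
            self.stable_run += 1
            if self.stable_run >= self.wait:
                self.state = TUNING
                self.unstable_intervals = 0
                return "start_tuning"
        if self.state == UNSTABLE:
            self.unstable_intervals += 1
            if self.unstable_intervals > self.max_unstable and not perf_ok:
                return "use_max_size"     # backup: working set never settles
        return "none"

    def tuning_done(self):
        """Called by the tuning procedure once a configuration is chosen."""
        self.state = STABLE
```

Note that no tuning starts while signatures keep changing, which is the property that distinguishes the signature-based algorithm from the Rochester algorithm on the earlier slide.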

4. Measuring working set sizes

• The signature size is closely related to the actual working set size. In cases where performance is directly related to the working set size, for example instruction and data caches, the signature size can be used to determine the optimal configuration; there is no need for tuning.

Working set size experiments

• For small working sets the graph is close to linear; as the w.s. gets bigger, the graph becomes non-linear. Even in the non-linear region, the signature gives reasonably accurate estimates for w.s. sizes up to 3-4 times the maximum signature size.

• So a typical signature size (32-128 bytes) with line-size granularities (32-128 bytes) can estimate w.s. sizes of many tens to hundreds of KB.

Evaluation: reconfiguration using signature size

• The extended signature-based algorithm can be modified to use the signature size for selecting an optimal cache configuration – the smallest that holds the current working set (plus 10% to allow for some noise).

• To determine the appropriate size, equation 2 (Section 2) is used. This eliminates the need to tune, and it typically reduces the number of reconfigurations as well.
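A rough sketch of size-based selection follows. It assumes that equation 2 is the inverse of the bucket-fill formula, K = log(1 - f) / log(1 - 1/n), and uses the four Icache sizes from the evaluation slide; the 64-byte line size, the 1024-bit signature and the function name `pick_cache_size` are assumptions for the example.

```python
import math

CACHE_SIZES_BYTES = [2 * 1024, 8 * 1024, 32 * 1024, 128 * 1024]


def pick_cache_size(filled_bits, n_bits=1024, line_bytes=64, margin=0.10):
    """Return the smallest cache size that holds the estimated working set.

    filled_bits: number of bits set in the current signature.
    margin: extra headroom (10%) to allow for noise.
    """
    f = filled_bits / n_bits
    if f >= 1.0:                         # saturated signature: assume the worst
        return CACHE_SIZES_BYTES[-1]
    est_lines = math.log(1.0 - f) / math.log(1.0 - 1.0 / n_bits)
    needed = est_lines * line_bytes * (1.0 + margin)
    for size in CACHE_SIZES_BYTES:       # sizes are listed in ascending order
        if size >= needed:
            return size
    return CACHE_SIZES_BYTES[-1]         # nothing fits: fall back to the largest


for bits in (20, 100, 512):
    print(bits, "bits set ->", pick_cache_size(bits) // 1024, "KB")
```

With these assumptions, a lightly filled signature maps to the 2KB cache and a half-full one already calls for 128KB, since the estimate grows faster than the fill fraction.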

Identifying recurring phases

• The same phases often recur multiple times during program execution. Saving information about recurring phases makes it possible to avoid re-tuning them.

• This will be done by maintaining a phase table in memory. After tuning has determined the optimal configuration for a particular phase, it will be stored in the table. Later, if the phase recurs, the optimal configuration can be reinstated without going through the tuning process.

Phase statistics

Evaluation: recurring working sets

• The algorithm (phase table) for exploiting recurring working sets is similar to the basic algorithm. On detecting a phase change, the algorithm first performs a table lookup to see if configuration information for the phase exists in the table. If so, the optimal configuration is reinstated. If not, the algorithm goes into the TUNING state. At the end of tuning, the optimal configuration is committed to the signature table.

• In addition to the configuration information, the table also keeps track of phase lengths. If, during its last execution, the length was fewer than four intervals (400,000 instructions), then tuning is not performed. This avoids tuning for insignificant phases. Four intervals are chosen because the tuning process takes a maximum of four intervals.
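The table lookup described above might look like the following sketch. It assumes a phase is matched by comparing signatures against the same 0.5 threshold used for phase detection, and that the table is a simple list; the class `PhaseTable`, its method names and the returned action tuples are hypothetical.

```python
def signature_distance(s1, s2):
    """Bits that differ divided by bits set in either signature (assumed metric)."""
    union = bin(s1 | s2).count("1")
    return bin(s1 ^ s2).count("1") / union if union else 0.0


class PhaseTable:
    def __init__(self, delta_th=0.5, min_intervals=4):
        self.delta_th = delta_th
        self.min_intervals = min_intervals  # phases shorter than this are not tuned
        self.entries = []                   # each: {"sig", "config", "last_length"}

    def _find(self, sig):
        for entry in self.entries:
            if signature_distance(entry["sig"], sig) <= self.delta_th:
                return entry
        return None

    def _find_or_add(self, sig):
        entry = self._find(sig)
        if entry is None:
            entry = {"sig": sig, "config": None, "last_length": None}
            self.entries.append(entry)
        return entry

    def decide(self, sig):
        """Called on a detected phase change; returns (action, config)."""
        entry = self._find(sig)
        if entry is not None and entry["config"] is not None:
            return ("reinstate", entry["config"])   # known phase: skip tuning
        if (entry is not None and entry["last_length"] is not None
                and entry["last_length"] < self.min_intervals):
            return ("skip_tuning", None)            # phase was too short last time
        return ("tune", None)

    def commit(self, sig, config):
        """Store the configuration found at the end of tuning."""
        self._find_or_add(sig)["config"] = config

    def record_length(self, sig, intervals):
        """Remember how many intervals the phase lasted on its last execution."""
        self._find_or_add(sig)["last_length"] = intervals


table = PhaseTable()
print(table.decide(0b1111))        # unseen phase, so tuning is requested
table.commit(0b1111, "8KB")
print(table.decide(0b1111))        # recurring phase, configuration reinstated
```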