BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs
![Page 1: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/1.jpg)
BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs
Socrates Demetriades and Sangyeun Cho
Computer Frontiers 2011, Ischia, Italy.
![Page 2: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/2.jpg)
Program’s time-varying behavior.
[Figure: NoC traffic over time; Bodytrack, 16-thread parallel execution]
Challenge: How to detect behavioral changes?
Adaptive CMP architectures can take advantage of this time-varying behavior.
![Page 3: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/3.jpg)
Tracking program behavior
Traditionally, two methods for tracking program phases:
1. Run-time monitoring of the program execution.
– Observations are limited by the monitoring metric.
– Cost of monitoring mechanisms.
– Granularity of monitoring intervals? Fine- vs. coarse-grain?
2. Profile-based analysis.
– Static program analysis, complicated algorithms.
– Binary rewriting.
– Architectural support.
Code-based metrics: not directly suitable for parallel workloads.
![Page 4: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/4.jpg)
Overview of our proposal
Track the program behavior at Run Time.
Effective
Simple
Low-cost
View the program execution at ‘epoch’ granularity.
![Page 5: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/5.jpg)
Outline
Introduction
Program epochs and characterization
Run-time epoch change detection
Case study
Summary
![Page 6: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/6.jpg)
Observation / Motivation
• Natural alignment of barriers with the changes in program behavior.
• Intervals enclosed by barriers repeat with consistent behavior.
[Figure: NoC traffic over time]
![Page 7: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/7.jpg)
Program epochs
• Epoch: An execution interval between two consecutive barriers.
[Figure: NoC traffic over time; barriers A and B delimit repeating epochs]
![Page 8: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/8.jpg)
Program epochs
• Epoch: An execution interval between two consecutive barriers.
[Figure: NoC traffic over time; barriers B and A delimit repeating epochs]
![Page 9: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/9.jpg)
Program epochs
• Epoch: An execution interval between two consecutive barriers.
[Figure: NoC traffic over time; epochs labeled with barriers D and C]
![Page 10: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/10.jpg)
Program epochs
[Figure: NoC traffic over time]
• Epoch: An execution interval between two consecutive barriers.
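The epoch definition can be sketched in a few lines of Python. This is only an illustration: the trace format (one `(timestamp, barrier_id)` pair per barrier crossing) and the function name `split_into_epochs` are assumptions made here, not something the slides specify.

```python
from typing import List, Tuple

# Hypothetical trace format: one (timestamp, barrier_id) pair per barrier crossing.
BarrierEvent = Tuple[float, str]

def split_into_epochs(events: List[BarrierEvent]) -> List[Tuple[str, float, float]]:
    """Return (epoch_id, start, end) for every pair of consecutive barriers."""
    epochs = []
    for (t0, b0), (t1, b1) in zip(events, events[1:]):
        # Name the epoch after the barriers that open and close it.
        epochs.append((f"{b0}{b1}", t0, t1))
    return epochs

# Made-up timestamps, for illustration only.
trace = [(0.0, "A"), (4.0, "B"), (5.0, "A"), (9.5, "B")]
epochs = split_into_epochs(trace)
```

Repeating epoch IDs (here `AB` appears twice) are the dynamic instances of the same epoch, and the intervals are naturally of variable length.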
![Page 11: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/11.jpg)
Epochs’ effectiveness: characterization
Are epochs effective in characterizing the variability of program behavior?
How similar is program behavior among the different dynamic instances of the same epoch?
How different is the behavior across different epochs?
How does the program behave within the epochs?
![Page 12: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/12.jpg)
Characterization across epochs.
[Figure: NoC traffic per epoch, with error bars]
Error bars: variability across the dynamic instances of an epoch (LOW variability).
Dispersion across points: variability across different epochs (HIGH variability).
![Page 13: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/13.jpg)
Characterization across epochs.
• Fundamental correlation between epoch boundaries and changes in program behavior.
• High predictability of behavior across epoch instances.
[Figure panels: NoC Traffic, L2 Miss Ratio, Global IPC, C2C Transfers]
Low variability across instances of an epoch.
High variability across different epochs.
![Page 14: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/14.jpg)
Characterization across epochs.
Ratio = (variability across instances of an epoch) / (variability across different epochs)
Low variability across instances of an epoch and high variability across different epochs give a small ratio.
The smaller the ratio,
• the sharper the behavioral shifts on epoch boundaries
• the more predictable the program behavior across repeating epoch instances.
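One way such a ratio could be computed is sketched below. The slides do not give the exact statistic, so this sketch assumes the numerator is the average standard deviation among instances of the same epoch and the denominator is the standard deviation of the per-epoch means. The function name and the sample values are hypothetical.

```python
from statistics import mean, pstdev
from typing import Dict, List

def variation_ratio(samples: Dict[str, List[float]]) -> float:
    """samples maps an epoch ID to one metric value per dynamic instance."""
    # Assumed numerator: average spread among the instances of the same epoch.
    within = mean(pstdev(values) for values in samples.values())
    # Assumed denominator: spread of the per-epoch averages across different epochs.
    across = pstdev([mean(values) for values in samples.values()])
    if across == 0:
        raise ValueError("all epochs have the same mean; the ratio is undefined")
    return within / across

# Made-up NoC traffic values, for illustration only.
traffic = {"AB": [10.0, 10.5, 9.5], "BA": [40.0, 41.0, 39.0]}
ratio = variation_ratio(traffic)
```

With these inputs the instances of each epoch stay close to their own mean while the two epochs differ widely, so the ratio comes out well below 0.2.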
![Page 15: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/15.jpg)
PARSEC and SPLASH2 programs.
[Figure: Variation ratio (axis 0.0 to 1.0) per benchmark: bodytrack, fluidanimate, streamcluster, barnes, fmm, lu, ocean, radiosity, water-ns. Metrics: Global IPC, L2 Miss Ratio, Traffic Volume, C2C-transfer Hit Ratio]
Less than 0.2 for most benchmarks.
![Page 16: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/16.jpg)
Epochs’ effectiveness: characterization
Are epochs effective in characterizing the variability of program behavior?
How similar is program behavior among the different dynamic instances of the same epoch?
How different is the behavior across different epochs?
How does the program behave within the epochs?
![Page 17: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/17.jpg)
Characterization within epochs.
• Epochs may exhibit stable or other behavioral patterns within their boundaries.
• Internal behavior patterns reoccur and thus can be accurately predicted.
[Figure panels: Stable, Unstable, Multiphase]
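The slides do not state how an epoch is assigned to one of these three classes. The sketch below is a purely hypothetical heuristic, not the authors' method: an epoch is "stable" when its intra-epoch samples have a small coefficient of variation, "multi-phase" when it can be cut into two segments that are each stable, and "unstable" otherwise. The function names and the 10% tolerance are assumptions.

```python
from statistics import mean, pstdev
from typing import List

def is_flat(samples: List[float], tolerance: float) -> bool:
    """True if the relative spread of the samples is within `tolerance`."""
    avg = mean(samples)
    spread = pstdev(samples)
    if avg == 0:
        return spread == 0
    return spread / abs(avg) <= tolerance

def classify_epoch(samples: List[float], tolerance: float = 0.1) -> str:
    """Hypothetical classifier for the metric samples taken inside one epoch."""
    if is_flat(samples, tolerance):
        return "stable"
    # Multi-phase: some cut leaves a flat segment (two samples or more) on each side.
    for cut in range(2, len(samples) - 1):
        if is_flat(samples[:cut], tolerance) and is_flat(samples[cut:], tolerance):
            return "multi-phase"
    return "unstable"

# Made-up sample series, for illustration only.
labels = [
    classify_epoch([5.0, 5.1, 4.9, 5.0]),
    classify_epoch([1.0, 1.0, 1.0, 9.0, 9.0, 9.0]),
    classify_epoch([1.0, 9.0, 2.0, 8.0, 1.0, 9.0]),
]
```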
![Page 20: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/20.jpg)
Characterization within epochs.
• Most epochs exhibit stable behavior within their boundaries.
• Close relation to classic definition of program phase.
• Reoccurring internal patterns can be predictable.
[Figure: Percentage of stable, unstable, and multi-phase epochs (0% to 100%) per benchmark: bodytrack, fluidanimate, streamcluster, barnes, fmm, lu, ocean, radiosity, water-ns, and average]
![Page 21: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/21.jpg)
Epoch characterization summary
Epochs repeat in a consistent and predictable way providing a reliable granularity of the cyclic pattern of program behavior.
Epoch boundaries are likely to naturally indicate changes of program behavior
Most epochs exhibit stable behavior within their boundaries or other reoccurring predictable patterns.
![Page 22: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/22.jpg)
Epochs: Advantages
Independent from the underlying architecture.
Naturally adopting variable-length intervals
Deterministic boundaries (global sync points).
Barriers can be easily captured at run time.
Many multithreaded workloads are written with barrier synchronizations.
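As a rough software analogy for capturing barriers at run time, the sketch below wraps Python's `threading.Barrier` so that every crossing is logged once. The `WatchedBarrier` class and `run_workers` helper are hypothetical names introduced here; the slides themselves refer to pthread programs.

```python
import threading
from typing import List

class WatchedBarrier:
    """Hypothetical wrapper that records each crossing of a named barrier."""

    def __init__(self, name: str, parties: int, log: List[str]) -> None:
        self.name = name
        self._barrier = threading.Barrier(parties)
        self._log = log

    def wait(self) -> None:
        # Barrier.wait() returns a distinct index per thread; the thread that
        # gets index 0 does the logging, so each crossing is recorded once.
        if self._barrier.wait() == 0:
            self._log.append(self.name)

def run_workers(parties: int = 4) -> List[str]:
    log: List[str] = []
    a = WatchedBarrier("A", parties, log)
    b = WatchedBarrier("B", parties, log)

    def worker() -> None:
        for barrier in (a, b, a):
            barrier.wait()  # every crossing ends one epoch and starts the next

    threads = [threading.Thread(target=worker) for _ in range(parties)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log

crossings = run_workers()
```

Because a barrier releases only after all threads arrive, the logged sequence of crossings is the same on every run, which mirrors the deterministic-boundary property listed above.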
![Page 23: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/23.jpg)
Outline
Introduction
Program epochs and characterization
Run-time epoch change detection
Case study
Summary
![Page 24: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/24.jpg)
Run-time epoch change detection.
[Diagram labels: Application’s source code (... barrier_wait(barrier) ... barrier_wait(barrier) ...); Application’s instruction stream (Barrier A, Barrier B, Barrier A, ...); Epoch Table (fields: EPOCH ID, Decision signature, F bit; example entry: Barrier A, Barrier B, T, Config AB); Reconfiguration units]
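The table lookup suggested by the diagram can be modeled in software roughly as follows. This is a simplified sketch under stated assumptions: it identifies an epoch by the barrier that opens it, stores one configuration string per epoch, and does not model the "F bit" or the "T" marker, whose exact meaning the transcript does not preserve. The class and method names are hypothetical.

```python
from typing import Dict

class EpochTable:
    """Hypothetical model: maps the barrier that opens an epoch to a stored configuration."""

    def __init__(self, default_config: str) -> None:
        self.default_config = default_config
        self.entries: Dict[str, str] = {}

    def on_barrier(self, barrier_id: str) -> str:
        """Called when a barrier is crossed; returns the config for the epoch that starts."""
        # Known epoch: reuse the decision recorded for it earlier.
        # Unknown epoch: fall back to the default while it is being observed.
        return self.entries.get(barrier_id, self.default_config)

    def record_decision(self, barrier_id: str, config: str) -> None:
        """Store the configuration chosen after observing the epoch opened by barrier_id."""
        self.entries[barrier_id] = config

table = EpochTable(default_config="f100%")
first = table.on_barrier("A")       # first instance: no entry yet
table.record_decision("A", "f50%")  # decision made after observing the epoch
second = table.on_barrier("A")      # later instance: stored decision is reused
```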
![Page 25: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/25.jpg)
Outline
Introduction
Program epochs and characterization
Run-time epoch change detection
Case study
Summary
![Page 26: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/26.jpg)
Case study: Overview
Purpose: Demonstrate the applicability of the BarrierWatch approach in the context of dynamic adaptation.
Goal: Optimize energy/performance trade-off in a CMP architecture using BarrierWatch.
Adaptation technique: DVFS applied to the NoC (at epoch granularity).
![Page 27: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/27.jpg)
Experimental methodology
Benchmarks: From PARSEC & SPLASH2 suites (pthread).
Architectural model:
– Full-system simulator (Simics) augmented with a cycle-accurate memory hierarchy model.
– Tile-based CMP model / 16 in-order cores / 2-issue width.
– Shared, physically distributed L2 cache.
– Mesh NoC, x-y routing.
– Two-stage router pipeline, buffer size 2 per VC.
![Page 28: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/28.jpg)
On-Chip DVFS
On-chip power consumption model: NoC power + background power.
NoC voltage/frequency levels:

| Frequency (GHz) | Voltage (V) | Alias |
|---|---|---|
| 3 | 0.8 | f100% |
| 2.25 | 0.65 | f75% |
| 1.5 | 0.5 | f50% |
| 0.75 | 0.35 | f25% |
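To get a feel for what these levels imply, the sketch below estimates dynamic power relative to the f100% level. It assumes the textbook approximation that dynamic power scales with V² × f; the slides only say the model is "NoC power + background power", so this is not the authors' power model, and background power is ignored here.

```python
# (frequency in GHz, voltage in V) per alias, taken from the NoC levels above.
LEVELS = {
    "f100%": (3.0, 0.8),
    "f75%": (2.25, 0.65),
    "f50%": (1.5, 0.5),
    "f25%": (0.75, 0.35),
}

def relative_dynamic_power(alias: str, baseline: str = "f100%") -> float:
    """Dynamic power relative to the baseline, assuming P is proportional to V^2 * f."""
    f, v = LEVELS[alias]
    f0, v0 = LEVELS[baseline]
    return (v / v0) ** 2 * (f / f0)

relative = {alias: relative_dynamic_power(alias) for alias in LEVELS}
```

Under this assumption, halving the frequency (f50%) cuts dynamic power to roughly a fifth of the baseline because the voltage drops as well.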
![Page 29: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/29.jpg)
Evaluated schemes
• Schemes with fixed/static NoC frequency:
1. f100% (baseline)
2. f75%
3. f50%
4. f25%
• Epoch-based DVFS schemes (adaptive architectures):
1. f-DVFS dyn (run-time DVFS)
2. f-DVFS stat (off-line predefined DVFS settings)
• Best frequency: The one that minimizes the Energy x Delay (ED product).
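Selecting the best frequency for an epoch by ED product can be sketched as below. The function name `best_frequency` and the (energy, delay) numbers are hypothetical and only illustrate the selection rule; they are not measurements from the study.

```python
from typing import Dict, Tuple

def best_frequency(measurements: Dict[str, Tuple[float, float]]) -> str:
    """measurements maps a frequency alias to (energy, delay).

    Returns the alias with the lowest energy x delay product.
    """
    return min(
        measurements,
        key=lambda alias: measurements[alias][0] * measurements[alias][1],
    )

# Made-up (energy, delay) pairs for a single epoch, for illustration only.
epoch_profile = {
    "f100%": (10.0, 1.00),
    "f75%": (8.0, 1.05),
    "f50%": (6.5, 1.30),
    "f25%": (6.0, 2.00),
}
choice = best_frequency(epoch_profile)
```

In this made-up profile the lowest frequency saves the most energy but doubles the delay, so the ED product picks an intermediate level instead.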
![Page 30: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/30.jpg)
Case study: Results
[Figure: % Energy Savings (axis -25 to 25) and % Slowdown (axis 0 to 40) for bodytrack, streamcluster, barnes, and average. Schemes: f-75%, f-50%, f-25%, f-DVFS dyn, f-DVFS stat]
![Page 34: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/34.jpg)
Case study: Results
[Figure: % Energy Savings (axis -25 to 25) and % Slowdown (axis 0 to 40) for bodytrack, streamcluster, barnes, and average. Schemes: f-75%, f-50%, f-25%, f-DVFS dyn, f-DVFS stat. Off-scale bar labels: -38.5, -83.2]
Run-time epoch-based DVFS: 12.5% energy savings for 2.7% slowdown.
![Page 35: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/35.jpg)
Case study: Results
Epoch-based dynamic schemes outperform all static schemes.
[Figure: ED Improvement (axis 0 to 1.4) for bodytrack, fluidanimate, streamcluster, barnes, fmm, ocean, radiosity, water-ns, and average. Schemes: f-75%, f-50%, f-25%, f-DVFS dyn, f-DVFS stat]
![Page 36: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/36.jpg)
Outline
Introduction
Program epochs and characterization
Run-time epoch change detection
Case study
Summary
![Page 37: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/37.jpg)
Summary
Program-defined epochs represent well the repetitive and varying behavior of multithreaded programs.
BarrierWatch: a prominent method for effective run-time management in CMPs.
Desirable properties:
1. Simple and lightweight.
2. Effective at run time.
3. Independent of the underlying architecture.
4. Well suited for parallel applications.
![Page 38: BarrierWatch: Characterizing Multithreaded Workloads across and within Program-Defined Epochs](https://reader030.fdocuments.in/reader030/viewer/2022012919/56816174550346895dd10111/html5/thumbnails/38.jpg)
Thank you!
Computer Frontiers 2011, Ischia, Italy.