Exploiting Streams in Instruction and Data Address Trace Compression
Aleksandar Milenković, Milena Milenković
Laboratory for Advanced Computer Architectures and Systems at Alabama - LaCASA
ECE Department, The University of Alabama in Huntsville
{milenka | milenkm}@ece.uah.edu
WWC-06 2/29
Outline
Introduction
Related work
Stream-based compression
Evaluation
Conclusion
Why Program Execution Traces?
Trace-driven simulation in computer architecture research
Performance tuning
System validation
Trace Issues
Trace collection, reduction, processing
Traces must be large to offer a faithful representation of the system workload
An example:
– 1 billion instructions, 10 B/instr: 10 GB
– SPEC CPU2000 benchmarks, reference input: hundreds of billions of instructions
Effective reduction technique:
– lossless, high compression ratio, fast decompression
Trace Types
Basic block traces for control flow analysis
Address traces for cache studies
Instruction words for processor studies
Operands for arithmetic unit studies
Related Work
Ziv-Lempel algorithm (gzip utility)
WPP - Whole Program Path (J. Larus, 1999)
– program instrumentation, only instruction traces
– a trace of acyclic paths compressed with Sequitur
Timestamped WPP (Y. Zhang, R. Gupta, 2001)
– path traces for a function stored in one block
PDATS, PDI (E. E. Johnson, 2001)
– PDATS: stores address differences with an optional repetition count
– PDI: each of the N most frequently used instruction words in the trace is replaced with its dictionary index, while other words are left unchanged
Loop detection (E. N. Elnozahy, 1999)
– links info about data addresses with the loop
Using value predictors (M. Burtscher, 2003)
Stream Based Compression (SBC)
For combined address+instruction traces
SBC exploits inherent trace characteristics
– Limited number of instruction streams
– Locality of data addresses
Instructions from a stream are replaced by the stream ID
Information about data addresses is linked to the corresponding instruction stream
Resulting files:
– Stream Table File (STF)
– Stream-Based Instruction Trace (SBIT)
– Stream-Based Data Trace (SBDT)
Compression Flow
[Figure: SBC compression flow. Blocks shown: Dinero+ Trace, IBuffer, DBuffer, Stream Table (entries 1 to n, each with SA and L), Data FIFO Buffer (entries with Sid, Mid, Rdy, Aoff, Stride, Count), and the output files SBIT (stream IDs), STF (SA L T1 Iw1 … Tk Iwk), and SBDT (dH Aoff Stride Count).]
H – Header; A – Address; Iw – Instruction Word; T – Type; DA – Data Address; S.SA – Stream Starting Address; S.L – Stream Length; Ca – Current Data Address; Sid – Stream Id; Mid – Memory Ref Id; Aoff – Address Offset; Rdy – Ready for Commit; dH – Data Header
SBC Data Trace Format
SBDT record fields:
DataHeader   1B
Stride       0, 1, 2, 4, or 8B
AddrOffset   1, 2, 4, or 8B
RepCount     0, 1, 2, 4, or 8B
DataHeader bits:
Bits 7-5: RepCount size
Bits 4-2: Stride size
Bits 1-0: AddrOffset size
RepCount size codes:
000: 0B (=0)   001: 1B   010: 2B   011: 4B   100: 8B   101: 0B (=1)   110: unused   111: unused
Stride size codes:
000: 0B (=0)   001: 1B   010: 2B   011: 4B   100: 8B   101: 0B (=1)   110: 0B (=4)   111: 0B (=8)
AddrOffset size codes:
00: 1B   01: 2B   10: 4B   11: 8B
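The DataHeader layout can be illustrated with a small decoder. This is only a sketch: the function names are hypothetical, and it assumes that the three size-code lists on the slide belong to RepCount, Stride, and AddrOffset in the same order in which the bit ranges are listed.

```python
# Sketch of decoding the 1-byte SBDT DataHeader (names are illustrative).
# Each code maps to (bytes stored in the record, implied value or None).
# Assumption: the slide's code lists pair with the fields in the order
# RepCount, Stride, AddrOffset.
REP_CODES = {0b000: (0, 0), 0b001: (1, None), 0b010: (2, None),
             0b011: (4, None), 0b100: (8, None), 0b101: (0, 1)}
# Stride reuses the same codes and adds two implied values, 4 and 8.
STRIDE_CODES = {**REP_CODES, 0b110: (0, 4), 0b111: (0, 8)}
ADDR_CODES = {0b00: 1, 0b01: 2, 0b10: 4, 0b11: 8}


def decode_data_header(header):
    """Split the header byte into the three size descriptors."""
    rep_bytes, rep_implied = REP_CODES[(header >> 5) & 0b111]
    stride_bytes, stride_implied = STRIDE_CODES[(header >> 2) & 0b111]
    addr_bytes = ADDR_CODES[header & 0b11]
    return {
        "rep_bytes": rep_bytes, "rep_implied": rep_implied,
        "stride_bytes": stride_bytes, "stride_implied": stride_implied,
        "addr_bytes": addr_bytes,
    }


def record_length(header):
    """Total record size in bytes, including the header byte itself."""
    f = decode_data_header(header)
    return 1 + f["rep_bytes"] + f["stride_bytes"] + f["addr_bytes"]


# RepCount stored in 1 byte, Stride implied (=8), AddrOffset stored in 8 bytes.
example_header = 0b001_111_11
example_fields = decode_data_header(example_header)
example_length = record_length(example_header)
```

The implied-value codes are what let a record with a common stride (1, 4, or 8) or a trivial repetition count (0 or 1) avoid spending any bytes on that field. The unused RepCount codes 110 and 111 raise a KeyError in this sketch.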
SBC: An Example
Source loop: for (i=0; i<30; ++i) { … a += c[i]; … } …
Dinero+ trace:
Type Address IWord
Stream1 (It. 0)
2 120026a60 223e0018
1 11ff96ff8
2 120026a64 b7fe0008
2 120026a68 42110652
2 120026a6c 42411412
2 120026a70 23bd19a4
2 120026a74 46520413
2 12002678 a4330000
0 11ff97020
2 1200267c 42611413
2 12002680 f43ffffd
Stream2 (It. 1)
2 12002678 a4330000
0 11ff97028
2 1200267c 42611413
2 12002680 f43ffffd
Stream2 (It. 2)
2 12002678 a4330000
0 11ff97030
2 1200267c 42611413
2 12002680 f43ffffd
… … …
Stream2 (It. 28)
2 12002678 a4330000
0 11ff97100
2 1200267c 42611413
2 12002680 f43ffffd
Stream3 (It. 29)
2 12002678 a4330000
0 11ff97108
2 1200267c 42611413
2 12002680 f43ffffd
2 120026a84 23defff0
SBC: An Example
Stream-Based Instruction Trace (SBIT):
1
2
2
..
3
Stream-Based Data Trace (SBDT):
AddrOffset   Stride   RepCount
11ff96ff8    0        0
11ff97020    0        0
11ff97028    8        1b
11ff97108    0        0
Stream Table File (STF):
AddrOffset   Length   T Iw …
120026a60    9        1 223e0018  2 f43ffffd..
12002678     3        0 a4330000  2 f43ffffd..
12002678     4        0 a4330000  2 f43ffffd..
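To see how the four SBDT records in the example stand for the whole data address sequence, here is a minimal expansion sketch. It assumes that the values are hexadecimal and that RepCount counts the repetitions after the first address; the function names are hypothetical.

```python
# Sketch: expand SBDT records (AddrOffset, Stride, RepCount) into addresses.
# Assumption: RepCount is the number of repetitions after the first address,
# so a record yields RepCount + 1 addresses.
def expand_record(addr_offset, stride, rep_count):
    return [addr_offset + i * stride for i in range(rep_count + 1)]


def expand_sbdt(records):
    addresses = []
    for addr_offset, stride, rep_count in records:
        addresses.extend(expand_record(addr_offset, stride, rep_count))
    return addresses


# The four SBDT records from the example, read as hexadecimal values.
example_records = [
    (0x11ff96ff8, 0, 0x0),
    (0x11ff97020, 0, 0x0),
    (0x11ff97028, 8, 0x1b),
    (0x11ff97108, 0, 0x0),
]
example_addresses = expand_sbdt(example_records)
strided_run = expand_record(0x11ff97028, 8, 0x1b)
```

Under that reading, the third record alone covers the loads of iterations 1 to 28, ending at 11ff97100, which matches the last Stream2 iteration in the Dinero+ trace.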
SBC: How It Works
[Figure: walk-through of compressing the example trace, step 1 of 3. Panels: the Dinero+ trace, the in-memory Stream Table, the data FIFO (fields Current Address, Stride, Repetition Count), and the SBIT, SBDT, and stream table contents from the example. FIFO values shown at this step: addresses 11ff96ff8 and 11ff97020, stride 0, repetition count 0.]
SBC: How It Works
[Figure: walk-through of compressing the example trace, step 2 of 3. Same panels as the previous step. FIFO values shown at this step include address 11ff97028, stride 8, and repetition count 1b.]
SBC: How It Works
[Figure: walk-through of compressing the example trace, step 3 of 3. Same panels as the previous steps. FIFO values shown at this step include addresses 11ff97028, 11ff97030, and 11ff97108, stride 8, and repetition counts 1b and 1a.]
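The walk-through can be approximated in a few lines of code. The sketch below is not the authors' implementation: it assumes that a stream is a maximal run of instructions at consecutive addresses (4-byte instructions), keys the stream table on (start address, length), and replaces the bounded data FIFO and its Ready flag with an unbounded list of records kept in creation order. The STF here also omits the per-instruction type field. All names and the tiny input trace are hypothetical.

```python
# Simplified sketch of stream-based compression (illustrative names only).
INSTR = 2  # Dinero-style type code for an instruction fetch, as in the example


def split_streams(trace, instr_bytes=4):
    """Group (type, address, iword) records into sequential-instruction runs.
    Assumes the trace starts with an instruction record."""
    streams, cur, prev = [], None, None
    for rtype, addr, iword in trace:
        if rtype == INSTR:
            if cur is None or addr != prev + instr_bytes:
                cur = {"start": addr, "iwords": [], "data": []}
                streams.append(cur)
            cur["iwords"].append(iword)
            prev = addr
        else:
            cur["data"].append(addr)  # data reference of the current stream
    return streams


def compress(trace):
    table, stf, sbit = {}, [], []
    records = []  # SBDT records in creation order: [first_addr, stride, rep]
    active = {}   # (stream id, memory ref id) -> (record index, last address)
    for s in split_streams(trace):
        key = (s["start"], len(s["iwords"]))
        if key not in table:                 # first occurrence: add to STF
            table[key] = len(table) + 1
            stf.append((s["start"], len(s["iwords"]), tuple(s["iwords"])))
        sid = table[key]
        sbit.append(sid)                     # instructions become a stream ID
        for mid, addr in enumerate(s["data"]):
            idx, last = active.get((sid, mid), (None, None))
            if idx is not None:
                rec = records[idx]
                if rec[2] == 0:              # second address fixes the stride
                    rec[1], rec[2] = addr - last, 1
                    active[(sid, mid)] = (idx, addr)
                    continue
                if addr - last == rec[1]:    # stride hit: extend the record
                    rec[2] += 1
                    active[(sid, mid)] = (idx, addr)
                    continue
            records.append([addr, 0, 0])     # first use or stride miss
            active[(sid, mid)] = (len(records) - 1, addr)
    return stf, sbit, [tuple(r) for r in records]


# Hypothetical 4-iteration loop that loads from an array with stride 8.
demo = [(2, 0x1000, 0xA), (2, 0x1004, 0xB), (0, 0x8000, None), (2, 0x1008, 0xC)]
for i in range(1, 4):
    demo += [(2, 0x1004, 0xB), (0, 0x8000 + 8 * i, None), (2, 0x1008, 0xC)]
demo += [(2, 0x100C, 0xD)]
demo_stf, demo_sbit, demo_sbdt = compress(demo)
```

As in the slide example, the first and last loop iterations form their own streams because they differ in length from the loop body, and the middle iterations collapse into one strided data record.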
Experimentation
SPEC CPU2000 traces for Alpha ISA
– First 2 billion instructions (F2B)
– Mid 2 billion instructions (M2B)
• skip 50 billion, then collect 2 billion
Collection: modified SimpleScalar
Measure compression ratio & decompression time relative to Dinero+
– Gzipped only
– mPDI
– SBC
– SBC.gz: SBC combined with Gzip
– SBC.seq: SBC combined with Sequitur
Stream Statistics: CINT
Less than 7000 instruction streams for most applications
               # of Streams               MaxStreamLen          AvgStreamLen
Benchmark      F2B      M2B      All      F2B    M2B    All     F2B    M2B    All
164.gzip       751      336      1437     229    229    229     13.9   13.8   13.6
176.gcc        25416    22222    30162    272    254    315     11.8   10.7   11.4
181.mcf        744      308      1181     88     64     88      8.9    6.0    7.4
186.crafty     4122     1892     5347     191    100    191     13.1   13.4   13.3
197.parser     4767     4200     6116     157    157    189     9.4    9.9    10.0
252.eon        3486     588      4389     169    168    169     13.8   14.1   13.7
253.perlbmk    9034     6344     11542    84     868    868     10.1   12.0   11.8
254.gap        3218     476      3530     284    75     284     24.3   10.3   11.1
255.vortex     5496     2644     8254     126    110    126     11.1   11.2   11.0
300.twolf      2399     1014     4902     163    185    185     12.3   14.5   14.4
Average        5943.3   4002.4   7686.0   176.3  221.0  264.4   12.9   11.6   11.8
Stream Statistics: CFP
Less than 7000 instruction streams for all applications
               # of Streams               MaxStreamLen          AvgStreamLen
Benchmark      F2B      M2B      All      F2B    M2B    All     F2B    M2B    All
168.wupwise    1563     234      1912     229    229    229     23.9   27.5   27.4
171.swim       1582     496      1839     707    707    707     93.6   132.3  130.8
172.mgrid      1457     875      1725     1944   1944   1944    240.1  159.6  420.8
173.applu      1470     506      1752     3162   3162   3162    411.5  448.9  462.4
177.mesa       1637     593      1938     550    266    550     14.8   18.5   18.15
178.galgel     1818     81       4153     264    206    264     18.4   23.0   21.8
179.art        435      341      976      168    561    561     10.3   8.7    9.0
183.equake     517      260      1355     44     623    623     8.6    28.3   27.7
188.ammp       955      502      1810     168    561    422     12.5   35.2   38.5
189.lucas      964      317      1414     427    427    427     27.1   127.9  113.3
191.fma3d      2083     841      5007     383    1158   1158    10.7   43.6   34.3
200.sixtrack   3532     82       6515     264    580    580     20.1   192.9  170.5
301.apsi       2439     389      2989     729    729    894     34.0   51.5   50.7
Average        1573.2   424.4    2568.1   695.3  857.9  886.2   71.2   99.8   117.3
Compression Ratio: CINT, F2B
Benchmark      mPDI    SBC     Din.gz   mPDI.gz   SBC.gz   SBC.seq
164.gzip       4.4     61.5    40.6     47.9      214.5    197.5
176.gcc        3.2     31.9    9.7      20.0      173.8    198.8
181.mcf        3.4     47.7    24.9     56.9      513.2    612.3
186.crafty     3.0     40.9    7.2      22.8      233.7    253.7
197.parser     3.7     34.4    28.2     33.1      187.3    356.1
252.eon        3.5     22.5    6.2      27.4      408.3    797.6
253.perlbmk    3.2     31.4    6.0      16.8      349.4    327.1
254.gap        4.0     51.0    13.3     36.3      783.4    888.6
255.vortex     3.5     21.4    7.0      14.6      118.3    340.9
300.twolf      3.4     28.8    7.6      23.9      107.9    90.2
Average        3.54    37.15   15.06    29.97     308.99   406.28
Compression Ratio: CINT, M2B
Benchmark      mPDI    SBC     Din.gz   mPDI.gz   SBC.gz   SBC.seq
164.gzip       3.8     61.8    42.4     49.2      222.3    204.4
176.gcc        3.1     41.5    15.1     21.5      268.3    300.0
181.mcf        2.4     16.6    21.4     20.7      59.9     84.8
186.crafty     3.0     45.1    7.1      25.5      263.1    285.2
197.parser     3.5     33.8    28.7     33.4      170.7    340.9
252.eon        3.5     22.0    6.1      28.9      395.6    774.9
253.perlbmk    2.9     43.1    35.8     48.2      755.6    1132.7
254.gap        3.0     35.8    34.4     39.3      1142.0   1957.6
255.vortex     3.4     27.4    12.1     25.4      234.2    411.8
300.twolf      3.3     24.9    6.6      19.8      80.0     66.3
Average        3.2     35.2    21.0     31.2      359.2    555.9
Compression Ratio: CFP, F2B
Benchmark      mPDI    SBC     Din.gz   mPDI.gz   SBC.gz    SBC.seq
168.wupwise    4.0     79.0    34.3     99.7      2878.9    4811.3
171.swim       3.1     410.7   24.4     179.6     43946.9   43522.9
172.mgrid      2.9     74.9    12.2     38.3      8976.3    16329.6
173.applu      2.9     66.3    13.0     23.1      2708.6    31370.8
177.mesa       3.0     74.7    10.3     56.9      1238.6    1775.6
178.galgel     3.5     99.9    21.1     29.0      11829.7   44227.4
179.art        4.2     80.9    24.2     30.6      12606.5   24796.3
183.equake     3.8     54.4    30.7     153.2     1929.8    3353.1
188.ammp       4.6     79.6    24.9     49.2      2624.8    3571.9
189.lucas      3.6     151.7   69.6     182.1     31181.3   78054.0
191.fma3d      4.3     48.0    12.7     23.7      3617.7    17601.0
200.sixtrack   3.2     68.5    20.0     50.7      1292.0    1951.1
301.apsi       3.0     35.2    8.5      20.0      2295.1    11320.8
Average        3.5     101.8   23.5     72.0      9778.9    21745.1
Compression Ratio: CFP, M2B
Benchmark      mPDI    SBC     Din.gz   mPDI.gz   SBC.gz    SBC.seq
168.wupwise    2.7     42.9    18.0     37.9      2047.5    3741.7
171.swim       2.8     505.6   21.0     155.7     99989.2   189501.1
172.mgrid      2.9     76.9    12.6     38.6      9582.5    17525.1
173.applu      2.8     77.7    14.2     24.9      3523.7    45522.8
177.mesa       2.9     83.6    10.7     50.9      1081.5    1508.0
178.galgel     2.5     55.9    27.9     38.6      9421.5    76728.1
179.art        2.9     68.5    26.2     36.7      20895.7   94731.9
183.equake     2.5     34.8    27.2     27.0      374.4     436.8
188.ammp       2.5     41.8    22.7     28.5      445.0     442.8
189.lucas      2.6     270.4   37.9     77.3      29332.7   58094.7
191.fma3d      2.6     111.7   4.9      9.7       11987.6   34224.3
200.sixtrack   2.6     130.8   13.5     32.5      7433.1    15566.1
301.apsi       2.9     34.8    8.1      18.6      2290.8    13523.0
Average        2.7     118.1   18.8     44.4      15261.9   42426.7
Decompression Speedup, F2B
[Figure: decompression speedup for F2B traces relative to Dinero+.gz; axis ticks 0, 1, 10, 100; series: modPDI.gz, SBC.gz, SBC.seq.]
Decompression Speedup, M2B
[Figure: decompression speedup for M2B traces relative to Dinero+.gz; axis ticks 0, 1, 10, 100; series: modPDI.gz, SBC.gz, SBC.seq.]
Compressibility of Instruction/Data Components
The instruction component (instruction address + instruction word) compresses much better
– It accounts for only 5% of the whole compressed trace for CINT, 10% for CFP
Further research efforts should improve data address compression
Compressibility of Instruction/Data Components
[Figure: two panels, "Instruction address + instruction word trace component" (y-axis ticks 1 to 100000) and "Data address trace component" (y-axis ticks 1 to 1000); y-axis: compression ratio; x-axis: CINT benchmarks 164.gzip through 300.twolf; series: SBC.gz, SBC.seq, mPDI.gz, Din.gz.]
Data Address Compression
A good indicator of compression ratio: the number of memory references in the trace divided by the number of records in the SBDT file, NMEM/NSBDT
The ratio also depends on the lengths of the repetition count, stride, and address offset fields
E.g., 176.gcc and 300.twolf in F2B:
– NMEM/NSBDT = 4.6 (176.gcc), 4.5 (300.twolf)
– Compression ratio: 10.7 (176.gcc), 6.9 (300.twolf)
– Reason: different lengths of record fields
Data Address Compression: Components
$|SBDT| = \sum_{i} i \cdot (AddrOff_i + Stride_i + RepCount_i), \quad i = 0, 1, 2, 4, 8$
$|Din{+}Data| = 8 \cdot NMEM$
$ComprRatio = \dfrac{8 \cdot NMEM}{NSBDT \cdot \sum_{i} i \cdot (PAddrOff_i + PStride_i + PRepCount_i)}, \quad i = 0, 1, 2, 4, 8$; P - percentage
Percentage         176.gcc   300.twolf
AddrOffsetByte1    67.53     37.73
AddrOffsetByte2    27.82     33.30
AddrOffsetByte4    4.60      28.97
AddrOffsetByte8    0.05      0.01
StrideByte0        49.68     32.64
StrideByte1        28.12     20.72
StrideByte2        19.03     28.76
StrideByte4        3.16      24.24
StrideByte8        0.00      0.00
RepCountByte0      77.24     74.71
RepCountByte1      22.58     24.97
RepCountByte2      0.18      0.33
RepCountByte4      0.00      0.00
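As a worked check of the ratio formula, the sketch below plugs in the field-size percentages and the NMEM/NSBDT values quoted for 176.gcc and 300.twolf. One assumption goes beyond the formula as written: each record is also charged the 1-byte DataHeader from the SBDT format. The function and variable names are hypothetical.

```python
# Sketch: estimate the data address compression ratio from field-size
# distributions (percent of SBDT records per field size in bytes).
DIST = {
    "176.gcc": {
        "addr":   {1: 67.53, 2: 27.82, 4: 4.60, 8: 0.05},
        "stride": {0: 49.68, 1: 28.12, 2: 19.03, 4: 3.16, 8: 0.00},
        "rep":    {0: 77.24, 1: 22.58, 2: 0.18, 4: 0.00},
    },
    "300.twolf": {
        "addr":   {1: 37.73, 2: 33.30, 4: 28.97, 8: 0.01},
        "stride": {0: 32.64, 1: 20.72, 2: 28.76, 4: 24.24, 8: 0.00},
        "rep":    {0: 74.71, 1: 24.97, 2: 0.33, 4: 0.00},
    },
}
# NMEM / NSBDT: memory references per SBDT record, as quoted for F2B.
REFS_PER_RECORD = {"176.gcc": 4.6, "300.twolf": 4.5}


def avg_record_bytes(dist, header_bytes=1):
    """Average SBDT record size; header_bytes=1 is an assumption."""
    body = sum(size * pct / 100.0
               for field in dist.values()
               for size, pct in field.items())
    return header_bytes + body


def compression_ratio(name, header_bytes=1):
    """8 bytes per Dinero+ data address over the average bytes per reference."""
    return 8.0 * REFS_PER_RECORD[name] / avg_record_bytes(DIST[name], header_bytes)


ratio_gcc = compression_ratio("176.gcc")
ratio_twolf = compression_ratio("300.twolf")
```

With the header byte included, the two estimates come out close to the quoted 10.7 and 6.9. The gap between the benchmarks comes almost entirely from 300.twolf needing 4-byte address offsets and strides far more often than 176.gcc.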
Conclusions
SBC: new technique for compression of combined data address and instruction traces
– Reduces trace size and decompression time
– Can be successfully combined with other compression techniques such as Gzip and Sequitur
– One-pass algorithm => can migrate into hardware
– Does not require program instrumentation
– Stream Table + Stream Frequency enable fast workload characterization
Conclusions
Future directions
– 2-level SBT referencing BBT (Basic Block Table)
– Study what happens when other trace information is included (time, data value)
– Possible hardware implementation
– Can SBC trace-driven simulation beat execution-driven simulation?
Backup Slides
Compressibility of Instruction/Data Components
Not the same through the trace
[Figure: compression ratio over the course of the 171.swim F2B trace; x-axis in units of 20 million instructions (1 to 91); y-axis: compression ratio. Panel "171.swim Instructions (F2B)", y-axis ticks 1 to 1000000, series: DineroI.raw/DineroI.gzip, DineroI.raw/SbcI.raw, Dinero.raw/SbcI.gzip. Panel "171.swim Data (F2B)", y-axis ticks 1 to 100000, series: DineroD.raw/DineroD.gzip, DineroD.raw/SbcD.raw, DineroD.raw/SbcD.gzip.]
FIFO Size Influence?
For most applications, not very significant after 4000 entries
[Figure: two panels, "Size decrease for SBDT relative to 1000-entry FIFO" and "Size decrease for SBDT.gz relative to 1000-entry FIFO"; x-axis: FIFO size (1000, 2000, 4000, 8000, 16000); y-axis from 0 to 1; series: 301.apsi and 189.lucas.]
Trace Size: CINT
               Load+Store%       Dinero+ [GB]
Benchmark      F2B      M2B      F2B      M2B
164.gzip       33.17    32.07    29.16    28.99
176.gcc        50.94    52.10    31.80    31.98
181.mcf        41.36    37.98    30.38    29.87
186.crafty     37.74    36.71    29.84    29.68
197.parser     37.94    35.06    29.87    29.44
252.eon        48.59    48.58    31.45    31.45
253.perlbmk    45.02    46.88    30.92    31.20
254.gap        37.36    38.36    29.78    29.93
255.vortex     44.40    38.95    30.83    30.02
300.twolf      33.77    33.00    29.25    29.13
Average        41.03    39.97    30.33    30.17
Trace Size: CFP
               Load+Store%       Dinero+ [GB]
Benchmark      F2B      M2B      F2B      M2B
168.wupwise    19.76    30.96    27.16    28.83
171.swim       31.02    32.86    28.84    29.11
172.mgrid      36.66    36.43    29.68    29.64
173.applu      37.75    38.20    29.84    29.91
177.mesa       37.53    38.09    29.81    29.89
178.galgel     41.80    41.27    30.44    30.36
179.art        37.81    34.12    29.85    29.30
183.equake     36.00    45.04    29.58    30.93
188.ammp       31.13    37.23    28.85    29.76
189.lucas      18.73    22.20    27.01    27.52
191.fma3d      18.71    45.70    27.00    31.02
200.sixtrack   32.09    24.69    29.00    27.89
301.apsi       37.24    37.29    29.76    29.77
Average        32.02    35.70    28.99    29.53