Instruction and Data Address Trace Compression
Aleksandar Milenković
(collaborative work with Milena Milenković and Martin Burtscher)
Electrical and Computer Engineering Department
The University of Alabama in Huntsville
Email: [email protected]
Web: http://www.ece.uah.edu/~milenka
http://www.ece.uah.edu/~lacasa
2
Outline
- Program Execution Traces
- Trace Compression
- Trace Compression in Hardware
  - Stream caches and predictors for instruction address trace compression
  - Data address stride caches for data address trace compression
- Results
- Conclusions
3
Program Execution Traces
- Streams of recorded events
  - Basic block traces
  - Address traces
  - Instruction words
  - Operands
- Trace uses
  - Computer architects: evaluation of new architectures
  - Computer analysts: workload characterization
  - Software developers: program tuning, optimization, and debugging
4
Instruction and Data Address Traces: An Example
for(i=0; i<100; i++) {
c[i] = s*a[i] + b[i];
sum = sum + c[i];
}
Dinero+ execution trace (columns: Type, Instruction Address, Data Address)
2 0x020001f4
0 0x020001f8 0xbfffbe24
0 0x020001fc 0xbfffbc94
2 0x02000200
2 0x02000204
2 0x02000208
2 0x0200020c
1 0x02000210 0xbfffbb04
2 0x02000214
@ 0x020001f4: mov r1,r12, lsl #2
@ 0x020001f8: ldr r2,[r4, r1]
@ 0x020001fc: ldr r3,[r14, r1]
@ 0x02000200: mla r0,r2,r8,r3
@ 0x02000204: add r12,r12,#1 (1 >>> 0)
@ 0x02000208: cmp r12,#99 (99 >>> 0)
@ 0x0200020c: add r6,r6,r0
@ 0x02000210: str r0,[r5, r1]
@ 0x02000214: ble 0x20001f4
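A minimal sketch of how the Dinero+ records on this slide could be read back, assuming the three-column layout shown (type, instruction address, optional data address) and that type codes 0, 1, and 2 mark loads, stores, and instructions without a data access, as the ldr/str lines in the listing suggest. The names TraceRecord and parse_dinero_plus are hypothetical and not part of any tool named in the talk.

```python
from typing import NamedTuple, Optional

# Type codes as they appear in the example trace (assumed meaning):
# 0 = load, 1 = store, 2 = instruction without a data access.
LOAD, STORE, NO_DATA = 0, 1, 2


class TraceRecord(NamedTuple):
    kind: int
    pc: int
    data_addr: Optional[int]


def parse_dinero_plus(lines):
    """Parse 'type pc [data_addr]' text lines into TraceRecord tuples."""
    records = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        kind = int(fields[0])
        pc = int(fields[1], 16)
        data_addr = int(fields[2], 16) if len(fields) > 2 else None
        records.append(TraceRecord(kind, pc, data_addr))
    return records


EXAMPLE = """\
2 0x020001f4
0 0x020001f8 0xbfffbe24
0 0x020001fc 0xbfffbc94
2 0x02000200
2 0x02000204
2 0x02000208
2 0x0200020c
1 0x02000210 0xbfffbb04
2 0x02000214
""".splitlines()

records = parse_dinero_plus(EXAMPLE)
# Split the combined trace into the two streams compressed separately later.
instruction_addresses = [r.pc for r in records]
data_addresses = [(r.pc, r.data_addr) for r in records if r.data_addr is not None]
```

Splitting the records this way mirrors the talk's separation into an instruction address trace and a (PC, data address) trace.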
5
Trace Issues
- Trace issues
  - Capture
  - Compression
  - Processing
- Traces tend to be very large
  - In terabytes for a minute of program execution
  - Expensive to store, transfer, and use
- Effective reduction techniques
  - Lossless
  - High compression ratio
  - Fast decompression
7
Trace Compression
- General-purpose compression algorithms
  - Ziv-Lempel (gzip)
  - Burrows-Wheeler transformation (bzip2)
  - Sequitur
- Trace-specific compression techniques
  - Tuned to exploit redundancy in traces
  - Better compression, faster, can be further combined with general-purpose compression algorithms
8
Trace-Specific Compression Techniques: Lossless Compression
Trace types: Instructions; Instructions + data
Approaches:
- Graph with number of repetitions in nodes
- Replacing an execution sequence with its identifier
  - Acyclic path (WPP [Larus 1999], Time Stamped WPP [Zhang and Gupta 2001])
  - N-tuple [Milenkovic, Milenkovic and Kulick 2003]
  - Instruction (PDI [Johnson, Ha and Zaidi 2001])
- Control flow graph + trace of transitions
- Offset
- Offset + repetitions
- Link data addresses to dynamic basic block
- Link data addresses to loop
- Regenerate addresses
- Abstract execution
- Value predictor
Techniques cited on the slide:
- Mache [Samples 1989], LBTC [Luo and John 2004]
- QPT [Larus 1993]
- [Hamou-Lhadj and Lethbridge 2002]
- PDATS [Johnson, Ha and Zaidi 2001]
- [Pleszkun 1994], SBC [Milenkovic and Milenkovic 2003]
- [Elnozahy 1999], SIGMA [DeRose, et al. 2002]
- [Eggers, et al. 1990], [Larus 1993]
- VPC [Burtscher and Jeeradit 2003], TCGEN [Burtscher and Sam 2005]
10
Why Trace Compression in Hardware?
- Problem #1: capturing program traces
  - In software: trap after each instruction or taken branch
    - E.g., IBM's Performance Inspector
    - Slowdown > 100 times
  - Multiple cores on a single chip + more detailed information needed (e.g., time stamps of events)
- Problem #2: debugging is far from fun
  - Stop execution on breakpoints, examine the state
  - Time-consuming, difficult, may miss a critical state leading to erroneous behavior
  - Stopping the CPU may perturb the sequence of events, making your bugs disappear
- => Need an unobtrusive real-time tracing mechanism
11
Trace Compression in Hardware
- Goals
  - Small on-chip area and small number of pins
  - Real-time compression (never stall the processor)
  - Good compression ratio
- Solution
  - A set of compression algorithms targeting on-the-fly compression of instruction and data address traces
12
Exploiting Streams and Strides
- Instruction address trace compression
  - Limited number and strong temporal locality of instruction streams
  - => Replace an instruction stream with its identifier
- Data address trace compression
  - Spatial and temporal locality of data addresses
  - => Recognize regular strides

CINT           #Streams   Max.L   Dyn.SL
164.gzip       1437       229     13.6
176.gcc        30162      315     11.4
181.mcf        1181       88      7.4
186.crafty     5347       191     13.3
197.parser     6116       189     10.0
252.eon        4389       169     13.7
253.perlbmk    11542      868     11.8
254.gap        3530       284     11.1
255.vortex     8254       126     11.0
300.twolf      4902       185     14.4

CFP            #Streams   Max.L   Dyn.SL
168.wupwise    1912       229     27.4
171.swim       1839       707     130.8
172.mgrid      1725       1944    420.8
173.applu      1752       3162    462.4
177.mesa       1938       550     18.15
178.galgel     4153       264     21.8
179.art        976        561     9.0
183.equake     1355       623     27.7
188.ammp       1810       422     38.5
189.lucas      1414       427     113.3
191.fma3d      5007       1158    34.3
200.sixtrack   6515       580     170.5
301.apsi       2989       894     50.7
13
Trace Compressor: System Overview
[Figure: block diagram. The system under test contains the processor core, memory, and the trace compressor; a trace port connects it to an external trace unit for storing/processing (PC or intelligent drive). Inputs from the processor core: program counter, data address, task switch. Trace compressor blocks: Stream Cache (SC), predictor + byte repetition FSM, data address buffer, Data Address Stride Cache (DASC), byte repetition FSM, trace output controller (to external unit). Trace labels: SCIT, SCMT, DT, DMT, DAPC.]
15
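Stream Detector + Stream Cache
[Figure: stream detector and stream cache. The stream detector subtracts the previous PC (PPC) from PC; when the difference is not 4 the current stream ends and its descriptor (SA, SL) is placed in the instruction stream buffer. The stream cache (SC) has NSET sets and NWAY ways, is indexed with iSet = F(S.SA, S.SL), and compares the stored entries with S.SA & S.L from the instruction stream buffer. A hit emits the index (iSet, iWay) to the Stream Cache Index Trace (SCIT); a miss emits the reserved index '00…0' to SCIT and (SA, SL) to the Stream Cache Miss Trace (SCMT).]
Example: the loop body 0x020001f4, 0x020001f8, ..., 0x02000214 forms the stream (0x020001f4, 0x09)
- it. 0: SCIT 0x00, SCMT (0x020001f4, 0x09)
- it. 1 through it. 99: SCIT 0x0E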
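The stream detector on this slide can be sketched in a few lines. This is an illustration only, assuming fixed 4-byte instructions (the slide's PC - PPC check) and a forced split once a stream reaches 255 instructions so that SL fits in one byte; the function name detect_streams and both parameters are hypothetical.

```python
def detect_streams(pcs, instr_size=4, max_len=255):
    """Split a sequence of PCs into (SA, SL) stream descriptors.

    A stream is a run of sequentially executed instructions: it ends when
    the next PC is not the previous PC plus instr_size, or (assumption)
    when the stream length would no longer fit in one byte.
    """
    streams = []
    sa, sl, prev = None, 0, None
    for pc in pcs:
        if prev is not None and pc - prev == instr_size and sl < max_len:
            sl += 1                      # still inside the current stream
        else:
            if sa is not None:
                streams.append((sa, sl)) # close the previous stream
            sa, sl = pc, 1               # start a new stream at this PC
        prev = pc
    if sa is not None:
        streams.append((sa, sl))         # flush the last open stream
    return streams


# The nine-instruction loop body from the example, executed 100 times.
loop_body = [0x020001f4 + 4 * i for i in range(9)]
streams = detect_streams(loop_body * 100)
```

Every iteration yields the same descriptor (0x020001f4, 9), which is the repetition the stream cache is meant to exploit.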
16
SC Itrace Compression
Design decisions:
- Instruction stream buffer size: must not stall the processor (e.g., when consecutive instruction streams are very short)
- Stream cache: size, associativity, replacement policy, mapping function

Compress instruction stream:
  Get the next instruction stream record (S.SA, S.SL) from the instruction stream buffer;
  Lookup in the stream cache with iSet = F(S.SA, S.SL);
  if (hit)
    Emit (iSet & iWay) to SCIT;    // & denotes concatenation
  else {
    Emit reserved value 0 to SCIT;
    Emit stream descriptor (S.SA, S.SL) to SCMT;
    Select an entry (iWay) in the iSet set to be replaced;
    Update stream cache entry: SC[iSet][iWay].Valid = 1, SC[iSet][iWay].SA = S.SA, SC[iSet][iWay].SL = S.SL;
  }
  Update stream cache replacement indicators;
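The compression steps above can be modeled in software roughly as follows. This is a sketch under stated assumptions, not the hardware design: FIFO replacement is assumed (the talk compares pseudo-LRU and FIFO), the set index is an XOR of address and length bits, the emitted index places iSet in the upper bits and iWay in the lower bits, and entry (set 0, way 0) is left unused so that index 0 can serve as the reserved miss code. The class and method names are hypothetical.

```python
class StreamCache:
    """Software model of a set-associative cache of (SA, SL) descriptors."""

    def __init__(self, n_sets=64, n_ways=4):
        # Powers of two keep the index a plain bit field; at least two ways
        # are needed because entry (set 0, way 0) is reserved (assumption).
        assert (n_sets & (n_sets - 1)) == 0 and (n_ways & (n_ways - 1)) == 0
        assert n_ways >= 2
        self.n_sets, self.n_ways = n_sets, n_ways
        self.way_bits = n_ways.bit_length() - 1
        self.entries = [[None] * n_ways for _ in range(n_sets)]
        self.victim = [0] * n_sets          # FIFO pointer per set (assumed)
        self.scit, self.scmt = [], []       # index trace and miss trace

    def _set_index(self, sa, sl):
        # Assumed mapping: address bits from bit 6 upward XOR length bits.
        return ((sa >> 6) ^ sl) & (self.n_sets - 1)

    def _select_victim(self, i_set):
        i_way = self.victim[i_set]
        if i_set == 0 and i_way == 0:
            i_way = 1                       # never fill the reserved entry
        self.victim[i_set] = (i_way + 1) % self.n_ways
        return i_way

    def compress(self, sa, sl):
        i_set = self._set_index(sa, sl)
        ways = self.entries[i_set]
        if (sa, sl) in ways:                # hit: emit the concatenated index
            i_way = ways.index((sa, sl))
            self.scit.append((i_set << self.way_bits) | i_way)
        else:                               # miss: reserved 0 plus descriptor
            self.scit.append(0)
            self.scmt.append((sa, sl))
            ways[self._select_victim(i_set)] = (sa, sl)


sc = StreamCache()
for _ in range(100):
    sc.compress(0x020001f4, 9)
```

With this model the 100-iteration loop costs one SCMT record and 100 SCIT indices. The exact index value depends on the assumed bit layout and need not match the value drawn on the slide.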
17
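SC Itrace Compression: An Analytical Model
Legend:
- CR(SC.I): compression ratio
- N: number of instructions
- SL.Dyn: average stream length (dynamic)
- SC.Hit(N_SET, N_WAYS): SC hit rate
Assumptions:
- Stream length < 256 (1 byte for SL)
- 4 bytes for stream starting address

\[
\begin{aligned}
\mathrm{Size(Dinero.I)} &= 4 \cdot N \ \text{bytes} \\
\mathrm{Size(SCIT)} &= \frac{N}{\mathrm{SL.Dyn}} \cdot \frac{\log_2(N_{SET} \cdot N_{WAYS})}{8} \ \text{bytes} \\
\mathrm{Size(SCMT)} &= \frac{N}{\mathrm{SL.Dyn}} \cdot (1 - \mathrm{SC.Hit}) \cdot 5 \ \text{bytes} \\
\mathrm{CR(SC.I)} &= \frac{\mathrm{Size(Dinero.I)}}{\mathrm{Size(SCIT)} + \mathrm{Size(SCMT)}}
 = \frac{4 \cdot \mathrm{SL.Dyn}}{\frac{1}{8}\log_2(N_{SET} \cdot N_{WAYS}) + 5 \cdot (1 - \mathrm{SC.Hit})}
\end{aligned}
\]

\[
\lim_{\mathrm{SC.Hit} \to 1} \mathrm{CR(SC.I)} = \frac{32 \cdot \mathrm{SL.Dyn}}{\log_2(N_{SET} \cdot N_{WAYS})}
\]

\[
\begin{aligned}
N_{SET} \cdot N_{WAYS} = 64:\quad & \lim_{\mathrm{SC.Hit} \to 1} \mathrm{CR(SC.I)} = 5.33 \cdot \mathrm{SL.Dyn} \\
N_{SET} \cdot N_{WAYS} = 128:\quad & \lim_{\mathrm{SC.Hit} \to 1} \mathrm{CR(SC.I)} = 4.57 \cdot \mathrm{SL.Dyn} \\
N_{SET} \cdot N_{WAYS} = 256:\quad & \lim_{\mathrm{SC.Hit} \to 1} \mathrm{CR(SC.I)} = 4 \cdot \mathrm{SL.Dyn}
\end{aligned}
\]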
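The model can be evaluated numerically. The sketch below assumes the slide's accounting: 4 bytes per instruction address in the uncompressed trace, log2(N_SET * N_WAYS)/8 bytes per stream in SCIT, and 5 bytes (4 for SA, 1 for SL) per missed stream in SCMT. The function name cr_sc_i is hypothetical.

```python
import math


def cr_sc_i(sl_dyn, n_sets, n_ways, hit_rate):
    """Compression ratio CR(SC.I) predicted by the analytical model."""
    index_bytes = math.log2(n_sets * n_ways) / 8   # SCIT bytes per stream
    miss_bytes = 5 * (1 - hit_rate)                # SCMT bytes per stream
    return 4 * sl_dyn / (index_bytes + miss_bytes)


# A 64x4 stream cache (8-bit index) and an average stream length of 10.
ideal = cr_sc_i(10, 64, 4, 1.0)       # upper bound: 4 * SL.Dyn
realistic = cr_sc_i(10, 64, 4, 0.98)  # 98% hit rate
```

At a 98% hit rate the miss trace adds 0.1 byte per stream to the 1-byte index, so the predicted ratio drops from 40 to roughly 36.4 for this stream length.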
18
2nd Level Itrace Compression
- Size(SCIT) >> Size(SCMT)
  - HitRate = 98%, 8-bit index => Size(SCIT) = 10*Size(SCMT) (1 byte per stream in SCIT vs. 0.02 * 5 bytes = 0.1 byte per stream in SCMT)
- Redundancy in SCIT
  - Temporal and spatial locality of instruction streams
- Reduce SCIT trace
  - Global predictor
  - N-tuple compression using Tuple History Table
  - N-tuple compression using SCIT History Buffer
19
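Global Predictor Structure
[Figure: global predictor. A function F of the indices in the History Buffer gives pindex, which selects one of the predictor entries 0 ... MaxP-1. The selected entry is compared with the incoming next.sid from the SCIT trace: a hit emits '1' to the SCIT PRED trace; a miss emits '0' to the SCIT PRED trace and next.sid to the SCIT PRED miss trace.]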
20
SCIT Compression
Design decisions:
- Length of history buffer
- Global predictor size
- Mapping function

Predict SCIT index:
  Get the incoming index, next.sid, from the SCIT trace;
  Calculate the SCIT predictor index, pindex, using indices in the History Buffer: pindex = F(indices in the History Buffer);
  Perform lookup in the SCIT Predictor with pindex;
  if (SCIT.Predictor[pindex] == next.sid)
    Emit '1' to SCIT PRED trace;
  else {
    Emit '0' to SCIT PRED trace;
    Emit next.sid to SCIT PRED miss trace;
    SCIT.Predictor[pindex] = next.sid;
  }
  Shift next.sid into the History Buffer;
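A software model of the prediction step might look like the following. It is a sketch, assuming a direct-mapped predictor table initialized to zero and a simple shift-and-XOR hash over the history (a stand-in for F, which the slide leaves as a design decision). With a history length of 1 the hash reduces to indexing by the previous SCIT index. All names are hypothetical.

```python
class GlobalPredictor:
    """Predicts the next SCIT index from the most recent indices."""

    def __init__(self, n_entries=256, history_len=1):
        self.table = [0] * n_entries        # predicted next.sid per entry
        self.history = [0] * history_len    # most recent SCIT indices
        self.pred_trace = []                # one bit per prediction
        self.miss_trace = []                # mispredicted indices

    def _pindex(self):
        # Assumed hash F: fold the history with shift-and-XOR.
        idx = 0
        for sid in self.history:
            idx = (idx << 4) ^ sid
        return idx % len(self.table)

    def compress(self, next_sid):
        pindex = self._pindex()
        if self.table[pindex] == next_sid:
            self.pred_trace.append(1)       # correct prediction: 1 bit only
        else:
            self.pred_trace.append(0)
            self.miss_trace.append(next_sid)
            self.table[pindex] = next_sid   # learn the new successor
        self.history = self.history[1:] + [next_sid]


gp = GlobalPredictor()
for sid in [5, 9, 5, 9, 5, 9]:
    gp.compress(sid)
```

After one pass over the alternating pattern the predictor has learned both successors, so the remaining indices cost one bit each.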
21
Redundancy in SCIT PRED Trace
- High predictor hit rates and long runs of 0xFF bytes are expected in the predictor hit trace
- Use a simple FSM to exploit byte repetitions
[Figure: byte repetition FSM. The incoming PRED hit trace byte is compared with Prev.BYTE; a counter CNT tracks repetitions; outputs are the SCIT PRED Header and the SCIT PRED Repetition Trace.]

// Detect byte repetitions in SCIT PRED
Get next SCIT PRED byte, Next.BYTE;
if (Next.BYTE == Prev.BYTE) CNT++;
else {
  if (CNT == 0) {
    Emit Prev.BYTE to SCIT.REP.Trace;
    Emit '0' to SCIT Header;
  } else {
    Emit (Prev.BYTE, CNT) pair to SCIT.REP.Trace;
    Emit '1' to SCIT Header;
    CNT = 0;    // restart the count for the next run
  }
  Prev.BYTE = Next.BYTE;
}
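The byte repetition FSM can be checked with a small software model. The sketch below assumes that CNT counts repetitions beyond the first occurrence, that it is reset after each emitted run, that a run is split when CNT reaches 255 (so it fits in one byte), and that the last run is flushed at the end of the trace. The decoder is added only to show the encoding is lossless; both function names are hypothetical.

```python
def compress_byte_runs(data, max_cnt=255):
    """Return (header bits, repetition trace) for a byte sequence."""
    header, rep_trace = [], []
    prev, cnt = None, 0

    def emit(byte, count):
        if count == 0:
            rep_trace.append(byte)            # single byte, header bit 0
            header.append(0)
        else:
            rep_trace.extend((byte, count))   # (byte, CNT) pair, header bit 1
            header.append(1)

    for byte in data:
        if prev is None:
            prev = byte                       # first byte of the trace
        elif byte == prev and cnt < max_cnt:
            cnt += 1                          # same byte again
        else:
            emit(prev, cnt)
            prev, cnt = byte, 0               # start a new run
    if prev is not None:
        emit(prev, cnt)                       # flush the final run
    return header, rep_trace


def expand_byte_runs(header, rep_trace):
    """Inverse of compress_byte_runs."""
    out, pos = bytearray(), 0
    for flag in header:
        if flag:
            out.extend([rep_trace[pos]] * (rep_trace[pos + 1] + 1))
            pos += 2
        else:
            out.append(rep_trace[pos])
            pos += 1
    return bytes(out)


sample = bytes([0xFF] * 6 + [0x7F, 0xFF, 0xFF])
header, rep_trace = compress_byte_runs(sample)
```

Here nine input bytes become five repetition-trace bytes plus three header bits, and expand_byte_runs recovers the original sequence.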
23
Data Address Trace Compression
- More challenging task
  - Data addresses rarely stay constant during program execution
  - However, they often have a regular stride
- => Use Data Address Stride Cache (DASC) to exploit locality of memory referencing instructions and regularity in data address strides
24
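Data Address Stride Cache (DASC)
- Tagless structure
- Indexed by the PC of the corresponding instruction: index = G(PC)
- Entry fields
  - LDA: Last Data Address
  - Stride
[Figure: DASC with N entries (0 ... N-1), each holding LDA and Stride. The current stride DA - LDA is compared with the stored Stride (Stride.Hit): a hit emits '1' to the data trace (DT); a miss emits '0' to DT and the data address to the Data Miss Trace (DMT). Example values shown for PC 0x020001f8: addresses 0xbfffbe24, 0xbfffbe20, 0xbfffbe1c, 0xbfffbe20, 0xbfffbe24; DT bits 0 0 1.]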
25
DASC Compression
Design decisions:
- Number of entries
- Index function G
- Stride length
- Data address buffer depth

// Compress data address stream
Get the next pair (PC, DA) from the data buffers;
Lookup in the data address stride cache: iSet = G(PC);
cStride = DA - DASC[iSet].LDA;
if (cStride == DASC[iSet].Stride) {
  Emit '1' to DT;    // 1-bit info
} else {
  Emit '0' to DT;
  Emit DA to DMT;
  DASC[iSet].Stride = lsb(cStride);
}
DASC[iSet].LDA = DA;
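The DASC steps can be modeled as below. This is a sketch with several assumptions labeled in the code: the index function G drops the two low PC bits and takes the result modulo the number of entries, entries start with LDA = 0 and Stride = 0, and lsb() keeps the low 8 bits of the stride as a signed value (a 1-byte stride). The class name StrideCache is hypothetical.

```python
class StrideCache:
    """Software model of a tagless data address stride cache."""

    def __init__(self, n_entries=1024, stride_bits=8):
        self.n_entries = n_entries
        self.stride_bits = stride_bits
        self.lda = [0] * n_entries       # last data address per entry
        self.stride = [0] * n_entries    # last stride per entry
        self.dt, self.dmt = [], []       # hit/miss bits and missed addresses

    def _index(self, pc):
        # Assumed G(PC): word-aligned PC modulo the number of entries.
        return (pc >> 2) % self.n_entries

    def _lsb(self, stride):
        # Assumed lsb(): low stride_bits bits, interpreted as signed.
        low = stride & ((1 << self.stride_bits) - 1)
        if low >> (self.stride_bits - 1):
            low -= 1 << self.stride_bits
        return low

    def compress(self, pc, da):
        i = self._index(pc)
        c_stride = da - self.lda[i]
        if c_stride == self.stride[i]:
            self.dt.append(1)            # stride hit: one bit
        else:
            self.dt.append(0)            # stride miss: bit plus full address
            self.dmt.append(da)
            self.stride[i] = self._lsb(c_stride)
        self.lda[i] = da


dasc = StrideCache()
for da in (0xbfffbe1c, 0xbfffbe20, 0xbfffbe24):
    dasc.compress(0x020001f8, da)
```

The first two accesses miss while the entry learns the address and then the stride of 4; the third access hits and costs a single bit.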
26
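DASC Dtrace Compression: An Analytical Model
Legend:
- CR(SC.D): compression ratio
- N_memref: number of memory referencing instructions
- DASC.Hit: DASC hit rate
Assumptions:
- 4 bytes for a data address

\[
\begin{aligned}
\mathrm{Size(Dinero.D)} &= N_{memref} \cdot 4 \ \text{B} \\
\mathrm{Size(DT)} + \mathrm{Size(DMT)} &= N_{memref} \cdot [(1 - \mathrm{DASC.Hit}) \cdot 4 + 0.125] \ \text{B} \\
\mathrm{CR(SC.D)} &= \frac{\mathrm{Size(Dinero.D)}}{\mathrm{Size(DT)} + \mathrm{Size(DMT)}} = \frac{1}{1.03125 - \mathrm{DASC.Hit}} \\
\lim_{\mathrm{DASC.Hit} \to 1} \mathrm{CR(SC.D)} &= \frac{1}{0.03125} = 32
\end{aligned}
\]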
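A short numerical check of the model, assuming the slide's accounting of 4 bytes per data address in the uncompressed trace, 1 bit (0.125 byte) per reference in DT, and 4 bytes per miss in DMT. The function name cr_sc_d is hypothetical.

```python
def cr_sc_d(hit_rate):
    """Compression ratio CR(SC.D) predicted by the analytical model."""
    bytes_per_ref = 4 * (1 - hit_rate) + 0.125   # DMT share + DT bit
    return 4 / bytes_per_ref


upper_bound = cr_sc_d(1.0)   # every stride predicted: 1 bit per 4 bytes
no_hits = cr_sc_d(0.0)       # every reference misses: trace grows slightly
```

The bound of 32 follows from replacing each 4-byte address with one bit; with no hits the extra DT bit makes the output slightly larger than the input.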
27
Redundancy in DT Trace
- High predictor hit rates and long runs of 0xFF bytes are expected in the DT trace
- Use a simple FSM to exploit byte repetitions
[Figure: byte repetition FSM. The incoming DT byte is compared with Prev.DT; a counter CNT tracks repetitions; outputs are the Data Header (DH) and the Data Repetition Trace (DRT).]

// Detect data repetitions
Get next DT byte;
if (DT == Prev.DT) CNT++;
else {
  if (CNT == 0) {
    Emit Prev.DT to DRT;
    Emit '0' to DH;
  } else {
    Emit (Prev.DT, CNT) pair to DRT;
    Emit '1' to DH;
    CNT = 0;    // restart the count for the next run
  }
  Prev.DT = DT;
}
29
Experimental Evaluation
- Goals
  - Assess the effectiveness of the proposed algorithms
  - Explore the feasibility of the proposed hardware implementations
  - Determine optimal size and organization of HW structures
- Workload
  - 16 MiBench benchmarks
  - ARM architecture

Benchmark      IC              NUS    maxSL   SL.Dyn
cjpeg          104,607,812     1636   239     10.89
djpeg          23,391,628      1324   206     21.81
lame           1,285,111,635   3410   252     27.81
tiff2bw        143,254,646     1058   43      12.79
tiff2rgba      151,691,275     1146   75      27.54
tiffmedian     541,260,067     1431   75      22.22
tiffdither     832,951,018     1831   51      12.57
mad            286,974,899     1659   1055    20.09
sha            140,885,982     495    62      15.15
bf_e           544,053,846     413    300     5.85
rijndael_e     319,977,971     542    254     18.94
ghostscript    708,090,638     6900   187     8.70
rsynth         824,942,227     1323   180     15.77
stringsearch   3,675,745       439    62      5.61
adpcm_c        732,513,651     347    71      54.63
gsm_d          1,299,270,245   845    401     11.07

Legend:
- IC: instruction count
- NUS: number of unique instruction streams
- maxSL: maximum stream length
- SL.Dyn: average stream length (dynamic)
30
Findings about SC Size/Organization
- Good compression ratio
  - Outperforms fast GZIP
  - High stream cache hit rates for all applications (> 98%)
  - Smaller SCs work well too
- Replacement policy
  - Pseudo-LRU vs. FIFO
- Associativity
  - 4-way is a reasonable choice
  - 8-way and 16-way desirable
- Mapping function
  - S.SA<5+n:6> xor S.L<n-1:0>, n = log2(NSET)

CR(SC.I)   Ways
Entries    1      2      4      8
8          16.3   17.6   17.0   15.8
16         21.1   22.1   27.8   26.6
32         23.9   28.0   34.4   34.0
64         27.5   36.9   44.1   47.1
128        29.0   47.6   54.1   57.4
256        28.0   47.8   53.6   54.2

[Figure: CR = f(Complexity), 4-way SC; x-axis: # SC entries (0 to 300), y-axis: CR/MaxCR (0 to 1.2).]
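The mapping function on this slide, S.SA<5+n:6> xor S.L<n-1:0> with n = log2(NSET), can be written out as a short function. This sketch assumes NSET is a power of two and reads the angle-bracket notation as inclusive bit ranges; the function name stream_cache_set is hypothetical.

```python
def stream_cache_set(sa, sl, n_sets):
    """Set index: n address bits starting at bit 6, XORed with n length bits."""
    assert n_sets > 0 and (n_sets & (n_sets - 1)) == 0, "NSET must be a power of two"
    n = n_sets.bit_length() - 1          # n = log2(NSET)
    mask = (1 << n) - 1
    sa_bits = (sa >> 6) & mask           # S.SA<5+n:6>
    sl_bits = sl & mask                  # S.L<n-1:0>
    return sa_bits ^ sl_bits


# The stream (0x020001f4, 0x09) from the earlier loop example, 64 sets.
i_set = stream_cache_set(0x020001f4, 0x09, 64)
```

Mixing in the stream length lets streams that start at the same address but have different lengths land in different sets.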
31
Findings about Global Predictor
- Number of entries should not exceed the number of entries in SC
- Longer histories and larger predictors give only marginal improvements for all applications except ghostscript, blowfish, and stringsearch
- History length = 1
  - Index GPRED using the previous SCIT index

CR(SC+GP.I)   Pred. entries
SC entries    P32      P64      P128     P256
8x4           47.64
16x4          72.17    81.19
32x4          91.91    113.22   145.79
64x4          100.32   115.09   150.54   207.64
32
Putting It All Together (SC+GPRED+BREP): Itrace Compression
Compression ratio (CR). The first four data columns are SC,GPRED configurations.

CR             8x8,64    16x8,128   32x8,256   64x4,256   I.GZ (DEF.)   I.GZ (FAST)   I.GZ (BEST)   I.BZ2 (BEST)   GZGZ (DEF.)
cjpeg          263.7     316.7      315.0      277.1      109.6         54.5          124.5         342.0          265.7
djpeg          287.1     443.3      539.4      492.3      71.8          39.8          73.7          202.0          232.5
lame           214.0     238.6      255.2      250.6      60.5          128.5         333.9         87.6           174.2
tiff2bw        351.5     1111.5     3062.2     1493.0     114.1         83.9          114.4         376.8          615.2
tiff2rgba      517.6     3713.1     3592.0     1834.0     121.3         20.3          122.0         529.6          1292.7
tiffmedian     649.4     1229.4     1827.4     1601.2     152.8         92.3          155.5         472.9          1017.5
tiffdither     54.8      120.9      184.8      154.3      91.1          46.4          99.8          170.9          147.1
mad            221.0     230.4      257.2      253.4      73.5          37.8          78.5          94.3           206.2
sha            348.5     339.6      322.4      322.3      211.4         54.4          221.8         656.5          4112.1
bf_e           100.2     100.2      92.6       92.6       170.4         41.0          182.3         352.0          4065.9
rijndael_e     142.1     298.6      290.1      285.6      143.8         12.6          150.6         141.8          2392.9
ghostscript    30.4      106.4      123.6      119.4      100.6         39.7          111.2         212.5          434.5
rsynth         97.0      152.8      246.0      211.5      46.7          30.6          48.0          143.2          191.2
stringsearch   21.8      78.5       114.0      74.9       82.1          32.3          100.6         202.5          132.8
adpcm_c        29972.5   28663.9    27457.8    27456.6    233.1         107.3         233.6         1862.6         12764.7
gsm_d          234.9     292.3      401.2      376.0      85.4          59.2          87.2          165.6          507.1
TOTAL          113.2     209.0      254.4      237.8      87.5          47.2          112.9         172.0          321.6
33
Findings about DASC
- Stride size
  - 1 byte is optimal
  - A 2-byte stride improves compression by 10%
- DASC with 1K entries is an optimal choice
- Tagged (multi-way) DASC further improves overall compression ratio
  - Increased complexity
[Figure: CR = f(Complexity); x-axis: # DASC entries (0 to 5000), y-axis: CR (0 to 7).]
34
DASC Compression Ratio

Benchmark      DASC32   DASC64   DASC128   DASC256   DASC512   DASC1024   D.GZ (DEF.)   D.GZ (FAST)   D.GZ (BEST)   D.BZ2    D.GZGZ
cjpeg          3.35     4.60     5.14      5.77      6.54      7.11       5.98          4.50          6.11          18.20    9.57
djpeg          2.81     3.57     4.28      4.96      5.22      5.29       4.22          3.78          4.22          8.62     4.92
lame           1.20     1.52     2.81      3.82      4.49      4.88       6.56          4.01          6.63          8.80     8.60
tiff2bw        76.31    78.04    84.28     105.04    128.84    134.23     2.14          2.55          2.10          14.28    3.07
tiff2rgba      5.98     79.81    91.24     107.49    127.05    139.57     2.10          2.79          2.09          4.06     4.03
tiffmedian     8.64     8.70     8.74      8.81      8.87      8.89       4.40          4.37          4.53          11.16    6.03
tiffdither     2.61     6.08     7.21      8.69      9.65      10.06      4.51          4.41          4.51          7.87     6.77
mad            1.30     1.59     1.96      2.07      2.35      2.64       4.08          3.60          4.22          13.47    6.97
sha            6.58     7.94     9.38      10.79     11.36     11.36      44.91         8.36          45.61         172.71   591.69
bf_e           1.58     1.95     2.38      2.61      2.75      2.91       7.58          4.86          7.83          16.35    9.08
rijndael_e     1.10     1.10     1.10      1.13      1.29      2.06       4.24          3.22          4.27          7.31     4.49
ghostscript    1.07     1.19     1.56      2.19      2.93      5.27       27.21         18.58         27.46         47.42    40.83
rsynth         1.22     1.36     1.76      3.81      8.30      32.43      24.44         21.46         25.27         57.40    43.88
stringsearch   1.80     2.04     2.70      4.13      4.44      5.16       11.12         8.57          11.23         15.03    11.47
adpcm_c        3.13     3.13     3.13      3.13      3.13      3.13       6.57          3.64          7.15          12.27    11.42
gsm_d          2.67     4.48     11.30     13.60     14.81     16.78      21.60         18.05         23.29         63.53    33.15
TOTAL          1.66     2.04     2.80      3.77      4.67      6.12       6.78          5.51          6.90          13.29    9.70
35
Hardware Complexity Estimation
- CPU model
  - In-order, XScale-like
  - Vary SC and DASC parameters
- SC and DASC timings
  - SC: hit latency = 1 clock, miss latency = 2 clocks
  - DASC: hit latency = 2 clocks, miss latency = 2 clocks
- To avoid any stalls
  - Instruction stream input buffer: MIN = 2 entries
  - Data address input buffer: MIN = 8 entries
  - Results are relatively independent of SC and DASC organization

Component                        Entries   Complexity   Bytes
Instruction stream buffer        2         2x5          10
Stream detector                  2         2x4          8
Stream cache                     64x4      256x5        1280
Global predictor                 256       256 + 1(h)   257
Data address buffer              8         8x8          64
Data address stride cache        1024      1024x5       5120
Byte repetition state machines   -         4            4
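The per-component byte counts in the table can be added up to get the overall storage budget. The sketch below only restates the table's numbers; the dictionary name and the split into instruction-side and data-side totals are illustrative, and the four bytes for the byte repetition state machines are counted once in the overall total.

```python
# Storage per component in bytes, copied from the complexity table above.
COMPONENT_BYTES = {
    "Instruction stream buffer": 2 * 5,
    "Stream detector": 2 * 4,
    "Stream cache": 256 * 5,
    "Global predictor": 256 + 1,
    "Data address buffer": 8 * 8,
    "Data address stride cache": 1024 * 5,
    "Byte repetition state machines": 4,
}

total_bytes = sum(COMPONENT_BYTES.values())

# Illustrative split: structures used for instruction traces vs. data traces.
instruction_side = sum(
    COMPONENT_BYTES[name]
    for name in ("Instruction stream buffer", "Stream detector",
                 "Stream cache", "Global predictor")
)
data_side = sum(
    COMPONENT_BYTES[name]
    for name in ("Data address buffer", "Data address stride cache")
)
```

For this configuration the total comes to 6743 bytes, most of it in the 1024-entry data address stride cache.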
36
Trace Port Bandwidth Analysis
[Figure: trace port bandwidth in bits/instr. versus instructions executed (millions), four panels. CJPEG, series SC, SC+PRED, SC+PRED+BREP. CJPEG, series TDASC, TDASC+BREP. MAD, series SC, SC+PRED, SC+PRED+BREP. MAD, series TDASC, TDASC+BREP.]
38
Conclusions
- A set of algorithms and hardware structures for instruction and data address trace compression
  - Stream caches + global predictor + byte repetition FSM for instruction traces
  - Data address stride cache + byte repetition FSM for data traces
- Benefits
  - Enabling real-time trace compression with high compression ratio
  - Low complexity (small structures, small number of external pins)
- Analytical & simulation analysis focusing on compression ratio and optimal sizing/organization of the structures as well as real-time trace port bandwidth requirements
Laboratory for Advanced Computer Architectures and Systems
at Alabama: Research Overview
Aleksandar Milenković
The LaCASA Laboratory
40
Secure Processors
- Computer security is critical
  - Software & physical attacks
- Sign & Verify for guaranteed integrity and confidentiality of code
- Improvements
  - PMAC (Parallel MACs) for reduced cryptographic latency
  - A variation of the one-time pad for code encryption
  - Instruction Verification Buffer for conditional execution before verification
Vulnerability reports shown on the slide:
- Buffer overflow in MMClient.exe in Indiatimes Messenger 6.0 allows remote attackers to cause a denial of service (application crash) and possibly execute arbitrary code via a long group name argument to the RenameGroup function in the MMClient.MunduMessenger.1 ActiveX object.
- Multiple format string vulnerabilities in (1) neon 0.24.4 and earlier, and other products that use neon including (2) Cadaver, (3) Subversion, and (4) OpenOffice, allow remote malicious WebDAV servers to execute arbitrary code.
- Buffer overflow in the JPEG (JPG) parsing engine in the Microsoft Graphic Device Interface Plus (GDI+) component, GDIPlus.dll, allows remote attackers to execute arbitrary code via a JPEG image.
- Multiple buffer overflows in RealOne Player, RealOne Player 2.0, RealOne Enterprise Desktop, and RealPlayer Enterprise allow remote attackers to execute arbitrary code via malformed (1) .RP, (2) .RT, (3) .RAM, (4) .RPM or (5) .SMIL files.
- Multiple heap-based buffer overflows in the imlib BMP image handler allow remote attackers to execute arbitrary code via a crafted BMP file.
- Integer overflow in pixbuf_create_from_xpm (io-xpm.c) in the XPM image decoder for gtk+ 2.4.4 (gtk2) and earlier, and gdk-pixbuf before 0.22, allows remote attackers to execute arbitrary code via certain n_col and cpp values that enable a heap-based buffer overflow.
- Stack-based buffer overflow in the URL parsing function in Gaim before 1.3.0 allows remote attackers to execute arbitrary code via an instant message (IM) with a large URL.
- Buffer overflow in WIDCOMM Bluetooth Connectivity Software, as used in products such as BTStackServer 1.3.2.7 and 1.4.2.10, Windows XP and Windows 98 with MSI Bluetooth Dongles, and HP IPAQ 5450 running WinCE 3.0, allows remote attackers to execute arbitrary code via certain service requests.
[Figure: Sign & Verify flow. Secure installation (original code to signed code): generate program keys (Key1, Key2, Key3), calculate signature, encrypt; labels EKey3(I-Block), EKey.CPU(Key1), EKey.CPU(Key2), EKey.CPU(Key3). Secure execution (secure mode): program loading, decrypt program keys (Key1, Key2, Key3), instruction fetch, signature fetch, decrypt I-Block, calculate signature, signature match (=?), trusted code. Other labels on the slide: Yesterday, Today, Tomorrow.]
http://www.ece.uah.edu/~lacasa/research.htm#secure_processors
41
Microbenchmarks for Architectural Analysis
- Small programs for uncovering architectural parameters (usually not publicly disclosed) of modern processors
- Relatively simple, so their behavior can be understood
- Benefits
  - Architecture-aware compiler optimization
  - Processor design evaluation and verification
  - Testing
  - Competitive analysis
- Results
  - Microbenchmarks for BTB analysis
  - Experimental flow for outcome predictor
  - Tested on P6 and NetBurst (Northwood core)
- Challenge
  - Dothan (Pentium M) predictor
[Figure: microbenchmarks and performance counters (branch-related events) applied to the BTB and outcome predictor; parameters listed: BTB size, BTB organization, BTB indexing, local history, global history, ...]
http://www.ece.uah.edu/~lacasa/bp_mbs/bp_microbench.htm
42
TinyHMS
[Figure: concept, prototype, and software. Block labels: ActiS (Tmote sky) with sensor interface, data acquisition, filtering/pre-processing, signal processing, ActiS interface, ActiS application layer, IAS/ISPM; network coordinator (Telos) with wireless transceiver, time sync, messaging, buffering, flash storage, ActiS protocol, interface (USB/CF), main control (messaging, fusion, buffering); PS (PDA) with WWAN/WLAN communication, messaging control, storage, user interface.]
http://www.ece.uah.edu/~lacasa/research.htm#tinyHMS
43
TinyHMS
[Figure: three signal plots over the interval 105 to 107 with series accX, accY, accZ and detected heart beat and step events; frame timing diagram for Frame i-1 and Frame i, each starting with a beacon message followed by slots TS1, TS2, NC, TS3; event messages with timestamps sent by the ECG sensor (TS1) and the motion sensor (TS2).]