Department of Computer Science
Mining Performance Data from Sampled Event Traces
Bret Olszewski, IBM Corporation, Austin, TX
Ricardo Portillo, Diana Villa, Patricia J. Teller, The University of Texas at El Paso, Department of Computer Science
Outline
• Motivation
• Data Collection Environment
  • Workload & Platform
  • Monitored Events
• Data Analysis & Results
• Conclusions and Future Work
Motivation
Capturing event traces
• System simulation: overhead penalty is too high
• Real-time metrics: capture every event during actual execution
Problem
• The growing size of full event traces is becoming unmanageable
Goal
• Use sampled event traces to analyze execution behavior
Data Collection Environment
Workload
• TPC-C benchmark
  • Commercial OLTP
Platform
• IBM eServer pSeries 690 (p690) architecture
  • 8- and 32-processor configurations
Platform
[Figure: 8-processor p690 configuration, showing MCM 0 and MCM 1, each with processors (P), L2 caches, and an L3 cache]
Platform
[Figure: 32-processor p690 configuration, showing MCM 0 through MCM 3, each with processors (P), L2 caches, and an L3 cache]
Monitored Events
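L2-cache data-load misses, classified by where each miss is resolved:
• L2.5: the L2 of another chip on the same MCM
• L2.75: an L2 on a different MCM
• L3: the L3 on the same MCM
• L3.5: the L3 on a different MCM
• MEM: main memory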
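The event classification above can be sketched as a small decision function. This is a hedged illustration, not the PMU's actual logic: the `Location` type and `classify_l2_miss` function are hypothetical, and chips are identified here by an index within their MCM.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Hypothetical descriptor of where a requested cache line was found."""
    mcm: int    # MCM index
    chip: int   # chip index within that MCM
    level: str  # "L2", "L3", or "MEM"


def classify_l2_miss(requester_mcm: int, requester_chip: int, source: Location) -> str:
    """Name the event for an L2 data-load miss resolved at `source`.

    Assumed mapping: L2.5 = another chip's L2 on the same MCM,
    L2.75 = an L2 on a different MCM, L3 = the same MCM's L3,
    L3.5 = a different MCM's L3, MEM = main memory.
    """
    same_mcm = source.mcm == requester_mcm
    if source.level == "L2":
        if same_mcm and source.chip == requester_chip:
            # The requester's own L2 holding the line would be a hit, not a miss.
            raise ValueError("source is the requesting chip's own L2")
        return "L2.5" if same_mcm else "L2.75"
    if source.level == "L3":
        return "L3" if same_mcm else "L3.5"
    if source.level == "MEM":
        return "MEM"
    raise ValueError(f"unknown level: {source.level!r}")


# Example: a processor on MCM 0, chip 1 misses in its L2.
print(classify_l2_miss(0, 1, Location(mcm=0, chip=2, level="L2")))  # L2.5
print(classify_l2_miss(0, 1, Location(mcm=1, chip=2, level="L3")))  # L3.5
```

The function only names the resolution point; it does not model cache coherence or distinguish shared from modified lines.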
Where Is an L2 Miss Resolved?
[Figure: MCM 0 and MCM 1, each with processors (P), L2 caches, and an L3 cache, marking where an L2 data-load miss can be resolved: L2.5 event, L2.75 event, L3 event, L3.5 event]
Data Collection
Performance Monitoring Unit (PMU)
• Special-purpose registers
• Programming interface
Kernel extension
eprof
• PMU configuration
• Event-based sampling
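A toy model of event-based sampling may help: a PMU counter is armed so that it overflows after a fixed number of occurrences of the monitored event, and each overflow records one sample. The sketch below assumes this overflow-and-reload scheme; `EventSampler`, `Sample`, and `period_for_rate` are hypothetical names and do not reflect eprof's actual interface.

```python
from typing import List, NamedTuple


class Sample(NamedTuple):
    pid: int
    tid: int
    timestamp: float
    instr_addr: int
    data_addr: int


class EventSampler:
    """Toy model of event-based sampling (hypothetical, not eprof's API).

    The counter is re-armed so that it overflows after `period`
    occurrences of the monitored event; each overflow records the
    context of the occurrence that triggered it.
    """

    def __init__(self, period: int) -> None:
        if period <= 0:
            raise ValueError("period must be positive")
        self.period = period
        self.remaining = period          # occurrences left until overflow
        self.samples: List[Sample] = []

    def on_event(self, sample: Sample) -> None:
        """Called once per occurrence of the monitored event."""
        self.remaining -= 1
        if self.remaining == 0:
            self.samples.append(sample)   # overflow: record this occurrence
            self.remaining = self.period  # re-arm the counter


def period_for_rate(events_per_sec: float, samples_per_sec: float) -> int:
    """Hypothetical helper: events per sample for a target sampling rate."""
    if events_per_sec <= 0 or samples_per_sec <= 0:
        raise ValueError("rates must be positive")
    return max(1, round(events_per_sec / samples_per_sec))
```

For example, if an event occurs about 2,000,000 times per second on a CPU, `period_for_rate(2_000_000, 100)` returns 20000, so roughly one occurrence in 20,000 would be recorded. Sampling every Nth occurrence, rather than every occurrence, is what keeps the trace size manageable.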
Sampled Event Trace
• 10-minute observation interval
• Record periodic occurrences of an event
• 100 events/sec/CPU
Event record:
PID     TID     Timestamp    Effective Instruction Address   Effective Data Address
372872  184469  0.328104637  000000000000A8C4                0000000000218880
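The event record format lends itself to a simple parser. A minimal sketch, assuming whitespace-separated fields, decimal PID and TID, and hexadecimal addresses without a `0x` prefix; `EventRecord` and `parse_record` are hypothetical names.

```python
from typing import NamedTuple


class EventRecord(NamedTuple):
    pid: int
    tid: int
    timestamp: float   # units as emitted by the tracer
    instr_addr: int    # effective instruction address
    data_addr: int     # effective data address


def parse_record(line: str) -> EventRecord:
    """Parse 'PID TID Timestamp EIA EDA' (assumed whitespace-separated;
    PID/TID decimal, addresses hexadecimal without a 0x prefix)."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}: {line!r}")
    pid, tid, ts, ia, da = fields
    return EventRecord(int(pid), int(tid), float(ts), int(ia, 16), int(da, 16))


rec = parse_record("372872 184469 0.328104637 000000000000A8C4 0000000000218880")
print(hex(rec.instr_addr), hex(rec.data_addr))  # 0xa8c4 0x218880
```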
Average number of samples collected per event
• 238,448 for 8-processor data
• 212,396 for 32-processor data
Analysis
• Memory Hotspots
• Individual Address Region
• Process Migration
Memory Hotspots
• L3 and memory are the most active memory levels
• Counted the total number of L3 hits
• Counted the number of L3 hits per address region
• Counted the number of unique cache lines referenced per region
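The per-region counting described above might look like the following. This is a sketch under stated assumptions: `regions` is a hypothetical sorted list of non-overlapping address ranges, `region_stats` is a hypothetical name, and the 128-byte `LINE_BYTES` used to count unique cache lines is an assumption, not a value from the slides.

```python
from bisect import bisect_right
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

LINE_BYTES = 128  # assumed cache-line size for counting unique lines


def region_stats(
    data_addrs: Iterable[int],
    regions: List[Tuple[int, int, str]],
) -> Dict[str, Tuple[float, int]]:
    """Return {region: (fraction_of_hits, unique_cache_lines)}.

    `regions` holds (start, end_exclusive, name) ranges sorted by start
    and non-overlapping. Addresses outside every range count as "other".
    """
    starts = [r[0] for r in regions]
    hits: Counter = Counter()
    lines: Dict[str, set] = defaultdict(set)
    total = 0
    for addr in data_addrs:
        total += 1
        i = bisect_right(starts, addr) - 1  # last region starting at or before addr
        if i >= 0 and addr < regions[i][1]:
            name = regions[i][2]
        else:
            name = "other"
        hits[name] += 1
        lines[name].add(addr // LINE_BYTES)  # cache-line index
    return {name: (hits[name] / total, len(lines[name])) for name in hits}
```

Feeding it the data addresses of L3-hit samples gives, per region, the hit fraction and the number of distinct lines behind those hits.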
Memory Hotspots
[Figure: Distribution of L3 Data Load Hits. Fraction of data loads (0 to 0.5) by address region (Kernel, Text, Data/BSS/Heap, Buffer Pool, Stack, Ublock & Kernel Stack, M_BUF, KERN_HEAP), with series for hit % and unique cache lines]
Individual Address Region
• An individual address region can be examined in more detail
• Examined the Buffer Pool region
• Counted the number of references per memory level
• Counted the number of unique cache lines referenced per memory level
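A per-level breakdown for a single region can be sketched the same way. Samples are assumed to be hypothetical `(level, data_addr)` pairs, the region bounds are placeholders, and `LINE_BYTES` is again an assumed 128-byte line. `top_line_share` is a hypothetical helper that reports what fraction of a level's hits fall on its `top_n` most-referenced lines, one way to check whether a few lines account for most hits.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

LINE_BYTES = 128  # assumed cache-line size


def per_level_stats(
    samples: Iterable[Tuple[str, int]],
    region: Tuple[int, int],
) -> Dict[str, Tuple[int, int]]:
    """For (level, data_addr) samples inside [start, end), return
    {level: (references, unique_cache_lines)}."""
    start, end = region
    refs: Counter = Counter()
    lines: Dict[str, set] = defaultdict(set)
    for level, addr in samples:
        if start <= addr < end:
            refs[level] += 1
            lines[level].add(addr // LINE_BYTES)
    return {lvl: (refs[lvl], len(lines[lvl])) for lvl in refs}


def top_line_share(
    samples: List[Tuple[str, int]],
    region: Tuple[int, int],
    level: str,
    top_n: int,
) -> float:
    """Fraction of `level` hits in the region that land on its top_n lines."""
    start, end = region
    counts = Counter(
        addr // LINE_BYTES
        for lvl, addr in samples
        if lvl == level and start <= addr < end
    )
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(top_n)) / total
```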
Individual Address Region
[Figure: Distribution of Data Load Hits: BUFFER_POOL. Data load hits and unique cache lines (0 to 120,000) by event name: L2, L2.5 MOD, L2.75 MOD, L3, L3.5, MEM]
Process Migration
• Migrating a process from one chip to another can degrade performance when all or part of the process's working set must follow it, via L2-cache misses
• Examined 885 threads
• Counted the number of migrations per thread
• Counted the number of L2.5 hits per thread
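Counting migrations from a sampled trace requires knowing which CPU each sample ran on; the event record format shown earlier has no CPU field, so the sketch below assumes one. It also assumes a hypothetical numbering of two CPUs per chip and four chips per MCM. Because only consecutive samples of a thread are compared, migrations that happen between samples are missed, so the counts are lower bounds. `intra_mcm_migrations` and `pearson` are hypothetical names; `pearson` is a plain Pearson correlation for relating per-thread migration counts to per-thread L2.5 hits.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, Iterable, List, Tuple

CPUS_PER_CHIP = 2   # assumption
CHIPS_PER_MCM = 4   # assumption


def chip_of(cpu: int) -> int:
    return cpu // CPUS_PER_CHIP


def mcm_of(cpu: int) -> int:
    return chip_of(cpu) // CHIPS_PER_MCM


def intra_mcm_migrations(samples: Iterable[Tuple[int, float, int]]) -> Dict[int, int]:
    """samples: (tid, timestamp, cpu). Per thread, count consecutive samples
    that moved to a different chip on the same MCM."""
    by_thread: Dict[int, List[Tuple[float, int]]] = defaultdict(list)
    for tid, ts, cpu in samples:
        by_thread[tid].append((ts, cpu))
    result: Dict[int, int] = {}
    for tid, seq in by_thread.items():
        seq.sort()  # order each thread's samples by timestamp
        count = 0
        for (_, prev), (_, cur) in zip(seq, seq[1:]):
            if chip_of(prev) != chip_of(cur) and mcm_of(prev) == mcm_of(cur):
                count += 1
        result[tid] = count
    return result


def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("zero variance")
    return sxy / sqrt(sxx * syy)
```

A single correlation coefficient is only a rough summary; a scatter plot of hits against migrations per thread shows the same relationship in more detail.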
Process Migration
[Figure: 32-Way L2.5 Hits vs. Intra-MCM Migrations. L2.5 modified hits (0 to 25,000) plotted against intra-MCM migrations (0 to 6,000)]
Conclusions
• Only a few addresses in the Buffer Pool region cause most of its L3 hits
• For the Buffer Pool, heavily referenced shared data is constantly resolved outside an MCM
• Process migration is not a source of performance degradation
Future Work
• Quantify the representativeness of sampled event traces
• Suggest more ways to improve p690 application performance
• Study sampled event traces for other workloads
• In-depth study of process characterization
Thank You!