KTAU: Kernel Tuning and Analysis Utilities
Department of Computer and Information Science
Performance Research Laboratory
University of Oregon
University of Oregon Performance Research Lab
Agenda
• Motivations
• KTAU Overview
• ZeptoOS - KTAU - TAU on BG/L
• KTAU - TAU on Linux Cluster
What is a process doing inside the kernel?
Solution:
Context-of-Execution Based profile/trace
We can analyze the execution path of a process and store the performance data locally to that process.
What about other processes on the system?
Solution:
System-wide performance analysis
By aggregating the performance data of the processes in the system (all of them or a selected subset), we can capture interactions among processes.
Profiling or Tracing?
Answer:
Why not do both?
• Profile: a summarized view of performance data, with the advantage of compact data size.
• Trace: a detailed view of the process execution timeline, with the disadvantage of large data size.
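The relationship between the two views can be sketched in a few lines: a trace keeps every enter/exit event, and a profile is what remains after collapsing those events into per-routine totals. The record layout and routine names below are illustrative assumptions, not KTAU's data format.

```python
from collections import defaultdict

# Hypothetical trace: (timestamp_us, event, routine) records in time order.
trace = [
    (0, "enter", "sys_write"),
    (5, "enter", "sock_sendmsg"),
    (25, "exit", "sock_sendmsg"),
    (40, "exit", "sys_write"),
    (50, "enter", "sys_read"),
    (80, "exit", "sys_read"),
]

def profile_from_trace(events):
    """Collapse an enter/exit trace into per-routine call counts and times."""
    stats = defaultdict(lambda: {"calls": 0, "inclusive": 0, "exclusive": 0})
    stack = []  # each frame: [routine, entry_time, time_spent_in_children]
    for ts, kind, name in events:
        if kind == "enter":
            stack.append([name, ts, 0])
        else:
            routine, start, child_time = stack.pop()
            assert routine == name, "unbalanced trace"
            elapsed = ts - start
            entry = stats[routine]
            entry["calls"] += 1
            entry["inclusive"] += elapsed               # includes callees
            entry["exclusive"] += elapsed - child_time  # excludes callees
            if stack:
                stack[-1][2] += elapsed  # charge this call to the caller's children
    return dict(stats)

profile = profile_from_trace(trace)
```

The profile holds one small record per routine no matter how long the run is, while the trace grows with every event; that is the size trade-off named above.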
Why do we need another kernel profiling/tracing tool?
Answer:
Why not?
• LTT
• OProfile
• KernInst
KTAU Design Goals
• Fine-grained kernel-level performance measurement
– Parallel applications
– Support both profiling and tracing
• Both process-centric and system-wide view
• Merge user-space performance with kernel-space
• Detailed program-OS interaction data
• Analysis and visualization compatible with existing tools
KTAU Method
• Instruments Linux kernel source with the KTAU profiling API
• Maintains performance data for each kernel routine (per process)
• Performance data accessible via /proc filesystem
• Instrumented application maintains data in user-space
• Post-execution performance data analysis
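The bookkeeping in this method can be imitated in user space as a rough sketch. KTAU itself instruments kernel source in C; the Python below only mimics the idea of per-process, per-routine counters that can be read back as text, and every name in it (instrument, read_profile, the pids) is hypothetical.

```python
import time
from collections import defaultdict
from contextlib import contextmanager

# profiles[pid][routine] -> [call_count, total_ns]
# A stand-in for performance data kept separately for each process.
profiles = defaultdict(lambda: defaultdict(lambda: [0, 0]))

@contextmanager
def instrument(pid, routine):
    """Time one invocation of a routine and charge it to the given process."""
    start = time.perf_counter_ns()
    try:
        yield
    finally:
        entry = profiles[pid][routine]
        entry[0] += 1
        entry[1] += time.perf_counter_ns() - start

def read_profile(pid):
    """Render one process's data as text, loosely like reading a /proc entry."""
    return "\n".join(
        f"{name} {calls} {ns}" for name, (calls, ns) in sorted(profiles[pid].items())
    )

for _ in range(3):
    with instrument(101, "sys_write"):
        pass
with instrument(202, "sys_read"):
    pass

print(read_profile(101))
```

Keeping the data keyed by process is what allows both a process-centric view (read one pid) and a system-wide view (aggregate over all pids).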
KTAU
Framework
KTAU Architecture
5 modules
- KTAU Instrumentation
- KTAU Profiling/Tracing Infrastructure
- KTAU Proc Interface
- KTAU User-API Library
- KTAU-D
Kernel Profiling Issues on BG/L
• I/O node kernel
– Linux kernel approach
• Compute node kernel
– No daemon processes
– Single address space: single performance database, single callstack across user/kernel
– Keeps track of one process only (optimization)
– Instrumented compute node kernel
KTAU on BG/L I/O Node
[Figure: a BG/L IO-Node serving 32 Compute-Nodes (CN 2, CN 3, ... CN 31, CN 32 shown). IO-Node: ZeptoOS IO-N kernel with KTAU; user-space plus ZeptoOS RamDisk running IBM's CIOD and KTAU-D. Compute-Node: IBM Compute-N kernel; user-space running the compute job with TAU.]
KTAU on BG/L
• Current status
– IO Node ZeptoOS kernel profiling/tracing
– KTAU integrated into ZeptoOS build system
– Detailed IO Node kernel observation now possible
– KTAU-Daemon (KTAU-D) on IO Node: monitors system-wide and individual process activity, more than what strace allows
– Visualization of trace/profile of ZeptoOS and CIOD: Vampir/JumpShot (trace) and ParaProf (profile)
KTAU Usage Models for BG/L IO-Node
• Daemon-based monitoring (KTAU-D)
– Use KTAU-D to monitor (profile/trace) a single process (e.g., CIOD) or the entire IO-Node kernel
– No access to the source code of the user-space program is required
– CIOD kernel activity is available even though the CIOD source is not
• ‘Self’ monitoring
– A user-space program can be instrumented (e.g., with TAU) to access its own kernel-level trace/profile data
– ZIOD (ZeptoOS IO-D) source (when available) can be instrumented
– Can produce a merged user-kernel trace/profile
More on KTAU-D
• A daemon running on BG/L IO-node that periodically accesses kernel profile/trace data and outputs to filesystem
• Configuration done through ZeptoOS configuration tool
• KTAU-D, configuration file, and necessary scripts are integrated into the ZeptoOS runtime environment.
KTAU-D Configuration in ZeptoOS-1.2
KTAU-D Profile Data
• KTAU-D can be used to access profile data (system-wide and individual process) of the BG/L IO-Node
• Data is obtained at the start and stop of KTAU-D, and the resulting profile is then generated
• Currently, flat profiles with inclusive/exclusive times and function call counts are produced
– (Future work: call-graph profiles)
• Profile data is viewed using the ParaProf visualization tool
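One plausible reading of "obtained at the start and stop" is that the reported profile is the difference between two snapshots of cumulative counters. The sketch below assumes exactly that; the snapshot layout (calls, inclusive time, exclusive time per routine) and the numbers are made up for illustration.

```python
def diff_profiles(start, stop):
    """Subtract a start snapshot from a stop snapshot, routine by routine.

    Each snapshot maps routine -> (calls, inclusive_us, exclusive_us).
    Routines with no new calls in the interval are left out.
    """
    result = {}
    for routine, (calls, incl, excl) in stop.items():
        c0, i0, e0 = start.get(routine, (0, 0, 0))
        delta = (calls - c0, incl - i0, excl - e0)
        if delta[0] > 0:
            result[routine] = delta
    return result

# Hypothetical cumulative counters read when the daemon starts and stops.
start = {"sys_write": (10, 400, 250), "schedule": (7, 90, 90)}
stop = {
    "sys_write": (25, 1000, 600),
    "schedule": (7, 90, 90),
    "sys_read": (4, 120, 120),
}

interval = diff_profiles(start, stop)
```

Because the result is a flat profile, it says how much time each routine took during the interval but not which caller it was reached from; that is the gap a call-graph profile would fill.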
Example of Operating System Profile on I/O Nodes
Running Flash3 on 32 compute nodes
[Figure: CIOD kernel profile]
KTAU-D Trace
• KTAU-D can be used to access system-wide and individual process trace data of the BG/L IO-Node
• Trace from KTAU-D is converted into the TAU trace format, which can then be converted into other formats
– Vampir, Jumpshot
• Trace from KTAU-D can be used together (merged) with trace from TAU to monitor both user-space and kernel-space activities
– (Work in progress)
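As an illustration of what merging means here, two event streams that are each already in time order can be interleaved by timestamp. This is only a sketch of a timestamp-ordered merge, not the KTAU/TAU merge itself; it assumes both traces share a clock, and the record layout is hypothetical rather than the TAU trace format.

```python
import heapq

# Hypothetical records: (timestamp_us, source, event), each list sorted by time.
user_trace = [
    (100, "user", "write() enter"),
    (180, "user", "write() exit"),
]
kernel_trace = [
    (110, "kernel", "sys_write enter"),
    (170, "kernel", "sys_write exit"),
]

def merge_traces(*traces):
    """Interleave time-ordered traces into one time-ordered list."""
    return list(heapq.merge(*traces, key=lambda record: record[0]))

merged = merge_traces(user_trace, kernel_trace)
for ts, source, event in merged:
    print(f"{ts:>6} {source:<6} {event}")
```

In the merged view the kernel-level sys_write events sit inside the user-level write() interval, which is the kind of user/kernel nesting a merged trace is meant to expose.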
Exp 1: Observe activities on the IO node
Set up:
– KTAU:
• Enable all instrumentation points
• Number of kernel trace entries per process = 10K
– KTAU-D:
• System-wide tracing
• Accessing trace every 1 second and dumping trace output to a file in the user’s home directory through NFS
– IOTEST:
• An MPI-based benchmark (open/write/read/close)
• Running with default parameters (block size = 16MB) on NFS
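The interplay between a fixed number of trace entries per process and a periodic reader can be sketched with a small ring buffer. This assumes the buffer overwrites its oldest entries when full, so a reader that polls too slowly loses events; the class and method names are hypothetical, and the capacity is shrunk from 10K to 4 to keep the example readable.

```python
from collections import deque

class TraceRing:
    """Fixed-capacity trace buffer that overwrites its oldest entries."""

    def __init__(self, capacity):
        self.buf = deque(maxlen=capacity)
        self.pending = 0  # events recorded since the last drain

    def record(self, event):
        self.buf.append(event)  # silently drops the oldest entry when full
        self.pending += 1

    def drain(self):
        """Return buffered events and how many were overwritten unseen."""
        lost = self.pending - len(self.buf)
        events = list(self.buf)
        self.buf.clear()
        self.pending = 0
        return events, lost

ring = TraceRing(capacity=4)
for i in range(6):          # six events arrive between two polls
    ring.record(f"event-{i}")

events, lost = ring.drain()  # the periodic reader wakes up
```

Here the reader sees only the four most recent events and can tell that two were lost, which is why the buffer size and the polling interval have to be chosen together.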
[Figure: IOTEST with TAU instrumentation, showing Main, Read Time, Write Time, Read Seek Time, and Write Seek Time]
sys_write() / sys_read()
KTAU Trace of CIOD running 2, 4, 8, 16, 32 nodes
As the number of compute nodes increases, CIOD has to handle a larger number of forwarded system calls.
Zoomed View of CIOD Trace (8 compute nodes)
Can We Correlate CIOD Activity with RPC-IOD?
• Activity within a BG/L IO node switches from “ciod” to “rpciod” during a “sys_write” system call
• rpciod performs “socket_send” and interrupt handling before switching back
[Figure: trace timelines for rpciod and ciod]
Exp 2: Correlating multiple traces from Compute-node and IO-node
• Set up:
– Running IOTEST with TAU instrumentation on 64 compute nodes
– Running ZeptoOS-1.2 with KTAU on 2 IO nodes
– Reduced set of kernel instrumentation
• No TCP stack and schedule()
– 10K entries of ring-trace buffer
– Using PVFS2
(Note: Trace of 64 compute nodes and 2 IO nodes)
[Figure: TAU trace. Annotated events: read() @ 12:678 sec, write() @ 3:283 sec]
[Figure: KTAU traces of ciod and pvfs2-client on ionode23 and ionode47. Annotated events: sys_open() @ 53:1, sys_write() @ 56:6, sys_read() @ 1:05:545; sys_open() @ 53:2, sys_write() @ 56:85, sys_read() @ 1:05:778]
Exp 3: Analyze system-wide performance
• Set up:
– 2 runs of IOTEST with TAU instrumentation on 32 compute nodes
• NFS
• PVFS
– Running ZeptoOS-1.2 with KTAU on 1 IO node
– Analyzing both profile and trace data
[Figure: traces from the two runs, with timelines labeled pvfs2-client, ciod, rpciod, and ciod. Annotated events: write() @ 39:00, read() @ 47.804; write() @ 42:99, read() @ 54:61]