K T A U Kernel Tuning and Analysis Utilities


Page 1

KTAU: Kernel Tuning and Analysis Utilities

Department of Computer and Information Science

Performance Research Laboratory

University of Oregon

Page 2

University of Oregon Performance Research Lab

Agenda

• Motivations

• KTAU Overview

• ZeptoOS - KTAU - TAU on BG/L

• KTAU - TAU on Linux Cluster

Page 3

What is a process doing inside the kernel?

Solution:

Context-of-execution-based profiling/tracing

We can analyze the execution path of a process and keep the resulting data local to that process.

Page 4

What about other processes on the system?

Solution:

System-wide performance analysis

By aggregating the performance data of each process in the system (all processes or a selected subset), we can capture interactions among processes.

Page 5

Profiling or Tracing?

Answer:

Why not do both?

• Profile

– A summarized view of performance data, with the advantage of compact data size.

• Trace

– A detailed view of the process execution timeline, with the disadvantage of large data size.
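The relationship between the two views can be illustrated with a small Python sketch: a trace keeps every enter/exit event, while a profile is a per-routine summary that can be derived from it. The event format, the function name `profile_from_trace`, and the sample routine names are assumptions made for illustration, not KTAU's actual data layout.

```python
from collections import defaultdict

def profile_from_trace(events):
    """Summarize time-ordered (timestamp, kind, name) events into a flat profile.

    kind is "enter" or "exit"; events are assumed to be properly nested.
    Returns {name: {"calls": n, "inclusive": t, "exclusive": t}}.
    """
    profile = defaultdict(lambda: {"calls": 0, "inclusive": 0, "exclusive": 0})
    stack = []  # each frame: [name, enter_time, time_spent_in_callees]
    for ts, kind, name in events:
        if kind == "enter":
            stack.append([name, ts, 0])
        else:
            top_name, start, child_time = stack.pop()
            assert top_name == name, "unbalanced trace"
            elapsed = ts - start
            entry = profile[name]
            entry["calls"] += 1
            entry["inclusive"] += elapsed               # time including callees
            entry["exclusive"] += elapsed - child_time  # time excluding callees
            if stack:
                stack[-1][2] += elapsed  # charge this call to its caller
    return dict(profile)

# Hypothetical trace: six events collapse into two profile rows.
trace = [
    (0, "enter", "sys_write"),
    (2, "enter", "schedule"),
    (5, "exit", "schedule"),
    (9, "exit", "sys_write"),
    (10, "enter", "sys_write"),
    (14, "exit", "sys_write"),
]
profile = profile_from_trace(trace)
```

The trace grows with every event, whereas the profile grows only with the number of distinct routines, which is the size trade-off the slide describes.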

Page 6

Why do we need another kernel profiling/tracing tool?

Answer:

Why not?

• LTT

• Oprofile

• KernInst

Page 7

KTAU Design Goals

• Fine-grained kernel-level performance measurement

– Parallel applications

– Support both profiling and tracing

• Both process-centric and system-wide view

• Merge user-space performance with kernel-space

• Detailed program-OS interaction data

• Analysis and visualization compatible with existing tools

Page 8

KTAU Method

• Instruments Linux kernel source with the KTAU profiling API

• Maintains performance data for each kernel routine (per process)

• Performance data accessible via /proc filesystem

• Instrumented application maintains data in user-space

• Post-execution performance data analysis

Page 9

KTAU Framework

Page 10

KTAU Architecture

5 modules

- KTAU Instrumentation

- KTAU Profiling/Tracing Infrastructure

- KTAU Proc Interface

- KTAU User-API Library

- KTAU-D

Page 11

Kernel Profiling Issues on BG/L

• I/O node kernel

– Linux kernel approach

• Compute node kernel

– No daemon processes

– Single address space: single performance database, single callstack across user/kernel

– Keeps track of one process only (optimization)

– Instrumented compute node kernel

Page 12

KTAU on BG/L I/O Node

[Figure: BG/L system diagram. A BG/L IO-Node runs the ZeptoOS IO-N kernel with KTAU; its user space (with the ZeptoOS RamDisk) runs IBM’s CIOD and KTAU-D. The IO-Node serves 32 BG/L Compute-Nodes, each running the IBM Compute-N kernel with a compute job w/ TAU in user space.]

Page 13

KTAU on BG/L

• Current status

– IO Node ZeptoOS kernel profiling/tracing

– KTAU integrated into ZeptoOS build system

– Detailed IO Node kernel observation now possible

– KTAU-Daemon (KTAU-D) on IO Node: monitors system-wide and individual-process activity, giving more than what strace allows

– Visualization of trace/profile of ZeptoOS and CIOD: Vampir/JumpShot (trace) and ParaProf (profile)

Page 14

KTAU Usage Models for BG/L IO-Node

• Daemon-based monitoring (KTAU-D)

– Use KTAU-D to monitor (profile/trace) a single process (e.g., CIOD) or the entire IO-Node kernel

– No access to the source code of the user-space program is needed

– CIOD kernel activity is available even though the CIOD source is not

• ‘Self’ monitoring

– A user-space program can be instrumented (e.g., with TAU) to access its own kernel-level trace/profile data

– ZIOD (ZeptoOS IO-D) source (when available) can be instrumented

– Can produce merged user-kernel trace/profile

Page 15

More on KTAU-D

• A daemon running on the BG/L IO-node that periodically reads kernel profile/trace data and writes it to the filesystem

• Configuration is done through the ZeptoOS configuration tool

• KTAU-D, configuration file, and necessary scripts are integrated into the ZeptoOS runtime environment.
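The periodic access pattern described for KTAU-D can be sketched in Python as a simple polling loop. This is only an illustration of the pattern: `poll_snapshots`, its parameters, and the stand-in data source are hypothetical and do not reflect KTAU-D's real interface or configuration options.

```python
import time

def poll_snapshots(read_snapshot, write_output, interval_s, count, sleep=time.sleep):
    """Call read_snapshot() `count` times, `interval_s` apart, forwarding each result."""
    for i in range(count):
        write_output(i, read_snapshot())
        if i + 1 < count:
            sleep(interval_s)  # wait before the next access

# Stand-in source and sink; a real daemon would read kernel data and write files.
collected = []
fake_kernel_data = iter([{"sys_read": 3}, {"sys_read": 7}, {"sys_read": 12}])
poll_snapshots(
    read_snapshot=lambda: next(fake_kernel_data),
    write_output=lambda seq, data: collected.append((seq, data)),
    interval_s=1.0,
    count=3,
    sleep=lambda seconds: None,  # skip real waiting in this demonstration
)
```

Passing the reader, the writer, and the sleep function as parameters keeps the loop independent of where the data comes from, which also makes it easy to exercise without a kernel.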

Page 16

KTAU-D Configuration in ZeptoOS-1.2

Page 17

KTAU-D Profile Data

• KTAU-D can be used to access profile data (system-wide and individual process) of the BG/L IO-Node

• Data is obtained at the start and stop of KTAU-D, and then the resulting profile is generated

• Currently, flat profiles with inclusive/exclusive times and function call counts are produced (future work: call-graph profiles)

• Profile data is viewed using the ParaProf visualization tool
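One plausible reading of "data is obtained at the start and stop" is that the resulting profile is the difference between two snapshots of cumulative counters. The Python sketch below works under that assumption; the function `diff_profiles`, the snapshot layout, and the sample numbers are hypothetical.

```python
def diff_profiles(start, stop):
    """Subtract a start snapshot from a stop snapshot, routine by routine.

    Each snapshot maps routine name -> {"calls", "inclusive", "exclusive"}.
    Routines with no activity during the interval are omitted.
    """
    result = {}
    for name, after in stop.items():
        before = start.get(name, {})
        delta = {key: value - before.get(key, 0) for key, value in after.items()}
        if any(delta.values()):
            result[name] = delta
    return result

# Hypothetical cumulative counters at daemon start and stop.
start = {
    "sys_read": {"calls": 10, "inclusive": 400, "exclusive": 250},
    "sys_open": {"calls": 2, "inclusive": 30, "exclusive": 30},
}
stop = {
    "sys_read": {"calls": 25, "inclusive": 1000, "exclusive": 600},
    "sys_open": {"calls": 2, "inclusive": 30, "exclusive": 30},
    "sys_write": {"calls": 4, "inclusive": 90, "exclusive": 70},
}
interval = diff_profiles(start, stop)
```

Here `sys_open` drops out because nothing changed between the snapshots, and `sys_write` appears with its full counts because it was first seen after the start snapshot.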

Page 18

Example of Operating System Profile on I/O Nodes

Running Flash3 on 32 compute nodes

[Figure: CIOD kernel profile]

Page 19

KTAU-D Trace

• KTAU-D can be used to access system-wide and individual-process trace data of the BG/L IO-Node

• The trace from KTAU-D is converted into the TAU trace format, which can then be converted into other formats (Vampir, Jumpshot)

• The trace from KTAU-D can be merged with the trace from TAU to monitor both user-space and kernel-space activities (work in progress)
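Merging a user-space trace with a kernel-space trace amounts to interleaving two time-ordered event streams. The Python sketch below shows that idea with `heapq.merge`; it assumes both traces are already sorted and share a common clock, and the event tuples and the name `merge_traces` are illustrative rather than the TAU trace format.

```python
import heapq

def merge_traces(user_events, kernel_events):
    """Merge two time-sorted lists of (timestamp, name) into one tagged timeline."""
    user = ((ts, "user", name) for ts, name in user_events)
    kernel = ((ts, "kernel", name) for ts, name in kernel_events)
    # heapq.merge interleaves already-sorted inputs without re-sorting everything.
    return list(heapq.merge(user, kernel, key=lambda event: event[0]))

# Hypothetical events: a user-level write() wrapping a kernel-level sys_write.
user_trace = [(1.0, "write() enter"), (4.0, "write() exit")]
kernel_trace = [(1.5, "sys_write enter"), (3.5, "sys_write exit")]
timeline = merge_traces(user_trace, kernel_trace)
```

If the two traces came from different clocks, the timestamps would first need to be aligned, which this sketch does not attempt.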

Page 20

Exp 1: Observe activities on the IO node

Set up:

– KTAU:

• Enable all instrumentation points

• Number of kernel trace entries per process = 10K

– KTAU-D:

• System-wide tracing

• Access the trace every 1 second and dump the trace output to a file in the user’s home directory through NFS

– IOTEST:

• An MPI-based benchmark (open/write/read/close)

• Run with default parameters (block size = 16MB) on NFS
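The setup pairs a fixed number of trace entries per process with a reader that drains them once per second. Assuming the buffer behaves as a ring that overwrites its oldest entries when full, the Python sketch below shows why the buffer size and the read interval matter together. The class `TraceRing` and its methods are hypothetical, not KTAU's implementation.

```python
from collections import deque

class TraceRing:
    """Fixed-capacity event buffer that overwrites the oldest entry when full."""

    def __init__(self, capacity):
        self.entries = deque(maxlen=capacity)
        self.dropped = 0  # events overwritten before anyone read them

    def record(self, event):
        if len(self.entries) == self.entries.maxlen:
            self.dropped += 1  # appending now evicts the oldest entry
        self.entries.append(event)

    def drain(self):
        """Return all buffered events in order and empty the buffer."""
        drained = list(self.entries)
        self.entries.clear()
        return drained

# Six events arrive between two reads of a four-entry ring.
ring = TraceRing(capacity=4)
for event_id in range(6):
    ring.record(event_id)
first_read = ring.drain()
```

In this toy run the two oldest events are lost, so a reader that polls too slowly for the event rate sees gaps in the trace.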

Page 21

[Figure: IOTEST with TAU instrumentation. Labeled regions: Main, Read Time, Write Time, Read Seek Time, Write Seek Time.]

Page 22

[Figure: KTAU trace of CIOD running with 2, 4, 8, 16, and 32 compute nodes; labeled events: sys_write() / sys_read()]

As the number of compute nodes increases, CIOD has to handle a larger number of forwarded system calls.

Page 23

Zoomed View of CIOD Trace (8 compute nodes)

Page 24

Can We Correlate CIOD Activity with RPC-IOD?

• Within a BG/L I/O node, the system switches from “CIOD” to “rpciod” during a “sys_write” system call

• rpciod performs “socket_send” and interrupt handling before switching back

[Figure: trace timelines labeled rpciod and ciod]

Page 25

Exp 2: Correlating multiple traces from Compute-node and IO-node

• Set up:

– Running IOTEST with TAU instrumentation on 64 compute nodes

– Running ZeptoOS-1.2 with KTAU on 2 I/O nodes

– Reduced set of kernel instrumentation (no TCP stack and schedule())

– 10K entries of ring-trace buffer

– Using PVFS2

(Note: Trace of 64 compute nodes and 2 I/O nodes)

Page 26

[Figure: TAU trace. Annotations: write() @ 3:283 sec, read() @ 12:678 sec.]

Page 27

[Figure: traces of ciod and pvfs2-client on ionode23 and ionode47. Annotations: sys_open() @ 53:1, sys_write() @ 56:6, sys_read() @ 1:05:545; sys_open() @ 53:2, sys_write() @ 56:85, sys_read() @ 1:05:778.]

Page 28

Exp 3: Analyze system-wide performance

• Set up:

– 2 runs of IOTEST with TAU instrumentation on 32 compute nodes: one on NFS, one on PVFS

– Running ZeptoOS-1.2 with KTAU on 1 I/O node

– Analyzing both profile and trace data

Page 29

[Figure: traces from the two runs, with timelines labeled pvfs2-client, ciod, rpciod, and ciod. Annotations: write() @ 39:00, read() @ 47.804; write() @ 42:99, read() @ 54:61.]