A22 Introduction to DTrace by Kyle Hailey

DTrace Introduction Kyle Hailey

Agenda
1. Intro
   – Me
   – Delphix
2. What is DTrace
3. Why DTrace
   – Make the impossible possible
   – Low overhead
4. Where DTrace can be used
5. How DTrace is used
   – Probes
   – Overhead
   – Variables
   – Resources

Kyle Hailey • OEM 10g Performance Monitoring • Visual SQL Tuning (VST) in DB Optimizer

• Delphix

Delphix

What is DTrace
• A way of tracing the O/S and programs
  – Making the impossible possible
• Your code is unchanged
  – Optionally add static DTrace probes
• No overhead when off
  – Turning a probe on dynamically changes the code path
• Low overhead when on
  – 1000s of events per second cause less than 1% overhead
• Event driven
  – Like Oracle events 10046 and 10053

Shouting at Disks

Where can we trace
• Solaris
• OpenSolaris
• FreeBSD
• …
• MacOS
• Linux – announced by Oracle
• AIX – working “probevue”

What can we trace? Almost anything
– All system calls, e.g. “read”
– All kernel calls, e.g. “biodone”
– All function calls in a program
– All DTrace stable providers
  • Example: io:::start
  • Predefined stable probes
  • Non-stable probe names and arguments can change over time
– Custom probes
  • Write custom probes in programs to trace

Structure

$ cat mydtrace.d
#!/usr/sbin/dtrace -s

/* Section 1 */
Name_of_something_to_trace        /* called a probe */
/ filters /
{ actions }

/* Section 2: additional tracing */
Something_else_to_trace
/ optional filters /
{ take some actions }

Each section has three parts:
• Probe
• Filter
• Clause

Event Driven
• DTrace code runs when probes fire in the OS

/usr/sbin/dtrace -n '
#pragma D option quiet
io:::start
{
    printf(" timestamp %d \n", timestamp);
}'

• Probe (multi-threaded, process): when this happens, then
• Take action: print a variable
• Program runs until canceled

$ sudo ./mydtrace.d
 timestamp 8135515300287183
 timestamp 8135515300328512
 timestamp 8135515300346769

What are these probes and variables?

io:::start { printf(" timestamp %d \n", timestamp); }

– Probes
  • kernel and system calls
  • program function calls
  • predefined by DTrace
– Variables
  • either predefined in DTrace, like timestamp
  • or defined by the user

How to list probes? Two ways:
1. All system and kernel calls
   dtrace -l
2. All functions in a process
   dtrace -ln 'pid[pid]:::entry'

Output has a 4-part, colon-separated name: provider:module:function:name

Kernel vs User Space

[Figure: “dtrace -l” lists kernel functions and system calls. User-land processes (899, 731 and 21 in the diagram) are listed per process, e.g. “dtrace -ln 'pid21:::entry'” for process 21.]

dtrace -l

$ sudo dtrace -l
   ID   PROVIDER   MODULE   FUNCTION               NAME
    1   dtrace                                     BEGIN
    2   dtrace                                     END
    3   dtrace                                     ERROR
   16   profile                                    tick-1sec
   17   fbt        klmops   lm_find_sysid          entry
   18   fbt        klmops   lm_find_sysid          return
   19   fbt        klmops   gister_share_locally   entry
   … thousands of lines

dtrace -l : grouping probes

Provider:module:function:name

$ sudo dtrace -l | awk '{print $2 }' | sort | uniq -c | sort -nr

Count   Provider                  Area
72095   fbt                       kernel functions
 1283   sdt                       statically defined kernel probes
  629   mib                       system statistics
  473   hotspot_jni, hotspot      JVM
  466   syscall                   system calls
  173   nfsv4,nfsv3,tcp,udp,ip    network
   61   sysinfo                   kernel statistics
   55   sched                     CPU, io, scheduling
   46   fsinfo                    file system info
   41   vminfo                    memory
   40   iscsi,fc                  iscsi, fibre channel
   22   lockstat                  locks
   15   proc                      fork, exit, create
   14   profile                   timers, tick
   12   io                        io:::start, done
    3   dtrace                    BEGIN, END, ERROR

Providers: defined interfaces
Instead of tracing a kernel function, which could change between O/S versions, trace a maintained, stable probe.

https://wikis.oracle.com/display/DTrace/Providers
– I/O: io provider
– CPU: sched provider
– System calls: syscall provider
– Memory: vminfo provider
– User processes: pid provider
– Network: tcp provider

Provider definition files are in /usr/lib/dtrace, such as io.d, nfs.d, sched.d, tcp.d

Example Network: TCP
What if we wanted to look at TCP receives?

Probes have a 4-part name: provider:module:function:name

$ dtrace -l | grep tcp | grep receive
tcp:ip:tcp_input_data:receive

Or look at the wiki: https://wikis.oracle.com/display/DTrace/tcp+Provider

Probe arguments: dtrace -lvn
What are the arguments for the probe “tcp:ip:tcp_input_data:receive”?

$ dtrace -lvn tcp:ip:tcp_input_data:receive
   ID   PROVIDER   MODULE   FUNCTION         NAME
 7301   tcp        ip       tcp_input_data   receive

        Argument Types
                args[0]: pktinfo_t *
                args[1]: csinfo_t *
                args[2]: ipinfo_t *
                args[3]: tcpsinfo_t *
                args[4]: tcpinfo_t *

What is “tcpsinfo_t”, for example?

Probe Argument definitions
Find out what “tcpsinfo_t” is. Two ways:
1. Stable provider documentation
   – https://wikis.oracle.com/display/DTrace/Providers
   – In our case there is a TCP stable provider: https://wikis.oracle.com/display/DTrace/tcp+Provider
2. Look at the source code
   – For OpenSolaris see: http://src.illumos.org
   – Otherwise get a copy of the source
     • Load into Eclipse or similar for easy search

Let’s look up “tcpsinfo_t”

[Screenshots: src.illumos.org. Type the variable name into the search box, then click on the link to its definition.]

tcpsinfo_t points to many things, for example:
   string tcps_raddr = remote machine’s IP address

Creating a Program
• Find out all the machines we are receiving TCP packets from

$ cat tcpreceive.d
#!/usr/sbin/dtrace -s
#pragma D option quiet
tcp:ip:tcp_input_data:receive                        /* probe: when TCP receives */
{
    /* action: print the remote address; args[3] is a tcpsinfo_t * */
    printf(" address %s \n", args[3]->tcps_raddr);
}

$ sudo ./tcpreceive.d
 address 127.0.0.1
 address 172.16.103.58
 address 127.0.0.1
 address 172.16.100.187
 address 172.16.103.58
 address 127.0.0.1
^C

Using for TCP Window sizes

ip              usend   ssz     send   recd
172.16.103.58     564   16028    564 \
172.16.103.58     696   16208    132 \
172.16.103.58    1180   16208    484 \
172.16.103.58    1664   16208    484 \
172.16.103.58    2148   16208    484 \
172.16.103.58    2148   16208         / 0
172.16.103.58    1452   16208         / 0

Columns:
• ip – remote machine
• usend – unacknowledged bytes sent
• ssz – send window bytes
• send – send bytes
• recd – receive bytes

If unacknowledged bytes sent goes above the send window, transmissions will be delayed.

Review so far
• DTrace – trace O/S and user programs
• Solaris and partially on Linux, among others
• Code is event driven; structure:
  – Probe
  – Optional filter
  – Action
• Get all events with “dtrace -l”
• Get event arguments with “dtrace -lvn probe”
• Get argument definitions in source or wiki

Variables
1. Globals
   • Not thread safe
   • X=1; A[1]=1;
2. Aggregates
   • Thread-safe scalars and arrays
   • Special operations: count, average, quantize
   • @ct = count(); @sm = sum(value); @sm[type] = sum(value); @agg = quantize(value);
3. self->var
   • Thread-local variable: self->x = value;
4. this->var
   • Lightweight variable for only this probe firing: this->x = value;

Variables: Aggregates are best

dtrace.org/blogs/brendan/2011/11/25/dtrace-variable-types/

What is an aggregate?
• Multi-CPU safe variable
• Lightweight
• Array or scalar
• Denoted by @
  – @var = function(value);
  – @var[array_index] = function(value);
• Pre-defined functions only, such as
  – sum()
  – count()
  – max()
  – quantize()***
• Print out with “printa”

Using Aggregates: count()
What program writes the most often?

syscall::write:entry
{
    @counts[execname] = count();
}

  expr          72
  sh           291
  tee          814
  make.bin    2010

• count() = count of occurrences doing writes
• execname = session

https://wikis.oracle.com/display/DTrace/Aggregations

Aggregate: quantize()
What if we wanted a distribution of all I/O sizes?

First find the I/O probes:

$ sudo dtrace -l | grep io

If that returns too many rows, limit the output to specific probes with the “-ln” flag instead:

$ sudo dtrace -ln io:::
   ID   PROVIDER   MODULE    FUNCTION   NAME
 6281   io         genunix   biodone    done
 6282   io         genunix   biowait    wait-done
 6283   io         genunix   biowait    wait-start
 7868   io         nfs       nfs_bio    done
 7871   io         nfs       nfs_bio    start

• nfs = NFS module
• bio = block I/O

$ sudo dtrace -lvn io:genunix:biodone:done
   ID   PROVIDER   MODULE    FUNCTION   NAME
 6281   io         genunix   biodone    done

        Argument Types
                args[0]: bufinfo_t *
                args[1]: devinfo_t *
                args[2]: fileinfo_t *

What is bufinfo_t? Sounds like buffer information.

Finding what bufinfo_t points to
• args[0] = bufinfo_t *
• bufinfo_t->b_bcount = number of bytes
• Use in DTrace: args[0]->b_bcount

Aggregate Example: iosizes.d

#!/usr/sbin/dtrace -s
#pragma D option quiet
io:::done
{
    @sizes = quantize(args[0]->b_bcount);   /* size of the I/O */
}

$ sudo iosizes.d
   value  --- Distribution --- count
     256 |                     0
     512 |@@@@                 6
    1024 |@@@@                 6
    2048 |@@@@@@@@@@@@@@@@@@   31
    4096 |@@@                  5
    8192 |@@@@@                9
   16384 |@@@@                 6
   32768 |                     0
   65536 |                     0
^C
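To read a quantize() distribution: each row is a power-of-two bucket, and a row labelled N counts the values v with N <= v < 2N. The following is a minimal Python sketch of that bucketing, not DTrace code. The function names and the sample sizes are made up for illustration, and negative values (which quantize() also handles) are left out.

```python
from collections import Counter


def quantize_bucket(value: int) -> int:
    """Return the power-of-two row a non-negative value falls into.

    Mimics a quantize() row labelled N, which counts values v with N <= v < 2N.
    Negative values are left out of this sketch.
    """
    if value < 0:
        raise ValueError("sketch handles non-negative values only")
    if value == 0:
        return 0
    return 1 << (value.bit_length() - 1)


def quantize(values):
    """Count values per power-of-two bucket, like @sizes = quantize(v)."""
    return Counter(quantize_bucket(v) for v in values)


# Hypothetical I/O sizes in bytes (made up for illustration).
sizes = [512, 700, 2048, 3000, 4095, 4096, 8192]
for bucket, count in sorted(quantize(sizes).items()):
    print(f"{bucket:>8} | {count}")
```

So an I/O of exactly 8192 bytes lands in the 8192 row, while a 4095-byte I/O lands in the 2048 row.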

Aggregate: iosizes.d with execname

#!/usr/sbin/dtrace -s
#pragma D option quiet
io:::done
{
    @sizes[execname] = quantize(args[0]->b_bcount);   /* size of the I/O */
}

$ sudo iosizes.d
  sched
   value  --- Distribution --- count
     256 |                     0
     512 |@@@@                 6
    1024 |@@@@                 6
    2048 |@@@@@@@@@@@@@@@@@@   31
    4096 |@@@                  5
    8192 |@@@@@                9
   16384 |@@@@                 6
   32768 |                     0
^C

Only returns I/O for sched. Why?

[Figure: Kernel vs User Space, with user-land processes 899, 731 and 21 above the kernel.]

• Kernel-land I/O: I/O is done in the kernel, by sched, so we only see “sched”
• User-land I/O: user programs do their I/O via a system call to the kernel, such as “read”

io:::start is in the kernel; look for the user syscall
• Look for the read system call

$ sudo dtrace -l | grep syscall | grep read
 5425   syscall   read   entry
 5426   syscall   read   return

$ sudo dtrace -lvn syscall::read:entry
   ID   PROVIDER   MODULE   FUNCTION   NAME
 5425   syscall             read       entry

        Argument Types
                None

User program system call “read”
• arg0 = fd
• arg1 = *buf
• arg2 = size
The probe lists no typed arguments, so instead of args[2]->size use arg2.

Aggregate Example: readsizes.d (user-land I/O)

#!/usr/sbin/dtrace -s
#pragma D option quiet
syscall::read:entry
{
    @read_sizes[execname] = quantize(arg2);   /* size of the I/O */
}

  java
   value  ------------- Distribution ------------- count
    4096 |                                         0
    8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2
   16384 |                                         0
  cat
   value  ------------- Distribution ------------- count
   16384 |                                         0
   32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1
   65536 |                                         0
  sshd
   value  ------------- Distribution ------------- count
    8192 |                                         0
   16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 931
   32768 |                                         0

Built-in variables
• pid – process id
• tid – thread id
• execname
• timestamp – nanoseconds
• cwd – current working directory
• Probes:
  – probeprov
  – probemod
  – probefunc
  – probename

Built-in variable examples

# cat exec.d
#!/usr/sbin/dtrace -s
syscall:::entry               /* no function name = wildcard, all matches */
{
    /* execname = program name; probefunc = the function that fired */
    @num[execname, probefunc] = count();
}
dtrace:::END
{
    printa(" %-32s %-32s %@8d\n", @num);
}

# ./exec.d
dtrace: script './exec.d' matched 236 probes
 execname    function        count
 sleep       stat64             32
 vmtoolsd    pollsys            37
 java        pollsys            72
 java        lwp_cond_wait     180

Latency
Latency is crucial to performance analysis.

Latency = delta = end_time – start_time

DTrace probes come in pairs:
• entry, return
• start, done
Take the time at the beginning and the time at the end, and take the difference.

Latency: how long does I/O take?
Latency = delta = end_time – start_time
– start_time: io:::start
– end_time: io:::done

Array to hold each I/O start time:
• Array needs a unique key for each I/O
• Key could be based on (look these up in the source)
  – device = args[0]->b_edev
  – block = args[0]->b_blkno

Array: tm_start[device, block] = timestamp

Latency

#!/usr/sbin/dtrace -s
#pragma D option quiet
io:::start
{
    /* timestamp array, indexed by device and block number */
    tm_start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}
io:::done
/ tm_start[args[0]->b_edev, args[0]->b_blkno] /      /* filter */
{
    /* delta in nanoseconds */
    this->delta = timestamp - tm_start[args[0]->b_edev, args[0]->b_blkno];
    @io = quantize(this->delta);                      /* output array */
    tm_start[args[0]->b_edev, args[0]->b_blkno] = 0;  /* clear timestamp array entry */
}
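The pairing logic in the script above can be sketched outside DTrace. This Python sketch is only an illustration: the event tuples and the function name are hypothetical, a dict stands in for the global tm_start array, and a key-presence check stands in for the D filter on a non-zero start time.

```python
def io_latencies(events):
    """Pair start/done events by (device, block) and return the deltas in ns."""
    tm_start = {}          # plays the role of the global tm_start[] array
    deltas = []
    for kind, device, block, timestamp in events:
        key = (device, block)
        if kind == "start":
            tm_start[key] = timestamp
        elif kind == "done" and key in tm_start:          # the / filter /
            deltas.append(timestamp - tm_start.pop(key))  # pop = clear the entry
    return deltas


# Hypothetical event stream: (probe, device, block, timestamp in ns).
events = [
    ("start", 1, 100, 1_000),
    ("start", 1, 200, 1_500),
    ("done",  1, 100, 4_000),   # 3000 ns after its start
    ("done",  2, 999, 5_000),   # start never seen: skipped by the filter
    ("done",  1, 200, 9_500),   # 8000 ns after its start
]
print(io_latencies(events))
```

The filter matters when tracing begins while I/Os are already in flight: a done event with no recorded start would otherwise produce a meaningless delta.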

Other ways of keying start/end
1. We used a global array
   – tm_start[device, block] = timestamp
   – Probably the best general way
2. Some people use arg0
   – tm_start[arg0] = timestamp
   – Not as clear that this is valid
3. Others use a thread-local variable
   – self->start = timestamp;
   – This only works if the thread that fires the begin probe is the same thread that fires the end probe
     • Doesn’t work for io:::start, io:::done
     • Does work for nfs:::start, nfs:::done
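Why the choice of key matters can be shown with a small simulation. This is a hedged Python sketch, not DTrace: the thread ids, I/O ids and function name are invented, and keying by thread id stands in for self->start.

```python
def latencies(events, key_by_thread):
    """Return deltas, keying start times either per thread or per I/O."""
    starts = {}
    deltas = []
    for kind, tid, io_id, timestamp in events:
        key = tid if key_by_thread else io_id
        if kind == "start":
            starts[key] = timestamp
        elif key in starts:
            deltas.append(timestamp - starts.pop(key))
    return deltas


# Hypothetical events: (probe, thread id, I/O id, timestamp in ns).
# The I/O is issued by thread 7 but completed by thread 0.
events = [
    ("start", 7, "dev1:blk100", 1_000),
    ("done",  0, "dev1:blk100", 3_500),
]
print(latencies(events, key_by_thread=True))    # the done is never matched
print(latencies(events, key_by_thread=False))   # matched on the I/O itself
```

Keyed per thread, the completion finds no start time and the I/O is silently dropped; keyed on the I/O itself, the 2500 ns delta is recovered.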

Tracing vs Profiling
Tracing
• Programs run until ^C
• Can print every probe
• At ^C all unprinted variables are printed
Profiling
• Take action every X seconds
• Special probe name: profile:::tick-1sec
• Can profile at hz or ns, us, ms, sec: profile:::tick-1, profile:::tick-1ms

Latency: output every second

#!/usr/sbin/dtrace -s
#pragma D option quiet
io:::start
{
    /* indexed by device and block number */
    tm_start[args[0]->b_edev, args[0]->b_blkno] = timestamp;
}
io:::done
/ tm_start[args[0]->b_edev, args[0]->b_blkno] /
{
    this->delta = timestamp - tm_start[args[0]->b_edev, args[0]->b_blkno];
    @io = quantize(this->delta);                      /* quantize */
    tm_start[args[0]->b_edev, args[0]->b_blkno] = 0;  /* clear */
}
profile:::tick-1sec           /* every second */
{
    printa(@io);              /* print */
    trunc(@io);               /* clear */
}

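User Process Tracing

[Figure: Kernel vs User Space, highlighting the user-land processes (899, 731 and 21). Functions in process 21 are listed with “dtrace -ln 'pid21:::entry'”.]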

Tracing User Processes
• What can you trace in Oracle?
  – $ ps -ef | grep oracle
  – Get a process id
  – $ dtrace -ln 'pid[process_id]:::entry'
  – Lists program functions
• What do these functions do?
  – Read the source code for MySQL
  – Guess if you are on Oracle
  – Some good blogs out there

Overhead
User process tracing (from Brendan Gregg)
• Don't worry too much about pid provider probe cost at < 1000 events/sec.
• At > 10,000 events/sec, pid provider probe cost will be noticeable.
• At > 100,000 events/sec, pid provider probe cost may be painful.
User process probes typically cost 2-15us each, and could be slower.

Kernel and system calls are cheaper to trace
• > 1,000,000 events/sec: 20% impact

For non-CPU workloads the impact may be greater
• TCP tests showed a 50% throughput drop at 160K events/sec
  – 40K interrupts/sec
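The pid provider thresholds follow from simple arithmetic on the 2-15us per-probe cost. A rough Python sketch, assuming the cost is pure CPU time on a single CPU and scales linearly with the event rate (both simplifications; the function name is made up):

```python
def probe_cpu_fraction(events_per_sec: float, cost_us: float) -> float:
    """Fraction of one CPU spent in probes, assuming a fixed cost per event."""
    return events_per_sec * cost_us / 1_000_000


for rate in (1_000, 10_000, 100_000):
    low = probe_cpu_fraction(rate, 2)     # 2 us per probe (low end quoted above)
    high = probe_cpu_fraction(rate, 15)   # 15 us per probe (high end quoted above)
    print(f"{rate:>7} events/sec: {low:.1%} to {high:.1%} of one CPU")
```

Under these assumptions 1,000 events/sec costs 0.2% to 1.5% of a CPU, 10,000 costs 2% to 15%, and 100,000 costs 20% to 150%, which lines up with "don't worry", "noticeable" and "painful".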

Formatting data
Problem: formatting data is difficult in DTrace. DTrace has printf and printa (for arrays) but …
• No floating point
• No “if-then-else”, no “for-loop”; only the ternary operator:
  – type = probename == "op-write-done" ? "W" : "R";
• No way to access an individual element of an aggregate array (e.g. to divide sum of time by sum of counts)

Solution: do formatting and calculations in perl

dtrace -n ' … ' | perl -e ' … '
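The same post-processing idea can be sketched in Python instead of perl. This example assumes a hypothetical output format in which the D script prints one line per I/O type as "type total_ns count"; the format, the function name and the sample numbers are all made up. It then does the floating-point division that D cannot.

```python
def average_ms(lines):
    """Return {type: average latency in ms} from 'type total_ns count' lines."""
    averages = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 3:
            continue                      # skip blank or unexpected lines
        io_type, total_ns, count = fields[0], int(fields[1]), int(fields[2])
        if count > 0:
            averages[io_type] = total_ns / count / 1_000_000   # ns -> ms
    return averages


# Hypothetical dtrace output, one line per I/O type.
sample = ["R 12000000 8", "W 45000000 6", ""]
for io_type, avg in average_ms(sample).items():
    print(f"{io_type} {avg:8.3f} ms")
```

In a real pipeline the function would be fed sys.stdin instead of the sample list, with the dtrace command on the left of the pipe.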

Summary
• Structure
    #!/usr/sbin/dtrace -s
    Name_of_something_to_trace
    / filters /
    { actions }
• List of probes
    dtrace -l
• Arguments to probes
    dtrace -lvn prov:mod:func:name
• Look up args in source code: http://src.illumos.org
• Use aggregates (@); they make DTrace easy
• Google DTrace
  – Find example programs

Resources
• Oracle Wiki
  – wikis.oracle.com/display/Dtrace
• DTrace book
  – www.dtracebook.com
• Brendan Gregg’s Blog
  – dtrace.org/blogs/brendan/
• Oracle examples
  – alexanderanokhin.wordpress.com/2011/11/13
  – andreynikolaev.wordpress.com/2010/10/28/
  – blog.tanelpoder.com/2009/04/24
