A22 Introduction to DTrace by Kyle Hailey

52
DTrace Introduction Kyle Hailey

Transcript of A22 Introduction to DTrace by Kyle Hailey

Page 1: A22 Introduction to DTrace by Kyle Hailey

DTrace Introduction Kyle Hailey

Page 2: A22 Introduction to DTrace by Kyle Hailey

Agenda 1. Intro … Me … Delphix 2. What is DTrace 3. Why DTrace

– Make the Impossible be possible – Low overhead

4. Where DTrace can be used 5. How DTrace is used

– Probes – Overhead – Variables – Resources

Page 3: A22 Introduction to DTrace by Kyle Hailey

Kyle Hailey • OEM 10g Performance Monitoring • Visual SQL Tuning (VST) in DB Optimizer

• Delphix

Page 4: A22 Introduction to DTrace by Kyle Hailey

Delphix

25 TB

2 TB

Page 5: A22 Introduction to DTrace by Kyle Hailey

What is DTrace • Way of tracing O/S and Programs

– Making the impossible possible

• Your code unchanged – Optional add static DTrace probes

• No overhead when off – Turning on dynamically changes code path

• Low overhead when on – 1000s of events per second cause less 1% overhead

• Event Driven – Like event 10046, 10053

Page 6: A22 Introduction to DTrace by Kyle Hailey

Shouting at Disks

Page 7: A22 Introduction to DTrace by Kyle Hailey

Where can we trace • Solaris • OpenSolaris • FreeBSD … • MacOS • Linux – announced from Oracle • AIX – working “probevue”

Page 8: A22 Introduction to DTrace by Kyle Hailey

What can we trace? Almost anything

– All system calls “read” – All kernel calls “biodone” – All function calls in a program – All DTrace stable providers

• Example : io:::start • Predefined stable probes • Non-stable Probe names and arguments can change

over time – Custom probes

• Write custom probes in programs to trace

Page 9: A22 Introduction to DTrace by Kyle Hailey

Structure

$ cat mydtrace.d #!/usr/sbin/dtrace -s

Name_of_something_to_trace / filters / { actions }

# additional tracing Something_else_to_trace /optional filters / { take some actions }

(called a probe)

Section1 : •Probe •Filter •Clause

Section 2

Page 10: A22 Introduction to DTrace by Kyle Hailey

Event Driven • DTrace Code run when probes fire in OS

/usr/sbin/dtrace -n ' #pragma D option quiet io:::start { printf(" timestamp %d ¥n",timestamp); }'

• Program runs until canceled $ sudo ./mydtrace.d timestamp 8135515300287183

timestamp 8135515300328512

timestamp 8135515300346769

^C

Probe (multi-threaded, process) when this happens then:

Take action Print variable

Page 11: A22 Introduction to DTrace by Kyle Hailey

What are these What are these probes and variables:?

io:::start { printf(" timestamp %d ¥n",timestamp); }'

– Probes • kernel and system calls • program function calls • predefined by DTrace

– Variables • Variables are either predefined in DTrace like timestamp • defined by user

Probe

Variable

Page 12: A22 Introduction to DTrace by Kyle Hailey

How to list Probes? Two ways to list probes 1. All System and kernel calls

dtrace –l

2. All Process functions dtrace –l pid[pid]

Output will have 4 part name, colon separated Provider:module:function:name

Page 13: A22 Introduction to DTrace by Kyle Hailey

Kernel vs User Space

dtrace –l Kernel Functions

dtrace –l System Calls

User Processes

899 731 21

$ dtrace –l pid21

User Land

$ dtrace –l

Page 14: A22 Introduction to DTrace by Kyle Hailey

dtrace -l

$ sudo dtrace –l

ID PROVIDER MODULE FUNCTION NAME

1 dtrace BEGIN

2 dtrace END

3 dtrace ERROR

16 profile tick-1sec

17 fbt klmops lm_find_sysid entry

18 fbt klmops lm_find_sysid return

19 fbt klmops gister_share_locally entry

Thousands of lines .

Provider Module Function Name

Page 15: A22 Introduction to DTrace by Kyle Hailey

dtrace –l : grouping probes

Provider:module:function:name $ sudo dtrace -l | awk '{print $2 }' | sort | uniq -c | sort -nr

Count provider area 72095 fbt – kernel functions 1283 sdt - system calls 629 mib - system statistics 473 hotspot_jni, hotspot – JVM 466 syscall – system calls 173 nfsv4,nfsv3,tcp,udp,ip – network 61 sysinfo – kernel statistics 55 sched – CPU, io, scheduling 46 fsinfo - file system info 41 vminfo - memory 40 iscsi,fc - iscsi,fibre channel 22 lockstat - locks 15 proc - fork, exit , create 14 profile - timers tick 12 io - io:::start, done 3 dtrace - BEGIN, END, ERROR

Page 16: A22 Introduction to DTrace by Kyle Hailey

Providers:defined interfaces Instead of tracing a kernel function, which could change between O/S

versions, trace a maintained, stable probe

https://wikis.oracle.com/display/DTrace/Providers – I/O io Provider – CPU sched Provider – system calls syscall Provider – memory vminfo Provider – user processes pid Provider – network tcp Provider

Provider definition files in /usr/lib/dtrace, such as io.d, nfs.d, sched.d, tcp.d

Page 17: A22 Introduction to DTrace by Kyle Hailey

Example Network: TCP What if we wanted to look for TCP transmissions for receive ?

Probes have 4 part name Provider:module:function:name

$ dtrace –l | grep tcp | grep receive tcp:ip:tcp_input_data:receive

Or look at wiki https://wikis.oracle.com/display/DTrace/tcp+Provider

Page 18: A22 Introduction to DTrace by Kyle Hailey

Probe arguments: dtrace –lnv What are the arguments for the probe function “tcp:ip:tcp_input_data:receive”

$ dtrace -lvn tcp:ip:tcp_input_data:receive ID PROVIDER MODULE FUNCTION NAME 7301 tcp ip tcp_input_data receive

Argument Types args[0]: pktinfo_t * args[1]: csinfo_t * args[2]: ipinfo_t * args[3]: tcpsinfo_t * args[4]: tcpinfo_t *

What is “tcpsinfo_t ” for example ?

Page 19: A22 Introduction to DTrace by Kyle Hailey

Probe Argument definitions Find out what “tcpsinfo_t ” is

Two ways: 1. Stable Provider

– https://wikis.oracle.com/display/DTrace/Providers – In our case there is a TCP stable provider

https://wikis.oracle.com/display/DTrace/tcp+Provider

2. Look at source code – For OpenSolaris see: http://scr.illumos.org – Otherwise get a copy of the source

• Load into Eclipse or similar for easy search

Let’s look up “tcpsinfo_t ”

Page 20: A22 Introduction to DTrace by Kyle Hailey

src.illumos.org Type in variable

Click on Link

Page 21: A22 Introduction to DTrace by Kyle Hailey

src.illumos.org

example string tcps_raddr = Remote machines IP address

tcpsinfo_t - points to many things

Page 22: A22 Introduction to DTrace by Kyle Hailey

Creating a Program • Find out all the machines we are receiving TCP packets from

$ sudo ./tcpreceive.d address 127.0.0.1 address 172.16.103.58 address 127.0.0.1 address 172.16.100.187 address 172.16.103.58 address 127.0.0.1 ^C

$ cat tcpreceive.d #!/usr/sbin/dtrace -s #pragma D option quiet tcp:ip:tcp_input_data:receive { printf(" address %s ¥n", args[3]->tcps_raddr ); }

args[3]: tcpsinfo_t *

When TCP receive Print remote address

probe action

Page 23: A22 Introduction to DTrace by Kyle Hailey

Using for TCP Window sizes

ip usend ssz send recd 172.16.103.58 564 16028 564 ¥ 172.16.103.58 696 16208 132 ¥ 172.16.103.58 1180 16208 484 ¥ 172.16.103.58 1664 16208 484 ¥ 172.16.103.58 2148 16208 484 ¥ 172.16.103.58 2148 16208 / 0 172.16.103.58 1452 16208 / 0

Remote Machine

Unacknowledged Bytes Sent

Send Window Bytes

Send Bytes

Receive Bytes

If unacknowleged bytes sent goes above send window then transmissions will be delayed

Page 24: A22 Introduction to DTrace by Kyle Hailey

Review so far • DTrace – trace O/S and user programs • Solaris and partially on Linux among others • Code is event driven, structure

– probe – Include optional filter – Action

• Get all event’s with “dtrace –l” • Get event arguments with “dtrace –lnv probe” • Get argument definitions in source or wiki

Page 25: A22 Introduction to DTrace by Kyle Hailey

Variables 1. Globals

• Not thread save X=1; A[1]=1;

2. Aggregates • Thread safe scalars and arrays • Special operations, Count, average, quantize

@ct = count() ; @sm = sum(value); @sm[type]=sum(value); @agg = quantize(value);

3. Self-> var • Thread variable, self->x = value;

4. This->var • Light weight variable for only this probe firing • this->x = value;

Page 26: A22 Introduction to DTrace by Kyle Hailey

Variables: Aggregates are best

dtrace.org/blogs/brendan/2011/11/25/dtrace-variable-types/

Page 27: A22 Introduction to DTrace by Kyle Hailey

What is an aggregate? • Multi CPU safe variable • Light weight • Array or scalar • Denoted by @

– @var= function(value); – @var[array_indice]=function(value);

• Functions pre-defined only, such as – sum() – count() – max() – quantize()***

• Print out with “printa”

Page 28: A22 Introduction to DTrace by Kyle Hailey

Using Aggregates: count()

syscall::write:entry { @counts[execname] = count(); } expr 72 sh 291 tee 814 make.bin 2010

https://wikis.oracle.com/display/DTrace/Aggregations

Count of occurrences doing writes execname = session

What program writes the most often?

Page 29: A22 Introduction to DTrace by Kyle Hailey

$ sudo dtrace -ln io::: ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done 6282 io genunix biowait wait-done 6283 io genunix biowait wait-start 7868 io nfs nfs_bio done 7871 io nfs nfs_bio start

Aggregate: quantize()

Alternately Limit output to specific probes with “-ln” flag:

Get distribution of all I/O sizes

$ sudo dtrace -l | grep io

If the following returns too many rows

Page 30: A22 Introduction to DTrace by Kyle Hailey

Aggregate : quantize() What if we wanted a distribution of all I/O sizes?

$ sudo dtrace -ln io::: ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done 6282 io genunix biowait wait-done 6283 io genunix biowait wait-start 7868 io nfs nfs_bio done 7871 io nfs nfs_bio start

NFS module

bio = block I/O

$ sudo dtrace -lvn io:genunix:biodone:done ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done Argument Types args[0]: bufinfo_t * args[1]: devinfo_t * args[2]: fileinfo_t

What is bufinfo_t? Sounds like Buffer information

Page 31: A22 Introduction to DTrace by Kyle Hailey

Finding what bufinfo_t points to

Page 32: A22 Introduction to DTrace by Kyle Hailey

bufinfo_t arguments $ sudo dtrace -lvn io:genunix:biodone:done

ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done Argument Types args[0]: bufinfo_t * args[1]: devinfo_t * args[2]: fileinfo_t

args[0] = bufinfo_t * bufinfo_t -> b_bcount= number of bytes Use in Dtrace args[0]->b_bcount

Page 33: A22 Introduction to DTrace by Kyle Hailey

Aggregate Example: iosizes.d

$ sudo iosizes.d value --- Distribution -- count 256 | 0 512 |@@@@ 6 1024 |@@@@ 6 2048 |@@@@@@@@@@@@@@@@@@ 31 4096 |@@@ 5 8192 |@@@@@ 9 16384 |@@@@ 6 32768 | 0 65536 | 0 ^C

#!/usr/sbin/dtrace -s #pragma D option quiet io:::done

{ @sizes = quantize(args[0]->b_bcount); } Size of the I/O

Page 34: A22 Introduction to DTrace by Kyle Hailey

Aggregate : iosizes.d with execname

$ sudo iosizes.d sched value --- Distribution -- count 256 | 0 512 |@@@@ 6 1024 |@@@@ 6 2048 |@@@@@@@@@@@@@@@@@@ 31 4096 |@@@ 5 8192 |@@@@@ 9 16384 |@@@@ 6 32768 | 0 ^C

#!/usr/sbin/dtrace -s #pragma D option quiet io:::done { @sizes[execname] = quantize(args[0]->b_bcount); }

Size of the I/O

Only returns I/O for sched Why?

Kernel land I/O

Page 35: A22 Introduction to DTrace by Kyle Hailey

Kernel vs User Space

dtrace –l Kernel Functions

dtrace –l System Calls

899 731 21

User Land

I/O is in kernel done by sched

User programs make a system call “read”

• I/O is done by the kernel so only see “sched” • User I/O is done via a system call to kernel

Page 36: A22 Introduction to DTrace by Kyle Hailey

io:::start : kernel, look for user syscall

• Look for the read system call $ sudo dtrace -l | grep syscall | grep read

5425 syscall read entry 5426 syscall read return

$ sudo dtrace -lvn syscall::read:entry ID PROVIDER MODULE FUNCTION NAME 5425 syscall read entry Argument Types None

Page 37: A22 Introduction to DTrace by Kyle Hailey

User program system call “read”

Arg0 = fd Arg1 = *buf Arg2 = size Instead of args[2]->size Use arg2

$ sudo dtrace -lvn syscall::read:entry Argument Types None

Page 38: A22 Introduction to DTrace by Kyle Hailey

Aggregate Example: readsizes.d

java value ------------- Distribution ------------- count 4096 | 0 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2 16384 | 0 cat value ------------- Distribution ------------- count 16384 | 0 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 65536 | 0 sshd value ------------- Distribution ------------- count 8192 | 0 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 931 32768 | 0

#!/usr/sbin/dtrace -s #pragma D option quiet syscall::read:entry { @read_sizes[execname] = quantize(arg2); }

Size of the I/O

User land I/O

Page 39: A22 Introduction to DTrace by Kyle Hailey

Built in variables • pid – process id • tid – thread id • execname • timestamp – nano-seconds • cwd – current working directory • Probes:

– probeprov – probemod – probefunc – probename

Page 40: A22 Introduction to DTrace by Kyle Hailey

Built in variable examples

# cat exec.d #!/usr/sbin/dtrace -s syscall:::entry { @num[execname, probefunc] = count(); } dtrace:::END { printa(" %-32s %-32s %@8d¥n", @num);} # ./syscall.d dtrace: script './exec.d' matched 236 probes sleep stat64 32 vmtoolsd pollsys 37 java pollsys 72 java lwp_cond_wait 180

Program name

Function executing Records function That fires

No function name = Wild card, all matches

Execname function count

Page 41: A22 Introduction to DTrace by Kyle Hailey

Latency Latency crucial to performance analysis.

Latency = delta = end_time – start_time

Dtrace probes have • Entry, exit • Start , done Take time at beginning and time at end and take

Page 42: A22 Introduction to DTrace by Kyle Hailey

Latency: how long does I/O take? Latency = delta = end_time – start_time

– start_time io:::start – end_time io:::done

Array to hold each I/O start time:

• Array needs a unique key for each I/O • Key could be based on

– device = args[0]->b_edev – block = args[0]->b_blkno

Array: tm_start[device,block]=timestamp

Look these up in source

Page 43: A22 Introduction to DTrace by Kyle Hailey

Latency

#!/usr/sbin/dtrace -s #pragma D option quiet io:::start /* device block number */ { tm_start[ args[0]->b_edev, args[0]->b_blkno] = timestamp; } io:::done / tm_start[ args[0]->b_edev, args[0]->b_blkno] / { this->delta = (timestamp - tm_start[args[0]->b_edev,args[0]->b_blkno] ); @io = quantize(this->delta); tm_start[ args[0]->b_edev, args[0]->b_blkno] = 0; }

comment

start

end

Array index

quantize Clear Timestamp Array entry

filter

Output array

Timestamp array

Nano-second

Page 44: A22 Introduction to DTrace by Kyle Hailey

Other ways of keying start/end

1. We used a global array – tm_start[device,block]=timestamp – Probably best general way

2. Some people use arg0

– tm_start[arg0]=timestamp – Not as clear that this is valid

3. Others use

– self->start = timestamp; – This only works if the same thread that does the begin

probe is the same the does the end probe • Doesn’t work for io:::start , io:::done • Does work for nfs:::start , nfs:::done

Page 45: A22 Introduction to DTrace by Kyle Hailey

Tracing vs Profiling Tracing • Programs run until ^C • Can print every probe • At ^C all unprinted variables are printed Profiling • Take action every X seconds • Special probe name

profile:::tick-1sec

Can profile at hz or ns, us, ms, sec

profile:::tick-1 profile:::tick-1ms

Hz ms

Page 46: A22 Introduction to DTrace by Kyle Hailey

Latency: output every second

#!/usr/sbin/dtrace -s #pragma D option quiet io:::start /* device block number */ { tm_start[ args[0]->b_edev, args[0]->b_blkno] = timestamp; } io:::done / tm_start[ args[0]->b_edev, args[0]->b_blkno] / { this->delta = (timestamp - tm_start[args[0]->b_edev,args[0]->b_blkno] ); @io = quantize(this->delta); tm_start[ args[0]->b_edev, args[0]->b_blkno] = 0; } profile:::tick-1sec { printa(@io); trunc(@io); }

start

end

Every second

clear print quantize clear

Page 47: A22 Introduction to DTrace by Kyle Hailey

User Process Tracing

dtrace –l Kernel Functions

dtrace –l System Calls

User Processes

899 731 21

$ dtrace –l pid21

User Land

Page 48: A22 Introduction to DTrace by Kyle Hailey

Tracing User Processes • What can you trace in Oracle

– $ ps –ef | grep oracle – Get a process id – $ dtrace –l pid[process_id] – Lists program functions

• What do these functions do? – Source code for Mysql – Guess if you are on Oracle – Some good blogs out there

Page 49: A22 Introduction to DTrace by Kyle Hailey

Overhead User process tracing (from Brendan Gregg ) • Don't worry too much about pid provider probe cost at < 1000 events/sec. • At > 10,000 events/sec, pid provider probe cost will be noticeable. • At > 100,000 events/sec, pid provider probe cost may be painful. User process probes 2-15us typical, could be slower

Kernel and system calls are cheaper to trace • > 1,000,000 20% impact

For non CPU work loads impact may be greater • TCP tests showed 50% throughput drop at 160K events/sec

– 40K interupts/sec

Page 50: A22 Introduction to DTrace by Kyle Hailey

Formatting data Problem : Formating data difficult in Dtrace DTrace has printf and printa (for arrays) but …

• No floating point • No “if-then-else” , no “for-loop”

– type = probename == "op-write-done" ? "W" : "R";

• No way to access index of an aggregate array (ex sum of time by sum of counts)

Solution: do formatting and calculations in perl

dtrace -n ‘ … ‘ | perl –e ‘ … ‘

Page 51: A22 Introduction to DTrace by Kyle Hailey

Summary • Stucture

• List of Probes

• Arguments to probes

• Look up args in source code http://scr.illumos.org • Use Aggregates @ – they make DTrace easy • Google Dtrace

– Find example programs

#!/usr/sbin/dtrace -s Name_of_something_to_trace / filters / { actions }

dtrace -l

dtrace –lnv prov:mod:func:name

Page 52: A22 Introduction to DTrace by Kyle Hailey

Resources • Oracle Wiki

– wikis.oracle.com/display/Dtrace

• DTrace book: – www.dtracebook.com

• Brendan Gregg’s Blog – dtrace.org/blogs/brendan/

• Oracle examples – alexanderanokhin.wordpress.com/2011/11/13 – andreynikolaev.wordpress.com/2010/10/28/ – blog.tanelpoder.com/2009/04/24