A22 Introduction to DTrace by Kyle Hailey
-
Upload
insight-technology-inc -
Category
Technology
-
view
1.060 -
download
2
Transcript of A22 Introduction to DTrace by Kyle Hailey
DTrace Introduction Kyle Hailey
Agenda 1. Intro … Me … Delphix 2. What is DTrace 3. Why DTrace
– Make the Impossible be possible – Low overhead
4. Where DTrace can be used 5. How DTrace is used
– Probes – Overhead – Variables – Resources
Kyle Hailey • OEM 10g Performance Monitoring • Visual SQL Tuning (VST) in DB Optimizer
• Delphix
Delphix
25 TB
2 TB
What is DTrace • Way of tracing O/S and Programs
– Making the impossible possible
• Your code unchanged – Optional add static DTrace probes
• No overhead when off – Turning on dynamically changes code path
• Low overhead when on – 1000s of events per second cause less 1% overhead
• Event Driven – Like event 10046, 10053
Shouting at Disks
Where can we trace • Solaris • OpenSolaris • FreeBSD … • MacOS • Linux – announced from Oracle • AIX – working “probevue”
What can we trace? Almost anything
– All system calls “read” – All kernel calls “biodone” – All function calls in a program – All DTrace stable providers
• Example : io:::start • Predefined stable probes • Non-stable Probe names and arguments can change
over time – Custom probes
• Write custom probes in programs to trace
Structure
$ cat mydtrace.d #!/usr/sbin/dtrace -s
Name_of_something_to_trace / filters / { actions }
# additional tracing Something_else_to_trace /optional filters / { take some actions }
(called a probe)
Section1 : •Probe •Filter •Clause
Section 2
Event Driven • DTrace Code run when probes fire in OS
/usr/sbin/dtrace -n ' #pragma D option quiet io:::start { printf(" timestamp %d ¥n",timestamp); }'
• Program runs until canceled $ sudo ./mydtrace.d timestamp 8135515300287183
timestamp 8135515300328512
timestamp 8135515300346769
^C
Probe (multi-threaded, process) when this happens then:
Take action Print variable
What are these What are these probes and variables:?
io:::start { printf(" timestamp %d ¥n",timestamp); }'
– Probes • kernel and system calls • program function calls • predefined by DTrace
– Variables • Variables are either predefined in DTrace like timestamp • defined by user
Probe
Variable
How to list Probes? Two ways to list probes 1. All System and kernel calls
dtrace –l
2. All Process functions dtrace –l pid[pid]
Output will have 4 part name, colon separated Provider:module:function:name
Kernel vs User Space
dtrace –l Kernel Functions
dtrace –l System Calls
User Processes
899 731 21
$ dtrace –l pid21
User Land
$ dtrace –l
dtrace -l
$ sudo dtrace –l
ID PROVIDER MODULE FUNCTION NAME
1 dtrace BEGIN
2 dtrace END
3 dtrace ERROR
16 profile tick-1sec
17 fbt klmops lm_find_sysid entry
18 fbt klmops lm_find_sysid return
19 fbt klmops gister_share_locally entry
…
Thousands of lines .
Provider Module Function Name
dtrace –l : grouping probes
Provider:module:function:name $ sudo dtrace -l | awk '{print $2 }' | sort | uniq -c | sort -nr
Count provider area 72095 fbt – kernel functions 1283 sdt - system calls 629 mib - system statistics 473 hotspot_jni, hotspot – JVM 466 syscall – system calls 173 nfsv4,nfsv3,tcp,udp,ip – network 61 sysinfo – kernel statistics 55 sched – CPU, io, scheduling 46 fsinfo - file system info 41 vminfo - memory 40 iscsi,fc - iscsi,fibre channel 22 lockstat - locks 15 proc - fork, exit , create 14 profile - timers tick 12 io - io:::start, done 3 dtrace - BEGIN, END, ERROR
Providers:defined interfaces Instead of tracing a kernel function, which could change between O/S
versions, trace a maintained, stable probe
https://wikis.oracle.com/display/DTrace/Providers – I/O io Provider – CPU sched Provider – system calls syscall Provider – memory vminfo Provider – user processes pid Provider – network tcp Provider
Provider definition files in /usr/lib/dtrace, such as io.d, nfs.d, sched.d, tcp.d
Example Network: TCP What if we wanted to look for TCP transmissions for receive ?
Probes have 4 part name Provider:module:function:name
$ dtrace –l | grep tcp | grep receive tcp:ip:tcp_input_data:receive
Or look at wiki https://wikis.oracle.com/display/DTrace/tcp+Provider
Probe arguments: dtrace –lnv What are the arguments for the probe function “tcp:ip:tcp_input_data:receive”
$ dtrace -lvn tcp:ip:tcp_input_data:receive ID PROVIDER MODULE FUNCTION NAME 7301 tcp ip tcp_input_data receive
Argument Types args[0]: pktinfo_t * args[1]: csinfo_t * args[2]: ipinfo_t * args[3]: tcpsinfo_t * args[4]: tcpinfo_t *
What is “tcpsinfo_t ” for example ?
Probe Argument definitions Find out what “tcpsinfo_t ” is
Two ways: 1. Stable Provider
– https://wikis.oracle.com/display/DTrace/Providers – In our case there is a TCP stable provider
https://wikis.oracle.com/display/DTrace/tcp+Provider
2. Look at source code – For OpenSolaris see: http://scr.illumos.org – Otherwise get a copy of the source
• Load into Eclipse or similar for easy search
Let’s look up “tcpsinfo_t ”
src.illumos.org
example string tcps_raddr = Remote machines IP address
tcpsinfo_t - points to many things
Creating a Program • Find out all the machines we are receiving TCP packets from
$ sudo ./tcpreceive.d address 127.0.0.1 address 172.16.103.58 address 127.0.0.1 address 172.16.100.187 address 172.16.103.58 address 127.0.0.1 ^C
$ cat tcpreceive.d #!/usr/sbin/dtrace -s #pragma D option quiet tcp:ip:tcp_input_data:receive { printf(" address %s ¥n", args[3]->tcps_raddr ); }
args[3]: tcpsinfo_t *
When TCP receive Print remote address
probe action
Using for TCP Window sizes
ip usend ssz send recd 172.16.103.58 564 16028 564 ¥ 172.16.103.58 696 16208 132 ¥ 172.16.103.58 1180 16208 484 ¥ 172.16.103.58 1664 16208 484 ¥ 172.16.103.58 2148 16208 484 ¥ 172.16.103.58 2148 16208 / 0 172.16.103.58 1452 16208 / 0
Remote Machine
Unacknowledged Bytes Sent
Send Window Bytes
Send Bytes
Receive Bytes
If unacknowleged bytes sent goes above send window then transmissions will be delayed
Review so far • DTrace – trace O/S and user programs • Solaris and partially on Linux among others • Code is event driven, structure
– probe – Include optional filter – Action
• Get all event’s with “dtrace –l” • Get event arguments with “dtrace –lnv probe” • Get argument definitions in source or wiki
Variables 1. Globals
• Not thread save X=1; A[1]=1;
2. Aggregates • Thread safe scalars and arrays • Special operations, Count, average, quantize
@ct = count() ; @sm = sum(value); @sm[type]=sum(value); @agg = quantize(value);
3. Self-> var • Thread variable, self->x = value;
4. This->var • Light weight variable for only this probe firing • this->x = value;
Variables: Aggregates are best
dtrace.org/blogs/brendan/2011/11/25/dtrace-variable-types/
What is an aggregate? • Multi CPU safe variable • Light weight • Array or scalar • Denoted by @
– @var= function(value); – @var[array_indice]=function(value);
• Functions pre-defined only, such as – sum() – count() – max() – quantize()***
• Print out with “printa”
Using Aggregates: count()
syscall::write:entry { @counts[execname] = count(); } expr 72 sh 291 tee 814 make.bin 2010
https://wikis.oracle.com/display/DTrace/Aggregations
Count of occurrences doing writes execname = session
What program writes the most often?
$ sudo dtrace -ln io::: ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done 6282 io genunix biowait wait-done 6283 io genunix biowait wait-start 7868 io nfs nfs_bio done 7871 io nfs nfs_bio start
Aggregate: quantize()
Alternately Limit output to specific probes with “-ln” flag:
Get distribution of all I/O sizes
$ sudo dtrace -l | grep io
If the following returns too many rows
Aggregate : quantize() What if we wanted a distribution of all I/O sizes?
$ sudo dtrace -ln io::: ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done 6282 io genunix biowait wait-done 6283 io genunix biowait wait-start 7868 io nfs nfs_bio done 7871 io nfs nfs_bio start
NFS module
bio = block I/O
$ sudo dtrace -lvn io:genunix:biodone:done ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done Argument Types args[0]: bufinfo_t * args[1]: devinfo_t * args[2]: fileinfo_t
What is bufinfo_t? Sounds like Buffer information
Finding what bufinfo_t points to
bufinfo_t arguments $ sudo dtrace -lvn io:genunix:biodone:done
ID PROVIDER MODULE FUNCTION NAME 6281 io genunix biodone done Argument Types args[0]: bufinfo_t * args[1]: devinfo_t * args[2]: fileinfo_t
args[0] = bufinfo_t * bufinfo_t -> b_bcount= number of bytes Use in Dtrace args[0]->b_bcount
Aggregate Example: iosizes.d
$ sudo iosizes.d value --- Distribution -- count 256 | 0 512 |@@@@ 6 1024 |@@@@ 6 2048 |@@@@@@@@@@@@@@@@@@ 31 4096 |@@@ 5 8192 |@@@@@ 9 16384 |@@@@ 6 32768 | 0 65536 | 0 ^C
#!/usr/sbin/dtrace -s #pragma D option quiet io:::done
{ @sizes = quantize(args[0]->b_bcount); } Size of the I/O
Aggregate : iosizes.d with execname
$ sudo iosizes.d sched value --- Distribution -- count 256 | 0 512 |@@@@ 6 1024 |@@@@ 6 2048 |@@@@@@@@@@@@@@@@@@ 31 4096 |@@@ 5 8192 |@@@@@ 9 16384 |@@@@ 6 32768 | 0 ^C
#!/usr/sbin/dtrace -s #pragma D option quiet io:::done { @sizes[execname] = quantize(args[0]->b_bcount); }
Size of the I/O
Only returns I/O for sched Why?
Kernel land I/O
Kernel vs User Space
dtrace –l Kernel Functions
dtrace –l System Calls
899 731 21
User Land
I/O is in kernel done by sched
User programs make a system call “read”
• I/O is done by the kernel so only see “sched” • User I/O is done via a system call to kernel
io:::start : kernel, look for user syscall
• Look for the read system call $ sudo dtrace -l | grep syscall | grep read
5425 syscall read entry 5426 syscall read return
$ sudo dtrace -lvn syscall::read:entry ID PROVIDER MODULE FUNCTION NAME 5425 syscall read entry Argument Types None
User program system call “read”
Arg0 = fd Arg1 = *buf Arg2 = size Instead of args[2]->size Use arg2
$ sudo dtrace -lvn syscall::read:entry Argument Types None
Aggregate Example: readsizes.d
java value ------------- Distribution ------------- count 4096 | 0 8192 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 2 16384 | 0 cat value ------------- Distribution ------------- count 16384 | 0 32768 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 1 65536 | 0 sshd value ------------- Distribution ------------- count 8192 | 0 16384 |@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@ 931 32768 | 0
#!/usr/sbin/dtrace -s #pragma D option quiet syscall::read:entry { @read_sizes[execname] = quantize(arg2); }
Size of the I/O
User land I/O
Built in variables • pid – process id • tid – thread id • execname • timestamp – nano-seconds • cwd – current working directory • Probes:
– probeprov – probemod – probefunc – probename
Built in variable examples
# cat exec.d #!/usr/sbin/dtrace -s syscall:::entry { @num[execname, probefunc] = count(); } dtrace:::END { printa(" %-32s %-32s %@8d¥n", @num);} # ./syscall.d dtrace: script './exec.d' matched 236 probes sleep stat64 32 vmtoolsd pollsys 37 java pollsys 72 java lwp_cond_wait 180
Program name
Function executing Records function That fires
No function name = Wild card, all matches
Execname function count
Latency Latency crucial to performance analysis.
Latency = delta = end_time – start_time
Dtrace probes have • Entry, exit • Start , done Take time at beginning and time at end and take
Latency: how long does I/O take? Latency = delta = end_time – start_time
– start_time io:::start – end_time io:::done
Array to hold each I/O start time:
• Array needs a unique key for each I/O • Key could be based on
– device = args[0]->b_edev – block = args[0]->b_blkno
Array: tm_start[device,block]=timestamp
Look these up in source
Latency
#!/usr/sbin/dtrace -s #pragma D option quiet io:::start /* device block number */ { tm_start[ args[0]->b_edev, args[0]->b_blkno] = timestamp; } io:::done / tm_start[ args[0]->b_edev, args[0]->b_blkno] / { this->delta = (timestamp - tm_start[args[0]->b_edev,args[0]->b_blkno] ); @io = quantize(this->delta); tm_start[ args[0]->b_edev, args[0]->b_blkno] = 0; }
comment
start
end
Array index
quantize Clear Timestamp Array entry
filter
Output array
Timestamp array
Nano-second
Other ways of keying start/end
1. We used a global array – tm_start[device,block]=timestamp – Probably best general way
2. Some people use arg0
– tm_start[arg0]=timestamp – Not as clear that this is valid
3. Others use
– self->start = timestamp; – This only works if the same thread that does the begin
probe is the same the does the end probe • Doesn’t work for io:::start , io:::done • Does work for nfs:::start , nfs:::done
Tracing vs Profiling Tracing • Programs run until ^C • Can print every probe • At ^C all unprinted variables are printed Profiling • Take action every X seconds • Special probe name
profile:::tick-1sec
Can profile at hz or ns, us, ms, sec
profile:::tick-1 profile:::tick-1ms
Hz ms
Latency: output every second
#!/usr/sbin/dtrace -s #pragma D option quiet io:::start /* device block number */ { tm_start[ args[0]->b_edev, args[0]->b_blkno] = timestamp; } io:::done / tm_start[ args[0]->b_edev, args[0]->b_blkno] / { this->delta = (timestamp - tm_start[args[0]->b_edev,args[0]->b_blkno] ); @io = quantize(this->delta); tm_start[ args[0]->b_edev, args[0]->b_blkno] = 0; } profile:::tick-1sec { printa(@io); trunc(@io); }
start
end
Every second
clear print quantize clear
User Process Tracing
dtrace –l Kernel Functions
dtrace –l System Calls
User Processes
899 731 21
$ dtrace –l pid21
User Land
Tracing User Processes • What can you trace in Oracle
– $ ps –ef | grep oracle – Get a process id – $ dtrace –l pid[process_id] – Lists program functions
• What do these functions do? – Source code for Mysql – Guess if you are on Oracle – Some good blogs out there
Overhead User process tracing (from Brendan Gregg ) • Don't worry too much about pid provider probe cost at < 1000 events/sec. • At > 10,000 events/sec, pid provider probe cost will be noticeable. • At > 100,000 events/sec, pid provider probe cost may be painful. User process probes 2-15us typical, could be slower
Kernel and system calls are cheaper to trace • > 1,000,000 20% impact
For non CPU work loads impact may be greater • TCP tests showed 50% throughput drop at 160K events/sec
– 40K interupts/sec
Formatting data Problem : Formating data difficult in Dtrace DTrace has printf and printa (for arrays) but …
• No floating point • No “if-then-else” , no “for-loop”
– type = probename == "op-write-done" ? "W" : "R";
• No way to access index of an aggregate array (ex sum of time by sum of counts)
Solution: do formatting and calculations in perl
dtrace -n ‘ … ‘ | perl –e ‘ … ‘
Summary • Stucture
• List of Probes
• Arguments to probes
• Look up args in source code http://scr.illumos.org • Use Aggregates @ – they make DTrace easy • Google Dtrace
– Find example programs
#!/usr/sbin/dtrace -s Name_of_something_to_trace / filters / { actions }
dtrace -l
dtrace –lnv prov:mod:func:name
Resources • Oracle Wiki
– wikis.oracle.com/display/Dtrace
• DTrace book: – www.dtracebook.com
• Brendan Gregg’s Blog – dtrace.org/blogs/brendan/
• Oracle examples – alexanderanokhin.wordpress.com/2011/11/13 – andreynikolaev.wordpress.com/2010/10/28/ – blog.tanelpoder.com/2009/04/24