8/4/2019 Solaris10-DTraceContainers
Solaris 10 DTrace & Containers
Sang Shin, Technology Architect
Sun Microsystems, Inc.
[email protected]
http://www.javapassion.com
Agenda
Why DTrace?
What is DTrace?
How does DTrace work?
D - the language
DTrace Providers
Why Solaris Containers?
Solaris Zones
Resources
Why DTrace?
Methods of Debugging Before DTrace
Reproduce the problem outside of the production environment
> Not easy, and expensive
Take a crash dump or snapshot of the system
> Hard to debug a transient problem with snapshots
> Too much irrelevant information
Use tools like truss or pstack
> Cause too much overhead for a production system
> Per-process tools make it hard to debug systemic issues
> How do you find out who killed my process with truss?
Methods of Debugging Before DTrace
Custom instrumented application or kernel
> Too heavyweight for production
> Takes too many iterations to get to the root cause
> Huge QA cost
> Expensive production interruptions
> Very invasive
None of These Methods Can Be Used Effectively in a Production System
We need a debugging and analysis tool that we can use in a production system
> without bringing it down
> without slowing it down
Solution: A Dynamically Instrumentable System via DTrace
Used in live production systems!
> No recompilation, no separate testing system
Has enough instrumentation to permit collecting any arbitrary data
> Systemic view
Permits dynamically turning instrumentation on and off
> Minimal overhead when turned on
> No overhead when turned off
Ensures safety
> Should not crash the system it is observing
Things You Do With DTrace: Debug Transient Problems in a Production System
Who sent a kill signal to my process?
Why does thread x get preempted when it should not?
Why is my application not scaling up above 30,000 users in the production system?
Why are there so many threads in the run queue when the CPU is idle?
Where is the bottleneck in I/O?
Many previously dark corners of system and application behavior
What is DTrace?
Debugging Tool for Production Systems
What is DTrace?
Over 30K probes built into Solaris 10
# dtrace -l | wc -l
38485
Can create more probes on the fly
New, powerful, dynamically interpreted language (D) to instantiate probes
> Scripts can capture typical debugging scenarios
Probes are lightweight and low overhead
> No overhead if a probe is not enabled
Safe to use on live production systems
How Does DTrace Work?
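DTrace Architecture
[Figure: DTrace architecture. Userland: DTrace consumers (dtrace(1M), lockstat(1M), plockstat(1M), script.d) built on libdtrace(3LIB). Kernel: the DTrace framework, reached through dtrace(7D), and the DTrace providers (sysinfo, vminfo, fasttrap, sdt, syscall, fbt, proc).]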
How DTrace Works
The dtrace command compiles the D language script into compiled code (D Intermediate Format, DIF)
The compiled code is checked for safety (like Java bytecode verification)
The compiled code is executed in the kernel by DTrace
DTrace instructs the providers to enable the probes
As soon as the D program exits, all instrumentation is removed
There is no limit (except system resources) on the number of D scripts that can run simultaneously
Different users can debug the system simultaneously without causing data corruption or collision issues
Probe
Probes are points of instrumentation
Probes are made available by providers
Probes identify the module and function that they instrument
Each probe has a name
These four attributes define a tuple that uniquely identifies each probe: provider:module:function:name
Example
syscall::open:entry
Probe
Probes can be listed with the -l option to the dtrace command
> in a specific function with -f function
> in a specific module with -m module
> with a specific name with -n name
> from a specific provider with -P provider
Empty components match all possible probes
Wildcards can be used in naming probes
> syscall::open*:entry
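To make the matching rules above concrete, here is a small Python model of how a probe description might be compared against fully qualified probe names. It is a sketch under stated assumptions. The helper names (parse_probe, matches, PROBES) are hypothetical. Shell-style globbing via fnmatch only approximates D's own pattern rules. The probe list is made up rather than real `dtrace -l` output.

```python
from fnmatch import fnmatchcase


def parse_probe(desc: str) -> tuple:
    """Split "provider:module:function:name" into a 4-tuple.

    If fewer than four fields are given, they are taken as the rightmost
    fields (so "BEGIN" is a probe name); missing fields become "".
    """
    parts = desc.split(":")
    if len(parts) > 4:
        raise ValueError(f"too many fields in {desc!r}")
    return tuple([""] * (4 - len(parts)) + parts)


def matches(desc: str, probe: str) -> bool:
    """True if a fully qualified probe matches a probe description.

    Empty description fields match anything; other fields are treated
    as shell-style globs (an approximation of D's matching rules).
    """
    return all(
        want == "" or fnmatchcase(have, want)
        for want, have in zip(parse_probe(desc), parse_probe(probe))
    )


# Hypothetical probe list standing in for `dtrace -l` output.
PROBES = [
    "syscall::open:entry",
    "syscall::open64:entry",
    "syscall::open:return",
    "syscall::close:entry",
    "dtrace:::BEGIN",
]

if __name__ == "__main__":
    print([p for p in PROBES if matches("syscall::open*:entry", p)])
    print([p for p in PROBES if matches("BEGIN", p)])
```

With this model, `syscall::open*:entry` selects only the two open entry probes, and the bare description `BEGIN` selects `dtrace:::BEGIN`.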
D-Language: Quick Overview
D Language: Format
probe description
/ predicate /
{
    action statements
}
When a probe fires, its actions are executed if the predicate evaluates to true
Example: print all the system calls executed by ksh
syscalls.d
#!/usr/sbin/dtrace -s
syscall:::entry
/execname == "ksh"/
{
    printf("%s called\n", probefunc);
}
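The firing rule above (run the action only when the predicate is true) can be modeled in a few lines of Python. This is an illustrative sketch, not how DTrace executes clauses. The function run_clause and the firing records are hypothetical. The records stand in for the built-in variables (execname, probefunc) that D exposes when syscall:::entry fires.

```python
from typing import Callable, Dict, Iterable, List

Firing = Dict[str, str]  # built-in variables visible when a probe fires


def run_clause(firings: Iterable[Firing],
               predicate: Callable[[Firing], bool],
               action: Callable[[Firing], str]) -> List[str]:
    """Run one clause's action for each firing whose predicate is true."""
    return [action(f) for f in firings if predicate(f)]


# Made-up firings of syscall:::entry from two different shells.
firings = [
    {"execname": "ksh", "probefunc": "read"},
    {"execname": "bash", "probefunc": "write"},
    {"execname": "ksh", "probefunc": "close"},
]

# Models: /execname == "ksh"/ { printf("%s called\n", probefunc); }
output = run_clause(
    firings,
    predicate=lambda f: f["execname"] == "ksh",
    action=lambda f: "%s called" % f["probefunc"],
)
print("\n".join(output))
```

Only the two ksh firings produce output; the bash firing is filtered out by the predicate before the action runs.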
Example: Enable the probes BEGIN & END
# dtrace -n BEGIN -n END
dtrace: description 'BEGIN' matched 1 probe
dtrace: description 'END' matched 1 probe
CPU     ID                    FUNCTION:NAME
  0      1                           :BEGIN
^C
  0      2                             :END
#
Output
> the probes that were enabled
> the CPU on which the probe fired
> the ID of the probe
> the function & name of the fired probe
Hello World in D
#!/usr/sbin/dtrace -s
BEGIN
{
    printf("Hello World\n");
    exit(0);
}

END
{
    printf("Goodbye Cruel World\n");
}
The first line is similar to that of any shell script
Note the -s: the following lines are a script
The script in plain English
> When the BEGIN probe fires, print hello world and exit.
> exit(0) then fires the END probe, which prints goodbye.
hello.d
Predicates
A predicate is a D expression
Actions will only be executed if the predicate expression evaluates to true
A predicate takes the form /expression/ and is placed between the probe description and the action
Print the pid of every ls process that is started
#!/usr/sbin/dtrace -s
proc:::exec-success
/execname == "ls"/
{
    printf("%d\n", pid);
}
pred.d
Actions
Actions are executed when a probe fires
Actions are completely programmable
Most actions record some specified state in the system
Probes may provide parameters that can be used in the actions
D-Language: Aggregation
Aggregation
Think of a case where you want to know the total time the system spends in a function.
> We could save the time spent by the function every time it is called and then add up the total.
> If the function was called 1000 times, that is 1000 pieces of information stored in the buffer just so we can add them up at the end.
> If we instead keep a running total, only one piece of information is stored in the buffer.
> We can use the same concept when we want the average, count, min or max.
Aggregation is a D construct for this purpose.
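The running-total idea above can be sketched in Python. Each key keeps one fixed-size summary (count, sum, min, max, and therefore avg) instead of every individual sample. This is a model of the concept only. The class name RunningAgg and the sample durations are hypothetical, and the dictionary keyed by function name plays the role of an aggregation like @[probefunc].

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class RunningAgg:
    """Fixed-size running summary of a stream of values."""
    count: int = 0
    total: int = 0
    minimum: float = math.inf
    maximum: float = -math.inf

    def add(self, value: int) -> None:
        # Update the summary in place; the value itself is not stored.
        self.count += 1
        self.total += value
        self.minimum = min(self.minimum, value)
        self.maximum = max(self.maximum, value)

    @property
    def avg(self) -> float:
        return self.total / self.count if self.count else 0.0


# One summary per key, like @[probefunc]; the durations are made up.
aggs = defaultdict(RunningAgg)
for func, nsecs in [("open", 1200), ("close", 300), ("open", 800)]:
    aggs[func].add(nsecs)

for func, agg in sorted(aggs.items()):
    print(func, agg.count, agg.total, agg.avg, agg.minimum, agg.maximum)
```

Three calls are recorded, yet only two summaries exist: one per key. Storage grows with the number of distinct keys, not with the number of calls.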
Aggregation - Format
@name[keys] = aggfunc(args);
'@' - indicates that name is an aggregation (the name may be omitted, as in @[execname])
keys - comma-separated list of D expressions
aggfunc can be one of...
> sum(expr) - total value of the specified expression
> count() - number of times called
> avg(expr) - average of the expression
> min(expr)/max(expr) - minimum and maximum of the expression
> quantize(expr)/lquantize(expr, lower, upper, step) - power-of-two & linear frequency distributions
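To show which bucket a value lands in, here is a Python sketch of the two distribution functions' bucketing. It is written from their documented behavior: power-of-two buckets for quantize(), and step-sized buckets with underflow/overflow rows for lquantize(). It is not DTrace's code. The function names are hypothetical, negative values are deliberately left out of the quantize model, and the allocation sizes are made up.

```python
from collections import Counter
from typing import Union


def quantize_bucket(value: int) -> int:
    """Lower bound of the power-of-two bucket quantize() would use.

    This sketch only handles non-negative values: 0 has its own bucket,
    and any other value falls in the bucket [2**k, 2**(k+1)).
    """
    if value < 0:
        raise ValueError("negative values are not handled in this sketch")
    if value == 0:
        return 0
    return 1 << (value.bit_length() - 1)


def lquantize_bucket(value: int, lower: int, upper: int,
                     step: int) -> Union[int, str]:
    """Bucket label lquantize(value, lower, upper, step) would use."""
    if value < lower:
        return f"< {lower}"          # underflow bucket
    if value >= upper:
        return f">= {upper}"         # overflow bucket
    return lower + (value - lower) // step * step


# Made-up allocation sizes, bucketed the way quantize(arg0) would.
sizes = [2, 3, 3, 12, 20, 8192, 9000]
print(sorted(Counter(quantize_bucket(s) for s in sizes).items()))
print(lquantize_bucket(59, 0, 100, 10))
```

For example, sizes 3 and 2 share the "2" bucket, and 9000 lands in the "8192" bucket. With lquantize(x, 0, 100, 10), a priority of 59 is counted in the "50" row.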
Aggregation Example 1.
#!/usr/sbin/dtrace -s
sysinfo:::pswitch
{
    @[execname] = count();
}
aggr.d
bash-3.00$ ./aggr.d
dtrace: script './aggr.d' matched 3 probes
^C
  soffice.bin          1
  dtrace               2
  java                 4
  sched                9
Aggregation Example 2.
#!/usr/sbin/dtrace -s
pid$target:libc:malloc:entry
{
    @["Malloc Distribution"] = quantize(arg0);
}
$ ./aggr2.d -c who
dtrace: script './aggr2.d' matched 1 probe
...
dtrace: pid 6906 has exited

  Malloc Distribution
           value  ------------- Distribution ------------- count
               1 |                                         0
               2 |@@@@@@@@@@@@@@@@@                        3
               4 |                                         0
               8 |@@@@@@                                   1
              16 |@@@@@@                                   1
              32 |                                         0
              64 |                                         0
             128 |                                         0
             256 |                                         0
             512 |                                         0
            1024 |                                         0
            2048 |                                         0
            4096 |                                         0
            8192 |@@@@@@@@@@@                              2
           16384 |                                         0
aggr2.d
Calculating time spent
One of the most common requests is to find the time spent in a given function
Here is how this can be done:
#!/usr/sbin/dtrace -s
syscall::open*:entry,
syscall::close*:entry
{
    ts = timestamp;
}

syscall::open*:return,
syscall::close*:return
{
    timespent = timestamp - ts;
    printf("ThreadID %d spent %d nsecs in %s\n", tid, timespent, probefunc);
    ts = 0; /* allow DTrace to reclaim the storage */
    timespent = 0;
}
What's wrong with this??
Thread Local Variable
self->variable = expression;
> The self keyword indicates that the variable is thread-local
> A boon to multi-threaded debugging
> As the name indicates, the variable is specific to the thread.
> See the code rewritten:
#!/usr/sbin/dtrace -s
syscall::open*:entry,
syscall::close*:entry
{
    self->ts = timestamp;
}

/* The predicate skips returns whose entry this script did not see */
syscall::open*:return,
syscall::close*:return
/self->ts/
{
    /* this-> is clause-local, so it is not shared between CPUs */
    this->timespent = timestamp - self->ts;
    printf("ThreadID %d spent %d nsecs in %s\n", tid, this->timespent, probefunc);
    self->ts = 0; /* allow DTrace to reclaim the storage */
}
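Why the single global `ts` goes wrong can be replayed in Python with a made-up trace in which two threads' open() calls overlap. The sketch keeps one shared start time (like the first script) and, alternatively, one start time per thread id (like self->ts). The event tuples and function names are hypothetical illustrations, not DTrace output.

```python
from typing import List, Tuple

# (time_ns, thread_id, probe) tuples: a made-up trace in which two
# threads' open() calls overlap in time.
EVENTS = [
    (100, 1, "entry"),
    (150, 2, "entry"),
    (200, 1, "return"),
    (400, 2, "return"),
]


def durations_global(events) -> List[Tuple[int, int]]:
    """One shared start time for all threads, like a global `ts`."""
    ts = 0
    out = []
    for t, tid, probe in events:
        if probe == "entry":
            ts = t                      # overwrites another thread's start
        else:
            out.append((tid, t - ts))
            ts = 0
    return out


def durations_thread_local(events) -> List[Tuple[int, int]]:
    """One start time per thread, like self->ts."""
    ts = {}
    out = []
    for t, tid, probe in events:
        if probe == "entry":
            ts[tid] = t
        elif tid in ts:                 # skip returns whose entry was not seen
            out.append((tid, t - ts.pop(tid)))
    return out


print(durations_global(EVENTS))
print(durations_thread_local(EVENTS))
```

With the shared variable, thread 2's entry overwrites thread 1's start time. Thread 1 is then reported as 50 ns instead of 100 ns, and thread 2 as 400 ns, because `ts` was reset to 0. Keeping one value per thread gives the expected 100 ns and 250 ns.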
Built-in Variables
Here are a few built-in variables.
arg0 ... arg9 - probe arguments represented as int64_t
args[] - probe arguments represented in the correct type based on the function
cpu - current CPU id
cwd - current working directory
errno - error code from the last system call
gid, uid - real group id, user id
pid, ppid, tid - process id, parent process id & thread id
probeprov, probemod, probefunc, probename - probe info
timestamp - nanoseconds since an arbitrary point in the past
walltimestamp - nanoseconds since the epoch (00:00 UTC, January 1, 1970)
vtimestamp - nanoseconds the current thread has spent running on a CPU
External Variables
DTrace provides access to kernel & external variables.
To access the value of an external variable, prefix it with a backquote (`):
#!/usr/sbin/dtrace -qs
dtrace:::BEGIN
{
    printf("physmem is %d\n", `physmem);
    printf("maxusers is %d\n", `maxusers);
    printf("ufs:freebehind is %d\n", ufs`freebehind);
    exit(0);
}
Note: ufs`freebehind refers to the kernel variable freebehind in the ufs module
These variables cannot be used as lvalues; they cannot be modified from within a D script
ext.d
D-Language: Providers
Providers
Providers represent a methodology for instrumenting the system
Providers make probes available to the DTrace framework
DTrace informs providers when a probe is to be enabled
Providers transfer control to DTrace when an enabled probe fires
List of Providers
syscall Provider - probes at entry/return of every system call
profile Provider - probes that fire at fixed time intervals
lockstat Provider - lock contention probes
fbt Provider - function boundary tracing provider
sdt Provider - statically defined probes, user-definable probes
sysinfo Provider - probes for the kernel statistics used by mpstat and the sysinfo tools
vminfo Provider - probes for virtual memory kernel statistics
proc Provider - process/LWP creation and termination probes
sched Provider - probes for CPU scheduling
dtrace Provider - probes related to DTrace itself
List of Providers
io Provider - probes related to disk I/O
mib Provider - probes in the kernel network layer
fpuinfo Provider - probes into kernel software floating-point processing
pid Provider - probes into any function or instruction in user code
plockstat Provider - probes for user-level synchronization and lock code
profile Provider
The profile provider has probes that fire at regular intervals.
These probes are not associated with any kernel or user code execution
Format for a profile probe: profile-n
> With no suffix, the probe fires n times a second on every CPU.
> An optional suffix turns n into a time interval: ns or nsec (nanoseconds), us or usec (microseconds), ms or msec (milliseconds), s or sec (seconds), m or min (minutes), h or hour (hours), d or day (days). For example, profile-10ms fires every 10 milliseconds.
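As a worked check of the naming rule above, this Python sketch converts a profile-n or tick-n probe name into the time between firings, in nanoseconds. It is a hypothetical helper, not part of DTrace. It treats no suffix as "n per second". It also accepts an hz suffix, which is not in the list above and is included here as an assumption, with the same meaning as no suffix.

```python
import re
from fractions import Fraction

NSEC = 1
USEC = 1_000
MSEC = 1_000_000
SEC = 1_000_000_000

# Nanoseconds per unit for the suffixes listed above.
NS_PER_UNIT = {
    "ns": NSEC, "nsec": NSEC,
    "us": USEC, "usec": USEC,
    "ms": MSEC, "msec": MSEC,
    "s": SEC, "sec": SEC,
    "m": 60 * SEC, "min": 60 * SEC,
    "h": 3600 * SEC, "hour": 3600 * SEC,
    "d": 86400 * SEC, "day": 86400 * SEC,
}


def period_ns(probe: str) -> Fraction:
    """Time between firings (per CPU for profile-n) in nanoseconds."""
    m = re.fullmatch(r"(?:profile|tick)-(\d+)([a-z]*)", probe)
    if m is None:
        raise ValueError(f"not a profile/tick probe name: {probe!r}")
    n, unit = int(m.group(1)), m.group(2)
    if n == 0:
        raise ValueError("n must be positive")
    if unit in ("", "hz"):              # "hz" is an assumption here
        return Fraction(SEC, n)         # n firings per second
    if unit not in NS_PER_UNIT:
        raise ValueError(f"unknown suffix {unit!r}")
    return Fraction(n * NS_PER_UNIT[unit])  # one firing every n units


for name in ("profile-100", "profile-10ms", "profile-1001", "tick-1s"):
    print(name, float(period_ns(name)))
```

Note that profile-100 and profile-10ms describe the same 10 ms period. An odd rate such as profile-1001 (used in the priority example later) is chosen so that sampling does not fall into lockstep with activity that runs at round-number intervals.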
profile probe - examples
Prints out the frequency at which processes execute on a processor.
#!/usr/sbin/dtrace -qs
profile-100
{
    @procs[pid, execname] = count();
}
prof.d
This one tracks how the priority of a process changes over time.
#!/usr/sbin/dtrace -qs
profile-1001
/pid == $1/
{
    @proc[execname] = lquantize(curlwpsinfo->pr_pri, 0, 100, 10);
}
prio.d
Try this against a shell that is running the loop below, passing that shell's pid as the script's argument:
$ while true ; do i=0; done
tick-n probe
Very similar to the profile-n probe
The only difference is that the probe fires on only one CPU. The meaning of n is the same as for the profile-n probe.
proc Provider
The proc provider has probes for the process/LWP lifecycle
> create - fires when a process is created using fork and its variants
> exec - fires when exec and its variants are called
> exec-failure & exec-success - fire when exec fails or succeeds
> lwp-create, lwp-start, lwp-exit - LWP lifecycle probes
> signal-send, signal-handle, signal-clear - probes for various signal states
> start - fires when a process starts, before the first instruction is executed.
Examples
The following script prints every program that is executed, together with the program that executed it and a count for each pair.
#!/usr/sbin/dtrace -qs
proc:::exec
{
    self->parent = execname;
}

proc:::exec-success
/self->parent != NULL/
{
    @[self->parent, execname] = count();
    self->parent = NULL;
}

proc:::exec-failure
/self->parent != NULL/
{
    self->parent = NULL;
}

END
{
    printf("%-20s %-20s %s\n", "WHO", "WHAT", "COUNT");
    printa("%-20s %-20s %@d\n", @);
}
proc1.d
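The trick in proc1.d is that exec and exec-success fire in the same thread. The thread-local variable therefore carries the old program name across the exec, and the new name is available as execname. Below is a Python model of that bookkeeping and of the END report. The function names and the event tuples are hypothetical and the data is made up. Sorting rows by count, lowest first, is an assumption about printa's default ordering.

```python
from collections import Counter


def who_execs_what(events):
    """Count (parent program, new program) pairs from exec probe firings.

    `events` holds (tid, probe, execname) tuples; execname is the name
    the thread had when that probe fired.
    """
    parent = {}                      # per-thread, like self->parent
    counts = Counter()               # like the @ aggregation
    for tid, probe, execname in events:
        if probe == "exec":
            parent[tid] = execname   # name before the new image loads
        elif probe == "exec-success" and tid in parent:
            counts[(parent.pop(tid), execname)] += 1
        elif probe == "exec-failure":
            parent.pop(tid, None)    # forget the failed attempt
    return counts


def report(counts):
    """Format like the END clause; rows sorted by count, lowest first."""
    lines = ["%-20s %-20s %s" % ("WHO", "WHAT", "COUNT")]
    for (who, what), n in sorted(counts.items(), key=lambda kv: kv[1]):
        lines.append("%-20s %-20s %d" % (who, what, n))
    return "\n".join(lines)


# Made-up firings: ksh runs ls twice, one ksh exec fails, bash runs date.
events = [
    (10, "exec", "ksh"), (10, "exec-success", "ls"),
    (11, "exec", "ksh"), (11, "exec-failure", "ksh"),
    (12, "exec", "bash"), (12, "exec-success", "date"),
    (13, "exec", "ksh"), (13, "exec-success", "ls"),
]
print(report(who_execs_what(events)))
```

The failed exec in thread 11 is dropped, so the report lists bash running date once and ksh running ls twice.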
More Examples
The following script prints all the signals that are sent in the system, and who sent each signal to whom.
#!/usr/sbin/dtrace -qs
proc:::signal-send
{
    @[execname, stringof(args[1]->pr_fname), args[2]] = count();
}

END
{
    printf("%20s %20s %12s %s\n", "SENDER", "RECIPIENT", "SIG", "COUNT");
    printa("%20s %20s %12d %@d\n", @);
}
$ ./proc2.d
^C
              SENDER            RECIPIENT          SIG COUNT
               sched               dtrace            2 1
               sched                   ls            2 1
               sched                  ksh           18 4
               sched                  ksh            2 5
                 ksh                  ksh            2 5
                 ksh                  ksh           20 12
proc2.d
DTrace and User Processes
DTrace provides a lot of features to probe into user processes
We will look at the features in DTrace that are useful when we examine user processes
Some examples of using DTrace on user code will be discussed
The pid Provider
The pid provider is extremely flexible, allowing you to instrument any instruction in user land, including function entry and return
The pid provider creates probes on the fly when they are needed
Used for tracing
> Function boundaries
> Any arbitrary instruction in a given function
pid Function Boundary probes
The probe is constructed using the following format: pid<pid>:<module>:<function>:<name>
Examples:
pid1234:date:main:entry
pid1122:libc:open:return
Count all libc calls made by a program
#!/usr/sbin/dtrace -s
pid$target:libc::entry
{
    @[probefunc] = count();
}
pid1.d
pid Instruction Level Tracing
Function offset tracing is a very powerful mechanism.
Print the code path followed by a particular function.
#!/usr/sbin/dtrace -s
pid$1::$2:entry
{
    self->trace_code = 1;
    printf("%x %x %x %x %x\n", arg0, arg1, arg2, arg3, arg4);
}

pid$1:::
/self->trace_code/
{
}

pid$1::$2:return
/self->trace_code/
{
    exit(0);
}
Execute:
# ./trace_code.d 1218 printf
trace_code.d
Actions & Subroutines
There are a few actions and subroutines in DTrace that help us examine user-land applications
> ustack(nframes, strsize) - records the user process stack
> nframes specifies the number of frames to record
> strsize - if specified and non-zero, address-to-name translation is done when the stack is recorded, into a buffer of strsize bytes. This avoids address-to-name translation problems in user land when the process may have exited
For Java code analysis you need Java 1.5 to use this ustack() functionality (see example)
ustack
#!/usr/sbin/dtrace -s
syscall::write:entry
/pid == $target/
{
    ustack(50, 500);
}
$ ./ustk.d -c "java -version"
dtrace: script './ustk.d' matched 1 probe
java version "1.5.0_01"
Java(TM) 2 Runtime Environment, Standard Edition (build 1.5.0_01-b08)
Java HotSpot(TM) Client VM (build 1.5.0_01-b08, mixed mode, sharing)
CPU     ID                    FUNCTION:NAME
  0     13                      write:entry
libc.so.1`_write+0x8
libjvm.so`JVM_Write+0xb8
libjava.so`0xfe99f580
libjava.so`Java_java_io_FileOutputStream_writeBytes+0x3c
java/io/FileOutputStream.writeBytes
java/io/FileOutputStream.writeBytes
java/io/FileOutputStream.write
java/io/BufferedOutputStream.flushBuffer
java/io/BufferedOutputStream.flush
java/io/PrintStream.write
sun/nio/cs/StreamEncoder$CharsetSE.writeBytes
sun/nio/cs/StreamEncoder$CharsetSE.implFlushBuffer
sun/nio/cs/StreamEncoder.flushBuffer
java/io/OutputStreamWriter.flushBuffer
java/io/PrintStream.write
java/io/PrintStream.print
java/io/PrintStream.println
sun/misc/Version.print
0xf8c05764
0xf8c00218
libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x
libjvm.so`jni_CallStaticVoidMethod+0x4a8
java`main+0x824
java`_start+0x108
ustack1.d
Privilege for Running DTrace
Granting privilege to run DTrace
A system administrator can grant any user privileges to run DTrace using the Solaris Least Privilege facility, privileges(5).
DTrace provides three types of privileges:
> dtrace_proc - provides access to process-level tracing; no kernel-level tracing is allowed (the pid provider is about all such users can run)
> dtrace_user - provides access to process-level and kernel-level probes, but only for processes to which the user has access (i.e., they can use the syscall provider, but only for system calls made by processes they have access to)
> dtrace_kernel - provides all access except process-level access to user processes that they do not have access to
Enable these privileges by editing /etc/user_attr
> format: user-name::::defaultpriv=basic,privileges
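As a small illustration of the entry format above, this Python sketch builds an /etc/user_attr line that grants a chosen set of DTrace privileges on top of basic. The function user_attr_line and the user name "jdoe" are hypothetical. The sketch assumes the user has no existing user_attr entry, which would otherwise have to be merged by hand, and it only checks names against the three DTrace privileges.

```python
DTRACE_PRIVS = ("dtrace_proc", "dtrace_user", "dtrace_kernel")


def user_attr_line(user: str, privs) -> str:
    """Build an /etc/user_attr entry granting DTrace privileges.

    Follows the format user-name::::defaultpriv=basic,privileges
    """
    if not user or ":" in user:
        raise ValueError(f"bad user name: {user!r}")
    if not privs:
        raise ValueError("at least one privilege is required")
    unknown = [p for p in privs if p not in DTRACE_PRIVS]
    if unknown:
        raise ValueError(f"not DTrace privileges: {unknown}")
    return f"{user}::::defaultpriv=basic,{','.join(privs)}"


# "jdoe" is a hypothetical user.
print(user_attr_line("jdoe", ["dtrace_proc", "dtrace_user"]))
```

Granting dtrace_proc and dtrace_user together gives the user the pid provider plus kernel probes limited to their own processes, without the broader dtrace_kernel access.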
Why Solaris Containers?
Common Question From IT Managers
How can I provide predictable service levels to my end users, and ensure all systems are running cost-effectively and efficiently?
Traditional Resource Management
[Figure: one server per customer workload (customers A, B and C on web servers, D on an app server, E on a DB server), showing the utilization level of each server.]
One application per server
Size every server for the peak
Average utilization rate is 20% to 30%
S Vz G
Max Resource Utilization
Security and Fault Isolation
Easy and Flexible Service Management
S
Solaris Zones
> A virtualized operating environment
> Over a single OS instance
> Global and non-global zones
Solaris Resource Management
> Workloads, project(4)
> Resource pools
[Figure: comparison of virtualization and consolidation approaches: naked servers, domains, LPARs, VMware/IBM VM, Jail/Vservers, and Solaris Zones + Resource Management (Zones+RM).]
Solaris Zones
S Z
Virtualized Platform
> Security
> File systems
> Network interfaces
> Devices
> Resource management controls
Application Environment
> Processes
> IPC objects
> Identity: node name, time zone, IP address, RPC domain, locale, NIS, LDAP, etc.
Solaris Zones
[Figure: Solaris Zones example. A global zone (serviceprovider.com) hosts three non-global zones on one system:
> blue zone (blueslugs.com, zone root /zone/blueslugs): web services (Apache 1.3.22, J2SE) and enterprise services (Oracle 8i, IAS 6)
> foo zone (foo.net, zone root /zone/foonet): network services (BIND 8.3, sendmail) and login services (SSH sshd)
> beck zone (beck.org, zone root /zone/beck): web services (Apache 2.0) and network services (BIND 9.2, sendmail)
Each zone runs its own core services (e.g. inetd, rpcbind, ypbind, automountd, ldap_cachemgr), has its own zoneadmd and zcons console, sees /usr, and uses virtual interfaces (hme0:1, hme0:2, ce0:1, ce0:2) on the shared network devices hme0 and ce0. The global zone provides core services (inetd, rpcbind, ypbind, automountd, snmpd, dtlogin, sendmail, sshd, ...), remote admin/monitoring (SNMP, SunMC, WBEM), platform administration (syseventd, devfsadm, ...) and zone management (zonecfg(1M), zoneadm(1M), zlogin(1), ...) over the shared storage complex. Layers: Application Environment on top of Virtual Platform.]
Z S Ow
Each zone has a security boundary around it
Zones run with reduced privileges
> A zone is not able to escalate its privileges
Important namespaces are isolated
Processes running in a zone (even as root) are not able to affect activity in other zones
By default, cross-zone communication is via the network only
DTrace & Solaris Containers Resources
DTrace Resources
Solaris DTrace Guide: http://docs.sun.com/db/doc/817-6223
BigAdmin DTrace web page: http://www.sun.com/bigadmin/content/dtrace/
Solaris DTrace Webcast: http://www.snpnet.com/sun_DTrace/dtrace_flash.html
OpenSolaris DTrace community page: http://www.opensolaris.org/os/community/dtrace/
The DTrace toolkit contains a lot of very useful scripts: http://www.opensolaris.org/os/community/dtrace/dtraceto
Containers Resources
BigAdmin Solaris Containers and Zones web page: http://www.sun.com/bigadmin/content/zones/
Solaris Containers Webcast: http://www.snpnet.com/clients/sun/containers06092005/solaris.html
OpenSolaris Zones and Containers FAQ: http://opensolaris.org/os/community/zones/faq/