DTrace in the Non-global Zone

Bryan Cantrill, SVP Engineering, Joyent, @bcantrill


My presentation at the BayLISA SmartOS meetup on August 16th, 2012. More details at http://dtrace.org/blogs/bmc/2012/06/07/dtrace-in-the-zone/.

Transcript of DTrace in the Non-global Zone

Page 1: DTrace in the Non-global Zone

DTrace in the Non-global Zone

Bryan Cantrill
SVP Engineering, Joyent
@bcantrill

Page 2: DTrace in the Non-global Zone

DTrace and zones: Fraternal twins

• DTrace and zones were developed in parallel during development of Solaris 10

• DTrace integrated (September 2003) before zones (early 2004)

• When zones integrated, the priority was making DTrace in the global zone be able to meaningfully instrument non-global zones

• DTrace in the non-global zone was hard — and a lower priority than other work on both technologies

Page 3: DTrace in the Non-global Zone

DTrace and zones: Basic functionality

• In 2006, Dan Price (with help from Adam Leventhal and Jonathan Adams) added initial support for DTrace in the non-global zone

• Allowed use of syscall provider, pid provider and (in a deranged, broken way) the profile provider

• This was significant work: required modifications to both the zones privilege model and the DTrace privilege model

• For example, required an implicit predicate on syscall and profile probes
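
The implicit predicate mentioned above can be pictured as a zone check that runs before any user-specified predicate. What follows is a minimal sketch in C, not the DTrace kernel code: `sketch_consumer_t`, `sc_all_zones` and `probe_should_fire()` are hypothetical names invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical consumer state: who enabled the probe, and with what reach. */
typedef struct sketch_consumer {
    int sc_zoneid;      /* zone of the process that enabled the probe */
    bool sc_all_zones;  /* assumed true only for a privileged global-zone user */
} sketch_consumer_t;

/*
 * Implicit predicate: a consumer without all-zones reach sees a probe
 * firing only if the thread that hit the probe runs in the consumer's zone.
 */
static bool
probe_should_fire(const sketch_consumer_t *c, int firing_zoneid)
{
    assert(c != NULL);
    if (c->sc_all_zones)
        return (true);
    return (c->sc_zoneid == firing_zoneid);
}
```

Under these assumptions, a consumer in zone 7 would see a syscall probe fire for a thread in zone 7 but not for a thread in zone 8 or in the global zone.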

Page 4: DTrace in the Non-global Zone

DTrace and zones in SmartOS

• As the world's heaviest user of zones, we at Joyent ran into (and fixed) a number of annoying bugs:

• USDT probes from the non-global zone were not properly being enabled in the global zone (illumos#908)

• Tick and profile probes did not properly fire when used in the non-global zone (illumos#1456)

• Fixing the latter required an extension of the DTrace privilege model: introduced a notion of restricted operation in which args could not be referenced

Page 5: DTrace in the Non-global Zone

DTrace and zones in SmartOS

• Other (very) annoying issues still lurked:

• Inability to read “cpu” in the non-global zone

• Inability to read any fields from “curlwpsinfo” and “curpsinfo”— especially “pr_dmodel”

• Inability to read the “fds[]” array

• Failure mode highly obnoxious:

    [my-non-global-zone ~]# dtrace -n BEGIN'{trace(curpsinfo->pr_psargs)}'
    dtrace: description 'BEGIN' matched 1 probe
    dtrace: error on enabled probe ID 1 (ID 1: dtrace:::BEGIN): invalid kernel access in action #1 at DIF offset 44

Page 6: DTrace in the Non-global Zone

Divide and conquer

• curlwpsinfo and curpsinfo are translators over the current thread (“kthread_t”) and the current process (“proc_t”), respectively

• Importantly, the state contained in one's own kthread_t and proc_t:

• Is safe to read while executing (threads cannot disappear out from under themselves)

• Does not represent potential privilege escalation

• This can be fixed by simply allowing these loads whenever one has privileges over the current process!
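
The fix described on this slide amounts to a range check on each load. The following C sketch assumes hypothetical helpers (`in_range()`, `can_load()`) and passes the thread and process structures in explicitly; the real check lives inside the DTrace kernel module and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True when [addr, addr + sz) lies entirely inside [base, base + basesz). */
static bool
in_range(uintptr_t addr, size_t sz, uintptr_t base, size_t basesz)
{
    return (addr >= base && sz <= basesz && addr - base <= basesz - sz);
}

/*
 * Hypothetical load check: with privileges over the current process, a
 * load is permitted only if it falls wholly within the current thread
 * structure or the current process structure.
 */
static bool
can_load(uintptr_t addr, size_t sz, bool has_proc_priv,
    const void *curthr, size_t thrsz, const void *curproc, size_t procsz)
{
    assert(curthr != NULL && curproc != NULL);
    if (!has_proc_priv)
        return (false);
    return (in_range(addr, sz, (uintptr_t)curthr, thrsz) ||
        in_range(addr, sz, (uintptr_t)curproc, procsz));
}
```

The subtraction-based comparison in `in_range()` avoids the overflow that `addr + sz` could produce near the top of the address space.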

Page 7: DTrace in the Non-global Zone

fds[]: A magic bullet?

• Somehow, I convinced myself that the problem with fds[] was the translator that translates the member accesses into kernel accesses:

    inline fileinfo_t fds[int fd] = xlate <fileinfo_t> (
        fd >= 0 && fd < curthread->t_procp->p_user.u_finfo.fi_nfiles ?
        curthread->t_procp->p_user.u_finfo.fi_list[fd].uf_file : NULL);

• If the problem was the static translators, the solution must be dynamic translators — a(n in)famously unimplemented feature of DTrace!

• After dtrace.conf(12), I realized that the expression was orthogonal to the fact that the in-kernel implementation must not allow privilege escalation

Page 8: DTrace in the Non-global Zone

fds[]: No magic bullets

• Focusing on the implementation allows one to consider the specifics of the fds[] case

• Helped by the fact that the fi_list implementation uses memory retiring for scalability of file descriptor lookups: the array is only freed upon process exit

• Assures that one's own fi_list is always pointing to memory that is (or was) an array of uf_entry_t

• Leaves the file_t itself, which can be freed during probe context (specifically, by another thread in the same process)
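
Memory retiring, as described above, means that growing the descriptor array never frees the old array; it is set aside until the process exits. The C sketch below illustrates the idea with hypothetical stand-ins (`sketch_entry_t` for uf_entry_t, `sketch_table_t` for the per-process table); it is not the illumos implementation.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for uf_entry_t and the per-process table. */
typedef struct sketch_entry { void *se_file; } sketch_entry_t;

typedef struct sketch_retired {
    sketch_entry_t *sr_list;
    struct sketch_retired *sr_next;
} sketch_retired_t;

typedef struct sketch_table {
    sketch_entry_t *st_list;        /* current array */
    int st_nfiles;                  /* number of entries in st_list */
    sketch_retired_t *st_retired;   /* old arrays, kept until exit */
} sketch_table_t;

/* Grow the table: the old array is retired, not freed. Returns 0 on success. */
static int
table_grow(sketch_table_t *t, int newn)
{
    sketch_entry_t *n;
    sketch_retired_t *r;

    assert(newn > t->st_nfiles);
    n = calloc((size_t)newn, sizeof (*n));
    r = malloc(sizeof (*r));
    if (n == NULL || r == NULL) {
        free(n);
        free(r);
        return (-1);
    }
    if (t->st_list != NULL)
        memcpy(n, t->st_list, (size_t)t->st_nfiles * sizeof (*n));
    r->sr_list = t->st_list;    /* a stale reader can still dereference this */
    r->sr_next = t->st_retired;
    t->st_retired = r;
    t->st_list = n;
    t->st_nfiles = newn;
    return (0);
}

/* Process exit: only now are the current and retired arrays freed. */
static void
table_destroy(sketch_table_t *t)
{
    sketch_retired_t *r, *next;

    for (r = t->st_retired; r != NULL; r = next) {
        next = r->sr_next;
        free(r->sr_list);
        free(r);
    }
    free(t->st_list);
    memset(t, 0, sizeof (*t));
}
```

A reader holding a pointer to the old array after `table_grow()` still sees valid entries, which is the property that makes the fi_list load safe from probe context. The objects those entries point to are a separate matter.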

Page 9: DTrace in the Non-global Zone

Dealing with file_t

• We can deal with this by forcing everyone out of probe context after a file_t has been removed from the uf_entry_t, but before being freed

• This is done by issuing a dtrace_sync() — a synchronous (empty) cross-call to all CPUs

• This is expensive, and required answering an important question: just how hot is the closef() path, anyway?

• By instrumenting our guinea pigs, er, production cloud, we could answer this concisely: closef() is pretty damned hot (> 5,000/second on some machines!)
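
The ordering that makes the free safe can be sketched as three steps: unpublish the pointer, wait for probe context to drain, then free. In this C sketch `sketch_sync()` is only a counting stand-in for dtrace_sync() (the real routine cross-calls every CPU and waits), and `sketch_slot_t` and `sketch_close()` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

typedef struct sketch_file { int sf_flags; } sketch_file_t;
typedef struct sketch_slot { sketch_file_t *ss_file; } sketch_slot_t;

static int sketch_sync_calls;   /* counts simulated cross-calls */

/* Stand-in for dtrace_sync(); here it only records that it was called. */
static void
sketch_sync(void)
{
    sketch_sync_calls++;
}

/*
 * Close ordering:
 *   1. remove the pointer from the slot so no new probe firing finds it,
 *   2. wait for any probe already running to leave probe context,
 *   3. free the object.
 */
static void
sketch_close(sketch_slot_t *slot)
{
    sketch_file_t *fp;

    assert(slot != NULL);
    fp = slot->ss_file;
    if (fp == NULL)
        return;
    slot->ss_file = NULL;   /* step 1 */
    sketch_sync();          /* step 2 */
    free(fp);               /* step 3 */
}
```

Every close pays for step 2 in this version, which is why the frequency of closef() matters so much.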

Page 10: DTrace in the Non-global Zone

Adding getf()

• To track when fds[] was active in the non-global zone, we added a getf() subroutine (ht: ken)

• Allows us to issue the sync only when we have a closef() from a non-global zone using fds[]

• Had to take the final step of cleaning up the path output to strip off the zone path from the file name (as a cleanliness issue, not a security issue)

• De-mo, de-mo, de-mo!
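
The two ideas on this slide, gating the sync on getf() use and stripping the zone path, can be sketched in C as follows. All names here are hypothetical (`sketch_zone_t`, `note_getf()`, `close_needs_sync()`, `strip_zone_root()`), the zone root `/zones/z1/root` is a made-up example, and tracking the flag per zone rather than globally is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical per-zone state. */
typedef struct sketch_zone {
    bool sz_global;         /* is this the global zone? */
    bool sz_getf_used;      /* set the first time getf() runs in this zone */
    const char *sz_root;    /* zone root path; "" for the global zone */
} sketch_zone_t;

/* Called from the getf() subroutine: remember that fds[] is in use. */
static void
note_getf(sketch_zone_t *z)
{
    assert(z != NULL);
    z->sz_getf_used = true;
}

/* A close pays for the cross-call only in a non-global zone using fds[]. */
static bool
close_needs_sync(const sketch_zone_t *z)
{
    return (!z->sz_global && z->sz_getf_used);
}

/*
 * Strip the zone root so the zone sees "/etc/passwd" rather than
 * "/zones/z1/root/etc/passwd". Paths outside the root are unchanged.
 */
static const char *
strip_zone_root(const sketch_zone_t *z, const char *path)
{
    size_t len = strlen(z->sz_root);

    if (len > 0 && strncmp(path, z->sz_root, len) == 0 && path[len] == '/')
        return (path + len);
    return (path);
}
```

The check for a `/` immediately after the prefix keeps a sibling path such as `/zones/z1/rootfoo` from being mistaken for a path under the zone root.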

Page 11: DTrace in the Non-global Zone

sched and proc providers

• With fds[] done, focus turned to the only remaining meaningful impediment to DTrace in the non-global zone: enabling the sched and proc providers

• Recall the restricted operation introduced for the profile provider in the non-global zone...

• Used this to have limited (non-global) DTrace privileges imply restricted operation for some SDT providers

• Thanks to the curlwpsinfo/curpsinfo work, these providers can be meaningfully used without access to arguments
