kAFL: Hardware-Assisted Feedback Fuzzing for OS...

17
This paper is included in the Proceedings of the 26th USENIX Security Symposium August 16–18, 2017 • Vancouver, BC, Canada ISBN 978-1-931971-40-9 Open access to the Proceedings of the 26th USENIX Security Symposium is sponsored by USENIX kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels Sergej Schumilo, Cornelius Aschermann, and Robert Gawlik, Ruhr-Universität Bochum; Sebastian Schinzel, Münster University of Applied Sciences; Thorsten Holz, Ruhr-Universität Bochum https://www.usenix.org/conference/usenixsecurity17/technical-sessions/presentation/schumilo

Transcript of kAFL: Hardware-Assisted Feedback Fuzzing for OS...

Page 1: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

This paper is included in the Proceedings of the 26th USENIX Security SymposiumAugust 16ndash18 2017 bull Vancouver BC Canada

ISBN 978-1-931971-40-9

Open access to the Proceedings of the 26th USENIX Security Symposium

is sponsored by USENIX

kAFL Hardware-Assisted Feedback Fuzzing for OS Kernels

Sergej Schumilo Cornelius Aschermann and Robert Gawlik Ruhr-Universitaumlt Bochum Sebastian Schinzel Muumlnster University of Applied Sciences

Thorsten Holz Ruhr-Universitaumlt Bochum

httpswwwusenixorgconferenceusenixsecurity17technical-sessionspresentationschumilo

kAFL Hardware-Assisted Feedback Fuzzing for OS Kernels

Sergej SchumiloRuhr-Universitaumlt Bochum

Cornelius AschermannRuhr-Universitaumlt Bochum

Robert GawlikRuhr-Universitaumlt Bochum

Sebastian SchinzelMuumlnster University of Applied Sciences

Thorsten HolzRuhr-Universitaumlt Bochum

AbstractMany kinds of memory safety vulnerabilities have

been endangering software systems for decadesAmongst other approaches fuzzing is a promisingtechnique to unveil various software faults Recentlyfeedback-guided fuzzing demonstrated its power pro-ducing a steady stream of security-critical software bugsMost fuzzing effortsmdashespecially feedback fuzzingmdasharelimited to user space components of an operating system(OS) although bugs in kernel components are moresevere because they allow an attacker to gain accessto a system with full privileges Unfortunately kernelcomponents are difficult to fuzz as feedback mechanisms(ie guided code coverage) cannot be easily appliedAdditionally non-determinism due to interrupts kernelthreads statefulness and similar mechanisms posesproblems Furthermore if a process fuzzes its ownkernel a kernel crash highly impacts the performance ofthe fuzzer as the OS needs to reboot

In this paper we approach the problem of coverage-guided kernel fuzzing in an OS-independent andhardware-assisted way We utilize a hypervisor and In-telrsquos Processor Trace (PT) technology This allows usto remain independent of the target OS as we just re-quire a small user space component that interacts withthe targeted OS As a result our approach introducesalmost no performance overhead even in cases wherethe OS crashes and performs up to 17000 executionsper second on an off-the-shelf laptop We developed aframework called kernel-AFL (kAFL) to assess the secu-rity of Linux macOS and Windows kernel componentsAmong many crashes we uncovered several flaws in theext4 driver for Linux the HFS and APFS file system ofmacOS and the NTFS driver of Windows

1 Introduction

Several vulnerability classes such as memory corrup-tions race-conditional memory accesses and use-after-

free vulnerabilities are known threats for programs run-ning in user mode as well as for the operating system(OS) core itself Past experience has shown that attack-ers typically focus on user mode applications This islikely because vulnerabilities in user mode programs arenotoriously easier and more reliable to exploit How-ever with the appearance of different kinds of exploitdefense mechanisms ndash especially in user mode it hasbecome much harder nowadays to exploit known vul-nerabilities Due to those advanced defense mechanismsin user mode the kernel has become even more appeal-ing to an attacker since most kernel defense mechanismsare not widely deployed in practice This is due to morecomplex implementations which may affect the systemperformance Furthermore some of them are not partof the official mainline code base or even require sup-port for the latest CPU extension (eg SMAP SMEPon x86-64) Additionally when compromising the OSan attacker typically gains full access to the system re-sources (except for virtualized systems) Kernel-levelvulnerabilities are usually used for privilege escalationor to gain persistence for kernel-based rootkits

For a long time fuzzing has been a critical compo-nent in testing and establishing the quality of softwareHowever with the development of American Fuzzy Lop(AFL) smarter fuzzers have gained significant tractionin the industry [1] as well as in research [8 14 35 37]This trend was further amplified by Googlersquos OSS Fuzzproject that successfully foundmdashand continues to findmdasha significant number of critical bugs in highly security-relevant software Finally DARPArsquos Cyber Grand Chal-lenge showed that fuzzing remains highly relevant forthe state-of-the-art in bug finding The latest generationof feedback-driven fuzzers generally uses mechanismsto learn which inputs are interesting and which are notInteresting inputs are used to produce more inputs thatmay trigger new execution paths in the target Inputs thatdid not trigger interesting behavior in the program arediscarded Thus the fuzzer is able to ldquolearnrdquo the input

USENIX Association 26th USENIX Security Symposium 167

format This greatly improves efficiency and usabilityof fuzzers especially by reducing the need for an oraclewhich generates semi-valid inputs or an extensive corpusthat covers most paths in the target

Unfortunately AFL is limited to user space applica-tions and lacks kernel support Fuzzing kernels has aset of additional challenges when compared to userland(or ring 3) fuzzing First crashes and timeouts mandatethe use of virtualization to be able to catch faults andcontinue gracefully Second kernel-level code has sig-nificantly more non-determinism than the average ring 3programmdashmostly due to interrupts kernel threads state-fulness and similar mechanisms This makes fuzzingkernel code challenging Furthermore there is no equiv-alent to command line arguments or stdin to interactwith kernels or drivers in a generic way except for plaininterrupt or sysenter instructions In addition the Win-dows kernel and many relevant drivers and core compo-nents (for Windows macOS and even Linux) are closedsource and cannot be instrumented by common tech-niques without a significant performance overhead

Previous approaches to kernel fuzzing were notportable because they relied on certain drivers or recom-pilation [10 34] were very slow due to emulation togather feedback [7] or simply were not feedback-drivenat all [11]

In this paper we introduce a new technique that al-lows applying feedback fuzzing to arbitrary (even closedsource) x86-64 based kernels without any custom ring0 target code or even OS-specific code at all Wediscuss the design and implementation of kernel-AFL(kAFL) our prototype implementation of the proposedtechniques The overhead for feedback generation is verysmall (less than 5) due to a new CPU feature IntelrsquosProcessor Trace (PT) technology provides control flowinformation on running code We use this informationto construct a feedback mechanism similar to AFLrsquos in-strumentation This allows us to obtain up to 17000 ex-ecutions per second on an off-the-shelf laptop (ThinkpadT460p i7-6700HQ and 32 GB RAM) for simple targetdrivers Additionally we describe an efficient way fordealing with the non-determinisms that occur during ker-nel fuzzing Due to the modular design kAFL is exten-sible to fuzz any x86x86-64 OS We have applied kAFLto Linux macOS and Windows and found multiple pre-viously unknown bugs in kernel drivers in those OSs

In summary our contributions in this paper are

bull OS independence We show that feedback-drivenfuzzing of closed-source kernel mode componentsis possible in an (almost) OS-independent mannerby harnessing the hypervisor (VMM) to producecoverage This allows targeting any x86 operatingsystem kernel or user space component of interest

bull Hardware-assisted feedback Our fuzzing ap-proach utilizes Intelrsquos Processor Trace (PT) tech-nology and thus has a very small performance over-head Additionally our PT-decoder is up to 30 timesfaster than Intelrsquos ptxed decoder Thereby we ob-tain complete trace information that we use to guideour evolutionary fuzzing algorithm to maximize testcoverage

bull Extensible and modular design Our modular de-sign separates the fuzzer the tracing engine andthe target to fuzz This allows to support additionalx86 operating systemsrsquo kernel space and user spacecomponents without the need to develop a systemdriver for the target OS

bull kernel-AFL We incorporated our design con-cepts and developed a prototype called kernel-AFL(kAFL) which was able to find several vulnerabili-ties in kernel components of different operating sys-tems To foster research on this topic we make thesource code of our prototype implementation avail-able at httpsgithubcomRUB-SysSeckAFL

2 Technical Background

We use the Intel Processor Trace (Intel PT) extension ofIA-32 CPUs to obtain coverage information for ring 0execution of arbitrary (even closed-source) OS code Tofacilitate efficient and OS-independent fuzzing we alsomake use of Intelrsquos hardware virtualization features (In-tel VT-x) Hence our approach requires a CPU that sup-ports both Intel VT-x and Intel PT This section provides abrief overview of these hardware features and establishesthe technical foundation for the later sections

21 x86-64 Virtual Memory LayoutsEvery commonly used x86-64 OS uses a split virtualmemory layout The kernel is commonly located at theupper half of each virtual memory space whereas eachuser mode process memory is located in the lower halfFor example the virtual memory space of Linux is typ-ically split into kernel space (upper half) and user space(lower half) each with a size of 247 due to the 48-bitvirtual address limit of current x86-64 CPUs Hencethe kernel memory is mapped to any virtual addressspace and therefore it is located always at the samevirtual address If an user mode process executes thesyscallsysenter instruction for kernel interaction orcauses an exception that has to be handled by the OS theOS will keep the current CR3 value and thus does notswitch the virtual memory address space Instead thecurrent virtual memory address space is reused and thekernel handles the current user mode process related taskwithin the same address space

168 26th USENIX Security Symposium USENIX Association

22 Intel VT-x

The kernel fuzzing approach introduced in this paper re-lies on modern x86-64 hardware virtualization technol-ogy Hence we provide a brief overview of Intelrsquos hard-ware virtualization technology Intel VT-x

We differentiate between three kinds of CPUs phys-ical CPUs logical CPUs and virtual CPUs (vCPUs) Aphysical CPU is a CPU that is implemented in hardwareMost modern CPUs support mechanisms to increasemultithreading performance without additional physicalCPU cores on the die (eg Intel Hyper-Threading) Inthis case there are multiple logical CPUs on one phys-ical CPU These different logical CPUs share the phys-ical CPU and thus only one of them can be active ata time However the execution of the different logicalCPUs is interleaved by the hardware and therefore theavailable resources can be utilized more efficiently (egone logical CPU uses the arithmetic logic unit while an-other logical CPU waits for a data fetch) and the oper-ating system can reduce the scheduling overhead Eachlogical CPU is usually treated like a whole CPU by theoperating system Finally it is possible to create multiplehardware-supported virtual machines (VMs) on a singlelogical CPU In this case each VM has a set of its ownvCPUs

The virtualization role model is divided into two com-ponents the virtual machine monitor (VMM) and theVM The VMM also named hypervisor or host is priv-ileged software that has full control over the physicalCPU and provides virtualized guests with restricted ac-cess to physical resources The VM also termed guest isa piece of software that is transparently executed withinthe virtualized context provided by the VMM

To provide full hardware-assisted virtualization sup-port Intel VT-x adds two additional execution modesto the well-known protection ring based standard modeof execution The default mode of executions is calledVMX OFF It does not implement any hardware virtual-ization support When using hardware-supported virtual-ization the CPU switches into the VMX ON state and dis-tinguishes between two different execution modes thehigher-privileged mode of the hypervisor (VMX root orVMM) and the lower privileged execution mode of thevirtual machine guest (VMX non-root or VM)

When running in guest mode several privileged ac-tions or reasons (execution of restricted instructions ex-pired VMX-preemption timer or access to certain em-ulated devices) in the VM guest will trigger a VM-Exitevent and transfer control to the hypervisor This wayit is possible to run arbitrary software that expects priv-ileged access to the hardware (such as an OS) inside aVM At the same time a higher authority can meditate

and control the operations performed with a small per-formance overhead

To create launch and control a VM the VMM has touse a virtual machine control structure (VMCS) for eachvCPU [28] The VMCS contains all essential informa-tion about the current state and how to perform VMXtransitions of the vCPU

23 Intel Processor TraceWith the fifth generation of Intel Core processors (Broad-well architecture) Intel has introduced a new processorfeature called Intel Processor Trace (Intel PT) to provideexecution and branch tracing information Unlike otherbranch tracing technologies such as Intel Last BranchRecord (LBR) the size of the output buffer is no longerstrictly limited by special registers Instead it is onlylimited by the size of the main memory If the outputtarget is repeatedly and timely emptied we can createtraces of arbitrary length The processorrsquos output formatis packet-oriented and separated into two different typesgeneral execution information and control flow informa-tion packets Intel PT produces various types of con-trol flow related packet types during runtime To obtaincontrol-flow information from the trace data we requirea decoder The decoder needs the traced software to in-terpret the packets that contain the addresses of condi-tional branches

Intel specifies five types of control flow affecting in-structions called Change of Flow Instruction (CoFI) Theexecution of different CoFI types results in different se-quences of flow information packets The three CoFItypes relevant to our work are

1 Taken-Not-Taken (TNT) If the processor exe-cutes any conditional jump the decision whetherthis jump was taken or not is encoded in a TNTpacket

2 Target IP (TIP) If the processor executes an indi-rect jump or transfer instruction the decoder willnot be able to recover the control flow There-fore the processor produces a TIP packet upon theexecution of an instruction of the type indirectbranch near ret or far transfer These TIPpackets store the corresponding target instructionpointer executed by the processor after the transferor jump has occurred

3 Flow Update Packets (FUP) Another case wherethe processor must produce a hint packet for thesoftware decoder are asynchronous events such asinterrupts or traps These events are recorded asFUPs and usually followed by a TIP to indicate thefollowing instruction

USENIX Association 26th USENIX Security Symposium 169

To limit the amount of trace data generated Intel PTprovides multiple options for runtime filtering Depend-ing on the given processor it might be possible to con-figure multiple ranges for instruction-pointer filtering (IPFilter) In general these filter ranges only affect virtualaddresses if paging is enabled this is always the casein x86-64 long-mode Therefore it is possible to limittrace generation to selected ranges and thus avoid hugeamounts of superfluous trace data In accordance to theIP filtering mechanism it is possible to filter traces bythe current privilege level (CPL) of the protection ringmodel (eg ring 0 or ring 3) This filter allows us to selectonly the user mode (CPL gt 0) or kernel mode (CPL = 0)activity kAFL utilizes this filter option to limit tracingexplicitly to kernel mode execution In most cases thefocus of tracing is not the whole OS within all user modeprocesses and their kernel interactions To limit tracedata generation to one specific virtual memory addressspace software can use the CR3 Filter Intel PT will onlyproduce trace data if the CR3 value matches the config-ured filter value The CR3 register contains the pointer tothe current page table The value of the CR3 register canthus be used to filter code executed on behalf of a certainring 3 process even in ring 0 mode

Intel PT supports various configurable target domainsfor output data kAFL focuses on the Table of PhysicalAddresses (ToPA) mechanism that enables us to specifymultiple output regions Every ToPA table contains mul-tiple ToPA entries which in turn contain the physical ad-dress of the associated memory chunk used to store tracedata Each ToPA entry contains the physical address asize specifier for the referred memory chunk in physicalmemory and multiple type bits These type bits specifythe CPUrsquos behavior on access of the ToPA entry and howto deal with filled output regions

3 System Overview

We now provide a high-level overview of the design of anOS-independent and hardware-assisted feedback fuzzerbefore presenting the implementation details of our toolcalled kAFL in Section 4

Our system is split into three components thefuzzing logic the VM infrastructure (modified versionsof QEMU and KVM denoted by QEMU-PT and KVM-PT) and the user mode agent The fuzzing logic runsas a ring 3 process on the host OS This logic is also re-ferred to as kAFL The VM infrastructure consists of aring 3 component (QEMU-PT) and a ring 0 component(KVM-PT) This facilitates communication between theother two components and makes the Intel PT trace dataavailable to the fuzzing logic In general the guest onlycommunicates with the host via hypercalls The host canthen read and write guest memory and continues VM ex-

Figure 1 High-level overview of the kAFL architectureThe setup process ( 1copy- 3copy) is not shown

ecution once the request has been handled A overviewof the architecture can be seen in Figure 1

We now outline the events and communication thattake place during a fuzz run as depicted in Figure 2When the VM is started the first part of the user modeagent (the loader) uses the hypercall HC_SUBMIT_PANICto submit the address of the kernel panic handler (or theBugCheck kernel address in Windows) to QEMU-PT 1copyQEMU-PT then patches a hypercall calling routine at theaddress of the panic handler This allows us to get noti-fied and react fast to crashes in the VM (instead of wait-ing for timeouts reboots)

Then the loader uses the hypercall HC_GET_PROGRAM torequest the actual user mode agent and starts it 2copy Nowthe loader setup is complete and the fuzzer begins its ini-tialization The agent triggers a HC_SUBMIT_CR3 hyper-call that will be handled by KVM-PT The hypervisorextracts the CR3 value of the currently running processand hands it over to QEMU-PT for filtering 3copy Finallythe agent uses the hypercall HC_SUBMIT_BUFFER to in-form the host at which address it expects its inputs Thefuzzer setup is now finished and the main fuzzing loopstarts

During the main loop the agent requests a new inputusing the HC_GET_INPUT hypercall 4copy The fuzzing logicproduces a new input and sends it to QEMU-PT SinceQEMU-PT has full access to the guestrsquos memory spaceit can simply copy the input into the buffer specified bythe agent Then it performs a VM-Entry to continue ex-ecuting the VM 5copy At the same time this VM-Entryevent enables the PT tracing mechanism The agent nowconsumes the input and interacts with the kernel (egit interprets the input as a file system image and tries tomount it 6copy) While the kernel is being fuzzed QEMU-PT decodes the trace data and updates the bitmap on de-mand Once the interaction is finished and the kernelhanded control back to the agent the agent notifies thehypervisor via a HC_FINISHED hypercall The resultingVM-Exit stops the tracing and QEMU-PT decodes theremaining trace data 7copy The resulting bitmap is passed

170 26th USENIX Security Symposium USENIX Association

Figure 2 Overview of the kAFL hypercall interaction

to the logic for further processing 8copy Afterwards theagent can continue to run any untraced clean-up routinesbefore issuing another HC_GET_INPUT to start the nextloop iteration

31 Fuzzing Logic

The fuzzing logic is the command and controlling com-ponent of kAFL It manages the queue of interestinginputs creates mutated inputs and schedules them forevaluation In most aspects it is based on the algo-rithms used by AFL Similarly to AFL we use a bitmapto store basic block transitions We gather the AFLbitmap from the VMs through an interface to QEMU-PTand decide which inputs triggered interesting behaviourThe fuzzing logic also coordinates the number of VMsspawned in parallel One of the bigger design differencesto AFL is that kAFL makes extensive use of multipro-cessing and parallelism where AFL simply spawns mul-tiple independent fuzzers which synchronize their inputqueues sporadically1 In contrast kAFL executes the de-terministic stage in parallel and all threads work on themost interesting input A significant amount of time isspent in tasks that are not CPU-bound (such as gueststhat delay execution) Therefore using many parallelprocesses (upto 5-6 per CPU core) drastically improvesperformance of the fuzzing process due to a higher CPUload per core Lastly the fuzzing logic communicateswith the user interface to display current statistics in reg-ular intervals

1AFL recently added experimental support for distributing thedeterministic stage see httpsgithubcommirroreraflblobmasterdocsparallel_fuzzingtxtL60-L66

32 User Mode AgentWe expect a user mode agent to run inside the virtual-ized target OS In principle this component only has tosynchronize and gather new inputs by the fuzzing logicvia the hypercall interface and use it to interact with theguestrsquos kernel Example agents are programs that try tomount inputs as file system images pass specific filessuch as certificates to kernel parser or even execute achain of various syscalls

In theory we only need one such component In prac-tice we use two different components The first programis the loader component Its job is to accept an arbitrarybinary via the hypercall interface This binary representsthe user mode agent and is executed by the loader com-ponent Additionally the loader component will checkif the agent has crashed (which happens often in case ofsyscall fuzzing) and restarts it if necessary This setuphas the advantage that we can pass any binary to theVM and reuse VM snapshots for different fuzzing com-ponents

33 Virtualization InfrastructureThe fuzzing logic uses QEMU-PT to interact with KVM-PT to spawn the target VMs KVM-PT allows us to traceindividual vCPUs instead of logical CPUs This com-ponent configures and enables Intel PT on the respec-tive logical CPU before the CPU switches to guest ex-ecution and disables tracing during the VM-Exit tran-sition This way the associated CPU will only pro-vide trace data of the virtualized kernel itself QEMU-PT is used to interact with the KVM-PT interface toconfigure and toggle Intel PT from user space and ac-cess the output buffer to decode the trace data The

USENIX Association 26th USENIX Security Symposium 171

decoded trace data is directly translated into a streamof addresses of executed conditional branch instruc-tions Moreover QEMU-PT also filters the stream ofexecuted addressesmdashbased on previous knowledge ofnon-deterministic basic blocksmdashto prevent false-positivefuzzing results and makes those available to the fuzzinglogic as AFL-compatible bitmaps We use our own cus-tom Intel PT decoder to cache disassembly results whichleads to significant performance gains compared to theoff-the-shelf solution provided by Intel

34 Stateful and Non-Deterministic Code

Tracing operating systems results in a significant amountof non-determinism The largest source of non-deterministic basic block transitions are interrupts whichcan occur at any point in time Additionally our imple-mentation does not reset the whole state after each exe-cution since reloading the VM from a memory snapshotis costly Thus we have to deal with the stateful and asyn-chronous aspects of the kernel An example for statefulcode might be a simple call to kmalloc() Dependingon the number of previous allocations kmalloc() mightsimply return a fresh pointer or map a whole range ofpages and update a significant amount of metadata Weuse two techniques to deal with these challenges

The first one is to filter out interrupts and the transi-tion caused while handling interrupts This is possibleusing the Intel PT trace data If an interrupt occurs theprocessor emits a TIP instruction since the transfer is notvisible in the code To avoid confusion during an inter-rupt occurring at an indirect control flow instruction theTIP packet is marked with FUP (flow update packet) toindicate an asynchronous event After identifying such asignature the decoder will drop all basic blocks visiteduntil the corresponding iret instruction is encounteredTo link the interrupts with their corresponding iret wetrack all interrupts on a simple call stack This mecha-nism is necessary since the interrupt handler may itselfbe interrupted by another interrupt

The second mechanism is to blacklist any basic blockthat occurs non-deterministically Each time we en-counter a new bit in the AFL bitmap we re-run the in-put several times in a row Every basic block that doesnot show up in all of the trials will be marked as non-deterministic and filtered from further processing Forfast access the results are stored in a bitmap of black-listed basic block addresses During the AFL bitmaptranslation any transition hash valuemdashwhich combinesthe current basic block address and the previous ba-sic block addressmdashinvolving a blacklisted block will beskipped

35 Hypercalls

Hypercalls are a feature introduced by virtualization OnIntel platforms hypercalls are triggered by the vmcallinstruction Hypercalls are to VMMs as syscalls are tokernels If any ring 3 process or the kernel in the VM ex-ecutes a vmcall instruction a VM-Exit event is triggeredand the VMM can decide how to process the hypercallWe patched KVM-PT to pass through our own set of hy-percalls to the fuzzing logic if a magic value is passedin rax and the appropriate hypercall-ID is set in rbxAdditionally we also patched KVM-PT to accept hyper-calls from ring 3 Arguments for specific hypercalls arepassed through rcx We use this mechanism to definean interface that user mode agent can use to communi-cate with the fuzzing logic One example hypercall isHC_SUBMIT_BUFFER Its argument is a guest pointer thatis stored in rcx Upon executing the vmcall instructiona VM-Exit is triggered and QEMU-PT stores the bufferpointer that was passed It will later copy the new inputdata into this buffer (see step 5copy in Figure 2) Finally theexecution of the VM is continued

climov rax KAFL_MAGIC_VALUEmov rbx HC_CRASHmov rcx 0x0vmcall

Listing 1 Hypercall crash notifier

Another use case for this interface is to notify thefuzzing logic when a crash occurs in the target OS kernelIn order to do so we overwrite the kernel crash handlerof the OS with a simple hypercall routine The injectedcode is shown in Listing 1 and displays how the hyper-call interface is used on the assembly level The cli in-struction disables all interrupts to avoid any kind of asyn-chronous interference during the hypercall routine

4 Implementation Details

Based on the design outlined in the previous section webuilt a prototype of our approach called kAFL In the fol-lowing we describe several implementation details Thesource code of our reference implementation is availableat httpsgithubcomRUB-SysSeckAFL

41 KVM-PT

Intel PT allows us to trace branch transitions withoutpatching or recompiling the targeted kernel To the bestof our knowledge no publicly available driver is able totrace only guest executions of a single vCPU using In-tel PT for long periods of time For instance Simple-PT[29] does not support long-term tracing by design The

172 26th USENIX Security Symposium USENIX Association

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 2: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

kAFL Hardware-Assisted Feedback Fuzzing for OS Kernels

Sergej SchumiloRuhr-Universitaumlt Bochum

Cornelius AschermannRuhr-Universitaumlt Bochum

Robert GawlikRuhr-Universitaumlt Bochum

Sebastian SchinzelMuumlnster University of Applied Sciences

Thorsten HolzRuhr-Universitaumlt Bochum

AbstractMany kinds of memory safety vulnerabilities have

been endangering software systems for decadesAmongst other approaches fuzzing is a promisingtechnique to unveil various software faults Recentlyfeedback-guided fuzzing demonstrated its power pro-ducing a steady stream of security-critical software bugsMost fuzzing effortsmdashespecially feedback fuzzingmdasharelimited to user space components of an operating system(OS) although bugs in kernel components are moresevere because they allow an attacker to gain accessto a system with full privileges Unfortunately kernelcomponents are difficult to fuzz as feedback mechanisms(ie guided code coverage) cannot be easily appliedAdditionally non-determinism due to interrupts kernelthreads statefulness and similar mechanisms posesproblems Furthermore if a process fuzzes its ownkernel a kernel crash highly impacts the performance ofthe fuzzer as the OS needs to reboot

In this paper we approach the problem of coverage-guided kernel fuzzing in an OS-independent andhardware-assisted way We utilize a hypervisor and In-telrsquos Processor Trace (PT) technology This allows usto remain independent of the target OS as we just re-quire a small user space component that interacts withthe targeted OS As a result our approach introducesalmost no performance overhead even in cases wherethe OS crashes and performs up to 17000 executionsper second on an off-the-shelf laptop We developed aframework called kernel-AFL (kAFL) to assess the secu-rity of Linux macOS and Windows kernel componentsAmong many crashes we uncovered several flaws in theext4 driver for Linux the HFS and APFS file system ofmacOS and the NTFS driver of Windows

1 Introduction

Several vulnerability classes such as memory corrup-tions race-conditional memory accesses and use-after-

free vulnerabilities are known threats for programs run-ning in user mode as well as for the operating system(OS) core itself Past experience has shown that attack-ers typically focus on user mode applications This islikely because vulnerabilities in user mode programs arenotoriously easier and more reliable to exploit How-ever with the appearance of different kinds of exploitdefense mechanisms ndash especially in user mode it hasbecome much harder nowadays to exploit known vul-nerabilities Due to those advanced defense mechanismsin user mode the kernel has become even more appeal-ing to an attacker since most kernel defense mechanismsare not widely deployed in practice This is due to morecomplex implementations which may affect the systemperformance Furthermore some of them are not partof the official mainline code base or even require sup-port for the latest CPU extension (eg SMAP SMEPon x86-64) Additionally when compromising the OSan attacker typically gains full access to the system re-sources (except for virtualized systems) Kernel-levelvulnerabilities are usually used for privilege escalationor to gain persistence for kernel-based rootkits

For a long time fuzzing has been a critical compo-nent in testing and establishing the quality of softwareHowever with the development of American Fuzzy Lop(AFL) smarter fuzzers have gained significant tractionin the industry [1] as well as in research [8 14 35 37]This trend was further amplified by Googlersquos OSS Fuzzproject that successfully foundmdashand continues to findmdasha significant number of critical bugs in highly security-relevant software Finally DARPArsquos Cyber Grand Chal-lenge showed that fuzzing remains highly relevant forthe state-of-the-art in bug finding The latest generationof feedback-driven fuzzers generally uses mechanismsto learn which inputs are interesting and which are notInteresting inputs are used to produce more inputs thatmay trigger new execution paths in the target Inputs thatdid not trigger interesting behavior in the program arediscarded Thus the fuzzer is able to ldquolearnrdquo the input

USENIX Association 26th USENIX Security Symposium 167

format This greatly improves efficiency and usabilityof fuzzers especially by reducing the need for an oraclewhich generates semi-valid inputs or an extensive corpusthat covers most paths in the target

Unfortunately AFL is limited to user space applica-tions and lacks kernel support Fuzzing kernels has aset of additional challenges when compared to userland(or ring 3) fuzzing First crashes and timeouts mandatethe use of virtualization to be able to catch faults andcontinue gracefully Second kernel-level code has sig-nificantly more non-determinism than the average ring 3programmdashmostly due to interrupts kernel threads state-fulness and similar mechanisms This makes fuzzingkernel code challenging Furthermore there is no equiv-alent to command line arguments or stdin to interactwith kernels or drivers in a generic way except for plaininterrupt or sysenter instructions In addition the Win-dows kernel and many relevant drivers and core compo-nents (for Windows macOS and even Linux) are closedsource and cannot be instrumented by common tech-niques without a significant performance overhead

Previous approaches to kernel fuzzing were notportable because they relied on certain drivers or recom-pilation [10 34] were very slow due to emulation togather feedback [7] or simply were not feedback-drivenat all [11]

In this paper we introduce a new technique that al-lows applying feedback fuzzing to arbitrary (even closedsource) x86-64 based kernels without any custom ring0 target code or even OS-specific code at all Wediscuss the design and implementation of kernel-AFL(kAFL) our prototype implementation of the proposedtechniques The overhead for feedback generation is verysmall (less than 5) due to a new CPU feature IntelrsquosProcessor Trace (PT) technology provides control flowinformation on running code We use this informationto construct a feedback mechanism similar to AFLrsquos in-strumentation This allows us to obtain up to 17000 ex-ecutions per second on an off-the-shelf laptop (ThinkpadT460p i7-6700HQ and 32 GB RAM) for simple targetdrivers Additionally we describe an efficient way fordealing with the non-determinisms that occur during ker-nel fuzzing Due to the modular design kAFL is exten-sible to fuzz any x86x86-64 OS We have applied kAFLto Linux macOS and Windows and found multiple pre-viously unknown bugs in kernel drivers in those OSs

In summary our contributions in this paper are

bull OS independence We show that feedback-drivenfuzzing of closed-source kernel mode componentsis possible in an (almost) OS-independent mannerby harnessing the hypervisor (VMM) to producecoverage This allows targeting any x86 operatingsystem kernel or user space component of interest

bull Hardware-assisted feedback Our fuzzing ap-proach utilizes Intelrsquos Processor Trace (PT) tech-nology and thus has a very small performance over-head Additionally our PT-decoder is up to 30 timesfaster than Intelrsquos ptxed decoder Thereby we ob-tain complete trace information that we use to guideour evolutionary fuzzing algorithm to maximize testcoverage

bull Extensible and modular design Our modular de-sign separates the fuzzer the tracing engine andthe target to fuzz This allows to support additionalx86 operating systemsrsquo kernel space and user spacecomponents without the need to develop a systemdriver for the target OS

bull kernel-AFL We incorporated our design con-cepts and developed a prototype called kernel-AFL(kAFL) which was able to find several vulnerabili-ties in kernel components of different operating sys-tems To foster research on this topic we make thesource code of our prototype implementation avail-able at httpsgithubcomRUB-SysSeckAFL

2 Technical Background

We use the Intel Processor Trace (Intel PT) extension ofIA-32 CPUs to obtain coverage information for ring 0execution of arbitrary (even closed-source) OS code Tofacilitate efficient and OS-independent fuzzing we alsomake use of Intelrsquos hardware virtualization features (In-tel VT-x) Hence our approach requires a CPU that sup-ports both Intel VT-x and Intel PT This section provides abrief overview of these hardware features and establishesthe technical foundation for the later sections

21 x86-64 Virtual Memory LayoutsEvery commonly used x86-64 OS uses a split virtualmemory layout The kernel is commonly located at theupper half of each virtual memory space whereas eachuser mode process memory is located in the lower halfFor example the virtual memory space of Linux is typ-ically split into kernel space (upper half) and user space(lower half) each with a size of 247 due to the 48-bitvirtual address limit of current x86-64 CPUs Hencethe kernel memory is mapped to any virtual addressspace and therefore it is located always at the samevirtual address If an user mode process executes thesyscallsysenter instruction for kernel interaction orcauses an exception that has to be handled by the OS theOS will keep the current CR3 value and thus does notswitch the virtual memory address space Instead thecurrent virtual memory address space is reused and thekernel handles the current user mode process related taskwithin the same address space

168 26th USENIX Security Symposium USENIX Association

22 Intel VT-x

The kernel fuzzing approach introduced in this paper re-lies on modern x86-64 hardware virtualization technol-ogy Hence we provide a brief overview of Intelrsquos hard-ware virtualization technology Intel VT-x

We differentiate between three kinds of CPUs phys-ical CPUs logical CPUs and virtual CPUs (vCPUs) Aphysical CPU is a CPU that is implemented in hardwareMost modern CPUs support mechanisms to increasemultithreading performance without additional physicalCPU cores on the die (eg Intel Hyper-Threading) Inthis case there are multiple logical CPUs on one phys-ical CPU These different logical CPUs share the phys-ical CPU and thus only one of them can be active ata time However the execution of the different logicalCPUs is interleaved by the hardware and therefore theavailable resources can be utilized more efficiently (egone logical CPU uses the arithmetic logic unit while an-other logical CPU waits for a data fetch) and the oper-ating system can reduce the scheduling overhead Eachlogical CPU is usually treated like a whole CPU by theoperating system Finally it is possible to create multiplehardware-supported virtual machines (VMs) on a singlelogical CPU In this case each VM has a set of its ownvCPUs

The virtualization role model is divided into two com-ponents the virtual machine monitor (VMM) and theVM The VMM also named hypervisor or host is priv-ileged software that has full control over the physicalCPU and provides virtualized guests with restricted ac-cess to physical resources The VM also termed guest isa piece of software that is transparently executed withinthe virtualized context provided by the VMM

To provide full hardware-assisted virtualization sup-port Intel VT-x adds two additional execution modesto the well-known protection ring based standard modeof execution The default mode of executions is calledVMX OFF It does not implement any hardware virtual-ization support When using hardware-supported virtual-ization the CPU switches into the VMX ON state and dis-tinguishes between two different execution modes thehigher-privileged mode of the hypervisor (VMX root orVMM) and the lower privileged execution mode of thevirtual machine guest (VMX non-root or VM)

When running in guest mode several privileged ac-tions or reasons (execution of restricted instructions ex-pired VMX-preemption timer or access to certain em-ulated devices) in the VM guest will trigger a VM-Exitevent and transfer control to the hypervisor This wayit is possible to run arbitrary software that expects priv-ileged access to the hardware (such as an OS) inside aVM At the same time a higher authority can meditate

and control the operations performed with a small per-formance overhead

To create launch and control a VM the VMM has touse a virtual machine control structure (VMCS) for eachvCPU [28] The VMCS contains all essential informa-tion about the current state and how to perform VMXtransitions of the vCPU

23 Intel Processor TraceWith the fifth generation of Intel Core processors (Broad-well architecture) Intel has introduced a new processorfeature called Intel Processor Trace (Intel PT) to provideexecution and branch tracing information Unlike otherbranch tracing technologies such as Intel Last BranchRecord (LBR) the size of the output buffer is no longerstrictly limited by special registers Instead it is onlylimited by the size of the main memory If the outputtarget is repeatedly and timely emptied we can createtraces of arbitrary length The processorrsquos output formatis packet-oriented and separated into two different typesgeneral execution information and control flow informa-tion packets Intel PT produces various types of con-trol flow related packet types during runtime To obtaincontrol-flow information from the trace data we requirea decoder The decoder needs the traced software to in-terpret the packets that contain the addresses of condi-tional branches

Intel specifies five types of control flow affecting in-structions called Change of Flow Instruction (CoFI) Theexecution of different CoFI types results in different se-quences of flow information packets The three CoFItypes relevant to our work are

1 Taken-Not-Taken (TNT) If the processor exe-cutes any conditional jump the decision whetherthis jump was taken or not is encoded in a TNTpacket

2 Target IP (TIP) If the processor executes an indi-rect jump or transfer instruction the decoder willnot be able to recover the control flow There-fore the processor produces a TIP packet upon theexecution of an instruction of the type indirectbranch near ret or far transfer These TIPpackets store the corresponding target instructionpointer executed by the processor after the transferor jump has occurred

3 Flow Update Packets (FUP) Another case wherethe processor must produce a hint packet for thesoftware decoder are asynchronous events such asinterrupts or traps These events are recorded asFUPs and usually followed by a TIP to indicate thefollowing instruction

USENIX Association 26th USENIX Security Symposium 169

To limit the amount of trace data generated Intel PTprovides multiple options for runtime filtering Depend-ing on the given processor it might be possible to con-figure multiple ranges for instruction-pointer filtering (IPFilter) In general these filter ranges only affect virtualaddresses if paging is enabled this is always the casein x86-64 long-mode Therefore it is possible to limittrace generation to selected ranges and thus avoid hugeamounts of superfluous trace data In accordance to theIP filtering mechanism it is possible to filter traces bythe current privilege level (CPL) of the protection ringmodel (eg ring 0 or ring 3) This filter allows us to selectonly the user mode (CPL gt 0) or kernel mode (CPL = 0)activity kAFL utilizes this filter option to limit tracingexplicitly to kernel mode execution In most cases thefocus of tracing is not the whole OS within all user modeprocesses and their kernel interactions To limit tracedata generation to one specific virtual memory addressspace software can use the CR3 Filter Intel PT will onlyproduce trace data if the CR3 value matches the config-ured filter value The CR3 register contains the pointer tothe current page table The value of the CR3 register canthus be used to filter code executed on behalf of a certainring 3 process even in ring 0 mode

Intel PT supports various configurable target domainsfor output data kAFL focuses on the Table of PhysicalAddresses (ToPA) mechanism that enables us to specifymultiple output regions Every ToPA table contains mul-tiple ToPA entries which in turn contain the physical ad-dress of the associated memory chunk used to store tracedata Each ToPA entry contains the physical address asize specifier for the referred memory chunk in physicalmemory and multiple type bits These type bits specifythe CPUrsquos behavior on access of the ToPA entry and howto deal with filled output regions

3 System Overview

We now provide a high-level overview of the design of anOS-independent and hardware-assisted feedback fuzzerbefore presenting the implementation details of our toolcalled kAFL in Section 4

Our system is split into three components thefuzzing logic the VM infrastructure (modified versionsof QEMU and KVM denoted by QEMU-PT and KVM-PT) and the user mode agent The fuzzing logic runsas a ring 3 process on the host OS This logic is also re-ferred to as kAFL The VM infrastructure consists of aring 3 component (QEMU-PT) and a ring 0 component(KVM-PT) This facilitates communication between theother two components and makes the Intel PT trace dataavailable to the fuzzing logic In general the guest onlycommunicates with the host via hypercalls The host canthen read and write guest memory and continues VM ex-

Figure 1 High-level overview of the kAFL architectureThe setup process ( 1copy- 3copy) is not shown

ecution once the request has been handled A overviewof the architecture can be seen in Figure 1

We now outline the events and communication thattake place during a fuzz run as depicted in Figure 2When the VM is started the first part of the user modeagent (the loader) uses the hypercall HC_SUBMIT_PANICto submit the address of the kernel panic handler (or theBugCheck kernel address in Windows) to QEMU-PT 1copyQEMU-PT then patches a hypercall calling routine at theaddress of the panic handler This allows us to get noti-fied and react fast to crashes in the VM (instead of wait-ing for timeouts reboots)

Then the loader uses the hypercall HC_GET_PROGRAM torequest the actual user mode agent and starts it 2copy Nowthe loader setup is complete and the fuzzer begins its ini-tialization The agent triggers a HC_SUBMIT_CR3 hyper-call that will be handled by KVM-PT The hypervisorextracts the CR3 value of the currently running processand hands it over to QEMU-PT for filtering 3copy Finallythe agent uses the hypercall HC_SUBMIT_BUFFER to in-form the host at which address it expects its inputs Thefuzzer setup is now finished and the main fuzzing loopstarts

During the main loop the agent requests a new inputusing the HC_GET_INPUT hypercall 4copy The fuzzing logicproduces a new input and sends it to QEMU-PT SinceQEMU-PT has full access to the guestrsquos memory spaceit can simply copy the input into the buffer specified bythe agent Then it performs a VM-Entry to continue ex-ecuting the VM 5copy At the same time this VM-Entryevent enables the PT tracing mechanism The agent nowconsumes the input and interacts with the kernel (egit interprets the input as a file system image and tries tomount it 6copy) While the kernel is being fuzzed QEMU-PT decodes the trace data and updates the bitmap on de-mand Once the interaction is finished and the kernelhanded control back to the agent the agent notifies thehypervisor via a HC_FINISHED hypercall The resultingVM-Exit stops the tracing and QEMU-PT decodes theremaining trace data 7copy The resulting bitmap is passed

170 26th USENIX Security Symposium USENIX Association

Figure 2 Overview of the kAFL hypercall interaction

to the logic for further processing 8copy Afterwards theagent can continue to run any untraced clean-up routinesbefore issuing another HC_GET_INPUT to start the nextloop iteration

31 Fuzzing Logic

The fuzzing logic is the command and controlling com-ponent of kAFL It manages the queue of interestinginputs creates mutated inputs and schedules them forevaluation In most aspects it is based on the algo-rithms used by AFL Similarly to AFL we use a bitmapto store basic block transitions We gather the AFLbitmap from the VMs through an interface to QEMU-PTand decide which inputs triggered interesting behaviourThe fuzzing logic also coordinates the number of VMsspawned in parallel One of the bigger design differencesto AFL is that kAFL makes extensive use of multipro-cessing and parallelism where AFL simply spawns mul-tiple independent fuzzers which synchronize their inputqueues sporadically1 In contrast kAFL executes the de-terministic stage in parallel and all threads work on themost interesting input A significant amount of time isspent in tasks that are not CPU-bound (such as gueststhat delay execution) Therefore using many parallelprocesses (upto 5-6 per CPU core) drastically improvesperformance of the fuzzing process due to a higher CPUload per core Lastly the fuzzing logic communicateswith the user interface to display current statistics in reg-ular intervals

1AFL recently added experimental support for distributing thedeterministic stage see httpsgithubcommirroreraflblobmasterdocsparallel_fuzzingtxtL60-L66

32 User Mode AgentWe expect a user mode agent to run inside the virtual-ized target OS In principle this component only has tosynchronize and gather new inputs by the fuzzing logicvia the hypercall interface and use it to interact with theguestrsquos kernel Example agents are programs that try tomount inputs as file system images pass specific filessuch as certificates to kernel parser or even execute achain of various syscalls

In theory we only need one such component In prac-tice we use two different components The first programis the loader component Its job is to accept an arbitrarybinary via the hypercall interface This binary representsthe user mode agent and is executed by the loader com-ponent Additionally the loader component will checkif the agent has crashed (which happens often in case ofsyscall fuzzing) and restarts it if necessary This setuphas the advantage that we can pass any binary to theVM and reuse VM snapshots for different fuzzing com-ponents

33 Virtualization InfrastructureThe fuzzing logic uses QEMU-PT to interact with KVM-PT to spawn the target VMs KVM-PT allows us to traceindividual vCPUs instead of logical CPUs This com-ponent configures and enables Intel PT on the respec-tive logical CPU before the CPU switches to guest ex-ecution and disables tracing during the VM-Exit tran-sition This way the associated CPU will only pro-vide trace data of the virtualized kernel itself QEMU-PT is used to interact with the KVM-PT interface toconfigure and toggle Intel PT from user space and ac-cess the output buffer to decode the trace data The

USENIX Association 26th USENIX Security Symposium 171

decoded trace data is directly translated into a streamof addresses of executed conditional branch instruc-tions Moreover QEMU-PT also filters the stream ofexecuted addressesmdashbased on previous knowledge ofnon-deterministic basic blocksmdashto prevent false-positivefuzzing results and makes those available to the fuzzinglogic as AFL-compatible bitmaps We use our own cus-tom Intel PT decoder to cache disassembly results whichleads to significant performance gains compared to theoff-the-shelf solution provided by Intel

34 Stateful and Non-Deterministic Code

Tracing operating systems results in a significant amountof non-determinism The largest source of non-deterministic basic block transitions are interrupts whichcan occur at any point in time Additionally our imple-mentation does not reset the whole state after each exe-cution since reloading the VM from a memory snapshotis costly Thus we have to deal with the stateful and asyn-chronous aspects of the kernel An example for statefulcode might be a simple call to kmalloc() Dependingon the number of previous allocations kmalloc() mightsimply return a fresh pointer or map a whole range ofpages and update a significant amount of metadata Weuse two techniques to deal with these challenges

The first one is to filter out interrupts and the transi-tion caused while handling interrupts This is possibleusing the Intel PT trace data If an interrupt occurs theprocessor emits a TIP instruction since the transfer is notvisible in the code To avoid confusion during an inter-rupt occurring at an indirect control flow instruction theTIP packet is marked with FUP (flow update packet) toindicate an asynchronous event After identifying such asignature the decoder will drop all basic blocks visiteduntil the corresponding iret instruction is encounteredTo link the interrupts with their corresponding iret wetrack all interrupts on a simple call stack This mecha-nism is necessary since the interrupt handler may itselfbe interrupted by another interrupt

The second mechanism is to blacklist any basic blockthat occurs non-deterministically Each time we en-counter a new bit in the AFL bitmap we re-run the in-put several times in a row Every basic block that doesnot show up in all of the trials will be marked as non-deterministic and filtered from further processing Forfast access the results are stored in a bitmap of black-listed basic block addresses During the AFL bitmaptranslation any transition hash valuemdashwhich combinesthe current basic block address and the previous ba-sic block addressmdashinvolving a blacklisted block will beskipped

35 Hypercalls

Hypercalls are a feature introduced by virtualization OnIntel platforms hypercalls are triggered by the vmcallinstruction Hypercalls are to VMMs as syscalls are tokernels If any ring 3 process or the kernel in the VM ex-ecutes a vmcall instruction a VM-Exit event is triggeredand the VMM can decide how to process the hypercallWe patched KVM-PT to pass through our own set of hy-percalls to the fuzzing logic if a magic value is passedin rax and the appropriate hypercall-ID is set in rbxAdditionally we also patched KVM-PT to accept hyper-calls from ring 3 Arguments for specific hypercalls arepassed through rcx We use this mechanism to definean interface that user mode agent can use to communi-cate with the fuzzing logic One example hypercall isHC_SUBMIT_BUFFER Its argument is a guest pointer thatis stored in rcx Upon executing the vmcall instructiona VM-Exit is triggered and QEMU-PT stores the bufferpointer that was passed It will later copy the new inputdata into this buffer (see step 5copy in Figure 2) Finally theexecution of the VM is continued

climov rax KAFL_MAGIC_VALUEmov rbx HC_CRASHmov rcx 0x0vmcall

Listing 1 Hypercall crash notifier

Another use case for this interface is to notify thefuzzing logic when a crash occurs in the target OS kernelIn order to do so we overwrite the kernel crash handlerof the OS with a simple hypercall routine The injectedcode is shown in Listing 1 and displays how the hyper-call interface is used on the assembly level The cli in-struction disables all interrupts to avoid any kind of asyn-chronous interference during the hypercall routine

4 Implementation Details

Based on the design outlined in the previous section webuilt a prototype of our approach called kAFL In the fol-lowing we describe several implementation details Thesource code of our reference implementation is availableat httpsgithubcomRUB-SysSeckAFL

41 KVM-PT

Intel PT allows us to trace branch transitions withoutpatching or recompiling the targeted kernel To the bestof our knowledge no publicly available driver is able totrace only guest executions of a single vCPU using In-tel PT for long periods of time For instance Simple-PT[29] does not support long-term tracing by design The

172 26th USENIX Security Symposium USENIX Association

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 3: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

format This greatly improves efficiency and usabilityof fuzzers especially by reducing the need for an oraclewhich generates semi-valid inputs or an extensive corpusthat covers most paths in the target

Unfortunately AFL is limited to user space applica-tions and lacks kernel support Fuzzing kernels has aset of additional challenges when compared to userland(or ring 3) fuzzing First crashes and timeouts mandatethe use of virtualization to be able to catch faults andcontinue gracefully Second kernel-level code has sig-nificantly more non-determinism than the average ring 3programmdashmostly due to interrupts kernel threads state-fulness and similar mechanisms This makes fuzzingkernel code challenging Furthermore there is no equiv-alent to command line arguments or stdin to interactwith kernels or drivers in a generic way except for plaininterrupt or sysenter instructions In addition the Win-dows kernel and many relevant drivers and core compo-nents (for Windows macOS and even Linux) are closedsource and cannot be instrumented by common tech-niques without a significant performance overhead

Previous approaches to kernel fuzzing were notportable because they relied on certain drivers or recom-pilation [10 34] were very slow due to emulation togather feedback [7] or simply were not feedback-drivenat all [11]

In this paper we introduce a new technique that al-lows applying feedback fuzzing to arbitrary (even closedsource) x86-64 based kernels without any custom ring0 target code or even OS-specific code at all Wediscuss the design and implementation of kernel-AFL(kAFL) our prototype implementation of the proposedtechniques The overhead for feedback generation is verysmall (less than 5) due to a new CPU feature IntelrsquosProcessor Trace (PT) technology provides control flowinformation on running code We use this informationto construct a feedback mechanism similar to AFLrsquos in-strumentation This allows us to obtain up to 17000 ex-ecutions per second on an off-the-shelf laptop (ThinkpadT460p i7-6700HQ and 32 GB RAM) for simple targetdrivers Additionally we describe an efficient way fordealing with the non-determinisms that occur during ker-nel fuzzing Due to the modular design kAFL is exten-sible to fuzz any x86x86-64 OS We have applied kAFLto Linux macOS and Windows and found multiple pre-viously unknown bugs in kernel drivers in those OSs

In summary our contributions in this paper are

bull OS independence We show that feedback-drivenfuzzing of closed-source kernel mode componentsis possible in an (almost) OS-independent mannerby harnessing the hypervisor (VMM) to producecoverage This allows targeting any x86 operatingsystem kernel or user space component of interest

bull Hardware-assisted feedback Our fuzzing ap-proach utilizes Intelrsquos Processor Trace (PT) tech-nology and thus has a very small performance over-head Additionally our PT-decoder is up to 30 timesfaster than Intelrsquos ptxed decoder Thereby we ob-tain complete trace information that we use to guideour evolutionary fuzzing algorithm to maximize testcoverage

bull Extensible and modular design Our modular de-sign separates the fuzzer the tracing engine andthe target to fuzz This allows to support additionalx86 operating systemsrsquo kernel space and user spacecomponents without the need to develop a systemdriver for the target OS

bull kernel-AFL We incorporated our design con-cepts and developed a prototype called kernel-AFL(kAFL) which was able to find several vulnerabili-ties in kernel components of different operating sys-tems To foster research on this topic we make thesource code of our prototype implementation avail-able at httpsgithubcomRUB-SysSeckAFL

2 Technical Background

We use the Intel Processor Trace (Intel PT) extension ofIA-32 CPUs to obtain coverage information for ring 0execution of arbitrary (even closed-source) OS code Tofacilitate efficient and OS-independent fuzzing we alsomake use of Intelrsquos hardware virtualization features (In-tel VT-x) Hence our approach requires a CPU that sup-ports both Intel VT-x and Intel PT This section provides abrief overview of these hardware features and establishesthe technical foundation for the later sections

21 x86-64 Virtual Memory LayoutsEvery commonly used x86-64 OS uses a split virtualmemory layout The kernel is commonly located at theupper half of each virtual memory space whereas eachuser mode process memory is located in the lower halfFor example the virtual memory space of Linux is typ-ically split into kernel space (upper half) and user space(lower half) each with a size of 247 due to the 48-bitvirtual address limit of current x86-64 CPUs Hencethe kernel memory is mapped to any virtual addressspace and therefore it is located always at the samevirtual address If an user mode process executes thesyscallsysenter instruction for kernel interaction orcauses an exception that has to be handled by the OS theOS will keep the current CR3 value and thus does notswitch the virtual memory address space Instead thecurrent virtual memory address space is reused and thekernel handles the current user mode process related taskwithin the same address space

168 26th USENIX Security Symposium USENIX Association

22 Intel VT-x

The kernel fuzzing approach introduced in this paper re-lies on modern x86-64 hardware virtualization technol-ogy Hence we provide a brief overview of Intelrsquos hard-ware virtualization technology Intel VT-x

We differentiate between three kinds of CPUs phys-ical CPUs logical CPUs and virtual CPUs (vCPUs) Aphysical CPU is a CPU that is implemented in hardwareMost modern CPUs support mechanisms to increasemultithreading performance without additional physicalCPU cores on the die (eg Intel Hyper-Threading) Inthis case there are multiple logical CPUs on one phys-ical CPU These different logical CPUs share the phys-ical CPU and thus only one of them can be active ata time However the execution of the different logicalCPUs is interleaved by the hardware and therefore theavailable resources can be utilized more efficiently (egone logical CPU uses the arithmetic logic unit while an-other logical CPU waits for a data fetch) and the oper-ating system can reduce the scheduling overhead Eachlogical CPU is usually treated like a whole CPU by theoperating system Finally it is possible to create multiplehardware-supported virtual machines (VMs) on a singlelogical CPU In this case each VM has a set of its ownvCPUs

The virtualization role model is divided into two com-ponents the virtual machine monitor (VMM) and theVM The VMM also named hypervisor or host is priv-ileged software that has full control over the physicalCPU and provides virtualized guests with restricted ac-cess to physical resources The VM also termed guest isa piece of software that is transparently executed withinthe virtualized context provided by the VMM

To provide full hardware-assisted virtualization sup-port Intel VT-x adds two additional execution modesto the well-known protection ring based standard modeof execution The default mode of executions is calledVMX OFF It does not implement any hardware virtual-ization support When using hardware-supported virtual-ization the CPU switches into the VMX ON state and dis-tinguishes between two different execution modes thehigher-privileged mode of the hypervisor (VMX root orVMM) and the lower privileged execution mode of thevirtual machine guest (VMX non-root or VM)

When running in guest mode several privileged ac-tions or reasons (execution of restricted instructions ex-pired VMX-preemption timer or access to certain em-ulated devices) in the VM guest will trigger a VM-Exitevent and transfer control to the hypervisor This wayit is possible to run arbitrary software that expects priv-ileged access to the hardware (such as an OS) inside aVM At the same time a higher authority can meditate

and control the operations performed with a small per-formance overhead

To create launch and control a VM the VMM has touse a virtual machine control structure (VMCS) for eachvCPU [28] The VMCS contains all essential informa-tion about the current state and how to perform VMXtransitions of the vCPU

23 Intel Processor TraceWith the fifth generation of Intel Core processors (Broad-well architecture) Intel has introduced a new processorfeature called Intel Processor Trace (Intel PT) to provideexecution and branch tracing information Unlike otherbranch tracing technologies such as Intel Last BranchRecord (LBR) the size of the output buffer is no longerstrictly limited by special registers Instead it is onlylimited by the size of the main memory If the outputtarget is repeatedly and timely emptied we can createtraces of arbitrary length The processorrsquos output formatis packet-oriented and separated into two different typesgeneral execution information and control flow informa-tion packets Intel PT produces various types of con-trol flow related packet types during runtime To obtaincontrol-flow information from the trace data we requirea decoder The decoder needs the traced software to in-terpret the packets that contain the addresses of condi-tional branches

Intel specifies five types of control flow affecting in-structions called Change of Flow Instruction (CoFI) Theexecution of different CoFI types results in different se-quences of flow information packets The three CoFItypes relevant to our work are

1 Taken-Not-Taken (TNT) If the processor exe-cutes any conditional jump the decision whetherthis jump was taken or not is encoded in a TNTpacket

2 Target IP (TIP) If the processor executes an indi-rect jump or transfer instruction the decoder willnot be able to recover the control flow There-fore the processor produces a TIP packet upon theexecution of an instruction of the type indirectbranch near ret or far transfer These TIPpackets store the corresponding target instructionpointer executed by the processor after the transferor jump has occurred

3 Flow Update Packets (FUP) Another case wherethe processor must produce a hint packet for thesoftware decoder are asynchronous events such asinterrupts or traps These events are recorded asFUPs and usually followed by a TIP to indicate thefollowing instruction

USENIX Association 26th USENIX Security Symposium 169

To limit the amount of trace data generated Intel PTprovides multiple options for runtime filtering Depend-ing on the given processor it might be possible to con-figure multiple ranges for instruction-pointer filtering (IPFilter) In general these filter ranges only affect virtualaddresses if paging is enabled this is always the casein x86-64 long-mode Therefore it is possible to limittrace generation to selected ranges and thus avoid hugeamounts of superfluous trace data In accordance to theIP filtering mechanism it is possible to filter traces bythe current privilege level (CPL) of the protection ringmodel (eg ring 0 or ring 3) This filter allows us to selectonly the user mode (CPL gt 0) or kernel mode (CPL = 0)activity kAFL utilizes this filter option to limit tracingexplicitly to kernel mode execution In most cases thefocus of tracing is not the whole OS within all user modeprocesses and their kernel interactions To limit tracedata generation to one specific virtual memory addressspace software can use the CR3 Filter Intel PT will onlyproduce trace data if the CR3 value matches the config-ured filter value The CR3 register contains the pointer tothe current page table The value of the CR3 register canthus be used to filter code executed on behalf of a certainring 3 process even in ring 0 mode

Intel PT supports various configurable target domainsfor output data kAFL focuses on the Table of PhysicalAddresses (ToPA) mechanism that enables us to specifymultiple output regions Every ToPA table contains mul-tiple ToPA entries which in turn contain the physical ad-dress of the associated memory chunk used to store tracedata Each ToPA entry contains the physical address asize specifier for the referred memory chunk in physicalmemory and multiple type bits These type bits specifythe CPUrsquos behavior on access of the ToPA entry and howto deal with filled output regions

3 System Overview

We now provide a high-level overview of the design of anOS-independent and hardware-assisted feedback fuzzerbefore presenting the implementation details of our toolcalled kAFL in Section 4

Our system is split into three components thefuzzing logic the VM infrastructure (modified versionsof QEMU and KVM denoted by QEMU-PT and KVM-PT) and the user mode agent The fuzzing logic runsas a ring 3 process on the host OS This logic is also re-ferred to as kAFL The VM infrastructure consists of aring 3 component (QEMU-PT) and a ring 0 component(KVM-PT) This facilitates communication between theother two components and makes the Intel PT trace dataavailable to the fuzzing logic In general the guest onlycommunicates with the host via hypercalls The host canthen read and write guest memory and continues VM ex-

Figure 1 High-level overview of the kAFL architectureThe setup process ( 1copy- 3copy) is not shown

ecution once the request has been handled A overviewof the architecture can be seen in Figure 1

We now outline the events and communication thattake place during a fuzz run as depicted in Figure 2When the VM is started the first part of the user modeagent (the loader) uses the hypercall HC_SUBMIT_PANICto submit the address of the kernel panic handler (or theBugCheck kernel address in Windows) to QEMU-PT 1copyQEMU-PT then patches a hypercall calling routine at theaddress of the panic handler This allows us to get noti-fied and react fast to crashes in the VM (instead of wait-ing for timeouts reboots)

Then the loader uses the hypercall HC_GET_PROGRAM torequest the actual user mode agent and starts it 2copy Nowthe loader setup is complete and the fuzzer begins its ini-tialization The agent triggers a HC_SUBMIT_CR3 hyper-call that will be handled by KVM-PT The hypervisorextracts the CR3 value of the currently running processand hands it over to QEMU-PT for filtering 3copy Finallythe agent uses the hypercall HC_SUBMIT_BUFFER to in-form the host at which address it expects its inputs Thefuzzer setup is now finished and the main fuzzing loopstarts

During the main loop the agent requests a new inputusing the HC_GET_INPUT hypercall 4copy The fuzzing logicproduces a new input and sends it to QEMU-PT SinceQEMU-PT has full access to the guestrsquos memory spaceit can simply copy the input into the buffer specified bythe agent Then it performs a VM-Entry to continue ex-ecuting the VM 5copy At the same time this VM-Entryevent enables the PT tracing mechanism The agent nowconsumes the input and interacts with the kernel (egit interprets the input as a file system image and tries tomount it 6copy) While the kernel is being fuzzed QEMU-PT decodes the trace data and updates the bitmap on de-mand Once the interaction is finished and the kernelhanded control back to the agent the agent notifies thehypervisor via a HC_FINISHED hypercall The resultingVM-Exit stops the tracing and QEMU-PT decodes theremaining trace data 7copy The resulting bitmap is passed

170 26th USENIX Security Symposium USENIX Association

Figure 2 Overview of the kAFL hypercall interaction

to the logic for further processing 8copy Afterwards theagent can continue to run any untraced clean-up routinesbefore issuing another HC_GET_INPUT to start the nextloop iteration

31 Fuzzing Logic

The fuzzing logic is the command and controlling com-ponent of kAFL It manages the queue of interestinginputs creates mutated inputs and schedules them forevaluation In most aspects it is based on the algo-rithms used by AFL Similarly to AFL we use a bitmapto store basic block transitions We gather the AFLbitmap from the VMs through an interface to QEMU-PTand decide which inputs triggered interesting behaviourThe fuzzing logic also coordinates the number of VMsspawned in parallel One of the bigger design differencesto AFL is that kAFL makes extensive use of multipro-cessing and parallelism where AFL simply spawns mul-tiple independent fuzzers which synchronize their inputqueues sporadically1 In contrast kAFL executes the de-terministic stage in parallel and all threads work on themost interesting input A significant amount of time isspent in tasks that are not CPU-bound (such as gueststhat delay execution) Therefore using many parallelprocesses (upto 5-6 per CPU core) drastically improvesperformance of the fuzzing process due to a higher CPUload per core Lastly the fuzzing logic communicateswith the user interface to display current statistics in reg-ular intervals

1AFL recently added experimental support for distributing thedeterministic stage see httpsgithubcommirroreraflblobmasterdocsparallel_fuzzingtxtL60-L66

32 User Mode AgentWe expect a user mode agent to run inside the virtual-ized target OS In principle this component only has tosynchronize and gather new inputs by the fuzzing logicvia the hypercall interface and use it to interact with theguestrsquos kernel Example agents are programs that try tomount inputs as file system images pass specific filessuch as certificates to kernel parser or even execute achain of various syscalls

In theory we only need one such component In prac-tice we use two different components The first programis the loader component Its job is to accept an arbitrarybinary via the hypercall interface This binary representsthe user mode agent and is executed by the loader com-ponent Additionally the loader component will checkif the agent has crashed (which happens often in case ofsyscall fuzzing) and restarts it if necessary This setuphas the advantage that we can pass any binary to theVM and reuse VM snapshots for different fuzzing com-ponents

33 Virtualization InfrastructureThe fuzzing logic uses QEMU-PT to interact with KVM-PT to spawn the target VMs KVM-PT allows us to traceindividual vCPUs instead of logical CPUs This com-ponent configures and enables Intel PT on the respec-tive logical CPU before the CPU switches to guest ex-ecution and disables tracing during the VM-Exit tran-sition This way the associated CPU will only pro-vide trace data of the virtualized kernel itself QEMU-PT is used to interact with the KVM-PT interface toconfigure and toggle Intel PT from user space and ac-cess the output buffer to decode the trace data The

USENIX Association 26th USENIX Security Symposium 171

decoded trace data is directly translated into a streamof addresses of executed conditional branch instruc-tions Moreover QEMU-PT also filters the stream ofexecuted addressesmdashbased on previous knowledge ofnon-deterministic basic blocksmdashto prevent false-positivefuzzing results and makes those available to the fuzzinglogic as AFL-compatible bitmaps We use our own cus-tom Intel PT decoder to cache disassembly results whichleads to significant performance gains compared to theoff-the-shelf solution provided by Intel

34 Stateful and Non-Deterministic Code

Tracing operating systems results in a significant amountof non-determinism The largest source of non-deterministic basic block transitions are interrupts whichcan occur at any point in time Additionally our imple-mentation does not reset the whole state after each exe-cution since reloading the VM from a memory snapshotis costly Thus we have to deal with the stateful and asyn-chronous aspects of the kernel An example for statefulcode might be a simple call to kmalloc() Dependingon the number of previous allocations kmalloc() mightsimply return a fresh pointer or map a whole range ofpages and update a significant amount of metadata Weuse two techniques to deal with these challenges

The first one is to filter out interrupts and the transi-tion caused while handling interrupts This is possibleusing the Intel PT trace data If an interrupt occurs theprocessor emits a TIP instruction since the transfer is notvisible in the code To avoid confusion during an inter-rupt occurring at an indirect control flow instruction theTIP packet is marked with FUP (flow update packet) toindicate an asynchronous event After identifying such asignature the decoder will drop all basic blocks visiteduntil the corresponding iret instruction is encounteredTo link the interrupts with their corresponding iret wetrack all interrupts on a simple call stack This mecha-nism is necessary since the interrupt handler may itselfbe interrupted by another interrupt

The second mechanism is to blacklist any basic blockthat occurs non-deterministically Each time we en-counter a new bit in the AFL bitmap we re-run the in-put several times in a row Every basic block that doesnot show up in all of the trials will be marked as non-deterministic and filtered from further processing Forfast access the results are stored in a bitmap of black-listed basic block addresses During the AFL bitmaptranslation any transition hash valuemdashwhich combinesthe current basic block address and the previous ba-sic block addressmdashinvolving a blacklisted block will beskipped

35 Hypercalls

Hypercalls are a feature introduced by virtualization OnIntel platforms hypercalls are triggered by the vmcallinstruction Hypercalls are to VMMs as syscalls are tokernels If any ring 3 process or the kernel in the VM ex-ecutes a vmcall instruction a VM-Exit event is triggeredand the VMM can decide how to process the hypercallWe patched KVM-PT to pass through our own set of hy-percalls to the fuzzing logic if a magic value is passedin rax and the appropriate hypercall-ID is set in rbxAdditionally we also patched KVM-PT to accept hyper-calls from ring 3 Arguments for specific hypercalls arepassed through rcx We use this mechanism to definean interface that user mode agent can use to communi-cate with the fuzzing logic One example hypercall isHC_SUBMIT_BUFFER Its argument is a guest pointer thatis stored in rcx Upon executing the vmcall instructiona VM-Exit is triggered and QEMU-PT stores the bufferpointer that was passed It will later copy the new inputdata into this buffer (see step 5copy in Figure 2) Finally theexecution of the VM is continued

climov rax KAFL_MAGIC_VALUEmov rbx HC_CRASHmov rcx 0x0vmcall

Listing 1 Hypercall crash notifier

Another use case for this interface is to notify thefuzzing logic when a crash occurs in the target OS kernelIn order to do so we overwrite the kernel crash handlerof the OS with a simple hypercall routine The injectedcode is shown in Listing 1 and displays how the hyper-call interface is used on the assembly level The cli in-struction disables all interrupts to avoid any kind of asyn-chronous interference during the hypercall routine

4 Implementation Details

Based on the design outlined in the previous section webuilt a prototype of our approach called kAFL In the fol-lowing we describe several implementation details Thesource code of our reference implementation is availableat httpsgithubcomRUB-SysSeckAFL

41 KVM-PT

Intel PT allows us to trace branch transitions withoutpatching or recompiling the targeted kernel To the bestof our knowledge no publicly available driver is able totrace only guest executions of a single vCPU using In-tel PT for long periods of time For instance Simple-PT[29] does not support long-term tracing by design The

172 26th USENIX Security Symposium USENIX Association

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 4: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

22 Intel VT-x

The kernel fuzzing approach introduced in this paper re-lies on modern x86-64 hardware virtualization technol-ogy Hence we provide a brief overview of Intelrsquos hard-ware virtualization technology Intel VT-x

We differentiate between three kinds of CPUs phys-ical CPUs logical CPUs and virtual CPUs (vCPUs) Aphysical CPU is a CPU that is implemented in hardwareMost modern CPUs support mechanisms to increasemultithreading performance without additional physicalCPU cores on the die (eg Intel Hyper-Threading) Inthis case there are multiple logical CPUs on one phys-ical CPU These different logical CPUs share the phys-ical CPU and thus only one of them can be active ata time However the execution of the different logicalCPUs is interleaved by the hardware and therefore theavailable resources can be utilized more efficiently (egone logical CPU uses the arithmetic logic unit while an-other logical CPU waits for a data fetch) and the oper-ating system can reduce the scheduling overhead Eachlogical CPU is usually treated like a whole CPU by theoperating system Finally it is possible to create multiplehardware-supported virtual machines (VMs) on a singlelogical CPU In this case each VM has a set of its ownvCPUs

The virtualization role model is divided into two com-ponents the virtual machine monitor (VMM) and theVM The VMM also named hypervisor or host is priv-ileged software that has full control over the physicalCPU and provides virtualized guests with restricted ac-cess to physical resources The VM also termed guest isa piece of software that is transparently executed withinthe virtualized context provided by the VMM

To provide full hardware-assisted virtualization sup-port Intel VT-x adds two additional execution modesto the well-known protection ring based standard modeof execution The default mode of executions is calledVMX OFF It does not implement any hardware virtual-ization support When using hardware-supported virtual-ization the CPU switches into the VMX ON state and dis-tinguishes between two different execution modes thehigher-privileged mode of the hypervisor (VMX root orVMM) and the lower privileged execution mode of thevirtual machine guest (VMX non-root or VM)

When running in guest mode several privileged ac-tions or reasons (execution of restricted instructions ex-pired VMX-preemption timer or access to certain em-ulated devices) in the VM guest will trigger a VM-Exitevent and transfer control to the hypervisor This wayit is possible to run arbitrary software that expects priv-ileged access to the hardware (such as an OS) inside aVM At the same time a higher authority can meditate

and control the operations performed with a small per-formance overhead

To create launch and control a VM the VMM has touse a virtual machine control structure (VMCS) for eachvCPU [28] The VMCS contains all essential informa-tion about the current state and how to perform VMXtransitions of the vCPU

23 Intel Processor TraceWith the fifth generation of Intel Core processors (Broad-well architecture) Intel has introduced a new processorfeature called Intel Processor Trace (Intel PT) to provideexecution and branch tracing information Unlike otherbranch tracing technologies such as Intel Last BranchRecord (LBR) the size of the output buffer is no longerstrictly limited by special registers Instead it is onlylimited by the size of the main memory If the outputtarget is repeatedly and timely emptied we can createtraces of arbitrary length The processorrsquos output formatis packet-oriented and separated into two different typesgeneral execution information and control flow informa-tion packets Intel PT produces various types of con-trol flow related packet types during runtime To obtaincontrol-flow information from the trace data we requirea decoder The decoder needs the traced software to in-terpret the packets that contain the addresses of condi-tional branches

Intel specifies five types of control flow affecting in-structions called Change of Flow Instruction (CoFI) Theexecution of different CoFI types results in different se-quences of flow information packets The three CoFItypes relevant to our work are

1 Taken-Not-Taken (TNT) If the processor exe-cutes any conditional jump the decision whetherthis jump was taken or not is encoded in a TNTpacket

2 Target IP (TIP) If the processor executes an indi-rect jump or transfer instruction the decoder willnot be able to recover the control flow There-fore the processor produces a TIP packet upon theexecution of an instruction of the type indirectbranch near ret or far transfer These TIPpackets store the corresponding target instructionpointer executed by the processor after the transferor jump has occurred

3 Flow Update Packets (FUP) Another case wherethe processor must produce a hint packet for thesoftware decoder are asynchronous events such asinterrupts or traps These events are recorded asFUPs and usually followed by a TIP to indicate thefollowing instruction

USENIX Association 26th USENIX Security Symposium 169

To limit the amount of trace data generated Intel PTprovides multiple options for runtime filtering Depend-ing on the given processor it might be possible to con-figure multiple ranges for instruction-pointer filtering (IPFilter) In general these filter ranges only affect virtualaddresses if paging is enabled this is always the casein x86-64 long-mode Therefore it is possible to limittrace generation to selected ranges and thus avoid hugeamounts of superfluous trace data In accordance to theIP filtering mechanism it is possible to filter traces bythe current privilege level (CPL) of the protection ringmodel (eg ring 0 or ring 3) This filter allows us to selectonly the user mode (CPL gt 0) or kernel mode (CPL = 0)activity kAFL utilizes this filter option to limit tracingexplicitly to kernel mode execution In most cases thefocus of tracing is not the whole OS within all user modeprocesses and their kernel interactions To limit tracedata generation to one specific virtual memory addressspace software can use the CR3 Filter Intel PT will onlyproduce trace data if the CR3 value matches the config-ured filter value The CR3 register contains the pointer tothe current page table The value of the CR3 register canthus be used to filter code executed on behalf of a certainring 3 process even in ring 0 mode

Intel PT supports various configurable target domainsfor output data kAFL focuses on the Table of PhysicalAddresses (ToPA) mechanism that enables us to specifymultiple output regions Every ToPA table contains mul-tiple ToPA entries which in turn contain the physical ad-dress of the associated memory chunk used to store tracedata Each ToPA entry contains the physical address asize specifier for the referred memory chunk in physicalmemory and multiple type bits These type bits specifythe CPUrsquos behavior on access of the ToPA entry and howto deal with filled output regions

3 System Overview

We now provide a high-level overview of the design of anOS-independent and hardware-assisted feedback fuzzerbefore presenting the implementation details of our toolcalled kAFL in Section 4

Our system is split into three components thefuzzing logic the VM infrastructure (modified versionsof QEMU and KVM denoted by QEMU-PT and KVM-PT) and the user mode agent The fuzzing logic runsas a ring 3 process on the host OS This logic is also re-ferred to as kAFL The VM infrastructure consists of aring 3 component (QEMU-PT) and a ring 0 component(KVM-PT) This facilitates communication between theother two components and makes the Intel PT trace dataavailable to the fuzzing logic In general the guest onlycommunicates with the host via hypercalls The host canthen read and write guest memory and continues VM ex-

Figure 1 High-level overview of the kAFL architectureThe setup process ( 1copy- 3copy) is not shown

ecution once the request has been handled A overviewof the architecture can be seen in Figure 1

We now outline the events and communication thattake place during a fuzz run as depicted in Figure 2When the VM is started the first part of the user modeagent (the loader) uses the hypercall HC_SUBMIT_PANICto submit the address of the kernel panic handler (or theBugCheck kernel address in Windows) to QEMU-PT 1copyQEMU-PT then patches a hypercall calling routine at theaddress of the panic handler This allows us to get noti-fied and react fast to crashes in the VM (instead of wait-ing for timeouts reboots)

Then the loader uses the hypercall HC_GET_PROGRAM torequest the actual user mode agent and starts it 2copy Nowthe loader setup is complete and the fuzzer begins its ini-tialization The agent triggers a HC_SUBMIT_CR3 hyper-call that will be handled by KVM-PT The hypervisorextracts the CR3 value of the currently running processand hands it over to QEMU-PT for filtering 3copy Finallythe agent uses the hypercall HC_SUBMIT_BUFFER to in-form the host at which address it expects its inputs Thefuzzer setup is now finished and the main fuzzing loopstarts

During the main loop the agent requests a new inputusing the HC_GET_INPUT hypercall 4copy The fuzzing logicproduces a new input and sends it to QEMU-PT SinceQEMU-PT has full access to the guestrsquos memory spaceit can simply copy the input into the buffer specified bythe agent Then it performs a VM-Entry to continue ex-ecuting the VM 5copy At the same time this VM-Entryevent enables the PT tracing mechanism The agent nowconsumes the input and interacts with the kernel (egit interprets the input as a file system image and tries tomount it 6copy) While the kernel is being fuzzed QEMU-PT decodes the trace data and updates the bitmap on de-mand Once the interaction is finished and the kernelhanded control back to the agent the agent notifies thehypervisor via a HC_FINISHED hypercall The resultingVM-Exit stops the tracing and QEMU-PT decodes theremaining trace data 7copy The resulting bitmap is passed

170 26th USENIX Security Symposium USENIX Association

Figure 2 Overview of the kAFL hypercall interaction

to the logic for further processing 8copy Afterwards theagent can continue to run any untraced clean-up routinesbefore issuing another HC_GET_INPUT to start the nextloop iteration

31 Fuzzing Logic

The fuzzing logic is the command and controlling com-ponent of kAFL It manages the queue of interestinginputs creates mutated inputs and schedules them forevaluation In most aspects it is based on the algo-rithms used by AFL Similarly to AFL we use a bitmapto store basic block transitions We gather the AFLbitmap from the VMs through an interface to QEMU-PTand decide which inputs triggered interesting behaviourThe fuzzing logic also coordinates the number of VMsspawned in parallel One of the bigger design differencesto AFL is that kAFL makes extensive use of multipro-cessing and parallelism where AFL simply spawns mul-tiple independent fuzzers which synchronize their inputqueues sporadically1 In contrast kAFL executes the de-terministic stage in parallel and all threads work on themost interesting input A significant amount of time isspent in tasks that are not CPU-bound (such as gueststhat delay execution) Therefore using many parallelprocesses (upto 5-6 per CPU core) drastically improvesperformance of the fuzzing process due to a higher CPUload per core Lastly the fuzzing logic communicateswith the user interface to display current statistics in reg-ular intervals

1AFL recently added experimental support for distributing thedeterministic stage see httpsgithubcommirroreraflblobmasterdocsparallel_fuzzingtxtL60-L66

32 User Mode AgentWe expect a user mode agent to run inside the virtual-ized target OS In principle this component only has tosynchronize and gather new inputs by the fuzzing logicvia the hypercall interface and use it to interact with theguestrsquos kernel Example agents are programs that try tomount inputs as file system images pass specific filessuch as certificates to kernel parser or even execute achain of various syscalls

In theory we only need one such component In prac-tice we use two different components The first programis the loader component Its job is to accept an arbitrarybinary via the hypercall interface This binary representsthe user mode agent and is executed by the loader com-ponent Additionally the loader component will checkif the agent has crashed (which happens often in case ofsyscall fuzzing) and restarts it if necessary This setuphas the advantage that we can pass any binary to theVM and reuse VM snapshots for different fuzzing com-ponents

33 Virtualization InfrastructureThe fuzzing logic uses QEMU-PT to interact with KVM-PT to spawn the target VMs KVM-PT allows us to traceindividual vCPUs instead of logical CPUs This com-ponent configures and enables Intel PT on the respec-tive logical CPU before the CPU switches to guest ex-ecution and disables tracing during the VM-Exit tran-sition This way the associated CPU will only pro-vide trace data of the virtualized kernel itself QEMU-PT is used to interact with the KVM-PT interface toconfigure and toggle Intel PT from user space and ac-cess the output buffer to decode the trace data The

USENIX Association 26th USENIX Security Symposium 171

decoded trace data is directly translated into a streamof addresses of executed conditional branch instruc-tions Moreover QEMU-PT also filters the stream ofexecuted addressesmdashbased on previous knowledge ofnon-deterministic basic blocksmdashto prevent false-positivefuzzing results and makes those available to the fuzzinglogic as AFL-compatible bitmaps We use our own cus-tom Intel PT decoder to cache disassembly results whichleads to significant performance gains compared to theoff-the-shelf solution provided by Intel

34 Stateful and Non-Deterministic Code

Tracing operating systems results in a significant amountof non-determinism The largest source of non-deterministic basic block transitions are interrupts whichcan occur at any point in time Additionally our imple-mentation does not reset the whole state after each exe-cution since reloading the VM from a memory snapshotis costly Thus we have to deal with the stateful and asyn-chronous aspects of the kernel An example for statefulcode might be a simple call to kmalloc() Dependingon the number of previous allocations kmalloc() mightsimply return a fresh pointer or map a whole range ofpages and update a significant amount of metadata Weuse two techniques to deal with these challenges

The first one is to filter out interrupts and the transi-tion caused while handling interrupts This is possibleusing the Intel PT trace data If an interrupt occurs theprocessor emits a TIP instruction since the transfer is notvisible in the code To avoid confusion during an inter-rupt occurring at an indirect control flow instruction theTIP packet is marked with FUP (flow update packet) toindicate an asynchronous event After identifying such asignature the decoder will drop all basic blocks visiteduntil the corresponding iret instruction is encounteredTo link the interrupts with their corresponding iret wetrack all interrupts on a simple call stack This mecha-nism is necessary since the interrupt handler may itselfbe interrupted by another interrupt

The second mechanism is to blacklist any basic blockthat occurs non-deterministically Each time we en-counter a new bit in the AFL bitmap we re-run the in-put several times in a row Every basic block that doesnot show up in all of the trials will be marked as non-deterministic and filtered from further processing Forfast access the results are stored in a bitmap of black-listed basic block addresses During the AFL bitmaptranslation any transition hash valuemdashwhich combinesthe current basic block address and the previous ba-sic block addressmdashinvolving a blacklisted block will beskipped

35 Hypercalls

Hypercalls are a feature introduced by virtualization OnIntel platforms hypercalls are triggered by the vmcallinstruction Hypercalls are to VMMs as syscalls are tokernels If any ring 3 process or the kernel in the VM ex-ecutes a vmcall instruction a VM-Exit event is triggeredand the VMM can decide how to process the hypercallWe patched KVM-PT to pass through our own set of hy-percalls to the fuzzing logic if a magic value is passedin rax and the appropriate hypercall-ID is set in rbxAdditionally we also patched KVM-PT to accept hyper-calls from ring 3 Arguments for specific hypercalls arepassed through rcx We use this mechanism to definean interface that user mode agent can use to communi-cate with the fuzzing logic One example hypercall isHC_SUBMIT_BUFFER Its argument is a guest pointer thatis stored in rcx Upon executing the vmcall instructiona VM-Exit is triggered and QEMU-PT stores the bufferpointer that was passed It will later copy the new inputdata into this buffer (see step 5copy in Figure 2) Finally theexecution of the VM is continued

climov rax KAFL_MAGIC_VALUEmov rbx HC_CRASHmov rcx 0x0vmcall

Listing 1 Hypercall crash notifier

Another use case for this interface is to notify thefuzzing logic when a crash occurs in the target OS kernelIn order to do so we overwrite the kernel crash handlerof the OS with a simple hypercall routine The injectedcode is shown in Listing 1 and displays how the hyper-call interface is used on the assembly level The cli in-struction disables all interrupts to avoid any kind of asyn-chronous interference during the hypercall routine

4 Implementation Details

Based on the design outlined in the previous section webuilt a prototype of our approach called kAFL In the fol-lowing we describe several implementation details Thesource code of our reference implementation is availableat httpsgithubcomRUB-SysSeckAFL

41 KVM-PT

Intel PT allows us to trace branch transitions withoutpatching or recompiling the targeted kernel To the bestof our knowledge no publicly available driver is able totrace only guest executions of a single vCPU using In-tel PT for long periods of time For instance Simple-PT[29] does not support long-term tracing by design The

172 26th USENIX Security Symposium USENIX Association

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 5: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

To limit the amount of trace data generated Intel PTprovides multiple options for runtime filtering Depend-ing on the given processor it might be possible to con-figure multiple ranges for instruction-pointer filtering (IPFilter) In general these filter ranges only affect virtualaddresses if paging is enabled this is always the casein x86-64 long-mode Therefore it is possible to limittrace generation to selected ranges and thus avoid hugeamounts of superfluous trace data In accordance to theIP filtering mechanism it is possible to filter traces bythe current privilege level (CPL) of the protection ringmodel (eg ring 0 or ring 3) This filter allows us to selectonly the user mode (CPL gt 0) or kernel mode (CPL = 0)activity kAFL utilizes this filter option to limit tracingexplicitly to kernel mode execution In most cases thefocus of tracing is not the whole OS within all user modeprocesses and their kernel interactions To limit tracedata generation to one specific virtual memory addressspace software can use the CR3 Filter Intel PT will onlyproduce trace data if the CR3 value matches the config-ured filter value The CR3 register contains the pointer tothe current page table The value of the CR3 register canthus be used to filter code executed on behalf of a certainring 3 process even in ring 0 mode

Intel PT supports various configurable target domainsfor output data kAFL focuses on the Table of PhysicalAddresses (ToPA) mechanism that enables us to specifymultiple output regions Every ToPA table contains mul-tiple ToPA entries which in turn contain the physical ad-dress of the associated memory chunk used to store tracedata Each ToPA entry contains the physical address asize specifier for the referred memory chunk in physicalmemory and multiple type bits These type bits specifythe CPUrsquos behavior on access of the ToPA entry and howto deal with filled output regions

3 System Overview

We now provide a high-level overview of the design of anOS-independent and hardware-assisted feedback fuzzerbefore presenting the implementation details of our toolcalled kAFL in Section 4

Our system is split into three components thefuzzing logic the VM infrastructure (modified versionsof QEMU and KVM denoted by QEMU-PT and KVM-PT) and the user mode agent The fuzzing logic runsas a ring 3 process on the host OS This logic is also re-ferred to as kAFL The VM infrastructure consists of aring 3 component (QEMU-PT) and a ring 0 component(KVM-PT) This facilitates communication between theother two components and makes the Intel PT trace dataavailable to the fuzzing logic In general the guest onlycommunicates with the host via hypercalls The host canthen read and write guest memory and continues VM ex-

Figure 1 High-level overview of the kAFL architectureThe setup process ( 1copy- 3copy) is not shown

ecution once the request has been handled A overviewof the architecture can be seen in Figure 1

We now outline the events and communication thattake place during a fuzz run as depicted in Figure 2When the VM is started the first part of the user modeagent (the loader) uses the hypercall HC_SUBMIT_PANICto submit the address of the kernel panic handler (or theBugCheck kernel address in Windows) to QEMU-PT 1copyQEMU-PT then patches a hypercall calling routine at theaddress of the panic handler This allows us to get noti-fied and react fast to crashes in the VM (instead of wait-ing for timeouts reboots)

Then the loader uses the hypercall HC_GET_PROGRAM torequest the actual user mode agent and starts it 2copy Nowthe loader setup is complete and the fuzzer begins its ini-tialization The agent triggers a HC_SUBMIT_CR3 hyper-call that will be handled by KVM-PT The hypervisorextracts the CR3 value of the currently running processand hands it over to QEMU-PT for filtering 3copy Finallythe agent uses the hypercall HC_SUBMIT_BUFFER to in-form the host at which address it expects its inputs Thefuzzer setup is now finished and the main fuzzing loopstarts

During the main loop the agent requests a new inputusing the HC_GET_INPUT hypercall 4copy The fuzzing logicproduces a new input and sends it to QEMU-PT SinceQEMU-PT has full access to the guestrsquos memory spaceit can simply copy the input into the buffer specified bythe agent Then it performs a VM-Entry to continue ex-ecuting the VM 5copy At the same time this VM-Entryevent enables the PT tracing mechanism The agent nowconsumes the input and interacts with the kernel (egit interprets the input as a file system image and tries tomount it 6copy) While the kernel is being fuzzed QEMU-PT decodes the trace data and updates the bitmap on de-mand Once the interaction is finished and the kernelhanded control back to the agent the agent notifies thehypervisor via a HC_FINISHED hypercall The resultingVM-Exit stops the tracing and QEMU-PT decodes theremaining trace data 7copy The resulting bitmap is passed

170 26th USENIX Security Symposium USENIX Association

Figure 2 Overview of the kAFL hypercall interaction

to the logic for further processing 8copy Afterwards theagent can continue to run any untraced clean-up routinesbefore issuing another HC_GET_INPUT to start the nextloop iteration

31 Fuzzing Logic

The fuzzing logic is the command and controlling com-ponent of kAFL It manages the queue of interestinginputs creates mutated inputs and schedules them forevaluation In most aspects it is based on the algo-rithms used by AFL Similarly to AFL we use a bitmapto store basic block transitions We gather the AFLbitmap from the VMs through an interface to QEMU-PTand decide which inputs triggered interesting behaviourThe fuzzing logic also coordinates the number of VMsspawned in parallel One of the bigger design differencesto AFL is that kAFL makes extensive use of multipro-cessing and parallelism where AFL simply spawns mul-tiple independent fuzzers which synchronize their inputqueues sporadically1 In contrast kAFL executes the de-terministic stage in parallel and all threads work on themost interesting input A significant amount of time isspent in tasks that are not CPU-bound (such as gueststhat delay execution) Therefore using many parallelprocesses (upto 5-6 per CPU core) drastically improvesperformance of the fuzzing process due to a higher CPUload per core Lastly the fuzzing logic communicateswith the user interface to display current statistics in reg-ular intervals

1AFL recently added experimental support for distributing thedeterministic stage see httpsgithubcommirroreraflblobmasterdocsparallel_fuzzingtxtL60-L66

32 User Mode AgentWe expect a user mode agent to run inside the virtual-ized target OS In principle this component only has tosynchronize and gather new inputs by the fuzzing logicvia the hypercall interface and use it to interact with theguestrsquos kernel Example agents are programs that try tomount inputs as file system images pass specific filessuch as certificates to kernel parser or even execute achain of various syscalls

In theory we only need one such component In prac-tice we use two different components The first programis the loader component Its job is to accept an arbitrarybinary via the hypercall interface This binary representsthe user mode agent and is executed by the loader com-ponent Additionally the loader component will checkif the agent has crashed (which happens often in case ofsyscall fuzzing) and restarts it if necessary This setuphas the advantage that we can pass any binary to theVM and reuse VM snapshots for different fuzzing com-ponents

33 Virtualization InfrastructureThe fuzzing logic uses QEMU-PT to interact with KVM-PT to spawn the target VMs KVM-PT allows us to traceindividual vCPUs instead of logical CPUs This com-ponent configures and enables Intel PT on the respec-tive logical CPU before the CPU switches to guest ex-ecution and disables tracing during the VM-Exit tran-sition This way the associated CPU will only pro-vide trace data of the virtualized kernel itself QEMU-PT is used to interact with the KVM-PT interface toconfigure and toggle Intel PT from user space and ac-cess the output buffer to decode the trace data The

USENIX Association 26th USENIX Security Symposium 171

decoded trace data is directly translated into a streamof addresses of executed conditional branch instruc-tions Moreover QEMU-PT also filters the stream ofexecuted addressesmdashbased on previous knowledge ofnon-deterministic basic blocksmdashto prevent false-positivefuzzing results and makes those available to the fuzzinglogic as AFL-compatible bitmaps We use our own cus-tom Intel PT decoder to cache disassembly results whichleads to significant performance gains compared to theoff-the-shelf solution provided by Intel

34 Stateful and Non-Deterministic Code

Tracing operating systems results in a significant amountof non-determinism The largest source of non-deterministic basic block transitions are interrupts whichcan occur at any point in time Additionally our imple-mentation does not reset the whole state after each exe-cution since reloading the VM from a memory snapshotis costly Thus we have to deal with the stateful and asyn-chronous aspects of the kernel An example for statefulcode might be a simple call to kmalloc() Dependingon the number of previous allocations kmalloc() mightsimply return a fresh pointer or map a whole range ofpages and update a significant amount of metadata Weuse two techniques to deal with these challenges

The first one is to filter out interrupts and the transi-tion caused while handling interrupts This is possibleusing the Intel PT trace data If an interrupt occurs theprocessor emits a TIP instruction since the transfer is notvisible in the code To avoid confusion during an inter-rupt occurring at an indirect control flow instruction theTIP packet is marked with FUP (flow update packet) toindicate an asynchronous event After identifying such asignature the decoder will drop all basic blocks visiteduntil the corresponding iret instruction is encounteredTo link the interrupts with their corresponding iret wetrack all interrupts on a simple call stack This mecha-nism is necessary since the interrupt handler may itselfbe interrupted by another interrupt

The second mechanism is to blacklist any basic blockthat occurs non-deterministically Each time we en-counter a new bit in the AFL bitmap we re-run the in-put several times in a row Every basic block that doesnot show up in all of the trials will be marked as non-deterministic and filtered from further processing Forfast access the results are stored in a bitmap of black-listed basic block addresses During the AFL bitmaptranslation any transition hash valuemdashwhich combinesthe current basic block address and the previous ba-sic block addressmdashinvolving a blacklisted block will beskipped

35 Hypercalls

Hypercalls are a feature introduced by virtualization OnIntel platforms hypercalls are triggered by the vmcallinstruction Hypercalls are to VMMs as syscalls are tokernels If any ring 3 process or the kernel in the VM ex-ecutes a vmcall instruction a VM-Exit event is triggeredand the VMM can decide how to process the hypercallWe patched KVM-PT to pass through our own set of hy-percalls to the fuzzing logic if a magic value is passedin rax and the appropriate hypercall-ID is set in rbxAdditionally we also patched KVM-PT to accept hyper-calls from ring 3 Arguments for specific hypercalls arepassed through rcx We use this mechanism to definean interface that user mode agent can use to communi-cate with the fuzzing logic One example hypercall isHC_SUBMIT_BUFFER Its argument is a guest pointer thatis stored in rcx Upon executing the vmcall instructiona VM-Exit is triggered and QEMU-PT stores the bufferpointer that was passed It will later copy the new inputdata into this buffer (see step 5copy in Figure 2) Finally theexecution of the VM is continued

climov rax KAFL_MAGIC_VALUEmov rbx HC_CRASHmov rcx 0x0vmcall

Listing 1 Hypercall crash notifier

Another use case for this interface is to notify thefuzzing logic when a crash occurs in the target OS kernelIn order to do so we overwrite the kernel crash handlerof the OS with a simple hypercall routine The injectedcode is shown in Listing 1 and displays how the hyper-call interface is used on the assembly level The cli in-struction disables all interrupts to avoid any kind of asyn-chronous interference during the hypercall routine

4 Implementation Details

Based on the design outlined in the previous section webuilt a prototype of our approach called kAFL In the fol-lowing we describe several implementation details Thesource code of our reference implementation is availableat httpsgithubcomRUB-SysSeckAFL

41 KVM-PT

Intel PT allows us to trace branch transitions withoutpatching or recompiling the targeted kernel To the bestof our knowledge no publicly available driver is able totrace only guest executions of a single vCPU using In-tel PT for long periods of time For instance Simple-PT[29] does not support long-term tracing by design The

172 26th USENIX Security Symposium USENIX Association

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 6: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

Figure 2 Overview of the kAFL hypercall interaction

to the logic for further processing 8copy Afterwards theagent can continue to run any untraced clean-up routinesbefore issuing another HC_GET_INPUT to start the nextloop iteration

31 Fuzzing Logic

The fuzzing logic is the command and controlling com-ponent of kAFL It manages the queue of interestinginputs creates mutated inputs and schedules them forevaluation In most aspects it is based on the algo-rithms used by AFL Similarly to AFL we use a bitmapto store basic block transitions We gather the AFLbitmap from the VMs through an interface to QEMU-PTand decide which inputs triggered interesting behaviourThe fuzzing logic also coordinates the number of VMsspawned in parallel One of the bigger design differencesto AFL is that kAFL makes extensive use of multipro-cessing and parallelism where AFL simply spawns mul-tiple independent fuzzers which synchronize their inputqueues sporadically1 In contrast kAFL executes the de-terministic stage in parallel and all threads work on themost interesting input A significant amount of time isspent in tasks that are not CPU-bound (such as gueststhat delay execution) Therefore using many parallelprocesses (upto 5-6 per CPU core) drastically improvesperformance of the fuzzing process due to a higher CPUload per core Lastly the fuzzing logic communicateswith the user interface to display current statistics in reg-ular intervals

1AFL recently added experimental support for distributing thedeterministic stage see httpsgithubcommirroreraflblobmasterdocsparallel_fuzzingtxtL60-L66

32 User Mode AgentWe expect a user mode agent to run inside the virtual-ized target OS In principle this component only has tosynchronize and gather new inputs by the fuzzing logicvia the hypercall interface and use it to interact with theguestrsquos kernel Example agents are programs that try tomount inputs as file system images pass specific filessuch as certificates to kernel parser or even execute achain of various syscalls

In theory we only need one such component In prac-tice we use two different components The first programis the loader component Its job is to accept an arbitrarybinary via the hypercall interface This binary representsthe user mode agent and is executed by the loader com-ponent Additionally the loader component will checkif the agent has crashed (which happens often in case ofsyscall fuzzing) and restarts it if necessary This setuphas the advantage that we can pass any binary to theVM and reuse VM snapshots for different fuzzing com-ponents

33 Virtualization InfrastructureThe fuzzing logic uses QEMU-PT to interact with KVM-PT to spawn the target VMs KVM-PT allows us to traceindividual vCPUs instead of logical CPUs This com-ponent configures and enables Intel PT on the respec-tive logical CPU before the CPU switches to guest ex-ecution and disables tracing during the VM-Exit tran-sition This way the associated CPU will only pro-vide trace data of the virtualized kernel itself QEMU-PT is used to interact with the KVM-PT interface toconfigure and toggle Intel PT from user space and ac-cess the output buffer to decode the trace data The

USENIX Association 26th USENIX Security Symposium 171

decoded trace data is directly translated into a streamof addresses of executed conditional branch instruc-tions Moreover QEMU-PT also filters the stream ofexecuted addressesmdashbased on previous knowledge ofnon-deterministic basic blocksmdashto prevent false-positivefuzzing results and makes those available to the fuzzinglogic as AFL-compatible bitmaps We use our own cus-tom Intel PT decoder to cache disassembly results whichleads to significant performance gains compared to theoff-the-shelf solution provided by Intel

34 Stateful and Non-Deterministic Code

Tracing operating systems results in a significant amountof non-determinism The largest source of non-deterministic basic block transitions are interrupts whichcan occur at any point in time Additionally our imple-mentation does not reset the whole state after each exe-cution since reloading the VM from a memory snapshotis costly Thus we have to deal with the stateful and asyn-chronous aspects of the kernel An example for statefulcode might be a simple call to kmalloc() Dependingon the number of previous allocations kmalloc() mightsimply return a fresh pointer or map a whole range ofpages and update a significant amount of metadata Weuse two techniques to deal with these challenges

The first one is to filter out interrupts and the transi-tion caused while handling interrupts This is possibleusing the Intel PT trace data If an interrupt occurs theprocessor emits a TIP instruction since the transfer is notvisible in the code To avoid confusion during an inter-rupt occurring at an indirect control flow instruction theTIP packet is marked with FUP (flow update packet) toindicate an asynchronous event After identifying such asignature the decoder will drop all basic blocks visiteduntil the corresponding iret instruction is encounteredTo link the interrupts with their corresponding iret wetrack all interrupts on a simple call stack This mecha-nism is necessary since the interrupt handler may itselfbe interrupted by another interrupt

The second mechanism is to blacklist any basic blockthat occurs non-deterministically Each time we en-counter a new bit in the AFL bitmap we re-run the in-put several times in a row Every basic block that doesnot show up in all of the trials will be marked as non-deterministic and filtered from further processing Forfast access the results are stored in a bitmap of black-listed basic block addresses During the AFL bitmaptranslation any transition hash valuemdashwhich combinesthe current basic block address and the previous ba-sic block addressmdashinvolving a blacklisted block will beskipped

35 Hypercalls

Hypercalls are a feature introduced by virtualization OnIntel platforms hypercalls are triggered by the vmcallinstruction Hypercalls are to VMMs as syscalls are tokernels If any ring 3 process or the kernel in the VM ex-ecutes a vmcall instruction a VM-Exit event is triggeredand the VMM can decide how to process the hypercallWe patched KVM-PT to pass through our own set of hy-percalls to the fuzzing logic if a magic value is passedin rax and the appropriate hypercall-ID is set in rbxAdditionally we also patched KVM-PT to accept hyper-calls from ring 3 Arguments for specific hypercalls arepassed through rcx We use this mechanism to definean interface that user mode agent can use to communi-cate with the fuzzing logic One example hypercall isHC_SUBMIT_BUFFER Its argument is a guest pointer thatis stored in rcx Upon executing the vmcall instructiona VM-Exit is triggered and QEMU-PT stores the bufferpointer that was passed It will later copy the new inputdata into this buffer (see step 5copy in Figure 2) Finally theexecution of the VM is continued

climov rax KAFL_MAGIC_VALUEmov rbx HC_CRASHmov rcx 0x0vmcall

Listing 1 Hypercall crash notifier

Another use case for this interface is to notify thefuzzing logic when a crash occurs in the target OS kernelIn order to do so we overwrite the kernel crash handlerof the OS with a simple hypercall routine The injectedcode is shown in Listing 1 and displays how the hyper-call interface is used on the assembly level The cli in-struction disables all interrupts to avoid any kind of asyn-chronous interference during the hypercall routine

4 Implementation Details

Based on the design outlined in the previous section webuilt a prototype of our approach called kAFL In the fol-lowing we describe several implementation details Thesource code of our reference implementation is availableat httpsgithubcomRUB-SysSeckAFL

41 KVM-PT

Intel PT allows us to trace branch transitions withoutpatching or recompiling the targeted kernel To the bestof our knowledge no publicly available driver is able totrace only guest executions of a single vCPU using In-tel PT for long periods of time For instance Simple-PT[29] does not support long-term tracing by design The

172 26th USENIX Security Symposium USENIX Association

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 7: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

decoded trace data is directly translated into a streamof addresses of executed conditional branch instruc-tions Moreover QEMU-PT also filters the stream ofexecuted addressesmdashbased on previous knowledge ofnon-deterministic basic blocksmdashto prevent false-positivefuzzing results and makes those available to the fuzzinglogic as AFL-compatible bitmaps We use our own cus-tom Intel PT decoder to cache disassembly results whichleads to significant performance gains compared to theoff-the-shelf solution provided by Intel

34 Stateful and Non-Deterministic Code

Tracing operating systems results in a significant amountof non-determinism The largest source of non-deterministic basic block transitions are interrupts whichcan occur at any point in time Additionally our imple-mentation does not reset the whole state after each exe-cution since reloading the VM from a memory snapshotis costly Thus we have to deal with the stateful and asyn-chronous aspects of the kernel An example for statefulcode might be a simple call to kmalloc() Dependingon the number of previous allocations kmalloc() mightsimply return a fresh pointer or map a whole range ofpages and update a significant amount of metadata Weuse two techniques to deal with these challenges

The first one is to filter out interrupts and the transi-tion caused while handling interrupts This is possibleusing the Intel PT trace data If an interrupt occurs theprocessor emits a TIP instruction since the transfer is notvisible in the code To avoid confusion during an inter-rupt occurring at an indirect control flow instruction theTIP packet is marked with FUP (flow update packet) toindicate an asynchronous event After identifying such asignature the decoder will drop all basic blocks visiteduntil the corresponding iret instruction is encounteredTo link the interrupts with their corresponding iret wetrack all interrupts on a simple call stack This mecha-nism is necessary since the interrupt handler may itselfbe interrupted by another interrupt

The second mechanism is to blacklist any basic blockthat occurs non-deterministically Each time we en-counter a new bit in the AFL bitmap we re-run the in-put several times in a row Every basic block that doesnot show up in all of the trials will be marked as non-deterministic and filtered from further processing Forfast access the results are stored in a bitmap of black-listed basic block addresses During the AFL bitmaptranslation any transition hash valuemdashwhich combinesthe current basic block address and the previous ba-sic block addressmdashinvolving a blacklisted block will beskipped

35 Hypercalls

Hypercalls are a feature introduced by virtualization OnIntel platforms hypercalls are triggered by the vmcallinstruction Hypercalls are to VMMs as syscalls are tokernels If any ring 3 process or the kernel in the VM ex-ecutes a vmcall instruction a VM-Exit event is triggeredand the VMM can decide how to process the hypercallWe patched KVM-PT to pass through our own set of hy-percalls to the fuzzing logic if a magic value is passedin rax and the appropriate hypercall-ID is set in rbxAdditionally we also patched KVM-PT to accept hyper-calls from ring 3 Arguments for specific hypercalls arepassed through rcx We use this mechanism to definean interface that user mode agent can use to communi-cate with the fuzzing logic One example hypercall isHC_SUBMIT_BUFFER Its argument is a guest pointer thatis stored in rcx Upon executing the vmcall instructiona VM-Exit is triggered and QEMU-PT stores the bufferpointer that was passed It will later copy the new inputdata into this buffer (see step 5copy in Figure 2) Finally theexecution of the VM is continued

climov rax KAFL_MAGIC_VALUEmov rbx HC_CRASHmov rcx 0x0vmcall

Listing 1 Hypercall crash notifier

Another use case for this interface is to notify thefuzzing logic when a crash occurs in the target OS kernelIn order to do so we overwrite the kernel crash handlerof the OS with a simple hypercall routine The injectedcode is shown in Listing 1 and displays how the hyper-call interface is used on the assembly level The cli in-struction disables all interrupts to avoid any kind of asyn-chronous interference during the hypercall routine

4 Implementation Details

Based on the design outlined in the previous section webuilt a prototype of our approach called kAFL In the fol-lowing we describe several implementation details Thesource code of our reference implementation is availableat httpsgithubcomRUB-SysSeckAFL

41 KVM-PT

Intel PT allows us to trace branch transitions withoutpatching or recompiling the targeted kernel To the bestof our knowledge no publicly available driver is able totrace only guest executions of a single vCPU using In-tel PT for long periods of time For instance Simple-PT[29] does not support long-term tracing by design The

172 26th USENIX Security Symposium USENIX Association

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 8: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

perf-subsystem [5] supports tracing of VM guest oper-ations and long-term tracing However it is designed totrace logical CPUs not vCPUs Even if VMX executionis traced the data would be associated with logical CPUsand not with vCPUs Hence the VMX context must bereassembled which is a costly task

To address these shortcomings we developed KVM-PT It allows us to trace vCPUs for an indefinite amountof time without any scheduling side effects or any lossof trace data due to overflowing output regions The ex-tension provides a fast and reliable trace mechanism forKVM vCPUs Moreover this extension exposes muchlike KVM an extensive user mode interface to accessthis tracing feature from user space QEMU-PT utilizesthis novel interface to interact with KVM-PT and to ac-cess the resulting trace data

411 vCPU Specific Traces

To enable Intel PT software that runs within ring0 (in our case KVM-PT) has to set the corre-sponding bit of a model specific register (MSR)(IA32_RTIT_CTL_MSRTraceEn) [28] After tracing isenabled the logical CPU will trace any executed code ifit satisfies the configured filter options The modificationhas to be done before the CPU switches from the hostcontext to the VM operation otherwise the CPU will ex-ecute guest code and is technically unable to modify anyhost MSRs The inverse procedure is required after theCPU has left the guest context However enabling or dis-abling Intel PT manually will also yield a trace contain-ing the manual MSR modification To prevent the collec-tion of unwanted trace data within the VMM we use theMSR autoload capabilities of Intel VT-x MSR autoload-ing can be enabled by modifying the corresponding en-tries in the VMCS (eg VM_ENTRY_CONTROL_MSRfor VM-entries) This forces the CPU to load a list of pre-configured values for defined MSRs after either a VM-entry or VM-exit has occurred By enabling tracing viaMSR autoloading we only gather Intel PT trace data forone specific vCPU

412 Continuous Tracing

Once we have enabled Intel PT the CPU will write theresulting trace data into a memory buffer until it is fullThe physical addresses of this buffer and how to han-dle full buffers is specified by an array of data structurescalled Table of Physical Addresses (ToPA) entries

The array can contain multiple entries and has to beterminated by a single END entry 3copy There are twodifferent ways the CPU can handle an overflow It canstop the tracing (while continuing the executionmdashthus

Figure 3 KVM-PT ToPA configuration

resulting in incomplete traces) or it can raise an inter-rupt This interrupt causes a VM-exit since it is not mask-able We catch the interrupt on the host and consume thetrace data Finally we reset the buffers and continue withthe VM execution Unfortunately this interrupt might beraised at an unspecified time after the buffer was filled2Our configuration of the ToPA entries can be seen in Fig-ure 3 To avoid losing trace data we use two differentToPA entries The first one is the main buffer 1copy Itsoverflow behavior is to trigger the interrupt Once themain buffer is filled a second entry is used until the in-terrupt is actually delivered The ToPA specifies anothersmaller buffer 2copy Overflowing the second buffer wouldlead to the stop of the tracing To avoid the resultingdata loss we chose the second buffer to be about fourtimes larger than the largest overflowing trace we haveever seen in our tests (4 KB)

In case the second buffer also overflows the followingtrace will contain a packet indicating that some data ismissing In that case the size of the second buffer cansimply be increased This way we manage to obtain pre-cise traces for any amount of trace data

42 QEMU-PTTo make use of the KVM extension KVM-PT an userspace counterpart is required QEMU-PT is an extensionof QEMU and provides full support for KVM-PTrsquos userspace interface This interface provides mechanisms toenable disable and configure Intel PT at runtime as wellas a periodic ToPA status check to avoid overruns KVM-PT is accessible from user mode via ioctl() commandsand an mmap() interface

In addition to being a userland interface to KVM-PTQEMU-PT includes a component that decodes trace datainto a form more suitable for the fuzzing logic We de-code the Intel PT packets and turn them into an AFL-likebitmap

421 PT Decoder

Extensive kernel fuzzing may generate several hundredsof megabytes of trace data per second To deal with

2This is due to the current implementation of this interrupt Intelspecifies the interrupt as not precise which means it is likely that fur-ther data will be written to the next buffer or tracing will be terminatedand data will be discarded

USENIX Association 26th USENIX Security Symposium 173

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 9: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

Figure 4 Overview of the pipeline that converts Intel PT traces to kAFL bitmaps

such large amounts of incoming data the decoder mustbe implemented with a focus on efficiency Otherwisethe decoder may become the major bottleneck duringthe fuzzing process Nevertheless the decoder must alsobe precise as inaccuracies during the decoding processwould result in further errors This is due to the nature ofIntel PT decoding since the decoding process is sequen-tial and is affected by previously decoded packets

To ease efforts to implement an Intel PT softwaredecoder Intel provides its own decoding engine calledlibipt [4] libipt is a general-purpose Intel PT decod-ing engine However it does not fit our purposes verywell because libipt decodes trace data in order to pro-vide execution data and flow information Furthermorelibipt does not cache disassembled instructions and hasperformed poorly in our use cases

Since kAFL only relies on flow information and thefuzzing process is repeatedly applied to the same codeit is possible to optimize the decoding process Our In-tel PT software decoder acts like a just-in-time decoderwhich means that code sections are only considered ifthey are executed according to the decoded trace dataTo optimize further look-ups all disassembled code sec-tions are cached In addition we simply ignore packetsthat are not relevant for our use case

Since our PT decoder is part of QEMU-PT trace datais directly processed if the ToPA base region is filledThe decoding process is applied in-place since the bufferis directly accessible from user space via mmap() Un-like other Intel PT drivers we do not need to store largeamounts of trace data in memory or on storage devicesfor post-mortem decoding Eventually the decoded tracedata is translated to the AFL bitmap format

43 AFL Fuzzing Logic

We give a brief description of the fuzzing parts of AFLbecause the logic we use to perform scheduling and mu-tations closely follows that of AFL The most importantaspect of AFL is the bitmap used to trace which basicblock transitions where encountered Each basic blockhas a randomly assigned ID and each transition from ba-sic block A to another basic block B is assigned an offsetinto the bitmap according to the following formula

(id(A)2oplus id(B)) SIZE_OF_BITMAP

Instead of the compile-time random kAFL uses theaddresses of the basic blocks Each time the transition isobserved the corresponding byte in the bitmap is incre-mented After finishing the fuzzing iteration each entryof the bitmap is rounded such that only the highest bit re-mains set Then the bitmap is compared with the globalstatic bitmap to see if any new bit was found If a new bitwas found it is added to the global bitmap and the inputthat triggered the new bit is added to the queue Whena new interesting input is found a deterministic stage isexecuted that tries to mutate each byte individually

Once the deterministic stage is finished the non-deterministic phase is started During this non-deterministic phase multiple mutations are performed atrandom locations If the deterministic phase finds newinputs the non-deterministic phase will be delayed un-til all deterministic phases of all interesting inputs havebeen performed If an input triggers an entirely new tran-sition (as opposed to a change in the number of times thetransition was taken) it will be favored and fuzzed witha higher priority

5 Evaluation

Based on our implementation we now describe thedifferent fuzzing campaigns we performed to evaluatekAFL We evaluate kAFLrsquos fuzzing performance acrossdifferent platforms Section 55 provides an overview ofall reported vulnerabilities crashes and bugs that werefound during the development process of kAFL We alsoevaluate kAFLrsquos ability to find a previously known vul-nerability Finally in Section 56 the overall fuzzingperformance of kAFL is compared to ProjectTriforcethe only other OS-independent feedback fuzzer avail-able TriforceAFL is based on the emulation backendof QEMU instead of hardware-assisted virtualization andIntel PT The performance overhead of KVM-PT is dis-cussed in Section 57 Additionally a performance com-parison of our PT decoder and an Intel implementationof a software decoder is given in Section 58

If not stated otherwise the benchmarks were per-formed on a desktop system with an Intel i7-6700 pro-cessor and 32GB DDR4 RAM To avoid distortions due

174 26th USENIX Security Symposium USENIX Association

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 10: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

to poor IO performance all benchmarks are performedon a RAM disk Similar to AFL we consider a crash-ing input to be unique if it triggered at least one basicblock transition which has not been triggered by any pre-vious crash (ie the bitmap contains at least one newbit) Note this does not imply that the underlying bugsare truly unique

51 Fuzzing Windows

We implemented a small Windows 10 specific user modeagent that mounts any data chunk (fuzzed payload) asNTFS-partitioned volume (289 lines of C code) Weused the Virtual Hard Disk (VHD) API and variousIOCTLS to mount and unmount volumes programmat-ically [31 32] Unfortunately mounting volumes is aslow operation under Windows and we only managedto achieve a throughput of 20 executions per secondNonetheless kAFL managed to find a crash in the NTFSdriver The fuzzer ran for 4 days and 14 hours and re-ported 59 unique crashes all of which were division byzero crashes After manual investigation we suspect thatthere is only one unique bug While it does not allowcode execution it is still a denial-of-service vulnerabilityas for example a USB stick with that malicious NTFSvolume plugged into a critical system will crash that sys-tem with a blue screen It seems that we only scratchedthe surface and NTFS was not thoroughly fuzzed yetHence we assume that the NTFS driver under Windowsis a valuable target for coverage-based feedback fuzzing

Furthermore we implemented a generic system call(syscall) fuzzing agent that simply passes a block of datato a syscall by setting all registers and the top stack re-gion (55 lines of C and 46 lines of assembly code) Thisallows to set parameters for a syscall with a fuzzing pay-load independent of the OS ABI The same fuzzer canbe used to attack syscalls on different operation sys-tems such as Linux or macOS However we evaluatedit against the Windows kernel given the proprietary na-ture of this OS We did not find any bugs in 13 hoursof fuzzing with approx 63M executions since manysyscalls cause the userspace agent to terminate Dueto the coverage-guided feedback kAFL quickly learnedhow to generate payloads to execute valid syscalls andthis led to the unexpected execution of user mode call-backs via the kernel within the fuzzing agent Thesecrashes require rather expensive restarts of the agent andtherefore we only achieved approx 134 executions persecond while normally kAFL achieves a throughput of1000 to 4000 tests per second (see Section 52) Ad-ditionally the Windows syscall interface has already re-ceived much attention by the security community

Figure 5 Fuzzing the ext4 kernel module for 32 hours

52 Fuzzing Linux

We implemented a similar agent for Linux whichmounts data as ext4 volumes (66 lines of C code) Westarted the fuzz campaign with a minimal 64KB ext4 im-age as initial input However we configured the fuzzersuch that it only fuzzes the first two kilobytes duringthe deterministic phase In contrast to Windows theLinux mount process is very fast and we reached 1000to 2000 tests per second on a Thinkpad laptop with ai7-6700HQ26GHz CPU and 32GB RAM Due to thishigh performance we obtained significantly better cov-erage and managed to discover 160 unique crashes andmultiple (confirmed) bugs in the ext4 driver during atwelve-day fuzzing campaign Figure 5 shows the first32 hours of another fuzzing run The fuzzing process wasstill finding new paths and crashes on a fairly regular ba-sis after 32 hours An interesting observation is that therewas no new coverage produced between hours 16 and 25yet the number of inputs increased due a higher numberof loop iterations After hour 25 a truly new input wasfound that unlocked significant parts of the codebase

53 Fuzzing macOS

Similarly to Windows and Linux we targeted multiplefile systems for macOS So far we found approximately150 crashes in the HFS driver and manually confirmedthat at least three of them are unique bugs that lead toa kernel panic Those bugs can be triggered by unprivi-leged users and therefore could very well be abused forlocal denial-of-service attacks One of these bugs seemsto be a use-after-free vulnerability that leads to full con-trol of the rip register Additionally kAFL found 220unique crashes in the APFS kernel extension All 3 HFSvulnerabilities and multiple APFS flaws have been re-ported to Apple

USENIX Association 26th USENIX Security Symposium 175

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 11: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

54 Rediscovery of Known BugsWe evaluated kAFL on the keyctl interface which al-lows a user space program to store and manage vari-ous kinds of key material in the kernel More specif-ically it features a DER (see RFC5280) parser to loadcertificates This functionality had a known bug (CVE-2016-07583) We tested kAFL against the same interfaceon a vulnerable kernel (version 432) kAFL was ableto uncover the same problem and one additional previ-ously unknown bug that was assigned CVE-2016-86504kAFL managed to trigger 17 unique KASan reports and15 unique panics in just one hour of execution time Dur-ing this experiment kAFL generated over 34 million in-puts found 295 interesting inputs and performed nearly9000 executions per second This experiment was per-formed while running 8 processes in parallel

55 Detected VulnerabilitiesDuring the evaluation kAFL found more than a thou-sand unique crashes We evaluated some manually andfound multiple security vulnerabilities in all tested op-erating systems such as Linux Windows and macOSSo far eight bugs were reported and three of them wereconfirmed by the maintainers

bull Linux keyctl Null Pointer Dereference5 (CVE-2016-86506)

bull Linux ext4 Memory Corruption7

bull Linux ext4 Error Handling8

bull Windows NTFS Div-by-Zero9

bull macOS HFS Div-by-Zero10

bull macOS HFS Assertion Fail10

bull macOS HFS Use-After-Free10

bull macOS APFS Memory Corruption10

Red Hat has assigned a CVE number for the first re-ported security flaw which triggers a null pointer def-erence and a partial memory corruption in the kernelASN1 parser if an RSA certificate with a zero expo-nent is presented For the second reported vulnerabil-ity which triggers a memory corruption in the ext4 file

3httpsaccessredhatcomsecuritycvecve-2016-07584httpsaccessredhatcomsecuritycvecve-2016-86505httpseclistsorgfulldisclosure2016Nov766httpsaccessredhatcomsecuritycvecve-2016-86507httpseclistsorgfulldisclosure2016Nov758httpseclistsorgbugtraq2016Nov19Reported to Microsoft Security

10Reported to Apple Product Security

system a mainline patch was proposed The last re-ported Linux vulnerability which calls in the ext4 errorhandling routine panic() and hence results in a kernelpanic was at the time of writing not investigated any fur-ther The NTFS bug in Windows 10 is a non-recoverableerror condition which leads to a blue screen This bugwas reported to Microsoft but has not been confirmedyet Similarly Apple has not yet verified or confirmedour reported macOS bugs

56 Fuzzing PerformanceWe compare the overall performance of kAFL across dif-ferent operating systems To ensure comparable resultswe created a simple driver that contains a JSON parserbased on jsmn11 for each aforementioned operating sys-tem and used it to decode user input (see Listing 2) Ifthe user input is a JSON string starting with KAFL acrash is triggered We traced both the JSON parser aswell as the final check This way kAFL was able to learncorrect JSON syntax We measured the time used to findthe crash the number of executions per second and thespeed for new paths to be discovered on all three targetoperating systems

1 jsmn_parser parser2 jsmntok_t tokens [5]3 jsmn_init (amp parser)4

5 int res=jsmn_parse (ampparser input size tokens 5)6 if(res gt= 2)7 if(tokens [0] type == JSMN_STRING)8 int json_len = tokens [0] end - tokens [0]

start9 int s = tokens [0] start

10 if(json_len gt 0 ampamp input[s+0] == K)11 if(json_len gt 1 ampamp input[s+1] == A)12 if(json_len gt 2 ampamp input[s+2] == F)13 if(json_len gt 3 ampamp input[s+3] == L)14 panic(KERN_INFO KAFL n)15 16

Listing 2 The JSON parser kernel module used for thecoverage benchmarks

We performed five repeated experiments for each op-erating system Additionally we tested TriforceAFL withthe Linux target driver TriforceAFL is unable to fuzzWindows and macOS To compare TriforceAFL withkAFL the associated TriforceLinuxSyscallFuzzer wasslightly modified to work with our vulnerable Linux ker-nel module Unfortunately it was not possible to com-pare kAFL against Oraclersquos file system fuzzer [34] dueto technical issues with its setup

During each run we fuzzed the JSON parser for 30minutes The averaged and rounded results are displayedin Table 1 As we can see the performance of kAFLis very similar across different systems It should be re-marked that the variance in this experiment is rather high

11httpzsergecomjsmnhtml

176 26th USENIX Security Symposium USENIX Association

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 12: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

ExecsSec(1 Process)

ExecsSec(8 Processes)

Time to Crash(1 Process)

Time to Crash(8 Processes)

PathsMin(1 Process)

PathsMin(8 Processes)

TriforceAFL 150 320 -a -a 1008 -b

Linux (initramfs) 3000 5700 750 600 1584 1562Debian 8 3000 5700 455 630 1620 1600Debian 8 (KASan) 4300 5700 920 600 1622 1590macOS (10124) 5100 8100 743 510 1450 1506Windows 10 4300 8700 414 450 1150 1202

a Not found during 30-minute experimentsb This value cannot be obtained since TriforceAFL does not synchronize in such short time frames

Table 1 kAFL and TriforceAFL fuzzing performance on the JSON sample driver

Figure 6 Coverage comparison of kAFL (initramfs) andTriforceAFL kAFL takes less than 3 minutes to find thesame number of paths as TriforceAFL does in 30 minutes(each running 1 process)

and explains some of the surprising results This is due tothe stochastic nature of fuzzing since each fuzzer findsvastly different paths some of which may take signifi-cantly longer to process especially crashing paths andloops One example for high variance is the fact that onDebian 8 (initramfs) the multiprocessing configurationon average needed more time to find the crash than oneprocess

TriforceAFL We used the JSON driver to comparekAFL and TriforceAFL with respect to execution speedand code coverage However the results where biasedheavily in two ways TriforceAFL did not manage to finda path that triggers the crash within 30 minutes (usuallyit takes approximately 2 hours) making it very hard tocompare the code coverage of kAFL and TriforceAFLThe number of discovered paths is not a good indica-tor for the amount of coverage With increasing runningtime it becomes more difficult to discover new pathsSecondly the number of executions per second is also bi-ased by slower and harder to reach paths and especiallycrashing inputs The coverage reached over time can beseen in Figure 6 It is obvious from the figure that kAFLfound a significant number of paths that are very hard to

Figure 7 Raw execution performance comparison

reach for TriforceAFL kAFL mostly stops finding newpaths around the 10-15 minute mark because the targetdriver simply doesnrsquot contain any more paths to be un-covered Therefore the coverage value in Table 1 (statedas PathsMin) is limited to the first 10 minutes of each30-minute run

We also compare raw execution performance insteadof overall fuzzing performance which is biased becauseof the execution of different paths the sampling processfor the non-determinism-filter and various synchroniza-tion mechanisms Especially on smaller inputs thesefactors disproportionately affect the overall fuzzing per-formance To avoid this we compared the performanceduring the first havoc stage Figure 7 shows the raw ex-ecution performance of kAFL compared to TriforceAFLduring this havoc phase kAFL provides up to 54 timesbetter performance compared to TriforceAFLrsquos QEMUCPU emulation Slightly lower performance boosts areseen in single-process execution (48 times faster)

syzkaller We did not perform a performance compari-son against syzkaller [10] This has two reasons First ofall syzkaller is a highly specific syscall fuzzer that en-codes a significant amount of domain knowledge and istherefore not applicable to other domains such as filesys-tem images On the other hand syzkaller would mostlikely generate a significantly higher code coverage evenwithout any feedback since it knows how to generate

USENIX Association 26th USENIX Security Symposium 177

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 13: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

Figure 8 Overhead for compiling QEMU-260 in atraced VM

valid syscalls and hence is able to trigger valid pathswithout any learning Therefore the coverage compar-ison would be highly misleading unless we implementthe same syscall logic a task that is out of the scope ofthis paper Additionally the coverage collection via kcovis highly specific to Linux and not applicable to closed-source targets

57 KVM-PT Overhead

Our KVM extension KVM-PT adds overhead to the rawexecution of KVM Therefore the performance overheadwas compared with several KVM-PT setups on an i5-650032Ghz desktop system with 8GB DDR4 RAMThis includes KVM-PT in combination with the PT de-coder KVM-PT without the PT decoder but processingfrequent ToPA state checks and KVM-PT without anyToPA consideration For this benchmark a 13MB ker-nel code range was configured via IP filtering ranges andtraced with one of the aforementioned setups of KVM-PT These benchmarks consider only the kernel core butneither considers any kernel module During KVM-PTexecution only supervisor mode was traced

To generate Intel PT load QEMU-260 was com-piled within a traced VM using the configure option--target-list=x86_64-softmmu We restricted trac-ing to the whole kernel address space This benchmarkwas executed on a single vCPU The resulting compiletime was measured and compared The following figureillustrates the relative overhead compared to KVM ex-ecution without KVM-PT (see Figure 8) We ran threeexperiments to determine the overhead of the differentcomponents In each experiment we measured three dif-ferent overheads wall-clock time user and kernel Thedifference in overall time is denoted by the wall-clockoverhead Additionally we measured how much moretime is spent in the kernel and how much time is spendonly in user space Since we only trace the kernel weexpect the users space overhead to be insignificant Intel

Figure 9 kAFL and ptxed decoding time on multiplecopies of the same trace (kAFL is up to 30 times faster)

describes a performance penalty of lt 5 compared toexecution without enabled Intel PT [30] Accordinglywe expect approximately 5 of kernel overhead Inthe first experiment the traces were discarded withoutfurther analysis (KVM-PT) In the second experiment(KVM-PT amp ToPA Check) we enabled repeated check-ing and clearing of the ToPA buffers In the final ex-periment (KVM-PT amp PT decoder) we tested the wholepipeline including our own decoder and conversion to anAFL bitmap

During our benchmarks an overhead between 1 ndash4 was measured empirically Since the resulting over-head is small we do not expect it to have a major influ-ence on the overall fuzzing performance

58 Decoder Engine

In contrast to KVM-PT the decoder has significant in-fluence on the overall performance of the fuzzing pro-cess since the decoding process ismdashother than Intel PTand hence KVM-PTmdashnot hardware-accelerated There-fore this process is costly and has to be as efficient aspossible Consequently the performance of our devel-oped PT decoder was compared to that of ptxed Thisdecoder is Intelrsquos example implementation of an Intel PTsoftware decoder and is based on libipt To compareboth decoder engines a small Intel PT trace sample wasgenerated by executing

find gt devnull 2gt devnull

within a Linux VM (Linux debian 480-1-amd64)traced by KVM-PT This performance benchmark wasprocessed on an i5-650032Ghz desktop system with8GB DDR4 RAM Only code execution in supervisormode was traced The generated sample is 94MB insize and contains over 431650 TNT packets each repre-senting up to 7 branch transitions The sample also con-tains over 100045 TIPs We sanitized the sample by re-moving anything but flow information packets (see Sec-tion 23) to avoid any influence of decoding large amount

178 26th USENIX Security Symposium USENIX Association

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 14: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

of execution information packets since those are not con-sidered by our PT decoder The result is a 53MB tracefile To test the effectiveness of the caching approach ofour PT decoder we created cases containing 1 5 1050 and 250 copies of the trace This is a realistic testcase since during fuzzing we see the same (or very sim-ilar) paths repeatedly Figure 9 illustrates the measuredspeedup of our PT decoder compared to ptxed

The figure also shows that our PT decoder easily out-performs the Intel decoder implementation even if thePT decoder processes data for the very first time This ismost likely due to the fact that even a single trace alreadycontains a significant amount of loops Another possiblefactor is the use of Capstone [2] as the instruction decod-ing backend As we decode more and more copies of thesame trace it can be seen that our decoder becomes in-creasingly faster (only using 56 times as much time todecode 250 times that amount of data) The caching ap-proach outperforms Intelrsquos implementation and is up to25 to 30 times faster

6 Related Work

Fuzzers are often classified according to the amountof interaction with the target program For black-boxfuzzers the fuzzer does not use any information aboutthe target program at all White-box fuzzers typically useadvanced program analysis techniques to uncover inter-esting properties of the target Somewhere in the mid-dle are so called gray-box fuzzers that will typically usesome kind of feedback from the target (such as cover-age information) to guide their search without analyzingthe logic of the target program itself In this section weprovide a brief overview of the work performed in thecorresponding areas of fuzzing

61 Black-Box FuzzersThe oldest class of fuzzers are black-box fuzzers Thesefuzzers typically have no interaction with the target pro-gram beyond executing it on newly generated inputs Toincrease effectiveness a number of assumptions are usu-ally made Either a large corpus of good coverage inputsget mutated and recombined repeatedly Examples forthis class are Radamsa [3] or zzuf [12] Or the pro-grammer needs to specify how to generate new semi-valid input files that almost look like real files Exam-ples including tools like Peach [6] or Sulley [9] Bothapproaches have one very important drawback It is atime-consuming task to use these tools

To improve the performance of black-box fuzzersmany techniques have been proposed Holler et al[27] introduced learning interesting parts of the inputgrammar from old crashing inputs Others even sought

to infer the whole input grammar from program traces[132438] The selection of more interesting inputs wasoptimized by Rebert et al [36] Similar approaches havebeen used to optimize the mutation rate [17 40]

62 White-Box fuzzersTo reduce the burden on the tester techniques where in-troduced that apply insights from program analysis tofind more interesting inputs Tools like SAGE [23]DART [22] KLEE [15] SmartFuzz [33] or Mayhem[16] try to enumerate complex paths by using techniquessuch as symbolic execution and constraint solving Toolslike TaintScope [39] BuzzFuzz [21] and Vuzzer [35]utilize taint tracing and similar dynamic analysis tech-niques to uncover new paths These tools are often ableto find very complicated code paths that are hidden be-hind checksums magic constants and other constraintsthat are very unlikely to be satisfied by random inputsAnother approach is to use the same kind of informationto bias the search towards dangerous behavior instead ofnew code paths [26]

The downside is that these techniques are often signif-icantly harder to implement scale to large programs andparallelize To the best of our knowledge there are nosuch tools for operating system fuzzing

63 Gray-Box FuzzersGray-box fuzzers try to retain the high throughput andsimplicity of black-box fuzzers while gaining some ofthe additional coverage provided by the advanced me-chanics in white-box fuzzing The prime example forgray-box fuzzing is AFL which uses coverage informa-tion to guide its search This way AFL voids spend-ing additional time on inputs that do not trigger newbehaviors Similar techniques are used by many otherfuzzers [8 25]

To further increase the effectiveness of gray-boxfuzzing many of the tricks already used in black-boxfuzzing can be applied Boumlhme et al [14] showed how touse the insight gained from modelling gray-box fuzzingas a walk on a Markov chain to increase the performanceof gray-box fuzzing by up to an order of magnitude

64 Coverage-Guided Kernel FuzzersA project called syzkaller was released by Vyukov itis the first publicly available gray-box coverage-guidedkernel fuzzer [10] Nossum and Casanovas demonstratethat most Linux file system drivers are vulnerable tofeedback-driven fuzzing by using an adapted version ofAFL [34] This modified AFL version is based on gluecode to the kernel consisting of a driver interface to

USENIX Association 26th USENIX Security Symposium 179

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 15: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

measure feedback during fuzzing file system drivers ofthe kernel and expose this data to the user space Thisfuzzer runs inside the targeted OS a crash terminates thefuzzing session

In 2016 Hertz and Newsham released a modified ver-sion of AFL called TriforceAFL [7] Their work is basedon a modification of QEMU and utilizes the correspond-ing emulation backend to measure fuzzing progress bydetermining the current instruction pointer after a controlflow altering instruction has been executed In theorytheir fuzzer is able to fuzz any OS emulated in QEMU Inpractice the TriforceAFL fuzzer is limited to operatingsystems that are able to boot from read-only file systemswhich narrows down the candidates to classic UNIX-likeoperating systems such as Linux FreeBSD NetBSD orOpenBSD Therefore TriforceAFL is currently not ableto fuzz closed-source operating systems such as macOSor Windows

7 Discussion

Even though our approach is general fast and mostly in-dependent of the underlying OS there are some limita-tions we want to discuss in this section

OS-Specific Code We use a small amount (usuallyless than 150 lines) of OS-dependent ring 3 code that per-forms three tasks First it interacts with the OS to trans-late the inputs from the fuzzing engine to interactionswith the OS (eg mount the data as a partition) Secondit obtains the address of the crash handler of the OS suchthat we can detect crashes faster than it would take towait for the timeout Third it can return the addresses ofcertain drivers These addresses can be used to limit trac-ing to the activity of said drivers which improves perfor-mance when only fuzzing individual drivers

None of these functions are necessary and only im-prove performance in some cases The first use casecan be avoided by using generic syscall fuzzing In thatcase a single standard C program which does not use anyplatform-specific API would suffice to trigger sysentersyscall instructions We do not strictly need the addressof the crash handler since there are numerous other waysto detect whether the VM crashed It would also be quiteeasy to obtain crash handlers dynamically by introduc-ing faults and analyzing the obtained traces Finally wecan always trace the whole kernel taking a slight perfor-mance hit (mostly introduced by the increased amount ofnon-determinism) In cases such as syscall fuzzing weneed to trace the whole kernel therefore syscall fuzzingwould not be impacted if this ability was missing Insummary this is the first approach that can fuzz arbitraryx86-64 kernels without any customization and a near-native performance

Supported CPUs Due to the usage of Intel PT andIntel VT-x our approach is limited to certain Intel CPUssupporting these extensions Virtually all modern IntelCPUs support Intel VT-x Unfortunately Intel is rathervague as to which CPUs exactly support process traceinside of VMs and various other extensions (such as IPfiltering and multi-entry ToPA) We tested our systemon the following CPU models Intel Core i5-6500 In-tel Core i7-6700HQ and Intel Core i5-6600 We believethat at the time of writing most Skylake and KabylakeCPUs have the necessary hardware support

Just-In-Time Code Intel PT does not provide a com-plete list of executed instruction pointers Instead IntelPT generates as little information as necessary to reducethe amount of data produced by the processor Con-sequently the Intel PT software decoder does not onlyrequire control flow information to reconstruct the con-trol flow but also needs the program that was executedduring tracing If the program is modified during run-time as often done by just-in-time (JIT) compilers inuser and kernel mode the decoder is unable to exactlyrestore the runtime control flow To bypass this limita-tion the decoder requires information about all modi-fications applied to the program instead of an ordinarymemory dump or the executable file As Deng et al [18]have shown this is possible by making use of EPT viola-tions when executing written pages Another somewhatmore old-fashioned method to achieve the same is to useshadow page tables [19] Once one it is possible to hookthe execution of modified code self-modifying code canbe dumped Reimplementing this technique was out ofthe scope of this work It should be noted though thatfuzzing kernel JIT code is a very interesting topic sincekernel JIT components such as the BPF JIT in Linuxhave often been part of serious vulnerabilities

Multibyte Compares Similar to AFL we are unableto effectively bypass checks for large magic values inthe inputs However we support specifying dictionariesof interesting constants to improve performance if suchmagic values are known in advance (eg from RFCssource code or disassembly) Some solutions involvingtechniques such as concolic execution (eg Driller [37])or taint tracking (eg Vuzzer [35]) have been proposedHowever none of these techniques can easily be adaptedto closed-source operating system kernels Therefore itremains an open research problem how to deal with thosesituations on the kernel level

Ring 3 Fuzzing We only demonstrated this techniqueagainst kernel-level code However the exact same tech-nique can be used to fuzz closed-source ring 3 code as

180 26th USENIX Security Symposium USENIX Association

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 16: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

well Since our approach has a very modest tracing over-head we expect that this technique will outperform cur-rent dynamic binary instrumentation based techniquesfor feedback fuzzing of closed-source ring 3 programssuch as winAFL [20]

8 Conclusion

The latest generation of feedback-driven fuzzing meth-ods has proven to be an effective approach to find vul-nerabilities in an automated and comprehensive fashionRecent work has also demonstrated that such techniquescan be applied to kernel space While previous feedback-driven kernel fuzzers were able to find a large amount ofsecurity flaws in certain operating systems their benefitwas either limited by poor performance due to CPU emu-lation or a lack of portability due to the need for compile-time instrumentations

In this paper we presented a novel mechanism to uti-lize the latest CPU features for a feedback-driven kernelfuzzer As shown in the evaluation combining all com-ponents provides the ability to apply kernel fuzz testingto any target OS with significantly better performancethan the alternative approaches

AcknowledgmentThis work was supported by the German FederalMinistry of Education and Research (BMBF Grant16KIS0592K HWSec) We would like to thank our shep-herd Suman Jana for his support in finalizing this paperand the anonymous reviewers for their constructive andvaluable comments Furthermore we would also liketo thank Ralf Spenneberg and Hendrik Schwartke fromOpenSource Security for supporting this research Fi-nally we would like to thank Ali Abbasi Tim BlazytkoTeemu Rytilahti and Christine Utz for their valuablefeedback

References

[1] Announcing oss-fuzz Continuous fuzzing foropen source software httpstestinggoogleblogcom201612announcing-oss-fuzz-continuous-fuzzinghtml Accessed2017-06-29

[2] Capstone disassembly framework httpwwwcapstone-engineorg Accessed 2017-06-29

[3] A general-purpose fuzzer httpsgithubcomaohradamsa Accessed 2017-06-29

[4] Intel Processor Trace Decoder Library httpsgithubcom01orgprocessor-trace Ac-cessed 2017-06-29

[5] Linux 48 perf Documentation httpsgitkernelorgcgitlinuxkernelgittorvaldslinuxgitplaintoolsperfDocumentationintel-pttxtid=refstagsv48 Accessed 2017-06-29

[6] Peach httpwwwpeachfuzzercom Ac-cessed 2017-06-29

[7] Project Triforce Run AFL on Everythinghttpswwwnccgrouptrustusabout-usnewsroom-and-eventsblog2016juneproject-triforce-run-afl-on-everythingAccessed 2017-06-29

[8] Security oriented fuzzer with powerful analysis op-tions httpsgithubcomgooglehonggfuzzAccessed 2017-06-29

[9] Sulley httpsgithubcomOpenRCEsulleyAccessed 2017-06-29

[10] syzkaller Linux syscall fuzzer httpsgithubcomgooglesyzkaller Accessed 2017-06-29

[11] Trinity Linux system call fuzzer httpsgithubcomkernelslackertrinity Ac-cessed 2017-06-29

[12] zzuf httpsgithubcomsamhocevarzzufAccessed 2017-06-29

[13] O Bastani R Sharma A Aiken and P LiangSynthesizing program input grammars In ACMSIGPLAN Conference on Programming LanguageDesign and Implementation (PLDI) 2017

[14] M Boumlhme V-T Pham and A RoychoudhuryCoverage-based greybox fuzzing as markov chainIn ACM Conference on Computer and Communica-tions Security (CCS) 2016

[15] C Cadar D Dunbar and D R Engler Klee Unas-sisted and automatic generation of high-coveragetests for complex systems programs In Symposiumon Operating Systems Design and Implementation(OSDI) 2008

[16] S K Cha T Avgerinos A Rebert and D Brum-ley Unleashing Mayhem on Binary Code In IEEESymposium on Security and Privacy 2012

[17] S K Cha M Woo and D Brumley Program-adaptive mutational fuzzing In IEEE Symposiumon Security and Privacy 2015

[18] Z Deng X Zhang and D Xu Spider Stealthybinary program instrumentation and debugging viahardware virtualization In Annual Computer Secu-rity Applications Conference (ACSAC) 2013

USENIX Association 26th USENIX Security Symposium 181

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion
Page 17: kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernelsei.rub.de/media/emma/veroeffentlichungen/2017/08/16/usenix17-kAFL.pdf · kAFL: Hardware-Assisted Feedback Fuzzing for OS Kernels

[19] A Dinaburg P Royal M Sharif and W LeeEther malware analysis via hardware virtualizationextensions In ACM Conference on Computer andCommunications Security (CCS) 2008

[20] Fratric Ivan WinAFL A fork of AFL forfuzzing Windows binaries httpsgithubcomivanfratricwinafl 2017

[21] V Ganesh T Leek and M Rinard Taint-baseddirected whitebox fuzzing In International Con-ference on Software Engineering (ICSE) 2009

[22] P Godefroid N Klarlund and K Sen DART Di-rected Automated Random Testing In ACM SIG-PLAN Conference on Programming Language De-sign and Implementation (PLDI) 2005

[23] P Godefroid M Y Levin and D Molnar SAGEWhitebox Fuzzing for Security Testing Queue10(1)20 2012

[24] P Godefroid H Peleg and R Singh LearnampfuzzMachine learning for input fuzzing Technical re-port January 2017

[25] P Goodman Shin GRR Make Fuzzing FastAgain httpsblogtrailofbitscom20161102shin-grr-make-fuzzing-fast-againAccessed 2017-06-29

[26] I Haller A Slowinska M Neugschwandtner andH Bos Dowsing for overflows A guided fuzzer tofind buffer boundary violations In USENIX Secu-rity Symposium 2013

[27] C Holler K Herzig and A Zeller Fuzzing withcode fragments In USENIX Security Symposium2012

[28] Intel Intelreg 64 and IA-32 Architectures Soft-ware Developerrsquos Manual (Order number 325384-058US April 2016)

[29] A Kleen simple-pt Simple Intel CPU pro-cessor tracing on Linux httpsgithubcomandikleensimple-pt

[30] A Kleen and B Strong Intel Processor Trace onLinux Tracing Summit 2015 2015

[31] Microsoft FSCTL_DISMOUNT_VOLUMEhttpsmsdnmicrosoftcomen-uslibrarywindowsdesktopaa364562(v=vs85)aspx 2017

[32] Microsoft VHD Reference httpsmsdnmicrosoftcomen-uslibrarywindowsdesktopdd323700(v=vs85)aspx2017

[33] D Molnar X C Li and D Wagner DynamicTest Generation to Find Integer Bugs in x86 BinaryLinux Programs In USENIX Security Symposium2009

[34] V Nossum and Q Casasnovas Filesystem Fuzzingwith American Fuzzy Lop Vault 2016 2016

[35] S Rawat V Jain A Kumar L Cojocar C Giuf-frida and H Bos Vuzzer Application-aware evo-lutionary fuzzing In Symposium on Network andDistributed System Security (NDSS) 2017

[36] A Rebert S K Cha T Avgerinos J M FooteD Warren G Grieco and D Brumley Optimiz-ing seed selection for fuzzing In USENIX SecuritySymposium 2014

[37] N Stephens J Grosen C Salls A DutcherR Wang J Corbetta Y ShoshitaishviliC Kruegel and G Vigna Driller Augment-ing fuzzing through selective symbolic executionIn Symposium on Network and Distributed SystemSecurity (NDSS) 2016

[38] J Viide A Helin M Laakso P PietikaumlinenM Seppaumlnen K Halunen R Puuperauml and J Roumln-ing Experiences with model inference assistedfuzzing In USENIX Workshop on Offensive Tech-nologies (WOOT) 2008

[39] T Wang T Wei G Gu and W Zou TaintScopeA checksum-aware directed fuzzing tool for auto-matic software vulnerability detection In IEEESymposium on Security and Privacy 2010

[40] M Woo S K Cha S Gottlieb and D BrumleyScheduling black-box mutational fuzzing In ACMConference on Computer and Communications Se-curity (CCS) 2013

182 26th USENIX Security Symposium USENIX Association

  • Introduction
  • Technical Background
    • x86-64 Virtual Memory Layouts
    • Intel VT-x
    • Intel Processor Trace
      • System Overview
        • Fuzzing Logic
        • User Mode Agent
        • Virtualization Infrastructure
        • Stateful and Non-Deterministic Code
        • Hypercalls
          • Implementation Details
            • KVM-PT
              • vCPU Specific Traces
              • Continuous Tracing
                • QEMU-PT
                  • PT Decoder
                    • AFL Fuzzing Logic
                      • Evaluation
                        • Fuzzing Windows
                        • Fuzzing Linux
                        • Fuzzing macOS
                        • Rediscovery of Known Bugs
                        • Detected Vulnerabilities
                        • Fuzzing Performance
                        • KVM-PT Overhead
                        • Decoder Engine
                          • Related Work
                            • Black-Box Fuzzers
                            • White-Box fuzzers
                            • Gray-Box Fuzzers
                            • Coverage-Guided Kernel Fuzzers
                              • Discussion
                              • Conclusion