
TRUST Autumn 2008 Conference: November 11-12, 2008

Comparison of Blackbox and Whitebox Fuzzers in Finding Software Bugs

Marjan Aslani, Nga Chung, Jason Doherty, Nichole Stockman, and William Quach

Summer Undergraduate Program in Engineering Research at Berkeley (SUPERB) 2008

Team for Research in Ubiquitous Secure Technology


Overview

– Introduction to fuzz testing
– Our research
– Results


What Is Fuzzing?

A method of finding software flaws by feeding purposely invalid data to a program as input.

– Introduced by B. Miller et al.; inspired by line noise

– Applications: image processors, media players, operating systems
– Fuzz testing is generally automated
– Finds many reliability problems, many of which are potential security holes
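The core loop described above fits in a few lines of Python. This is an illustrative sketch only; the target command and output file name are placeholders, not part of the original work.

```python
import random
import subprocess

def fuzz_once(target_cmd, seed_bytes):
    """Feed one purposely corrupted input to the target and report
    whether the process crashed. Sketch only; names are placeholders."""
    data = bytearray(seed_bytes)
    # Corrupt a handful of random bytes in an otherwise valid input.
    for _ in range(8):
        data[random.randrange(len(data))] = random.randrange(256)

    with open("fuzzed.bin", "wb") as f:
        f.write(data)

    # On POSIX, a negative return code means the process died on a
    # signal (e.g. -11 for SIGSEGV), which is what a fuzzer looks for.
    result = subprocess.run(target_cmd + ["fuzzed.bin"])
    return result.returncode < 0
```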


Types of Fuzz Testing

Blackbox: Randomly generated data is fed to a program as input to see if it crashes.

– Does not require knowledge of the program's source code or deep code inspection.
– A quick way of finding defects without knowing the details of the application.

Whitebox: Creates test cases by considering the target program's logical constraints and data structures.

– Requires knowledge of the system and how it uses the data.
– Penetrates deeper into the program.


Zzuf - Blackbox Fuzzer

Finds bugs in applications by corrupting random bits in user-contributed data.

To generate new test cases, Zzuf uses a range of seeds and fuzzing ratios (the fraction of input bits to corrupt).
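The seed/ratio scheme is easy to picture: the seed makes the corruption reproducible, and the ratio controls how much of the input gets flipped. The Python sketch below mimics those two parameters; it is not zzuf's actual implementation.

```python
import random

def mutate(data: bytes, seed: int, ratio: float) -> bytes:
    """Flip roughly `ratio` of the input's bits, reproducibly per seed
    (a sketch of zzuf's seed/ratio idea, not its real code)."""
    rng = random.Random(seed)             # same seed -> same test case
    out = bytearray(data)
    n_flips = max(1, int(len(out) * 8 * ratio))
    for _ in range(n_flips):
        bit = rng.randrange(len(out) * 8)
        out[bit // 8] ^= 1 << (bit % 8)   # flip one bit
    return bytes(out)

# Sweeping both parameters yields a family of distinct test cases:
# for seed in range(1000):
#     for ratio in (0.001, 0.004, 0.01):   # example ratios
#         test_case = mutate(sample, seed, ratio)
```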


Catchconv (CC) - Whitebox Fuzzer

To create test cases, CC starts with a valid input, observes the program's execution on that input, and collects the path condition the program follows on that sample. It then attempts to infer related path conditions that lead to an error and uses these as the starting point for bug-finding.

CC has some downtime while it is only tracing a program and not generating new fuzzed files.
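The path-condition idea can be shown on a toy example. The sketch below captures only the shape of the approach; Catchconv itself works on program binaries via Valgrind and a constraint solver, not on Python source.

```python
def parse(data: bytes):
    """A toy 'parser' with two branches; real targets have thousands."""
    if len(data) < 4:        # branch 1
        return "too short"
    if data[0] == 0x89:      # branch 2
        return data[4]       # bug: index 4 is out of range when len == 4
    return "ok"

def path_condition(data: bytes):
    """The branch predicates this input exercises, in order."""
    return [len(data) >= 4, data[0] == 0x89]

valid = b"\x00abc"           # a valid input: follows [True, False]
# Negating the second branch asks: what input keeps len >= 4 but makes
# data[0] == 0x89 true? Solving that constraint suggests:
crasher = b"\x89abc"         # parse(crasher) raises IndexError
```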


Valgrind

A tool for detecting memory management errors.

Reports the line number in the code where the program error occurred.

Helped us find and report more errors than we would have found had we focused solely on segmentation faults.
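Since Valgrind writes its diagnostics to stderr, harvesting them from a batch run is straightforward. Here is a rough Python wrapper; the matched substrings only approximate Memcheck's message wording.

```python
import subprocess

def valgrind_errors(cmd):
    """Run a command under Valgrind and pull out its error lines,
    which name the offending source location when debug info is present."""
    proc = subprocess.run(["valgrind"] + cmd,
                          capture_output=True, text=True)
    keywords = ("Invalid write", "Invalid read",
                "uninitialised", "Invalid free")
    return [line for line in proc.stderr.splitlines()
            if any(k in line for k in keywords)]

# errors = valgrind_errors(["mplayer", "fuzzed.avi"])  # example use
```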


Types of errors reported by Valgrind

By tracking a program’s execution on a file, Valgrind determines the types of errors that occur, which may include:

– Invalid writes
– Invalid reads
– Double frees
– Uninitialized values
– Syscall param errors
– Memory leaks


Program run under Valgrind


Methodology


Metafuzz

All of the test files that triggered bugs were uploaded to Metafuzz.com. For each bug, the webpage contained:

– A link to the test file
– The bug type
– The program in which the bug was found
– The stack hash identifying where the bug was located


Metafuzz webpage


Target applications

MPlayer, Antiword, ImageMagick Convert, and Adobe Flash Player.

MPlayer was the primary target:
– Open-source software
– Preinstalled on many Linux distributions
– Updates available via Subversion
– Convenient to file a bug report
– Developers would get back to us!

Adobe's bug-reporting protocol requires a bug to receive a number of votes from users before Flash developers will look at it.

VLC requires building from nightly Subversion snapshots.


Research Highlights

In six weeks, we generated more than 1.2 million test cases.

We used the UC Berkeley PSI cluster, which consists of 81 machines (270 processors).
– Zzuf, MPlayer, and CC were installed on these machines.

Created a de-duplication script to find the unique bugs (a sketch of the idea appears below).

Reported 89 unique bugs; developers have already eliminated 15 of them.
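Our actual de-duplication script is not reproduced here, but its essence fits in a few lines: collapse all reports that name the same bug type at the same location, keeping one triggering test case per pair. The tuple format below is an assumption for illustration.

```python
def dedupe(reports):
    """Keep one test case per (bug type, source location) pair.
    `reports` is assumed to be an iterable of
    (bug_type, location, test_file) tuples parsed from Valgrind logs."""
    unique = {}
    for bug_type, location, test_file in reports:
        unique.setdefault((bug_type, location), test_file)
    return unique
```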


Results

To assess the two fuzzers, we gathered several metrics:

– Number of test cases generated
– Number of unique test cases generated
– Total bugs and total unique bugs found by each fuzzer


Results (cont’d)

Generated 1.2 million test cases:
– 962,402 by Zzuf
– 279,953 by Catchconv

From these test cases:
– Zzuf found 1,066,000 errors
– Catchconv reported 304,936

Unique (non-duplicate) errors found:
– 456 by Zzuf
– 157 by Catchconv


Results (cont’d)

Zzuf reports a disproportionately larger number of errors than CC. Is Zzuf better than CC?

No! The two fuzzers generated different numbers of test cases.

How could we make a fair comparison of the fuzzers’ efficiency?
– Gauge the amount of duplicate work performed by each fuzzer.
– Find how many of the test cases were unique.


Average Unique Errors per 100 Unique Test Cases

First, we compared the fuzzers’ performance by the average number of unique bugs found per 100 unique test cases.

– Zzuf: 2.69
– CC: 2.63

Zzuf’s apparent superiority diminishes.


Unique Errors as % of Total Errors

Next, we analyzed the fuzzers’ performance based on the percentage of unique errors found out of the total errors.

– Zzuf: 0.05%
– CC: 0.22%

Less than a quarter of a percentage point of difference between the fuzzers.


Types of Errors (as % of Total Errors)

We also considered comparing the fuzzers based on the types of bugs they found.

Zzuf performed better at finding “invalid write” errors, a more security-critical bug type.

This was not an accurate comparison, however, since we could not tell which bug specifically caused a given crash.


Conclusion

We were not able to draw a solid conclusion about the superiority of either fuzzer based on the metrics we gathered.

Knowing which fuzzer is able to find serious errors more quickly would allow us to make a more informed conclusion about their comparative efficiencies.


Conclusion (cont’d)

We would need to record the number of CPU cycles required to execute test cases and find errors.

Unfortunately, because we did not record this data during our research, we are unable to make such a comparison between the fuzzers.


Guides for Future Research

To perform a precise comparison of Zzuf and CC:

1. Record the difference between the number of test cases generated by Zzuf and CC for a given seed file and time frame.

2. Measure CPU time in order to compare the number of unique test cases each fuzzer generates in a given period.

3. Devise a new method for identifying unique errors and avoiding duplicate bug reports:

– Automatically generate a unique hash for each reported error that can then be used to identify duplicates (see the sketch below).
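One plausible way to build such a hash: fingerprint the top few frames of the error's call stack, so the same bug reached from different inputs collapses to one ID. The frame format and the depth of 5 are arbitrary choices for illustration.

```python
import hashlib

def stack_hash(frames, depth=5):
    """Stable ID for an error, derived from the top of its backtrace.
    `frames` is assumed to be a list of 'function (file:line)' strings
    taken from a Valgrind report."""
    top = "\n".join(frames[:depth])
    return hashlib.sha1(top.encode()).hexdigest()[:12]
```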


Guides for Future Research (cont’d)

4. Use a more robust data-collection infrastructure that can accommodate the massive amount of data collected.

– Our ISP shut Metafuzz down due to excess server load.
– Berkeley's storage filled up.

5. Include an internal issue tracker that records whether or not a bug has already been reported, to avoid filing duplicates (a minimal sketch follows).
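A minimal version of such a tracker only needs a persistent map from stack hash to report URL. The file name and function names below are hypothetical.

```python
import json
from pathlib import Path

LEDGER = Path("reported_bugs.json")   # hypothetical on-disk ledger

def _load():
    return json.loads(LEDGER.read_text()) if LEDGER.exists() else {}

def already_reported(bug_hash: str) -> bool:
    """Consult the ledger before filing a bug upstream."""
    return bug_hash in _load()

def mark_reported(bug_hash: str, tracker_url: str) -> None:
    """Record that this stack hash has been filed, with its report URL."""
    seen = _load()
    seen[bug_hash] = tracker_url
    LEDGER.write_text(json.dumps(seen, indent=2))
```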


Whitebox or Blackbox?

– With a lower budget or less time: use blackbox.
– Once the low-hanging bugs are gone, fuzzing must become smarter: use whitebox.
– In practice, use both.


Acknowledgment

National Science Foundation (NSF) for funding this project through the SUPERB-TRUST (Summer Undergraduate Program in Engineering Research at Berkeley - Team for Research in Ubiquitous Secure Technology) program

Kristen Gates (Executive Director for Education for the TRUST Program)

Faculty advisor: David Wagner

Graduate mentors: Li-Wen Hsu, David Molnar, Edwardo Segura, Alex Fabrikant, and Alvaro Cardenas


Questions?

Thank you