Mining Windows Kernel API Rules


Jinlin Yang

[email protected]

09/28/2005 CS696


My Background

• Bounded exhaustive testing, 09/2001-01/2004
– D. Coppit, J. Yang, S. Khurshid, W. Le, and K. Sullivan. Software Assurance by Bounded Exhaustive Testing. IEEE Transactions on Software Engineering, April 2005.
– K. Sullivan, J. Yang, D. Coppit, S. Khurshid, and D. Jackson. Software Assurance by Bounded Exhaustive Testing. ISSTA ’04.

• Temporal properties inference, 01/2004-present
– J. Yang and D. Evans. Dynamically Inferring Temporal Properties. PASTE ’04.
– J. Yang and D. Evans. Automatically Inferring Temporal Properties for Program Evolution. ISSRE ’04.
– J. Yang and D. Evans. Automatically Discovering Temporal Properties for Program Verification. Submitted to FMSD.
– J. Yang, D. Evans, D. Bhardwaj, T. Bhat, and M. Das. Terracotta: Mining Temporal API Rules from Imperfect Traces. Submitted to ICSE ’06.


Overview

• Problem: specifications are often unavailable, which is a major obstacle to defect detection

• Solution: automatically infer specifications from execution traces

• Benefits: better understanding of legacy code and an opportunity to find more defects
– Experiments on finding kernel API rules
– Found one previously unknown bug in Windows
– Found interesting properties that should have been checked


Problem

• Defect detection techniques

• Generic properties
– E.g. pointer and buffer usage
– PREfix [Bush et al., SP&E 00], PREfast
– Very effective

• Application-specific properties
– E.g. lock/unlock, resource creation/deletion
– SLAM/SDV [Ball et al., SPIN 01], ESP [Das et al., PLDI 02]

• Where do we get such properties?


My Approach

[Figure: inference pipeline. Program → Instrumentation → Instrumented Program; Instrumented Program + Test Suite → Running → Execution Traces; Execution Traces + Property Templates → Inference → Inferred Properties → Post-processing → Report]

J. Yang and D. Evans. Dynamically Inferring Temporal Properties. PASTE ’04.


An Example

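• Alternating template: (PS)*, P≠S, where P and S are placeholders for events

• Trace: Lock::acq Lock::rel Lock::acq Lock::rel
– P=Lock::acq and S=Lock::rel: the trace reads PSPS, which matches (PS)*, so Lock::acq→Lock::rel satisfies the template
– P=Lock::rel and S=Lock::acq: the trace reads SPSP, which does not match (PS)*, so Lock::rel→Lock::acq does not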
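The template check above can be sketched in a few lines of Python. This is an illustrative brute-force version, not Terracotta's inference engine: the function names are hypothetical, and it tries every ordered pair of events rather than using the linear-time algorithm the talk refers to.

```python
import re
from itertools import permutations

# The Alternating template from the slide: (PS)*
ALTERNATING = re.compile(r"(PS)*")

def satisfies_alternating(trace, p, s):
    """Project the trace onto events p and s, then match it against (PS)*."""
    projected = "".join("P" if e == p else "S" for e in trace if e in (p, s))
    return ALTERNATING.fullmatch(projected) is not None

def infer_alternating(trace):
    """Try every ordered pair of distinct events as (P, S); keep the matches."""
    events = sorted(set(trace))
    return [(p, s) for p, s in permutations(events, 2)
            if satisfies_alternating(trace, p, s)]

trace = ["Lock::acq", "Lock::rel", "Lock::acq", "Lock::rel"]
print(infer_alternating(trace))  # [('Lock::acq', 'Lock::rel')]
```

Only the instantiation P=Lock::acq, S=Lock::rel survives, matching the slide.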


Implementation

• Terracotta
– Inference engine
– Context-aware trace analysis
– Heuristics for prioritizing and presenting properties

• Running time is linear in the length of the trace and in the number of distinct events

• More information: http://www.cs.virginia.edu/terracotta


Lessons

• Missing interesting properties
– Original algorithm requires 100% satisfaction

• Real world is never perfect
– Traces collected by sampling
– Object information unavailable
– Imperfect programs

• Can we develop better inference to handle this?

• Too much noise in the results
– Interesting properties are buried among uninteresting ones

• Can we develop heuristics to select interesting ones?


Refinement of Inference

• How to detect interesting properties in the face of imperfect traces?

• Example
– PS PS PS PS PS PS PS PS PS PPP
– The dominant behavior is that P and S alternate
– 10 subtraces, 9 of which (90%) satisfy Alternating
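The approximation in this example amounts to scoring a candidate property by the fraction of subtraces that satisfy the template. The sketch below assumes the trace has already been split into subtraces (how Terracotta partitions them is not shown here) and uses hypothetical names. The 0.90 cutoff is the PAL threshold reported in the kernel results; accepting a ratio equal to the cutoff is an assumption.

```python
import re

ALTERNATING = re.compile(r"(PS)*")

def satisfaction_ratio(subtraces):
    """Fraction of subtraces (strings over 'P' and 'S') that match (PS)*."""
    if not subtraces:
        return 0.0
    satisfied = sum(1 for t in subtraces if ALTERNATING.fullmatch(t))
    return satisfied / len(subtraces)

# The ten subtraces from the slide: nine well-behaved, one violating.
subtraces = ["PS"] * 9 + ["PPP"]
ratio = satisfaction_ratio(subtraces)

THRESHOLD = 0.90  # assumed to be an inclusive cutoff
accepted = ratio >= THRESHOLD
print(ratio, accepted)  # 0.9 True
```

A strict check would reject this pair because of the single PPP subtrace; the ratio keeps the dominant pattern visible.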


Refinement of Inference (2)

• How to pick out interesting properties?

• Which one is more likely to be interesting?
– Heuristic: C→D is often more interesting
– Compute the call graph for Windows binaries
– Keep A→B only if B is not reachable from A

Case 1 (A→B): A calls B

void A() { ... B(); ... }

void KeSetTimer() { KeSetTimerEx(); }

Case 2 (C→D): a caller invokes C, then D

void x() { C(); ... D(); }

void x() { ExAcquireFastMutexUnsafe(&m); ... ExReleaseFastMutexUnsafe(&m); }
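The call-graph heuristic can be sketched as a reachability filter over candidate pairs. The graph below is a toy built only from the two snippets on this slide, not the real call graph of any Windows binary, and the function names are hypothetical.

```python
from collections import deque

def reachable(call_graph, src, dst):
    """Breadth-first search: can src reach dst through one or more calls?"""
    seen = set()
    queue = deque(call_graph.get(src, ()))
    while queue:
        fn = queue.popleft()
        if fn == dst:
            return True
        if fn in seen:
            continue
        seen.add(fn)
        queue.extend(call_graph.get(fn, ()))
    return False

def filter_by_call_graph(properties, call_graph):
    """Keep A->B only if B is not reachable from A."""
    return [(a, b) for a, b in properties if not reachable(call_graph, a, b)]

# Toy call graph: caller -> list of callees (illustrative only).
call_graph = {
    "KeSetTimer": ["KeSetTimerEx"],
    "x": ["ExAcquireFastMutexUnsafe", "ExReleaseFastMutexUnsafe"],
}
properties = [
    ("KeSetTimer", "KeSetTimerEx"),
    ("ExAcquireFastMutexUnsafe", "ExReleaseFastMutexUnsafe"),
]
kept = filter_by_call_graph(properties, call_graph)
print(kept)  # [('ExAcquireFastMutexUnsafe', 'ExReleaseFastMutexUnsafe')]
```

The KeSetTimer pair is dropped because the ordering is just a consequence of one function wrapping the other; the mutex pair survives because neither function calls the other.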


Refinement of Inference (3)

• Heuristic: the more similar two event names are, the more likely the property is interesting

• Relative edit distance between A and B
– Partition A and B into words
– A has $w_A$ words, B has $w_B$ words, and they share $w$ common words

$$\mathrm{dist}_{AB} = \frac{2w}{w_A + w_B}$$

• For example:
– Ke Acquire In Stack Queued Spin Lock
– Ke Release In Stack Queued Spin Lock
– Similarity = 85.7%
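The word-based measure can be sketched as follows, taking it to be 2w / (wA + wB), which reproduces the 85.7% in the example (2·6 / (7 + 7)). Splitting names at capital letters and counting common words as a multiset intersection are assumptions made for illustration; the function names are hypothetical.

```python
import re
from collections import Counter

def split_words(name):
    """Split a CamelCase API name into words at capital letters (assumed rule)."""
    return re.findall(r"[A-Z][a-z0-9]*", name)

def similarity(a, b):
    """2w / (wA + wB), where w counts words common to both names."""
    words_a, words_b = split_words(a), split_words(b)
    if not words_a and not words_b:
        return 0.0
    common = sum((Counter(words_a) & Counter(words_b)).values())
    return 2 * common / (len(words_a) + len(words_b))

sim = similarity("KeAcquireInStackQueuedSpinLock",
                 "KeReleaseInStackQueuedSpinLock")
print(f"{sim:.1%}")  # 85.7%
```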


Results: Kernel

• Approximation
– PAL threshold = 0.90
– 7611 properties

• Call-graph and edit distance based reduction
– Use the call graph of ntoskrnl.exe, edit dist > 0.5
– 142 properties, a 53-fold reduction
– Small enough for manual inspection

• 56 apparently interesting properties (40%)
– Locking discipline
– Resource allocation and deletion


Results: Kernel (2)

• Found interesting properties that should be checked
– Several types of kernel SpinLock
– The Static Driver Verifier should have checked them

• ESP found one previously unknown bug in ntfs.sys
– Double-acquire of FastMutex
– Confirmed and fixed by the responsible developers

M. Das, S. Lerner, and M. Seigle. ESP: Path-Sensitive Program Verification in Polynomial Time. PLDI ’02.

Static Driver Verifier: Finding Bugs in Device Drivers at Compile-Time. WinHEC, April 2004.


Summary of Experiments

• We inferred interesting rules about kernel APIs
– SDV already encodes some properties: http://download.microsoft.com/download/5/b/5/5b5bec17-ea71-4653-9539-204a672f11cf/SDV-intro.doc
– We inferred undocumented ones too

• Inference scales well to realistic traces

• Approximation is effective in tolerating imperfect traces and detecting dominant patterns

• Call-graph and edit distance based reduction is very effective

• Checking inferred properties with a defect detection tool is promising

• Other experiments: Vulcan APIs, Daisy file system


Conclusion

• Constructing interesting properties is important and difficult

• Automatic inference from execution traces is lightweight and effective

• Practical value
– Helps developers understand legacy code
– Gives us the opportunity to leverage sophisticated static analysis tools to find application-specific defects


Q & A

• For more information
– [email protected]
– http://www.cs.virginia.edu/terracotta

• Great collaborators
– UVa: David Evans, Ed Mitchell
– Microsoft: Stephen Adams, Deepali Bhardwaj, Thirumalesh Bhat, Manuvir Das, Damian Hasse, Marne Staples, Rick Vicik, Jason Yang, Zhe Yang