Self-defending software: Collaborative learning for security and repair
Michael Ernst, MIT Computer Science & AI Lab
Goal: Security for COTS software
Requirements:
1. Protect against unknown vulnerabilities
2. Preserve functionality
3. COTS (commercial off-the-shelf) software
Key ideas:
4. Learn from failure
5. Leverage the benefits of a community
1. Unknown vulnerabilities
• Proactively prevent attacks via unknown vulnerabilities
  – "Zero-day exploits"
  – No pre-generated signatures
  – No time for human reaction
  – Works for bugs as well as attacks
2. Preserve functionality
• Maintain continuity: the application continues to operate despite attacks
• Important for mission-critical applications
  – Web servers, air traffic control, communications
• Technique: create a patch (repair the application)
• Previous work: crash the program when attacked
  – Loss of data, overhead of restart, attack recurs
3. COTS software
• No modification to source or executables
• No cooperation required from developers
  – No source information (no debug symbols)
  – Cannot assume built-in survivability features
• x86 Windows binaries
4. Learn from failure
• Each attack provides information about the underlying vulnerability
• Detect all attacks (of given types)
  – Prevent negative consequences
  – First few attacks may crash the application
• Repairs improve over time
  – Eventually, the attack is rendered harmless
  – Similar to an immune system
5. Leverage a community
• Application community = many machines running the same application (a monoculture)
  – Convenient for users, sysadmins, hackers
• Problem: a single vulnerability can be exploited throughout the community
• Solution: community members collaborate to detect and resolve problems
  – Amortized risk: a problem in some leads to a solution for all
  – Increased accuracy: exercise more behaviors during learning
  – Shared burden: distribute the computational load
Evaluation
• Adversarial Red Team exercise
  – Red Team was hired by DARPA
• Application: Firefox (v1.0)
  – Exploits: malicious JavaScript, GC errors, stack smashing, heap buffer overflow, uninitialized memory
• Results:
  – Detected all attacks, prevented all exploits
  – For 15/18 attacks, generated a patch that maintained functionality, after observing no more than 13 instances of the attack
  – No false positives
  – Low steady-state overhead
Collaborative learning approach
• Monitoring: detect attacks
  – Pluggable detector; does not depend on learning
• Learning: infer normal behavior from successful executions
• Repair: propose, evaluate, and distribute patches based on the behavior of the attack
All modes run on each member of the community; the modes are temporally and spatially overlapped.
[Diagram: a cycle in which normal execution feeds Learning, which builds a model of behavior; Monitoring detects an attack (or bug) and triggers Repair.]
Collaborative Learning System Architecture
[Diagram: a Central Management System (server) coordinates two kinds of client workstations. Learning workstations run the application under an MPEE with a data-acquisition client library, feed sample data to local learning (Daikon), and send inference results to the server, which merges constraints (Daikon) and performs patch/repair code generation. Protected workstations run the application under an MPEE with Memory Firewall and Live Shield, evaluate patches by observing execution, and report patch/repair results; the server distributes patches to them.]
Structure of the system
[Diagram: a server communicates with the community machines over encrypted, authenticated channels. The server may be replicated, distributed, etc.]
Learning
• Server distributes learning tasks to the community machines
• Clients observe normal behavior and do local inference, generalizing observed behavior
• Clients send inference results to the server (e.g., "copy_len ≤ buff_size")
• Server generalizes (merges results): e.g., clients reporting copy_len < buff_size, copy_len ≤ buff_size, and copy_len = buff_size merge to copy_len ≤ buff_size
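The merge step above can be sketched in a few lines. This is a hypothetical illustration (the function name and encoding are invented, not the system's API): each client reports which comparison relations over the pair (copy_len, buff_size) survived its local observations, and the server keeps only the relations consistent with every client's data.

```python
# Hypothetical sketch of the server-side merge of per-client inference
# results for one variable pair (copy_len vs. buff_size).

def merge_constraints(client_results):
    """Intersect the surviving relations reported by all clients."""
    merged = set(client_results[0])
    for surviving in client_results[1:]:
        merged &= set(surviving)
    return merged

# Client A's runs always had copy_len < buff_size: "<", "<=", "!=" survive.
# Client B's runs always had copy_len = buff_size: "<=", ">=", "=" survive.
a = {"<", "<=", "!="}
b = {"<=", ">=", "="}
print(merge_constraints([a, b]))  # {'<='}: the weakest common relation
```

The intersection yields copy_len ≤ buff_size, matching the slide's example: merging discards any relation falsified somewhere in the community.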
Monitoring
• Community machines detect attacks
  – Assumption: few false positives
• Detector collects information and terminates the application
• Detectors used in our research:
  – Code injection (Memory Firewall)
  – Memory corruption (Heap Guard)
• Many other possibilities exist
Monitoring
• What was the effect of the attack?
• Instrumentation continuously evaluates learned behavior
• Clients send the difference in behavior, i.e., violated constraints (e.g., "Violated: copy_len ≤ buff_size"), to the server
• Server correlates constraints to the attack
Repair
• Server generates a set of patches: propose patches for each behavior correlated to the attack
• Example: for the correlated constraint copy_len ≤ buff_size, candidate patches are:
  1. Set copy_len = buff_size
  2. Set copy_len = 0
  3. Set buff_size = copy_len
  4. Return from procedure
Repair
• Server distributes patches to the community (Patch 1, Patch 2, Patch 3, …)
• Initial ranking: Patch 1: 0, Patch 2: 0, Patch 3: 0, …
Repair
• Evaluate patches: success = no detector is triggered
  – The detector is still running on the clients
• When attacked, clients send the outcome to the server (e.g., "Patch 1 failed", "Patch 3 succeeded")
• Server ranks the patches, e.g.: Patch 3: +5, Patch 2: 0, Patch 1: −5, …
Repair
• Server redistributes the most effective patches (e.g., Patch 3)
  – Ranking: Patch 3: +5, Patch 2: 0, Patch 1: −5, …
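The ranking loop above can be sketched as follows. This is an illustrative reconstruction (class and method names are invented; the slides do not specify the scoring function beyond the +5/−5 example): each reported outcome adjusts a patch's score, and the highest-scoring patch is the one to redistribute.

```python
# Hypothetical sketch of server-side patch ranking from client outcomes.
from collections import defaultdict

class PatchRanker:
    def __init__(self):
        self.scores = defaultdict(int)  # patch id -> score, initially 0

    def report(self, patch_id, success):
        """A client running patch_id was attacked; success means no
        detector was triggered."""
        self.scores[patch_id] += 1 if success else -1

    def best(self):
        return max(self.scores, key=self.scores.get)

ranker = PatchRanker()
for _ in range(5):
    ranker.report("patch3", True)    # Patch 3 succeeded 5 times -> +5
    ranker.report("patch1", False)   # Patch 1 failed 5 times -> -5
ranker.report("patch2", True)
ranker.report("patch2", False)       # Patch 2 nets 0
print(ranker.best())  # patch3
```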
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Learning
• Clients do local inference, generalizing observed behavior
• Clients send inference results to the server (e.g., "copy_len ≤ buff_size")
• Server generalizes (merges results): e.g., clients reporting copy_len < buff_size, copy_len ≤ buff_size, and copy_len = buff_size merge to copy_len ≤ buff_size
Dynamic invariant detection
• Idea: generalize observed program executions
• Initial candidates = all possible instantiations of constraint templates over program variables
• Observed data quickly falsifies most candidates
• Many optimizations for accuracy and speed
  – Data structures, static analysis, statistical tests, …
Example:
  Candidate constraints: copy_len < buff_size, copy_len ≤ buff_size, copy_len = buff_size, copy_len ≥ buff_size, copy_len > buff_size, copy_len ≠ buff_size
  Observation: copy_len: 22, buff_size: 42
  Remaining candidates: copy_len < buff_size, copy_len ≤ buff_size, copy_len ≠ buff_size
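The falsification process in the example can be sketched directly. This is a minimal illustration of the idea only; Daikon's actual implementation handles many template families and the optimizations listed above.

```python
# Minimal sketch of dynamic invariant detection over one template
# family: start with every comparison over (copy_len, buff_size) and
# discard candidates falsified by each observation.
import operator

TEMPLATES = {
    "copy_len < buff_size": operator.lt,
    "copy_len <= buff_size": operator.le,
    "copy_len = buff_size": operator.eq,
    "copy_len >= buff_size": operator.ge,
    "copy_len > buff_size": operator.gt,
    "copy_len != buff_size": operator.ne,
}

def infer(observations):
    candidates = dict(TEMPLATES)
    for copy_len, buff_size in observations:
        # Keep only candidates the observation satisfies.
        candidates = {name: op for name, op in candidates.items()
                      if op(copy_len, buff_size)}
    return set(candidates)

# The slide's single observation: copy_len = 22, buff_size = 42.
print(infer([(22, 42)]))
# More data tightens the result further: adding an equal pair leaves
# only copy_len <= buff_size.
print(infer([(22, 42), (42, 42)]))
```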
Quality of inference results
• Not sound
  – Overfitting if observed executions are not representative
  – Does not affect detection
  – For repair, mitigated by the correlation step
  – Continued learning improves results
• Not complete
  – Templates are not exhaustive
• Useful!
  – Sound in practice
  – Complete enough
Enhancements to learning
• Relations over variables in different parts of the program
  – Flow-dependent constraints
• Distributed learning across the community
• Filtering of constraint templates and variables
Learning for Windows x86 binaries
• No static or dynamic analysis to determine validity of values and pointers
  – Use expressions computed by the application
  – Can be more or fewer variables (e.g., loop invariants)
  – Static analysis to eliminate duplicated variables
• New constraint templates (e.g., stack pointer)
• Determine the code of a function from its entry point
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
  – Detector details
  – Learning from failure
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Detecting attacks (or bugs)
• Code injection (Determina Memory Firewall)
  – Triggers if control jumps to code that was not in the original executable
• Memory corruption (Heap Guard)
  – Triggers if sentinel values are overwritten
These detectors have low overhead and no false positives.
Goal: detect problems close to their source.
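The sentinel idea behind a detector like Heap Guard can be illustrated in miniature. This is a hypothetical sketch of the concept only, not Determina's implementation: plant a known byte just past each allocation and trigger if it is ever overwritten.

```python
# Illustrative sketch of sentinel-based memory-corruption detection.
SENTINEL = 0xAB

class GuardedHeap:
    def __init__(self, size):
        self.mem = bytearray(size)
        self.sentinels = {}  # block base -> address of its sentinel byte

    def alloc(self, base, length):
        self.mem[base + length] = SENTINEL  # plant sentinel after block
        self.sentinels[base] = base + length

    def check(self):
        """Return the base of a corrupted block, or None if all intact."""
        for base, addr in self.sentinels.items():
            if self.mem[addr] != SENTINEL:
                return base
        return None

heap = GuardedHeap(64)
heap.alloc(0, 16)
heap.mem[0:16] = b"x" * 16    # in-bounds write: sentinel untouched
assert heap.check() is None
heap.mem[0:17] = b"x" * 17    # one-byte overflow clobbers the sentinel
print(heap.check())  # 0: attack (or bug) detected at block 0
```

Because the check fires at the corrupted block itself, it detects the problem close to its source, as the slide requires.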
Other possible attack detectors
• Non-application-specific:
  – Crash: page fault, assertions, etc.
  – Infinite loop
• Application-specific checks:
  – Tainting, data injection
  – Heap consistency
  – User complaints
• Application behavior:
  – Different failures/attacks
  – Unusual system calls or code execution
  – Violation of learned behaviors
• A collection of detectors with high false positives
• Non-example: social engineering
Learning from failures
Each attack provides information about the underlying vulnerability:
– That it exists
– Where it can be exploited
– How the exploit operates
– What repairs are successful
Monitoring
• Community machines detect attacks
  – Assumption: few false positives
• Detector collects information and terminates the application
Monitoring
• Detector collects information and terminates the application
  – Where did the attack happen? The detector maintains a shadow stack (e.g., scanf ← read_input ← process_record ← main)
• Client sends the attack information to the server
Monitoring
• Server generates instrumentation for the targeted code locations (e.g., monitoring for main, process_record, …)
• Server sends the instrumentation to all clients; clients install it
• Extra checking in the attacked code: evaluate learned constraints
Monitoring
• What was the effect of the attack?
• Instrumentation continuously evaluates inferred behavior
• Clients send the difference in behavior, i.e., violated constraints (e.g., "Violated: copy_len ≤ buff_size"), to the server
• Server correlates constraints to the attack
Correlating attacks and constraints
• Evaluate only constraints at the attack sites
  – Low overhead
• A constraint is correlated to an attack if the constraint is violated iff the attack occurs
• Create repairs for each correlated constraint
  – There may be multiple correlated constraints
  – Multiple candidate repairs per constraint
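The "violated iff the attack occurs" test can be sketched as a simple predicate over per-execution reports. This is an illustrative encoding (the data layout is invented, not the system's), showing why a spurious violation on a normal run rules a constraint out.

```python
# Sketch of correlating constraints to attacks: each execution reports
# whether a detector fired and which learned constraints were violated.

def correlated(executions, constraint):
    """executions: list of (attacked: bool, violated: set of constraints).
    Correlated means violation and attack coincide on every execution."""
    return all((constraint in violated) == attacked
               for attacked, violated in executions)

runs = [
    (False, set()),                      # normal run, nothing violated
    (True,  {"copy_len <= buff_size"}),  # attack violates the bound
    (False, {"x != 0"}),                 # spurious violation, no attack
]
print(correlated(runs, "copy_len <= buff_size"))  # True
print(correlated(runs, "x != 0"))                 # False
```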
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Repair
• Server distributes patches to the community (Patch 1, Patch 2, Patch 3, …)
  – Success = no detector is triggered
• Initial ranking: Patch 1: 0, Patch 2: 0, Patch 3: 0, …
• Patch evaluation uses additional detectors (e.g., crash, difference in attack)
Attack example
• Target: JavaScript system routine (written in C++)
  – Casts its argument to a C++ object, calls a virtual method
  – Does not check the type of the argument
• Attack supplies an "object" whose virtual table points to attacker-supplied code
• A constraint at the method call is correlated
  – Constraint: the JSRI address target is one of a known set
• Possible repairs:
  – Skip over the call
  – Return early
  – Call one of the known valid methods
  – No repair
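The "call one of the known valid methods" repair can be sketched abstractly. Everything concrete here is invented for illustration (the addresses, the `guarded_jsri` name, and the dispatch table); the point is only the shape of the check: compare the indirect-call target against the learned set before jumping.

```python
# Sketch of guarding an indirect call (JSRI) with a learned constraint:
# the target must be one of the addresses observed during learning.
KNOWN_TARGETS = {0x4010A0, 0x4010F0, 0x401130}  # hypothetical learned set

def guarded_jsri(target, dispatch):
    if target not in KNOWN_TARGETS:
        # Constraint violated: likely an attack. One candidate repair
        # is to call one of the known valid methods instead.
        target = min(KNOWN_TARGETS)
    return dispatch[target]()

# Stand-in "methods" keyed by address.
dispatch = {t: (lambda t=t: f"method at {hex(t)}") for t in KNOWN_TARGETS}
print(guarded_jsri(0x4010F0, dispatch))  # legitimate call goes through
print(guarded_jsri(0xDEAD00, dispatch))  # attack redirected to a valid method
```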
Triggering a repair
• A repair may depend on constraint checking
  – if (!(copy_len ≤ buff_size)) copy_len = buff_size;
  – If the constraint is not violated, there is no need to repair
  – If the constraint is violated, an attack is (probably) underway
• The repair does not depend on the detector
  – It should fix the problem before the detector is triggered
• The repair is not identical to what a human would write
  – It is a stopgap until an official patch is released; waiting for a human response is unacceptable
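The slide's clamp repair can be rendered as a runnable sketch (the `safe_copy` wrapper is invented for illustration; the real patch rewrites the binary in place). The check changes behavior only when the learned constraint is violated, so normal executions are untouched.

```python
# Sketch of the repair "if !(copy_len <= buff_size) copy_len = buff_size".
def safe_copy(src, buff_size, copy_len):
    if not (copy_len <= buff_size):  # constraint check = repair trigger
        copy_len = buff_size         # clamp: enforce the learned bound
    return src[:copy_len]

buf = safe_copy(b"A" * 100, buff_size=16, copy_len=100)  # attack-like input
print(len(buf))  # 16: overflow prevented, execution continues
print(len(safe_copy(b"hello", 16, 5)))  # 5: normal behavior unchanged
```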
Evaluating a patch
• In-field evaluation
  – No attack detector is triggered
  – No crashes or other behavior deviations
• Pre-validation, before distributing the patch:
  – Replay the attack
    + No need to wait for a second attack
    + Exactly reproduces the problem
    – Expensive to record the log
    – Log terminates abruptly
    – Need to prevent irrevocable effects
  – Run the program's test suite
    – May be too sensitive
    – Not available for COTS software
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Red Team
• The Red Team attempts to break our system
  – Hired by DARPA
  – (MIT = "Blue Team")
• The Red Team created 18 Firefox exploits
  – Each exploit is a webpage; Firefox executes arbitrary code
  – Malicious JavaScript, GC errors, stack smashing, heap buffer overflow, uninitialized memory
Rules of engagement
• Firefox 1.0
  – Blue Team may not tune the system to known vulnerabilities
  – Focus on the most security-critical components
• No access to a community for learning
• Focus on the research aspects
  – Red Team may only attack Firefox and the Blue Team infrastructure
• No pre-infected community members
  – Future work
• Red Team has access to the implementation
  – Red Team may not discard attacks thwarted by the system
Results
• Blue Team system:
  – Detected all attacks, prevented all exploits
  – Repaired 15/18 attacks, after observing no more than 13 instances of the attack
  – Handled simultaneous and intermixed attacks
  – Suffered no false positives
  – Monitoring/repair overhead was not noticeable
Results: Caveats
• Blue Team fixed a bug with reusing filenames
• Communications infrastructure (from the subcontractor) failed during the last day
• Learning overhead is high
  – Optimizations are possible
  – Can be distributed over the community
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Related work
• Distributed detection
  – Vigilante [Costa] (for worms; proof of exploit)
  – Live monitoring [Kıcıman]
  – Statistical bug isolation [Liblit]
• Learning (lots)
  – Typically for anomaly detection
  – FSMs for system calls
• Repair [Demsky]
  – Requires specification; not scalable
Credits
• Saman Amarasinghe • Jonathan Bachrach • Michael Carbin • Danny Dig • Michael Ernst • Sung Kim • Samuel Larsen • Carlos Pacheco • Jeff Perkins • Martin Rinard • Frank Sherwood • Greg Sullivan • Weng-Fai Wong • Yoav Zibin
Subcontractor: Determina, Inc.
Funding: DARPA (PM: Lee Badger)
Red Team: SPARTA, Inc.
Contributions
Framework for collaborative learning and repair:
• Pluggable detection, learning, repair
• Handles unknown vulnerabilities
• Preserves functionality by repairing the vulnerability
• Learns from failure
• Focuses resources where they are most effective
Implementation for Windows x86 binaries
Evaluation via a Red Team exercise