Self-defending software: Collaborative learning for security and repair
Michael Ernst, MIT Computer Science & AI Lab
Goal: Security for COTS software
Requirements:
1. Protect against unknown vulnerabilities
2. Preserve functionality
3. COTS (commercial off-the-shelf) software
Key ideas:
4. Learn from failure
5. Leverage the benefits of a community
1. Unknown vulnerabilities
• Proactively prevent attacks via unknown vulnerabilities
  – "Zero-day exploits"
  – No pre-generated signatures
  – No time for human reaction
  – Works for bugs as well as attacks
2. Preserve functionality
• Maintain continuity: the application continues to operate despite attacks
• Important for mission-critical applications
  – Web servers, air traffic control, communications
• Technique: create a patch (repair the application)
• Previous work: crash the program when attacked
  – Loss of data, overhead of restart, attack recurs
3. COTS software
• No modification to source or executables
• No cooperation required from developers
  – No source information (no debug symbols)
  – Cannot assume built-in survivability features
• x86 Windows binaries
4. Learn from failure
• Each attack provides information about the underlying vulnerability
• Detect all attacks (of given types)
  – Prevent negative consequences
  – First few attacks may crash the application
• Repairs improve over time
  – Eventually, the attack is rendered harmless
  – Similar to an immune system
5. Leverage a community
• Application community = many machines running the same application (a monoculture)
  – Convenient for users, sysadmins, hackers
• Problem: a single vulnerability can be exploited throughout the community
• Solution: community members collaborate to detect and resolve problems
  – Amortized risk: a problem in some leads to a solution for all
  – Increased accuracy: exercise more behaviors during learning
  – Shared burden: distribute the computational load
Evaluation
• Adversarial Red Team exercise
  – Red Team was hired by DARPA
• Application: Firefox (v1.0)
  – Exploits: malicious JavaScript, GC errors, stack smashing, heap buffer overflow, uninitialized memory
• Results:
  – Detected all attacks, prevented all exploits
  – For 15/18 attacks, generated a patch that maintained functionality, after observing no more than 13 instances of the attack
  – No false positives
  – Low steady-state overhead
Collaborative learning approach
• Monitoring: detect attacks
  – Pluggable detector; does not depend on learning
• Learning: infer normal behavior from successful executions
• Repair: propose, evaluate, and distribute patches based on the behavior of the attack
All modes run on each member of the community; the modes are temporally and spatially overlapped.
[Diagram: a cycle in which normal execution feeds Learning, which builds a model of behavior; Monitoring detects an attack (or bug) and triggers Repair.]
Collaborative Learning System Architecture
[Diagram: a Central Management System (server) coordinates two kinds of client workstations. Learning workstations run the application under an MPEE with a data-acquisition client library, feed sample data to local learning (Daikon), and send inference results to the server, which merges constraints (Daikon) and performs patch/repair code generation. Protected workstations run the application under an MPEE with Memory Firewall and Live Shield, evaluate patches by observing execution, and report patch/repair results; the server distributes patches to them.]
Structure of the system
[Diagram: a server communicates with the community machines over encrypted, authenticated channels. The server may be replicated, distributed, etc.]
Learning
• Server distributes learning tasks to the community machines
• Clients observe normal behavior and do local inference, generalizing observed behavior
• Clients send inference results to the server (e.g., "copy_len ≤ buff_size")
• Server generalizes (merges results): e.g., clients reporting copy_len < buff_size, copy_len ≤ buff_size, and copy_len = buff_size merge to copy_len ≤ buff_size
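The merge step above can be sketched in a few lines. This is a hypothetical illustration (the function name and encoding are invented, not the system's API): each client reports which comparison relations over the pair (copy_len, buff_size) survived its local observations, and the server keeps only the relations consistent with every client's data.

```python
# Hypothetical sketch of the server-side merge of per-client inference
# results for one variable pair (copy_len vs. buff_size).

def merge_constraints(client_results):
    """Intersect the surviving relations reported by all clients."""
    merged = set(client_results[0])
    for surviving in client_results[1:]:
        merged &= set(surviving)
    return merged

# Client A's runs always had copy_len < buff_size: "<", "<=", "!=" survive.
# Client B's runs always had copy_len = buff_size: "<=", ">=", "=" survive.
a = {"<", "<=", "!="}
b = {"<=", ">=", "="}
print(merge_constraints([a, b]))  # {'<='}: the weakest common relation
```

The intersection yields copy_len ≤ buff_size, matching the slide's example: merging discards any relation falsified somewhere in the community.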
Monitoring
• Community machines detect attacks
  – Assumption: few false positives
• Detector collects information and terminates the application
• Detectors used in our research:
  – Code injection (Memory Firewall)
  – Memory corruption (Heap Guard)
• Many other possibilities exist
Monitoring
• What was the effect of the attack?
• Instrumentation continuously evaluates learned behavior
• Clients send the difference in behavior, i.e., violated constraints (e.g., "Violated: copy_len ≤ buff_size"), to the server
• Server correlates constraints to the attack
Repair
• Server generates a set of patches: propose patches for each behavior correlated to the attack
• Example: for the correlated constraint copy_len ≤ buff_size, candidate patches are:
  1. Set copy_len = buff_size
  2. Set copy_len = 0
  3. Set buff_size = copy_len
  4. Return from procedure
Repair
• Server distributes patches to the community (Patch 1, Patch 2, Patch 3, …)
• Initial ranking: Patch 1: 0, Patch 2: 0, Patch 3: 0, …
Repair
• Evaluate patches: success = no detector is triggered
  – The detector is still running on the clients
• When attacked, clients send the outcome to the server (e.g., "Patch 1 failed", "Patch 3 succeeded")
• Server ranks the patches, e.g.: Patch 3: +5, Patch 2: 0, Patch 1: −5, …
Repair
• Server redistributes the most effective patches (e.g., Patch 3)
  – Ranking: Patch 3: +5, Patch 2: 0, Patch 1: −5, …
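The ranking loop above can be sketched as follows. This is an illustrative reconstruction (class and method names are invented; the slides do not specify the scoring function beyond the +5/−5 example): each reported outcome adjusts a patch's score, and the highest-scoring patch is the one to redistribute.

```python
# Hypothetical sketch of server-side patch ranking from client outcomes.
from collections import defaultdict

class PatchRanker:
    def __init__(self):
        self.scores = defaultdict(int)  # patch id -> score, initially 0

    def report(self, patch_id, success):
        """A client running patch_id was attacked; success means no
        detector was triggered."""
        self.scores[patch_id] += 1 if success else -1

    def best(self):
        return max(self.scores, key=self.scores.get)

ranker = PatchRanker()
for _ in range(5):
    ranker.report("patch3", True)    # Patch 3 succeeded 5 times -> +5
    ranker.report("patch1", False)   # Patch 1 failed 5 times -> -5
ranker.report("patch2", True)
ranker.report("patch2", False)       # Patch 2 nets 0
print(ranker.best())  # patch3
```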
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Learning
• Clients do local inference, generalizing observed behavior
• Clients send inference results to the server (e.g., "copy_len ≤ buff_size")
• Server generalizes (merges results): e.g., clients reporting copy_len < buff_size, copy_len ≤ buff_size, and copy_len = buff_size merge to copy_len ≤ buff_size
Dynamic invariant detection
• Idea: generalize observed program executions
• Initial candidates = all possible instantiations of constraint templates over program variables
• Observed data quickly falsifies most candidates
• Many optimizations for accuracy and speed
  – Data structures, static analysis, statistical tests, …
Example:
  Candidate constraints: copy_len < buff_size, copy_len ≤ buff_size, copy_len = buff_size, copy_len ≥ buff_size, copy_len > buff_size, copy_len ≠ buff_size
  Observation: copy_len: 22, buff_size: 42
  Remaining candidates: copy_len < buff_size, copy_len ≤ buff_size, copy_len ≠ buff_size
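The falsification process in the example can be sketched directly. This is a minimal illustration of the idea only; Daikon's actual implementation handles many template families and the optimizations listed above.

```python
# Minimal sketch of dynamic invariant detection over one template
# family: start with every comparison over (copy_len, buff_size) and
# discard candidates falsified by each observation.
import operator

TEMPLATES = {
    "copy_len < buff_size": operator.lt,
    "copy_len <= buff_size": operator.le,
    "copy_len = buff_size": operator.eq,
    "copy_len >= buff_size": operator.ge,
    "copy_len > buff_size": operator.gt,
    "copy_len != buff_size": operator.ne,
}

def infer(observations):
    candidates = dict(TEMPLATES)
    for copy_len, buff_size in observations:
        # Keep only candidates the observation satisfies.
        candidates = {name: op for name, op in candidates.items()
                      if op(copy_len, buff_size)}
    return set(candidates)

# The slide's single observation: copy_len = 22, buff_size = 42.
print(infer([(22, 42)]))
# More data tightens the result further: adding an equal pair leaves
# only copy_len <= buff_size.
print(infer([(22, 42), (42, 42)]))
```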
Quality of inference results
• Not sound
  – Overfitting if observed executions are not representative
  – Does not affect detection
  – For repair, mitigated by the correlation step
  – Continued learning improves results
• Not complete
  – Templates are not exhaustive
• Useful!
  – Sound in practice
  – Complete enough
Enhancements to learning
• Relations over variables in different parts of the program
  – Flow-dependent constraints
• Distributed learning across the community
• Filtering of constraint templates and variables
Learning for Windows x86 binaries
• No static or dynamic analysis to determine validity of values and pointers
  – Use expressions computed by the application
  – Can be more or fewer variables (e.g., loop invariants)
  – Static analysis to eliminate duplicated variables
• New constraint templates (e.g., stack pointer)
• Determine the code of a function from its entry point
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
  – Detector details
  – Learning from failure
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Detecting attacks (or bugs)
• Code injection (Determina Memory Firewall)
  – Triggers if control jumps to code that was not in the original executable
• Memory corruption (Heap Guard)
  – Triggers if sentinel values are overwritten
These detectors have low overhead and no false positives.
Goal: detect problems close to their source.
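The sentinel idea behind a detector like Heap Guard can be illustrated in miniature. This is a hypothetical sketch of the concept only, not Determina's implementation: plant a known byte just past each allocation and trigger if it is ever overwritten.

```python
# Illustrative sketch of sentinel-based memory-corruption detection.
SENTINEL = 0xAB

class GuardedHeap:
    def __init__(self, size):
        self.mem = bytearray(size)
        self.sentinels = {}  # block base -> address of its sentinel byte

    def alloc(self, base, length):
        self.mem[base + length] = SENTINEL  # plant sentinel after block
        self.sentinels[base] = base + length

    def check(self):
        """Return the base of a corrupted block, or None if all intact."""
        for base, addr in self.sentinels.items():
            if self.mem[addr] != SENTINEL:
                return base
        return None

heap = GuardedHeap(64)
heap.alloc(0, 16)
heap.mem[0:16] = b"x" * 16    # in-bounds write: sentinel untouched
assert heap.check() is None
heap.mem[0:17] = b"x" * 17    # one-byte overflow clobbers the sentinel
print(heap.check())  # 0: attack (or bug) detected at block 0
```

Because the check fires at the corrupted block itself, it detects the problem close to its source, as the slide requires.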
Other possible attack detectors
• Non-application-specific:
  – Crash: page fault, assertions, etc.
  – Infinite loop
• Application-specific checks:
  – Tainting, data injection
  – Heap consistency
  – User complaints
• Application behavior:
  – Different failures/attacks
  – Unusual system calls or code execution
  – Violation of learned behaviors
• A collection of detectors with high false positives
• Non-example: social engineering
Learning from failures
Each attack provides information about the underlying vulnerability:
– That it exists
– Where it can be exploited
– How the exploit operates
– What repairs are successful
Monitoring
• Community machines detect attacks
  – Assumption: few false positives
• Detector collects information and terminates the application
Monitoring
• Detector collects information and terminates the application
  – Where did the attack happen? The detector maintains a shadow stack (e.g., scanf ← read_input ← process_record ← main)
• Client sends the attack information to the server
Monitoring
• Server generates instrumentation for the targeted code locations (e.g., monitoring for main, process_record, …)
• Server sends the instrumentation to all clients; clients install it
• Extra checking in the attacked code: evaluate learned constraints
Monitoring
• What was the effect of the attack?
• Instrumentation continuously evaluates inferred behavior
• Clients send the difference in behavior, i.e., violated constraints (e.g., "Violated: copy_len ≤ buff_size"), to the server
• Server correlates constraints to the attack
Correlating attacks and constraints
• Evaluate only constraints at the attack sites
  – Low overhead
• A constraint is correlated to an attack if the constraint is violated iff the attack occurs
• Create repairs for each correlated constraint
  – There may be multiple correlated constraints
  – Multiple candidate repairs per constraint
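The "violated iff the attack occurs" test can be sketched as a simple predicate over per-execution reports. This is an illustrative encoding (the data layout is invented, not the system's), showing why a spurious violation on a normal run rules a constraint out.

```python
# Sketch of correlating constraints to attacks: each execution reports
# whether a detector fired and which learned constraints were violated.

def correlated(executions, constraint):
    """executions: list of (attacked: bool, violated: set of constraints).
    Correlated means violation and attack coincide on every execution."""
    return all((constraint in violated) == attacked
               for attacked, violated in executions)

runs = [
    (False, set()),                      # normal run, nothing violated
    (True,  {"copy_len <= buff_size"}),  # attack violates the bound
    (False, {"x != 0"}),                 # spurious violation, no attack
]
print(correlated(runs, "copy_len <= buff_size"))  # True
print(correlated(runs, "x != 0"))                 # False
```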
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Repair
• Server distributes patches to the community (Patch 1, Patch 2, Patch 3, …)
  – Success = no detector is triggered
• Initial ranking: Patch 1: 0, Patch 2: 0, Patch 3: 0, …
• Patch evaluation uses additional detectors (e.g., crash, difference in attack)
Attack example
• Target: JavaScript system routine (written in C++)
  – Casts its argument to a C++ object, calls a virtual method
  – Does not check the type of the argument
• Attack supplies an "object" whose virtual table points to attacker-supplied code
• A constraint at the method call is correlated
  – Constraint: the JSRI address target is one of a known set
• Possible repairs:
  – Skip over the call
  – Return early
  – Call one of the known valid methods
  – No repair
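The "call one of the known valid methods" repair can be sketched abstractly. Everything concrete here is invented for illustration (the addresses, the `guarded_jsri` name, and the dispatch table); the point is only the shape of the check: compare the indirect-call target against the learned set before jumping.

```python
# Sketch of guarding an indirect call (JSRI) with a learned constraint:
# the target must be one of the addresses observed during learning.
KNOWN_TARGETS = {0x4010A0, 0x4010F0, 0x401130}  # hypothetical learned set

def guarded_jsri(target, dispatch):
    if target not in KNOWN_TARGETS:
        # Constraint violated: likely an attack. One candidate repair
        # is to call one of the known valid methods instead.
        target = min(KNOWN_TARGETS)
    return dispatch[target]()

# Stand-in "methods" keyed by address.
dispatch = {t: (lambda t=t: f"method at {hex(t)}") for t in KNOWN_TARGETS}
print(guarded_jsri(0x4010F0, dispatch))  # legitimate call goes through
print(guarded_jsri(0xDEAD00, dispatch))  # attack redirected to a valid method
```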
Triggering a repair
• A repair may depend on constraint checking
  – if (!(copy_len ≤ buff_size)) copy_len = buff_size;
  – If the constraint is not violated, there is no need to repair
  – If the constraint is violated, an attack is (probably) underway
• The repair does not depend on the detector
  – It should fix the problem before the detector is triggered
• The repair is not identical to what a human would write
  – It is a stopgap until an official patch is released; waiting for a human response is unacceptable
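The slide's clamp repair can be rendered as a runnable sketch (the `safe_copy` wrapper is invented for illustration; the real patch rewrites the binary in place). The check changes behavior only when the learned constraint is violated, so normal executions are untouched.

```python
# Sketch of the repair "if !(copy_len <= buff_size) copy_len = buff_size".
def safe_copy(src, buff_size, copy_len):
    if not (copy_len <= buff_size):  # constraint check = repair trigger
        copy_len = buff_size         # clamp: enforce the learned bound
    return src[:copy_len]

buf = safe_copy(b"A" * 100, buff_size=16, copy_len=100)  # attack-like input
print(len(buf))  # 16: overflow prevented, execution continues
print(len(safe_copy(b"hello", 16, 5)))  # 5: normal behavior unchanged
```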
Evaluating a patch
• In-field evaluation
  – No attack detector is triggered
  – No crashes or other behavior deviations
• Pre-validation, before distributing the patch:
  – Replay the attack
    + No need to wait for a second attack
    + Exactly reproduces the problem
    – Expensive to record the log
    – Log terminates abruptly
    – Need to prevent irrevocable effects
  – Run the program's test suite
    – May be too sensitive
    – Not available for COTS software
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Red Team
• The Red Team attempts to break our system
  – Hired by DARPA
  – (MIT = "Blue Team")
• The Red Team created 18 Firefox exploits
  – Each exploit is a webpage; Firefox executes arbitrary code
  – Malicious JavaScript, GC errors, stack smashing, heap buffer overflow, uninitialized memory
Rules of engagement
• Firefox 1.0
  – Blue Team may not tune the system to known vulnerabilities
  – Focus on the most security-critical components
• No access to a community for learning
• Focus on the research aspects
  – Red Team may only attack Firefox and the Blue Team infrastructure
• No pre-infected community members
  – Future work
• Red Team has access to the implementation
  – Red Team may not discard attacks thwarted by the system
Results
• Blue Team system:
  – Detected all attacks, prevented all exploits
  – Repaired 15/18 attacks, after observing no more than 13 instances of the attack
  – Handled simultaneous and intermixed attacks
  – Suffered no false positives
  – Monitoring/repair overhead was not noticeable
Results: Caveats
• Blue Team fixed a bug with reusing filenames
• Communications infrastructure (from the subcontractor) failed during the last day
• Learning overhead is high
  – Optimizations are possible
  – Can be distributed over the community
Outline
• Overview
• Learning: create model of normal behavior
• Monitoring: determine properties of attacks
• Repair: propose and evaluate patches
• Evaluation: adversarial Red Team exercise
• Conclusion
Related work
• Distributed detection
  – Vigilante [Costa] (for worms; proof of exploit)
  – Live monitoring [Kıcıman]
  – Statistical bug isolation [Liblit]
• Learning (lots)
  – Typically for anomaly detection
  – FSMs for system calls
• Repair [Demsky]
  – Requires specification; not scalable
Credits
• Saman Amarasinghe • Jonathan Bachrach • Michael Carbin • Danny Dig • Michael Ernst • Sung Kim • Samuel Larsen • Carlos Pacheco • Jeff Perkins • Martin Rinard • Frank Sherwood • Greg Sullivan • Weng-Fai Wong • Yoav Zibin
Subcontractor: Determina, Inc.
Funding: DARPA (PM: Lee Badger)
Red Team: SPARTA, Inc.
Contributions
Framework for collaborative learning and repair:
• Pluggable detection, learning, repair
• Handles unknown vulnerabilities
• Preserves functionality by repairing the vulnerability
• Learns from failure
• Focuses resources where they are most effective
Implementation for Windows x86 binaries
Evaluation via a Red Team exercise