LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks
Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan Zhou, Youfeng Wu
University of Illinois at Urbana-Champaign
Intel Corporation
The Ohio State University
Information Flow Tracking
Taint Analysis
To detect / prevent security attacks
• For attacks that corrupt control data
• General: not for specific types of software vulnerabilities
• Even for unknown attacks
Approach
1. Tag (label) the input data from unsafe channels: network
2. Propagate the data tags through the computation
• Any data derived from unsafe data is also tagged as unsafe
3. Detect unexpected uses of the unsafe data
• Switching the program control to the unsafe data
A Simple Example
a is unsafe
Information flows from a to b: b is unsafe
If c is unsafe, jumping to the location pointed to by c fails
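The code shown on the original slide is not preserved in this text, so the following is only a minimal, hypothetical Python sketch of the three statements above. The names `Value`, `add`, `jump_to`, and `TaintError` are illustrative assumptions and are not part of LIFT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A data word plus its one-bit tag (hypothetical model, not LIFT's layout)."""
    data: int
    unsafe: bool = False


class TaintError(Exception):
    """Raised when unsafe data is about to be used as a jump target."""


def add(x: Value, y: Value) -> Value:
    # The result is unsafe if either source operand is unsafe.
    return Value(x.data + y.data, x.unsafe or y.unsafe)


def jump_to(target: Value) -> int:
    # Detection step: refuse to transfer control to an unsafe address.
    if target.unsafe:
        raise TaintError(f"jump target {target.data:#x} is unsafe")
    return target.data


a = Value(0x41414141, unsafe=True)  # a comes from the network: unsafe
b = add(a, Value(4))                # information flows from a to b
c = b                               # c is derived from b, so it is unsafe too

try:
    jump_to(c)
    blocked = False
except TaintError:
    blocked = True
```

In this sketch the jump through `c` is blocked, while a jump through a value that was never derived from `a` would go through.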
Three Ways <1>
Language-based
• For programs written in special type-safe programming languages
• To track information flow at compile time
Good: No runtime overhead
Bad: Only for specific programming languages
Not Practical
Three Ways <2>
Instrumentation
• To track the information flow and detect exploits at runtime
Source code instrumentation
• Lower overhead
• Cannot track in third-party library code
• Requires a specification of library calls (complex, error-prone, side effects)
Binary code instrumentation
• Runtime overhead: 37 times
Three Ways <3>
Hardware-based
• RIFLE
Good: low overhead
Bad: Non-trivial hardware extensions
Overview of LIFT
Dynamically instruments the binary code to (1) track information flow and (2) detect security exploits
Advantages: low overhead, software-only, no source code required
Built on top of StarDBT, a binary translator by Intel
Design of LIFT
Basic design
• Tag management
• Information flow tracking
• Exploit detection
• Protection of the tag space
Optimizations
Tag Management: Design
Associate a one-bit tag with each byte of data in memory and in the general registers
• 0: safe; 1: unsafe
At the beginning, all tags are cleared to zero
Data may be tagged with 1 when
• It is read from the network or standard input
• Information flows to it from other unsafe data
Unsafe data can become safe if it is reassigned from safe data
Tag Management: Storage
For memory data
• Storage: a special memory region (tag space)
• Look-up: one-to-one mapping between a tag bit and a memory byte in the virtual address space
• Overhead: 12.5%
• Compression: memory data near each other usually have similar tag values
For general registers
• Store tags in a dedicated extra register (64-bit) to reduce overhead
• If no spare registers: a special memory area
• No significant overhead thanks to the L1 cache
• Hardware ??
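The one-to-one mapping between tag bits and memory bytes, and the resulting 12.5% space overhead (1 tag bit per 8 data bits), can be sketched as a plain bitmap. This is a hedged illustration only: the class `TagSpace`, its methods, and the bit ordering are assumptions, and it omits the compression mentioned above.

```python
class TagSpace:
    """Bitmap with one tag bit per memory byte (hypothetical layout)."""

    def __init__(self, memory_size: int) -> None:
        # One bit per byte: the tag space is 1/8 (12.5%) of the tracked memory.
        self.bits = bytearray((memory_size + 7) // 8)

    def get(self, addr: int) -> bool:
        # Tag byte index is addr / 8, bit position is addr % 8.
        return bool(self.bits[addr >> 3] & (1 << (addr & 7)))

    def set(self, addr: int, unsafe: bool) -> None:
        mask = 1 << (addr & 7)
        if unsafe:
            self.bits[addr >> 3] |= mask
        else:
            self.bits[addr >> 3] &= ~mask & 0xFF

    def set_range(self, addr: int, length: int, unsafe: bool) -> None:
        for a in range(addr, addr + length):
            self.set(a, unsafe)


tag_space = TagSpace(1024)
tag_space.set_range(100, 4, True)  # e.g. four bytes received from the network
```

Looking up a tag costs one shift, one mask, and one load, which is why a fixed address-to-bit mapping is attractive for instrumentation code.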
Information Flow Tracking <1>
Dynamically instrument instructions
• Instrumented once at runtime and executed multiple times
The instrumentation is inserted before the corresponding instruction of the original program
Tracks information flow based on data dependencies but not control dependencies
Information Flow Tracking <2>
For data movement-based instructions
• E.g., MOV, PUSH, POP
• Tag propagation: source operand → destination
For arithmetic instructions
• E.g., ADD, OR
• Tag propagation: both source operands → destination
For instructions that involve only one operand
• E.g., INC
• The tag does not change
Information Flow Tracking <3>
Special cases
• XOR reg, reg and SUB reg, reg: reset reg to zero
• Clear the corresponding tag
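The propagation rules from these slides can be summarized as a small tag-update function. This is a hedged sketch, not LIFT's instrumentation: the function `propagate`, the opcode sets, and the dictionary of per-location tags are assumptions, and treating the general (two different operands) forms of XOR and SUB as ordinary arithmetic is also an assumption.

```python
DATA_MOVEMENT = {"MOV", "PUSH", "POP"}
ARITHMETIC = {"ADD", "SUB", "OR", "XOR"}
SINGLE_OPERAND = {"INC"}


def propagate(tags, op, dst, src=None):
    """Update the tag of dst in place for one instruction."""
    if op in ("XOR", "SUB") and src == dst:
        # Special case: the result is always zero, so the tag is cleared.
        tags[dst] = False
    elif op in DATA_MOVEMENT:
        # The destination takes the tag of the source.
        tags[dst] = tags[src]
    elif op in ARITHMETIC:
        # The destination is unsafe if either source operand is unsafe.
        tags[dst] = tags[dst] or tags[src]
    elif op in SINGLE_OPERAND:
        # The only operand is also the destination: the tag does not change.
        pass
    else:
        raise ValueError(f"unsupported opcode: {op}")


tags = {"eax": True, "ebx": False, "ecx": False}
propagate(tags, "MOV", "ebx", "eax")  # ebx becomes unsafe
propagate(tags, "ADD", "ecx", "ebx")  # ecx becomes unsafe
propagate(tags, "INC", "ecx")         # ecx stays unsafe
propagate(tags, "XOR", "ebx", "ebx")  # ebx is cleared
```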
Exploit Detection
Also instrument instructions to detect exploits
Unsafe data cannot be used as a return address or the destination of an indirect jump instruction
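As a rough illustration of the return-address check, the sketch below models a stack frame with per-byte tags and a network copy that may overflow a buffer. Everything here (the frame layout, `new_frame`, `copy_from_network`, `checked_return`, `ExploitDetected`) is a hypothetical model for illustration, not LIFT code.

```python
class ExploitDetected(Exception):
    """Raised when an unsafe value is about to be used as a return address."""


MEM_SIZE = 32   # toy stack frame
BUF = 0         # 16-byte buffer at offsets 0..15
RET_SLOT = 16   # return address stored at offsets 16..19
WIDTH = 4       # 32-bit return address


def new_frame(return_address):
    mem = bytearray(MEM_SIZE)
    tags = [False] * MEM_SIZE  # one tag per byte, all safe initially
    mem[RET_SLOT:RET_SLOT + WIDTH] = return_address.to_bytes(WIDTH, "little")
    return mem, tags


def copy_from_network(mem, tags, addr, payload):
    # Every byte written from the network is tagged unsafe.
    for i, byte in enumerate(payload):
        mem[addr + i] = byte
        tags[addr + i] = True


def checked_return(mem, tags):
    # Inserted check: no byte of the return address may be unsafe.
    if any(tags[RET_SLOT:RET_SLOT + WIDTH]):
        raise ExploitDetected("return address is tagged unsafe")
    return int.from_bytes(mem[RET_SLOT:RET_SLOT + WIDTH], "little")
```

A 16-byte payload fits the buffer and the return succeeds; a 20-byte payload overwrites the return address with unsafe bytes and the check raises.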
Protection of Tag Space and Code
It is necessary to protect them
To protect the LIFT code
• Make the memory pages that store the LIFT code read-only
To protect the tag space
• Turn off the access permission of the pages that store the tag values of the tag space itself
• Any access by the original program or hijacked code to the tag space results in an access to the corresponding tag, which triggers a fault
Optimizations
47 times runtime overhead
Three binary optimizations
Fast Path (FP): Motivation
Observation: for most server applications, the majority of tag propagations are zero-to-zero
• From safe data sources to a safe destination
FP: Approach <1>
Before a code segment, insert a check
• Check whether all its live-in and live-out registers and memory data are safe
If so, no need to do tracking inside the code segment
• Run the fast binary version (check version)
If not, run the slow version (track version)
FP: Approach <2>
Live-in: source operands
Live-out: may change to safe after the execution if they are unsafe before the execution
Others: (a) not used in the code segment, or (b) dead at the beginning or end of the code segment
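The fast-path idea can be sketched for a single toy code segment, `ebx = eax + ecx`. This is a simplified, hypothetical model: the two versions are ordinary Python functions, and `LIVE_IN`, `LIVE_OUT`, and `run_segment` are illustrative names rather than anything from LIFT.

```python
LIVE_IN = ("eax", "ecx")   # source operands of the segment
LIVE_OUT = ("ebx",)        # written by the segment


def segment_check_version(regs):
    # Fast version: original computation only, no tag updates.
    regs["ebx"] = regs["eax"] + regs["ecx"]


def segment_track_version(regs, tags):
    # Slow version: original computation plus tag propagation.
    regs["ebx"] = regs["eax"] + regs["ecx"]
    tags["ebx"] = tags["eax"] or tags["ecx"]


def run_segment(regs, tags):
    # Check inserted before the segment: are all live-ins and live-outs safe?
    if not any(tags[r] for r in LIVE_IN + LIVE_OUT):
        segment_check_version(regs)
        return "check"
    segment_track_version(regs, tags)
    return "track"
```

The live-out `ebx` has to be part of the check: if it were unsafe beforehand and the fast version ran, its tag would stay unsafe even though the segment overwrites it with safe data.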
FP: More Technique Details
Difficult to know the addresses of all units at the beginning
• Run the check version first
• Postpone the check until the memory location is known
• Jump to the track version when the check fails
Granularity of code segments
• Basic blocks
• Hot traces
Remove unnecessary checks
• Network processing component
Merged Check (MC): Motivation
Temporal / spatial locality
• Recently accessed data is likely to be accessed again in the near future
• After an access to a location, nearby memory locations are also likely to be accessed in the near future
Combine multiple checks into one
• Combine the temporally and spatially nearby checks
Merged Check: Approach
Clustering the memory references into groups
Scan all the instructions and build a data dependency graph for each memory reference
Introduce a version number to represent the timing attribute
Clustering based on spatial / temporal distance
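The slides do not give the clustering algorithm in detail, so the following is only one plausible sketch under stated assumptions: each memory reference is described as `(base_register, version, offset)`, references are temporally close when they share the same base register and version, and they are spatially close when their offsets fall within a hypothetical `window` of bytes. Each resulting cluster would then need a single merged check.

```python
from itertools import groupby


def cluster_refs(refs, window=64):
    """Group (base_reg, version, offset) references into mergeable clusters."""
    clusters = []
    # Sorting orders references by base register, then version, then offset.
    for _, group in groupby(sorted(refs), key=lambda r: (r[0], r[1])):
        current = []
        for ref in group:
            # Start a new cluster once the offset span reaches the window.
            if current and ref[2] - current[0][2] >= window:
                clusters.append(current)
                current = []
            current.append(ref)
        clusters.append(current)
    return clusters


refs = [
    ("esi", 0, 0),
    ("esi", 0, 8),
    ("esi", 0, 200),  # same base and version, but too far away
    ("esi", 1, 4),    # base register was redefined: different version
    ("edi", 0, 12),
]
clusters = cluster_refs(refs)
```

With these five references the sketch produces four clusters, so four checks would replace five.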
Fast Switch (FS)
Switching execution between the original binary code and the instrumented code requires saving and restoring the context
These saves / restores introduce large runtime overhead because they are inserted at many locations
Use cheaper instructions and remove unnecessary saves / restores
Evaluation
Effectiveness
Performance
Evaluation: Effectiveness
Evaluation: Performance <1>
Throughput and response time of Apache
Throughput: 6.2% (StarDBT: 3.4%)
Time: 90.9%
Evaluation: Performance <2>
SPEC2000: 3.6 times on average
Conclusion
A “practical” information flow tracking system
• Low overhead
• Not requiring hardware extensions
• Not requiring source code
Discussions
Source-code instrumentation
• 81% on average for CPU-intensive C programs
• 5% on average for IO-intensive (server) programs
• If we are able to apply similar optimization techniques to source-code instrumentation, the performance could be “practical”
Binary-code instrumentation
• CPU-bound: 24 times
• Apache server: worst case 25 times, most cases 5~10 times
More Discussions
Focus on basic design and three optimizations
• Not many details about the taint analysis
Evaluation
• Effectiveness: false positives / false negatives
• Performance: IO-intensive vs. CPU-intensive; more benchmarks
Formal model to analyze taint analysis