Download - LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks

LIFT: A Low-Overhead Practical Information Flow Tracking System for Detecting Security Attacks

Feng Qin, Cheng Wang, Zhenmin Li, Ho-seop Kim, Yuanyuan zhou, Youfeng Wu

University of Illinois at Urbana-Champaign

Intel Corporation

The Ohio State University

Information Flow Tracking

Taint Analysis

To detect / prevent security attacks For attacks that corrupts control data General: not for specific types of software vulnerabilities Even for unknown attacks

Approach

1. Tag (label) the input data from unsafe channels: network

2. Propagate the data tags through the computation Any data derived from unsafe data are also tagged as unsafe

3. Detect unexpected usages of the unsafe data Switch the program control to the unsafe data

A Simple Example

a is unsafe Information flows from a to b: b is unsafe If c is unsafe, jumping to the location pointed by c fails

Three Ways <1>

Language-based For programs written in special type-safe programming languages To track information flow at compile time

Good: No runtime overhead

Bad: Only for specific program languages

Not Practical

Three Ways <2>

Instrumentation To track the information flow and detect exploits at runtime

Source code instrumentation Lower overhead Cannot track in third-party library code Require a specification of library calls

• Complex, error-prone, side-effects

Binary code instrumentation Runtime overhead: 37 times

Three Ways <3>

Hardware-based RIFLE

Good: low overhead

Bad: Non-trivial hardware extensions

Overview of LIFT

Dynamically instruments the binary code (1) tracking information flow (2) detect security exploits

Advantages: Low overhead, software-only, No source code

Built on top of StarDBT Binary translator by Intel

Design of LIFT

Basic design Tag management Information flow tracking Exploit detection Protection of the tag space

Optimizations

Tag Management: Design

Associate a one-bit tag for each byte of data in memory and general data register 0: safe; 1: unsafe

At the beginning: all tags are cleared to zero Data may be tagged with 1 when

It is read from network or standard input Information flow from other unsafe data to it

An unsafe data can become safe if it is reassigned from some safe data

Tag Management: Storage

For memory data Storage: a special memory region (tag space) Look-up: one-to-one mapping between a tag bit and a memory

byte in the virtual address space Overhead: 12.5% Compression:

• memory data nearby each other usually have similar tag values

For general registers Store tags in a dedicated extra register (64-bit) Reduce overhead If no spare registers: a special memory area

• No significant overhead as the L1 cache• Hardware ??

Information Flow Tracking <1>

Dynamically instrument instructions Instrumented once at runtime, and executed multiple times

The instrumentation is done before the instruction in the original program

Tracks information flow based on data dependencies but not control dependencies


For data movement-based instructions E.g., MOV, PUSH, POP Tag propagation: source operand destination

For arithmetic instructions E.g., ADD, OR Tag propagation: both source operands destination

For instructions that involve only one operand E.g., INC The tag does not change


Special cases XOR reg, reg: reset reg to zero SUB reg, reg: Clear the corresponding tag

Exploit Detection

Also instrument instructions to detect exploits

Unsafe data cannot be used as a return address or the destination of an indirect jump instruction

Protection of Tag Space and Code

It is necessary to protect them

To protect the LIFT code Make the memory pages that store the LIFT code read-only

To protect the tag space Turn off the access permission of the pages that store the tag

values of the tag space itself Any access of the original program or hijacked code to the tag

space results in access to the corresponding tag and triggers a fault

Optimizations

47 times runtime overhead Three binary optimizations

Fast Path (FP): Motivation

Observation: for most server applications, majority of tag propagations are zero-to-zero From safe data sources to a safe destination

FP: Approach <1>

Before a code segment, insert a check Check whether all its live-in and live-out registers and

memory data are safe or not

If so, no need to do tracking inside the code segment Run the fast binary version (check version)

If not, run the slow version (track version)

FP: Approach <2>

Live-in: source operand Live-out: may change to safe after the execution if they

are unsafe before the execution

Others: (a) not used in the code segment (b) dead at the beginning or end of the code segment

FP: More Technique Details

Difficult to know the address of all units at the beginning Run the check version first Postpone the check until the memory location is known Jump to track version when the check fails

Granularity of code segments Basic blocks Hot trace

Remove unnecessary checks Network processing component

Merged Check (MC): Motivation

Temporal / Spatial Locality A recently accessed data is likely to be accessed again in a near

future After an access to a location, memory locations that are nearby

are also likely to be accessed again in near future

To combine multiple checks into one Combine the temporally and spatially nearby checks

Merged Check: Approach

Clustering the memory references into groups

Scan all the instructions and build a data dependency graph for each memory reference

Introduce version number to represent the timing attribute

Clustering based on spatially / temporally distance

Fast Switch (FS)

When the program execution switches between the original binary code and the instrumented code it requires saving and restoring the context

Introduce large runtime overhead because they are inserted at many locations

Use cheaper instructions and remove unnecessary saves / restores

Evaluation

Effectiveness Performance

Evaluation: Effectiveness

Evaluation: Performance <1>

Throughput and response time of Apache

Throughput: 6.2% (StarDBT: 3.4%) Time: 90.9%

Evaluation: Performance <2>

SPEC2000: 3.6 times on average

Conclusion

A “Practical” Information flow tracking system Low-overhead Not requiring hardware extension Not requiring source code

Discussions

Source-code instrumentation 81% on average for CPU-intensive C-programs 5% on average for IO-intensive (sever) program If we are able to apply similar optimization techniques to source-

code instrumentation, the performance could be “practical”

Binary-code instrumentation CPU-bound: 24 times Apache server: worst case 25 times, most cases: 5~10 times

More Discussions

Focus on basic design and three optimizations Not much details about the taint analysis

Evaluation Effectiveness: false positive / false negative Performance

• IO-incentive vs. CPU-incentive

• More benchmarks

Formal model to analyze taint analysis