RAIN: Refinable Attack Investigation with On-demand Inter ...

27
RAIN: Refinable Attack Investigation with On-demand Inter-process Information Flow Tracking Yang Ji , Sangho Lee, Evan Downing, Weiren Wang, Mattia Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee ACM CCS 2017 Oct 31, 2017

Transcript of RAIN: Refinable Attack Investigation with On-demand Inter ...

Page 1: RAIN: Refinable Attack Investigation with On-demand Inter ...

RAIN: Refinable Attack Investigation with

On-demand Inter-process Information

Flow Tracking

Yang Ji, Sangho Lee, Evan Downing, Weiren Wang, Mattia

Fazzini, Taesoo Kim, Alessandro Orso, and Wenke Lee

ACM CCS 2017

Oct 31, 2017

Page 2: RAIN: Refinable Attack Investigation with On-demand Inter ...

More and more data breaches

2

658558

819924

1029853

1155

815918

513

1594

428

2459

316427

665 721

1901

0

500

1000

1500

2000

2500

3000

2013-H1 2013-H2 2014-H1 2014-H2 2015-H1 2015-H2 2016-H1 2016-H2 2017-H1

DATA BREACHES

(SOURCE: BREACH LEVEL INDEX BY GEMALTO)

Number of data breaches Number of breached records (mil)

Page 3: RAIN: Refinable Attack Investigation with On-demand Inter ...

Is attack investigation accurate?

3

send

sendsendsend

read

read

read

A

B

C

“Hmm,

I only want C!”

A, B, or C ?

Dependency confusion!

Page 4: RAIN: Refinable Attack Investigation with On-demand Inter ...

4

“Let me change the offer price.”

recv

write

File archive

write

write

writeread

?

?

?

Is this file affected ?Dependency confusion!

Page 5: RAIN: Refinable Attack Investigation with On-demand Inter ...

Related work

Accuracy

Runtime

Efficiency

Analysis

Efficiency

• System-call-based

• DTrace, Protracer, LSM, Hi-Fi

• Dynamic Information Flow Tracking (DIFT)

• Panorama, Dtracker

• DIFT + Record replay

• Arnold

4

Page 6: RAIN: Refinable Attack Investigation with On-demand Inter ...

RAIN

Accuracy

Runtime

Efficiency

Analysis

Efficiency

• We use

• Record replay

• Graph-based pruning

• Selective DIFT

• We achieve• High accuracy

• Runtime efficiency

• Highly improved analysis efficiency RAIN

5

Page 7: RAIN: Refinable Attack Investigation with On-demand Inter ...

Threat model

• Trusts the OS• RAIN tracks user-level attacks.

• Tracks explicit channels• Side or covert channel is out of scope.

• Records all attacks from their inception• Hardware trojans or OS backdoor is out of scope.

7

Page 8: RAIN: Refinable Attack Investigation with On-demand Inter ...

8

Analysis host

Provenance

graph builder

Triggering,

reachability

analysis

Replay and

selective DIFT

RAIN

Kernel Module

Customized

libc

Target host

Logs

Coarse-level graph Pruned sub-graph Refined sub-graph

Prune Refine

Architecture

Page 9: RAIN: Refinable Attack Investigation with On-demand Inter ...

OS-level record replay

File

Socket

Randomness

External

inputs

1.Records external inputs

2.Captures the thread

switching from the

pthread interface, not the

produced internal data

3.Records system-wide

executions

IPC

Thread 1 Thread 2

Process group

Internal

data

Thread switching

(via Pthread) 9

Page 10: RAIN: Refinable Attack Investigation with On-demand Inter ...

Coarse-level logging and graph building

• Keeps logging system-call events

• Constructs a graph to represent:• the processes, files, and sockets as nodes

• the events as causality edges

10

C

B

P1

A

Send

Read

Read

A: Attacker site

B: /docs/report.doc

C: /tmp/errors.zip

P1: /usr/bin/firefox

Page 11: RAIN: Refinable Attack Investigation with On-demand Inter ...

Pruning

• Does every recorded execution need replay and DIFT?

• Prunes the data in the graph based on trigger analysis results• Upstream

• Downstream

• Point-to-point

• Interference

No!

11

Page 12: RAIN: Refinable Attack Investigation with On-demand Inter ...

F

P3

C

E

P2

B

P1

A

D

Send

Read

Read

Write

Write

Read

Read

Mmap

A: Attacker site

B: /docs/report.doc

C: /tmp/errors.zip

D: /docs/ctct1.csv

E: /docs/ctct2.pdf

F: /docs/loss.csv

P1: /usr/bin/firefox

P2: /usr/bin/TextEditor

P3: /bin/gzip

Upstream

12

Page 13: RAIN: Refinable Attack Investigation with On-demand Inter ...

P3

C

EP2

B

P1

A

D

Read

Write

Write

Read

Write

ReadWrite

Read

A: Tampered file /docs/ctct.csv

B: Seasonal report docs/s1.csv

C: Seasonal report docs/s2.csv

D: Budget report docs/bgt.csv

E: Half-year report docs/h2.pdf

P1: Spreadsheet editor

P2: Auto-budget program

P3: Auto-report program

Downstream

13

Page 14: RAIN: Refinable Attack Investigation with On-demand Inter ...

Point-to-point

FP3

C

EP2

B

A

D

P1

P4

Read

Read

Read

Read

Read

Write

Write

Write

Write

Send

1

2

A: Tampered file /docs/ctct.csv

B: Seasonal report docs/s1.csv

C: Seasonal report docs/s2.csv

D: Budget report docs/bgt.csv

E: Half-year report docs/h2.pdf

F: Document archive server

P1: Spreadsheet editor

P2: Auto-budget program

P3: Auto-report program

P4: Firefox browser

14

Page 15: RAIN: Refinable Attack Investigation with On-demand Inter ...

Interference

• Insight: only inbound and outbound files that interfere in a process will possibly produce causality.• We determine interference according to the time order of inbound and

outbound IO events.

D

BP2

Write

Read

t1

t2

t1<t2

CF

P3

t1

t2

t3

E

Read

Mmap

Write

t1< t2< t3 15

Page 16: RAIN: Refinable Attack Investigation with On-demand Inter ...

Refinement - selective DIFT

• Replays and conducts DIFT to the necessary part of the execution• Aggregation

• Upstream

• Downstream

• Point-to-point

16

Page 17: RAIN: Refinable Attack Investigation with On-demand Inter ...

F

P3

C

E

P2

B

P1

A

D

Send

Read

Read

Write

Write

Read

Read

Mmap

Upstream refinement

17

Page 18: RAIN: Refinable Attack Investigation with On-demand Inter ...

Implementation summary

• RAIN is built on top of:• Arnold, the record replay framework

• Dtracker (Libdft) and Dytan, the taint engines

Host Module LoC

Target hostKernel module 2,200 C (Diff)

Trace logistics 1,100 C

Analysis host

Provenance graph 6,800 C++

Trigger/Pruning 1,100 Python

Selective refinement 900 Python

DIFT Pin tools 3,500 C/C++ (Diff)

18

Page 19: RAIN: Refinable Attack Investigation with On-demand Inter ...

Evaluations

• Runtime performance

• Accuracy

• Analysis cost

• Storage footprint

19

Page 20: RAIN: Refinable Attack Investigation with On-demand Inter ...

20

Runtime overhead: 3.22% SPEC CPU2006

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

120.00%

140.00%

160.00%

bzi

p2

per

lben

ch

calc

ulix

gam

ess

bw

aves

sjen

g

om

net

pp

mcf

asta

r

h264r

ef

hm

mer

xala

ncb

mk

go

bm

k

libq

uan

tum

sphin

x3

milc

zeu

smp

gro

mac

s

lesl

ie3d

nam

d

lbm

dea

lII

sop

lex

po

vray

Gem

sFD

TD

ton

to wrf

gcc

Logging Logging+Recording

Page 21: RAIN: Refinable Attack Investigation with On-demand Inter ...

Multi-thread runtime overhead: 5.35% SPLASH-3

21

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

120.00%

140.00%

ocean-c ocean-n fmm radiosity water-n water-s barnes volrend

Page 22: RAIN: Refinable Attack Investigation with On-demand Inter ...

IO intensive application: less than 50%

22

0

0.2

0.4

0.6

0.8

1

1.2

1.4

1.6

kernelcopy

moviedownload

libccompilation

Firefoxsession

Logging Logging+Recording

Page 23: RAIN: Refinable Attack Investigation with On-demand Inter ...

23

0.00%

20.00%

40.00%

60.00%

80.00%

100.00%

Screengrab Cameragrab Audiograb NetRecon MotivatingExample

90.50%

32%39.70%

84.70%

67.00%

0% 0% 0%

13%

0%

Dependency confusion rate

Coarse-level Fine-level

High analysis accuracy

Scenarios from red team exercise of DARPA Transparent Computing program

Page 24: RAIN: Refinable Attack Investigation with On-demand Inter ...

Pruning effectiveness: ~94.2% reduction

24

0

100

200

300

400

500

600

700

800

Screengrab Cameragrab Audiograb NetRecon MotiveExample

99 141

310

138

720

5 19 11 13 34

Taint workload: #processes

None RAIN

Page 25: RAIN: Refinable Attack Investigation with On-demand Inter ...

Storage cost: ~4GB per day (1.5TB per year)

25

4000

740

200.6

166.1

133.6

105

113.9

0 500 1000 1500 2000 2500 3000 3500 4000 4500

Per day desktop

Libc compilation

Motive Example

NetRecon

Audiograb

Cameragrab

Screengrab

Storage overhead (MB)

Page 26: RAIN: Refinable Attack Investigation with On-demand Inter ...

Discussion

• Limitations• RAIN trusts the OS that needs kernel integrity protection.

• Over-tainting issue

• Direction• Hypervisor-based RAIN

• Further reduce storage overhead

26

Page 27: RAIN: Refinable Attack Investigation with On-demand Inter ...

Conclusion

• RAIN adopts a multi-level provenance system to facilitate fine-grained analysis that enables accurate attack investigation.

• RAIN has low runtime overhead, as well as significantly improved analysis cost.

27