Logic-based, data-driven enterprise network security analysis


Page 1: Logic-based, data-driven enterprise network security analysis

Logic-based, data-driven enterprise network security analysis

Xinming (Simon) Ou

Assistant Professor

CIS Department

Kansas State University

COS 598D: Formal Methods in Networking

Princeton University

March 08, 2010


Page 2: Logic-based, data-driven enterprise network security analysis

Self Introduction

• Brief Bio
– PhD, Princeton University, 2005
– Post-doc, Purdue CERIAS, Idaho National Laboratory, 2006
– Assistant Professor, Kansas State University, 2006-now

• Research Interests
– Computer and network security, especially formal and quantitative analysis
– Programming languages, formal methods

• Research Group
– Argus: http://people.cis.ksu.edu/~xou/argus/

Page 3: Logic-based, data-driven enterprise network security analysis

Overview of the two lectures

• Lecture One
– Datalog model for network attacks
– SLG resolution for Datalog evaluation
– Exhaustive proof generation for Datalog

• Lecture Two
– Formulating the security hardening problem as a SAT solving problem
– Applying MinCostSAT to achieve optimal security configuration
– Open research problems

Page 4: Logic-based, data-driven enterprise network security analysis

Cyber Defender’s Life

[Diagram: information the cyber defender has to handle]
– Security advisories ("Apache 1.3.4 bug!")
– Vulnerability reports
– Network configuration
– IDS alerts
– Users and data assets

Reasoning System: Automated Situation Awareness

Page 5: Logic-based, data-driven enterprise network security analysis

Multi-step Attacks

[Diagram: example network]
– Zones: Internet, Demilitarized zone (DMZ), Corporation
– Firewalls: Firewall 1, Firewall 2
– Hosts: webServer, fileServer, workStation
– Files: webPages, sharedBinary
– Attack steps: buffer overrun, NFS shell, Trojan horse

Page 6: Logic-based, data-driven enterprise network security analysis

Two Questions

• Are there potential attack paths in the system?
– How can they happen?
– How can they be addressed in an optimal way?

• Are there attacks that are going on or have succeeded in the system?
– How do you know?
– How to counter the attack?

What we are going to focus on

Page 7: Logic-based, data-driven enterprise network security analysis

MulVAL

[Diagram: MulVAL architecture]

Components: Vulnerability Scanner, Network Analyzer, Analyzer

Information shown in the diagram:
– Datalog rules from security experts
– Vulnerability definition (e.g. OVAL, Nessus Scripting Language)
– Vulnerability information (e.g. NIST NVD)
– Network reachability information
– User information

Query: Could root be compromised on any of the machines?

Result: Answers

Ou, Govindavajhala, and Appel. Usenix Security 2005

Page 8: Logic-based, data-driven enterprise network security analysis

Network config (firewall analyzer)

Host access-control lists

reachable(internet, webServer, tcp, 80).

reachable(webServer, fileServer, nfs, -).

… …

Page 9: Logic-based, data-driven enterprise network security analysis

Host config scanner

File permissions

fileOwner(webServer, '/bin/apache', root).

fileAttr(webServer, '/bin/apache', r,w,x,r,0,0,r,0,0).

Page 10: Logic-based, data-driven enterprise network security analysis

Host-based vulnerability scanner

Installed software

vulExists(webServer, 'CVE-2006-3747', httpd).

vulExists(dbServer, 'CVE-2009-2446', mySQL).

… …

Page 11: Logic-based, data-driven enterprise network security analysis

US-CERT

NVD

Apache 1.3.4 bug!

Security advisories

vulProperty('CVE-2006-3747', remote, privEscalation).

vulProperty('CVE-2009-2446', remote, privEscalation).

… …


Page 12: Logic-based, data-driven enterprise network security analysis

Security expert

Datalog Rules

execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).

• Linux security behavior
• Windows security behavior
• Common attack techniques

The rules are completely independent of any site-specific settings.

Page 13: Logic-based, data-driven enterprise network security analysis

Rule for NFS

[Diagram labels: dmz, corp, webServer, fileServer, webPages, sharedBinary, NFS shell]

accessFile(Server, Access, Path) :-
    nfsExport(Server, Path, Access, Client),
    reachable(Client, Server, nfs, -),
    execCode(Client, _Perm).

Page 14: Logic-based, data-driven enterprise network security analysis

Rule for Trojan Horse

[Diagram labels: corp, workStation, fileServer, webPages, projectPlan, sharedBinary, Trojan horse]

execCode(H, User) :-
    accessFile(H, write, Path),
    fileOwner(H, Path, User).

Page 15: Logic-based, data-driven enterprise network security analysis

Deducing new facts

execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).

[Diagram labels: internet, dmz, webServer, Firewall 1]

vulExists(webServer, httpd, remote, privilegeEscalation).

serviceRunning(webServer, httpd, tcp, 80, apache).

networkAccess(webServer, tcp, 80).

execCode(webServer, apache).

Oops!

Source labels on the slide: From Vulnerability Scanner & NVD; From Vulnerability Scanner; Derived
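The deduction on this slide can be imitated outside a Datalog engine. The following is a minimal Python sketch, not MulVAL's implementation: the three fact sets copy the tuples shown above, and the function name `derive_exec_code` is a hypothetical helper introduced only for illustration. It joins the body predicates on their shared variables, which is what one application of the rule amounts to.

```python
# Facts from the slide, one set of tuples per predicate.
vul_exists = {("webServer", "httpd", "remote", "privilegeEscalation")}
service_running = {("webServer", "httpd", "tcp", 80, "apache")}
network_access = {("webServer", "tcp", 80)}


def derive_exec_code(vul_exists, service_running, network_access):
    """Apply the execCode rule once by joining the body predicates."""
    derived = set()
    for host, program, access, consequence in vul_exists:
        # The rule only fires for remotely exploitable privilege escalation.
        if (access, consequence) != ("remote", "privilegeEscalation"):
            continue
        for s_host, s_program, protocol, port, privilege in service_running:
            # Shared variables Host and Program must agree.
            if (s_host, s_program) != (host, program):
                continue
            # Shared variables Host, Protocol and Port must agree.
            if (host, protocol, port) in network_access:
                derived.add((host, privilege))
    return derived


exec_code = derive_exec_code(vul_exists, service_running, network_access)
print(exec_code)
```

Removing any one of the three input facts leaves the result empty, which mirrors how a configuration change (for example, a firewall rule that removes the network access) blocks the derivation.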

Page 16: Logic-based, data-driven enterprise network security analysis

Advantages of using Prolog

• Prolog’s goal-oriented evaluation is potentially more efficient.

• Prolog provides more programming flexibility.

Can we evaluate Datalog programs in Prolog?


Page 17: Logic-based, data-driven enterprise network security analysis

However…

• Prolog as a programming language cannot be directly used to evaluate Datalog

ancestor(X,Y) :- parent(X,Y).

ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

parent(bill,mary).

parent(mary,john).

?- ancestor(X,Y).


Page 18: Logic-based, data-driven enterprise network security analysis

However…

• Prolog as a programming language cannot be directly used to evaluate Datalog

ancestor(X,Y) :- parent(X,Y).

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).

parent(bill,mary).

parent(mary,john).

?- ancestor(X,Y).


Page 19: Logic-based, data-driven enterprise network security analysis

However…

• Prolog as a programming language cannot be directly used to evaluate Datalog

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).

ancestor(X,Y) :- parent(X,Y).

parent(bill,mary).

parent(mary,john).

?- ancestor(X,Y).


Page 20: Logic-based, data-driven enterprise network security analysis

Problem of SLD resolution

ancestor(X,Y) :- parent(X,Y).

ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).

parent(bill,mary).

parent(mary,john).

SLD tree for the query (indentation shows children):

ancestor(X, Y).
    parent(X, Y).
        X=bill, Y=mary: Success
        X=mary, Y=john: Success
    parent(X, Z), ancestor(Z, Y).
        X=bill, Z=mary: ancestor(mary, Y).
            parent(mary, Y).
                Y=john: Success
            parent(mary, Z2), ancestor(Z2, Y).
                Z2=john: ancestor(john, Y).
                    … Failure
        X=mary, Z=john: ancestor(john, Y).
            … Failure

Page 21: Logic-based, data-driven enterprise network security analysis

Problem of SLD resolution

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).

ancestor(X,Y) :- parent(X,Y).

parent(bill,mary).

parent(mary,john).

Goals produced by SLD resolution for the query, each derived from the previous one:

ancestor(X, Y).

ancestor(Z, Y), parent(X, Z).

ancestor(Z1, Y), parent(Z, Z1), parent(X, Z).

ancestor(Z2, Y), parent(Z1, Z2), parent(Z, Z1), parent(X, Z).

…
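The sensitivity to clause and subgoal order can be reproduced with a small interpreter. The sketch below is an illustrative Python model of Prolog-style SLD resolution (leftmost subgoal first, clauses in program order), not any real Prolog engine. The depth bound, the `DepthExceeded` exception, and all function names are assumptions added so that the non-terminating ordering can be observed without actually looping forever.

```python
import itertools


def is_var(t):
    # Convention of this sketch: variables are strings starting with an uppercase letter.
    return isinstance(t, str) and t[:1].isupper()


def walk(t, s):
    # Follow variable bindings in substitution s until a constant or unbound variable.
    while is_var(t) and t in s:
        t = s[t]
    return t


def unify(a, b, s):
    """Unify two atoms; return an extended substitution, or None on failure."""
    if a[0] != b[0] or len(a) != len(b):
        return None
    s = dict(s)
    for x, y in zip(a[1:], b[1:]):
        x, y = walk(x, s), walk(y, s)
        if x == y:
            continue
        if is_var(x):
            s[x] = y
        elif is_var(y):
            s[y] = x
        else:
            return None
    return s


fresh = itertools.count()


def rename(clause):
    # Rename clause variables apart so that separate uses do not clash.
    n = next(fresh)
    ren = lambda atom: (atom[0],) + tuple(f"{t}_{n}" if is_var(t) else t for t in atom[1:])
    head, body = clause
    return ren(head), [ren(b) for b in body]


class DepthExceeded(Exception):
    pass


def solve(goals, s, program, depth):
    """Depth-bounded SLD resolution over a list of (head, body) clauses."""
    if not goals:
        yield s
        return
    if depth == 0:
        raise DepthExceeded
    for clause in program:
        head, body = rename(clause)
        s2 = unify(goals[0], head, s)
        if s2 is not None:
            yield from solve(body + goals[1:], s2, program, depth - 1)


facts = [(("parent", "bill", "mary"), []), (("parent", "mary", "john"), [])]
right_recursive = [
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
    (("ancestor", "X", "Y"), [("parent", "X", "Z"), ("ancestor", "Z", "Y")]),
] + facts
left_recursive = [
    (("ancestor", "X", "Y"), [("ancestor", "Z", "Y"), ("parent", "X", "Z")]),
    (("ancestor", "X", "Y"), [("parent", "X", "Y")]),
] + facts


def answers(program, depth=50):
    """Return (answers found, True if the search finished within the depth bound)."""
    found = []
    try:
        for s in solve([("ancestor", "X", "Y")], {}, program, depth):
            found.append((walk("X", s), walk("Y", s)))
    except DepthExceeded:
        return found, False
    return found, True


print(answers(right_recursive))
print(answers(left_recursive))
```

With the first ordering the search finishes and returns all three ancestor pairs; with the second it hits the depth bound before producing any answer, although both programs have the same logical meaning.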

Page 22: Logic-based, data-driven enterprise network security analysis

Problem of SLD resolution

• Under SLD resolution, termination of cyclic Datalog programs depends not only on the logical semantics, but also on the order of the clauses and subgoals.
– This creates problems since in network security analysis, such cyclic rules are commonplace.
    • e.g. after compromising one machine, the attacker can use it as a stepping stone to compromise another.
– Datalog is a declarative language; thus order should not matter.
– A pure Datalog program shall always terminate due to the bound on the number of tuples.

Page 23: Logic-based, data-driven enterprise network security analysis

Bottom-up Evaluation

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).

ancestor(X,Y) :- parent(X,Y).

parent(bill,mary).

parent(mary,john).

Semi-naïve Evaluation:

Step (1) (base case): ancestor(bill,mary), ancestor(mary,john)

Step (2)
– Iteration 1: ancestor(bill,john)
– Iteration 2: No new tuples (“fixpoint”)
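The two steps above can be sketched in a few lines of Python. This is an illustrative version specialised to the ancestor program on this slide, not a general Datalog engine; the function name `semi_naive_ancestor` and the `history` list (which records the new tuples of each iteration) are assumptions made for the example.

```python
parent = {("bill", "mary"), ("mary", "john")}


def semi_naive_ancestor(parent):
    """Return (all ancestor tuples, new tuples found in each iteration of step 2)."""
    # Step 1 (base case): ancestor(X,Y) :- parent(X,Y).
    ancestor = set(parent)
    delta = set(parent)          # tuples that are new since the previous round
    history = []
    # Step 2: apply the recursive rule, but only to the newly derived tuples.
    while delta:
        # ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
        candidates = {(x, y) for (z, y) in delta for (x, z2) in parent if z2 == z}
        delta = candidates - ancestor
        ancestor |= delta
        history.append(delta)
    return ancestor, history


ancestor, history = semi_naive_ancestor(parent)
print(sorted(ancestor))
print(history)
```

The loop stops as soon as an iteration produces no new tuples. Since the number of possible tuples is bounded by the constants in the program, this fixpoint is always reached, regardless of the order of clauses or subgoals.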

Page 24: Logic-based, data-driven enterprise network security analysis

SLG Resolution

• Goal-oriented evaluation

• Predicates can be “tabled”
– A table stores the evaluation results of a goal.
– The results can be re-used later, i.e. dynamic programming.
– Entering an active table indicates a cycle.
– Fixpoint operation is taken at such tables.

• The XSB system implements SLG resolution
– Developed by Stony Brook (http://xsb.sourceforge.net/).
– Provides full ISO Prolog compatibility.

Page 25: Logic-based, data-driven enterprise network security analysis

SLG resolution example

ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).

ancestor(X,Y) :- parent(X,Y).

parent(bill,mary).

parent(mary,john).

SLG tree for the query (indentation shows children):

ancestor(X, Y).    [generator node: new table created for ancestor(X,Y)]
    ancestor(Z, Y), parent(X, Z).    [active node: resolve ancestor(Z,Y) against the results in the table for ancestor(X,Y)]
        Z=bill, Y=mary: parent(X, bill).    Failure
        Z=mary, Y=john: parent(X, mary).    X=bill: Success
        Z=bill, Y=john: parent(X, bill).    Failure
    parent(X, Y).
        X=bill, Y=mary: Success
        X=mary, Y=john: Success
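The example above can be approximated by a short Python sketch. It is a simplification specialised to this one program, not XSB's SLG implementation (in XSB itself, tabling is requested with a declaration such as `:- table ancestor/2.`). The function name `tabled_ancestor` and the `consumed` counter are assumptions of the sketch: the table holds the answers for the goal ancestor(X,Y), and the active node consumes those answers instead of expanding the recursive call again.

```python
parent = [("bill", "mary"), ("mary", "john")]


def tabled_ancestor(parent):
    """Answer ancestor(X,Y) for the left-recursive program using one answer table."""
    table = []       # answers for the goal ancestor(X,Y), in discovery order
    consumed = 0     # number of table answers the active node has resolved against

    def add(answer):
        if answer not in table:   # a table never stores duplicate answers
            table.append(answer)

    # Clause ancestor(X,Y) :- parent(X,Y). resolves directly against input tuples.
    for x, y in parent:
        add((x, y))

    # Clause ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
    # The call ancestor(Z,Y) re-enters a table that is still being filled, so it
    # is not expanded again; it consumes answers from the table instead.
    while consumed < len(table):  # fixpoint: stop once every answer has been consumed
        z, y = table[consumed]
        consumed += 1
        for x, z2 in parent:
            if z2 == z:
                add((x, y))
    return table


print(tabled_ancestor(parent))
```

The answers are consumed in the same order as in the tree above: (bill, mary) fails on parent(X, bill), (mary, john) yields the new answer (bill, john), and that new answer in turn fails, at which point the table is complete.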

Page 26: Logic-based, data-driven enterprise network security analysis

SLG in MulVAL

netAccess(H2, Protocol, Port) :-
    execCode(H1, User),
    reachable(H1, H2, Protocol, Port).

[Diagram annotations]
– netAccess(…): table for goal, possible instantiations
– execCode(…): table for first subgoal, possible instantiations
– Remaining subgoal: from input tuples

Page 27: Logic-based, data-driven enterprise network security analysis

SLG complexity for Datalog

• Total time is dominated by the rule that has the maximum number of instantiations
– Time for computing one table = computation of the subgoals + retrieving information from input tuples + matching results in the rule's body
– Time for computing all tables = retrieving information from input tuples + matching results in the rules' bodies

• See “On the Complexity of Tabled Datalog Programs” http://www.cs.sunysb.edu/~warren/xsbbook/node21.html

Page 28: Logic-based, data-driven enterprise network security analysis

MulVAL complexity in SLG

execCode(Attacker, Host, User) :-
    vulExists(Host, _, Program, remote, privilegeEscalation),
    networkService(Host, Program, Protocol, Port, User),
    netAccess(Attacker, Host, Protocol, Port).

Scales with network size: O(N) different instantiations

Page 29: Logic-based, data-driven enterprise network security analysis

MulVAL complexity in SLG

netAccess(Attacker, H2, Protocol, Port) :-
    execCode(Attacker, H1, _),
    reachable(H1, H2, Protocol, Port).

Scales with network size: O(N^2) different instantiations

Complexity of MulVAL: O(N^2)
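The difference between the two rules can be checked by counting instantiations directly. The Python sketch below makes several simplifying assumptions that are not on the slides: a single attacker, one service (protocol and port) per host, and a fully connected network in which every host can reach every host. The function name `count_instantiations` and the host names are hypothetical.

```python
from itertools import product


def count_instantiations(n_hosts):
    """Count rule instantiations for the execCode and netAccess rules."""
    hosts = [f"h{i}" for i in range(n_hosts)]
    # execCode rule: only Host ranges over the network (Attacker is fixed).
    exec_code = {("attacker", host) for host in hosts}
    # netAccess rule: both H1 and H2 range over the network.
    net_access = {("attacker", h1, h2) for h1, h2 in product(hosts, repeat=2)}
    return len(exec_code), len(net_access)


for n in (10, 100):
    print(n, count_instantiations(n))
```

Under these assumptions the netAccess rule has N^2 instantiations against N for the execCode rule, so it dominates the evaluation time as the network grows.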

Page 30: Logic-based, data-driven enterprise network security analysis

Datalog proof generation

• In security analysis, we want to know not only what attacks could happen, but also how they can happen.
– Thus, we need more than a yes/no answer for queries.
– We need the proofs for the true queries, which in the case of security analysis will be attack paths.
– We also want to know all possible attack paths; thus we need exhaustive proof generation.

Page 31: Logic-based, data-driven enterprise network security analysis

An obvious approach

execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).

execCode(Host, PrivilegeLevel, Pf) :-
    vulExists(Host, Program, remote, privilegeEscalation, Pf1),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel, Pf2),
    networkAccess(Host, Protocol, Port, Pf3),
    Pf = (execCode(Host, PrivilegeLevel), [Pf1, Pf2, Pf3]).

This will break the bounded-term property and result in non-termination for cyclic Datalog programs.
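The effect of carrying a proof term in each tuple can be seen on a tiny cyclic example. The Python sketch below uses a hypothetical two-host cycle and a made-up predicate `compromised`; neither comes from MulVAL. It runs a naive bottom-up evaluation for a bounded number of rounds, with and without a proof argument, and reports how many distinct tuples exist after each round.

```python
reach = {("a", "b"), ("b", "a")}   # hypothetical two-host cycle


def evaluate(with_proofs, max_rounds=5):
    """Naive bottom-up evaluation of
         compromised(a).
         compromised(H2) :- compromised(H1), reach(H1, H2).
       Returns the number of distinct tuples after each round."""
    tuples = {("a", "fact") if with_proofs else ("a",)}
    sizes = []
    for _ in range(max_rounds):
        new = set(tuples)
        for t in tuples:
            for h1, h2 in reach:
                if h1 == t[0]:
                    # With proofs, the derived tuple embeds the proof of its premise,
                    # so every trip around the cycle creates a larger, distinct term.
                    new.add((h2, ("step", t)) if with_proofs else (h2,))
        sizes.append(len(new))
        if new == tuples:          # fixpoint reached
            break
        tuples = new
    return sizes


print(evaluate(with_proofs=False))
print(evaluate(with_proofs=True))
```

Without proofs the evaluation stabilises at two tuples. With proofs the set gains a new tuple in every round and only stops because of the round limit, which is the non-termination the slide warns about.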

Page 32: Logic-based, data-driven enterprise network security analysis

MulVAL Attack-Graph Toolkit

[Diagram: MulVAL attack-graph toolkit pipeline]
– Inputs in Datalog representation: machine configuration, network configuration, security advisories
– Datalog rules, translated rules
– XSB reasoning engine
– Datalog proof steps
– Graph Builder
– Datalog proof graph

Ou, Boyer, and McQueen. ACM CCS 2006

Joint work with Idaho National Laboratory

Page 33: Logic-based, data-driven enterprise network security analysis

Stage 1: Record Proof Steps

netAccess(H2, Protocol, Port, ProofStep) :-
    execCode(H1, User),
    reachable(H1, H2, Protocol, Port),
    ProofStep = because('multi-hop network access',
                        netAccess(H2, Protocol, Port),
                        [execCode(H1, User), reachable(H1, H2, Protocol, Port)]).

The last argument of the rule head, ProofStep, is the proof step.

Page 34: Logic-based, data-driven enterprise network security analysis

Stage 2: Build the Exhaustive Proof

because('multi-hop network access',
        netAccess(fileServer, rpc, 100003),
        [execCode(webServer, apache),
         reachable(webServer, fileServer, rpc, 100003)])

[Diagram: graph component built from this proof step]
– Node 0: netAccess(fileServer, rpc, 100003)
– Node 1: multi-hop network access
– Node 2: execCode(webServer, apache)
– Node 3: reachable(webServer, fileServer, rpc, 100003)
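The second stage can be sketched as a single pass over the recorded proof steps. The Python code below is an illustration under assumptions, not the toolkit's graph builder: proof steps are represented as (rule name, head, body) triples mirroring the `because(...)` term, and the function name `build_proof_graph` and the dictionary-based node table are hypothetical.

```python
# Stage 1 output: one proof step per rule instantiation, as on the slide.
proof_steps = [
    ("multi-hop network access",
     ("netAccess", "fileServer", "rpc", 100003),
     [("execCode", "webServer", "apache"),
      ("reachable", "webServer", "fileServer", "rpc", 100003)]),
]


def build_proof_graph(proof_steps):
    """Stage 2: turn proof steps into a graph with fact nodes and rule nodes."""
    node_id = {}            # table lookup: fact or proof step -> node number
    labels, edges = [], []

    def node(key, label):
        if key not in node_id:            # dictionary lookup, constant time on average
            node_id[key] = len(labels)
            labels.append(label)
        return node_id[key]

    for i, (rule, head, body) in enumerate(proof_steps):
        head_n = node(head, head)
        rule_n = node(("__step__", i), rule)   # one rule node per proof step
        edges.append((head_n, rule_n))         # the head is derived by this rule
        for fact in body:
            edges.append((rule_n, node(fact, fact)))   # which needs every body fact
    return labels, edges


labels, edges = build_proof_graph(proof_steps)
for n, label in enumerate(labels):
    print(n, label)
print(edges)
```

Because facts are looked up in a table, a fact that occurs in several proof steps maps to one shared node, so the graph stays bounded by the number of proof steps even when the underlying rules are cyclic.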

Page 35: Logic-based, data-driven enterprise network security analysis

Complexity of Proof Building

• O(N^2) to complete Datalog evaluation
– With proof steps generated

• O(N^2) to build a proof graph from proof steps
– Need to build O(N^2) graph components
– Building of one component
    • Find the predecessor: table lookup
    • Find the successors: table lookup

Total time: O(N^2), if table lookup is constant time

Page 36: Logic-based, data-driven enterprise network security analysis

Logical Attack Graphs

[Figure: logical attack graph with numbered nodes. Legend: OR node, AND node, ground fact.]

Node labels:
– execCode(attacker, workStation, root)
– Trojan horse installation
– accessFile(attacker, workStation, write, /usr/local/share)
– NFS semantics
– networkService(webServer, httpd, tcp, 80, apache)
– vulExists(webServer, CAN-2002-0392, httpd, remoteExploit, privEscalation)
– netAccess(attacker, webServer, tcp, 80)
– Remote exploit
– execCode(attacker, webServer, apache)
– accessFile(attacker, fileServer, write, /export)
– NFS shell

Page 37: Logic-based, data-driven enterprise network security analysis

Performance and Scalability

[Figure: CPU time (sec), on an axis from 0.01 to 10000, against number of hosts, on an axis from 1 to 1000, for four network topologies: fully connected, partitioned, ring, star.]

Page 38: Logic-based, data-driven enterprise network security analysis

Related Work

• Sheyner’s attack graph tool (CMU)
– Based on model-checking

• Cauldron attack graph tool (GMU)
– Based on graph-search algorithms

• NetSPA attack graph tool (MIT LL)
– Graph-search based on a simple attack model

Page 39: Logic-based, data-driven enterprise network security analysis

Advantages of the Logic-programming Approach

• Publishing and incorporation of knowledge/information through well-understood logical semantics

• Efficient and sound analysis by leveraging the reasoning power of well-developed logic-deduction systems


Page 40: Logic-based, data-driven enterprise network security analysis

Next Lecture

• How to make use of the proof graph
– Optimizing mitigation measures through SAT solving

• Open problems
– Uncertainty in reasoning