Logic-based, data-driven enterprise network security analysis
Xinming (Simon) Ou
Assistant Professor
CIS Department
Kansas State University
COS 598D: Formal Methods in Networking
Princeton University
March 08, 2010
Self Introduction
• Brief Bio
– PhD, Princeton University, 2005
– Post-doc, Purdue CERIAS, Idaho National Laboratory, 2006
– Assistant Professor, Kansas State University, 2006-now
• Research Interests
– Computer and network security, especially formal and quantitative analysis
– Programming languages, formal methods
• Research Group
– Argus: http://people.cis.ksu.edu/~xou/argus/
Overview of the two lectures
• Lecture One
– Datalog model for network attacks
– SLG resolution for Datalog evaluation
– Exhaustive proof generation for Datalog
• Lecture Two
– Formulating security hardening problem as a SAT solving problem
– Applying MinCostSAT to achieve optimal security configuration
– Open research problems
Cyber Defender’s Life
[Figure: security advisories (e.g. “Apache 1.3.4 bug!”), vulnerability reports, network configuration, IDS alerts, and users and data assets feed a reasoning system that provides automated situation awareness.]
Multi-step Attacks
[Figure: example network. Zones: Internet, demilitarized zone (DMZ), Corporation. Firewalls: Firewall 1, Firewall 2. Hosts: webServer, fileServer, workStation. Files: webPages, sharedBinary. Attack steps: buffer overrun, NFS shell, Trojan horse.]
Two Questions
• Are there potential attack paths in the system?
– How can they happen?
– How can they be addressed in an optimal way?
• Are there attacks that are going on/have succeeded in the system?
– How do you know?
– How to counter the attack?
What we are going to focus on
MulVAL
[Figure: MulVAL framework. Vulnerability scanners on the hosts use vulnerability definitions (e.g. OVAL, Nessus Scripting Language); a network analyzer supplies network reachability information. Together with vulnerability information (e.g. NIST NVD), user information, and Datalog rules from security experts, these feed the analyzer, which answers queries such as “Could root be compromised on any of the machines?”]
Ou, Govindavajhala, and Appel. Usenix Security 2005
Network config (firewall analyzer)
Host access-control lists
reachable(internet, webServer, tcp, 80).
reachable(webServer, fileServer, nfs, -).
… …
Host config scanner
File permissions
fileOwner(webServer, '/bin/apache', root).
fileAttr(webServer, '/bin/apache', r,w,x,r,0,0,r,0,0).
Host-based vulnerability scanner
Installed software
vulExists(webServer, 'CVE-2006-3747', httpd).
vulExists(dbServer, 'CVE-2009-2446', mySQL).
… …
US-CERT
NVD
Apache 1.3.4 bug!
Security advisories
vulProperty('CVE-2006-3747', remote, privEscalation).
vulProperty('CVE-2009-2446', remote, privEscalation).
… …
Security expert
Datalog Rules
execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).
Linux security behavior; Windows security behavior; Common attack techniques
The rules are completely independent of any site-specific settings.
Rule for NFS
[Figure labels: dmz, corp, webServer, fileServer, webPages, sharedBinary, NFS shell]
accessFile(Server, Access, Path) :-
nfsExport(Server, Path, Access, Client),
reachable(Client, Server, nfs, -),
execCode(Client, _Perm).
Rule for Trojan Horse
[Figure labels: corp, workStation, fileServer, webPages, projectPlan, sharedBinary, Trojan horse]
execCode(H, User) :- accessFile(H, write, Path), fileOwner(H, Path, User).
Deducing new facts
execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).
[Figure labels: internet, Firewall 1, dmz, webServer]
vulExists(webServer, httpd, remote, privilegeEscalation).
serviceRunning(webServer, httpd, tcp, 80, apache).
networkAccess(webServer, tcp, 80).
execCode(webServer, apache). Oops!
From Vulnerability Scanner & NVD
From Vulnerability Scanner
Derived
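The derivation on this slide can be reproduced mechanically. What follows is a minimal Python sketch, not MulVAL code: it encodes the three input facts from the slide as tuples and applies the single execCode(Host, PrivilegeLevel) rule by joining on the shared variables. The function name `derive_exec_code` and the tuple encoding are assumptions made for illustration.

```python
# Input facts from the slide, encoded as tuples (this encoding is an assumption).
vul_exists = {("webServer", "httpd", "remote", "privilegeEscalation")}
service_running = {("webServer", "httpd", "tcp", 80, "apache")}
network_access = {("webServer", "tcp", 80)}


def derive_exec_code(vul_exists, service_running, network_access):
    """Apply the execCode rule once by joining the three relations."""
    derived = set()
    for host, program, vul_range, consequence in vul_exists:
        # The rule only fires for remotely exploitable privilege escalation.
        if (vul_range, consequence) != ("remote", "privilegeEscalation"):
            continue
        for s_host, s_program, protocol, port, privilege in service_running:
            # Host and Program are shared between vulExists and serviceRunning.
            if (s_host, s_program) != (host, program):
                continue
            # Host, Protocol and Port must also match a networkAccess fact.
            if (host, protocol, port) in network_access:
                derived.add((host, privilege))
    return derived


exec_code = derive_exec_code(vul_exists, service_running, network_access)
print(exec_code)
```

Removing any one of the three input facts (for example, passing an empty set for `network_access`) leaves nothing to derive, which is the sense in which each premise is necessary for the attack step.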
Advantages of using Prolog
• Prolog’s goal-oriented evaluation is potentially more efficient.
• Prolog provides more programming flexibility.
Can we evaluate Datalog programs in Prolog?
However…
• Prolog as a programming language cannot be directly used to evaluate Datalog
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
parent(bill,mary).
parent(mary,john).
?- ancestor(X,Y).
However…
• Prolog as a programming language cannot be directly used to evaluate Datalog
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
parent(bill,mary).
parent(mary,john).
?- ancestor(X,Y).
However…
• Prolog as a programming language cannot be directly used to evaluate Datalog
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
?- ancestor(X,Y).
Problem of SLD resolution
ancestor(X,Y) :- parent(X,Y).
ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
parent(bill,mary).
parent(mary,john).
[Figure: SLD resolution tree for the goal ancestor(X, Y).
 First clause: parent(X,Y).
   X=bill, Y=mary: Success
   X=mary, Y=john: Success
 Second clause: parent(X,Z), ancestor(Z,Y).
   X=bill, Z=mary: ancestor(mary,Y).
     parent(mary,Y). Y=john: Success
     parent(mary,Z2), ancestor(Z2,Y). Z2=john: ancestor(john,Y). … Failure
   X=mary, Z=john: ancestor(john,Y). … Failure]
Problem of SLD resolution
ancestor(X, Y).
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
ancestor(Z, Y), parent(X, Z).
ancestor(Z1, Y), parent(Z, Z1), parent(X, Z).
ancestor(Z2, Y), parent(Z1, Z2), parent(Z, Z1), parent(X, Z).
…
Problem of SLD resolution
• Termination of cyclic Datalog programs under SLD resolution depends not only on the logical semantics, but also on the order of the clauses and subgoals.
– This creates problems, since in network security analysis such cyclic rules are commonplace.
  • e.g. after compromising one machine, the attacker can use it as a stepping stone to compromise another.
– Datalog is a declarative language; thus order should not matter.
– A pure Datalog program should always terminate, because the number of tuples is bounded.
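The order dependence can be mimicked outside Prolog. The Python sketch below is only an analogy for depth-first SLD resolution, not a Prolog interpreter: `ancestor_right` and `ancestor_left` are hypothetical generator functions that try their "subgoals" in the same order as the right-recursive and left-recursive versions of the ancestor program. The left-recursive one re-enters the same goal before consulting any fact, so Python's recursion limit stands in for non-termination.

```python
PARENT = [("bill", "mary"), ("mary", "john")]


def ancestor_right(x):
    """Mirror of:  ancestor(X,Y) :- parent(X,Y).
                   ancestor(X,Y) :- parent(X,Z), ancestor(Z,Y).
    parent/2 is consulted before the recursive call, so each call makes progress
    (on acyclic parent data)."""
    for p, z in PARENT:
        if p == x:
            yield z                        # first clause
            yield from ancestor_right(z)   # second clause: recurse with Z bound


def ancestor_left():
    """Mirror of the left-recursive program with the recursive clause first.
    The recursive subgoal is the caller's own goal again, so no fact is ever reached."""
    for z, y in ancestor_left():           # ancestor(Z,Y): same goal as the caller
        for x, z2 in PARENT:               # parent(X,Z)
            if z2 == z:
                yield (x, y)
    yield from PARENT                      # ancestor(X,Y) :- parent(X,Y).  (never reached)


print(list(ancestor_right("bill")))

try:
    next(ancestor_left())
    left_terminated = True
except RecursionError:
    # Python gives up at its recursion limit; plain SLD resolution would keep going.
    left_terminated = False
print(left_terminated)
```

Both functions encode the same logical relation; only the order in which subgoals are attempted differs, which is exactly what a declarative reading says should not matter.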
Bottom-up Evaluation
Semi-naïve Evaluation:
Step (1) (base case): ancestor(bill,mary), ancestor(mary,john)
Step (2)
Iteration 1: ancestor(bill, john)
Iteration 2: No new tuples (“fixpoint”)
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
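The two steps above can be sketched in a few lines of Python. This is a minimal illustration of semi-naïve evaluation specialised to the ancestor program, under the assumption that facts are stored as sets of tuples; `semi_naive_ancestor` is a hypothetical name, not part of any Datalog engine.

```python
PARENT = {("bill", "mary"), ("mary", "john")}


def semi_naive_ancestor(parent):
    """Return (all ancestor tuples, list of new tuples found per iteration)."""
    total = set(parent)    # Step (1): ancestor(X,Y) :- parent(X,Y).
    delta = set(parent)    # tuples that were new in the previous round
    deltas = []
    while delta:
        # Step (2): ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
        # Only the previous round's new ancestor tuples are joined with parent.
        new = {(x, y)
               for (z, y) in delta
               for (x, z2) in parent
               if z2 == z} - total
        deltas.append(new)
        total |= new
        delta = new        # an empty delta means the fixpoint is reached
    return total, deltas


ancestors, deltas = semi_naive_ancestor(PARENT)
print(sorted(ancestors))
print(deltas)
```

The loop terminates regardless of clause or subgoal order because each round can only add tuples over a finite set of constants.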
SLG Resolution
• Goal-oriented evaluation
• Predicates can be “tabled”
– A table stores the evaluation results of a goal.
– The results can be re-used later, i.e. dynamic programming.
– Entering an active table indicates a cycle.
– Fixpoint operation is taken at such tables.
• The XSB system implements SLG resolution
– Developed by Stony Brook (http://xsb.sourceforge.net/).
– Provides full ISO Prolog compatibility.
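The table-plus-fixpoint idea can be illustrated with a much-simplified Python sketch. It is not XSB's SLG algorithm: it handles a single goal, ancestor(X,Y), for the left-recursive program, and the names `tables`, `active` and `solve_ancestor` are assumptions for illustration. It shows only the two behaviours listed above: re-entering an active table returns the answers found so far instead of recursing, and the table is iterated to a fixpoint.

```python
PARENT = [("bill", "mary"), ("mary", "john")]

tables = {}      # goal -> set of answers found so far
active = set()   # goals whose tables are still being completed


def solve_ancestor():
    goal = "ancestor(X,Y)"
    if goal in active:
        # Entering an active table indicates a cycle: consume its current answers.
        return set(tables[goal])
    if goal in tables:
        return tables[goal]              # completed table: results are re-used
    tables[goal] = set()
    active.add(goal)
    while True:                          # fixpoint operation at this table
        before = len(tables[goal])
        # ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
        for z, y in solve_ancestor():    # resolved against the table, not re-derived
            for x, z2 in PARENT:
                if z2 == z:
                    tables[goal].add((x, y))
        # ancestor(X,Y) :- parent(X,Y).
        tables[goal].update(PARENT)
        if len(tables[goal]) == before:
            break
    active.discard(goal)
    return tables[goal]


print(sorted(solve_ancestor()))
```

Unlike plain depth-first resolution, this terminates even though the recursive clause is tried first, because the recursive subgoal is answered from the table.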
SLG resolution example
ancestor(X,Y) :- ancestor(Z,Y), parent(X,Z).
ancestor(X,Y) :- parent(X,Y).
parent(bill,mary).
parent(mary,john).
[Figure: SLG resolution tree for the goal ancestor(X, Y).
 Generator node: ancestor(X, Y). New table created for ancestor(X,Y).
 First clause: ancestor(Z, Y), parent(X, Z).
   Active node: resolve ancestor(Z,Y) against the results in the table for ancestor(X,Y).
   Z=bill, Y=mary: parent(X, bill). Failure
   Z=mary, Y=john: parent(X, mary). X=bill: Success
   Z=bill, Y=john: parent(X, bill). Failure
 Second clause: parent(X,Y).
   X=bill, Y=mary: Success
   X=mary, Y=john: Success]
SLG in MulVAL
netAccess(H2, Protocol, Port) :-
execCode(H1, User),
reachable(H1, H2, Protocol, Port).
[Figure: table for goal netAccess(…), with its possible instantiations; table for first subgoal execCode(…), with its possible instantiations; remaining subgoal resolved from input tuples.]
SLG complexity for Datalog
• Total time dominated by the rule that has the maximum number of instantiations
– Time for computing one table = computation of the subgoals + retrieving information from input tuples + matching results in the rules’ bodies
– Time for computing all tables = retrieving information from input tuples + matching results in the rules’ bodies
• See “On the Complexity of Tabled Datalog Programs” http://www.cs.sunysb.edu/~warren/xsbbook/node21.html
MulVAL complexity in SLG
execCode(Attacker, Host, User) :-
    vulExists(Host, _, Program, remote, privilegeEscalation),
    networkService(Host, Program, Protocol, Port, User),
    netAccess(Attacker, Host, Protocol, Port).
Host scales with network size: O(N) different instantiations
MulVAL complexity in SLG
netAccess(Attacker, H2, Protocol, Port) :-
    execCode(Attacker, H1, _),
    reachable(H1, H2, Protocol, Port).
H1 and H2 each scale with network size: O(N^2) different instantiations
Complexity of MulVAL: O(N^2)
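The quadratic count for the netAccess rule can be checked on a synthetic worst case. The Python sketch below assumes a fully connected network with one service per host and an attacker who already runs code on every host; the host names, the single tcp/80 service and the function name `count_netaccess_instantiations` are all illustrative assumptions.

```python
def count_netaccess_instantiations(n_hosts):
    """Count (H1, H2) instantiations of the netAccess rule body."""
    hosts = [f"h{i}" for i in range(n_hosts)]
    # Assumed worst case: execCode(attacker, H1, _) holds for every host H1 ...
    exec_code = [("attacker", h, "someUser") for h in hosts]
    # ... and reachable(H1, H2, tcp, 80) holds for every ordered pair of distinct hosts.
    reachable_from = {
        h1: [(h2, "tcp", 80) for h2 in hosts if h2 != h1] for h1 in hosts
    }
    count = 0
    for _attacker, h1, _user in exec_code:
        # Each reachable tuple starting at H1 gives one instantiation of the body.
        count += len(reachable_from[h1])
    return count


for n in (10, 20, 40):
    print(n, count_netaccess_instantiations(n))
```

Under these assumptions the count is N(N-1), so doubling the number of hosts roughly quadruples the number of instantiations.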
Datalog proof generation
• In security analysis, we want to know not only what attacks could happen, but also how they can happen.
– Thus, we need more than a yes/no answer for queries.
– We need the proofs for the true queries, which in the case of security analysis will be attack paths.
– We also want to know all possible attack paths; thus we need exhaustive proof generation.
An obvious approach
execCode(Host, PrivilegeLevel) :-
    vulExists(Host, Program, remote, privilegeEscalation),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel),
    networkAccess(Host, Protocol, Port).
execCode(Host, PrivilegeLevel, Pf) :-
    vulExists(Host, Program, remote, privilegeEscalation, Pf1),
    serviceRunning(Host, Program, Protocol, Port, PrivilegeLevel, Pf2),
    networkAccess(Host, Protocol, Port, Pf3),
    Pf = (execCode(Host, PrivilegeLevel), [Pf1, Pf2, Pf3]).
This will break the bounded-term property and result in non-termination for cyclic Datalog programs.
MulVAL Attack-Graph Toolkit
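[Figure: MulVAL attack-graph toolkit pipeline. Machine configuration, network configuration, and security advisories are put into a Datalog representation; together with the Datalog rules, this feeds the XSB reasoning engine, which emits Datalog proof steps; the graph builder turns these into a Datalog proof graph.]
Ou, Boyer, and McQueen. ACM CCS 2006
Joint work with Idaho National Laboratory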
Stage 1: Record Proof Steps
Translated rules
netAccess(H2, Protocol, Port, ProofStep) :-
    execCode(H1, User),
    reachable(H1, H2, Protocol, Port),
    ProofStep = because('multi-hop network access',
                        netAccess(H2, Protocol, Port),
                        [execCode(H1, User), reachable(H1, H2, Protocol, Port)]).   % proof step
Stage 2: Build the Exhaustive Proof
because('multi-hop network access',
        netAccess(fileServer, rpc, 100003),
        [execCode(webServer, apache), reachable(webServer, fileServer, rpc, 100003)])
[Figure: proof graph fragment built from this proof step.
 0: netAccess(fileServer, rpc, 100003)
 1: multi-hop network access
 2: execCode(webServer, apache)
 3: reachable(webServer, fileServer, rpc, 100003)]
Complexity of Proof Building
• O(N^2) to complete Datalog evaluation
– With proof steps generated
• O(N^2) to build a proof graph from proof steps
– Need to build O(N^2) graph components
– Building of one component
  • Find the predecessor: table lookup
  • Find the successors: table lookup
Total time: O(N^2), if table lookup is constant time
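The graph-building stage can be sketched as two hash-table indexes over the recorded proof steps, so that finding a predecessor or a successor is a dictionary lookup. The Python below is an illustrative sketch only, not the MulVAL graph builder: the tuple layout (rule label, conclusion, premises) is an assumed encoding of the because/3 term, and `build_proof_graph` is a hypothetical name. The single proof step is the one shown on the Stage 2 slide.

```python
from collections import defaultdict

# One recorded proof step from the slides; facts are kept as plain strings.
PROOF_STEPS = [
    ("multi-hop network access",
     "netAccess(fileServer, rpc, 100003)",
     ("execCode(webServer, apache)",
      "reachable(webServer, fileServer, rpc, 100003)")),
]


def build_proof_graph(steps):
    """Index proof steps so predecessors and successors are dictionary lookups."""
    derived_by = defaultdict(list)   # fact -> [(rule label, premises)], one per derivation
    used_in = defaultdict(list)      # fact -> [conclusions that use it as a premise]
    for rule, conclusion, premises in steps:
        derived_by[conclusion].append((rule, premises))
        for premise in premises:
            used_in[premise].append(conclusion)
    # Facts that no recorded step derives are the leaves of this (partial) graph.
    leaves = set(used_in) - set(derived_by)
    return dict(derived_by), dict(used_in), leaves


derived_by, used_in, leaves = build_proof_graph(PROOF_STEPS)
print(derived_by["netAccess(fileServer, rpc, 100003)"])
print(sorted(leaves))
```

Each proof step is visited once and each lookup is expected constant time, which matches the assumption behind the total-time bound above. A fact with several entries in `derived_by` corresponds to a node with alternative derivations.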
Logical Attack Graphs
[Figure: logical attack graph with numbered nodes and a legend distinguishing OR nodes, AND nodes, and ground facts.
 Derived facts and the rules that derive them:
   execCode(attacker,workStation,root), by Trojan horse installation
   accessFile(attacker,workStation,write,/usr/local/share), by NFS semantics
   accessFile(attacker,fileServer,write,/export), by NFS shell
   execCode(attacker,webServer,apache), by Remote exploit
 Other nodes shown:
   netAccess(attacker,webServer,tcp,80)
   networkService(webServer,httpd,tcp,80,apache)
   vulExists(webServer, CAN-2002-0392, httpd, remoteExploit, privEscalation)]
Performance and Scalability
[Figure: CPU time (sec, log scale from 0.01 to 10000) versus number of hosts (log scale from 1 to 1000) for four network topologies: fully connected, partitioned, ring, star.]
Related Work
• Sheyner’s attack graph tool (CMU)
– Based on model-checking
• Cauldron attack graph tool (GMU)
– Based on graph-search algorithms
• NetSPA attack graph tool (MIT LL)
– Graph-search based on a simple attack model
Advantages of the Logic-programming Approach
• Publishing and incorporation of knowledge/information through well-understood logical semantics
• Efficient and sound analysis by leveraging the reasoning power of well-developed logic-deduction systems
Next Lecture
• How to make use of the proof graph
– Optimizing mitigation measures through SAT solving
• Open problems
– Uncertainty in reasoning