Towards a Theory and Practice of
Digital Forensics
Raj Rajagopalan, HP Labs – Princeton
July 25, 2011
What is the Digital Forensics Problem?
• The Holy Grail of Forensics
– Given the recorded history of an environment in the form of log data from various sources, extract the significant (security) trajectories that are represented in the data.
– If multiple historical interpretations are possible, find the ones that are more likely to be true than others.
State of Security Practice: Check Lists
[Cartoon: a physical check list applied to ears, neck, feet]
What is the state of research?
• Computer Forensics research has been going on since at least 1980.
• But some fundamental problems remain essentially open.
• Theory and practice have diverged significantly.
• The practice remains largely manual and ad hoc, without theoretical support.
Recent changes in the security landscape
• Threat is much larger and much more subtle.
– More attacks, more sophisticated, more diverse.
• Risk is much higher.
– We are much more dependent on Information Technology.
• Scale (size, speed, complexity) is much larger.
– Existing solutions become unusable.
• Zero-day vulnerabilities keep emerging.
– We can no longer rely on good maintenance alone.
• However, there is not much good-quality data to be had.
– Privacy concerns and fears of liability have made data-sharing nearly impossible.
Intrusion Detection and Forensics
• Detect when and how a malicious event has occurred in a system.
– One of the oldest problems in system security.
– One of the most difficult problems in security.
• It is impossible to know everything about the system from recorded events.
– Visibility, resources, nescience.
– Uncertainty of detection is a certainty!
• A fundamental problem:
– Can we know how certain we are about an intrusion?
Birth of Modern Intrusion Analysis: Axelsson’s Contribution
• ACM TISSEC 2000, “The Base-Rate Fallacy and the Difficulty of Intrusion Detection”
– How effective is intrusion detection?
• To what degree does an IDS detect intrusions, P(A|I)?
• How good is it at rejecting false positives, the so-called false alarms, P(A|not I)?
– Showed that, for common parameter values, an effective IDS would need sensor false-alarm rates below 10^-5!
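Axelsson’s argument is a direct Bayes computation. A minimal sketch with illustrative numbers (the base rate and sensor parameters below are assumptions for the example, not his exact figures):

```python
def bayes_detection(p_intrusion, p_alarm_given_intrusion, p_alarm_given_benign):
    """P(I | A): probability an alarm really is an intrusion, via Bayes' theorem."""
    p_benign = 1.0 - p_intrusion
    p_alarm = (p_alarm_given_intrusion * p_intrusion
               + p_alarm_given_benign * p_benign)
    return p_alarm_given_intrusion * p_intrusion / p_alarm

base_rate = 2e-5  # intrusions as a fraction of all audited events (assumed)

# A seemingly good sensor: perfect detection, 1 false alarm per 1,000 benign events.
print(bayes_detection(base_rate, 1.0, 1e-3))  # ~0.02: about 98% of alarms are false
# Only around a 1e-5 false-alarm rate do alarms become mostly trustworthy.
print(bayes_detection(base_rate, 1.0, 1e-5))  # ~0.67
```

Because benign events dominate so heavily, even a tiny per-event false-alarm rate swamps the true detections; this is the base-rate fallacy.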
Axelsson’s Base-Rate Fallacy Argument: An Inflection Point in IDS
• Made the case that all intrusion analysis had to be re-examined.
• Why is this so?
– Malicious activity is a tiny fraction of total events; there are far too many benign events (e.g., normal packets in the network).
• Our vastly expanded data collection is the problem!
• Corroboration from learning theory:
– [Drummond and Holte, Machine Learning: ECML 2005] Robust learning is extremely difficult in the case of severe class imbalance.
A good theory is lacking
• E.g., Y. Zhai, P. Ning, P. Iyer, and D. Reeves, “Reasoning About Complementary Intrusion Evidence”, ACSAC 2004.
• Using Bayesian networks, high-confidence traces can be distinguished from less certain ones.
• But we still have the base-rate fallacy problem…
– Bayesian analysis needs a good prior distribution (default: uniform); otherwise results can be unpredictable.
– We don’t have prior distributions for attacks, but we do know they are not uniform.
• Can we do classification with some other technique?
– Not as long as the classes are “ill-conditioned” (or, alternately, the SNR is very low).
Where do we go from here?
Come on! It can’t go wrong every time...
A First Step
• In spite of the lack of theory or good tools, sysadmins and investigators are coping. How do they do it?
• Can we build a system that mimics what they do (for a start):
– An empirical approach to IDS/forensics grounded in existing reality, and
– making minimal assumptions?
• How do we get around the base-rate fallacy problem?
– Need to work at a higher level of abstraction than packets.
• New goal:
– Help a human analyst do a better job rather than replace them.
A Day in the Life of a Real SA
• (Observation 1) The SA notices an abnormally large spike in campus-network traffic.
• (Observation 2) The SA takes a netflow dump for that time period, searches it for known malicious IP addresses, and finds that four Trend Micro servers had initiated IRC connections to known BotNet controllers.
• The SA suspects that the four Trend Micro servers have been compromised.
• (Observation 3) At a server console, the SA dumps memory and finds malicious-looking code modules.
• (Observation 4) He also looks at the open TCP socket connections and notices that the server had been connecting to some other Trend Micro servers on campus through the IRC channel.
• The SA concludes that all those servers were compromised by a zero-day vulnerability.
• Further investigation reveals 12 compromised machines!
A Real-life Event
Answer: 12 machines have been “certainly” compromised.

[Diagram: the system administrator, starting from the query “Is my machine compromised?”, works through IDS alerts, a netflow dump, and memory dumps:
1. Abnormally high traffic → search for blacklisted IPs.
2. Four TrendMicro servers communicating with known BotNet controllers → examine the machines’ memory.
3. Found seemingly malicious code modules.
4. Found open IRC sockets with other TrendMicro servers.]
Uncertainty abounds…
• Observation 1 (spike in network traffic):
– Maybe the network is experiencing an attack, or maybe this is a benign event.
• Observation 2 (connections to BotNet controllers):
– Indicates a higher likelihood that the identified servers are compromised.
– However, not certain! The BotNet controller list may have false positives, or someone may be researching botnets!
• Observation 3 (suspicious code in memory):
– A strong indication that the machine may be under the control of an attacker.
– But it is not certain that a suspicious module is malicious.
Uncertainty (Cont’d)
• Observation 4 (compromised server talking on an IRC channel):
– Not definitive, because IRC channels are occasionally used as a legitimate communication channel between servers.
• When we put all four pieces of evidence together:
– We can tell with near certainty which machines have been compromised.
– Even then we are not completely certain, but close! (No one is completely certain.)
• Our central goal: build a tool that helps reduce the amount of time the human SA has to spend on intrusion analysis.
High-confidence Conclusions with Evidence

[Diagram: observations (IDS alerts, netflow dumps, syslog, server logs, …) feed a reasoning engine with an internal model; the engine maps observations to their semantics and targets subsequent observations.]
Our Strategy: Capture Uncertainty Qualitatively

Confidence level   Mode       Tag
Low                Possible   p
Moderate           Likely     l
High               Certain    c

• Arbitrarily precise quantitative measures are not meaningful in practice.
• Roughly matches the confidence levels used by practitioners.
• Could have more modes if deemed necessary.
Observation Correspondence
Mapping observations (what you can see) to internal conditions (what you want to know):

obs(anomalyHighTraffic)                       → int(attackerNetActivity), p
obs(netflowBlackListFilter(H, BlackListedIP)) → int(compromised(H)), l
obs(memoryDumpMaliciousCode(H))               → int(compromised(H)), l
obs(memoryDumpIRCSocket(H1,H2))               → int(exchangeCtlMessage(H1,H2)), l
Internal Model
• Logical relations among generic internal conditions.
• Condition1 → Condition2 is a “leads to” relation, i.e., Condition1 may cause Condition2, annotated with modes (m1, m2).

int(compromised(H1)) → int(probeOtherMachine(H1,H2))                  (p, c)
int(compromised(H1)) → int(sendExploit(H1,H2))                        (p, c)
int(sendExploit(H1,H2)) → int(compromised(H2))                        (l, p)
int(compromised(H1)), int(exchangeCtlMessage(H1,H2)) → int(compromised(H2))   (p, c)
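The qualitative bookkeeping above can be sketched in a few lines. This is a minimal Python sketch, not the actual Datalog engine; it assumes a derived fact takes the weakest mode along its derivation, and that two independent “likely” derivations of the same conclusion strengthen to “certain” (mirroring the strengthenedPf step visible in the proof trace later in the talk):

```python
# Qualitative modes, ordered possible < likely < certain.
MODES = {"p": 0, "l": 1, "c": 2}

def weakest(*modes):
    """A derivation chain is only as strong as its weakest link (assumed policy)."""
    return min(modes, key=MODES.get)

def strengthen(m1, m2):
    """Two independent 'likely' derivations of one conclusion -> 'certain' (assumed policy)."""
    if m1 == m2 == "l":
        return "c"
    return max(m1, m2, key=MODES.get)

# Observation correspondence from the slide: observation -> (internal condition, mode).
obs_map = {
    "anomalyHighTraffic":      ("attackerNetActivity", "p"),
    "netflowBlackListFilter":  ("compromised", "l"),
    "memoryDumpMaliciousCode": ("compromised", "l"),
    "memoryDumpIRCSocket":     ("exchangeCtlMessage", "l"),
}

# Using the rule int(compromised(H1)), int(exchangeCtlMessage(H1,H2)) -> int(compromised(H2))
# with a fact of mode l and (assuming) a rule mode of c in this direction:
fact_mode = obs_map["memoryDumpIRCSocket"][1]  # 'l'
derived = weakest(fact_mode, "c")
print(derived)  # 'l' -- matches the example derivation int(compromised(...), l)
```

The interpretation of the (m1, m2) pair and the strengthening policy are assumptions made for illustration; the talk does not spell them out.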
Example: Observation

obs(memoryDumpIRCSocket(172.16.9.20, 172.16.9.1))

Applying the obsMap rule

obs(memoryDumpIRCSocket(H1,H2)) → int(exchangeCtlMessage(H1,H2)), l

yields

int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)   [obsMap]
Example (Cont’d)

From

int(exchangeCtlMsg(172.16.9.20, 172.16.9.1), l)   [obsMap]

and the internal-model rule

int(compromised(H1)), int(exchangeCtlMessage(H1,H2)) → int(compromised(H2))   (p, c)

we derive

int(compromised(172.16.9.20), l)   [intR]
Implementation: System Architecture

[Diagram: Snort alerts are pre-processed (converted to Datalog tuples) and fed to the reasoning engine, together with the observation correspondence and the internal model. The internal model is built from the Snort rule repository, done only once. A user query, e.g. “which machines are ‘certainly’ compromised?”, returns high-confidence answers with evidence.]
Automate Model Building for Snort

alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS
  (msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl";
   classtype:attempted-recon; sid:1140;)

obsMap(obsRuleId_3615,
  obs(snort('1:1140', FromHost, ToHost, _Time)),
  int(probeOtherMachine(FromHost, ToHost)), ?).

The internal predicate is mapped from the rule’s “classtype”.
Automate Model Building for Snort (Cont’d)

Hints from the natural-language description of Snort rules:

  Impact: Information gathering and system integrity compromise.
  Possible unauthorized administrative access to the server.
  Possible execution of arbitrary code of the attacker’s choosing in some cases.
  Ease of Attack: Exploits exist.

obsMap(obsRuleId_3614,
  obs(snort('1:1140', FromHost, ToHost, _Time)),
  int(compromised(ToHost)), p).

obsMap(obsRuleId_3615,
  obs(snort('1:1140', FromHost, ToHost, _Time)),
  int(probeOtherMachine(FromHost, ToHost)), l).
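The classtype-driven extraction can be sketched as follows. This is an illustrative Python sketch, not HP Labs’ actual tool: the obsMap fact shape and predicate names follow the slides, but the classtype-to-predicate table and the fallback behavior are assumptions.

```python
import re

# Assumed mapping from Snort classtypes to internal predicates and modes;
# only "attempted-recon" is taken from the slides, the rest are illustrative.
CLASSTYPE_MAP = {
    "attempted-recon": ("probeOtherMachine(FromHost, ToHost)", "l"),
    "attempted-admin": ("compromised(ToHost)", "p"),
    "trojan-activity": ("compromised(FromHost)", "l"),
}

def rule_to_obsmap(rule_text, rule_no):
    """Emit an obsMap fact for a Snort rule, or None if the classtype is unmapped."""
    sid = re.search(r"sid:(\d+)", rule_text)
    ctype = re.search(r"classtype:([\w-]+)", rule_text)
    if not (sid and ctype) or ctype.group(1) not in CLASSTYPE_MAP:
        return None  # leave unmapped rules for manual review ("Suspicious")
    internal, mode = CLASSTYPE_MAP[ctype.group(1)]
    return (f"obsMap(obsRuleId_{rule_no},\n"
            f"  obs(snort('1:{sid.group(1)}', FromHost, ToHost, _Time)),\n"
            f"  int({internal}), {mode}).")

rule = ('alert tcp $EXTERNAL_NET any -> $HTTP_SERVERS $HTTP_PORTS '
        '(msg:"WEB-MISC guestbook.pl access"; uricontent:"/guestbook.pl"; '
        'classtype:attempted-recon; sid:1140;)')
print(rule_to_obsmap(rule, 3615))
```

Running this on the slide’s example rule reproduces an obsRuleId_3615-style fact; rules whose classtype is not in the table fall into the 41% “Suspicious” bucket described on the next slide.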
Coverage

Internal predicate                % of rules
Handled by the internal model     59%
Suspicious                        41%

• Snort has about 9,000 rules.
• This is just a baseline and needs to be fine-tuned.
• It would make more sense for the rule writer to define the observation-correspondence relation when writing a rule.
Methodology of evaluation
• Keep the model unchanged and apply the tool to different data sets
– Treasure Hunt data set from UCSB
– Honeypot data set from Purdue
Experiment on Treasure Hunt Data (UCSB)
• Data collected during a graduate-level course exercise.
• The data set contains multi-stage attacks, as in real-world scenarios.
• A large variety of monitoring data.
Some Results

| ?- show_trace(int(compromised(H), c)).
int(compromised('192.168.10.90'),c) strengthenedPf
  int(compromised('192.168.10.90'),l) intRule_1
    int(probeOtherMachine('192.168.10.90','192.168.70.49'),l) obsRulePre_1
      obs(snort('122:1','192.168.10.90','192.168.70.49',_h272))
  int(compromised('192.168.10.90'),l) intRule_3
    int(sendExploit('128.111.49.46','192.168.10.90'),c) obsRuleId_3749
      obs(snort('1:1807','128.111.49.46','192.168.10.90',_h336))

An exploit was sent to 192.168.10.90, and a probe was sent from 192.168.10.90: 192.168.10.90 was certainly compromised!
The Honeypot Data Set (Purdue)
• 68 GB TCP dump collected over a period of two months at a Honeypot set up at Purdue University.
• Used the TCP dump to generate Snort alerts
Alerts                                                   Number of alerts
Created from the TCP dumps                               637,516
After removing redundant alerts                          369,468
After also removing Snort alerts “122:27” and “122:3”    2,117
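The reduction steps above can be sketched as follows. The talk does not spell out the redundancy criterion, so this Python sketch assumes duplicates share (signature id, source, destination) and that noisy signatures such as '122:27' and '122:3' are dropped wholesale; both assumptions are illustrative.

```python
def dedup_alerts(alerts, drop_sids=frozenset()):
    """Keep the first alert per (sid, src, dst) key, skipping dropped signatures (assumed policy)."""
    seen, kept = set(), []
    for sid, src, dst, ts in alerts:
        if sid in drop_sids:
            continue  # noisy signature removed wholesale
        key = (sid, src, dst)
        if key not in seen:
            seen.add(key)
            kept.append((sid, src, dst, ts))
    return kept

alerts = [
    ("1:1807", "128.111.49.46", "192.168.10.90", 100),
    ("1:1807", "128.111.49.46", "192.168.10.90", 101),  # redundant duplicate
    ("122:27", "10.0.0.1", "192.168.10.90", 102),       # noisy signature
]
print(dedup_alerts(alerts, drop_sids={"122:27", "122:3"}))
# [('1:1807', '128.111.49.46', '192.168.10.90', 100)]
```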
Honeypot Analysis Result Statistics

Number of input alerts (A):                100  200  400  600  800  1000  1200  1600  2117
Hosts deemed “certainly” compromised (B):   26   42   56   67   77    79    91   107   115
Scalability of Reasoning Engine

[Plot, logarithmic scale: execution time is quadratic in the number of alerts.]
Building a formal basis for uncertainty
• Treat a suspected attack as a hypothesis that may or may not be valid.
• Quantify the uncertainty in the hypotheses ascribed to IDS alerts by correlating the observations that trigger the alerts.
• A true, successful attack will likely have multiple pieces of corroborating evidence, increasing the certainty of the attack hypothesis.
• Given a list of intrusion hypotheses sorted by confidence, a human decides what merits further investigation.
A fundamental problem in uncertainty
• If we toss a coin with an unknown bias, what is the probability of Heads?
• Standard probability theory says 50% Heads and 50% Tails, by the principle of indifference.
• What we actually intend is to assign 100% to the set {Heads, Tails}, meaning “either Heads or Tails”.
• The Dempster-Shafer theory of belief functions allows us to do that.
• DS allows for three kinds of answers: {Yes, No, Don’t know}.
• In intrusion analysis, the “Don’t know” is crucial.
The Dempster-Shafer Theory
• DS theory generalizes probability by assigning weights to (some) sets of events (the sets need not be disjoint) in an event space.
• The weight assigned to a set is distributed in some unknown way among its members.
• When the sets are singletons, DS belief becomes probability.
• To start, we need a frame of discernment Θ, a set of disjoint hypotheses of interest.
• In general Θ contains many hypotheses, but in our analysis we are usually interested in a simple set such as {attack, no-attack}.
• DS theory requires a basic probability assignment (bpa) function m() and a belief function Bel(), both defined on sets of hypotheses.
• The bpa measures the uncertainty in a set, whereas belief is a measure of our confidence in any one of the hypotheses in the set.
DS Theory Basics
The bpa function m measures the uncertainty in knowledge and the belief function measures confidence.
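These definitions can be sketched directly on the frame {attack, no-attack}. A minimal Python sketch; the mass values are illustrative, not from the talk:

```python
# Frame of discernment: disjoint hypotheses of interest.
frame = frozenset({"attack", "no-attack"})

# bpa m: some evidence points to attack; the rest is uncommitted
# ("don't know") mass assigned to the whole frame. Numbers are illustrative.
m = {
    frozenset({"attack"}): 0.6,
    frame: 0.4,  # the "don't know" weight
}

def bel(s, m):
    """Belief in s: total mass committed to subsets of s."""
    return sum(w for subset, w in m.items() if subset <= s)

def pl(s, m):
    """Plausibility of s: total mass not committed against s (intersects s)."""
    return sum(w for subset, w in m.items() if subset & s)

print(bel(frozenset({"attack"}), m))     # 0.6
print(pl(frozenset({"attack"}), m))      # 1.0 -- the "don't know" mass counts here
print(bel(frozenset({"no-attack"}), m))  # 0.0 -- no mass commits to no-attack
```

The gap between Bel and Pl is exactly the “Don’t know” answer that standard probability cannot express.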
Rules of Combination
• Dempster defined a rule for combining beliefs from independent sources: multiply the bpa functions m.
• Intersections in the bases of evidence between the sources are factored out in the formula.
• But security sensors (e.g., alert rules in an IDS) are not always independent.
• How do we combine evidence, i.e., define the combined m, when the sensors are correlated?
Combining uncertainty in DS (independent case)
Applying DS to Intrusion Analysis

[Diagram: starting from bpa values on alerts (1–5), we need to arrive at a belief value for hypothesis (9). But the alerts are not independent!]
Mapping discrete certainty tags to numbers
• Unlikely: 0.01
• Possible: 0.10
• Likely: 0.50
• Probable: 0.80
• These are ad hoc assignments!
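Dempster’s rule of combination for the independent case can be sketched with these tag values. A minimal Python sketch (the frame {A, notA} and the way tag values become masses are illustrative assumptions): two independent “Likely” (0.5) alerts corroborating the same hypothesis yield a higher combined belief than either alone.

```python
FRAME = frozenset({"A", "notA"})  # A = "attack"

def mass_from_tag(value):
    """Commit the tag value to {A}; leave the remainder uncommitted ("don't know")."""
    return {frozenset({"A"}): value, FRAME: 1.0 - value}

def combine(m1, m2):
    """Dempster's rule: multiply masses on intersections, renormalize by 1 - conflict."""
    out, conflict = {}, 0.0
    for s1, w1 in m1.items():
        for s2, w2 in m2.items():
            inter = s1 & s2
            if inter:
                out[inter] = out.get(inter, 0.0) + w1 * w2
            else:
                conflict += w1 * w2  # mass on disjoint sets is discarded
    return {s: w / (1.0 - conflict) for s, w in out.items()}

m = combine(mass_from_tag(0.5), mass_from_tag(0.5))  # two "Likely" alerts
print(m[frozenset({"A"})])  # 0.75: corroboration raises belief in the attack
```

This is exactly the independence assumption the talk then relaxes: for correlated sensors the simple product of masses over-counts shared evidence.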
Combination formulas for correlated evidence
• How do we calculate the weighting factors?
• For soundness, we want the combination formula to coincide with the probabilistic version when possible, i.e., for appropriate definitions of events, the combined mass on (h1, h2) == Pr(w1, w2).
• But uniqueness of such a rule may not be guaranteed. There may be many “correct” rules of combination.
• Our combination rule is sound and based on an analysis of Snort alerts.
Evaluation Methodology
• We evaluate our prototype on the KSU production network.
• In addition, we also tested it on two additional data sets:
– The Lincoln Lab DARPA intrusion-detection evaluation data set.
– A data set from the Honeynet Project.
• Both of these data sets include “truth files” that we can compare against the ranking produced by our DS algorithms.
• The test is whether our algorithm computes high belief values for hypotheses present in the truth file and low belief values for those that are not.
• We also compare our results with those obtained using standard DS theory, which assumes independence among alerts.
• All our analysis is on Snort alerts.
Experimental Results (LL)
Experimental Results (Production Network)
• In real data (e.g., a production network) we cannot get ground truth.
• Humans anecdotally validated that the low-belief hypotheses were spurious.
Wrap up
• Real-time and forensic intrusion analysis is very much in its infancy.
• The hard problems are intrinsically related to uncertainty of information and lack of good prior distributions.
• We have proposed a new way to deal with uncertainty that is sound and addresses some of these issues.
• But much remains to be done.
Questions?
Our Contributions and work in progress
• An empirical approach to modeling uncertainty in intrusion analysis
• A theoretical basis for the empirical approach that avoids the problems of Bayesian analysis
• A human-friendly visualization front-end to explore and display incidents
• Based on watching systems administrators (SA) solve the problem