Empirical Analysis and Statistical Modelling of Attack Processes based on Honeypots
M. Kaâniche (1), E. Alata (1), V. Nicomette (1), Y. Deswarte (1), M. Dacier (2)
(1) LAAS-CNRS, (2) Eurecom
Workshop on Empirical Evaluation of Dependability and Security (WEEDS-DSN06), Philadelphia, PA, June 28, 2006
ACI “Sécurité & Informatique”: http://acisi.loria.fr
Context
Need for real data and methodologies to learn about malicious activities on the Internet and analyze their impact on systems security
Several initiatives for monitoring malicious threats do exist
■ CAIDA
■ Motion Sensor project
■ Dshield
■ CADHo
CADHo Objectives
Build and deploy on the Internet a distributed platform of identically configured low-interaction honeypots in a large number of diverse locations
Carry out various analyses based on the collected data to better understand threats and build models to characterize attack processes
Analyze and model the behavior of malicious attackers once they manage to get access and compromise a target
■ High-interaction honeypots
Leurré.com data collection platform
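[Figure: platform architecture. Components shown: three virtual machines (Mach0: Windows 98 workstation; Mach1: Windows NT, ftp + web server; Mach2: Redhat 7.3, ftp server), a virtual switch, an observer (tcpdump), a reverse firewall, and the Internet.]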
Data analysis
Data collection since 2004
■ 80 000 different IP addresses from 91 different countries
Information extracted from the logs
■ Raw packets (entire frames including payloads)
■ IP address of the attacking machine
■ Time of the attack and duration
■ Targeted virtual machines and ports
■ Geographic location of the attacking machine (Maxmind, NetGeo)
■ OS of the attacking machine (p0f, ettercap, disco)
Automatic data analyses have been developed to extract useful trends and identify hidden phenomena from the data
■ Clustering techniques, time series analysis, etc.
■ Publications available at: www.leurrecom.org/paper.htm
Modeling Objectives
Identify probability distributions that best characterize attack occurrence and attack propagation processes
Model the time relationships between attacks coming from different sources (or to different destinations)
Analyze whether data collected from different platforms exhibit similar or different malicious attack activities
Predict occurrence of new attacks on a given platform based on past observations on this platform and other platforms
Estimate impact of attacks on security of target systems
■ High-interaction honeypots to analyze attackers' behavior once they compromise and get access to a target
Examples
Analysis of the time evolution of the number of attacks taking into account the geographic location of attacking machines
Characterization and statistical modeling of times between attacks
Analysis of the propagation of attacks among the honeypot platforms
Data
■ 320 days from January 1st 2004 to April 17, 2005
■ 14 honeypot platforms (the most active ones)
■ 816 475 observed attacks
Attack occurrence and geographic distrib.
The number of attacks per unit of time, considering a single platform or all platforms, can be described as a linear regression of the attacks originating from a single country only
Y(t) = αj Xj(t) + βj
where Xj(t) is the number of attacks originating from country j

Country   αj      βj        R2
Russia    44.57   1555.67   0.93
USA       5.13    759.1     0.94
UK        25.93   438.03    0.94
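The regression Y(t) = αj Xj(t) + βj can be illustrated with an ordinary least-squares fit. The sketch below is only an illustration under assumptions: the function name `fit_linear` and the synthetic series are hypothetical, and the slides do not say which estimation procedure the authors used.

```python
import random

def fit_linear(x, y):
    """Ordinary least-squares fit of y ~ alpha * x + beta.

    Returns (alpha, beta, r2), where r2 is the coefficient of determination.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    beta = mean_y - alpha * mean_x
    ss_res = sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return alpha, beta, 1.0 - ss_res / ss_tot

# Synthetic illustration only (NOT the Leurré.com data):
# x_country = attacks per unit of time from one country,
# y_total   = attacks per unit of time from all countries.
random.seed(0)
x_country = [random.randint(0, 50) for _ in range(200)]
y_total = [25.0 * x + 400.0 + random.gauss(0.0, 20.0) for x in x_country]

alpha, beta, r2 = fit_linear(x_country, y_total)
print(f"alpha={alpha:.2f} beta={beta:.2f} R2={r2:.3f}")
```

An R2 close to 1 means that the attack count from the single country explains most of the variation in the overall count, which is how the R2 column of the table can be read.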
“Times between attacks” analysis
An attack is associated with an IP address
■ occurrence time associated to the first time a packet is received from the corresponding address
ti = time between attacks i and (i-1)

        P5      P6       P9      P20      P23
#ti     85890   148942   46268   224917   51580
#IP     79549   90620    42230   162156   47859
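The definition of ti can be sketched in a few lines. This is a minimal illustration, assuming the log is available as (timestamp in seconds, source IP) pairs and that each address counts as a single attack; both the input format and that simplification are assumptions, not the authors' processing chain.

```python
def times_between_attacks(packets):
    """Compute inter-attack times from (timestamp_sec, source_ip) pairs.

    Assumption: one attack per source address, dated by the first packet
    received from that address.
    """
    first_seen = {}
    for ts, ip in packets:
        if ip not in first_seen or ts < first_seen[ip]:
            first_seen[ip] = ts
    occurrences = sorted(first_seen.values())
    # t_i = occurrence time of attack i minus occurrence time of attack i-1
    return [later - earlier for earlier, later in zip(occurrences, occurrences[1:])]

# Hypothetical packets: addresses "a" and "b" are each seen twice.
packets = [(10.0, "a"), (12.0, "b"), (11.0, "a"), (20.0, "c"), (25.0, "b")]
print(times_between_attacks(packets))  # [2.0, 8.0]
```

Under this simplification the number of ti values is always one less than the number of addresses, so it does not reproduce the counts in the table, where #ti exceeds #IP.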
“Times between attacks” distribution
[Figure: pdf of the time between attacks for Platform 6 (x-axis ticks from 1 to 271), comparing the data with the mixture (Pareto, Exp.) fit and the exponential fit. Fitted mixture parameters: Pa = 0.0115, k = 0.1183, λ = 0.1364/sec.]
$$\mathrm{pdf}(t) = P_a \, \frac{k}{(t+1)^{k+1}} + (1 - P_a) \, \lambda e^{-\lambda t}$$
Best fit provided by a mixture distribution
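The mixture density can be evaluated directly from its three parameters. The sketch below uses the Platform 6 values shown on the slide; the function names are hypothetical, and the closed-form CDF is not on the slides but follows from integrating the pdf from 0 to t.

```python
import math

def mixture_pdf(t, pa, k, lam):
    """pdf(t) = Pa * k / (t + 1)^(k + 1) + (1 - Pa) * lam * exp(-lam * t)."""
    pareto = k / (t + 1.0) ** (k + 1.0)
    exponential = lam * math.exp(-lam * t)
    return pa * pareto + (1.0 - pa) * exponential

def mixture_cdf(t, pa, k, lam):
    """Integral of mixture_pdf over [0, t] (derived here, not from the slides)."""
    pareto = 1.0 - (t + 1.0) ** (-k)
    exponential = 1.0 - math.exp(-lam * t)
    return pa * pareto + (1.0 - pa) * exponential

# Parameters reported on the slide for Platform 6.
PA, K, LAM = 0.0115, 0.1183, 0.1364

for t in (1, 31, 271):
    mix = mixture_pdf(t, PA, K, LAM)
    exp_only = LAM * math.exp(-LAM * t)
    print(f"t={t:>3}  mixture={mix:.3e}  exponential={exp_only:.3e}")
```

For large t the exponential term vanishes and the Pareto term dominates, so the mixture keeps a heavy tail that an exponential model alone cannot capture.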
“Times between attacks” distribution
[Figure: four panels (Platform 20, Platform 23, Platform 5, Platform 9), each plotting the pdf against the time between attacks in sec. (x-axis ticks from 1 to 271) for the data, the mixture (Pareto, Exp.) fit and the exponential fit. Fitted mixture parameters shown in the panels:]
■ Pa = 0.0019, k = 0.1668, λ = 0.276/sec.
■ Pa = 0.0144, k = 0.0183, λ = 0.0136/sec.
■ Pa = 0.0031, k = 0.1240, λ = 0.275/sec.
■ Pa = 0.0051, k = 0.173, λ = 0.121/sec.
Propagation of attacks
A propagation is assumed to occur when an IP address of an attacking machine observed at a given platform is observed at another platform
Propagation graph
■ Nodes identify the platforms
■ Transitions identify propagations: a propagation between Pi and Pj occurs from an IP address when the next occurrence of this address is observed on Pj after visiting Pi
■ Probabilities are associated to the transitions to reflect their likelihood of occurrence
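One possible way to build such a graph from the logs is sketched below. It is an illustration under assumptions: observations are taken to be (timestamp, source IP, platform) triples, consecutive occurrences on the same platform are ignored, and each probability is normalised over the propagations leaving the source platform. The slides do not state how the authors normalised the probabilities, and the function name is hypothetical.

```python
from collections import Counter, defaultdict

def propagation_probabilities(observations):
    """Estimate transition probabilities between platforms.

    observations: iterable of (timestamp, source_ip, platform).
    Returns {(Pi, Pj): probability}, normalised per source platform Pi
    (an assumption made for this sketch).
    """
    visits_by_ip = defaultdict(list)
    for ts, ip, platform in observations:
        visits_by_ip[ip].append((ts, platform))

    counts = Counter()
    for visits in visits_by_ip.values():
        visits.sort()  # chronological order for this address
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            if src != dst:  # next occurrence is on another platform
                counts[(src, dst)] += 1

    outgoing = Counter()
    for (src, _), n in counts.items():
        outgoing[src] += n
    return {edge: n / outgoing[edge[0]] for edge, n in counts.items()}

# Hypothetical observations for three addresses.
observations = [
    (1, "a", "P5"), (2, "a", "P6"), (5, "a", "P5"),
    (3, "b", "P5"), (4, "b", "P9"),
    (6, "c", "P5"), (7, "c", "P6"),
]
probabilities = propagation_probabilities(observations)
for (src, dst), p in sorted(probabilities.items()):
    print(f"{src} -> {dst}: {p:.1%}")
```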
Propagation graph
[Figure: propagation graph whose nodes are the platforms P5, P6, P9, P20 and P23; each transition is labelled with its probability, with labels ranging from 0.6% to 96.1%.]
Issues under investigation
■ Focus on specific attacks (largest clusters, worms, etc.)
■ Timing characteristics and probability distributions
Summary and Conclusions
Preliminary models to characterize attack processes observed on low-interaction honeypots
Several open issues
■ Predictive models that can be used to support decision making during design and operation stages
■ How to assess the impact of attacks on the security of target systems?
High-interaction honeypots
■ Analyze attackers' behavior once they get access to a target
■ Validate a theoretical model for quantitative evaluation of security developed by LAAS in the 90’s
  - Privilege graph to describe vulnerabilities and attack scenarios
  - METF “Mean Effort To security Failure” to quantify security
  - Assumptions about intruders' behaviors