Empirical Analysis and Statistical Modelling of Attack Processes based on Honeypots

M. Kaâniche (1), E. Alata (1), V. Nicomette (1), Y. Deswarte (1), M. Dacier (2)

(1) LAAS-CNRS, (2) Eurecom

[email protected]

Workshop on Empirical Evaluation of Dependability and Security (WEEDS-DSN06), Philadelphia, PA, June 28, 2006

ACI “Sécurité & Informatique”, http://acisi.loria.fr


Outline

Context and motivation

Data collection

Attack processes modeling

Conclusion and open issues

Context

Need for real data and methodologies to learn about malicious activities on the Internet and analyze their impact on systems security

Several initiatives for monitoring malicious threats exist
■ CAIDA
■ Motion Sensor project
■ Dshield
■ CADHo

CADHo Objectives

Build and deploy on the Internet a distributed platform of identically configured low-interaction honeypots in a large number of diverse locations

Carry out various analyses based on the collected data to better understand threats and build models to characterize attack processes

Analyze and model the behavior of malicious attackers once they manage to get access and compromise a target
■ High-interaction honeypots

Leurré.com data collection platform

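[Figure: architecture of a platform. Components shown: three virtual machines, Mach0 (Windows 98 workstation), Mach1 (Windows NT, ftp + web server) and Mach2 (Redhat 7.3, ftp server); a virtual switch; an observer (tcpdump); a reverse firewall; the Internet.]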

35 platforms, 25 countries, 5 continents

Data analysis

Data collection since 2004

80 000 different IP addresses from 91 different countries

Information extracted from the logs
■ Raw packets (entire frames including payloads)
■ IP address of the attacking machine
■ Time of the attack and duration
■ Targeted virtual machines and ports
■ Geographic location of the attacking machine (Maxmind, NetGeo)
■ OS of the attacking machine (p0f, ettercap, disco)

Automatic data analyses have been developed to extract useful trends and identify hidden phenomena from the data
■ Clustering techniques, time series analysis, etc.
■ Publications available at: www.leurrecom.org/paper.htm

Modeling Objectives

Identify probability distributions that best characterize attack occurrence and attack propagation processes

Model the time relationships between attacks coming from different sources (or to different destinations)

Analyze whether data collected from different platforms exhibit similar or different malicious attack activities

Predict occurrence of new attacks on a given platform based on past observations on this platform and other platforms

Estimate impact of attacks on security of target systems
■ High-interaction honeypots to analyze attackers' behavior once they compromise and get access to a target

Examples

Analysis of the time evolution of the number of attacks taking into account the geographic location of attacking machines

Characterization and statistical modeling of times between attacks

Analysis of the propagation of attacks among the honeypot platforms

Data
■ 320 days from January 1st 2004 to April 17, 2005
■ 14 honeypot platforms (the most active ones)
■ 816475 observed attacks

Attack occurrence and geographic distribution

The number of attacks per unit of time, considering a single platform or all platforms, can be described as a linear regression of the attacks originating from a single country only

Y(t) = α_j X_j(t) + β_j

where Y(t) is the number of attacks per unit of time and X_j(t) is the number of those attacks originating from country j

Country   R2, βj, αj
UK        0.94438.0325.93
USA       0.94759.15.13
Russia    0.931555.6744.57
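To make the regression model concrete, here is a minimal sketch of an ordinary least-squares fit of Y(t) = α_j X_j(t) + β_j together with its R2. The function name `fit_line` and the sample counts are illustrative assumptions, not the authors' tooling or data.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least-squares fit of y = alpha * x + beta.

    x: attacks per unit of time from one country (X_j in the slide).
    y: total attacks per unit of time (Y in the slide).
    Returns (alpha, beta, r2).
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x is constant, slope is undefined")
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    alpha = sxy / sxx
    beta = my - alpha * mx
    # Coefficient of determination: 1 - residual sum of squares / total sum of squares
    ss_res = sum((yi - (alpha * xi + beta)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    r2 = 1.0 - ss_res / ss_tot if ss_tot else 1.0
    return alpha, beta, r2


# Made-up counts per unit of time (NOT the CADHo data), built to be exactly linear
x_country = [10, 12, 9, 15, 20, 18]
y_total = [5 * x + 60 for x in x_country]
alpha, beta, r2 = fit_line(x_country, y_total)
```

With real counts, an R2 close to 1 would indicate that the attacks from the chosen country alone track the overall attack activity well.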

“Times between attacks” analysis

An attack is associated with an IP address
■ The occurrence time is the first time a packet is received from the corresponding address

ti = time between attacks i and (i-1)

Platform   P5      P6       P9      P20      P23
#ti        85890   148942   46268   224917   51580
#IP        79549   90620    42230   162156   47859
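The definition of ti can be sketched as follows. This is a simplified illustration that assumes each address contributes a single attack, timed at its first packet; how repeated visits from the same address are split into separate attacks is not specified on the slide. The function name and the sample packets are hypothetical.

```python
def times_between_attacks(packets):
    """packets: iterable of (timestamp_sec, source_ip) seen on one platform.

    Assumption: one attack per source address, occurring at the time of the
    first packet received from that address.
    Returns (attack_times, gaps) where gaps[i] is the time between two
    consecutive attacks.
    """
    first_seen = {}
    for ts, ip in packets:
        if ip not in first_seen or ts < first_seen[ip]:
            first_seen[ip] = ts
    attack_times = sorted(first_seen.values())
    gaps = [b - a for a, b in zip(attack_times, attack_times[1:])]
    return attack_times, gaps


# Made-up packet log using documentation-range addresses
packets = [
    (0, "192.0.2.1"),
    (4, "192.0.2.2"),
    (5, "192.0.2.1"),   # later packet from a known address: not a new attack here
    (9, "192.0.2.3"),
    (30, "192.0.2.2"),
]
attack_times, gaps = times_between_attacks(packets)
```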

Number of attacks per IP address

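“Times between attacks” distribution

[Figure: pdf of the time between attacks (x-axis: time between attacks, ticks from 1 to 271; y-axis: pdf), comparing the observed data with an exponential fit and a mixture (Pareto, Exp.) fit. Fitted mixture parameters: Pa = 0.0115, k = 0.1183, λ = 0.1364/sec.]

\[ \mathrm{pdf}(t) = P_a \frac{k}{(t+1)^{k+1}} + (1 - P_a)\,\lambda e^{-\lambda t} \]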

Best fit provided by a mixture distribution
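As an illustration of the mixture model, the sketch below evaluates the Pareto/exponential mixture density and its closed-form cumulative distribution, and compares log-likelihoods on a few made-up gaps. The parameter values are the ones quoted on the slide; the function names and the sample gaps are assumptions, and no fitting procedure is implied.

```python
import math


def mixture_pdf(t, pa, k, lam):
    """pdf(t) = Pa * k / (t + 1)**(k + 1) + (1 - Pa) * lam * exp(-lam * t), t >= 0."""
    if t < 0:
        return 0.0
    pareto = k / (t + 1.0) ** (k + 1.0)
    expo = lam * math.exp(-lam * t)
    return pa * pareto + (1.0 - pa) * expo


def mixture_cdf(t, pa, k, lam):
    """Closed-form integral of mixture_pdf from 0 to t."""
    if t < 0:
        return 0.0
    pareto = 1.0 - (t + 1.0) ** (-k)
    expo = 1.0 - math.exp(-lam * t)
    return pa * pareto + (1.0 - pa) * expo


def log_likelihood(gaps, pa, k, lam):
    """Log-likelihood of observed gaps under the mixture model."""
    return sum(math.log(mixture_pdf(t, pa, k, lam)) for t in gaps)


# Parameters quoted on the slide (lam in 1/sec)
PA, K, LAM = 0.0115, 0.1183, 0.1364

# Made-up gaps in seconds (NOT the measured data), including one long gap
gaps = [1, 2, 2, 5, 8, 40, 300]
ll_mixture = log_likelihood(gaps, PA, K, LAM)
# Pa = 0 reduces the mixture to an exponential with the same lam
# (this is not a separately refitted exponential)
ll_exponential = log_likelihood(gaps, 0.0, K, LAM)
```

The heavy Pareto tail is what lets the mixture assign non-negligible probability to long gaps, which a pure exponential with the same rate makes extremely unlikely.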

Platform 6

“Times between attacks” distribution

[Figure: four panels titled Platform 20, Platform 23, Platform 5 and Platform 9. Each panel plots the pdf of the time between attacks (x-axis: time in sec., ticks from 1 to 271; y-axis: pdf) for the observed data, an exponential fit and a mixture (Pareto, Exp.) fit. Fitted mixture parameters reported on the panels:
■ Pa = 0.0019, k = 0.1668, λ = 0.276/sec.
■ Pa = 0.0144, k = 0.0183, λ = 0.0136/sec.
■ Pa = 0.0031, k = 0.1240, λ = 0.275/sec.
■ Pa = 0.0051, k = 0.173, λ = 0.121/sec.]

Propagation of attacks

A propagation is assumed to occur when an IP address of an attacking machine observed at a given platform is observed at another platform

Propagation graph
■ Nodes identify the platforms
■ Transitions identify propagations: a propagation between Pi and Pj occurs from an IP address when the next occurrence of this address is observed on Pj after visiting Pi
■ Probabilities are associated with the transitions to reflect their likelihood of occurrence
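One possible way to build such a graph from the logs is sketched below. It assumes that observations are (timestamp, source IP, platform) triples, that a next occurrence on the same platform counts as a transition from Pi to itself, and that probabilities are normalized over the transitions leaving each platform. These choices, the function name and the sample data are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def propagation_graph(observations):
    """observations: iterable of (timestamp, source_ip, platform).

    Returns {(Pi, Pj): probability}, where the probability is the share of
    transitions leaving Pi whose next occurrence is observed on Pj.
    """
    # Group the visits of each address and order them in time
    by_ip = defaultdict(list)
    for ts, ip, platform in observations:
        by_ip[ip].append((ts, platform))

    # Count Pi -> Pj for each pair of consecutive occurrences of an address
    counts = Counter()
    for visits in by_ip.values():
        visits.sort()
        for (_, src), (_, dst) in zip(visits, visits[1:]):
            counts[(src, dst)] += 1

    # Normalize by the number of transitions leaving each source platform
    outgoing = Counter()
    for (src, _), n in counts.items():
        outgoing[src] += n
    return {edge: n / outgoing[edge[0]] for edge, n in counts.items()}


# Made-up observations using documentation-range addresses
observations = [
    (1, "192.0.2.1", "P5"), (2, "192.0.2.1", "P6"), (3, "192.0.2.1", "P6"),
    (1, "192.0.2.2", "P5"), (4, "192.0.2.2", "P5"),
    (2, "192.0.2.3", "P6"), (5, "192.0.2.3", "P9"),
]
graph = propagation_graph(observations)
```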

Propagation graph

[Figure: propagation graph over platforms P5, P6, P9, P20 and P23; each transition is labelled with its probability, with values ranging from 0.6% to 96.1%.]

Issues under investigation
■ Focus on specific attacks (largest clusters, worms, etc.)
■ Timing characteristics and probability distributions

Summary and Conclusions

Preliminary models to characterize attack processes observed on low-interaction honeypots

Several open issues
■ Predictive models that can be used to support decision making during design and operation stages
■ How to assess the impact of attacks on the security of target systems?

High-interaction honeypots
■ Analyze attackers' behavior once they get access to a target
■ Validate a theoretical model for quantitative evaluation of security developed by LAAS in the 90’s
  - Privilege graph to describe vulnerabilities and attack scenarios
  - METF “Mean Effort To security Failure” to quantify security
  - Assumptions about intruders' behaviors