Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering
Transcript of Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering
![Page 1: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/1.jpg)
Pattern Recognition and Applications Lab
University of Cagliari, Italy
Department of Electrical and Electronic Engineering
Poisoning Behavioral Malware Clustering
Battista Biggio (1), Konrad Rieck (2), Davide Ariu (1), Christian Wressnegger (2), Igino Corona (1), Giorgio Giacinto (1), and Fabio Roli (1)
(1) University of Cagliari (IT)
(2) University of Goettingen (GE)
AISec 2014, Scottsdale, Arizona, US, Nov. 7, 2014
![Page 2: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/2.jpg)
http://pralab.diee.unica.it
Threats and Attacks in Computer Security
• Huge number of devices, services and apps on the Internet
– Vulnerabilities in code, services, apps, etc.
• Attacks through malicious software (malware)
– Botnets, spam, identity theft / stolen credit card numbers
• Manual analysis and crafting of signatures costly
– Need for automated / assisted detection (and rule generation)
– Machine learning-based defenses (data clustering)
[Figure: Evasion (malware families / variants) versus Detection (antivirus systems, rule-based systems). +65% new malware variants from 2012 to 2013 (Mobile Adware & Malware Analysis, Symantec, 2014)]
![Page 3: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/3.jpg)
Data Clustering for Computer Security
• Goal: clustering of malware families to identify common characteristics and design suitable countermeasures
– e.g., antivirus rules / signatures
[Figure: processing pipeline. Data collection (honeypots) → malware samples → feature extraction (e.g., executed instructions, system calls, etc.) into feature vectors x1, x2, ..., xd → clustering of malware families (e.g., similar program behavior) → data analysis / countermeasure design (e.g., signature generation): for each cluster, if … then … else …]
![Page 4: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/4.jpg)
Is Data Clustering Secure?
• Attackers can poison input data to subvert malware clustering
[Figure: the same pipeline under attack. Data collection (honeypots) receives malware samples designed to subvert clustering; after feature extraction (e.g., executed instructions, system calls, etc.), the clustering of malware families (e.g., similar program behavior) is significantly compromised, and data analysis / countermeasure design (e.g., signature generation) becomes useless (too many false alarms, low detection rate)]
(1) B. Biggio et al. Is data clustering in adversarial settings secure? In AISec 2013
(2) B. Biggio et al. Poisoning complete-linkage hierarchical clustering. In S+SSPR 2014
![Page 5: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/5.jpg)
Is Data Clustering Secure?
• Our previous work (1,2):
– Framework for security evaluation of clustering algorithms
– Formalization of poisoning attacks (optimization) against single- and complete-linkage hierarchical clustering
• In this work we focus on a realistic application example: poisoning attacks against a behavioral malware clustering approach (3)
– Malheur: http://www.mlsec.org/malheur/
(1) B. Biggio et al. Is data clustering in adversarial settings secure? In AISec 2013
(2) B. Biggio et al. Poisoning complete-linkage hierarchical clustering. In S+SSPR 2014
(3) K. Rieck et al. Automatic analysis of malware behavior using machine learning. JCS 2011
![Page 6: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/6.jpg)
Poisoning Attacks
• Goal: to maximally compromise the clustering output on D
• Capability: adding m attack samples
• Knowledge: perfect / worst-case attack
• Attack strategy:
$$\max_{A} \; d_c\big(Y, Y'(A)\big), \qquad A = \{a_i\}_{i=1}^{m}$$
– $Y = f(D)$: clustering on untainted data D
– $Y' = f_D(D \cup A)$: clustering under attack
– $d_c$: distance between the clustering in the absence of attack and that under attack
[Figure: scatter plots of the clustering on untainted data D and of the clustering after adding the attack samples A]
![Page 7: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/7.jpg)
Poisoning Attacks
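This distance counts how many pairs of samples have been clustered together in one clustering and not in the other, and vice-versa:
$$d_c(Y, Y') = \left\| Y Y^{T} - Y' Y'^{T} \right\|_F$$
For a given clustering (rows: Sample 1, …, Sample 5; columns: clusters):
$$Y = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix}, \qquad Y Y^{T} = \begin{bmatrix} 1 & 0 & 0 & 1 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 0 & 1 & 1 & 0 & 0 \\ 1 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 1 \end{bmatrix}$$
Attack strategy:
$$\max_{A} \; d_c\big(Y, Y'(A)\big), \qquad A = \{a_i\}_{i=1}^{m}$$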
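To make the distance concrete, here is a minimal Python sketch that rebuilds the co-association matrix for the five-sample clustering on this slide and evaluates the Frobenius-norm distance against a second clustering. The function names and the "attacked" labeling are hypothetical and chosen only for illustration.

```python
import math

def coassociation(labels):
    """Return Y Y^T: entry (i, j) is 1 if samples i and j share a cluster, else 0."""
    return [[1 if a == b else 0 for b in labels] for a in labels]

def clustering_distance(labels_a, labels_b):
    """Frobenius norm of the difference between the two co-association matrices."""
    ca, cb = coassociation(labels_a), coassociation(labels_b)
    squared = sum((x - y) ** 2 for ra, rb in zip(ca, cb) for x, y in zip(ra, rb))
    return math.sqrt(squared)

# Clustering from the slide: samples 1 and 4 together, 2 and 3 together, 5 alone.
y = [0, 2, 2, 0, 1]
# Hypothetical clustering under attack: sample 5 is absorbed by samples 2 and 3.
y_attacked = [0, 2, 2, 0, 2]

print(coassociation(y))
print(clustering_distance(y, y_attacked))
```

Because both matrices are symmetric with ones on the diagonal, the squared distance equals twice the number of unordered sample pairs on which the two clusterings disagree.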
![Page 8: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/8.jpg)
Single-Linkage Hierarchical Clustering
• Bottom-up agglomerative clustering
– each point is initially considered as a cluster
– closest clusters are iteratively merged
• Linkage criterion to define distance between clusters
– single-linkage criterion:
$$\mathrm{dist}(C_i, C_j) = \min_{a \in C_i,\, b \in C_j} d(a, b)$$
• Clustering output is a hierarchy of clusterings
– Criterion needed to select a given clustering (e.g., number of clusters)
– Cutoff threshold on the maximum intra-cluster distance
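As an illustration of the procedure above, the following Python sketch cuts a single-linkage hierarchy at a fixed threshold. It assumes the cutoff is applied to the single-linkage merge distance, in which case the result coincides with the connected components of the graph linking all points closer than the cutoff. The function name and the toy data are hypothetical, and this is not the Malheur implementation.

```python
import math

def single_linkage(points, cutoff):
    """Merge clusters while their closest pair of points is at most `cutoff` apart."""
    n = len(points)
    parent = list(range(n))  # union-find forest: every point starts as its own cluster

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)  # the two clusters are linked by (i, j)

    roots = {}  # relabel clusters 0, 1, 2, ... in order of first appearance
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

points = [(0, 0), (0.4, 0), (0.8, 0), (3, 0), (3.3, 0)]
print(single_linkage(points, cutoff=0.5))
```

In the toy data the first and third points are 0.8 apart, yet they end up in the same cluster because the point between them links them. This chaining behavior is what the bridge-based attacks exploit.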
![Page 9: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/9.jpg)
Poisoning Single-Linkage Clustering
• Attack strategy:
$$\max_{A} \; d_c\big(Y, Y'(A)\big), \qquad A = \{a_i\}_{i=1}^{m}$$
• Heuristic-based solutions
– Greedy approach: adding one attack sample at a time
– Bridge-based heuristics: local maxima are found in between the closest points of adjacent clusters
![Page 10: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/10.jpg)
Poisoning Single-Linkage Clustering
• Underlying idea: bridging the closest clusters
– Given K clusters, K-1 candidate attack points
[Figure: candidate attack points between neighboring clusters]
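One way to obtain K-1 candidates from K clusters is sketched below in Python. Two choices here are assumptions rather than details given on the slides: the K-1 bridges are taken to be the successive single-linkage merges between the clusters, and each candidate is placed at the midpoint of the closest pair of points realizing that merge. The function name and data are hypothetical.

```python
import math

def bridge_candidates(points, labels):
    """Return K-1 candidate attack points, one per single-linkage merge of the K clusters."""
    parent = {c: c for c in set(labels)}

    def find(c):
        while parent[c] != c:
            c = parent[c]
        return c

    # All pairs of points lying in different clusters, closest pairs first.
    pairs = sorted(
        (math.dist(p, q), i, j)
        for i, p in enumerate(points)
        for j, q in enumerate(points)
        if i < j and labels[i] != labels[j]
    )
    candidates = []
    for _, i, j in pairs:
        ra, rb = find(labels[i]), find(labels[j])
        if ra != rb:  # this pair realizes the next merge between cluster groups
            parent[ra] = rb
            candidates.append(tuple((a + b) / 2 for a, b in zip(points[i], points[j])))
    return candidates

points = [(0, 0), (1, 0), (4, 0), (5, 0), (4, 6)]
labels = [0, 0, 1, 1, 2]
print(bridge_candidates(points, labels))
```

With three clusters the sketch returns two candidates: one between the first two clusters and one between the second and third.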
![Page 11: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/11.jpg)
Poisoning Single-Linkage Clustering
1. Bridge (Best): evaluates Y’(a) for each candidate attack, retaining the best one
– Clustering is run for each candidate attack point
2. Bridge (Hard): estimates Y’(a) assuming that each candidate will split the corresponding cluster, potentially merging it with a fragment of the closest cluster
– It does not require running clustering to find the best attack point
3. Bridge (Soft): estimates Y’(a) as Bridge (Hard), but using a soft probabilistic estimate instead of 0/1 sample-to-cluster assignments
– It does not require running clustering to find the best attack point
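The Bridge (Best) variant can be sketched in a few lines of Python: rerun the clustering once per candidate and keep the candidate that changes the clustering of the original samples the most. This is a simplified illustration, not the authors' code. It assumes a fixed cutoff, scores only the labels of the original samples (one reading of Y' = f_D(D ∪ A)), and uses hypothetical function names and toy data.

```python
import math

def single_linkage(points, cutoff):
    """Single-linkage clustering cut at `cutoff`, computed as connected components."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            i = parent[i]
        return i

    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if math.dist(points[i], points[j]) <= cutoff:
                parent[find(i)] = find(j)
    return [find(i) for i in range(len(points))]

def clustering_distance(la, lb):
    """Frobenius norm of the difference between the two co-association matrices."""
    n = len(la)
    return math.sqrt(sum(((la[i] == la[j]) - (lb[i] == lb[j])) ** 2
                         for i in range(n) for j in range(n)))

def bridge_best(points, cutoff, candidates):
    """Evaluate every candidate by reclustering and return the most damaging one."""
    baseline = single_linkage(points, cutoff)
    best, best_score = None, -1.0
    for a in candidates:
        # Cluster the poisoned data, then keep only the labels of the original samples.
        poisoned = single_linkage(points + [a], cutoff)[:len(points)]
        score = clustering_distance(baseline, poisoned)
        if score > best_score:
            best, best_score = a, score
    return best, best_score

points = [(0, 0), (0.4, 0), (1.2, 0), (1.6, 0), (5, 0)]
candidates = [(0.8, 0.0), (3.3, 0.0)]
print(bridge_best(points, cutoff=0.5, candidates=candidates))
```

Calling `bridge_best` repeatedly, each time appending the selected point to the data, gives the greedy one-sample-at-a-time strategy. The Hard and Soft variants avoid the reclustering inside the loop by estimating Y'(a) instead.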
![Page 12: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/12.jpg)
Poisoning Single-Linkage Clustering
• The attack compromises the initial clustering by forming heterogeneous clusters
[Figure: two scatter plots of the same data: clustering on untainted data, and clustering after adding 20 attack samples]
![Page 13: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/13.jpg)
Malheur Behavioral Malware Clustering
• Malware executed in a sandbox (e.g., virtual machine)
– Monitoring of program behavior (instructions, system calls, etc.)
• Embedding of malware behavior in feature space
– Each feature denotes presence / absence of a given instruction
– Each vector is normalized to unit Euclidean norm
• Clustering using single-linkage (or other linkage variants)
[Figure: sandbox report → MIST (level 1) → feature space. Sandbox report: Filesystem: copy file 'a' to 'b', open file 'foo.txt'; Network: ping host '10.1.2.3', listen on port '31337'; Registry: set key 'reboot' to '1'. MIST encoding: 14 01 | 11 04 …, 02 02 | 02 02 …, 0d 01 | 03 0a …, 03 03 | 03 01 …, 03 0a | 11 04 …, labeled as instruction (opcode) and arguments. Instructions such as 14 01 and 02 02 become features.]
(1) K. Rieck et al. Automatic analysis of malware behavior using machine learning. JCS 2011
![Page 14: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/14.jpg)
Poisoning Malheur
• Poisoning single-linkage hierarchical clustering
• Problem: how to create bridge points in this feature space?
– Binary-valued vectors normalized to unit Euclidean norm
• Additional constraint on the manipulation of malware samples
– Malware should be modified without affecting malicious functionality
– Adding instructions after malware program execution
– Feature values can be only incremented
Example: x1 = (1 1 0 0 0), x2 = (0 0 1 1 1)
[Figure: d(x, x1) and d(x, x2) plotted against the number of added features (0 to 3), with distances between 0 and 1.5; a bridge point x between x1 and x2 is obtained by adding features (e.g., instructions 14 01, 02 02)]
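The effect of adding features under unit-norm normalization can be worked through numerically. The Python sketch below starts from x1, switches on the features of x2 one at a time (features are never removed), and reports d(x, x1) and d(x, x2) after each step. Starting from x1 and the order in which features are added are assumptions made for illustration; the function names are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit Euclidean norm."""
    norm = math.sqrt(sum(a * a for a in v))
    return [a / norm for a in v]

def distance(u, v):
    """Euclidean distance between the normalized versions of two binary vectors."""
    return math.dist(normalize(u), normalize(v))

def bridge_path(source, target):
    """Add the target's missing features to the source one by one.

    Returns a list of (number of added features, d(x, source), d(x, target)).
    """
    x = list(source)
    path = [(0, distance(x, source), distance(x, target))]
    missing = [i for i, (s, t) in enumerate(zip(source, target)) if t and not s]
    for k, i in enumerate(missing, start=1):
        x[i] = 1  # features can only be incremented
        path.append((k, distance(x, source), distance(x, target)))
    return path

x1 = [1, 1, 0, 0, 0]
x2 = [0, 0, 1, 1, 1]
for k, d1, d2 in bridge_path(x1, x2):
    print(k, round(d1, 3), round(d2, 3))
```

For these two vectors, the distance to x2 falls from about 1.41 (orthogonal unit vectors) to about 0.67 after three added features, while the distance to x1 grows from 0 to about 0.86. The point moves toward x2 only at the cost of drifting away from x1, which is why bridging is harder here than in an unconstrained space.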
![Page 15: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/15.jpg)
Experimental Setup and Datasets
• Setup
– Data split into two portions of equal size T and S
– T used for extracting instructions and setting the cutoff threshold
– S used for performance evaluation
– F-measure: agreement between clusters and malware families
• Malheur data
– 3131 malware samples collected in 2009 (publicly available)
– 85 instructions / features (on average)
– Cutoff distance (max. F-measure on T): 0.49 (on average)
• Recent Malware data
– 657 malware samples from most prominent families in 2013
– 78 instructions / features (on average)
– Cutoff distance (max. F-measure on T): 0.63 (on average)
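The slides do not spell out how the F-measure is computed. The Python sketch below assumes a cluster-level definition in which precision rewards clusters dominated by a single family and recall rewards families concentrated in a single cluster; treat this definition, the function name, and the toy labels as assumptions rather than the exact evaluation protocol.

```python
from collections import Counter

def f_measure(clusters, families):
    """Agreement between cluster labels and family labels (assumed definition)."""
    n = len(clusters)
    by_cluster, by_family = {}, {}
    for c, y in zip(clusters, families):
        by_cluster.setdefault(c, Counter())[y] += 1
        by_family.setdefault(y, Counter())[c] += 1
    # Precision: for each cluster, count the samples of its most frequent family.
    precision = sum(max(cnt.values()) for cnt in by_cluster.values()) / n
    # Recall: for each family, count its samples in the cluster that holds most of them.
    recall = sum(max(cnt.values()) for cnt in by_family.values()) / n
    return 2 * precision * recall / (precision + recall)

clusters = [0, 0, 0, 1, 1, 2]
families = ["a", "a", "b", "b", "b", "c"]
print(f_measure(clusters, families))
```

Under this definition, fusing clusters of different families lowers precision, which matches the kind of damage the bridging attacks aim for.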
![Page 16: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/16.jpg)
Experimental Results (Malheur data)
• Attack strategies
– Bridge (Best/Hard/Soft), Random, Random (Best), F-measure (Best)
• Results for Malheur data
– Random-based attacks are not effective (high-dimensional space)
– Bridging is effective / clusters are fused together (cutoff threshold is fixed)
– F-measure decreases while maximizing distance between clusterings
[Figure: objective function and F-measure versus fraction of poisoning attacks (0% to 20%) for Random, Random (Best), Bridge (Best), Bridge (Soft), Bridge (Hard), and F-measure (Best)]
![Page 17: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/17.jpg)
Experimental Results (Recent Malware data)
• Attack strategies
– Bridge (Best/Hard/Soft), Random, Random (Best), F-measure (Best)
• Results for Recent Malware data
– Random-based attacks are not effective (high-dimensional space)
– Bridging is effective / clusters are fused together (cutoff threshold is fixed)
– F-measure decreases while maximizing distance between clusterings
[Figure: objective function and F-measure versus fraction of poisoning attacks (0% to 20%) for Random, Random (Best), Bridge (Best), Bridge (Soft), Bridge (Hard), and F-measure (Best)]
![Page 18: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/18.jpg)
Conclusions and Future Work
• Poisoning attacks can subvert behavioral malware clustering
• Future work
– Extensions to other clustering algorithms, common attack strategy
  • e.g., black-box optimization with suitable heuristics
– Attacks with limited knowledge of the data / clustering algorithm
[Figure labels: Secure clustering algorithms; Attacks against clustering]
![Page 19: Battista Biggio @ AISec 2014 - Poisoning Behavioral Malware Clustering](https://reader033.fdocuments.in/reader033/viewer/2022060122/559511561a28ab06108b4790/html5/thumbnails/19.jpg)
Any questions?
Thanks for your attention!