Using Data Science Techniques to Detect Malicious Behavior
![Page 1: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/1.jpg)
Using Data Science Techniques to Help Detect Malicious Behavior
Phil Roth, Data Scientist
![Page 2: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/2.jpg)
• An introduction to key data science concepts
• Challenges in applying those concepts to security data
• Why focusing on aiding a human security analyst can lead to better machine learning tools
• How Endgame’s enterprise product benefits from that focus
Key Takeaways
![Page 3: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/3.jpg)
Data Science Process
![Page 4: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/4.jpg)
Gather Raw Data
Process and Clean Data
Explore the Data
Apply a Model
Communicate the Result
Data Science Process
![Page 5: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/5.jpg)
Data Science Process
Gather Raw Data
Data can come from many disparate sources.
Process and Clean Data
Raw data must be cleaned and features extracted.
Explore the Data
Finding relationships in the data provides hints about what features and models will be useful.
![Page 6: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/6.jpg)
Data Science Process
Apply a Model
Models exploit features and relationships in the data to make a statement.
Communicate the Result
The output of a data product is useless without effective and actionable communication.
![Page 7: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/7.jpg)
Introduction to Machine Learning Models
![Page 8: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/8.jpg)
In supervised learning, input data is labeled. An algorithm attempts to reproduce those labels on new unlabeled data.
| input data | label |
|---|---|
| -3 -4 1 0 | 1 |
| -4 -3 1 1 | 1 |
| -4 -4 0 0 | 1 |
| +4 +3 1 0 | 0 |
| +3 +4 0 1 | 0 |
| +3 +3 1 0 | 0 |

| new data | label |
|---|---|
| -3 -4 1 1 | ??? |
Supervised learning
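As a minimal sketch of the idea on this slide, the toy rows above can be fed to a very simple supervised learner. The slide does not name an algorithm here, so the 1-nearest-neighbor rule below is an assumption chosen for brevity, and the names `predict_nearest`, `train_X`, and `train_y` are illustrative.

```python
import math

# Labeled rows copied from the slide: four features per row, one label per row.
train_X = [
    (-3, -4, 1, 0),
    (-4, -3, 1, 1),
    (-4, -4, 0, 0),
    (+4, +3, 1, 0),
    (+3, +4, 0, 1),
    (+3, +3, 1, 0),
]
train_y = [1, 1, 1, 0, 0, 0]


def predict_nearest(x, X, y):
    """Return the label of the training row closest to x (Euclidean distance).

    1-nearest-neighbor is an assumption for this sketch, not the slide's method.
    """
    nearest = min(range(len(X)), key=lambda i: math.dist(x, X[i]))
    return y[nearest]


# The unlabeled row from the slide.
new_row = (-3, -4, 1, 1)
predicted = predict_nearest(new_row, train_X, train_y)
print(predicted)  # prints 1
```

The new row sits closest to the first labeled row, so it inherits label 1.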
![Page 9: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/9.jpg)
A Support Vector Machine [1] finds the best (maximum-margin) separating boundary between two classes in feature space.
Supervised learning example
1 http://scikit-learn.org/stable/modules/svm.html
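The following is a rough sketch of a linear SVM, trained by plain subgradient descent on the regularized hinge loss, using the toy rows from the supervised learning slide. It only approximates the maximum-margin boundary; the function names and the values of `lam`, `lr`, and `epochs` are assumptions picked for this example, and a real project would more likely use a library such as the scikit-learn module cited above.

```python
# Toy rows from the supervised-learning slide; labels 1/0 are mapped to +1/-1.
X = [
    (-3, -4, 1, 0),
    (-4, -3, 1, 1),
    (-4, -4, 0, 0),
    (+4, +3, 1, 0),
    (+3, +4, 0, 1),
    (+3, +3, 1, 0),
]
y = [+1, +1, +1, -1, -1, -1]


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean(hinge loss) by full-batch subgradient descent.

    lam, lr, and epochs are illustrative values, not tuned settings.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w = [lam * wj for wj in w]  # gradient of the regularizer
        grad_b = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # only points inside the margin contribute
                for j in range(d):
                    grad_w[j] -= yi * xi[j] / n
                grad_b -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the boundary x falls on."""
    score = sum(wj * xj for wj, xj in zip(w, x)) + b
    return 1 if score >= 0 else -1


w, b = train_linear_svm(X, y)
print(predict(w, b, (-3, -4, 1, 1)))  # the slide's unlabeled row
```

On this data the first two features carry almost all of the learned weight, which matches the visual intuition of two well-separated groups.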
![Page 10: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/10.jpg)
In unsupervised learning, input data is unlabeled. An algorithm attempts to find hidden structure in that data.
| input data | group |
|---|---|
| -3 -4 1 0 | group 1 |
| -4 -3 1 1 | group 1 |
| -4 -4 0 0 | group 1 |
| +4 +3 1 0 | group 2 |
| +3 +4 0 1 | group 2 |
| +3 +3 1 0 | group 2 |
Unsupervised learning
![Page 11: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/11.jpg)
Unsupervised learning example
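k-means clustering iteratively improves the location of cluster centers: each step assigns every point to its nearest center, then moves each center to the mean of the points assigned to it.
[Figure: cluster centers at step 1, step 2, etc.]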
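The iteration described on this slide can be sketched in a few lines. This is a minimal version run on the unlabeled rows from the unsupervised learning slide; the choice of starting centers (the first and last rows) and the name `kmeans` are assumptions for the example, and real implementations usually pick starting centers more carefully.

```python
import math

# Unlabeled rows from the unsupervised-learning slide.
points = [
    (-3, -4, 1, 0),
    (-4, -3, 1, 1),
    (-4, -4, 0, 0),
    (+4, +3, 1, 0),
    (+3, +4, 0, 1),
    (+3, +3, 1, 0),
]


def kmeans(points, centers, max_iter=100):
    """Alternate between assigning points and re-centering until nothing moves."""
    centers = [tuple(c) for c in centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:
                mean = tuple(sum(col) / len(members) for col in zip(*members))
                new_centers.append(mean)
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center
        if new_centers == centers:
            break  # converged
        centers = new_centers
    return labels, centers


# Starting centers are an arbitrary choice for this sketch.
labels, centers = kmeans(points, centers=[points[0], points[-1]])
print(labels)  # prints [0, 0, 0, 1, 1, 1]
```

The two recovered clusters correspond to group 1 and group 2 on the previous slide.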
![Page 12: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/12.jpg)
Challenges with Security Data
![Page 13: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/13.jpg)
Recommendation Systems
Character Recognition: MNIST Database of Handwritten Digits
Security lacks open datasets
![Page 14: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/14.jpg)
The DARPA Intrusion Detection Evaluation dataset is 15 years old and simulated, and techniques trained on it were never actionable.
Sharing data in the security industry remains a challenge, one that even President Obama is attempting to address.
Security lacks open datasets
![Page 15: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/15.jpg)
Labeling is an expensive process that requires expertise.
Are these products related?
vs.
Is this binary malicious?
Is this traffic an intrusion?
Security lacks easy labels
![Page 16: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/16.jpg)
False positives lead to expensive analyst investigations and alert fatigue.
False negatives get CEOs fired.
Security lacks tolerance for errors
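A short worked calculation shows why false positives become expensive when malicious events are rare. Every number below is hypothetical and chosen only for illustration; none of it comes from the talk, and `alert_stats` is an illustrative name.

```python
def alert_stats(n_events, n_malicious, tpr, fpr):
    """Return (true alerts, false alerts, precision) for a detector.

    tpr is the fraction of malicious events that raise an alert;
    fpr is the fraction of benign events that raise an alert.
    """
    n_benign = n_events - n_malicious
    true_alerts = n_malicious * tpr
    false_alerts = n_benign * fpr
    precision = true_alerts / (true_alerts + false_alerts)
    return true_alerts, false_alerts, precision


# Hypothetical day: 1,000,000 events, 100 of them malicious,
# with a detector that catches 99% and misfires on 0.1% of benign events.
true_alerts, false_alerts, precision = alert_stats(1_000_000, 100, 0.99, 0.001)
print(f"true alerts: {true_alerts:.0f}")
print(f"false alerts: {false_alerts:.0f}")
print(f"precision: {precision:.1%}")
```

Under these assumed rates, roughly ten alerts are false for every real one, so only about 9% of what an analyst investigates is actually malicious.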
![Page 17: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/17.jpg)
Machine Learning in security could benefit from focusing on “human in the loop” products over “the algorithm does it all” products.
Chess Analogy
1997: IBM’s supercomputer Deep Blue vs. Garry Kasparov
2005: Team ZackS vs. multiple Grandmasters in Freestyle Chess [2]
Human/Machine teams retained an edge over machines for decades
2 Cowen, Tyler. Average Is Over. Chapter 5. 2013
![Page 18: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/18.jpg)
Using the Human/Machine Model
![Page 19: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/19.jpg)
Cloud-deployed virtual machines are clustered based on their behavior. The results are communicated to analysts and used to improve the detection of malicious behavior.
Endgame Implementation
![Page 20: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/20.jpg)
Package, process, and user information is collected from the machines.
DBSCAN, a clustering algorithm, groups the machines based on that information.
Endgame implementation
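To make the clustering step concrete, here is a small self-contained DBSCAN sketch. It is not Endgame's implementation: the machine names, the feature strings, the Jaccard distance, and the `eps` and `min_samples` values are all assumptions made up for this illustration. The only details taken from the slide are that package, process, and user information feed a DBSCAN clustering of machines.

```python
def jaccard_distance(a, b):
    """1 - |intersection| / |union| for two sets of observed features."""
    union = a | b
    return 0.0 if not union else 1.0 - len(a & b) / len(union)


def dbscan(items, eps, min_samples, distance):
    """Label each item with a cluster id, or -1 for noise."""
    NOISE = -1
    labels = [None] * len(items)

    def neighbors(i):
        # Includes i itself, so min_samples counts the point being tested.
        return [j for j in range(len(items))
                if distance(items[i], items[j]) <= eps]

    cluster = -1
    for i in range(len(items)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_samples:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins but does not expand
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nbrs = neighbors(j)
            if len(nbrs) >= min_samples:  # core point: keep expanding
                queue.extend(nbrs)
    return labels


# Hypothetical machines described by package, process, and user features.
web = {"pkg:nginx", "pkg:openssl", "proc:nginx", "user:www-data"}
db = {"pkg:postgresql", "pkg:openssl", "proc:postgres", "user:postgres"}
machines = {
    "web-1": web,
    "web-2": web | {"pkg:curl"},
    "web-3": web,
    "db-1": db,
    "db-2": db,
    "db-3": db | {"pkg:curl"},
    "odd-1": {"pkg:nginx", "proc:nc", "proc:unknown_miner", "user:root"},
}

names = list(machines)
labels = dbscan([machines[n] for n in names], eps=0.3, min_samples=3,
                distance=jaccard_distance)
for name, label in zip(names, labels):
    print(name, label)
```

In this made-up data the web and database machines form two clusters, and `odd-1` is labeled noise (-1). A machine that fits no group is the kind of result that could be surfaced to an analyst for review.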
![Page 21: Using Data Science Techniques to Detect Malicious Behavior](https://reader038.fdocuments.in/reader038/viewer/2022110313/55c55f79bb61ebac1b8b462e/html5/thumbnails/21.jpg)
• An introduction to key data science concepts
• Existing challenges in applying those concepts to security data
• Why focusing on aiding a human security analyst can lead to better machine learning tools
• How Endgame’s enterprise product benefits from that focus
Key Takeaways