opml shooting the moving target - USENIX · Shooting the moving target : machine learning in...
1
Shooting the moving target: machine learning in cybersecurity
Ankit Arun*, Ignacio Arnaldo
MIT CSAIL top 16 2016
2
Outline
1. Machine Learning in Cybersecurity: problem statement and state-of-the-art
2. Machine Learning Platform
3. Current state of the system
4. Ongoing efforts
3
Vast number of data sources and attacks
100+ log types
1000+ security attacks reported in 2018
~24k malicious mobile apps are blocked every day
600%: IoT attacks in 2017
350% annually: ransomware attacks
303: targeted attacks the USA faced between 2015 and 2017
The Need for AI in InfoSec: Data Problem
86% are investigated successfully
80% of attacks go undetected
By machines (aka logs and network systems) during or after the attack
By human analysts, after an attack has been known to occur
4
Detection approaches
5
[Figure: threat intel and signatures, rules, anomaly detection, and supervised models compared by coverage, false positives, and dwell time; a marker on the figure reads "I'm Here"]
6
Challenges…
Cybersecurity | Computer Vision
Data property: availability, variety, labeled, static / dynamic
More Expert Knowledge Required
Siloed with Barriers
Adversarial and Dynamic
7
State-of-the-art ML in Cybersecurity
[1] M. Darling, G. Heileman, G. Gressel, A. Ashok, and P. Poornachandran, “A lexical approach for classifying malicious urls”
[2] M. S. I. Mamun, M. A. Rathore, A. H. Lashkari, N. Stakhanova, and A. A. Ghorbani, “Detecting malicious urls using lexical analysis”
[3] J. Woodbridge, H. S. Anderson, A. Ahuja, and D. Grant, “Predicting domain generation algorithms with long short-term memory networks”
[4] H. S. Anderson, J. Woodbridge, and B. Filar, “DeepDGA: Adversarially-Tuned Domain Generation and Detection,”
[5] J. Saxe et al., “eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys”
Academia
● 2015-2016: lexical analysis to detect spam, malware hosting and phishing URLs [1][2]
● 2016: LSTMs for DGA detection [3][4]
● 2017: Char-level CNNs for URL classification [5]
Industry
● Web traffic open by default
● Blacklists based on threat intelligence
● ML is rarely used for live detection
Risks preventing ML adoption:
● Are the approaches still valid or are they outdated?
● How do the models perform in real world scenarios?
● Will the models work in my environment?
Logs
Data Pipelines
Labels
Models
Continuous Improvement Process
• Adding/Changing more data
• Changing the entity to model
• Adding more attack examples
• Changing modeling strategy
8
9
Machine Learning Platform
10
The cloud repositories
Golden Data Set and Models
Threat Researchers
ML Engineers
Data Scientists
11
The cloud repositories
Horizontal Brute Force Attack
environment_1
raw_logs
normalized_logs
features
label.csv
labeled_feature_matrix
models
Brute_force_attack_classifier_v1.1
Brute_force_attack_outlier_v1.1
12
Configurable data pipelines
fields {
  name: 'protocol'
  display_name: 'Protocol'
  index: 'proto'
  data_type: string
}
Log Parsing Engine
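A minimal sketch of how a `fields` entry like the one above might drive parsing. The `FieldSpec` class, the `normalize` function, and the `CASTS` table are hypothetical names for illustration, and treating `index` as the key of the value in the raw record is an assumption; the slides do not describe the engine's internals.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical in-memory form of one `fields { ... }` entry.
@dataclass(frozen=True)
class FieldSpec:
    name: str          # normalized field name, e.g. 'protocol'
    display_name: str  # human-readable label, e.g. 'Protocol'
    index: str         # assumed: key of the value in the raw log record
    data_type: str     # 'string' or 'int' in this sketch


# Assumed converters; the engine's real type list is not given on the slide.
CASTS = {"string": str, "int": int}


def normalize(raw: dict, specs: list[FieldSpec]) -> dict:
    """Map a raw log record to normalized field names, casting each value."""
    out = {}
    for spec in specs:
        if spec.index in raw:  # fields absent from the record are skipped
            out[spec.name] = CASTS[spec.data_type](raw[spec.index])
    return out


PROTOCOL = FieldSpec("protocol", "Protocol", "proto", "string")
record = normalize({"proto": "tcp", "bytes": "512"}, [PROTOCOL])
print(record)  # {'protocol': 'tcp'}
```

Keeping the mapping in configuration means a new log source needs a new list of specs rather than new parsing code.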
13
Configurable data pipelines
feature {
  name: 'distinct_protocol'
  display_name: 'Distinct Protocol'
  definition: 'count_distinct(protocol)'
  data_type: int
}
Feature Compute Engine
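As a rough illustration of what `count_distinct(protocol)` could compute, the sketch below counts distinct protocol values per entity. The `count_distinct` function signature, the choice of `src_ip` as the entity, and the sample records are assumptions made for the example; the slides do not say how the engine groups records.

```python
from collections import defaultdict


def count_distinct(records, entity_key, field):
    """Count distinct values of `field` per entity (here, per source IP)."""
    seen = defaultdict(set)
    for rec in records:
        if field in rec:
            seen[rec[entity_key]].add(rec[field])
    return {entity: len(values) for entity, values in seen.items()}


# Hypothetical normalized log records.
logs = [
    {"src_ip": "10.0.0.1", "protocol": "tcp"},
    {"src_ip": "10.0.0.1", "protocol": "udp"},
    {"src_ip": "10.0.0.1", "protocol": "tcp"},
    {"src_ip": "10.0.0.2", "protocol": "icmp"},
]
distinct_protocol = count_distinct(logs, "src_ip", "protocol")
print(distinct_protocol)  # {'10.0.0.1': 2, '10.0.0.2': 1}
```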
14
Model Versioning
Brute_force_attack_Classifier_V2.3
Major Version
Minor Version
[Table: Brute Force Attack Classifier, param versions v1, v2, v3 against months from Apr 2019 back to Sep 2018, with per-month version numbers between 1 and 5]
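The naming scheme on this slide (a model name followed by `V<major>.<minor>`) can be handled with a small parser. This is a sketch only: `ModelVersion`, `parse_model_id`, and `latest` are hypothetical helpers, and the regular expression assumes every identifier ends in `_v<major>.<minor>` or `_V<major>.<minor>`.

```python
import re
from typing import NamedTuple


class ModelVersion(NamedTuple):
    name: str
    major: int
    minor: int


# Assumed identifier shape: <name>_v<major>.<minor>, with 'v' or 'V'.
_PATTERN = re.compile(r"^(?P<name>.+)_[vV](?P<major>\d+)\.(?P<minor>\d+)$")


def parse_model_id(model_id: str) -> ModelVersion:
    """Split a model identifier into its name, major and minor version."""
    m = _PATTERN.match(model_id)
    if m is None:
        raise ValueError(f"not a versioned model id: {model_id!r}")
    return ModelVersion(m["name"], int(m["major"]), int(m["minor"]))


def latest(model_ids):
    """Return the id with the highest (major, minor) pair, compared numerically."""
    return max(model_ids, key=lambda mid: parse_model_id(mid)[1:])


print(parse_model_id("Brute_force_attack_Classifier_V2.3"))
```

Comparing versions as integer pairs rather than strings keeps `v1.10` ahead of `v1.2`.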
Current state of the system
Ping Sweep
Port Scan
DNS Reconnaissance
Zone Transfer
Social Eng Domains
Phishing Domains
Redirects
DLL Hijack
Task Sched
Mimikatz
Winroot
Domain Enumeration
Brute Force Login
Overpass the Hash
Skeleton Key Attack
Kerberoasting
DC Replication
Golden Ticket Attack
SSO Login Attack
Malware Backdoor
DGAs
TOR Connections
ICMP Tunneling
HTTP Tunneling
Twittor
SSH Tunneling
DNS Tunneling
DNS Beaconing
ICMP Exfiltration
HTTP Exfiltration
Gmail Exfiltration
Twitter Exfiltration
NTP Exfiltration
SMTP Exfiltration
DNS Exfiltration
Cloud Takeover
Reconnaissance
Delivery
Privilege Escalation
Lateral Movement
Command and Control
Exfiltration
Fwd Proxy Logs / NGFW
AD Logs
EDR Logs
DNS Logs
App Logs
Network
Proxy Logs
Zscaler
BlueCoat
Squid
Bro HTTP
Intersafe
FW Logs
PANW
Cisco ASA
Fortigate
NetScreen
Bro Conn
Flow Logs
Netflow
VPC Flow
IBM QFlow
DNS Logs
Windows DNS
Suricata
Bro DNS
Authentication
Auth/Auth
Active Directory
Okta
End Point
EDR Logs
Carbon Black
osQuery
Applications
App Logs
Apache
Box
OneDrive
Audit Trail
AWS CloudTrail
Contextual
Contextual
DHCP
Tenable
STIX
Open IoC
Alexa Top 1M
31 Data Sources
27 Golden Datasets
70 Models
1000 Model Deployments
Weekly Model Updates
15
Ongoing efforts
• Automating Feature Computation
• Data Shift Detection
• Automating Model Review/Update Process
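The slides do not say how data shift is detected. One common option, shown here purely as an illustrative sketch, is to compare a feature's distribution in the training data against recent data with the two-sample Kolmogorov-Smirnov statistic. The function names and the 0.3 threshold are assumptions, not values from the talk.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref, cur = sorted(reference), sorted(current)
    return max(
        abs(bisect_right(ref, x) / len(ref) - bisect_right(cur, x) / len(cur))
        for x in ref + cur
    )


def has_shifted(reference, current, threshold=0.3):
    """Flag a feature as shifted when the KS statistic exceeds the threshold.

    The default threshold is an arbitrary placeholder for this sketch.
    """
    return ks_statistic(reference, current) > threshold


# Hypothetical values of one feature at training time and in two later weeks.
training = [1, 2, 3, 4, 5]
same_week = [1, 2, 3, 4, 5]
later_week = [6, 7, 8, 9, 10]
print(has_shifted(training, same_week), has_shifted(training, later_week))
```

A check like this could run per feature after each data refresh and feed the model review process named in the list above.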
16
References
• https://www.ptsecurity.com/ww-en/analytics/cybersecurity-threatscape-2018-q3/
• https://www.checkpoint.com/downloads/product-related/report/2018-security-report.pdf
• https://www.varonis.com/blog/cybersecurity-statistics/
17
18
Questions?