opml shooting the moving target - USENIX · Shooting the moving target : machine learning in...


Shooting the moving target: machine learning in cybersecurity

Ankit Arun*, Ignacio Arnaldo

MIT CSAIL top 16 2016


Outline

1. Machine Learning in Cybersecurity: problem statement and state-of-the-art

2. Machine Learning Platform

3. Current state of the system

4. Ongoing efforts


Vast number of data sources and attacks

• 100+ log types

• 1000+ security attacks reported in 2018

• ~24k malicious mobile apps are blocked every day

• IoT attacks: 600% in 2017

• Ransomware attacks: 350% annually

• 303 targeted attacks faced by the USA between 2015 and 2017

The Need for AI in InfoSec: Data Problem

86% are investigated successfully

80% of attacks go undetected

• By machines (aka logs and network systems) during or after the attack

• By human analysts, after an attack has been known to occur


Detection approaches

[Figure: detection approaches (threat intel and signatures, rules, anomaly detection, supervised models) compared on coverage, false positives, and dwell time, with an "I'm Here" marker.]


Challenges…

[Table: data properties of Cybersecurity vs. Computer Vision, compared on availability, variety, labeled, and static / dynamic. Recovered cell text: "More expert knowledge required"; "Siloed with barriers"; "Adversarial and dynamic".]


State-of-the-art ML in Cybersecurity

[1] M. Darling, G. Heileman, G. Gressel, A. Ashok, and P. Poornachandran, “A lexical approach for classifying malicious urls”

[2] M. S. I. Mamun, M. A. Rathore, A. H. Lashkari, N. Stakhanova, and A. A. Ghorbani, “Detecting malicious urls using lexical analysis”

[3] J. Woodbridge, H. S. Anderson, A. Ahuja, and D. Grant, “Predicting domain generation algorithms with long short-term memory networks”

[4] H. S. Anderson, J. Woodbridge, and B. Filar, “DeepDGA: Adversarially-Tuned Domain Generation and Detection,”

[5] J. Saxe et al., “eXpose: A Character-Level Convolutional Neural Network with Embeddings For Detecting Malicious URLs, File Paths and Registry Keys”

Academia

● 2015-2016: lexical analysis to detect spam, malware hosting and phishing URLs [1][2]
● 2016: LSTMs for DGA detection [3][4]
● 2017: Char-level CNNs for URL classification [5]

Industry

● Web traffic open by default
● Blacklists based on threat intelligence
● ML is rarely used for live detection

Risks preventing ML adoption:

● Are the approaches still valid or are they outdated?
● How do the models perform in real-world scenarios?
● Will the models work in my environment?
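The lexical approaches cited in [1][2] classify URLs from features of the URL string itself. The sketch below is only an illustration of that idea, not the feature set used in those papers or in the platform described here; the function names and the choice of features are assumptions.

```python
import math
from collections import Counter
from urllib.parse import urlsplit


def shannon_entropy(text):
    """Shannon entropy (bits per character) of a string; 0.0 for empty input."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def lexical_features(url):
    """Return a few illustrative (hypothetical) lexical features of a URL."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "num_dots_in_host": host.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_path_segments": len([s for s in parts.path.split("/") if s]),
        "host_entropy": shannon_entropy(host),
    }
```

A feature dictionary like this could feed any standard classifier; which features actually help is an empirical question for each environment.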

Continuous Improvement Process

Logs → Data Pipelines → Labels → Models

• Adding/Changing more data
• Changing the entity to model
• Adding more attack examples
• Changing modeling strategy


Machine Learning Platform


The cloud repositories

Golden Data Set and Models

Threat Researchers

ML Engineers

Data Scientists


The cloud repositories

Horizontal Brute Force Attack
    environment_1
        raw_logs
        normalized_logs
        features
    label.csv
    labeled_feature_matrix
    models
        Brute_force_attack_classifier_v1.1
        Brute_force_attack_outlier_v1.1


Configurable data pipelines

fields {
    name: 'protocol'
    display_name: 'Protocol'
    index: 'proto'
    data_type: string
}

Log Parsing Engine
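To make the idea of a configuration-driven parser concrete, here is a minimal sketch. It assumes raw logs arrive as space-separated `key=value` tokens, which the slides do not state; the second field entry, the `CASTERS` table, and `parse_line` are hypothetical names introduced only for illustration.

```python
# Field definitions mirroring the `fields` entry on the slide.
FIELDS = [
    {"name": "protocol", "display_name": "Protocol",
     "index": "proto", "data_type": "string"},
    # Hypothetical second field, added only to show type casting.
    {"name": "bytes_sent", "display_name": "Bytes Sent",
     "index": "bytes", "data_type": "int"},
]

# Map config data types to Python constructors (assumed set of types).
CASTERS = {"string": str, "int": int, "float": float}


def parse_line(line, fields=FIELDS):
    """Normalize one raw log line into a record keyed by configured names."""
    # Assumption: the raw log is made of space-separated key=value tokens.
    raw = dict(token.split("=", 1) for token in line.split() if "=" in token)
    record = {}
    for field in fields:
        value = raw.get(field["index"])
        cast = CASTERS[field["data_type"]]
        record[field["name"]] = None if value is None else cast(value)
    return record
```

With this shape, supporting a new log type means adding field definitions rather than writing a new parser.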


Configurable data pipelines

feature {
    name: 'distinct_protocol'
    display_name: 'Distinct Protocol'
    definition: 'count_distinct(protocol)'
    data_type: int
}

Feature Compute Engine
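A feature definition such as `count_distinct(protocol)` can be evaluated by a small interpreter over normalized records. The sketch below is one possible reading, not the platform's engine: the grouping key `src_ip`, the `AGGREGATORS` table, and `compute_feature` are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Feature definition mirroring the `feature` entry on the slide.
FEATURE = {
    "name": "distinct_protocol",
    "display_name": "Distinct Protocol",
    "definition": "count_distinct(protocol)",
    "data_type": "int",
}

# Supported aggregation functions (an assumed, minimal set).
AGGREGATORS = {
    "count_distinct": lambda values: len(set(values)),
    "count": lambda values: len(values),
}


def compute_feature(records, feature, entity_key):
    """Evaluate a `func(field)` definition per entity (e.g. per source IP)."""
    match = re.fullmatch(r"(\w+)\((\w+)\)", feature["definition"])
    if match is None:
        raise ValueError(f"unsupported definition: {feature['definition']!r}")
    func_name, field = match.groups()
    aggregate = AGGREGATORS[func_name]
    grouped = defaultdict(list)
    for record in records:
        grouped[record[entity_key]].append(record[field])
    return {entity: aggregate(values) for entity, values in grouped.items()}
```

For example, `compute_feature(records, FEATURE, "src_ip")` would return the number of distinct protocols seen per source IP, given records that carry both keys.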


Model Versioning

Brute_force_attack_Classifier_V2.3

Major Version: 2

Minor Version: 3

[Figure: Brute Force Attack Classifier version history by month, Sep 2018 to Apr 2019, showing param versions v1, v2, v3 with per-month version numbers.]
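The naming scheme above (model name, then major and minor version) lends itself to a small helper. This is a sketch under assumptions: the slides do not say when each part is incremented, so `bump` and its reset of the minor number to 0 on a major bump are hypothetical choices, as are all names below.

```python
import re
from typing import NamedTuple


class ModelVersion(NamedTuple):
    name: str
    major: int
    minor: int

    def __str__(self):
        return f"{self.name}_v{self.major}.{self.minor}"


# Matches names such as "Brute_force_attack_Classifier_V2.3".
_PATTERN = re.compile(r"(?P<name>.+)_[vV](?P<major>\d+)\.(?P<minor>\d+)")


def parse_model_name(text):
    """Split a versioned model name into name, major, and minor parts."""
    match = _PATTERN.fullmatch(text)
    if match is None:
        raise ValueError(f"not a versioned model name: {text!r}")
    return ModelVersion(match["name"], int(match["major"]), int(match["minor"]))


def bump(version, part):
    """Return the next version; which change triggers which bump is assumed."""
    if part == "major":
        return version._replace(major=version.major + 1, minor=0)
    if part == "minor":
        return version._replace(minor=version.minor + 1)
    raise ValueError(f"unknown part: {part!r}")
```

One plausible convention would bump the minor number on a routine retrain and the major number on a change of parameters or modeling strategy, but that mapping is not confirmed by the slides.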

Current state of the system

Ping Sweep

Port Scan

DNS Reconnaissance

Zone Transfer

Social Eng Domains

Phishing Domains

Redirects

DLL Hijack

Task Sched

Mimikatz

Winroot

Domain Enumeration

Brute Force Login

Overpass the Hash

Skeleton Key Attack

Kerberoasting

DC Replication

Golden Ticket Attack

SSO Login Attack

Malware Backdoor

DGAs

TOR Connections

ICMP Tunneling

HTTP Tunneling

Twittor

SSH Tunneling

DNS Tunneling

DNS Beaconing

ICMP Exfiltration

HTTP Exfiltration

Gmail Exfiltration

Twitter Exfiltration

NTP Exfiltration

SMTP Exfiltration

DNS Exfiltration

Cloud Takeover

Reconnaissance → Delivery → Privilege Escalation → Lateral Movement → Command and Control → Exfiltration

Fwd Proxy Logs / NGFW

AD Logs

EDR Logs

DNS Logs

App Logs

Network

• Proxy Logs: Zscaler, BlueCoat, Squid, Bro HTTP, Intersafe
• FW Logs: PANW, Cisco ASA, Fortigate, NetScreen, Bro Conn
• Flow Logs: Netflow, VPC Flow, IBM QFlow
• DNS Logs: Windows DNS, Suricata, Bro DNS

Authentication

• Auth/Auth: Active Directory, Okta

End Point

• EDR Logs: Carbon Black, osQuery

Applications

• App Logs: Apache, Box, OneDrive
• Audit Trail: AWS CloudTrail

Contextual

• Contextual: DHCP, Tenable, STIX, Open IoC, Alexa Top 1M

• 31 Data Sources
• 27 Golden Datasets
• 70 Models
• 1000 Model Deployments
• Weekly Model Updates


Ongoing efforts

• Automating Feature Computation

• Data Shift Detection

• Automating Model Review/Update Process
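The slides name data shift detection as an ongoing effort without describing a method. As one common option, and purely as an illustration, the sketch below compares a feature's distribution at training time with its current distribution using the Population Stability Index (PSI). The function name, bin count, and smoothing constant are assumptions, and both inputs are expected to be non-empty.

```python
import math


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """PSI between two samples of one numeric feature (larger = more shift)."""
    lo, hi = min(reference), max(reference)
    # Equal-width bins over the reference range; fall back to 1.0 if constant.
    width = (hi - lo) / bins or 1.0

    def proportions(values):
        counts = [0] * bins
        for value in values:
            index = int((value - lo) / width)
            # Clamp values outside the reference range into the edge bins.
            counts[min(max(index, 0), bins - 1)] += 1
        total = len(values)
        # Floor empty bins at eps so the logarithm stays defined.
        return [max(count / total, eps) for count in counts]

    ref, cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))
```

A widely quoted rule of thumb treats PSI above roughly 0.25 as a significant shift, though any threshold would need tuning per feature and per environment.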


References

• https://www.ptsecurity.com/ww-en/analytics/cybersecurity-threatscape-2018-q3/

• https://www.checkpoint.com/downloads/product-related/report/2018-security-report.pdf

• https://www.varonis.com/blog/cybersecurity-statistics/


Questions?
