Anomaly Detection for Security
Cody Rioux - @codyrioux
Real-Time Analytics - Insight Engineering
Overview
● Real-Time Analytics
● Anomaly: Fast Incident Detection
  ○ Techniques
  ○ Case Study: Detecting Phishing
  ○ Challenges: Base Rate Fallacy
● Outlier: Identifying Rogue Agents
  ○ Clustering
  ○ Case Study: Cleaning Up Rogue Agents
● Recap
We are drowning in information but starved for knowledge. - John Naisbitt
Real-Time Analytics
● Part of Insight Engineering.
● Build systems that make intelligent decisions about our operational environment.
  ○ Make decisions in near real-time.
  ○ Automate actions in the production environment.
● Support operational availability and reliability.
Terminology
Outlier Anomaly
Case Study: Phishing
● You were just hired as the only security staff at a startup.
● The startup fell victim to a phishing attack last week.
  ○ They did not know it was happening while it was happening.
  ○ They did not know what to do about it.
● You’re tasked with solving this problem.
Incident Detection for Stats Geeks
Anomaly Detection
Unexpected value for a given generating mechanism.
Techniques
Basic
● Static thresholds
● Exponential Smoothing
● Three-sigma rule
Advanced
● Robust Anomaly Detection (RAD) - Netflix
● Kolmogorov-Smirnov
● Highest density interval (HDI)
● t-digest
● Linear models
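To make one of the advanced techniques concrete, here is a minimal sketch of the two-sample Kolmogorov-Smirnov statistic, used to compare a recent window of a metric against a reference window. The function name, the sample windows, and the choice of logins per minute as the metric are all assumptions for illustration; the talk does not specify how the test was applied.

```python
from bisect import bisect_right


def ks_statistic(reference, recent):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples (0 = identical, 1 = disjoint)."""
    ref = sorted(reference)
    rec = sorted(recent)
    largest_gap = 0.0
    # The gap between two step functions can only peak at an observed value.
    for x in ref + rec:
        cdf_ref = bisect_right(ref, x) / len(ref)
        cdf_rec = bisect_right(rec, x) / len(rec)
        largest_gap = max(largest_gap, abs(cdf_ref - cdf_rec))
    return largest_gap


# Hypothetical logins-per-minute windows (made-up numbers).
last_week = [20, 22, 19, 21, 23, 20, 22, 21]
quiet_hour = [21, 20, 22, 19, 23, 21, 20, 22]
attack_hour = [48, 52, 50, 47, 55, 51, 49, 53]

print(ks_statistic(last_week, quiet_hour))   # same distribution: small gap
print(ks_statistic(last_week, attack_hour))  # shifted distribution: large gap
```

A statistic near 1 says the recent window no longer looks like it came from the same generating mechanism as the reference. Turning that into an alert still needs a cutoff or a p-value, which is left out here.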
Techniques
Basic
● Static thresholds - Don’t play well with nonstationary signals.
● Exponential Smoothing - Black Swan days like Christmas and the Super Bowl cause issues.
● Three-sigma rule - Works (very) well, but only for signals drawn from a Gaussian.
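The three basic techniques are small enough to sketch in a few lines. The code below is illustrative only: the function names, the smoothing factor `alpha`, the fixed limit, and the failed-login counts are hypothetical and not taken from the talk.

```python
from statistics import mean, stdev


def static_threshold(value, limit):
    """Flag any value above a fixed, hand-picked limit."""
    return value > limit


def three_sigma(history, value):
    """Flag a value more than three standard deviations from the mean
    of the history. Only well-founded for roughly Gaussian signals."""
    return abs(value - mean(history)) > 3 * stdev(history)


def smoothed_forecasts(series, alpha=0.3):
    """Simple exponential smoothing. forecasts[i] is the prediction for
    series[i] made before seeing it; the first point forecasts itself."""
    level = series[0]
    forecasts = [level]
    for observed in series[1:]:
        forecasts.append(level)
        level = alpha * observed + (1 - alpha) * level
    return forecasts


# Hypothetical failed logins per minute.
history = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5]

print(static_threshold(30, limit=20))  # fixed limit exceeded
print(three_sigma(history, 30))        # far outside mean +/- 3 sigma
print(three_sigma(history, 6))         # within the usual spread

series = history + [30]
residual = series[-1] - smoothed_forecasts(series)[-1]
print(residual)                        # large miss against the forecast
```

The failure modes listed above show up directly in this code: `limit` never moves when the signal drifts, the smoothed level has no notion of holidays, and `three_sigma` assumes a symmetric, bell-shaped history.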
Show me the Money!
● No threshold configuration
● We require examples of normal behavior, not examples of anomalies
● Automatically adapt to moving signals
● Higher accuracy enables automatic reaction
● Ensemble (combination) of techniques eliminates some downsides
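One simple way to combine techniques is a majority vote across detectors. The sketch below is an assumption about how such an ensemble could look, not the method used by RAD or by the speaker; the detector names, cutoffs, and vote count are hypothetical.

```python
from statistics import mean, median, stdev


def three_sigma_vote(history, value):
    """Vote if the value is more than three standard deviations from the mean."""
    return abs(value - mean(history)) > 3 * stdev(history)


def mad_vote(history, value, cutoff=3.5):
    """Robust variant: distance from the median, measured in units of
    the median absolute deviation (MAD)."""
    center = median(history)
    mad = median(abs(x - center) for x in history)
    return mad > 0 and abs(value - center) / mad > cutoff


def smoothing_vote(history, value, alpha=0.3, tolerance=0.5):
    """Vote if the value misses the exponentially smoothed level of the
    history by more than `tolerance` (as a fraction of that level)."""
    level = history[0]
    for observed in history[1:]:
        level = alpha * observed + (1 - alpha) * level
    return abs(value - level) > tolerance * abs(level)


def ensemble_is_anomaly(history, value, detectors, min_votes=2):
    """Flag the value only when at least `min_votes` detectors agree."""
    votes = sum(1 for detect in detectors if detect(history, value))
    return votes >= min_votes


detectors = [three_sigma_vote, mad_vote, smoothing_vote]
history = [4, 6, 5, 7, 5, 6, 4, 5, 6, 5]  # hypothetical failed logins/minute

print(ensemble_is_anomaly(history, 30, detectors))  # all detectors agree
print(ensemble_is_anomaly(history, 6, detectors))   # no detector fires
```

Requiring agreement means a single detector's weakness, such as the Gaussian assumption behind the three-sigma rule, cannot raise an alert on its own.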
Base Rate Fallacy
Intrusion is comparatively rare, which affords you many opportunities to generate a false positive.
● 10,000 log entries
● 99% accuracy
● 0.01% intrusions
Result: 1 real incident, 100 false positives, and a 10% chance of a false negative.
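The arithmetic behind these numbers can be checked with a short calculation. The sketch assumes that "99% accuracy" means the detector is right 99% of the time on both benign and malicious entries (sensitivity = specificity = 0.99); the slide does not say so, and its 10% false-negative figure presumably rests on a different sensitivity that is not stated.

```python
def alert_breakdown(entries, intrusions, sensitivity, specificity):
    """Expected alert counts for a detector run over `entries` log lines
    of which `intrusions` are real attacks."""
    benign = entries - intrusions
    true_positives = intrusions * sensitivity
    false_positives = benign * (1 - specificity)
    # Probability that a given alert is a real intrusion.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# 10,000 entries, 0.01% of which are intrusions: exactly one real incident.
tp, fp, precision = alert_breakdown(
    entries=10_000, intrusions=1, sensitivity=0.99, specificity=0.99
)

print(f"expected false positives: {fp:.0f}")
print(f"chance an alert is real:  {precision:.1%}")
```

Under these assumptions roughly one alert in a hundred points at the real incident, which is why a detector that sounds accurate can still bury an analyst in false alarms.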
Case Study
So far we can automatically alert interested parties to the possibility of an intrusion.
Identifying Rogue Agents in a Production Environment
Outlier Detection
Rogue Agents?
● Identify brute force attempts on login systems
● Flag cheaters in online video games
● Identify participating IP addresses in a phishing scam
Case Study Revisited
You’ve devised an automated technique for identifying attacks. Now we require an autonomous system for remediating them.
Goal: identify accounts and IP Addresses that are not behaving like their peers.
Clustering
● DBSCAN
● K-Means
● Gaussian Mixture Models
Conceptually
● If a point belongs to a group it should be near lots of other points as measured by some distance function.
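The conceptual rule above can be sketched directly: count how many other points fall within some radius, and treat points with too few neighbors as outliers. This is only the density test that DBSCAN builds on, not a full DBSCAN implementation (there is no cluster expansion, and border points are not handled the way DBSCAN handles them). The function name, `radius`, `min_neighbors`, and the sample points are hypothetical.

```python
from math import dist


def find_outliers(points, radius, min_neighbors):
    """Return the points that have fewer than `min_neighbors` other
    points within Euclidean distance `radius`."""
    outliers = []
    for i, p in enumerate(points):
        neighbors = sum(
            1 for j, q in enumerate(points) if i != j and dist(p, q) <= radius
        )
        if neighbors < min_neighbors:
            outliers.append(p)
    return outliers


# Two tight groups and one point that belongs to neither.
points = [
    (1.0, 1.1), (1.2, 0.9), (0.9, 1.0), (1.1, 1.2),
    (5.0, 5.1), (5.2, 4.9), (4.9, 5.0), (5.1, 5.2),
    (9.0, 0.5),
]

print(find_outliers(points, radius=0.5, min_neighbors=2))
```

The distance function is the main design choice: Euclidean distance only makes sense once the features are on comparable scales.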
Case Study Revisited
● Let’s cluster accounts based on their login habits and initiate an automatic password reset and notification.
● Let’s cluster IP addresses based on their login habits and automatically ban them.
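As a rough sketch of the IP address case, the code below turns login events into per-IP features and flags IPs that have too few similar peers. Everything specific here is an assumption for illustration: the event format, the three features, the radius, and the sample addresses are hypothetical, and the neighbor-count test stands in for whichever clustering algorithm is actually used. The same shape would apply to accounts, with a password reset in place of a ban.

```python
from collections import defaultdict
from math import dist


def login_features(events):
    """Summarize (ip, account, success) events as one feature vector per
    IP: (attempts, distinct accounts tried, failed attempts)."""
    attempts = defaultdict(int)
    failures = defaultdict(int)
    accounts = defaultdict(set)
    for ip, account, success in events:
        attempts[ip] += 1
        failures[ip] += 0 if success else 1
        accounts[ip].add(account)
    return {ip: (attempts[ip], len(accounts[ip]), failures[ip]) for ip in attempts}


def rogue_ips(features, radius, min_neighbors):
    """IPs whose feature vector has fewer than `min_neighbors` peers
    within `radius`, i.e. IPs that do not behave like the others."""
    rogue = []
    for ip, vec in features.items():
        neighbors = sum(
            1 for other, v in features.items()
            if other != ip and dist(vec, v) <= radius
        )
        if neighbors < min_neighbors:
            rogue.append(ip)
    return rogue


# Hypothetical log: four ordinary clients and one IP cycling through accounts.
events = []
for ip, attempts, failed in [("10.0.0.1", 3, 0), ("10.0.0.2", 4, 1),
                             ("10.0.0.3", 3, 1), ("10.0.0.4", 5, 1)]:
    for n in range(attempts):
        events.append((ip, f"user-{ip}", n >= failed))
for n in range(40):
    events.append(("203.0.113.9", f"victim-{n % 20}", False))

for ip in rogue_ips(login_features(events), radius=3.0, min_neighbors=2):
    print(f"would ban {ip}")  # the actual ban call is environment-specific
```

Because the output feeds an automatic ban, the base rate problem applies here too: a loose radius would lock out legitimate users, so the cutoffs deserve the same scrutiny as the detector itself.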
Full stack autonomous incident detection and remediation.
Recap
Case Study Recap
● Anomaly Detection enables us to...
  ○ Automatically identify potential attacks in real-time.
  ○ Notify interested parties of the attack.
  ○ React to those attacks without user intervention.
● Outlier Detection with Clustering enables us to...
  ○ Identify rogue agents within the environment.
  ○ Reset customer passwords for potentially compromised accounts.
  ○ Ban IP Addresses identified to be participating in the phishing scheme.
Literature
● Machine Learning: The High Interest Credit Card of Technical Debt (Sculley et al., 2014)
● The Base-Rate Fallacy and its Implications for the Difficulty of Intrusion Detection (Axelsson, 1999)
● Practical Machine Learning: A New Look at Anomaly Detection (Dunning, 2014)
● ALADIN: Active Learning of Anomalies to Detect Intrusion (Stokes and Platt, 2008)
● Distinguishing cause from effect using observational data: methods and benchmarks (Mooij et al., 2014)
● Enhancing Performance Prediction Robustness by Combining Analytical Modeling and Machine Learning (Didona et al., 2015)
Implementations
● Robust Anomaly Detection (RAD) - Netflix
● Seasonal Hybrid ESD - Twitter
● Extensible Generic Anomaly Detection System (EGADS) - Yahoo
● Kale - Etsy
[email protected]
@codyrioux
linkedin.com/in/codyrioux