Machine Learning for Network Anomaly Detection
![Page 1: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/1.jpg)
Machine Learning for Network Anomaly Detection
Matt Mahoney
![Page 2: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/2.jpg)
Network Anomaly Detection
• Network – Monitors traffic to protect connected hosts
• Anomaly – Models normal behavior to detect novel attacks (some false alarms)
• Detection – Was there an attack?
![Page 3: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/3.jpg)
Host Based Methods
• Virus Scanners
• File System Integrity Checkers (Tripwire, DERBI)
• Audit Logs
• System Call Monitoring – Self/Nonself (Forrest)
![Page 4: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/4.jpg)
Network Based Methods
• Firewalls
• Signature Detection (SNORT, Bro)
• Anomaly Detection (eBayes, NIDES, ADAM, SPADE)
![Page 5: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/5.jpg)
User Modeling
• Source address – unauthorized users of authenticated services (telnet, ssh, pop3, imap)
• Destination address – IP scans
• Destination port – port scans
![Page 6: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/6.jpg)
Frequency Based Models
• Used by SPADE, ADAM, NIDES, eBayes, etc.
• Anomaly score = 1/P(event)
• Event probabilities estimated by counting
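
A minimal sketch of the counting approach above, in Python. The class name `FrequencyModel` and the choice to return infinity for an event that was never counted are assumptions made for illustration; they are not taken from SPADE, ADAM, NIDES, or eBayes.

```python
from collections import Counter


class FrequencyModel:
    """Estimate P(event) by counting, and score events by 1/P(event)."""

    def __init__(self):
        self.counts = Counter()
        self.total = 0

    def observe(self, event):
        # Count one occurrence of the event (for example, a destination port).
        self.counts[event] += 1
        self.total += 1

    def score(self, event):
        # Counter returns 0 for events it has never seen.
        count = self.counts[event]
        if count == 0:
            # Assumption of this sketch: an unseen event has P = 0,
            # so its score 1/P is treated as infinite.
            return float("inf")
        return self.total / count  # 1 / (count / total)


model = FrequencyModel()
for port in [80] * 8 + [22] * 2:
    model.observe(port)

common_score = model.score(80)    # 10 / 8
rare_score = model.score(22)      # 10 / 2
unseen_score = model.score(31337)
```

Rare events score higher than common ones, and an event with no training count has no finite score under pure counting.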
![Page 7: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/7.jpg)
Attacks on Public Services
PHF – exploits a CGI script bug on older Apache web servers
GET /cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd
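
Decoding the request makes the attack easier to read: %0a is a newline and %20 is a space, so the query carries a second line containing a command. The sketch below uses `urllib.parse.unquote` from the Python standard library; the variable names are illustrative.

```python
from urllib.parse import unquote

# The request path from the slide, joined onto one line.
request_path = "/cgi-bin/phf?Qalias=x%0a/usr/bin/ypcat%20passwd"

# Keep only the query string (everything after the first "?").
query = request_path.split("?", 1)[1]

# Undo the percent-encoding: %0a -> newline, %20 -> space.
decoded = unquote(query)

# The newline splits the query into the expected parameter
# and an injected command line.
lines = decoded.split("\n")
for line in lines:
    print(repr(line))
```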
![Page 8: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/8.jpg)
Buffer Overflows
• 1988 Morris Worm – fingerd
• 2003 SQL Sapphire Worm
char buf[100];
gets(buf);
[Figure: stack layout. Labels: buf (offsets 0 to 100), Exploit code, Return Address]
![Page 9: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/9.jpg)
TCP/IP Denial of Service Attacks
• Teardrop – overlapping IP fragments
• Ping of Death – IP fragments reassemble to > 64K
• Dosnuke – urgent data in NetBIOS packet
• Land – identical source and destination addresses
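
The conditions named in three of these bullets can be written as simple checks. This is only a sketch of what each malformed input looks like, not the anomaly detection method of the talk. Fragments are represented as hypothetical (offset, length) pairs in bytes, and the function names are assumptions.

```python
MAX_IP_PACKET = 65535  # largest size an IP datagram can legally have, in bytes


def is_land(src_addr, dst_addr):
    # Land: source and destination addresses are identical.
    return src_addr == dst_addr


def reassembled_size(fragments):
    # Size of the datagram once all (offset, length) fragments are in place.
    return max(offset + length for offset, length in fragments)


def is_ping_of_death(fragments):
    # Ping of Death: fragments reassemble to more than 64K.
    return reassembled_size(fragments) > MAX_IP_PACKET


def fragments_overlap(fragments):
    # Teardrop: some fragment starts before an earlier fragment has ended.
    end = 0
    for offset, length in sorted(fragments):
        if offset < end:
            return True
        end = max(end, offset + length)
    return False


land = is_land("10.0.0.5", "10.0.0.5")
oversized = is_ping_of_death([(0, 1480), (65528, 100)])
teardrop = fragments_overlap([(0, 36), (24, 4)])
normal = fragments_overlap([(0, 24), (24, 24)])
```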
![Page 10: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/10.jpg)
Protocol Modeling
• Attacks exploit bugs
• Bugs are most common in the least tested code
• Most testing occurs after delivery, through normal use
• Therefore unusual data is more likely to be hostile
![Page 11: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/11.jpg)
Protocol Models
• PHAD, NETAD – Packet Headers (Ethernet, IP, TCP, UDP, ICMP)
• ALAD, LERAD – Client TCP application payloads (HTTP, SMTP, FTP, …)
![Page 12: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/12.jpg)
Time Based Models
• Training and test phases
• Values never seen in training are suspicious
• Score = t/p = tn/r where
– t = time since last anomaly
– n = number of training examples
– r = number of allowed values
– p = r/n = fraction of values that are novel
![Page 13: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/13.jpg)
Example tn/r
• Training: 0000111000 (n/r = 10/2)
• Testing: 01223
– 0: no score
– 1: no score
– 2: tn/r = 6 x 10/2 = 30
– 2: tn/r = 1 x 10/2 = 5
– 3: tn/r = 1 x 10/2 = 5
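
The example can be reproduced with a short Python sketch. The function name and parameters are assumptions. In particular, the slide gives t = 6 for the first novel value without saying when the previous anomaly occurred, so `last_anomaly=6` and `test_start=10` are chosen here only to match that figure.

```python
def score_stream(train_values, test_values, test_start, last_anomaly):
    """Score test values with t*n/r; values seen in training get no score."""
    n = len(train_values)        # number of training examples
    allowed = set(train_values)  # values seen in training
    r = len(allowed)             # number of allowed values
    scores = []
    for i, value in enumerate(test_values):
        now = test_start + i
        if value in allowed:
            scores.append(None)  # no score
        else:
            t = now - last_anomaly  # time since last anomaly
            scores.append(t * n / r)
            last_anomaly = now
            # As in the slide, the allowed set is not updated during
            # testing, so a repeated novel value is scored again.
    return scores


scores = score_stream("0000111000", "01223", test_start=10, last_anomaly=6)
print(scores)  # [None, None, 30.0, 5.0, 5.0]
```

The first novel value scores highest because a long time has passed since the previous anomaly; the ones that follow immediately score much lower.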
![Page 14: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/14.jpg)
PHAD – Fixed Rules
• 34 packet header fields
– Ethernet (address, protocol)
– IP (TOS, TTL, fragmentation, addresses)
– TCP (options, flags, port numbers)
– UDP (port numbers, checksum)
– ICMP (type, code, checksum)
• Global model
![Page 15: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/15.jpg)
LERAD – Learns Conditional Rules
• Models inbound client TCP (addresses, ports, flags, 8 words in payload)
• Learns conditional rules
If port = 80 then word1 = GET, POST (n/r = 10000/2)
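
A conditional rule of this form can be sketched as a small data structure. The `Rule` class and its field names are hypothetical, and scoring a violation with tn/r is an assumption carried over from the time-based model slide.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    condition: dict  # attribute values that must match, e.g. {"port": 80}
    target: str      # attribute constrained by the rule, e.g. "word1"
    allowed: set     # values of the target seen in training
    n: int           # number of training examples matching the condition

    def applies(self, record):
        return all(record.get(k) == v for k, v in self.condition.items())

    def score(self, record, t):
        """Return t*n/r if the record violates the rule, else None."""
        if not self.applies(record):
            return None
        if record.get(self.target) in self.allowed:
            return None
        return t * self.n / len(self.allowed)


# The rule from the slide: if port = 80 then word1 = GET, POST (n/r = 10000/2)
http_rule = Rule({"port": 80}, "word1", {"GET", "POST"}, 10000)

ok = http_rule.score({"port": 80, "word1": "GET"}, t=60)
other_port = http_rule.score({"port": 25, "word1": "HELO"}, t=60)
violation = http_rule.score({"port": 80, "word1": "QUIT"}, t=60)
```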
![Page 16: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/16.jpg)
LERAD Rule Learning
• If word1 = GET then port = 80 (n/r = 2/1)
• word1 = GET, HELO (n/r = 3/2)
• If address = Marx then port = 80, 25 (n/r = 2/2)

Address  Port  Word1  Word2
Hume     80    GET    /
Marx     80    GET    /index.html
Marx     25    HELO   Pascal
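
The n/r values of the three rules can be checked against the table with a few lines of Python. The helper name `rule_stats` and the dictionary layout are assumptions for illustration.

```python
rows = [
    {"address": "Hume", "port": 80, "word1": "GET", "word2": "/"},
    {"address": "Marx", "port": 80, "word1": "GET", "word2": "/index.html"},
    {"address": "Marx", "port": 25, "word1": "HELO", "word2": "Pascal"},
]


def rule_stats(rows, condition, target):
    """Return (n, allowed) for the rule 'if condition then target in allowed'.

    n is the number of rows matching the condition; allowed is the set of
    target values seen in those rows, so r = len(allowed).
    """
    matching = [
        row for row in rows
        if all(row[key] == value for key, value in condition.items())
    ]
    allowed = {row[target] for row in matching}
    return len(matching), allowed


get_rule = rule_stats(rows, {"word1": "GET"}, "port")      # n/r = 2/1
word_rule = rule_stats(rows, {}, "word1")                  # n/r = 3/2
marx_rule = rule_stats(rows, {"address": "Marx"}, "port")  # n/r = 2/2
```

An empty condition matches every row, which gives the unconditional rule in the second bullet.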
![Page 17: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/17.jpg)
LERAD Rule Learning
• Randomly pick rules based on matching attributes
• Select nonoverlapping rules with high n/r on a sample
• Train on full training set (new n/r)
• Discard rules that discover novel values in last 10% of training (known false alarms)
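
The last step can be sketched as follows. This covers only the pruning step, not candidate generation or rule selection; representing a rule as a hypothetical (condition, target) pair and the function names are assumptions.

```python
def matches(record, condition):
    return all(record.get(k) == v for k, v in condition.items())


def prune_rules(rules, records):
    """Keep rules that see no novel target value in the last 10% of records."""
    tail = max(1, len(records) // 10)
    head_records, tail_records = records[:-tail], records[-tail:]
    kept = []
    for condition, target in rules:
        # Values the rule allows after the first 90% of training.
        seen = {r[target] for r in head_records if matches(r, condition)}
        # A novel value in the last 10% would have been a false alarm,
        # because the training data contains no attacks.
        novel = any(
            matches(r, condition) and r[target] not in seen
            for r in tail_records
        )
        if not novel:
            kept.append((condition, target))
    return kept


# Nine ordinary requests, then one with a word1 value not seen before.
training = [{"port": 80, "word1": "GET"} for _ in range(9)]
training.append({"port": 80, "word1": "PUT"})

rules = [({"port": 80}, "word1"), ({}, "port")]
kept = prune_rules(rules, training)
```

Here the rule on word1 is discarded because PUT first appears in the final 10% of training, while the rule on port is kept.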
![Page 18: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/18.jpg)
DARPA/Lincoln Labs Evaluation
• 1 week of attack-free training data
• 2 weeks with 201 attacks
[Figure: test network diagram. Labels: SunOS, Solaris, Linux, NT, Router, Internet, Sniffer, Attacks]
![Page 19: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/19.jpg)
Attacks out of 201 Detected at 10 False Alarms per Day
[Figure: bar chart of attacks detected (vertical axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD]
![Page 20: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/20.jpg)
Problems with Synthetic Traffic
• Attributes are too predictable: TTL, TOS, TCP options, TCP window size, HTTP, SMTP command formatting
• Too few sources: Client addresses, HTTP user agents, ssh versions
• Too “clean”: no checksum errors, fragmentation, garbage data in reserved fields, malformed commands
![Page 21: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/21.jpg)
Real Traffic is Less Predictable
[Figure: plot of r (number of values) against time, with curves for synthetic and real traffic]
![Page 22: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/22.jpg)
Mixed Traffic: Fewer Detections, but More are Legitimate
[Figure: bar chart (vertical axis 0 to 140) for PHAD, ALAD, LERAD, and NETAD, with Total and Legitimate series]
![Page 23: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/23.jpg)
Project Status
• Philip K. Chan – Project Leader
• Gaurav Tandon – Applying LERAD to system call arguments
• Rachna Vargiya – Application payload tokenization
• Mohammad Arshad – Network traffic outlier analysis by clustering
![Page 24: Machine Learning for Network Anomaly Detection](https://reader035.fdocuments.in/reader035/viewer/2022062304/568146c2550346895db3f9de/html5/thumbnails/24.jpg)
Further Reading
• Learning Nonstationary Models of Normal Network Traffic for Detecting Novel Attacks by Matthew V. Mahoney and Philip K. Chan, Proc. KDD.
• Network Traffic Anomaly Detection Based on Packet Bytes by Matthew V. Mahoney, Proc. ACM-SAC.
• http://cs.fit.edu/~mmahoney/dist/