Post on 21-Dec-2015
Dynamic Application-Layer Protocol Analysis
For Network Intrusion Detection
Holger Dreger, TU München
Anja Feldmann, T-Labs / TU Berlin
Michael Mai, TU München
Vern Paxson, ICSI / LBNL
Robin Sommer, ICSI
Presented by: Jim Spadaro
NIDS: State-of-the-Art
• Protocol-specific traffic analysis
  - Semantic context for (much) better detection quality
• How to decide which protocol to analyze?
  - Relies on well-known port numbers
    • (As in, HTTP if and only if TCP port 80)
    • (or, um, maybe 8080 and 8000 and …)
• And if it's not on a well-known port?
  - Perhaps use byte-level signatures to flag what protocol it appears to be
Problem
• Applications use arbitrary ports!
  - Benign reasons
    • Lack of user privileges, obfuscation, multiple versions
    • Adversarial applications (maybe not so benign), e.g., Skype bypassing firewalls
  - Malicious intent
    • Evasion of security monitoring
    • IRC botnets on ports other than 666x/tcp
    • Pirate FTP servers on ports other than 21/tcp
• How to distinguish these?
Structure
• Prevalence of the problem
• Approach for dynamic analysis in NIDS
• Applications of new capabilities
• Performance evaluation
Prevalence of the Problem
• Data
  - 24-hour full packet trace from MWN
  - 3.2 TB of data in 6.3 billion packets, 137M TCP connections
  - Successful TCP connections: ~78%
  - Successful TCP connections on unprivileged ports: ~4%
• UCB: University of California, Berkeley, 45,000
• MWN: Munich Scientific Network, 50,000
• LBNL: Lawrence Berkeley National Laboratory, 13,000
Existing NIDS Solutions
• None known to fully address the problem
• Bro, Snort, Dragon, and IntruShield all rely on port-based protocol analysis
  - Some can use signatures to detect inappropriate protocol use
• Such detection is helpful, but has drawbacks
  - Does not distinguish benign off-port traffic from malicious:
    • Can only block BitTorrent completely, not detect illegal file sharing
    • Can only turn off off-port IRC completely, not detect botnets
Protocol Detection - Alternatives
• Statistical approach
  - E.g., packet size distribution
    • Suitable for separating interactive from bulk traffic, e.g., distinguishing chat from file transfers
• Detect protocol patterns
  - Signatures (already implemented)
    • Relatively easy to implement: most NIDS have signature-matching infrastructure
    • e.g., Linux netfilter l7-filter
    • Very general signatures, not completely accurate
  - Maybe: protocol detection by plausibility heuristics
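To make the signature alternative concrete, here is a minimal Python sketch of byte-level matching on the first payload bytes of a connection. The patterns and the function name `match_signatures` are hypothetical and deliberately loose; they are not the signatures used by l7-filter or by the authors.

```python
import re

# Hypothetical, deliberately loose patterns for illustration only.
SIGNATURES = {
    "HTTP": re.compile(rb"^(GET|POST|HEAD|PUT|DELETE|OPTIONS) \S+ HTTP/1\.[01]\r\n"),
    "SMTP": re.compile(rb"^(HELO|EHLO) \S+\r\n", re.IGNORECASE),
    "IRC": re.compile(rb"^(NICK|USER|PASS) \S+", re.IGNORECASE),
    "FTP": re.compile(rb"^(USER|PASS) \S+\r\n", re.IGNORECASE),
}


def match_signatures(payload: bytes) -> list[str]:
    """Return, sorted by name, every protocol whose pattern matches the payload start."""
    return sorted(name for name, pat in SIGNATURES.items() if pat.search(payload))


if __name__ == "__main__":
    print(match_signatures(b"GET /index.html HTTP/1.1\r\nHost: example.org\r\n"))
    print(match_signatures(b"USER anonymous\r\n"))
```

Note that a payload such as `USER anonymous` matches both the IRC and the FTP pattern in this sketch, which is one way general signatures end up not completely accurate.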
Protocol Detection: Signatures
• Most (but not all) successful connections trigger expected signature
• FTP: high percentage of false negatives ~ 21.7%
• “Other port” matches: needs further investigation
Protocol          HTTP       IRC      FTP       SMTP
Port (succ.)      93,429K    75.9K    151.7K    1,447K
Signature         94,326K    74.0K    125.3K    1,416K
  expected port   92,228K    71.5K    98.0K     1,415K
  other port      2,126K     2.5K     27.3K     0.3K
Protocol Signatures: Well-Known Ports
• Some connections trigger more than one signature
  - Signature too general
• Some inappropriate use of well-known ports
Port    HTTP          IRC       SMTP         Other     No match
80      92,228,291    59        0            41,086    1,158,977
666x    1,217         71,650    0            4,238     524
25      459           2         1,415,428    195       31,889
Observations
• Imprecision of signatures:
  - False negatives highlight need for refined signatures and/or more context
  - False positives (e.g., multiple matches for a single connection) highlight limits in discriminating power
  - Certain protocols are difficult to write signatures for
    • Telnet: many legitimate initial byte patterns
• Problem is real:
  - If we just believe port numbers, numerous misidentifications
Goals
• Detection-scheme independent
  - Currently predominantly uses signatures
    • However, flexibility is maintained to allow other approaches, like heuristics
• Dynamic analysis
  - Some protocol detection schemes need more data than others
  - Analyzers should be disabled upon detecting a false positive
• Modularity
  - Eases dealing with multiple network substacks
    • IP-within-IP tunnels
• Efficiency
  - Improvements must retain performance
• Customizability
  - Result must easily adapt to specific needs
Approach for Dynamic Analysis
• Dynamic data path enhances flexibility and accuracy
  - Example: a packet is received on port 80/tcp, but really carries data for an IRC session
    • A traditional NIDS will still examine the packet as HTTP
    • Dynamic analysis can switch the analysis to IRC even though it was initialized for HTTP
• Approach uses a PIA
  - Protocol Identification Analyzer
Dynamic Data Path
• How can this be done?
  - Associate each connection with a tree structure
    • Each node represents an analyzer
    • Links represent data channels, with a parent node's output channels connecting to its children's input channels
  - The PIA instantiates the initial analyzers
• Each analyzer can insert or remove other analyzers on its input and output channels
  - Thus, each analyzer can add further analyzers if it needs additional functionality
• If an analyzer cannot determine which analyzer is needed, another PIA can be instantiated
• An analyzer that cannot analyze the data it is being given can remove its subtree from the tree
  - Allows siblings in the tree to be run in parallel
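The tree idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Bro's implementation: the class names `Analyzer` and `PrefixAnalyzer`, the method names, and the prefix-based acceptance test are all hypothetical.

```python
class Analyzer:
    """One node of a per-connection analyzer tree (hypothetical API)."""

    def __init__(self, name):
        self.name = name
        self.parent = None
        self.children = []

    def add_child(self, child):
        """Connect a child's input to this analyzer's output."""
        child.parent = self
        self.children.append(child)
        return child

    def remove_subtree(self):
        """Detach this analyzer, and everything below it, from the tree."""
        if self.parent is not None:
            self.parent.children.remove(self)
            self.parent = None

    def accepts(self, data: bytes) -> bool:
        """Subclasses decide whether the data looks like their protocol."""
        return True

    def deliver(self, data: bytes):
        """Forward data to children; an analyzer that rejects it removes itself."""
        if not self.accepts(data):
            self.remove_subtree()
            return
        for child in list(self.children):  # copy: children may detach themselves
            child.deliver(data)


class PrefixAnalyzer(Analyzer):
    """Toy application analyzer that accepts data starting with known prefixes."""

    def __init__(self, name, prefixes):
        super().__init__(name)
        self.prefixes = prefixes

    def accepts(self, data):
        return data.startswith(self.prefixes)


tcp = Analyzer("TCP")
http = tcp.add_child(PrefixAnalyzer("HTTP", (b"GET ", b"POST ")))
irc = tcp.add_child(PrefixAnalyzer("IRC", (b"NICK ", b"USER ")))
tcp.deliver(b"NICK bot42\r\n")
remaining = [child.name for child in tcp.children]
```

After the IRC-looking data is delivered, the HTTP analyzer has detached itself and only the IRC analyzer remains under the TCP node. Siblings never reference each other, which is what would let them run independently.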
Analyzer Tree Example
• Example analyzer tree for an email connection:
  - The IP analyzer determines the connection is TCP
  - The TCP analyzer determines the connection looks like email
  - Analyzers for SMTP, POP, and IMAP are instantiated to analyze the data
  - Any analyzers that determine that they cannot analyze the data can remove themselves
Technical Issues
• Byte streams vs. packet streams
  - Protocols over TCP vs. others
  - Resolved by giving each analyzer both kinds of input channel
• Starting an analyzer mid-connection
  - Resolved by buffering the start of each stream (default 4 KB)
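The buffering fix for mid-connection starts can be sketched as follows. This is a hedged illustration only: the class name `BufferingPIA`, its methods, and the use of plain callables as analyzers are assumptions, and only the 4 KB default comes from the slide.

```python
class BufferingPIA:
    """Keeps the first `limit` bytes of a stream for late-attached analyzers."""

    def __init__(self, limit=4096):
        self.limit = limit
        self.buffer = bytearray()
        self.analyzers = []

    def feed(self, data: bytes):
        """Buffer the stream start (up to the limit) and pass data to analyzers."""
        room = self.limit - len(self.buffer)
        if room > 0:
            self.buffer += data[:room]
        for analyzer in self.analyzers:
            analyzer(data)

    def attach(self, analyzer):
        """Replay buffered bytes so the analyzer sees the stream from its start."""
        if self.buffer:
            analyzer(bytes(self.buffer))
        self.analyzers.append(analyzer)


pia = BufferingPIA(limit=8)  # tiny limit so the truncation is visible
pia.feed(b"NICK ")
pia.feed(b"bot42\r\n")
seen = []
pia.attach(seen.append)      # analyzer started mid-connection
pia.feed(b"JOIN #x")
```

In this sketch an analyzer attached after the stream has outgrown the buffer would miss the bytes between the buffer limit and the attach point, so detection has to settle within the buffered prefix.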
Implementation
• Implemented in Bro NIDS
  - New "Protocol Identification Analyzer" (PIA) implements protocol detection and buffering
  - Stock Bro has a modular design suited to implementing the PIA
  - Required changing Bro's notion of a one-to-one static binding from transport analyzer to application analyzer(s)
• Running in three large environments: MWN, UCB, and LBNL
Implementation
• PIA examines only the first few KB of each connection, for efficiency
  - Shown to be sufficient for protocol detection
• Can activate analyzers in four ways:
  - Signatures
  - Connection port
  - Each analyzer can register a detection function
    • Allows arbitrary heuristics
  - Using a prediction table
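The four activation paths can be combined in one lookup, sketched below in Python. All table contents, names, and the toy Telnet heuristic (checking for the IAC byte 0xFF) are hypothetical choices for illustration; they do not reproduce Bro's actual tables or interfaces.

```python
import re

# 1. Connection port (hypothetical mapping)
PORT_MAP = {80: "HTTP", 25: "SMTP", 21: "FTP"}

# 2. Signatures on the start of the payload (hypothetical patterns)
SIGNATURES = {
    "HTTP": re.compile(rb"^(GET|POST|HEAD) "),
    "IRC": re.compile(rb"^(NICK|USER) "),
}

# 3. Detection functions registered by analyzers (arbitrary heuristics)
DETECTORS = {
    "TELNET": lambda payload: payload.startswith(b"\xff"),  # Telnet IAC byte
}

# 4. Prediction table filled in by other analyzers: (dst_ip, dst_port) -> analyzer
PREDICTED = {}


def candidate_analyzers(dst_ip: str, dst_port: int, payload: bytes) -> set[str]:
    """Collect every analyzer that any of the four mechanisms would activate."""
    names = set()
    if dst_port in PORT_MAP:
        names.add(PORT_MAP[dst_port])
    names.update(n for n, pat in SIGNATURES.items() if pat.search(payload))
    names.update(n for n, detect in DETECTORS.items() if detect(payload))
    if (dst_ip, dst_port) in PREDICTED:
        names.add(PREDICTED[(dst_ip, dst_port)])
    return names


PREDICTED[("10.0.0.5", 40000)] = "FTP-DATA"
off_port_irc = candidate_analyzers("10.0.0.1", 80, b"NICK bot42\r\n")
predicted = candidate_analyzers("10.0.0.5", 40000, b"\x1f\x8b\x08")
```

Here IRC data arriving on port 80 activates both the HTTP analyzer (by port) and the IRC analyzer (by signature); the analyzer that cannot parse the stream can later be dropped.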
Deployment Trade-Offs
• Protocol detection signatures
  - Loose signatures affordable
    • False positives fixed later
  - But too loose means slower
    • Analyzer is more expensive than pattern matching
  - Improve accuracy with bidirectional signatures
    • Server must respond with the same protocol
    • Prevents attacker from intentionally triggering slow analyzers
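A bidirectional signature can be sketched as a pair of patterns, one per direction, that must both match. The patterns and the function name below are hypothetical examples for HTTP, not the signatures shipped with Bro.

```python
import re

# Hypothetical originator/responder pattern pair for HTTP.
HTTP_REQUEST = re.compile(rb"^(GET|POST|HEAD) \S+ HTTP/1\.[01]\r\n")
HTTP_RESPONSE = re.compile(rb"^HTTP/1\.[01] \d{3} ")


def is_http_bidirectional(client_data: bytes, server_data: bytes) -> bool:
    """Match only if the client speaks HTTP and the server answers in HTTP."""
    return bool(HTTP_REQUEST.search(client_data)) and bool(
        HTTP_RESPONSE.search(server_data)
    )


genuine = is_http_bidirectional(b"GET / HTTP/1.1\r\n", b"HTTP/1.1 200 OK\r\n")
spoofed = is_http_bidirectional(
    b"GET / HTTP/1.1\r\n", b":irc.example.net NOTICE AUTH :hello\r\n"
)
```

Because the responder side must match as well, a client that merely sends an HTTP-looking first line does not by itself cause the more expensive analyzer to be started.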
Deployment Trade-Offs
• At what point should an analyzer remove itself?
  - Real-world traffic is not perfect
    • Implementations can stretch protocol bounds
  - Should not parse the whole stream
    • Defeats the purpose of protocol analysis
  - Resolution: analyzer should never disable itself
    • Generate Bro events on protocol violations
    • Allow user-level policy script to disable analyzer if necessary, e.g., after a certain number of violations
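The policy-side decision can be sketched as a simple violation counter. This Python sketch only mirrors the idea; in Bro the policy would live in a user-level policy script, and the class name, method name, and threshold here are assumptions.

```python
from collections import Counter


class ViolationPolicy:
    """Disables an analyzer for a connection after too many violation events."""

    def __init__(self, max_violations=3):
        self.max_violations = max_violations
        self.counts = Counter()
        self.disabled = set()

    def on_protocol_violation(self, conn_id, analyzer) -> bool:
        """Record one violation event; return True once the analyzer is disabled."""
        key = (conn_id, analyzer)
        self.counts[key] += 1
        if self.counts[key] >= self.max_violations:
            self.disabled.add(key)
        return key in self.disabled


policy = ViolationPolicy(max_violations=2)
first = policy.on_protocol_violation("conn-1", "HTTP")
second = policy.on_protocol_violation("conn-1", "HTTP")
```

Keeping the threshold in policy rather than in the analyzer lets each site decide how much protocol sloppiness it tolerates.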
New Capabilities
• In summary, can now:
  - Reliably detect connections on non-standard ports
    • Includes protocols that use others as transport, e.g., distinguishing Kazaa, BitTorrent, SOAP, etc. over HTTP
  - Inspect payload of FTP transfers
  - Detect IRC-based bots
• This has worked successfully in the field
Reliable Real-Time Protocol Detection on non-Standard Ports
• 1 day at UC Berkeley (MWN similar)
• Connections on non-standard ports mainly HTTP
  - UCB: split between real HTTP (e.g., Apache) and Gnutella
  - MWN: similar, but more P2P (BitTorrent), also some FTP
  - Open HTTP proxies detected and closed
  - Open SMTP relay detected and closed
                Internal    Remote
FTP servers     6           17
HTTP servers    568         54,830
IRC servers     2           33
SMTP servers    8           8
Payload Inspection of FTP Data Transfers
• FTP data transfers use arbitrary ports
  - Identify based on prior PORT, PASV
    • Dynamically added to prediction table
• Check connection payload using libmagic
  - Actual file type == expected file type?
    • E.g., could find rootkit tarball sent as .jpg
  - Determined using file analyzer
• Extension: use same mechanism for SMTP (mail attachments)
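Both steps can be sketched in Python: predicting the data connection from a PASV reply, and comparing the payload's type against the file extension. The function names are hypothetical, and the small magic-byte table is a stand-in for libmagic, not its API. The PASV port is computed as p1 * 256 + p2 from the reply's last two numbers.

```python
import os
import re

PASV_REPLY = re.compile(rb"^227 .*?\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)")


def parse_pasv(reply: bytes):
    """Return (ip, port) announced by an FTP 227 reply, or None if not a PASV reply."""
    m = PASV_REPLY.search(reply)
    if m is None:
        return None
    h1, h2, h3, h4, p1, p2 = (int(x) for x in m.groups())
    return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2


# Tiny stand-in for libmagic: leading magic bytes -> type name.
MAGIC = {b"\xff\xd8\xff": "jpeg", b"\x1f\x8b": "gzip", b"\x89PNG\r\n\x1a\n": "png"}
EXTENSIONS = {".jpg": "jpeg", ".jpeg": "jpeg", ".gz": "gzip", ".tgz": "gzip", ".png": "png"}


def sniff_type(payload: bytes):
    """Guess the file type from the first payload bytes, or None if unknown."""
    for magic, name in MAGIC.items():
        if payload.startswith(magic):
            return name
    return None


def type_mismatch(filename: str, payload: bytes) -> bool:
    """True if both types are known and the payload disagrees with the extension."""
    expected = EXTENSIONS.get(os.path.splitext(filename)[1].lower())
    actual = sniff_type(payload)
    return expected is not None and actual is not None and expected != actual


prediction = parse_pasv(b"227 Entering Passive Mode (192,168,1,10,19,137)\r\n")
disguised = type_mismatch("holiday.jpg", b"\x1f\x8b\x08\x00")
```

The returned address and port are what would be entered into the prediction table, so that the data connection is handed to a file analyzer even though its port is arbitrary.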
Detecting IRC Based Botnets
• Idea
  - Botnet communication often uses IRC
  - Botnet detector on top of IRC analyzer
    • Check nicknames
    • Check channel names
    • Check contact to identified bot servers
• Key consideration: must analyze IRC dialog seen off-port
  - Because lots of benign IRC runs off-port too…
• > 100 bots found at MWN+UCB
  - MWN employs auto-blocking based on detector
• Not as adept at detecting custom protocols
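The three checks can be sketched as a function that runs on lines already parsed by an IRC analyzer. The slides do not say which nickname or channel patterns the detector uses, so the nickname regex, channel list, and server address below are purely hypothetical placeholders.

```python
import re

# Hypothetical heuristics: a short tag followed by a run of digits, a list of
# known command channels, and a list of previously identified bot servers.
BOT_NICK = re.compile(r"^[A-Za-z|\[\]]+-?\d{4,}$")
BOT_CHANNELS = {"#botcmd"}
KNOWN_BOT_SERVERS = {"203.0.113.7"}


def suspicious_reasons(line: str, server_ip: str) -> list[str]:
    """Return which of the three checks fire for one client-to-server IRC line."""
    reasons = []
    parts = line.strip().split()
    if len(parts) >= 2:
        command, arg = parts[0].upper(), parts[1]
        if command == "NICK" and BOT_NICK.match(arg):
            reasons.append("nickname")
        if command == "JOIN" and arg.lower() in BOT_CHANNELS:
            reasons.append("channel")
    if server_ip in KNOWN_BOT_SERVERS:
        reasons.append("server")
    return reasons


nick_hit = suspicious_reasons("NICK USA|48291\r\n", "198.51.100.2")
join_hit = suspicious_reasons("JOIN #botcmd", "203.0.113.7")
```

Because the checks operate on parsed IRC commands rather than on ports, they apply equally to IRC seen on 666x/tcp and to IRC found off-port.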
Performance Evaluation
Times by configuration (systems compared: Stock-Bro, PIA-Bro, PIA-Bro-M4K):
• Config-A (Standard; Standard + sigs): 3335s, 3843s, 3254s, 3778s
• Config-B (All TCP pkts): 3584s, 3496s
• Config-C (All TCP pkts + sigs; All TCP pkts + sigs + reass.): 4446s, 4436s, 4488s, 3716s, 4795s
Performance
• New framework does not add significant additional overhead
  - Performance cost is about 13.8% between PIA-Bro-M4K and Stock-Bro
• Protocol detection (signature matching on all packets) is expensive but doable
  - Solutions:
    • Specialized hardware
    • Load balancing possible
Summary
• Network traffic resists classification by port
• General framework for dynamic protocol analysis
  - Use signatures to pre-filter for efficiency
  - Use application parsing to make high-quality decisions
• Accurate enough for auto-blocking of bots in a large-scale network
  - Plus detection of illicit relays and servers
  - Integrated into Development Release 1.2 of Bro