3/29/12 - CCSS - Center for Computer Systems Securityccss.usc.edu/530/spring12/3.23.pdf · 3/29/12...
Transcript of 3/29/12 - CCSS - Center for Computer Systems Securityccss.usc.edu/530/spring12/3.23.pdf · 3/29/12...
3/29/12
1
Grade Projec0ons • I’ve calculated two grades for everyone:
– Realis0c: assumes your performance in the course con0nues the same
– Op0mis0c: assumes you get maximum scores for the rest of the course
• Some sta0s0cs: – Realis0c: 1 A, 3 A-‐, 6 Bs, 4 Cs, 5 Ds, 1 F – Op0mis0c: 12 A, 7 A-‐, 1 B+, 1 B
• There is s0ll a lot of room to improve
Dynamic Quaran0ne • Worms spread very fast (minutes, seconds)
– Need automa0c mi0ga0on • If this is a new worm, no signature exists
– Must apply behaviour-‐based anomaly detec0on – But this has a false-‐posi0ve problem! We don’t want to drop legi0mate connec0ons!
• Dynamic quaran0ne – “Assume guilty un0l proven innocent” – Forbid access to suspicious hosts for a short 0me – This significantly slows down the worm spread
C. C. Zou, W. Gong, and D. Towsley. "Worm Propaga0on Modeling and Analysis under Dynamic Quaran0ne Defense," ACM CCS Workshop on Rapid Malcode (WORM'03),
Dynamic Quaran0ne • Behavior-‐based anomaly detec0on can point out suspicious hosts – Need a technique that slows down worm spread but doesn’t hurt legi0mate traffic much
– “Assume guilty un0l proven innocent” technique will briefly drop all outgoing connec0on adempts (for a specific service) from a suspicious host
– Aeer a while just assume that host is healthy, even if not proven so
– This should slow down worms but cause only transient interrup0on of legi0mate traffic
Dynamic Quaran0ne • Assume we have some anomaly detec0on program that flags a host as suspicious – Quaran0ne this host – Release it aeer 0me T – The host may be quaran0ned mul0ple 0mes if the anomaly detec0on raises an alarm
– Since this doesn’t affect healthy hosts’ opera0on a lot we can have more sensi0ve anomaly detec0on technique
Dynamic Quaran0ne • An infec0ous host is quaran0ned aeer 0me units
• A suscep0ble host is falsely quaran0ned aeer 0me units
• Quaran0ne 0me is T, aeer that we release the host
• A few new categories: – Quaran0ned infec0ous R(t) – Quaran0ned suscep0ble Q(t)
1
1λ
2
1λ
Slammer With DQ
3/29/12
2
DQ With Large T?
T=10sec T=30sec
DQ And Patching?
Patch Only Quaran0ned Hosts
Cleaning I(t) Cleaning R(t)
DOMINO • The goal is to build an overlay network so that nodes coopera0vely detect intrusion ac0vity – Coopera0on reduces the number of false posi0ves
• Overlay can be used for worm detec0on • Main feature are ac?ve-‐sink nodes that detect traffic to unused IP addresses
• The reac0on is to build blacklists of infected nodes
V. Yegneswaran, P. Barford, S. Jha, “Global Intrusion Detec0on in the DOMINO Overlay System,” NDSS 2004
DOMINO Architecture DOMINO Architecture • Axis nodes collect, aggregate and share data
– Nodes in large, trustworthy ISPs – Each node maintains a NIDS and an ac0ve sink over large por0on of unused IP space
• Access points grant access to axis nodes aeer thorough administra0ve checks
• Satellite nodes form trees below an axis node, collect informa0on, deliver it to axis nodes and pull relevant informa0on
• Terrestrial nodes supply daily summaries of port scan data
3/29/12
3
Informa0on Sharing • Every axis node maintains a global and local view of intrusion ac0vity
• Periodically a node receives summaries from peers which are used to update global view – List of worst offenders grouped per port – Lists of top scanned ports
• RSA is used to authen0cate nodes and signed SHA digests are used to ensure message integrity and authen0city
How Many Nodes We Need?
40 for port summaries
20 for worst offender list
How Frequent Info Exchange? Staleness doesn’t mader much but more frequent lists are beder to catch worst offenders
How Long Blacklists? About 1000 IPs are enough
How Close Monitoring Nodes? Blacklists in same /16 space are similar ⇒ satellites in /16 space should be grouped under the same axis node and sets of /16 spaces should be randomly distributed among different axis nodes
Automa0c Worm Signatures • Focus on TCP worms that propagate via scanning
• Idea: vulnerability exploit is not easily mutable so worm packets should have some common signature
• Step 1: Select suspicious TCP flows using heuris0cs
• Step 2: Generate signatures using content prevalence analysis
Kim, H.-‐A. and Karp, B., Autograph: Toward Automated, Distributed Worm Signature Detec0on, in the Proceedings of the 13th Usenix Security Symposium (Security 2004), San Diego, CA, August, 2004.
3/29/12
4
Suspicious Flows • Detect scanners as hosts that make many unsuccessful connec0on adempts (>2)
• Select their successful flows as suspicious • Build suspicious flow pool
– When there’s enough flows inside trigger signature genera0on step
Signature Genera0on • Use most frequent byte sequences across flows as the signature
• Naïve techniques fail at byte inser0on, dele0on, reordering
• Content-‐based payload par00oning (COPP) – Par00on if Rabin fingerprint of a sliding window matches breakmark = content blocks
– Configurable parameters: window size, breakmark – Analyze which content blocks appear most frequently and what is the smallest set of those that covers most/all samples in suspicious flow pool
How Well Does it Work? • Tested on traces of HTTP traffic interlaced with known worms
• For large block sizes and large coverage of suspicious flow pool (90-‐95%) Autograph performs very well – Small false posi0ves and false nega0ves
Distributed Autograph • Would detect more scanners • Would produce more data for suspicious flow pool – Reduce false posi0ves and false nega0ves
Automa0c Signatures (approach 2) • Detect content prevalence
– Some content may vary but some por0on of worm remains invariant
• Detect address dispersion – Same content will be sent from many hosts to many des0na0ons
• Challenge: how to detect these efficiently (low cost = fast opera0on)
S.Singh, C. Estan, G. Varghese and S. Savage “Automated Worm Fingerprin0ng,” OSDI 2004
Content Prevalence Detec0on • Hash content + port + proto and use this as key to a table where counters are kept – Content hash is calculated over overlapping blocks of fixed size
– Use Rabin fingerprint as hash func0on – Autograph calculates Rabin fingerprint over variable-‐length blocks that are non-‐overlapping
– Rabin fingerprint is a hash func0on that is efficient to recalculate if a por0on of the input changes
3/29/12
5
Address Dispersion Detec0on • Remembering sources and des0na0ons for each content would require too much memory
• Scaled bitmap: – Sample down input space, e.g., hash into values 0-‐63 but only remember those values that hash into 0-‐31
– Set the bit for the output value (out of 32 bits) – Increase sampling-‐down factor each 0me bitmap is full = constant space, flexible coun0ng
How Well Does This Work? • Implemented and deployed at UCSD network
How Well Does This Work? • Some false posi0ves
– Spam, common HTTP protocol headers .. (easily whitelisted)
– Popular BitTorrent files (not easily whitelisted) • No false nega0ves
– Detected each worm outbreak reported in news – Cross-‐checked with Snort’s signature detec0on
Polymorphic Worm Signatures • Insight: mul0ple invariant substrings must be present in all variants of the worm for the exploit to work – Protocol framing (force the vulnerable code down the path where the vulnerability exists)
– Return address • Substrings not enough = too short • Signature: mul0ple disjoint byte strings
– Conjunc0on of byte strings – Token subsequences (must appear in order) – Bayes-‐scored substrings (score + threshold) J. Newsome, B. Karp and D. Song, “Polygraph: Automa0cally Genera0ng Signatures for Polymorphic Worms,” IEEE Security and Privacy Symposium, 2005
Worm Code Structure • Invariant bytes: any change makes the worm fail
• Wildcard bytes: any change has no effect • Code bytes: Can be changed using some polymorphic technique and worm will s0ll work – E.g., encryp0on
Polygraph Architecture • All traffic is seen, some is iden0fied as part of suspicious flows and sent to suspicious traffic pool – May contain some good traffic – May contain mul0ple worms
• Rest of traffic is sent to good traffic pool • Algorithm makes a single pass over pools and generates signatures
3/29/12
6
Signature Detec0on • Extract tokens (variable length) that occur in at least K samples – Conjuc0on signature is this set of tokens – To find token-‐subsequence signatures samples in the pool are aligned in different ways (shieed lee or right) so that the maximum-‐length subsequences are iden0fied
– Con0guous tokens are preferred – For Bayes signatures for each token a probability is computed that it is contained by a good or a suspicious flow – use this as a score
– Set high value of threshold to avoid false posi0ves
How Well Does This Work? • Legi0mate traffic traces: HTTP and DNS
– Good traffic pool – Some of this traffic mixed with worm traffic to model imperfect separa0on
• Worm traffic: Ideally-‐polymorphic worms generated from 3 known exploits
• Various tests conducted
How Well Does This Work? • When compared with single signature (longest substring) detec0on, all proposed signatures result in lower false posi0ve rates – False nega0ve rate is always zero if the suspicious pool has at least three samples
• If some good traffic ends up in suspicious pool – False nega0ve rate is s0ll low – False posi0ve rate is low un0l noise gets too big
• If there are mul0ple worms in suspicious pool and noise – False posi0ves and false nega0ves are s0ll low
Borrowed from Brent ByungHoon Kang, GMU
A Network of Compromised Computers on the Internet
IP loca0ons of the Waledac botnet.
Borrowed from Brent ByungHoon Kang, GMU Botnets
• Networks of compromised machines under the control of hacker, “bot-‐master”
• Used for a variety of malicious purposes: • Sending Spam/Phishing Emails • Launching Denial of Service adacks • Hos0ng Servers (e.g., Malware download site) • Proxying Services (e.g., FastFlux network) • Informa0on Harves0ng (credit card, bank creden0als, passwords, sensi0ve data.)
Borrowed from Brent ByungHoon Kang, GMU
3/29/12
7
Botnet with Central Control Server
Aeer resolving the IP address for the IRC server, bot-‐infected machines CONNECT to the server, JOIN a channel, then wait for commands.
Borrowed from Brent ByungHoon Kang, GMU Botnet with Central Control Server
The botmaster sends a command to the channel. This will tell the bots to perform an ac0on.
Borrowed from Brent ByungHoon Kang, GMU
Botnet with Central Control Server
The IRC server sends (broadcasts) the message to bots listening on the channel.
Borrowed from Brent ByungHoon Kang, GMU Botnet with Central Control Server
The bots perform the command. In this example: adacking / scanning CNN.COM.
Borrowed from Brent ByungHoon Kang, GMU
Botnet Sophis0ca0on Fueled by Underground Economy
• Unfortunately, the detec0on, analysis and mi0ga0on of botnets has proven to be quite challenging
• Supported by a thriving underground economy – Professional quality sophis0ca0on in crea0ng malware codes
– Highly adap0ve to exis0ng mi0ga0on efforts such as taking down of central control server.
41
Borrowed from Brent ByungHoon Kang, GMU
Emerging Decentralized Peer to Peer Mul0-‐layered Botnets
• Tradi0onal botnet communica0on – Central IRC server for Command & Control (C&C) – Single point of mi0ga0on:
• C&C Server can be taken down or blacklisted • Botnets with peer to peer C&C
– No single point of failure. – E.g., Waldedac, Storm, and Nugache
• Mul0-‐layered architecture to obfuscate and hide control servers in upper 0ers.
Borrowed from Brent ByungHoon Kang, GMU
3/29/12
8
Expected Use of DHT P2P Network Publish and Search
¡ Botmaster publishes commands under the key.
¡ Bots are searching for this key periodically
¡ Bots download the commands
=>Asynchronous C&C
Borrowed from Brent ByungHoon Kang, GMU
Mul0-‐Layered Command and Control Architecture Through P2P
¡ Each Supernode (server) publishes its loca0on (IP address) under the key 1 and key 2
¡ Subcontrollers search for key 1
¡ Subnodes (workers) search for key 2
to open connec0on to the Supernodes
=> Synchronous C&C ¡
Borrowed from Brent ByungHoon Kang, GMU
Current Approaches to Botnet
• Virus Scanner at Local Host – Polymorphic binaries against signature scanning – Not installed even though it is almost free – Rootkit
• Network Intrusion Detec0on Systems – Keeping states for network flows – Deep packet inspec0on is expensive – Deployed at LAN, and not scalable to ISP-‐level – Requires Well-‐Trained Net-‐Security SysAdmin
Borrowed from Brent ByungHoon Kang, GMU
46
Conficker infec0ons are s0ll increasing aeer one year!!!
There are millions of computers on the Internet that do not have virus scanner nor IDS
Borrowed from Brent ByungHoon Kang, GMU
Botnet Enumera0on Approach
• Used for spam blocking, firewall configura0on, DNS rewri0ng, and aler0ng sys-‐admins regarding local infec0ons.
• Fundamentally differs from exis0ng Intrusion Detec0on System (IDS) approaches – IDS protects local hosts within its perimeter (LAN) – An enumerator would iden,fy both local as well as remote infec,ons
• Iden0fying remote infec0ons is crucial – There are numerous computers on the Internet that are not under the protec0on of IDS-‐based systems.
47
Borrowed from Brent ByungHoon Kang, GMU How to Enumerate Botnet
• Need to know the method and protocols for how a bot communicates with its peers
• Using sand-‐box technique – Run bot binary in a controlled environment – Network behaviors are captured/analyzed
• Inves0ga0ng the binary code itself – Reversing the binary into high level codes – C&C Protocol knowledge and opera0on details can be accurately obtained
Borrowed from Brent ByungHoon Kang, GMU
3/29/12
9
Simple Crawler Approach
• Given network protocol knowledge, crawlers: 1. collect list of ini0al bootstrap peers into queue 2. choose a peer node from the queue 3. send to the node look-‐up or get-‐peer requests 4. add newly discovered peers to the queue 5. repeat 2-‐5 un0l no more peer to be contacted
• Can’t enumerate a node behind NAT/Firewall • Would miss bot-‐infected hosts at home/office!
Borrowed from Brent ByungHoon Kang, GMU Passive P2P Monitor (PPM)
• Given P2P protocol knowledge that bot uses • A collec0on of “rou0ng-‐only” nodes that
– Act as peer in the P2P network, but – Controlled by us, the defender
• PPM nodes can observe the traffic from the peer infected hosts
• PPM node can be contacted by the infected hosts behind NAT/Firewall
Borrowed from Brent ByungHoon Kang, GMU
Crawler and Passive P2P Monitor (PPM)
Crawler
PPM
PPM
PPM
Borrowed from Brent ByungHoon Kang, GMU Crawler vs. PPM: # of IPs found
Borrowed from Brent ByungHoon Kang, GMU
Botnet Enumera0on Challenges
• DHCP • NATs • Non-‐uniform bot distribu0on • Churn • Most es0mates put size of largest botnets at tens of millions of bots – Actual size may be much smaller if we account for all of the above
Fast Flux • Botnets use a lot of newly-‐created domains for phishing and malware delivery
• Fast flux: changing name-‐to-‐IP mapping very quickly, using various IPs to thwart defense adempts to bring down botnet
• Single-‐flux: changing name-‐to-‐IP mapping for individual machines, e.g., a Web server
• Double-‐flux: changing name-‐to-‐IP mapping for DNS nameserver too
• Proxies on compromised nodes fetch content from backend servers
3/29/12
10
Fast Flux
• Advantages for the adacker: – Simplicity: only one back end server is needed to deliver content
– Layers of protec0on through disposable proxy nodes
– Very resilient to adempts for takedown
Fast Flux Detec0on
• Look for domain names where mapping to IP changes oeen – May be due to load balancing – May have other (non-‐botnet) cause, e.g., adult content delivery
– Easy to fabricate domain names • Look for DNS records with short-‐lived domain names, with lots of A records, lots of NS records and diverse IP addresses (wrt AS and network access type)
• Look for proxy nodes by poking them
Poking Botnets is Dangerous
• They have been known to fight back – DDoS IPs that poke them (even if low workers are scanned)
• They have been known to fabricate data for honeynets – Honeynet is a network of computers that sits in otherwise unused (dark) address space and is meant to be compromised by adackers
Privacy
What is Privacy? • Privacy is about PII • It is primarily a policy issue • Privacy is an issue of user educa0on
• Make sure users are aware of the poten0al use of the informa0on they provide
• Give the user control • Privacy is a security issue
• Security is needed to implement the policy
Security v. Privacy • Sometimes conflicting
• Many security technologies depend on identification
• Many approaches to privacy depend on hiding one’s identity
• Sometimes supportive • Privacy depends on protecting PII (personally
identifiable information) • Poor security makes it more difficult to protect
such information
3/29/12
11
Debate on Adribu0on • How much low level information should
be kept to help track down cyber attacks • Such information can be used to breach
privacy assurances • How long can such data be kept
Privacy is Not the Only Concern • Business Concerns
• Disclosing Information we think of as privacy-related can divulge business plans • Mergers • Product plans • Investigations
• Some “private” information is used for authentication • SSN • Credit card numbers
Aggrega0on of Data
• Consider whether it is safe to release information in aggregate • Such information is presumably no longer
personally identifiable • But given partial information, it is sometimes
possible to derive other information by combining it with the aggregated data.
Anonymiza0on of Data • Consider whether it is safe to release
information that has been stripped of so called personal identifiers • Such information is presumably no longer
personally identifiable • What is important is not just anonymity,
but linkability • If I can link multiple queries, I might be
able to infer the identity of the person issuing the query through one query, at which point, all anonymity is lost
Traffic Analysis
• Even when specifics of communication are hidden, the mere knowledge of communication between parties provides useful information to an adversary • E.g. pending mergers or acquisitions • Relationships between entities • Created visibility of the structure of an
organizations • Allows some inference about interests
Informa0on for Traffic Analysis
• Lists of the web sites you visit • Email logs • Phone records • Perhaps you expose the linkages
through web sites like linked in • Consider what information remains in
the clear when you design security protocols
3/29/12
12
Network Trace Sharing • Researchers need network data
– To validate their solu0ons – To mine and understand trends
• Sharing network data creates necessary diversity – Enables generaliza0on of results – Creates a lot of privacy concerns – Very few public traffic trace archives (CAIDA, WIDE, LBNL, ITA, PREDICT, CRAWDAD, MIT DARPA)