Surviving Internet Catastrophes
Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker
Hot Topics in Operating Systems (HotOS’03)
USENIX Annual Technical Conference (USENIX’05)
University of California, San Diego - 2004
A typical day of an Internet worm…
  Host A runs the Widows OS
  Host B runs the Sux OS
  Shut up!!
  I exploit a vulnerability in the Widows OS… but not in the Sux OS!
  [Figure: hosts A and B and their data]
Outline
  Introduction
  System model
  Host diversity
  Searching for replica sets
    Heuristics and simulations
  The Phoenix Recovery System
    Implementations
    Security issues
    Prototype evaluation
  Conclusions
Setting up the stage
  Past worm outbreaks
    Code Red (2001): compromised over 359,000 hosts
    Nimda (2001): multiple forms of infection
    Slammer (2003): fastest worm in history (90% of vulnerable hosts in 10 minutes)
    Witty (2004): first to carry a malicious payload
  Coping with worms
    Containment is hard [Moore03]
      Not possible if human intervention is required
    Automatic detection [Singh04]
      Problem: network evasion
    Recover from catastrophes [HotOS03]
      Goal: minimize data loss
Defining the problem
  How are Internet pathogens successful?
    Shared vulnerabilities
    Vulnerability: design or implementation flaw in a software system
  Survivable data
    Replicate data
    Informed replication
      Replica sets based on shared vulnerabilities
  How do we identify sets of vulnerabilities?
    Common software systems
    Leverage Internet diversity
Challenges
  Understand the limitations
    Appropriate settings
    Quantify diversity
  Searching for replica sets
    Scalable
    Balance load
    Small replica sets
System model
  A set of hosts ($H$)
    A host fails by losing its state
  A set of attributes ($A$)
    Attribute = software system
    Operating systems + applications
  Configuration
    One operating system
    Applications
  A set of configurations ($\mathcal{C} \subseteq 2^A$)
    $\mathrm{Conf} : H \to \mathcal{C}$
  [Figure: hosts mapped to their configurations; each configuration is a set of attributes (software systems)]
Cores
  For a set of attributes $A' \subseteq A$, a set $S \subseteq H$ is a core iff:
    $\forall a \in A' : \exists h \in S : a \notin \mathrm{Conf}(h)$
    $S$ is minimal
  Ideally $A' = A$
  [Figure: hosts and their configurations, with cores marked]
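The core condition can be checked mechanically. The following is a minimal sketch, not code from the talk: it assumes configurations are stored as Python sets of attribute names, and the names `covers`, `is_core`, and the three sample hosts are hypothetical. Because adding hosts can never reduce coverage, testing the removal of each single host is enough to establish minimality.

```python
def covers(hosts, conf, attrs):
    """True if every attribute in attrs is absent from at least one host."""
    return all(any(a not in conf[h] for h in hosts) for a in attrs)


def is_core(hosts, conf, attrs):
    """A core covers attrs, and no proper subset of it does."""
    hosts = set(hosts)
    if not covers(hosts, conf, attrs):
        return False
    # Coverage is monotone in the host set, so single removals suffice.
    return all(not covers(hosts - {h}, conf, attrs) for h in hosts)


# Hypothetical configurations: one OS plus applications per host.
conf = {
    "h1": {"Windows", "IIS"},
    "h2": {"Linux", "Apache"},
    "h3": {"Windows", "Apache"},
}
attrs = set().union(*conf.values())
```

Here `{"h1", "h2"}` is a core for all four attributes, whereas `{"h1", "h3"}` is not, since both hosts run the same OS.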
Host diversity
  Diversity: distribution of configurations
    Skewed: not uniform
  Study of the UCSD network
    nmap tool
      Port scans: detect open ports
      OS fingerprinting: guess the OS from error messages
    Total number of scanned devices: 11,963
    2,963 general-purpose hosts (port data + OS)
  Conservative assumptions
    Same open port = same service running
    Ignore OS versions
Top 10 operating systems and services
  OS           Share    Service            Share
  Windows      54.1%    netbios-ssn        55.3%
  Solaris      10.1%    epmap              50.4%
  Mac OS X     10.0%    microsoft-ds       39.0%
  Linux        10.0%    sshd               30.7%
  Mac OS        6.9%    sunrpc             25.3%
  FreeBSD       2.2%    active directory   24.8%
  IRIX          2.0%    smtp               19.4%
  HP-UX         1.1%    httpd              18.0%
  BSD/OS        0.9%    ftpd               17.8%
  Tru64 Unix    0.7%    printer            15.6%
Configuration distribution
  Distribution is skewed
  50% of hosts comprise:
    All: 20%
    Multiple: 15%
    Top 100: 8%
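One way to quantify such skew is to ask what fraction of the distinct configurations, taken most popular first, is needed to account for half of the hosts. The sketch below assumes that reading of the figures, which the slide does not spell out; the function name `config_share_for_hosts` and its inputs are hypothetical.

```python
from collections import Counter


def config_share_for_hosts(configs, host_fraction=0.5):
    """Fraction of distinct configurations needed to account for
    host_fraction of the hosts, most popular configurations first."""
    counts = Counter(frozenset(c) for c in configs)
    needed = host_fraction * len(configs)
    covered = 0
    for used, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered >= needed:
            return used / len(counts)
    return 1.0
```

A perfectly uniform population returns a value near `host_fraction`; the more skewed the population, the smaller the result.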
Visualizing diversity
  Qualitative view
    More diversity across operating systems
    Still a fair amount of diversity for the same OS
Searching for cores
  What is the practical problem?
    Determine replica sets
    Our approach: find cores
  Computing a core of optimal size is NP-complete
    Use heuristics
  Host as both client and server
    Client: requests cores
    Server: participates in cores
  Core
    Host that requests it (original copy)
    Replicas
Basic idea
  [Figure: four configurations, each a set of attributes (software systems), and the alternative possible cores]
Representing advertised configurations
  Container abstraction
    Containers ($B$)
      One for each operating system in $A$
    Each container $b \in B$ has a set $SB(b)$ of sub-containers, one for each non-OS attribute in $A$
    A host $h$ advertises its configuration by associating itself with every sub-container $s \in SB(b)$, where
      $b$ is the container for the OS of $h$
      $s$ is the sub-container in $SB(b)$ for some attribute of $h$
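The container abstraction maps naturally onto a nested dictionary. This is an illustrative sketch only: the layout (OS, then non-OS attribute, then set of hosts), the function name `build_containers`, and the `os_names` parameter are assumptions, not details given on the slide.

```python
from collections import defaultdict


def build_containers(conf, os_names):
    """Return a mapping OS -> non-OS attribute -> set of advertising hosts."""
    containers = defaultdict(lambda: defaultdict(set))
    for host, attrs in conf.items():
        # Each configuration has exactly one OS; unpacking fails otherwise.
        (os_attr,) = attrs & os_names
        for app in attrs - os_names:
            containers[os_attr][app].add(host)
    return containers
```

In this sketch a host with no applications appears in no sub-container; a real system would need to decide how to advertise such hosts.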
Container abstraction
  [Figure: host configurations placed into containers and sub-containers]
Heuristics
  Random
    Ignore configurations
    Choose n hosts at random from H
  Uniform
    I. Different OS
      1. Choose a container b randomly
      2. Choose a sub-container sb randomly from b
      3. Choose a host randomly from sb
    II. Same OS (same b where h is placed)
      1. Choose a sub-container sb randomly from b
      2. Choose a host randomly from sb
  Weighted: containers weighted by the number of hosts
  Doubly-weighted: sub-containers also weighted
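A single selection step of the uniform heuristic might look like the sketch below. It assumes a nested layout `containers[os][attribute] -> list of hosts` and assumes that the "different OS" case excludes the client's own container; both are illustrative choices, as are the name `uniform_pick` and its parameters. Deciding how many picks to make, and which attributes still need covering, is left to the caller.

```python
import random


def uniform_pick(containers, own_os, different_os, rng):
    """Pick one candidate replica host, or None if no candidate exists."""
    if different_os:
        candidates = sorted(b for b in containers if b != own_os)
        if not candidates:
            return None
        b = rng.choice(candidates)       # I.1: choose a container
    else:
        b = own_os                       # II: same container as the client
    subs = containers.get(b, {})
    if not subs:
        return None
    sb = rng.choice(sorted(subs))        # choose a sub-container
    hosts = sorted(subs[sb])
    return rng.choice(hosts) if hosts else None   # choose a host
```

In the same-OS case the pick may return the client itself, so a caller would retry or filter. The weighted variants could replace `rng.choice` with `rng.choices` and weights proportional to host counts.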
Simulations
  Population: 2,963 general-purpose hosts
  One run: each host computes a core
  Questions
    How much replication is needed?
    How many other hosts does a particular host have to service?
    How well do the chosen cores protect hosts?
  Metrics
    Average core size (core size)
      Core size averaged across all the hosts
    Maximum load (load)
      Maximum number of other hosts that any host services
    Average coverage (coverage)
      Coverage: percentage of attributes covered in a core
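The three metrics can be computed from one run as in the sketch below. Two points are assumptions rather than statements from the slides: each core is taken to include the requesting client, and coverage is measured over the client's own attributes. The name `simulation_metrics` is hypothetical.

```python
def simulation_metrics(cores, conf):
    """cores: client -> set of hosts in its core (client included).
    Returns (average core size, maximum load, average coverage)."""
    load = {h: 0 for h in conf}
    sizes, coverages = [], []
    for client, core in cores.items():
        sizes.append(len(core))
        for h in core - {client}:
            load[h] += 1                 # h stores a replica for client
        attrs = conf[client]
        # An attribute is covered if some host in the core lacks it.
        covered = [a for a in attrs if any(a not in conf[h] for h in core)]
        coverages.append(len(covered) / len(attrs))
    n = len(cores)
    return sum(sizes) / n, max(load.values()), sum(coverages) / n
```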
A sample run
  Random
    Better load balance
    Worse coverage
    Worse core size
  Load is too high for other heuristics
  Proposed modification
    Limit the load of each host
    Intuition: force load balance
    Each host services at most L other hosts
    L = load limit or simply limit

  Heuristic    Core size   Coverage   Load
  Random       5           0.977      12
  Uniform      2.56        0.9997     284
  Weighted     2.64        0.9995     84
  DWeighted    2.58        0.9997     91
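The load limit amounts to a per-server counter that refuses new clients once L is reached. The class below is a hypothetical sketch of that bookkeeping, not the authors' implementation; what a client does after a refusal (for example, picking another host) is outside its scope.

```python
class LoadLimiter:
    """Tracks how many clients each server already services."""

    def __init__(self, limit):
        self.limit = limit
        self.load = {}

    def try_accept(self, server):
        """Count one more client on server unless it is already at the limit."""
        current = self.load.get(server, 0)
        if current >= self.limit:
            return False
        self.load[server] = current + 1
        return True
```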
Core size
  Random
    Increases linearly with load
    Intrinsic to the heuristic
  Other heuristics
    Core size less than 3
    For many hosts, one single replica
Coverage
  Lower bound on limit: 2
    Dependent on the diversity
  Uniform: limit of at least 3 to achieve three nines (0.999) coverage
  Weighted: achieves three nines coverage for limit values of at least 2
  Random: core size of at least 9 to achieve the same coverage
Uncovered hosts
  Share of hosts that are not fully covered is small
  Uniform
    Limit 3: slightly over 1%
    Limit > 4: around 0.5%
  Weighted
    Around 0.5%
  Random
    Core size greater than 8 to achieve similar results
Load variance
  Downside of uniform
    Worst variance
  Variance is similar for small values of limit
  Load limit forces better distribution
Summary of simulation results
  How many replicas are needed?
    Around 1.3 on average
  How many other hosts does a particular host have to service?
    Uniform: 3 for good coverage
    Weighted: 2 for good coverage
  How well do the chosen cores protect hosts?
    Uniform: coverage greater than 0.999, $L \geq 3$
    Weighted: coverage greater than 0.999, $L \geq 2$
  Uniform heuristic
    Simpler
  Weighted heuristics
    Better load balance
Translating to real pathogens
  Uniform, limit > 3, tolerates with high probability attacks on a single attribute
  Previous worms
    One or more vulnerabilities on a single platform
  Our approach tolerates
    Attacks on vulnerabilities in the same software system, possibly cross-platform
    Attacks on vulnerabilities in different software systems on the same platform
  Attacks on vulnerabilities in different software systems, cross-platform
    Extensible approach
Exploits on k attributes
  Illustrate with k=2
  A variant of uniform
    1. Client c chooses a host h with different OS
    2. Find a core for c using uniform
    3. Find a core for h using uniform
    4. Combine the 2 cores to form a 2-resilient core

  L     2-cov   1-cov   Core size
  5     0.76    0.86    4.18
  6     0.86    0.92    4.58
  7     0.95    0.99    5.00
  8     0.97    1.00    5.11
  9     0.98    1.00    5.16
  10    0.98    1.00    5.17
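The four steps can be sketched as a small wrapper around any single-core search. Everything here is an assumption made for illustration: `find_core` stands in for the uniform heuristic and is supplied by the caller, `os_of` maps each host to its OS, and "combine" is interpreted as set union.

```python
import random


def two_resilient_core(client, os_of, find_core, rng):
    """Union of a core for the client and a core for a different-OS host."""
    others = sorted(h for h in os_of if os_of[h] != os_of[client])
    if not others:
        return None                      # no host with a different OS
    partner = rng.choice(others)         # step 1
    # Steps 2-4: one core per host, combined by union.
    return set(find_core(client)) | set(find_core(partner))
```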
The Phoenix Recovery System
  Backup data on cores
  Requirement: set of operating systems and applications is not known
  Macedon framework
    Pastry DHT
  Advertising configurations
    Container → Zone
    Sub-container → Sub-zone
  OS hint lists
    Empty zones
    Doesn’t need to be accurate
Protocol
  [Figure: message sequence between client and server]
  Backup mode: Request, Reply, Data
  Recovery mode: Announce, Announce, Restore request, Restore
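The message sequence can be mimicked with an in-memory sketch. The message directions are assumptions, since the slide residue does not preserve them: here the client sends Request and Data, the server sends Reply and Announce, and the server's Reply is an accept or reject decision based on a load limit. All class and function names are hypothetical.

```python
class Server:
    def __init__(self, name, limit):
        self.name, self.limit, self.store = name, limit, {}

    def on_request(self, client):
        """Reply: accept while below the load limit (assumed policy)."""
        return client in self.store or len(self.store) < self.limit

    def on_data(self, client, blob):
        self.store[client] = blob

    def announce(self):
        """Recovery mode: tell each client that this server holds its backup."""
        return [(client, self.name) for client in self.store]

    def on_restore_request(self, client):
        """Restore: hand the stored backup to the client."""
        return self.store[client]


def backup(client, blob, servers):
    """Backup mode: Request, Reply, then Data to every accepting server."""
    accepted = [s for s in servers if s.on_request(client)]
    for s in accepted:
        s.on_data(client, blob)
    return [s.name for s in accepted]
```

Having servers announce themselves fits a client that has lost its state and no longer knows which hosts hold its replicas.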
Security in Phoenix
  Using security primitives
  Security goals
    Data privacy: no host other than the owner of the data can obtain any partial information from the data stored on a server host
    Data integrity: any tampering of the backup data should be detectable by the client host
    Data availability: if a client stores data in an honest server, then it is eventually able to recover its data
  Two modes
    Basic: software libraries
    Enhanced: requires devices such as smartcards
  Cannot prevent servers from acting maliciously
    Proofs of operations
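As an illustration of the data integrity goal only, a client could attach a keyed tag to each backup and verify it on restore. The slides do not name the primitives Phoenix uses; HMAC-SHA256 is swapped in here as a familiar example, and the sketch assumes the client can keep or re-derive its key after losing its state. Privacy would additionally require encryption, which is not shown.

```python
import hashlib
import hmac

TAG_LEN = 32  # length in bytes of an HMAC-SHA256 tag


def seal(key: bytes, data: bytes) -> bytes:
    """Prefix the data with an HMAC-SHA256 tag before sending it to a server."""
    return hmac.new(key, data, hashlib.sha256).digest() + data


def unseal(key: bytes, blob: bytes) -> bytes:
    """Verify the tag on restored data; raise if the backup was modified."""
    tag, data = blob[:TAG_LEN], blob[TAG_LEN:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("backup failed integrity check")
    return data
```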
Prototype evaluation
  On PlanetLab
  Total number of hosts: 63
    62 PlanetLab hosts
    1 UCSD host
  Configurations manually set
    63 randomly chosen out of the 2,963
Evaluation results
  Simulated attack
    Parameters
      Backup file: 5 MB
      L = 3
      Interval between announcements: 120 s
    Target: Windows hosts (60%)
      Caused hosts to crash almost simultaneously
    All hosts recovered
      For 35: avg 100 s
      For 3: several minutes (transient network failures)

        Core size       Coverage       Load var.
  L     Imp.    Sim.    Imp.   Sim.    Imp.    Sim.
  3     2.12    2.22    1      1       1.65    1.94
  5     2.10    2.23    1      1       2.88    2.72
  7     2.10    2.22    1      1       4.44    3.33
  Imp. = implementation, Sim. = simulation
Conclusions
  Informed replication
    Replica sets based on attributes
    Internet catastrophes: software systems
  Survivable data at a low replication cost
    Core size is less than 3 on average
    Hosts service at most 3 other hosts
  Diversity study
    Approach is realistic
  Side-effects of load limit scheme
    Upper-bounds the amount of work any host has to accomplish
    Constrains damage in case of individual malicious behavior
Future work
  Real deployment
    Tune current prototype
    Security features
    Cope with real threats
  More data sets to determine diversity
  Mechanism to monitor resource usage
  Informed replication
    With other approaches for cooperative backup
    With other types of attributes
      E.g. resource utilization
END