Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo,...

34
Surviving Internet Surviving Internet Catastrophes Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems (HotOS’03) USENIX Annual Technical Conference (USENIX’05) University of California, San Diego - 2004

Transcript of Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo,...

Page 1: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

Surviving Internet CatastrophesSurviving Internet Catastrophes

Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker

Hot Topics in Operating Systems (HotOS’03)

USENIX Annual Technical Conference (USENIX’05)

University of California, San Diego - 2004

Page 2: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

2

A typical day of an Internet worm…A typical day of an Internet worm…

Host A runs the Widows OS

Host B runs runs the Sux OS

Shut up!!

I exploit a vulnerability in the

Widows OS…

A

B

…but not in the Sux OS!

Data

Page 3: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

3

OutlineOutline

Introduction System model Host diversity Searching for replica sets

heuristics and simulations

The Phoenix Recovery System Implementations Security issues Prototype evaluation

Conclusions

Page 4: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

4

Setting up the stageSetting up the stage

Past worm outbreaks Code red (2001): compromised over 359,000 hosts Nimda (2001): multiple forms of infection Slammer (2003): fastest worm in history (90% of vulnerable hosts

in 10 minutes) Witty (2004): first to contain malicious payload

Coping with worms Containment is hard [Moore03]

Not possible if human intervention required Automatic detection [Singh04] Problem: Network evasion

Recover from catastrophes [HotOS03] Goal: minimize data loss

Page 5: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

5

Defining the problemDefining the problem

How are Internet pathogens successful? Shared vulnerabilities Vulnerability: design or implementation flaw in a software system

Survivable data Replicate data Informed replication

Replica sets based on shared vulnerabilities

How do we identify sets of vulnerabilities? Common software systems Leverage Internet diversity

Page 6: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

6

ChallengesChallenges

Understand the limitations Appropriate settings Quantify diversity

Searching for replica sets Scalable Balance load Small replica sets

Page 7: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

7

System modelSystem model

A set of hosts (H) A host fails by losing its state A set of attributes (A)

Attribute = software system Operating systems + Applications

Configuration One operating system Applications

A set of configurations ( )

AC 2 CHConf :

Attributes (Software systems)

Ho

sts

{ , , }

{ , , }

{ , , }

{ , , }

Co

nfig

ura

tion

s

Page 8: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

8

CoresCores

A set S H is a core iff:

Ideally A’ = A

Cores

minimal is

)(::':'

S

hConfaShAaAA

Ho

sts

{ , , }

{ , , }

{ , , }

{ , , }

Co

nfig

ura

tion

s

Page 9: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

9

Host diversityHost diversity

Diversity: distribution of configurations Skewed: not uniform

Study of the UCSD network nmap tool

Port scans: detect open ports OS fingerprinting: guess OS out of error messages

Total number of scanned devices: 11,963 2,963 general-purpose hosts (port data + OS)

Conservative assumptions Same open port = run the same service Ignore OS versions

Page 10: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

10

Top 10 operating systems and servicesTop 10 operating systems and services

OS Service

Windows 54.1% netbios-ssn 55.3%

Solaris 10.1% epmap 50.4%

Mac OS X 10.0% microsoft-ds 39.0%

Linux 10.0% sshd 30.7%

Mac OS 6.9% sunrpc 25.3%

FreeBSD 2.2% active directory 24.8%

IRIX 2.0% smtp 19.4%

HP-UX 1.1% httpd 18.0%

BSD/OS 0.9% ftpd 17.8%

Tru64 Unix 0.7% printer 15.6%

Page 11: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

11

Configuration distributionConfiguration distribution

Distribution is skewed 50% of hosts comprise:

All: 20% Multiple: 15% Top 100: 8%

Page 12: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

12

Visualizing diversityVisualizing diversity

Qualitative view More diversity across

operating systems Still a fair amount of

diversity for the same OS

Page 13: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

13

Searching for coresSearching for cores

What is the practical problem? Determine replica sets Our approach: find cores

Computing a core of optimal size is NP-complete Use heuristics Host as both client and server

Client: request cores Server: participates in cores

Core Host that requests it (original copy) Replicas

Page 14: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

14

Basic ideaBasic idea

Configuration

{ , , }

Configuration

{ , , }

Configuration

{ , , }

Configuration

{ , , }

Attributes (Software systems)

or

Possible cores

Page 15: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

15

Representing advertised configurationsRepresenting advertised configurations

Container abstraction Containers (B)

One for each operating system in A

Each container b B has a set SB(b) of sub-containers, one for each non-OS attribute in A

A host h advertises its configuration by associating itself with every sub-container s SB(b) b is the container for the OS of h s is the sub-container in SB(b) for some attribute of h

Page 16: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

16

Container abstractionContainer abstraction

{ , , }

{ , , }{ , , }

{ , , }

Page 17: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

17

HeuristicsHeuristics

Random Ignore configurations Choose randomly a number n of hosts from H

UniformI. Different OS

1. Choose a container b randomly

2. Choose a sub-container sb randomly from b

3. Choose a host randomly from sb

II. Same OS (same b where h is placed)1. Choose a sub-container sb randomly from b

2. Choose a host randomly from sb

Weighted: containers weighted by the number of hosts Doubly-weighted: sub-containers also weighted

Page 18: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

18

SimulationsSimulations

Population: 2,963 general-purpose hosts One run: Each host computes a core Questions

How much replication is needed? How many other hosts a particular host has to service? How well chosen cores protect hosts?

Metrics Average core size (core size)

Core size averaged across all the hosts

Maximum load (load) Maximum number of other hosts that any host services

Average coverage (coverage) Coverage: percentage of attributes covered in a core

Page 19: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

19

A sample runA sample run

Random Better load balance Worse coverage Worse core size

Load is too high for other heuristics Proposed modification

Limit the load of each host Intuition: force load balance Each host services at most L other hosts L = load limit or simply limit

Core size

Coverage Load

Random 5 0.977 12

Uniform 2.56 0.9997 284

Weighted 2.64 0.9995 84

DWeighted 2.58 0.9997 91

Page 20: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

20

Core sizeCore size

Random increases linearly with load Intrinsic to the

heuristic

Other heuristics Core size less than 3 For many hosts, one

single replica

Page 21: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

21

CoverageCoverage

Lower bound on limit: 2 Dependent on the diversity

Uniform: limit at least 3 to achieve 3 nines coverage

Weighted: achieves 3 nines coverage for limit values at least 2

Random: core size at least 9 to achieve same coverage

Page 22: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

22

Uncovered hostsUncovered hosts

Share of hosts that are not fully covered is small Uniform

Limit 3: slightly over 1% Limit > 4: around 0.5%

Weighted Around 0.5%

Random Core size greater than 8 to achieve similar results

Page 23: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

23

Load varianceLoad variance

Downside of uniform Worst variance

Variance is similar for small values of limit

Load limit forces better distribution

Page 24: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

24

Summary of simulation resultsSummary of simulation results

How many replicas are needed? Around 1.3 on average

How many other hosts a particular host has to service? Uniform: 3 for good coverage Weighted: 2 for good coverage

How well chosen cores protect hosts? Uniform: coverage greater than 0.999, L 3 Weighted: coverage greater than 0.999, L 2

Uniform heuristic Simpler

Weighted heuristics Better load balance

Page 25: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

25

Translating to real pathogensTranslating to real pathogens

Uniform, limit > 3, tolerates with high probability attacks to a single attribute

Previous worms One or more vulnerabilities on a single platform

Our approach tolerates Attacks to vulnerabilities on the same software system, possibly

cross-platform Attacks to vulnerabilities on different software systems in the same

platform

Attacks to vulnerabilities on different software systems, cross-platform Extensible approach

Page 26: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

26

Exploits on Exploits on kk attributes attributes

Illustrate with k=2 A variant of uniform

1. Client c chooses a host h with different OS

2. Find a core for c using uniform

3. Find a core for h using uniform

4. Combine the 2 cores to form a 2-resilient core

L 2-cov 1-cov Core size

5 0.76 0.86 4.18

6 0.86 0.92 4.58

7 0.95 0.99 5.00

8 0.97 1.00 5.11

9 0.98 1.00 5.16

10 0.98 1.00 5.17

Page 27: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

27

The Phoenix Recovery SystemThe Phoenix Recovery System

Backup data on cores Requirement: set of

operating systems and applications is not known

Macedon framework Pastry DHT

Advertising configurations Container Zone Sub-container Sub-zone

OS hint lists Empty zones Doesn’t need to be accurate

Page 28: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

28

ProtocolProtocol

Client Server ClientServer

Request

Reply

Data

Announce

Announce

Restorerequest

Restore

Backupmode

Recoverymode

Page 29: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

29

Security in PhoenixSecurity in Phoenix

Using security primitives Security goals

Data privacy: no host other than the owner of the data can obtain any partial information from the data stored on a server host

Data integrity: any tampering of the backup data should be detectable by the client host

Data availability: if a client stores data in an honest server, then it is eventually able to recover its data

Two modes Basic: software libraries Enhanced: requires devices such as smartcards

Cannot prevent servers from acting maliciously Proofs of operations

Page 30: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

30

Prototype evaluationPrototype evaluation

On PlanetLab Total number of hosts: 63

62 PlanetLab hosts 1 UCSD host

Configurations manually set 63 randomly chosen out of the 2,963

Page 31: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

31

Evaluation resultsEvaluation results

Simulated attack Parameters

Backup file: 5MB L = 3 Interval between

announcements: 120s

Target: Windows hosts (60%) Caused hosts to crash almost

simultaneously All hosts recovered

For 35: avg 100s For 3: several minutes

(transient network failures)

L Core size Coverage Load var.

Imp. Sim. Imp. Sim. Imp. Sim.

3 2.12 2.22 1 1 1.65 1.94

5 2.10 2.23 1 1 2.88 2.72

7 2.10 2.22 1 1 4.44 3.33

Imp. = implementation

Sim. = simulation

Page 32: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

32

ConclusionsConclusions

Informed replication Replica sets based on attributes Internet catastrophes: software systems

Survivable data at a low replication cost Core size is less than 3 on average Hosts service at most 3 other hosts

Diversity study Approach is realistic

Side-effects of load limit scheme Upper bounds the amount of work any host has to accomplish Constrain damage in case of individual malicious behavior

Page 33: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

33

Future workFuture work

Real deployment Tune current prototype Security features Cope with real threats

More data sets to determine diversity Mechanism to monitor resource usage Informed replication

With other approaches for cooperative backup With other types of attributes

E.g. Resource utilization

Page 34: Surviving Internet Catastrophes Flavio Junqueira, Alejandro Hevia, Ranjita Bhagwan, Keith Marzullo, and Geoffrey M. Voelker Hot Topics in Operating Systems.

34

END