Heading Off Correlated Failures through Independence-as-a-Service
Ennan Zhai1
Ruichuan Chen2, David Isaac Wolinsky1, Bryan Ford1
1 Yale University 2 Bell Labs/Alcatel-Lucent
• Cloud services ensure reliability by redundancy:- Amazon S3 replicates data on multiple racks- iCloud rents EC2 and Azure redundantly
Background
• Cloud services ensure reliability by redundancy:- Amazon S3 replicates data on multiple racks- iCloud rents EC2 and Azure redundantly
Background
• Cloud services ensure reliability by redundancy:- Amazon S3 replicates data on multiple racks- iCloud rents EC2 and Azure redundantly
Background
• Cloud services ensure reliability by redundancy:- Amazon S3 replicates data on multiple racks- iCloud rents EC2 and Azure redundantly
Background
Unexpected common dependencies!
Service Outage Losses
Data Center Outages Generate Big LossesDowntime in a data center can cost an average of $505,500 per incident, according to a Ponemon Institute study.
Analytics Slideshow: 2010 Data Center
Operational Trends Report
Service Outage Losses
Data Center Outages Generate Big LossesDowntime in a data center can cost an average of $505,500 per incident, according to a Ponemon Institute study.
Analytics Slideshow: 2010 Data Center
Operational Trends Report
What is correlated failure?
Rack1
Switch1
Rack2
Switch2
Rack3
Switch3
What is correlated failure?
Rack1
Switch1
Rack2
Switch2
Rack3
Switch3
Primary Backup Backup
What is correlated failure?
Agg Switch
Rack1
Switch1
Rack2
Switch2
Rack3
Switch3
Primary Backup Backup
What is correlated failure?
Agg Switch
Rack1
Switch1
Rack2
Switch2
Rack3
Switch3
Primary Backup Backup
What is correlated failure?
Agg Switch
Rack1
Switch1
Rack2
Switch2
Rack3
Switch3
Primary Backup Backup
What is correlated failure?
Realistic Example
Summary of the October 22, 2012 AWS Service Event in the US-East Region
We’d like to share more about the service event that occurred on Monday, October 22nd in the US-East Region. We have now completed the analysis of the events that affected AWS customers, and we want to describe what happened, our understanding of how customers were affected, and what we are doing to prevent a similar issue from occurring in the future.
The Primary Event and the Impact to Amazon Elastic Block Store (EBS) and Amazon Elastic Compute Cloud (EC2)
Realistic Example
Correlated failures resulting from EBSdue to bugs in one EBS server
Elastic Compute Cloud (EC2)
Elastic Block Store (EBS)
Realistic Example
... ...
Elastic Block Store (EBS)
Realistic Example
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
Elastic Block Store (EBS)
Realistic Example
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Example
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Example
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Example
... ...... ...
VM1 VM2 VM1 VM3 VM1 VM2 VM3 VM4
EBS Server2EBS Server1
Realistic Example
Even Worse
Video App
Cloud Provider A Cloud Provider B
Video App
Cloud Provider A Cloud Provider B
Third-party infrastructure components
Video App
Cloud Provider A Cloud Provider B
Third-party infrastructure components
ISP Router BISP Router A ISP Router C
Video App
Cloud Provider A Cloud Provider B
Third-party infrastructure components
Power Source
ISP Router BISP Router A ISP Router C
Video App
Cloud Provider A Cloud Provider B
Third-party infrastructure components
Power Source
ISP Router BISP Router A ISP Router C
Video App
Cloud Provider A Cloud Provider B
Third-party infrastructure components
Power Source
ISP Router BISP Router A ISP Router C
Video App
Cloud Provider A Cloud Provider B
Third-party infrastructure components
Power Source
ISP Router BISP Router A ISP Router C
Cloud providers do not usually share
information about their dependencies
• Cloud providers allocate or tolerate failures via: - diagnosis systems;- fault-tolerant systems.
Existing Efforts
• Solving the problem after outage occurs.
• We want to prevent the problem before the outage occurs.
Existing Efforts
• Cloud providers allocate or tolerate failures via: - diagnosis systems;- fault-tolerant systems.
• Solving the problem after outage occurs.
• Prevent correlated failures before outage occurs.
Existing Efforts
• Cloud providers allocate or tolerate failures via: - diagnosis systems;- fault-tolerant systems.
• Solving the problem after outage occurs.
• Prevent correlated failures before outage occurs.
Existing Efforts
Independence-as-a-Service(INDaaS)
• Cloud providers allocate or tolerate failures via: - diagnosis systems;- fault-tolerant systems.
INDaaS
INDaaS Workflow
Service Provider, Alice
A Given Redundancy Configuration
INDaaS
INDaaS Workflow
Two-Way Redundancy Configuration
Service Provider, Alice
DependencyData Source1
DependencyData Source2
INDaaS
INDaaS Workflow
Service Provider, Alice
DependencyData Source1
DependencyData Source2
Two-Way Redundancy Configuration
Independence of this two-way redundancy ?
INDaaS
Step1: Specification Submission
DependencyData Source1
DependencyData Source2
INDaaS Workflow
Service Provider, Alice
INDaaS
Step1
DependencyData Source1
DependencyData Source2
Step2: Dependency data collection
Step2: Dependency data collection
INDaaS Workflow
Service Provider, Alice
INDaaS
Step1
Step2 Step2
DependencyData Source1
DependencyData Source2
Step3: Independence Evaluation
INDaaS Workflow
Service Provider, Alice
INDaaS
Step1
Relative Independence is 0.3
Step2 Step2
Step3
DependencyData Source1
DependencyData Source2
INDaaS Workflow
Service Provider, Alice
INDaaS
Step1
Step2 Step2
Multiple Cloud Providers
Step3
DependencyData Source1
DependencyData Source2
Service Provider, Alice
INDaaS
Step1
Step2 Step2Unwilling to share the
dependency dataUnwilling to share the
dependency data
Step3
Multiple Cloud Providers
✘ ✘
Data Source1(Cloud1)
Data Source2(Cloud2)
Service Provider, Alice
INDaaS
Step1
Step2 Step2
Private Independence Evaluation
Data Source1(Cloud1)
Data Source2(Cloud2)
Step3:Private auditing
Service Provider, Alice
INDaaS
Step1
Step2 Step2
Step3Step4 St
ep4
Service Provider, Alice
Private Independence Evaluation
Data Source1(Cloud1)
Data Source2(Cloud2)
INDaaS
Step1
Step2 Step2
Step3
Service Provider, Alice
Private Independence Evaluation
Only know my information
Only know my information
Data Source1(Cloud1)
Data Source2(Cloud2)
Step4 Step
4
INDaaS
Step1
Step2 Step2
Step3
Only the relative independence
Service Provider, Alice
Private Independence Evaluation
Data Source1(Cloud1)
Data Source2(Cloud2)
Step4 Step
4
INDaaS
Step1
Step2 Step2
Step3
Step5
Service Provider, Alice
Private Independence Evaluation
Data Source1(Cloud1)
Data Source2(Cloud2)
Step4 Step
4
Technical Challenges
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #1: Dependency collections- Solution: Reusing existing tools
Technical Challenges
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #2: Dependency representation- Solution: Fault graphs
• #1: Dependency collections- Solution: Reusing existing tools
Technical Challenges
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #2: Dependency representation- Solution: Fault graphs
• #1: Dependency collections- Solution: Reusing existing tools
Technical Challenges
• #3: Efficient auditing- Solution: Failure sampling algorithm
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
Technical Challenges
• #4: Private independence audit- Solution: Private Jaccard similarity
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #2: Dependency representation- Solution: Fault graphs
INDaaS
Step1 Step5
Step3
• #4: Private independence audit- Solution: Private Jaccard similarity
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #2: Dependency representation- Solution: Fault graphs
INDaaS
Step1 Step5
Step3
• #4: Private independence audit- Solution: Private Jaccard similarity
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #2: Dependency representation- Solution: Fault graphs
Dependency Data Collections
Type Dependency Expression
Network <src=”S” dst=”D” route=”x,y,z”/>
Hardware <hw=”H” type=”T” dep=”x”/>
Software <pgm=”S” hw=”H” dep=”x,y,z”/>
Our defined format
• Reuse existing data collection tools: - Convert the outputs to uniform format. - Three types of format: NET, HW and SW.
Please see our paper for more details
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #4: Private independence audit- Solution: Private Jaccard similarity
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #4: Private independence audit- Solution: Private Jaccard similarity
Example Redundancy
Example Redundancy
SW
HW
NET
Example Redundancy
Building Fault Graph Top-to-Bottom
SW
HW
NET
Redundancy configuration fails
Step1: Root Node
Server 2 failsServer 1 fails
Redundancy configuration fails
Step2: Server Nodes
AND gate: all the sublayer nodes fail, the upper layer node fails
Server 2 failsServer 1 fails
Step2: Server NodesRedundancy configuration fails
Net fails
+" +"
HW fails SW fails SW fails
Server 2 failsServer 1 fails
Net fails HW fails
Redundancy configuration fails
Step3: Dependency Nodes
OR gate: one of the sublayer nodes fails, the upper layer node fails
Net fails
+" +"
HW fails SW fails SW fails
Server 2 failsServer 1 fails
Net fails HW fails
Step3: Dependency NodesRedundancy configuration fails
Net fails
+" +"
HW fails
Disk2CPU2
+"
SW fails SW fails
Server 2 failsServer 1 fails
Disk1CPU1
+"
Net fails HW fails
Step4: Hardware DependencyRedundancy configuration fails
Net fails
+" +"
ToR1 Core2Core1
HW fails
+" +"
Disk2CPU2
+"
Path1 Path2
SW fails SW fails
Server 2 failsServer 1 fails
Disk1CPU1
+"
Net fails HW fails
Step5: Network DependencyRedundancy configuration fails
Net fails
+" +"
ToR1 Core2Core1
HW fails
libc6 libccllibsvnl
+" +"
Disk2CPU2
+"
Path1 Path2
+"
Riak
+"
Query
+"
SW fails SW fails
Server 2 failsServer 1 fails
Disk1CPU1
+"
Net fails HW fails
Step6: Software DependencyRedundancy configuration fails
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #4: Private independence audit- Solution: Private Jaccard similarity
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #4: Private independence audit- Solution: Private Jaccard similarity
• Two algorithms balancing cost and accuracy: - Minimal fault set algorithm- Failure sampling algorithm
Efficient Auditing
• Two algorithms balancing cost and accuracy: - Minimal fault set algorithm- Failure sampling algorithm
Efficient Auditing
Minimal Fault Set Algorithm
• Traditional algorithm in safety engineering- Exponential complexity (NP-hard)
• We are the first to apply it in Cloud area:- Analyzing a fat tree with 30,528 with ~40 hours
• We propose efficient failure sampling algorithm.
Minimal Fault Set Algorithm
• Traditional algorithm in safety engineer- Exponential complexity (NP-hard)
• We are the first to apply it in Cloud area:- Analyzing a fat tree with 30,528 with ~40 hours
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
Failure Sampling Algorithm
1 or 0 1 or 0 1 or 0
Failure Sampling Algorithm
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
1 or 0 ?
1 or 0 1 or 0 1 or 0
Failure Sampling Algorithm
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
Fault Sets
∅
1 or 0 1 or 0 1 or 0
Failure Sampling Algorithm
1 or 0 ?Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
The 1st Sampling Round
Fault Sets
∅
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
1 or 0 1 or 0 1 or 0
The 1st Sampling Round
Fault Sets
∅
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘ ✘
The 1st Sampling Round
Fault Sets
∅
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘ ✘
✘
The 1st Sampling Round
Fault Sets
∅
✘ ✘
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘ ✘Fault Sets
{Server1’s HW, Server2’s HW}
The 1st Sampling Round
✘
✘ ✘
Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
1 or 0 1 or 0 1 or 0
Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔ ✘✔Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔ ✘✔
✔
Fault Sets
{Server1’s HW, Server2’s HW}
The 2nd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔ ✘
Fault Sets
{Server1’s HW, Server2’s HW}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘✔Fault Sets
{Server1’s HW, Server2’s HW}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✔✘✔Fault Sets
{Server1’s HW, Server2’s HW}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✘
✘ ✘
✔✘✔Fault Sets
{Server1’s HW, Server2’s HW}
{Switch1}
The 3rd Sampling Round
Redundancy configuration fails
+" +"
Switch1 fails
Server 2 failsServer 1 fails
Server2’s HW failsServer1’s HW fails
✘
✘ ✘
Fault Sets{Server1’s HW, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
... ...
After Many (e.g., 107) Rounds
Fault Sets{Server1’s HW, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
{Switch1}
{Switch1, Server2’s HW}
... ...
Size-Based Ranking
Fault Sets{Switch1}
{Switch1}
{Switch1, Server2’s HW}
{Switch1, Server2’s HW}
{Server1’s HW, Server2’s HW}
... ...
Size-Based Ranking
• Multiple equations for option: - summation of sizes- weighted average of sizes
Independence Evaluation
Fault Sets{Switch1}
{Switch1}
{Switch1, Server2’s HW}
{Switch1, Server2’s HW}
{Server1’s HW, Server2’s HW}
... ...
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #4: Private independence audit- Solution: Private Jaccard similarity
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #4: Private independence audit- Solution: Private Jaccard similarity
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
Data Source1(Cloud1)
Data Source2(Cloud2)
• #4: Private independence audit- Solution: Private Jaccard similarity
Cloud A Cloud B Cloud C
INDaaS AgentService Provider
Cloud A Cloud B Cloud C
INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent
Trusted Third-Party
ISP A Power B ISP B Power CPower A
Service Provider
Service Provider
Cloud A Cloud B Cloud C
Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent
Trusted Third-Party
Cloud providers are reluctant to share this information!
ISP A Power B ISP B Power CPower A
Cloud A Cloud B Cloud C
Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent
Secure Multiparty Computation (SMPC)
SMPC
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent
Secure Multiparty Computation (SMPC)
SMPCSMPC is hard to scale![Xiao et al. CCSW’13]
ISP A Power B ISP B Power CPower A
Cloud A Cloud B Cloud C
Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
Evaluating independence by the dataset similarity between clouds
INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS AgentService Provider
INDaaS Agent
Cloud A Cloud B Cloud C
App Provider Auditor
ISP APower APower B
ISP BPower APower B
ISP BPower C
Using Jaccard similarity to evaluate the independence of each redundancy configuration.
ISP A Power B ISP B Power CPower A
INDaaS Agent
Cloud A Cloud B Cloud C
App Provider
ISP APower APower B
ISP BPower APower B
ISP BPower C
S1 S2 ... ... Sn
S1 S2 ... ... Sn
J(S1, S2, ..., Sn) = | |
| |
ISP A Power B ISP B Power CPower A
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS AgentService Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
=2=4
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
=2=4
J = 2/4
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
INDaaS Agent
Deployment SimCloud A&B 0.5
ISP A Power B ISP B Power CPower A
=2=4
J = 2/4
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
INDaaS Agent
Deployment SimCloud A&B 0.5Cloud B&C 0.25
ISP A Power B ISP B Power CPower A
=1=4
J = 1/4
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0
=0 =5 J = 0/5,,
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0
=0 =5 ,,
0 means fully independentService Provider
J = 0/5
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0
=0 =5 J = 0/5,,
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
Deployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5
=0 =5 J = 0/5,,
Service Provider
Cloud A Cloud B Cloud C
INDaaS Agent
Deployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5
ISP A Power B ISP B Power CPower A
Service Provider
S1 S2 ... ... Sn
S1 S2 ... ... Sn
J(S1, S2, ..., Sn) = | |
| |
P-SOP [Vaidya et al. JCS05]
• We apply Private Set Operation Protocol (P-SOP): - Private set intersection cardinality.- Private set union cardinality.
P-SOP [Vaidya et al. JCS05]
• Allow k parties to compute both intersection and union cardinalities without learning other information.
11
3
10
1
5
20
3
7
3
P-SOP [Vaidya et al. JCS05]
• Allow k parties to compute both intersection and union cardinalities without learning other information.
11
3
10
1
5
20
3
7
3
P-SOP
P-SOP [Vaidya et al. JCS05]
• Allow k parties to compute both intersection and union cardinalities without learning other information.
11
3
10
1
5
20
3
7
3
P-SOP
=1, =7=1, =7
=1, =7
P-SOP [Vaidya et al. JCS05]
• Allow k parties to compute both intersection and union cardinalities without learning other information.
11
3
10
1
5
20
3
7
3
Protocol
But I do not know which elements are overlapping/union.
P-SOP
But I do not know which elements are overlapping/union.
But I do not know which elements are overlapping/union.
P-SOP [Vaidya et al. JCS05]
• Allow k parties to compute both intersection and union cardinalities without learning other information.
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
37
15
203Kd
Kf
Ke
113
10
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
37
15
203Kd
Kf
Ke
113
10
P-SOP [Vaidya et al. JCS05]
37
15
203Kd
Kf
Ke
113
10
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
37
15
203Kd
Kf
Ke
113
10
Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))
Ee(Ed(Ef(3)))Ee(Ed(Ef(7)))
Ed(Ef(Ee(3)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))
Ed(Ef(Ee(20)))
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
Kd
Kf
Ke
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
113
10
Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))
37
Ee(Ed(Ef(3)))Ee(Ed(Ef(7)))
15
203
Ed(Ef(Ee(3)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))
Ed(Ef(Ee(20)))
Kd
Kf
Ke
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
113
10
Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))
37
Ee(Ed(Ef(3)))Ee(Ed(Ef(7)))
15
203
Ed(Ef(Ee(3)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))
Ed(Ef(Ee(20)))
Kd
Kf
Ke
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
113
10
Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))
37
Ee(Ed(Ef(3)))Ee(Ed(Ef(7)))
15
203
Ed(Ef(Ee(3)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))
Ed(Ef(Ee(20)))
Ef(Ee(Ed(3)))
Kd
Kf
Ke
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
11Ef(Ee(Ed(3)))Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))
3Ee(Ed(Ef(3)))
Ee(Ed(Ef(7)))
1Ed(Ef(Ee(3)))
Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))
Ed(Ef(Ee(20)))
Ef(Ee(Ed(3)))
Kd
Kf
Ke
P-SOP [Vaidya et al. JCS05]
Each party maintains a commutative encryption key
Commutative encryption holds: Ex(Ey(m)) = Ey(Ex(m))
Ef(Ee(Ed(10)))Ef(Ee(Ed(11)))Ee(Ed(Ef(7)))Ed(Ef(Ee(5)))Ed(Ef(Ee(1)))
Ed(Ef(Ee(20)))
Ef(Ee(Ed(3)))
7
Private Independence Evaluation
Cloud A Cloud B Cloud C
Select two clouds for redundancy: A&B? B&C? or A&C? INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
INDaaS Agent
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS AgentService Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
P-SOP
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
=2
ISP A Power B ISP B Power CPower A
INDaaS AgentService Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
=2=4
ISP A Power B ISP B Power CPower A
INDaaS AgentService Provider
Cloud A Cloud B Cloud C
ISP BPower C
=2=4
J = 2/4
ISP A Power B ISP B Power CPower A
INDaaS Agent
ISP APower APower B
ISP BPower APower B
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
=2=4
J = 2/4
ISP A Power B ISP B Power CPower A
INDaaS Agent
Deployment SimCloud A&B 0.5
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
P-SOP
Deployment SimCloud A&B 0.5
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP A Power B ISP B Power CPower A
INDaaS Agent
ISP BPower APower B
ISP BPower C
=1=4
J = 1/4
Deployment SimCloud A&B 0.5
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP A Power B ISP B Power CPower A
INDaaS Agent
ISP BPower APower B
ISP BPower C
=1=4
J = 1/4
Deployment SimCloud A&B 0.5Cloud B&C 0.25
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
P-SOP
Deployment SimCloud A&B 0.5Cloud B&C 0.25
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
=0 =5 J = 0/5,,
Deployment SimCloud A&B 0.5Cloud B&C 0.25
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
=0 =5 J = 0/5,,
Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
=0 =5 J = 0/5,,
Deployment SimCloud A&B 0.5Cloud B&C 0.25Cloud A&C 0
Service Provider
Cloud A Cloud B Cloud C
ISP APower APower B
ISP BPower C
ISP A Power B ISP B Power CPower A
INDaaS Agent
=0 =5 J = 0/5,,
Deployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5
Service Provider
Cloud A Cloud B Cloud C
INDaaS Agent
Deployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5
ISP A Power B ISP B Power CPower A
Service Provider
Cloud A Cloud B Cloud C
INDaaS AgentDeployment SimCloud A&C 0Cloud B&C 0.25Cloud A&B 0.5
ISP A Power B ISP B Power CPower A
Service Provider
• #2: Dependency representation- Solution: Fault graphs
• #3: Efficient auditing- Solution: Failure sampling algorithm
• #1: Dependency collections- Solution: Reusing existing tools
RoadMap
INDaaS Agent
Step1 Step5
Step3
Step2
Step
4
Step2
Step4
DependencyData Source1
DependencyData Source2
• #4: Private independence audit- Solution: Private Jaccard similarity
Evaluation
Evaluation
• Three realistic case studies.• Tradeoff between auditing algorithms• Overhead of P-SOP protocol
Evaluation
• Three realistic case studies.• Tradeoff between auditing algorithms• Overhead of P-SOP protocol
Three Case Studies
• Common network dependency
• Common hardware dependency
• Common software dependency
Please see our paper for more details
Evaluation
• Three realistic case studies.• Tradeoff between auditing algorithms• Overhead of P-SOP protocol
• We evaluate efficiency/accuracy tradeoff.• We generate topology based on fat tree model.• We also bu
Tradeoff Evaluation
Topology A Topology B Topology C
# of Core Routers 64 144 576
# of Agg Switches 128 288 1,152
# of ToR Switches 128 288 1,152
# of Servers 1,024 3,456 27,648
Total # of devices 1,344 4,176 30,528
Tradeoff Evaluation
Tradeoff Evaluation
Topology A Topology B Topology C
# of Core Routers 64 144 576
# of Agg Switches 128 288 1,152
# of ToR Switches 128 288 1,152
# of Servers 1,024 3,456 27,648
Total # of devices 1,344 4,176 30,528
Topology C: 30,528 Devices
30
40
50
60
70
80
90
100
1 2 4 8 16 32 64 128 256 512 2048% c
ritic
al fa
ult s
ets
dete
cted
Computational time (minutes)
Minimal Fault Set AlgorithmFailure Sampling Algorithm
Failure Sampling Algorithm (106)
Failure Sampling Algorithm (104)
Minimal Fault Set Algorithm
Topology C: 30,528 Devices
30
40
50
60
70
80
90
100
1 2 4 8 16 32 64 128 256 512 2048% c
ritic
al fa
ult s
ets
dete
cted
Computational time (minutes)
Minimal Fault Set AlgorithmFailure Sampling Algorithm
Failure Sampling Algorithm (106)
Failure Sampling Algorithm (104)
Minimal Fault Set Algorithm
Evaluation
• Three realistic case studies.• Tradeoff between auditing algorithms• Overhead of P-SOP protocol
What we evaluate?
• Kissner and Song (KS) protocol for comparison.• Bandwidth overhead of P-SOP and KS.• Computational overhead of P-SOP and KS.
Bandwidth Overhead
0
50
100
150
200
1000 10000 100000
Tota
l tra
ffic
sent
(MB)
Number of elements in each provider’s dataset
KS (4)P-SOP (4)
KS (3)P-SOP (3)
KS (2)P-SOP (2)
Bandwidth Overhead
0
50
100
150
200
1000 10000 100000
Tota
l tra
ffic
sent
(MB)
Number of elements in each provider’s dataset
KS (4)P-SOP (4)
KS (3)P-SOP (3)
KS (2)P-SOP (2)
Bandwidth Overhead
0
50
100
150
200
1000 10000 100000
Tota
l tra
ffic
sent
(MB)
Number of elements in each provider’s dataset
KS (4)P-SOP (4)
KS (3)P-SOP (3)
KS (2)P-SOP (2)
~80MB
Computational Overhead
1
10
100
1000
10000
100000
1e+06
1000 10000 100000
Com
puta
tiona
l tim
e (s
econ
ds)
Number of elements in each provider’s dataset
KS (4)KS (3)KS (2)
P-SOP (4)P-SOP (3)P-SOP (2)
~103 sec
Conclusions
• INDaaS is a first step towards reliable clouds: - Dependency collections- Dependency representations- Efficient auditing- Private independence auditing
• We evaluated INDaas with three realistic case studies and large-scale simulations.
Thanks, questions?
• INDaaS: Heading off correlated failures in clouds
• Find out more at:- http://dedis.cs.yale.edu/cloud/
• We will be at the poster session tonight.
Top Related