Tempest: An Architecture for Scalable Time-Critical Services Mahesh Balakrishnan Amar Phanishayee Tudor Marian Professor Ken Birman

Page 1:

Tempest: An Architecture for Scalable Time-Critical Services

Mahesh Balakrishnan, Amar Phanishayee

Tudor Marian, Professor Ken Birman

Page 2:

Clusters of commodity computers used in mission-critical settings (commercial and military)

• Advantages: cost-effectiveness, incremental scalability and high availability
• Issues: failures, arbitrary load and network losses affect real-time guarantees

Page 3:

Tempest: Goal

• Provide programmers replicated data storage primitives
• Very fast average performance and good worst-case timing guarantees
• Easy deployment, monitoring and management of time-critical scalable services in a clustered environment

Page 4:

Tempest: Approach

• clone services for scalability, fault tolerance
• automate replica placement (service colocation)
• fine-grained data caching
• response time monitoring to detect service slowdown
• redundant querying for faster response
• UI to drag and drop services onto a cluster
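The "redundant querying for faster response" idea can be illustrated with a minimal sketch: send the same query to several replicas and use whichever answer arrives first. This is not Tempest's implementation. The function names `query_replica` and `redundant_query`, the thread-pool approach, and the simulated delays are all assumptions made for illustration.

```python
import concurrent.futures as cf
import random
import time


def query_replica(replica_id, key):
    """Hypothetical replica call: sleeps for a random delay, then answers."""
    time.sleep(random.uniform(0.001, 0.02))  # simulated service time
    return f"value-of-{key}@replica-{replica_id}"


def redundant_query(key, replica_ids, timeout_s=1.0, call=query_replica):
    """Send the same query to every replica and return the first success."""
    pool = cf.ThreadPoolExecutor(max_workers=len(replica_ids))
    futures = [pool.submit(call, rid, key) for rid in replica_ids]
    try:
        # as_completed yields futures in the order they finish, so the
        # fastest healthy replica determines the response time.
        for fut in cf.as_completed(futures, timeout=timeout_s):
            if fut.exception() is None:
                return fut.result()
        raise RuntimeError("all replicas failed")
    finally:
        # Do not block on the slower replicas; their answers are discarded.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(redundant_query("sensor-42", replica_ids=[0, 1, 2]))
```

The trade-off in this sketch is extra load: every query costs one request per replica, in exchange for a response time set by the fastest replica rather than a possibly slow one.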

Page 5:

Accomplishments

• Ricochet: Low-Latency Multicast for Scalable Time-Critical Services. Submitted to NSDI 2006 (Oct 2005)
• Scalable Services Architecture (SSA). Submitted to ICDCS (Nov 2005)

Page 6:

Ricochet vs SRM

[Chart: SRM Recovery. x-axis: Groups (1 to 128); y-axis: Microseconds (0.0E+00 to 1.4E+07); series: Average Recovery Delay, Average Discovery Delay]

[Chart: Ricochet Recovery. x-axis: Groups (2 to 1024); y-axis: Microseconds (0 to 50000); series: Average Recovery Delay]

• SRM’s discovery delay is the lower bound on recovery

• SRM’s recovery delay scales poorly with # of Groups (delay in seconds!)

• Ricochet scales well with # of Groups (~14 ms in 1 group to 24 ms in 1024 groups)

SRM at 64 groups: 9 seconds

Ricochet at 64 groups: 16 ms!
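To put the Ricochet scaling figures quoted above in perspective, the short calculation below works out how much the average recovery delay grows between the two endpoints given on the slide. It uses only those two numbers (~14 ms and 24 ms). The "per doubling" figure is an average over the whole range and assumes nothing about the shape of the curve in between.

```python
import math

lat_1_group_ms = 14.0      # approximate value quoted on the slide ("~14ms")
lat_1024_groups_ms = 24.0  # value quoted on the slide
doublings = math.log2(1024 / 1)  # number of doublings from 1 to 1024 groups

# Overall growth in delay across a 1024x increase in group count.
growth_factor = lat_1024_groups_ms / lat_1_group_ms

# Average added delay per doubling of the group count (endpoints only).
ms_per_doubling = (lat_1024_groups_ms - lat_1_group_ms) / doublings

print(f"{growth_factor:.2f}x delay for 1024x groups")
print(f"{ms_per_doubling:.1f} ms added per doubling, on average")
```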

Page 7:

Ricochet vs SRM in 64 groups

[Chart: Histogram of SRM Recoveries (64 Groups). x-axis: Microseconds (0.0E+00 to 3.7E+07); y-axis: Percentage (0.00 to 14.00)]

[Chart: Histogram of Ricochet Recoveries (64 Groups). x-axis: Microseconds (2.8E+03 to 2.7E+05); y-axis: Percentage (0.00 to 30.00)]

SRM recovery is centered around 9 seconds; Ricochet around 15 milliseconds.

1-2 orders of magnitude! Improvement increases with number of groups.

Page 8:

Inconsistency Windows

[Chart: Histogram of Ricochet Recoveries (64 Groups). x-axis: Microseconds (1.3E+03 to 6.3E+04); y-axis: Percentage (0.00 to 70.00)]

Ricochet Replication: updates are reflected at all replicas within…

• 65% within 1.25 ms
• 90% within 18 ms
• 99% within 77 ms
• 100% within 125 ms
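Figures of the form "X% within T ms" are points on an empirical cumulative distribution of per-update delays. The sketch below shows one way such a summary could be computed from a list of measured delays. The function name `fraction_within` and the sample values are made up for illustration and are not the measurements behind the slide. Only the four thresholds are taken from it.

```python
from bisect import bisect_right


def fraction_within(samples_us, thresholds_us):
    """Return {threshold: fraction of samples <= threshold} (empirical CDF)."""
    ordered = sorted(samples_us)
    n = len(ordered)
    # bisect_right counts how many sorted samples are <= the threshold.
    return {t: bisect_right(ordered, t) / n for t in thresholds_us}


# Made-up delays in microseconds, for illustration only (not measured data).
samples = [900, 1100, 1200, 5000, 17000, 30000, 76000, 120000]

# Thresholds from the slide: 1.25 ms, 18 ms, 77 ms, 125 ms.
thresholds = [1250, 18000, 77000, 125000]

cdf = fraction_within(samples, thresholds)
for t in thresholds:
    print(f"{cdf[t]:.1%} within {t / 1000:g} ms")
```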