Minimize latencyEliminate redundant costOptimize utilization of data center user wants lower latency...

Volley: Automated Data Placement for Geo-Distributed Cloud Services



Volley: Automated Data Placement for Geo-Distributed Cloud Services


Why is data placement important?

Minimize latency

Eliminate redundant cost

Optimize utilization of data center

• Users want lower latency

• Cloud service operators want to limit cost

• Both goals depend on how data is partitioned across DCs


Live Messenger and Live Mesh

• Traces cover all users and devices that accessed these services over an entire month

• Clients are identified by application-level unique identifiers

Commercial cloud service trace analysis


Challenges of data placement

Geographic Diversity

Data Sharing

Data Inter-dependency

Data Center Capacity

Client Mobility


Challenge: Geographic Diversity


Challenge: Data Sharing


Data inter-dependency in Live Mesh

Challenge: Data Inter-dependency


The rush in industry to build additional datacenters is motivated in part by individual datacenters reaching their capacity constraints as new users are added. This in turn requires automatic mechanisms that rapidly migrate application data to new datacenters to take advantage of their capacity.

Challenge: Datacenter Capacity


Challenge: User Mobility


Proven algorithms do not apply to this problem


Volley


Three phases

Volley Algorithm

Compute Initial Placement

Iteratively Move Data to Reduce Latency

Iteratively Collapse Data to Datacenters
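The third phase (collapsing data onto datacenters) can be illustrated with a minimal sketch. It assumes a one-pass greedy simplification: each item goes to the nearest datacenter that still has room, and the busiest items choose first so that the least-accessed items are the ones spilled to the next closest datacenter. The function names, the per-item capacity unit, and the datacenter coordinates are hypothetical and are not taken from the slides.

```python
import math

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def collapse(item_locations, access_counts, datacenters, capacity):
    """Map each item to the nearest datacenter that still has free capacity.

    item_locations: item -> (lat, lon) suggested by the earlier phases
    access_counts:  item -> number of accesses (used to decide who spills)
    datacenters:    name -> (lat, lon)
    capacity:       name -> maximum number of items (assumed unit)
    """
    load = {dc: 0 for dc in datacenters}
    placement = {}
    # Most-accessed items are placed first, so rarely used items get spilled.
    for item in sorted(item_locations, key=lambda i: -access_counts.get(i, 0)):
        ranked = sorted(
            datacenters,
            key=lambda dc: great_circle_km(item_locations[item], datacenters[dc]),
        )
        for dc in ranked:
            if load[dc] < capacity[dc]:
                placement[item] = dc
                load[dc] += 1
                break
        else:
            raise ValueError("total datacenter capacity exceeded")
    return placement
```

With two items near the same datacenter and room for only one, the less frequently accessed item ends up in the next closest datacenter.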


• Common IP: put data close to the IP address that accesses it most frequently

• oneDC: put all data in one datacenter

• Hash: randomly allocate data

• Volley

Data placement heuristics
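The three baseline heuristics are simple enough to sketch directly. The following is a minimal illustration, assuming a hypothetical table of datacenter coordinates and assuming that each access has already been geolocated from its IP address to a (lat, lon) pair. The random allocation of Hash is modelled here with a deterministic hash of the item ID.

```python
import hashlib
import math
from collections import Counter

# Hypothetical datacenters: name -> (latitude, longitude) in degrees.
DATACENTERS = {
    "us-west": (47.6, -122.3),
    "europe": (53.3, -6.3),
    "asia": (1.35, 103.8),
}

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def one_dc(item_id, dc="us-west"):
    """oneDC: every item goes to the same datacenter."""
    return dc

def hash_placement(item_id):
    """Hash: spread items pseudo-randomly, ignoring where they are accessed from."""
    names = sorted(DATACENTERS)
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    return names[int.from_bytes(digest[:8], "big") % len(names)]

def common_ip(access_locations):
    """Common IP: pick the datacenter nearest the most frequent access location."""
    location, _count = Counter(access_locations).most_common(1)[0]
    return min(DATACENTERS, key=lambda n: great_circle_km(location, DATACENTERS[n]))
```

An item accessed twice from Paris and once from Tokyo would be placed in the `europe` datacenter by `common_ip`, while `hash_placement` ignores access locations entirely.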


Capacity Skew

Inter-Datacenter Traffic

Latency

Evaluation

Metrics
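The slides do not spell out how the three metrics are computed, so the sketch below is one plausible reading. It assumes that capacity skew is the ratio of the most-loaded to the least-loaded datacenter, that inter-datacenter traffic is the fraction of item-to-item messages crossing datacenters, and that latency is summarized by a nearest-rank percentile. All function names are hypothetical.

```python
import math
from collections import Counter

def capacity_skew(placement, datacenters):
    """Ratio of the most-loaded to the least-loaded datacenter, by item count."""
    load = Counter(placement.values())
    counts = [load.get(dc, 0) for dc in datacenters]
    if min(counts) == 0:
        return float("inf")  # at least one datacenter holds nothing
    return max(counts) / min(counts)

def inter_dc_fraction(placement, messages):
    """Fraction of (item, item) messages whose endpoints are in different datacenters."""
    if not messages:
        return 0.0
    crossing = sum(1 for a, b in messages if placement[a] != placement[b])
    return crossing / len(messages)

def percentile(values, pct):
    """Nearest-rank percentile, e.g. pct=75 for the 75th percentile latency."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]
```

A lower value is better for all three: a skew of 1.0 means perfectly balanced datacenters, and a crossing fraction of 0.0 means no inter-datacenter traffic.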


Hash > Volley > Common IP > oneDC

Capacity Skew


oneDC > Volley > Common IP > Hash

Inter-datacenter Traffic


Volley > Common IP > oneDC > Hash

Latency


Capacity skew: Hash > Volley > Common IP > oneDC

Inter-DC traffic: oneDC > Volley > Common IP > Hash

Latency: Volley > Common IP > oneDC > Hash

Evaluation


Iteration count

• In phase 2, additional iterations do not bring significant improvement

• 5 iterations are enough

• Phase 3 determines the capacity skew

Re-computation

• Makes sense

• Reason: data migration

Improvement of Volley


Data placement is vital in cloud services

Volley has a comprehensive advantage: it simultaneously reduces user latency and operator cost

• reduces datacenter capacity skew by over 2X

• reduces inter-DC traffic by over 1.8X

• reduces user latency by 30% at the 75th percentile

• runs in under 16 clock-hours for 400 machine-hours of computation across 1 week of traces

Re-computation of the Volley algorithm is necessary

Conclusion


Limitations of the evaluation conducted by the paper

• No good baseline for contrast

• Can geo-distance stand for latency?

• Client mobility?

• Large space for development

Let’s go on...


Thank You!


Phase 1: Calculate the geographic centroid for each data item
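A weighted geographic centroid can be sketched as follows. This version swaps in a common approximation: it averages the access locations as 3D unit vectors and projects the result back onto the sphere. It should be read as an illustration under that assumption, not as the exact procedure used by Volley. The function name and the input layout are hypothetical.

```python
import math

def weighted_centroid(points):
    """Weighted centroid on a sphere.

    points: iterable of (lat_deg, lon_deg, weight), where weight is, for
    example, the number of accesses observed from that location.
    Returns (lat_deg, lon_deg).
    """
    x = y = z = 0.0
    for lat, lon, weight in points:
        la, lo = math.radians(lat), math.radians(lon)
        # Accumulate the weighted unit vector of each access location.
        x += weight * math.cos(la) * math.cos(lo)
        y += weight * math.cos(la) * math.sin(lo)
        z += weight * math.sin(la)
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("centroid is undefined for perfectly opposed points")
    # Project the averaged vector back onto the sphere.
    sin_lat = max(-1.0, min(1.0, z / norm))
    return math.degrees(math.asin(sin_lat)), math.degrees(math.atan2(y, x))
```

Working in 3D avoids the wrap-around problem that a plain average of longitudes has near the ±180° meridian.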


Phase 2: Refine the centroid for each data item iteratively

• considering client locations and data inter-dependencies

• using a weighted spring model that attracts data items, but on a spherical coordinate system
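The spring idea can be sketched as an iterative update in which each data item is pulled toward the clients and other data items it communicates with, in proportion to their weights. The version below is a simplified illustration: it applies the pull in 3D vector space and re-projects onto the sphere, and it treats the attractors as fixed, whereas in a full run the positions of dependent data items would also move between iterations. The `rate` parameter and all function names are assumptions, not values from the slides.

```python
import math

def to_vec(lat, lon):
    """Convert (lat, lon) in degrees to a 3D unit vector."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))

def to_latlon(v):
    """Project a 3D vector back onto the sphere as (lat, lon) in degrees."""
    x, y, z = v
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0:
        raise ValueError("cannot project the zero vector onto the sphere")
    sin_lat = max(-1.0, min(1.0, z / norm))
    return math.degrees(math.asin(sin_lat)), math.degrees(math.atan2(y, x))

def spring_step(position, attractors, rate=0.5):
    """Move position part of the way toward the weighted pull of its attractors.

    position:   (lat, lon) of the data item
    attractors: list of ((lat, lon), weight) for clients and dependent items
    rate:       assumed step size in (0, 1]
    """
    total = sum(weight for _, weight in attractors)
    if total <= 0:
        return position
    p = to_vec(*position)
    pull = [0.0, 0.0, 0.0]
    for location, weight in attractors:
        a = to_vec(*location)
        for i in range(3):
            pull[i] += (weight / total) * (a[i] - p[i])
    return to_latlon(tuple(p[i] + rate * pull[i] for i in range(3)))

def refine(position, attractors, iterations=5, rate=0.5):
    """Repeat the spring step a fixed number of times."""
    for _ in range(iterations):
        position = spring_step(position, attractors, rate)
    return position
```

With a single attractor, each step halves the remaining angle to it, so the item converges on that attractor within a handful of iterations.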
