Minimize latency, eliminate redundant cost, optimize utilization of data center: user wants lower latency...

Post on 18-Dec-2015

Volley: Automated Data Placement for Geo-Distributed Cloud Services

Why is data placement important?

Minimize latency

Eliminate redundant cost

Optimize utilization of data center

• Users want lower latency

• Cloud service operators want to limit cost

• Data must be partitioned across datacenters (DCs)

Live Messenger and Live Mesh

• The traces cover all users and devices that accessed these services over one entire month

• Clients are identified by application-level unique identifiers

Commercial cloud service trace analysis

Challenge of data placement

Geographic Diversity

Data Sharing

Data Inter-dependency

Data Center Capacity

Client Mobility

Challenge: Geographic Diversity

Challenge: Data Sharing

Data inter-dependency in Live Mesh

Challenge: Data Inter-dependency

The rush in industry to build additional datacenters is motivated in part by reaching the capacity constraints of individual datacenters as new users are added. This in turn requires automatic mechanisms to rapidly migrate application data to new datacenters to take advantage of their capacity.

Challenge: Datacenter Capacity

Challenge: User Mobility

Proven algorithms do not apply to this problem

Volley

Three phases

Volley Algorithm

Compute Initial Placement

Iteratively Move Data to Reduce Latency

Iteratively Collapse Data to Datacenters
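
The slide names the third phase without detail. The following is a minimal sketch of one way it could work, not a definitive implementation: each item goes to its nearest datacenter, and when a datacenter exceeds its capacity, its least-accessed items spill to their next-nearest choice. The names `collapse` and `great_circle_km`, the input layout, and counting capacity in items are all assumptions made for illustration.

```python
import math

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))

def collapse(items, datacenters):
    """Map each item to a datacenter without exceeding capacities.

    items:       {item_id: ((lat, lon), access_count)}   (hypothetical layout)
    datacenters: {dc_name: ((lat, lon), capacity)}       (capacity in items, assumed)
    """
    # Each item ranks datacenters from nearest to farthest.
    prefs = {
        i: sorted(datacenters, key=lambda d: great_circle_km(loc, datacenters[d][0]))
        for i, (loc, _) in items.items()
    }
    choice = {i: 0 for i in items}  # index into each item's ranking
    while True:
        load = {d: [] for d in datacenters}
        for i in items:
            load[prefs[i][choice[i]]].append(i)
        moved = False
        for d, members in load.items():
            excess = len(members) - datacenters[d][1]
            if excess > 0:
                # Spill the least-accessed items to their next-nearest datacenter.
                members.sort(key=lambda i: items[i][1])
                for i in members[:excess]:
                    if choice[i] + 1 >= len(datacenters):
                        raise ValueError("total capacity is too small")
                    choice[i] += 1
                    moved = True
        if not moved:
            return {i: prefs[i][choice[i]] for i in items}
```

Spilling the least-accessed items first keeps the items that generate the most traffic at their preferred location, which is the intuition behind ordering by access count here.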

• Common IP: put data close to the IP address that accesses it most frequently

• oneDC: put all data in one datacenter

• Hash: randomly allocate data

• Volley

Data placement heuristics
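
The three baseline heuristics are simple enough to sketch in a few lines. This is an illustrative sketch under stated assumptions, not the code used in the evaluation: the function names, the `(item_id, ip)` access-log layout, and the `ip_locations` lookup table are hypothetical, and the IP addresses in the usage note are documentation-only examples.

```python
import hashlib
import math
from collections import Counter

def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(min(1.0, h)))

def nearest_dc(loc, datacenters):
    """datacenters: {dc_name: (lat, lon)}; returns the name of the closest one."""
    return min(datacenters, key=lambda d: great_circle_km(loc, datacenters[d]))

def place_one_dc(item_ids, dc_name):
    """oneDC: every item goes to the same datacenter."""
    return {i: dc_name for i in item_ids}

def place_hash(item_ids, dc_names):
    """Hash: spread items pseudo-randomly, ignoring geography."""
    placement = {}
    for i in item_ids:
        digest = hashlib.sha256(i.encode("utf-8")).digest()
        placement[i] = dc_names[int.from_bytes(digest[:8], "big") % len(dc_names)]
    return placement

def place_common_ip(access_log, ip_locations, datacenters):
    """Common IP: put each item near the IP address that accesses it most often.

    access_log:   iterable of (item_id, ip) pairs        (hypothetical layout)
    ip_locations: {ip: (lat, lon)} geolocation lookup    (assumed to be available)
    """
    per_item = {}
    for item_id, ip in access_log:
        per_item.setdefault(item_id, Counter())[ip] += 1
    return {
        item_id: nearest_dc(ip_locations[counts.most_common(1)[0][0]], datacenters)
        for item_id, counts in per_item.items()
    }
```

For example, with datacenters `{"west": (0.0, 0.0), "east": (0.0, 90.0)}` and an item accessed twice from `192.0.2.1` (located near "west") and once from `198.51.100.7` (located near "east"), `place_common_ip` places the item in "west".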

Capacity Skew

Inter-Datacenter Traffic

Latency

Evaluation

Metrics
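
Two of the three metrics can be computed directly from a placement. The sketch below uses working definitions that are assumptions for illustration, not definitions taken from the slides: capacity skew is the load of the fullest datacenter divided by the mean load, and inter-datacenter traffic is the fraction of messages whose two endpoints sit in different datacenters. The function names and input layouts are hypothetical.

```python
from collections import Counter

def capacity_skew(placement, dc_names):
    """Fullest datacenter's item count divided by the mean item count.

    placement: {item_id: dc_name}
    A perfectly balanced placement gives 1.0 (assumed definition).
    """
    load = Counter(placement.values())
    counts = [load.get(d, 0) for d in dc_names]
    mean = sum(counts) / len(counts)
    if mean == 0:
        raise ValueError("placement is empty")
    return max(counts) / mean

def inter_dc_fraction(placement, messages):
    """Fraction of messages that cross a datacenter boundary.

    messages: iterable of (item_a, item_b, message_count)
    """
    messages = list(messages)
    total = sum(n for _, _, n in messages)
    crossing = sum(n for a, b, n in messages if placement[a] != placement[b])
    return crossing / total if total else 0.0
```

Under these definitions, placing everything in one datacenter gives zero inter-datacenter traffic but the worst possible skew, which matches the trade-off the heuristics are compared on.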

Hash > Volley > Common IP > oneDC

Capacity Skew

oneDC > Volley > Common IP > Hash

Inter-datacenter Traffic

Volley > Common IP > oneDC > Hash

Latency

Rankings, best to worst:

Capacity skew: Hash > Volley > Common IP > oneDC

Inter-DC traffic: oneDC > Volley > Common IP > Hash

Latency: Volley > Common IP > oneDC > Hash

Evaluation

Iteration Count

• In phase 2, additional iterations do not bring significant improvement

• 5 iterations are enough

• Phase 3 determines the capacity skew

Re-computation

• Does make sense

• Reason: data migration

Improvement of Volley

Data placement is vital for cloud services

Volley has a comprehensive advantage

• simultaneously reduces user latency and operator cost

• reduces datacenter capacity skew by over 2X

• reduces inter-DC traffic by over 1.8X

• reduces user latency by 30% at the 75th percentile

• runs in under 16 clock-hours for 400 machine-hours of computation across 1 week of traces

Re-computation of the Volley algorithm is necessary

Conclusion

Limitations of the evaluation conducted by the paper

• No good contrast

• Can geo-distance stand for latency?

• Client mobility?

• Large space for development

Let’s go on…….

Thank You!

Phase 1: Calculate the geographic centroid for each data item
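
A geographic centroid cannot be computed by averaging latitudes and longitudes directly, because longitude wraps around at 180 degrees. The sketch below shows one common way around this, offered as an illustration and not as the exact procedure Volley uses: convert each access location to a 3D unit vector, take the access-weighted sum, and project the result back onto the sphere. The function names and the `(lat, lon, weight)` input layout are assumptions.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def to_lat_lon(x, y, z):
    """Convert a 3D vector (any nonzero length) back to degrees."""
    lat = math.degrees(math.atan2(z, math.hypot(x, y)))
    lon = math.degrees(math.atan2(y, x))
    return lat, lon

def weighted_centroid(accesses):
    """accesses: iterable of (lat, lon, weight), weight = number of accesses."""
    sx = sy = sz = 0.0
    for lat, lon, w in accesses:
        x, y, z = to_unit_vector(lat, lon)
        sx += w * x
        sy += w * y
        sz += w * z
    if math.sqrt(sx * sx + sy * sy + sz * sz) < 1e-12:
        # Weights cancel out (for example, two equal antipodal points).
        raise ValueError("centroid is undefined for this input")
    return to_lat_lon(sx, sy, sz)
```

For two equally weighted clients on the equator at longitudes 0 and 90, `weighted_centroid([(0, 0, 1), (0, 90, 1)])` returns a point on the equator at longitude 45, up to floating-point rounding.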

Phase 2: Refine the centroid for each data item iteratively

• considering client locations and data inter-dependencies

• using a weighted spring model that attracts each data item toward the clients and data items it communicates with, but on a spherical coordinate system
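
The spring model can be pictured as repeatedly pulling a data item part of the way toward each thing it communicates with, along the surface of the sphere. The sketch below is a minimal illustration under stated assumptions: the pull fraction `step * weight / total_weight` is a hypothetical choice, not the weighting formula from the paper, and the names `slerp` and `refine` are made up for this example. Attractors can be client locations or the current locations of other data items.

```python
import math

def to_unit_vector(lat_deg, lon_deg):
    """Convert latitude/longitude in degrees to a 3D unit vector."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon),
            math.cos(lat) * math.sin(lon),
            math.sin(lat))

def to_lat_lon(x, y, z):
    """Convert a 3D vector (any nonzero length) back to degrees."""
    return (math.degrees(math.atan2(z, math.hypot(x, y))),
            math.degrees(math.atan2(y, x)))

def slerp(a, b, t):
    """Point a fraction t of the way along the great circle from a to b."""
    dot = max(-1.0, min(1.0, sum(p * q for p, q in zip(a, b))))
    omega = math.acos(dot)
    s = math.sin(omega)
    if s < 1e-12:
        # Identical or antipodal points: no unique great circle, stay put.
        return a
    ka = math.sin((1 - t) * omega) / s
    kb = math.sin(t * omega) / s
    return tuple(ka * p + kb * q for p, q in zip(a, b))

def refine(position, attractors, iterations=5, step=0.5):
    """Pull a data item toward its attractors.

    position:   (lat, lon) starting point, e.g. the phase 1 centroid
    attractors: list of (lat, lon, weight) for clients and dependent data items
    step:       hypothetical damping factor in (0, 1]
    """
    total = sum(w for _, _, w in attractors)
    if total <= 0:
        raise ValueError("attractor weights must sum to a positive value")
    p = to_unit_vector(*position)
    for _ in range(iterations):
        for lat, lon, w in attractors:
            p = slerp(p, to_unit_vector(lat, lon), step * w / total)
    return to_lat_lon(*p)
```

With a single attractor, each iteration halves the remaining angular distance when `step=0.5`, so after a handful of iterations further movement is small.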