Volley: Automated Data Placement for Geo-Distributed Cloud Services
Why is data placement important?
Minimize latency
Eliminate redundant cost
Optimize datacenter utilization
• Users want lower latency
• Cloud service operators want to limit cost
• Partitioning data across DCs
Commercial cloud service trace analysis
Live Messenger, Live Mesh
• The traces cover all users and devices that accessed these services over an entire month
• Clients are identified by application-level unique identifiers
Challenges of data placement
Geographic Diversity
Data Sharing
Data Inter-dependency
Datacenter Capacity
Client Mobility
Challenge: Geographic Diversity
Challenge: Data Sharing
Challenge: Data Inter-dependency
Data inter-dependency in Live Mesh
Challenge: Datacenter Capacity
The rush in industry to build additional datacenters is motivated in part by individual datacenters reaching their capacity constraints as new users are added. This in turn requires automatic mechanisms that rapidly migrate application data to new datacenters to take advantage of their capacity.
Challenge: User Mobility
Existing, proven algorithms do not directly apply to this problem
Volley
Volley Algorithm: three phases
Phase 1: Compute Initial Placement
Phase 2: Iteratively Move Data to Reduce Latency
Phase 3: Iteratively Collapse Data to Datacenters
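The slide only names Phase 3. A minimal sketch of the collapse step, assuming a simple greedy rule that may differ from the paper's actual procedure: items are handled from most to least accessed, and each one goes to the nearest datacenter (by great-circle distance) that still has spare capacity. The least-accessed items are therefore the ones pushed farther away when a nearby datacenter fills up. The function names, the "number of items" capacity model, and the input layout are illustrative assumptions.

```python
import math


def great_circle_km(a, b):
    """Haversine distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def collapse_to_datacenters(items, datacenters, capacity):
    """Hypothetical greedy stand-in for Phase 3.

    items:       {item_id: ((lat, lon), access_count)}, positions from Phase 2
    datacenters: {dc_name: (lat, lon)}
    capacity:    {dc_name: maximum number of items}  (assumed capacity model)
    Returns {item_id: dc_name}.
    """
    remaining = dict(capacity)
    placement = {}
    # Most-accessed items choose first, so capacity limits fall on the
    # items whose placement matters least for overall latency.
    for item_id, (pos, _count) in sorted(items.items(), key=lambda kv: -kv[1][1]):
        by_distance = sorted(datacenters, key=lambda d: great_circle_km(pos, datacenters[d]))
        for dc in by_distance:
            if remaining[dc] > 0:
                placement[item_id] = dc
                remaining[dc] -= 1
                break
        else:
            raise ValueError("total datacenter capacity is too small")
    return placement
```

Sorting by access count is one simple way to decide who gets evicted from a full datacenter; other orderings (for example by the latency penalty of moving) are equally plausible.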
Data placement heuristics
• Common IP: put data close to the IP address that accesses it most frequently
• oneDC: put all data in one datacenter
• Hash: randomly allocate data
• Volley
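The baseline heuristics can be sketched as follows. This is an illustration under stated assumptions, not the paper's code: `dc_for_ip` is a hypothetical lookup from a client IP to its closest datacenter (for example via IP geolocation), and Hash uses a SHA-256 digest of the item ID as a deterministic stand-in for random allocation.

```python
import hashlib
from collections import Counter


def one_dc(item_ids, dc):
    """oneDC: put every item in the same datacenter."""
    return {i: dc for i in item_ids}


def hash_placement(item_ids, dcs):
    """Hash: pick a datacenter from a hash of the item ID.

    The digest spreads items roughly uniformly, like a random allocation,
    but always gives the same answer for the same item.
    """
    return {
        i: dcs[int(hashlib.sha256(i.encode()).hexdigest(), 16) % len(dcs)]
        for i in item_ids
    }


def common_ip(accesses, dc_for_ip):
    """Common IP: place each item near the IP that accesses it most often.

    accesses:  iterable of (item_id, client_ip) pairs from the trace
    dc_for_ip: hypothetical function mapping a client IP to its closest datacenter
    """
    counts = {}
    for item, ip in accesses:
        counts.setdefault(item, Counter())[ip] += 1
    return {item: dc_for_ip(c.most_common(1)[0][0]) for item, c in counts.items()}
```

Each heuristic optimizes one concern in isolation (single location, balance, or the top client), which is why no single baseline wins on all three metrics.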
Evaluation
Metrics
• Capacity Skew
• Inter-Datacenter Traffic
• Latency
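The three metrics can be computed from a placement roughly as below. The exact definitions are assumptions for illustration: capacity skew is taken as the largest datacenter load divided by the mean load across all datacenters, inter-DC traffic as the total weight of dependencies between items placed in different datacenters, and latency as a nearest-rank percentile (such as the 75th) over per-access latencies, or over distances if geographic distance is used as a latency proxy.

```python
import math


def capacity_skew(placement, dcs):
    """Assumed definition: max items in any datacenter / mean items per datacenter."""
    loads = {dc: 0 for dc in dcs}
    for dc in placement.values():
        loads[dc] += 1
    mean = sum(loads.values()) / len(dcs)
    return max(loads.values()) / mean


def inter_dc_traffic(placement, edges):
    """Total weight of dependencies that cross datacenters.

    edges: iterable of (item_a, item_b, messages) between inter-dependent items.
    """
    return sum(w for a, b, w in edges if placement[a] != placement[b])


def percentile(values, p):
    """Nearest-rank percentile for p in (0, 100], e.g. p=75 for the 75th."""
    ordered = sorted(values)
    k = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[k - 1]
```

Under these definitions oneDC has the worst possible skew and zero inter-DC traffic, which matches the rankings reported for it.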
Ranking of heuristics (A > B means A performs better than B):
Capacity skew: Hash > Volley > Common IP > oneDC
Inter-DC traffic: oneDC > Volley > Common IP > Hash
Latency: Volley > Common IP > oneDC > Hash
Evaluation
Iteration count
• In Phase 2, additional iterations do not yield significant improvement
• 5 iterations are enough
• Phase 3 determines the capacity skew
Re-computation
• Re-computing the placement does make sense
• Reason: data migration
Improvement of Volley
Data placement is vital in cloud services
Volley has comprehensive advantages:
• simultaneously reduces user latency and operator cost
• reduces datacenter capacity skew by over 2X
• reduces inter-DC traffic by over 1.8X
• reduces user latency by 30% at the 75th percentile
• runs in under 16 clock-hours, using 400 machine-hours of computation, across 1 week of traces
Re-running the Volley algorithm is necessary
Conclusion
Limitations of the evaluation conducted in the paper:
• No good baseline for comparison
• Can geographic distance stand in for latency?
• Client mobility?
Large space for further development
Let’s go on...
Thank You!
Phase 1: Calculate the geographic centroid of each data item
Phase 2: Iteratively refine the centroid of each data item
• considering client locations and data inter-dependencies
• using a weighted spring model that attracts data items, but on a spherical coordinate system
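Phases 1 and 2 can be sketched on the sphere as below. This is a simplified illustration, not the paper's exact formulas: the weighted centroid averages 3D unit vectors and projects the result back onto the sphere, and the hypothetical `phase2` models each attraction as a spring by moving every item to the weighted centroid of its own current position (weighted by an assumed `stiffness`), the clients that access it, and its inter-dependent items. The default of 5 iterations echoes the iteration-count result reported in the evaluation.

```python
import math


def to_xyz(lat, lon):
    """Convert (lat, lon) in degrees to a 3D unit vector."""
    la, lo = math.radians(lat), math.radians(lon)
    return (math.cos(la) * math.cos(lo), math.cos(la) * math.sin(lo), math.sin(la))


def to_latlon(x, y, z):
    """Convert a (not necessarily unit) 3D vector back to (lat, lon) in degrees."""
    return (math.degrees(math.atan2(z, math.hypot(x, y))), math.degrees(math.atan2(y, x)))


def weighted_centroid(points):
    """Weighted geographic centroid of [((lat, lon), weight), ...].

    Averages unit vectors and projects back onto the sphere (an assumed
    stand-in for the paper's weighted spherical mean).
    """
    sx = sy = sz = 0.0
    for (lat, lon), w in points:
        x, y, z = to_xyz(lat, lon)
        sx, sy, sz = sx + w * x, sy + w * y, sz + w * z
    if math.hypot(sx, sy, sz) < 1e-12:
        raise ValueError("centroid undefined: points cancel out")
    return to_latlon(sx, sy, sz)


def phase1(client_accesses):
    """Phase 1: centroid of the clients accessing each item.

    client_accesses: {item: [((lat, lon), access_count), ...]}
    """
    return {item: weighted_centroid(pts) for item, pts in client_accesses.items()}


def phase2(positions, client_accesses, item_edges, iterations=5, stiffness=1.0):
    """Hypothetical spring-style refinement of Phase 1 positions.

    item_edges: {item: [(other_item, weight), ...]} for inter-dependent items.
    Every round uses the previous round's positions for all items.
    """
    for _ in range(iterations):
        new_positions = {}
        for item, pos in positions.items():
            pulls = [(pos, stiffness)] + list(client_accesses.get(item, []))
            pulls += [(positions[other], w) for other, w in item_edges.get(item, [])]
            new_positions[item] = weighted_centroid(pulls)
        positions = new_positions
    return positions
```

Working with 3D vectors avoids the wrap-around problem a naive average of (lat, lon) pairs would have, for example for clients on both sides of the 180th meridian.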