1
VMTorrent: Scalable P2P Virtual Machine Streaming
Joshua Reich, Oren Laadan, Eli Brosh, Alex Sherman, Vishal Misra, Jason Nieh, and Dan Rubenstein
2
VM Basics
• VM: software implementation of a computer
• Implementation stored in a VM image
• VM runs on a VMM
  – Virtualizes HW
  – Accesses the image
[diagram: VMM runs the VM from the VM image]
3
Where is Image Stored?
[diagram: VMM, VM, and VM image]
4
Traditionally: Local Storage
[diagram: VMM and VM backed by local storage]
5
IaaS Cloud: on Network Storage
[diagram: VM images held on network storage]
6
Can Be Primary
[diagram: VMM accesses the VM image on network storage via NFS/iSCSI]
e.g., OpenStack Glance, Amazon EC2/S3, vSphere network storage
7
Or Secondary
[diagram: VM image copied from network storage to local storage, which the VMM then accesses]
e.g., Amazon EC2/EBS, vSphere local storage
8
Either Way, No Problem Here
[diagram: one VM image moving from network storage to a single VMM]
9
Here?
[diagram: many VMs pulling images from the same network storage]
Bottleneck!
10
Lots of Unique VM Images
54,784 unique images on EC2 alone*
*http://thecloudmarket.com/stats#/totals, 06 Dec 2012
11
Unpredictable Demand
• Lots of customers
• Spot-pricing
• Cloud-bursting
12
Don’t Just Take My Word
• “The challenge for IT teams will be finding way to deal with the bandwidth strain during peak demand - for instance when hundreds or thousands of users log on to a virtual desktop at the start of the day - while staying within an acceptable budget” 1
• “scale limits are due to simultaneous loading rather than total number of nodes” 2
• Developer proposals to replace or supplement VM launch architecture for greater scalability 3
1. http://www.zdnet.com/why-so-many-businesses-arent-ready-for-virtual-desktops-7000008229/?s_cid=e539
2. http://www.openstack.org/blog/2011/12/openstack-deployments-abound-at-austin-meetup-129
3. https://blueprints.launchpad.net/nova/+spec/xenserver-bittorrent-images
13
Challenge: VM Launch in IaaS
• Minimize delay in VM execution
  – Starting from the time the launch request arrives
• For lots of instances (scale!)
14
Naive Scaling Approaches
• Multicast
  – Setup, configuration, maintenance, etc. 1
  – ACK implosion
  – “multicast traffic saturated the CPU on [Etsy] core switches causing all of Etsy to be unreachable“ 2
1. [El-Sayed et al., 2003; Hosseini et al., 2007]
2. http://codeascraft.etsy.com/2012/01/23/solr-bittorrent-index-replication
15
Naive Scaling Approaches
• P2P bulk data download (e.g., BitTorrent)
  – Files are big (wastes bandwidth)
  – Must wait until the whole file is available (wastes time)
  – Network primary? Must store a GB image in RAM!
16
Both Miss Big Opportunity
VM image access is
• Sparse: most of the image doesn't need to be transferred
• Gradual: can start with just a couple of blocks
17
VMTorrent Contributions
• Architecture
  – Make (scalable) streaming possible: decouple data delivery from presentation
  – Make scalable streaming effective: profile-based image streaming techniques
• Understanding / Validation
  – Modeling for VM image streaming
  – Prototype & evaluation (not highly optimized)
18
Talk
• Make (scalable) streaming possible: Decouple data delivery from presentation
• Make scalable streaming effective: Profile-based image streaming techniques
• VMTorrent Prototype & Evaluation
(Modeling along the way)
19
Decoupling Data Delivery from Presentation
(Making Streaming Possible)
20
Generic Virtualization Architecture
[diagram: VM on VMM on host hardware; the VM image is reached through the FS]
• Virtual Machine Monitor virtualizes hardware
• Conducts I/O to image through file system
21
Cloud Virtualization Architecture
[diagram: VMM reaches the VM image through the FS, with a network backend behind it]
• Network backend used either to download the image
• Or to access it via a remote FS
22
VMTorrent Virtualization Architecture
[diagram: custom FS interposed between the VMM and the network backend]
• Introduce a custom file system
• Divide the image into pieces
• But provide the appearance of a complete image to the VMM
23
Decoupling Delivery from Presentation
[diagram: custom FS holds the image as pieces 0-8]
VMM attempts to read piece 1. Piece 1 is present, so the read completes.
24
Decoupling Delivery from Presentation
[diagram: piece 0 is not yet local]
VMM attempts to read piece 0. Piece 0 isn't local, so the read stalls: the VMM waits for the I/O to complete, and the VM stalls.
25
Decoupling Delivery from Presentation
[diagram: the miss on piece 0 propagates to the network backend]
The FS requests the piece from the backend; the backend requests it from the network.
26
Decoupling Delivery from Presentation
[diagram: piece 0 arrives from the network]
Later, the network delivers piece 0. The custom FS receives it and updates the piece; the read completes, and the VMM resumes the VM's execution.
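To make the stall-and-resume sequence above concrete, here is a minimal sketch of a piece store that a custom FS read path could block on. It is an illustration, not the VMTorrent source; the PieceStore name, its members, and the condition-variable design are assumptions.

```cpp
#include <condition_variable>
#include <cstring>
#include <mutex>
#include <vector>

// Hypothetical piece store backing the custom FS (names are illustrative).
class PieceStore {
public:
    PieceStore(size_t num_pieces, size_t piece_size)
        : present_(num_pieces, false),
          data_(num_pieces, std::vector<char>(piece_size)) {}

    // Called from the FS read path: block until the piece is local.
    // This stall is what suspends the VMM's read, and hence the VM,
    // until the network backend delivers the piece.
    void read_piece(size_t idx, size_t offset, size_t len, char* out) {
        std::unique_lock<std::mutex> lock(mu_);
        if (!present_[idx]) {
            request_from_backend(idx);   // ask the network backend (stub)
            cv_.wait(lock, [&] { return static_cast<bool>(present_[idx]); });
        }
        std::memcpy(out, data_[idx].data() + offset, len);
    }

    // Called by the network backend when a piece arrives.
    void deliver_piece(size_t idx, const std::vector<char>& bytes) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            data_[idx] = bytes;
            present_[idx] = true;
        }
        cv_.notify_all();                // wake any stalled reads
    }

private:
    void request_from_backend(size_t idx) { /* send request for piece idx */ }

    std::vector<bool> present_;
    std::vector<std::vector<char>> data_;
    std::mutex mu_;
    std::condition_variable cv_;
};
```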
27
Decoupling Improves Performance
[diagram: the VM runs while the rest of the image is still downloading]
Primary storage: no waiting for the image download to complete
28
Decoupling Improves Performance
[diagram: writes and re-reads over the network crossed out]
Secondary storage: no more writes or re-reads over the network, as with a remote FS
29
But Doesn’t Scale
Assuming a single server, the time to download a single piece is
t = W + S / (r_net / n)
• W: wait time for the first bit
• S: piece size
• r_net: network speed
• n: # of clients
Transfer time: each client gets r_net / n of the server's bandwidth
30
Read Time Grows Linearly w/ n
Assuming a single server, the time to download a single piece is
t = W + n · S / r_net
• W: wait time for the first bit
• S: piece size
• r_net: network speed
• n: # of clients
Transfer time is linear in n
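Rewriting the shared-bandwidth form shows these are the same model and makes the linear growth explicit; the concrete numbers below are illustrative, not from the paper.

```latex
t \;=\; W + \frac{S}{\,r_{\mathrm{net}}/n\,} \;=\; W + \frac{n\,S}{r_{\mathrm{net}}}
```

For instance, with S = 2 Mb (a 256 KB piece) and r_net = 100 Mbps, the transfer term is 0.02 s for a single client, but 100 × 0.02 = 2 s per piece once n = 100 clients share the server.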
31
This Scenario
[diagram: every client demand-fetches pieces directly from central network storage; scenario label: "csd"]
32
Decoupling Enables P2P Backend
Alleviates the network storage bottleneck
[diagram: the P2P manager keeps its own copy of the pieces and trades them with the swarm]
• Exchange pieces with the swarm
• The P2P copy must remain pristine
33
Space Efficient
[diagram: FS pieces point into the P2P manager's image]
• The FS uses pointers to the P2P image
• The FS does copy-on-write
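A minimal sketch of the pointer-plus-copy-on-write idea: reads follow a pointer into the pristine P2P-managed image until a piece is written, and the first write copies that piece privately. The class and member names are illustrative assumptions, not VMTorrent's code.

```cpp
#include <cstring>
#include <unordered_map>
#include <vector>

// Illustrative copy-on-write view over a pristine, P2P-managed image.
class CowImageView {
public:
    explicit CowImageView(const std::vector<std::vector<char>>* pristine)
        : pristine_(pristine) {}

    // Reads follow the pointer into the shared pristine image,
    // unless this VM has already written the piece.
    const char* read_piece(size_t idx) const {
        auto it = private_.find(idx);
        if (it != private_.end()) return it->second.data();
        return (*pristine_)[idx].data();   // shared, untouched P2P copy
    }

    // The first write to a piece copies it; the pristine image is never
    // modified, so it stays valid for uploads to the swarm.
    void write_piece(size_t idx, size_t off, const char* src, size_t len) {
        auto it = private_.find(idx);
        if (it == private_.end()) {
            it = private_.emplace(idx, (*pristine_)[idx]).first;  // copy on write
        }
        std::memcpy(it->second.data() + off, src, len);
    }

private:
    const std::vector<std::vector<char>>* pristine_;          // owned by the P2P manager
    std::unordered_map<size_t, std::vector<char>> private_;   // this VM's dirty pieces
};
```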
34
Minimizing Stall Time
[diagram: a miss on piece 4 flows from the FS through the P2P manager to the swarm and back]
Non-local piece accesses trigger high-priority requests
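The prototype's P2P manager is built on libtorrent, so a demand miss could be escalated roughly as below, assuming libtorrent 1.x's int-based piece_priority (0-7, 7 highest) and set_piece_deadline (milliseconds) calls; the surrounding function is illustrative.

```cpp
#include <libtorrent/torrent_handle.hpp>

// Illustrative escalation of a demand miss into the swarm.
// 'handle' is the libtorrent handle for the VM image torrent.
void on_demand_miss(const libtorrent::torrent_handle& handle, int piece) {
    handle.piece_priority(piece, 7);      // jump the download queue
    handle.set_piece_deadline(piece, 0);  // fetch as soon as possible
}
```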
35
P2P Helps
Now, the time to download a single piece is
t = W(d) + S / r_net
• W(d): wait time for the first bit, as a function of piece diversity d
• S: piece size
• r_net: network speed
• n: # of peers
The wait is a function of diversity; transfer time is independent of n
36
High Diversity → Swarm Efficiency
37
Low Diversity → Little Benefit
Nothing to share
38
P2P Helps, But Not Enough
All peers request the same pieces at the same time
t = W(d) + S / r_net
Low piece diversity → long wait (gets worse as n grows) → long download times
39
This Scenario
[diagram: demand-only fetching over the P2P swarm; scenario label: "p2pd"]
40
Profile-based Image Streaming Techniques
(Making Streaming Effective)
41
How to Increase Diversity?
Need to fetch pieces that are
• Rare: not yet demanded by many peers
• Useful: likely to be used by some peer
Profiling
• Need useful pieces
• But only a small % of the VM image is accessed
• We need to know which pieces are accessed
• And when (needed later for piece selection)
42
43
Build Profile
• One profile for each VM/workload pair
• Run one or more times (even online)
• Use the FS to track
  – which pieces are accessed
  – when pieces are accessed
• Entries hold average appearance time, piece index, and frequency
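A profile entry might look like the sketch below, based on the fields the slide lists; the struct name, field types, and sort helper are assumptions for illustration.

```cpp
#include <algorithm>
#include <cstdint>
#include <vector>

// One entry per accessed piece, as described on the slide.
struct ProfileEntry {
    double   avg_appearance_time;  // seconds from VM launch, averaged over runs
    uint32_t piece_index;          // which piece of the image
    uint32_t frequency;            // in how many profiling runs it was accessed
};

// A profile is the entries sorted by expected access time, so the
// streamer can walk it as a predicted "playback" order.
std::vector<ProfileEntry> make_profile(std::vector<ProfileEntry> entries) {
    std::sort(entries.begin(), entries.end(),
              [](const ProfileEntry& a, const ProfileEntry& b) {
                  return a.avg_appearance_time < b.avg_appearance_time;
              });
    return entries;
}
```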
44
Piece Selection
• Want pieces not yet demanded by many peers
• Don't know the piece distribution in the swarm
• Guess that others are like self
• This gives an estimate of when pieces are likely needed
45
Piece Selection Heuristic
• Randomly (rarest-first) pick one of the first k pieces in the predicted playback window
• Fetch with medium priority (demand requests win); a sketch follows below
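A minimal sketch of that window-randomized selection, reusing the hypothetical ProfileEntry from the earlier sketch; k, the clock source, and the fetched-set bookkeeping are assumptions, and the rarest-first tiebreak is omitted for brevity.

```cpp
#include <cstdint>
#include <optional>
#include <random>
#include <unordered_set>
#include <vector>

// Pick the next piece to prefetch: one of the first k not-yet-fetched
// pieces whose predicted access time is still ahead of the VM's clock.
std::optional<uint32_t> next_prefetch(
        const std::vector<ProfileEntry>& profile,   // sorted by time
        const std::unordered_set<uint32_t>& fetched,
        double elapsed_seconds, size_t k, std::mt19937& rng) {
    std::vector<uint32_t> window;
    for (const ProfileEntry& e : profile) {
        if (e.avg_appearance_time < elapsed_seconds) continue;  // already due
        if (fetched.count(e.piece_index)) continue;             // already have it
        window.push_back(e.piece_index);
        if (window.size() == k) break;       // only look k pieces ahead
    }
    if (window.empty()) return std::nullopt;
    // Randomizing within the window is what spreads different peers
    // across different pieces, keeping swarm diversity high.
    std::uniform_int_distribution<size_t> pick(0, window.size() - 1);
    return window[pick(rng)];
}
```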
46
Profile-based Prefetching
• Increases diversity
• Helps even with no peers (when the ideal access rate exceeds the network rate)
47
Obtain Full P2P Benefit
Profile-based, window-randomized prefetch:
t = W(d) + S / r_net
High piece diversity → short wait (shouldn't grow much with n) → quick piece download
48
Full VMTorrent Architecture
[diagram: custom FS with copy-on-write pieces, plus a profile-driven P2P manager exchanging pieces with the swarm; scenario label: "p2pp"]
49
Prototype
50
VMTorrent Prototype
[diagram: prototype structure, as in the full architecture, attached to a BT swarm]
• Custom FS: custom C, using FUSE
• P2P manager: custom C++ & libtorrent
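For flavor, here is a minimal FUSE 2.x read callback of the kind the custom-C FS side registers, written in C++ for consistency with the other sketches; piece_store_read and PIECE_SIZE are assumed helpers, and the fuse_operations registration is omitted.

```cpp
#define FUSE_USE_VERSION 26
#include <fuse.h>
#include <algorithm>

// Assumed helper provided by the piece store (see the earlier sketch);
// it may block until the piece is local.
extern "C" size_t piece_store_read(size_t piece, size_t off, size_t len, char* out);
static const size_t PIECE_SIZE = 256 * 1024;  // assumption, not from the paper

// FUSE read callback: translate a file offset into piece reads.
// Each piece_store_read may stall until the piece arrives, which is
// exactly the stall the VMM experiences on a miss.
static int vmt_read(const char* path, char* buf, size_t size, off_t off,
                    struct fuse_file_info* fi) {
    (void)path; (void)fi;
    size_t done = 0;
    while (done < size) {
        size_t piece = (off + done) / PIECE_SIZE;
        size_t poff  = (off + done) % PIECE_SIZE;
        size_t len   = std::min(size - done, PIECE_SIZE - poff);
        piece_store_read(piece, poff, len, buf + done);
        done += len;
    }
    return static_cast<int>(done);
}
```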
51
Evaluation Setup
52
Testbeds
• Emulab [White et al., 2002]
  – Instances on 100 dedicated hardware nodes
  – 100 Mbps LAN
• VICCI [Peterson et al., 2011]
  – Instances on 64 vserver hardware node slices
  – 1 Gbps LAN
53
VMs
54
Workloads
• Short VDI-like tasks
• Some CPU-intensive, some I/O-intensive
55
Assessment
• Measured total runtime
  – Launch through shutdown
  – (Easy to measure)
• Normalized against memory-cached execution
  – Ideal runtime for that set of hardware
  – Allows easy cross-comparison
    • Different VM/workload combinations
    • Different hardware platforms
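Spelled out, the normalization used for assessment is (notation mine):

```latex
\text{normalized runtime} \;=\; \frac{T_{\text{measured}}}{T_{\text{memory-cached}}} \;\ge\; 1
```

so 1.0 is the ideal for a given hardware/VM/workload combination, and values are comparable across platforms.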
56
Evaluation
57
100 Mbps Scaling
[graph: 100 Mbps scaling results; annotation: "Starting to increase"]
58
Due to Decreased Diversity
As the # of peers increases → more demand requests to the seed → less opportunity to build diversity → longer to reach max swarming efficiency, plus a lower max
We optimized too much for the single instance!
(choosing to let demand requests take precedence)
61
(Some) Future Work
• Piece selection for better diversity
• Improved profiling
• Data-center-specific optimizations
Current work is already orders of magnitude better than the state of the art
62
Demo (video omitted for space)
63
See Paper for More Details
• Modeling
  – Playback process dynamics
  – Buffering (for prefetch)
  – Full characterization of r, incorporating the impact of centralized and distributed models on W
  – Other elided details
• Plus
  – More architectural discussion!
  – Lots more experimental results!
64
Summary
• Scalable VM launching is needed
• VMTorrent addresses it by
  – Decoupling data delivery from presentation
  – Profile-based VM image streaming
• Straightforward techniques and implementation; no special data-center optimizations
• Performance much better than the state of the art
  – Hardware evaluation on multiple testbeds
  – As predicted by modeling