Post on 16-Apr-2017
Vtastic: Innovations In Distributed Systems Testing
Jack Wadden, Sr. Engineering Manager
Akamai Technologies, Inc.
©2015 AKAMAI | FASTER FORWARD™
AKAMAI CDN OVERVIEW
• We Make the Internet Fast, Reliable and Secure
• Globally-Distributed Network of Servers
  • Caching Content Close to End Users
  • Scalable Live Media Streaming
  • Protocol Optimizations
• DNS-Based Load Balancing System
  • Chooses the Best Server to Handle Your Requests
MASSIVE SCALE
• 15-30% of All Internet Traffic
• 3+ Trillion Hits/day (2 x 10^12)
• 30+ Tbps
• 215,000+ Servers
  • Located in 120+ Countries
• 1000+ Software Components
• 100+ Server Roles
SYSTEM TESTING AT AKAMAI
TESTNETS: AKAMAI’S SYSTEM TEST ENVIRONMENT
HOWEVER, AT AKAMAI TESTNETS ARE A SCARCE RESOURCE
THEY ARE EXPENSIVE TO BUILD
AND REQUIRE A HUGE TEAM TO MAINTAIN
SHARING LEADS TO DISRUPTIONS
SOMETIMES THE FIT IS POOR
CONFLICTING USES NEED TO BE COORDINATED
AND RESULT IN INEVITABLE DELAYS
FEATURES OF A BETTER TESTNET
• Low barrier to access
• Eliminate coordination
• No-block debugging
• Automation
• Portable, restorable configuration
• Efficient maintenance
• Permit destructive testing
• Optimal platform utilization
THE VISION:
CONTINUOUS, AUTOMATED, END-TO-END TESTING
FOR ALL ENGINEERS
ON EVERY COMPONENT ACROSS AKAMAI
TESTNET CLONING
[Diagram: Test Harness, Vtastic Resource Tracker, OpenNebula, Master Storage, Testnet Clones]
VTASTIC MASTER TESTNET
• Supported by SME teams
• Running Production Versions
• Vtastic Team Coordinates Changes
• Custom Clones can be Saved, Shared
[Diagram: Master, Master Candidate, Snapshot, Clone, Old Master]
CLONES USE PRIVATE IP SPACE
100.80.0.8 (MDT)
100.80.0.15 (KDC)
100.80.0.21 (UMP)
GWSH, SOCKS
172.26.238.16 (NAT Exit)
100.80.0.1 (NAT Gateway)
IP (Anything)
VLAN #83
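As a small illustration of the address plan above: the clone addresses on the slide fall inside 100.64.0.0/10 (the RFC 6598 shared address space), and the NAT exit address falls inside the RFC 1918 range 172.16.0.0/12. The sketch below only checks which reserved range each slide address belongs to. Why these particular ranges were chosen is not stated in the deck, and the `classify` helper is a hypothetical name.

```python
import ipaddress

# Well-known reserved IPv4 ranges (RFC 6598 and RFC 1918).
RESERVED_RANGES = {
    "RFC 6598 shared address space": ipaddress.ip_network("100.64.0.0/10"),
    "RFC 1918 private (10/8)": ipaddress.ip_network("10.0.0.0/8"),
    "RFC 1918 private (172.16/12)": ipaddress.ip_network("172.16.0.0/12"),
    "RFC 1918 private (192.168/16)": ipaddress.ip_network("192.168.0.0/16"),
}

# Addresses and role labels copied from the slide.
SLIDE_ADDRESSES = {
    "100.80.0.8": "MDT",
    "100.80.0.15": "KDC",
    "100.80.0.21": "UMP",
    "100.80.0.1": "NAT Gateway",
    "172.26.238.16": "NAT Exit",
}


def classify(address: str) -> str:
    """Return the name of the reserved range containing `address`."""
    ip = ipaddress.ip_address(address)
    for name, network in RESERVED_RANGES.items():
        if ip in network:
            return name
    return "not in a reserved range listed here"


if __name__ == "__main__":
    for address, role in SLIDE_ADDRESSES.items():
        print(f"{address:15} {role:12} {classify(address)}")
```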
NAT TUNNELING TOOLS
• vpoint: Testnet-Attached bash Shell
  • LD_PRELOAD for Transparent SOCKS Tunneling (dante-client)
  • Proprietary SSH-proxy client
• chrome-vpoint, firefox-vpoint
  • Dedicated browser session with SOCKS configuration
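The deck does not show how vpoint or chrome-vpoint are implemented, so the following is only a rough sketch of the two ideas named above: preloading a SOCKS client library into a shell's environment, and starting a browser with its own profile and SOCKS proxy. The library path, the SOCKS endpoint, the profile directory, and both function names are assumptions for illustration. `libdsocks.so` and `SOCKS_SERVER` are the Dante client's library and environment variable, and `--proxy-server` and `--user-data-dir` are standard Chrome command-line switches.

```python
import os
from typing import Dict, List, Optional


def socksified_env(socks_host: str, socks_port: int,
                   preload_lib: str = "/usr/lib/libdsocks.so",  # assumed path
                   base_env: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Build an environment in which child processes preload a SOCKS library."""
    env = dict(os.environ if base_env is None else base_env)
    env["LD_PRELOAD"] = preload_lib                     # intercept socket calls
    env["SOCKS_SERVER"] = f"{socks_host}:{socks_port}"  # Dante client setting
    return env


def chrome_command(socks_host: str, socks_port: int, profile_dir: str,
                   binary: str = "google-chrome") -> List[str]:
    """Build a Chrome command line with a dedicated profile and SOCKS proxy."""
    return [
        binary,
        f"--user-data-dir={profile_dir}",                       # separate session
        f"--proxy-server=socks5://{socks_host}:{socks_port}",
    ]


if __name__ == "__main__":
    env = socksified_env("socks.example", 1080, base_env={})
    print(env)
    print(chrome_command("socks.example", 1080, "/tmp/testnet-profile"))
```

Passing the returned environment to `subprocess.run(["bash"], env=env)` would start a shell whose dynamically linked children route TCP connections through the SOCKS server, provided the preload library is actually installed at the given path.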
DESIGN APPROACH
• Centrally-Managed Infrastructure
  • Resources Granted to Users/Groups
• Distributed Storage & Compute Platform
• Commodity Hardware
• Open Source Technology
  • Virtualization: Qemu/KVM
  • Storage: GlusterFS
  • Orchestration: OpenNebula!!
  • Vtastic VRT: Python, Django, Apache
SPECS, SCALE
• 40 VM Hosts
  • 32 Cores
  • 128 GB RAM
  • 2 x 10 Gbps Ethernet
  • Average 35 VMs per Host
• 40-50+ Testnets
  • 30-120 Nodes per Testnet
  • 1500-2000+ Total VMs
• 40 Storage Nodes
  • 8 Cores
  • 32 GB RAM
  • 10 Gbps Ethernet
  • 6 x 384 GB SSD + RAID0 = 2.1 TB
  • Total Usable Space = 42 TB
• Master Testnet
  • 120 Nodes
  • ~1.5 TB (After virt-sparsify)
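The capacity figures above can be cross-checked with a little arithmetic. The sketch below assumes that the per-node 2.1 TB figure is 6 x 384 GB expressed in binary units, and that the 42 TB usable total reflects two-way replication across the 40 storage nodes. Neither assumption is stated on the slide; they are simply the readings under which the numbers line up.

```python
# Figures taken from the slide.
SSD_PER_NODE = 6
SSD_SIZE_GB = 384
STORAGE_NODES = 40
PER_NODE_TB_SLIDE = 2.1
VM_HOSTS = 40
AVG_VMS_PER_HOST = 35

# Assumption (not on the slide): every image is stored twice.
REPLICA_COUNT = 2

# Raw capacity of one RAID0 set, in decimal GB and in binary TiB.
raw_per_node_gb = SSD_PER_NODE * SSD_SIZE_GB
raw_per_node_tib = raw_per_node_gb * 10**9 / 2**40

# Cluster capacity before and after the assumed replication.
raw_cluster_tb = STORAGE_NODES * PER_NODE_TB_SLIDE
usable_tb = raw_cluster_tb / REPLICA_COUNT

# VM count implied by the per-host average.
avg_total_vms = VM_HOSTS * AVG_VMS_PER_HOST

if __name__ == "__main__":
    print(f"raw per node: {raw_per_node_gb} GB (~{raw_per_node_tib:.1f} TiB)")
    print(f"raw cluster:  {raw_cluster_tb:.0f} TB")
    print(f"usable:       {usable_tb:.0f} TB with {REPLICA_COUNT} replicas")
    print(f"VMs at the stated average: {avg_total_vms}")
```

The per-host average gives a slightly lower VM total than the 1500-2000+ range quoted on the slide, so the two figures are best read as approximate.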
1.0: GLUSTER & FUSE
• Backing Files and Scratch Images on Remote Storage
  • Qemu Uses POSIX Path (/glusterclient/foo)
• Problems:
  • Memory Leaks, Hangs in GlusterFS FUSE Mount
  • Occasional Loss of VMs
  • Performance Concerns
1.1: GLUSTER DIRECT
• Qemu uses libgfapi (gluster://SERVER:PORT/foo)
• Backing Files and Scratch Images on Remote Storage
• FUSE Mount Used for Image Management
• Problems:
  • Frequent, Catastrophic Loss of VMs
  • Occasional FUSE Mount Problems (Image Management)
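To make the difference between the 1.0 and 1.1 access paths concrete, here is a minimal sketch that builds the two kinds of image reference a Qemu `-drive` option could receive: a POSIX path on a FUSE mount, and a `gluster://` URI handled by libgfapi. Qemu's URI form includes a volume name between the server and the image path, which the slide's shorthand omits. The server name, volume name, and helper names below are hypothetical, and 24007 is the default glusterd port.

```python
def posix_image_path(mount_point: str, image: str) -> str:
    """Image reference via the FUSE mount (the 1.0 approach)."""
    return f"{mount_point.rstrip('/')}/{image.lstrip('/')}"


def gluster_image_uri(server: str, volume: str, image: str,
                      port: int = 24007) -> str:
    """Image reference via libgfapi (the 1.1 approach)."""
    return f"gluster://{server}:{port}/{volume}/{image.lstrip('/')}"


def drive_option(image_ref: str, fmt: str = "qcow2") -> str:
    """Value for Qemu's -drive option using either kind of reference."""
    return f"file={image_ref},format={fmt},if=virtio"


if __name__ == "__main__":
    fuse_ref = posix_image_path("/glusterclient", "foo")
    gfapi_ref = gluster_image_uri("storage01.example", "testnets", "foo")
    print(drive_option(fuse_ref))
    print(drive_option(gfapi_ref))
```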
1.2: FUSE + LOCAL SCRATCH
• Qemu Uses POSIX Path (/glusterclient/foo) for Backing Image
• FUSE Mount Used for Image Management
• Scratch Images Stored on Local Disk
• Problems:
  • Increased Snapshot Time
  • No Live Migration
  • Occasional FUSE Mount Problems (Image Management)
  • Lack of Trust (VM Loss Experienced before Re-creating Gluster Volume)
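The 1.2 layout, with a shared backing image on the FUSE mount and a per-VM scratch image on local disk, corresponds to an ordinary copy-on-write overlay. The sketch below builds a `qemu-img create` command for such an overlay. The local scratch path and the function name are hypothetical, and the deck does not say whether Vtastic invokes `qemu-img` directly or lets OpenNebula do so.

```python
from typing import List


def scratch_image_command(backing_path: str, scratch_path: str,
                          fmt: str = "qcow2") -> List[str]:
    """Command that creates a local overlay on top of a shared backing image."""
    return [
        "qemu-img", "create",
        "-f", fmt,            # format of the new scratch image
        "-b", backing_path,   # read-only backing image (on the FUSE mount)
        "-F", fmt,            # format of the backing image
        scratch_path,         # writes land here, on the VM host's local disk
    ]


if __name__ == "__main__":
    cmd = scratch_image_command("/glusterclient/foo",
                                "/var/lib/scratch/vm1.qcow2")
    print(" ".join(cmd))
```

Because the scratch image lives only on one host, another host cannot open it, which is consistent with the "No Live Migration" drawback listed above.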
IN DEVELOPMENT: CEPH
• Static and Scratch Images on Remote Storage
  • Live Migration Possible
  • Holy Grail, or New Devil?
• Challenges:
  • Learning Curve
  • Ceph Stability?
  • Need Support for Trees of RBD Clones
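The "trees of RBD clones" requirement can be pictured with a small data structure. In RBD, a clone is created from a snapshot of a parent image, and a clone can itself be snapshotted and cloned again, so a master testnet with candidates and user clones forms a tree. The sketch below models only that tree in plain Python and makes no Ceph calls. The class, its methods, and the image names are hypothetical.

```python
from typing import List, Optional


class Image:
    """One node in a tree of copy-on-write images."""

    def __init__(self, name: str, parent: Optional["Image"] = None) -> None:
        self.name = name
        self.parent = parent
        self.children: List["Image"] = []

    def clone(self, name: str) -> "Image":
        """Create a child image that shares this image's data."""
        child = Image(name, parent=self)
        self.children.append(child)
        return child

    def chain(self) -> List[str]:
        """Names from this image back to the root, nearest first."""
        names = []
        node: Optional["Image"] = self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names

    def descendants(self) -> int:
        """Count every clone below this image in the tree."""
        return sum(1 + child.descendants() for child in self.children)


if __name__ == "__main__":
    master = Image("master")
    candidate = master.clone("master-candidate")
    custom = candidate.clone("custom-clone")
    master.clone("team-clone")
    print(custom.chain())
    print(master.descendants())
```

A read on a leaf image may have to walk every ancestor in `chain()`, which is one reason the depth of such trees matters for a storage backend.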
FUTURE POSSIBILITIES
• Incorporating Physical Hardware (Load/Performance Testing)
• Realistic Network Conditions (Latency, Loss)
• Subnetting / Internetworking
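The deck does not say how latency and loss would be introduced. One common approach on Linux is the `netem` queueing discipline, configured with `tc`. The sketch below builds such a command as an illustration of that technique only. The interface name, the default values, and the function name are assumptions.

```python
from typing import List


def netem_command(interface: str, delay_ms: int, loss_percent: float) -> List[str]:
    """Build a tc command that adds delay and packet loss on an interface."""
    return [
        "tc", "qdisc", "add", "dev", interface, "root", "netem",
        "delay", f"{delay_ms}ms",
        "loss", f"{loss_percent}%",
    ]


if __name__ == "__main__":
    # Example: 100 ms of added delay and 1% loss on a hypothetical interface.
    print(" ".join(netem_command("eth0", 100, 1)))
```

Running the resulting command requires root privileges on the VM or host whose traffic is being shaped.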
VTASTIC.AKAMAI.COM