Tapjoy OpenStack Summit Paris Breakout Session
![Page 1: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/1.jpg)
Tapjoy & OpenStack
Delivering Billions of Requests Daily
Wes Jossey
Head of Operations @Tapjoy
![Page 2: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/2.jpg)
Tapjoy
● Global App-Tech Startup
● We Power For Mobile Developers:
  ○ Monetization
  ○ Analytics
  ○ User Acquisition
  ○ User Retention
● 450M+ Monthly Users Across 270k+ Apps
● Worldwide Presence
![Page 3: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/3.jpg)
Technical Details
● Early AWS Adopter
● Grew Predominantly on AWS
● Over 1,100 AWS VMs Daily (10/2014)
● Active Regions in Asia, Europe, N.A.
● Over One Trillion Requests Handled Annually
![Page 4: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/4.jpg)
Tech Philosophy
● Compute (EC2 & Nova) Driven Company
  ○ Operate Your Own Infrastructure
    ■ But Not Necessarily Built-From-Scratch
  ○ Zero Heart-Attack Nodes
    ■ All Nodes Are Ephemeral
    ■ Data is Always Distributed
    ■ Failure is Always Tolerated
    ■ Misbehaving Instances Are Terminated Quickly
![Page 5: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/5.jpg)
Services We Use
● SQS
  ○ Simple, Inexpensive, Durable.
  ○ Currently Building New Internal System Influenced by SQS, but with Different Guarantees
  ○ No Lock-In (See https://github.com/Tapjoy/chore)
● RDS
  ○ No Lock-In. Simple. Easy.
● CloudWatch (but also statsd)
![Page 6: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/6.jpg)
Services We Use Cont.
● ELB
  ○ SSL Termination Only. Routing Handled Elsewhere.
● Auto-Scaling
  ○ Traffic can fluctuate 30% peak to valley
● S3
  ○ Where we store ALL the things
  ○ Still price competitive for what it provides. No plans to leave as of today.
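To make the auto-scaling point concrete, here is a minimal sketch of how a fleet size might track a 30% peak-to-valley swing. It is not Tapjoy's scaling policy: the function name `desired_instances`, the per-instance throughput, the 20% headroom, and the minimum fleet size are all hypothetical values chosen for illustration.

```python
def desired_instances(current_rps: int,
                      rps_per_instance: int,
                      headroom_pct: int = 20,
                      min_instances: int = 2) -> int:
    """Return how many instances are needed for the current request rate.

    All parameters are illustrative assumptions, not figures from the talk.
    Integer arithmetic is used so the result is exact.
    """
    if rps_per_instance <= 0:
        raise ValueError("rps_per_instance must be positive")
    numerator = current_rps * (100 + headroom_pct)
    denominator = 100 * rps_per_instance
    needed = -(-numerator // denominator)  # ceiling division
    return max(min_instances, needed)


# Hypothetical traffic: the valley is 30% below the peak.
peak_rps = 10_000
valley_rps = peak_rps * 70 // 100

peak_fleet = desired_instances(peak_rps, rps_per_instance=100)
valley_fleet = desired_instances(valley_rps, rps_per_instance=100)
print(peak_fleet, valley_fleet)
```

With these made-up numbers the fleet shrinks from 120 to 84 instances between peak and valley, which is the capacity a fixed-size deployment would leave idle.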
![Page 7: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/7.jpg)
Use Compute Everywhere
● Every Dev Has Access to Either AWS or Tapjoy-1 (Tapjoy’s OpenStack Deployment)
● Simulate Changes Against Useful Data
● Test Algorithms on Large Hadoop Clusters
● Practice for Failure With Access to Real Services (not mock endpoints)
![Page 8: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/8.jpg)
Going Hybrid
● We Spend in the Millions on AWS
● Picked Data-Science Infrastructure because of Portability, and Ability to Leverage More Nodes
● Lower Risk than Tier-1 Production Services
● Wanted a Partner to Maintain OpenStack like Amazon ‘Maintains’ AWS
● We Want to Operate Apps
![Page 9: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/9.jpg)
OpenStack Timeline
![Page 10: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/10.jpg)
Vendors (It Matters)
● Metacloud
  ○ Verified our Design
  ○ Deployed OpenStack
  ○ Provisioned Network
  ○ Allowed Us to Focus on Business Applications
● Equinix
  ○ Cooling & Power Design
  ○ Remote Hands
  ○ Went Above and Beyond on Numerous Occasions
![Page 11: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/11.jpg)
Vendors: Full List
● Metacloud
● Equinix
● Quanta
● Cumulus
● Level3
● Newegg
![Page 12: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/12.jpg)
Challenges
● Hardware Delays Killed Our Timelines
  ○ Blew through our contingency windows.
  ○ Hurt our budgets.
  ○ Delayed subsequent purchases
● Setting Up IP Transit Can Be Slow
● No Physical Presence in DC
  ○ Also a Pro
● No Internal Previous Success Story… So Lots of Skepticism
![Page 13: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/13.jpg)
The Not So Glamorous Job
● Negotiations Can Be Exhausting
● If You’re An Engineer, the Turn Around Time Can Be Frustrating
● You Probably Need a Gantt Chart
● There’s Nothing Agile About Writing a Big Check
![Page 14: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/14.jpg)
Tapjoy-1: Data Nodes
348 ‘Data’ All Purpose Nodes
● Quanta S910-X31E: 12 Node Configuration
● Per Node
  ○ Intel 1265Lv3 @ 2.5GHz
  ○ 4x1TB 7200RPM
  ○ 32GB RAM
  ○ Dual 1Gig NIC
● ‘Recyclable’ for Other Tasks if we Evolve
![Page 15: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/15.jpg)
Tapjoy-1: Management Nodes
12 ‘Management’ Nodes
● Quanta S180: 4 Node Configuration
● Per Node
  ○ Intel 2650v2 x2 @2.60GHz
  ○ 128GB RAM
  ○ 6x480GB SSD
  ○ Dual 10Gig NIC
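The per-node figures on the two Tapjoy-1 hardware slides can be rolled up into cluster totals. The sketch below does that arithmetic under two assumptions that are not stated in the slides: "12 Node Configuration" and "4 Node Configuration" are read as nodes per chassis, and storage uses decimal units (1 TB = 1000 GB). The `NodeClass` type is a hypothetical helper introduced only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeClass:
    """One hardware class, using the per-node figures from the slides."""
    name: str
    node_count: int
    nodes_per_chassis: int  # assumption: "N Node Configuration" means N nodes per chassis
    ram_gb: int
    disks_per_node: int
    disk_gb: int            # assumption: decimal units, 1 TB = 1000 GB

    @property
    def chassis_count(self) -> int:
        # Ceiling division, so a partially filled chassis still counts.
        return -(-self.node_count // self.nodes_per_chassis)

    @property
    def total_ram_gb(self) -> int:
        return self.node_count * self.ram_gb

    @property
    def raw_storage_gb(self) -> int:
        # Raw capacity only: no allowance for replication or filesystem overhead.
        return self.node_count * self.disks_per_node * self.disk_gb


data_nodes = NodeClass(name="data", node_count=348, nodes_per_chassis=12,
                       ram_gb=32, disks_per_node=4, disk_gb=1000)
management_nodes = NodeClass(name="management", node_count=12, nodes_per_chassis=4,
                             ram_gb=128, disks_per_node=6, disk_gb=480)

for nc in (data_nodes, management_nodes):
    print(f"{nc.name}: {nc.chassis_count} chassis, "
          f"{nc.total_ram_gb} GB RAM, {nc.raw_storage_gb / 1000:.2f} TB raw disk")
```

Under these assumptions the data tier comes to 29 chassis, 11,136 GB of RAM, and 1,392 TB of raw disk, and the management tier to 3 chassis, 1,536 GB of RAM, and 34.56 TB of raw SSD.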
![Page 16: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/16.jpg)
Glamor Shot
![Page 17: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/17.jpg)
Same Price, Different Outcome
![Page 18: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/18.jpg)
Diagrams!
![Page 19: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/19.jpg)
High-Level Request Flow Architecture
![Page 20: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/20.jpg)
Detailed Flow
![Page 21: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/21.jpg)
Data Pipeline
Tapjoy-1
![Page 22: Tapjoy OpenStack Summit Paris Breakout Session](https://reader034.fdocuments.in/reader034/viewer/2022042817/55a2b4451a28abb77c8b45cc/html5/thumbnails/22.jpg)
Plan For Failure
● Hardware
  ○ I’m Not Saying You Shouldn’t Use CEPH…
    ■ But You’ll Notice it’s Absent Here
● Service Boundaries
  ○ Have Hardware & Software Contingencies
    ■ Backup Links
    ■ Temporary Cache(s)
  ○ Actually Test Failure in Production
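One way to picture the "Backup Links" and "Temporary Cache(s)" contingencies at a service boundary is the sketch below. It is an illustration only, not Tapjoy's implementation: `TemporaryCache`, `fetch_with_contingency`, and the use of `ConnectionError` to signal a failed link are all hypothetical choices made for this example.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class TemporaryCache:
    """A small in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._items: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._items[key] = (self._clock(), value)

    def get(self, key: str) -> Optional[str]:
        entry = self._items.get(key)
        if entry is None:
            return None
        stored_at, value = entry
        if self._clock() - stored_at > self._ttl:
            del self._items[key]  # expired: drop it and report a miss
            return None
        return value


def fetch_with_contingency(key: str,
                           primary: Callable[[str], str],
                           backup: Callable[[str], str],
                           cache: TemporaryCache) -> str:
    """Try the primary link, then the backup link, then the temporary cache."""
    for source in (primary, backup):
        try:
            value = source(key)
        except ConnectionError:
            continue  # this link is down, try the next contingency
        cache.put(key, value)  # remember the last good answer
        return value
    cached = cache.get(key)
    if cached is None:
        raise ConnectionError(f"all links down and no cached value for {key!r}")
    return cached
```

Passing a fetcher that always raises `ConnectionError` in place of `primary` is a cheap way to rehearse the failure path before testing it against real services.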