Scaling Wix with microservices architecture and multi-cloud platforms - Reversim Summit 2015
Aviran Mordo, Head of Back-end Engineering
@aviranm
linkedin.com/in/aviran
aviransplace.com
Scaling Wix with Microservices Architecture
Wix in Numbers
Over 61M users
1.5M new users/month
Static storage is >2PB of data
1.5TB new files/day
3 data centers + 3 clouds (Google, Amazon, Azure)
1.5B HTTP requests/day
900 people work at Wix, of whom ~300 are in R&D
Initial Architecture
Built for fast development
Stateful login (Tomcat session), Ehcache, file uploads
No consideration for performance, scalability and testing
Intended for short-term use
Tomcat, Hibernate, custom web framework
Lighttpd (file serving)
MySQL DB
Wix (Tomcat)
The Monolithic Giant
One monolithic server that handled everything
Dependency between features
Changes in unrelated areas of the system caused deployment of the whole system
Failure in unrelated areas caused system-wide downtime
Concerns and SLA
Data Validation
Security / Authentication
Data consistency
Lots of data
Edit websites
High availability
High performance
Lots of static files
Very high traffic volume
Viewport optimization
Long tail (immutable)
Serving Media
High availability
High performance
High traffic volume
Long tail (mutable)
View sites, created by Wix editor
HTML Editor
Flash Editor
MSM
Private Media
Public Media
Editor Segment Public Segment
Premium Services
eCommerce
List DB
App Builder
App Store
App Market
Dashboard
Statics/media
Mailer
TimeZone
Public HTML API
Public API (Flash)
MSP
Public Server
HTML Renderer
HTML SEO Renderer
Flash Renderer
Flash SEO Renderer
Sitemap Renderer
Robots.txt Renderer
User Server
Template Viewer
Contacts
HUB
Activity
Site Members
Provided Mailing Service
Comments
Snapshoter
User Pref
Feed Me
Shout-out
Hotels
PETRI
Site Pref
Dist Logger
Slicer
eComRenderer
eCom Cart
eComCheckout
eComCatalog
eComOrders
Payment Facade
Account Info
HTML API
HTML Embeder
Blog
Mobile
Microservices Guidelines
Each service has its own database (if one is needed)
Only one service can write to a specific DB table(s)
There may be additional read-only services that directly access the DB (for performance reasons)
Services are stateless
No DB transactions
Cache is not a building block, but an optimization
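One way to read "Cache is not a building block, but an optimization" is that a service must return the same answers with the cache removed. The sketch below is a minimal illustration of that reading; `SiteStore`, `SiteService` and `get_header` are hypothetical names, not taken from the talk.

```python
from typing import Dict, Optional


class SiteStore:
    """In-memory stand-in for the service's own database (hypothetical)."""

    def __init__(self) -> None:
        self._rows: Dict[str, str] = {"site-1": "header-v1"}
        self.reads = 0  # counts how often the "database" is hit

    def load(self, site_id: str) -> Optional[str]:
        self.reads += 1
        return self._rows.get(site_id)


class SiteService:
    """Stateless service: the cache only saves work, it never holds the truth."""

    def __init__(self, store: SiteStore, cache: Optional[Dict[str, str]] = None) -> None:
        self._store = store
        self._cache = cache  # None means "run without a cache"

    def get_header(self, site_id: str) -> Optional[str]:
        if self._cache is not None and site_id in self._cache:
            return self._cache[site_id]
        value = self._store.load(site_id)
        if self._cache is not None and value is not None:
            self._cache[site_id] = value
        return value
```

With `cache=None` every call reads from the store; with a dict passed in, repeated calls for the same site read the store once. The results are identical either way, which is the property the guideline asks for.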
Microservices Tradeoffs
Each service has its own database (if one is needed)
• Easy to scale microservices based on SLA concerns
• Tradeoff: system complexity, performance
Only one service can write to a specific DB table(s)
• Decoupled architecture, faster development
• Tradeoff: system complexity / performance
There may be additional read-only services that access the DB
• Performance gain
• Tradeoff: coupling
Services are stateless
• Easy to scale out (just add more servers)
• Tradeoff: performance / consistency
No DB transactions
• Better DB performance, easier to scale
• Tradeoff: system complexity
Editor Server
Immutable JSON pages (~2.5M / day)
Site revisions
Active-standby MySQL across data centers
Editor Server
MySQL Active Sites
MySQL Archive
Protect The Data
DB outage with fast recovery = replication
Data poisoning/corruption = revisions / backup
Make the data available at all times = data distribution to multiple locations / providers
Saving Editor Data
[Diagram: Browser, Editor Server, Static Grid, Google Cloud Storage, MySQL Active Sites, MySQL Archive, Archive (Amazon), Archive (Google). Arrow labels: Save Page(s), Save Page, 200 OK, Upload, Notify, DC replication, Download Page]
Self Healing Process
[Diagram: Browser, Editor Server, Static Grid, Google Cloud Storage, MySQL Active Sites, MySQL Archive, Archive (Amazon), Archive (Google). Arrow labels: Save Page(s), Save Page, 200 OK, Upload, Notify, DC replication, Download Page]
No DB Transactions
Save each page (JSON) as an atomic operation
Page ID is a content based hash (immutable/idempotent)
Finalize transaction by sending site header (list of pages)
Can generate orphaned pages, not a problem in practice
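The save flow above can be sketched as follows. This is a minimal illustration, not Wix's implementation: the `PageStore` class, the use of SHA-256, and the canonical JSON encoding are all assumptions made for the example.

```python
import hashlib
import json
from typing import Dict, List


def page_id(page: dict) -> str:
    """Content-based ID: the same page content always yields the same ID."""
    # Sorted keys and fixed separators give one canonical encoding per content.
    canonical = json.dumps(page, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class PageStore:
    """In-memory stand-in for the page and site-header tables (hypothetical)."""

    def __init__(self) -> None:
        self.pages: Dict[str, dict] = {}
        self.site_headers: Dict[str, List[str]] = {}

    def save_page(self, page: dict) -> str:
        # Each page write is atomic and idempotent: re-saving identical
        # content writes the same key with the same value.
        pid = page_id(page)
        self.pages[pid] = page
        return pid

    def save_site(self, site_id: str, pages: List[dict]) -> List[str]:
        ids = [self.save_page(p) for p in pages]
        # The site header (list of page IDs) is written last. Until then,
        # readers still see the previous header, so a failure part-way
        # through leaves orphaned pages rather than a half-updated site.
        self.site_headers[site_id] = ids
        return ids
```

Because a page is never modified after it is written, retrying a failed save is safe, and the only cleanup concern is the orphaned pages mentioned above.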
Prospero – Wix Media Storage
2PB user media files
3M files uploaded daily
800M metadata records
Dynamic media processing
• Picture resize, crop and sharpen “on the fly”
• Watermark
• Audio format conversion
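Prospero – Wix Media Manager
[Diagram: "get image.jpg" is requested from the CDN; if not in CDN, the file is fetched from storage with a first fallback and a second fallback. Locations shown: Austin, Google Cloud, Amazon. Node labels: x36, x36, x32]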
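The fallback idea in the media diagram can be illustrated with an ordered list of sources that are tried one after another. This is only a sketch: the source labels, the `fetch_with_fallback` helper, and the order of the locations are assumptions, since the transcript does not say which location is the first or second fallback.

```python
from typing import Callable, Dict, Optional, Sequence, Tuple

Fetcher = Callable[[str], Optional[bytes]]


def fetch_with_fallback(
    name: str, sources: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Try each source in order; return (label, content) of the first hit."""
    for label, fetch in sources:
        try:
            data = fetch(name)
        except OSError:
            data = None  # treat an unreachable source like a miss
        if data is not None:
            return label, data
    raise FileNotFoundError(name)


def unreachable(_name: str) -> Optional[bytes]:
    """Simulates a storage location that is currently down."""
    raise ConnectionError("primary location is down")


# Hypothetical replicas: the first fallback lacks the file, the second has it.
empty_replica: Dict[str, bytes] = {}
full_replica: Dict[str, bytes] = {"image.jpg": b"jpeg-bytes"}

sources = [
    ("primary", unreachable),
    ("first fallback", empty_replica.get),
    ("second fallback", full_replica.get),
]
```

Calling `fetch_with_fallback("image.jpg", sources)` skips the failing primary and the empty first fallback and returns the copy held by the second fallback.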
Public Segment Roles
Routing (resolve URLs)
Dispatching (to a renderer)
Rendering (HTML, XML, TXT)
Public Server
HTML Renderer
HTML SEO Renderer
Flash Renderer
Sitemap Renderer
Robots.txt Renderer
www.example.com
Flash SEO Renderer
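The three roles (routing, dispatching, rendering) can be sketched as three small steps in one request handler. The routing table contents, the renderer functions, and the path-based dispatch rule are hypothetical and serve only to show how the roles divide.

```python
from typing import Callable, Dict, Tuple

Renderer = Callable[[str], Tuple[str, str]]

# Hypothetical routing table: host name -> site ID.
ROUTES: Dict[str, str] = {"www.example.com": "site-1"}


def render_html(site_id: str) -> Tuple[str, str]:
    return "text/html", f"<!-- bootstrap page for {site_id} -->"


def render_sitemap(site_id: str) -> Tuple[str, str]:
    return "application/xml", f"<urlset><!-- {site_id} --></urlset>"


def render_robots(site_id: str) -> Tuple[str, str]:
    return "text/plain", "User-agent: *\nAllow: /\n"


# Hypothetical dispatch table: special paths get their own renderer.
RENDERERS: Dict[str, Renderer] = {
    "/sitemap.xml": render_sitemap,
    "/robots.txt": render_robots,
}


def handle(host: str, path: str) -> Tuple[int, str, str]:
    """Return (status, content type, body) for a request."""
    site_id = ROUTES.get(host)                   # routing: resolve the URL
    if site_id is None:
        return 404, "text/plain", "unknown site"
    renderer = RENDERERS.get(path, render_html)  # dispatching: pick a renderer
    content_type, body = renderer(site_id)       # rendering: HTML, XML or TXT
    return 200, content_type, body
```

Keeping each step a plain dictionary lookup fits the deck's later advice to minimize business logic on the public path.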
Publish Site
Publish site header (a map of pages for a site)
Publish routing table
Publish site header / routes (CQRS)
Editor Segment Public Segment
Built For Speed
Minimize out-of-process hops (2 DB, 1 RPC)
Lookup tables are cached in memory, updated every few minutes
Denormalized data – optimize for read by primary key (MySQL)
Minimize business logic
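An in-memory lookup table that is reloaded on a timer keeps reads off the database, at the cost of data that can be stale by up to one refresh interval. The sketch below is one possible shape for this; `RefreshingLookup`, the loader callback, and the 300-second interval used in the usage note are assumptions (the slide only says "every few minutes").

```python
import time
from typing import Callable, Dict, Optional


class RefreshingLookup:
    """Serves reads from memory and reloads the whole table when it is stale."""

    def __init__(
        self,
        load: Callable[[], Dict[str, str]],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load
        self._ttl = ttl_seconds
        self._clock = clock
        self._table = load()
        self._loaded_at = clock()

    def get(self, key: str) -> Optional[str]:
        now = self._clock()
        if now - self._loaded_at >= self._ttl:
            # Stale: replace the whole table in one step.
            self._table = self._load()
            self._loaded_at = now
        return self._table.get(key)
```

For example, `RefreshingLookup(load_routes, ttl_seconds=300)` (with `load_routes` being whatever function reads the table from the database) would hit the database at most once every five minutes, no matter how many lookups are served.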
How a Page Gets Rendered
Bootstrap HTML template that contains only data
Only JavaScript imports
JSON data (site-header + dynamic data)
No “real” HTML view
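A bootstrap page of this kind might look like the output of the sketch below: script imports plus embedded JSON, with an empty body for the client-side code to fill in. The function name, the `siteData` variable, and the script URL are hypothetical; the script URLs are assumed to be trusted values and are not escaped.

```python
import json
from typing import List


def bootstrap_html(site_header: dict, script_urls: List[str]) -> str:
    """Build an HTML shell that carries only data and JavaScript imports."""
    # Escape "</" so embedded JSON cannot close the <script> element early.
    data = json.dumps(site_header, separators=(",", ":")).replace("</", "<\\/")
    scripts = "\n".join(f'<script src="{url}"></script>' for url in script_urls)
    return (
        "<!DOCTYPE html>\n"
        "<html><head>\n"
        f"<script>var siteData = {data};</script>\n"
        f"{scripts}\n"
        "</head><body></body></html>\n"
    )
```

The server does no view rendering here; it only serializes data, and the browser builds the page from `siteData`.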
Why JSON?
Easy to parse in JavaScript and Java/Scala
Fairly compact text format
Highly compressible (5:1 even for small payloads)
Easy to fix rendering bugs (just deploy new code)
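The compressibility claim can be checked on any payload with a few lines of code. The sketch below measures the gzip ratio of a made-up page description; the payload shape is an assumption, and the ratio obtained depends entirely on the data, so no particular figure is implied.

```python
import gzip
import json


def encode(payload: dict) -> bytes:
    """Compact JSON encoding as UTF-8 bytes."""
    return json.dumps(payload, separators=(",", ":")).encode("utf-8")


def compression_ratio(payload: dict) -> float:
    """Uncompressed size divided by gzip-compressed size."""
    raw = encode(payload)
    return len(raw) / len(gzip.compress(raw))


# Hypothetical page: a list of similar components, as a site editor might emit.
page = {
    "components": [
        {"type": "text", "x": i * 10, "y": i * 20, "width": 200,
         "height": 40, "text": f"Paragraph {i}"}
        for i in range(50)
    ]
}
```

`compression_ratio(page)` reports how many times smaller the gzipped form is; repeated keys and structure are what make JSON like this compress well.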
Serving a Site – Sunny Day
[Diagram: Browser (http://example.wix.com), LB, Public, Renderer, Archive, CDN, Statics. Arrow labels: HTTP Request, HTML, Resources / Media, Store HTML to cache, Notify site view]
Serving a Site – DC Lost
[Diagram: Browser (http://example.wix.com), two sets of LB, Public, Renderer, plus Archive, CDN, Statics. Arrow labels: HTTP Request, Change DNS]
Serving a Site – Public Lost
[Diagram: Browser (http://example.wix.com), two sets of LB, Public, Renderer, plus Archive. Arrow labels: HTTP Request, HTML, Get Cached HTML Version, Fallback to 2nd DC]
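The "store HTML to cache" and "get cached HTML version" steps from the serving diagrams suggest a simple pattern: keep the last good response, and serve it when rendering fails. The sketch below shows that pattern only; `RenderError`, `serve`, and the dict-based cache are hypothetical, and the fallback to a second data center is left out.

```python
from typing import Callable, Dict


class RenderError(Exception):
    """Raised when the public server or renderer cannot produce a page."""


def serve(url: str, render: Callable[[str], str], html_cache: Dict[str, str]) -> str:
    """Render the page; on failure, fall back to the last cached HTML."""
    try:
        html = render(url)
    except RenderError:
        cached = html_cache.get(url)
        if cached is None:
            raise  # nothing cached for this URL, so the failure is surfaced
        return cached
    html_cache[url] = html  # sunny day: remember the last good response
    return html
```

A site that has been viewed at least once stays available in this scheme even while the renderer is down, although visitors may see a slightly older version.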
Living in the Browser
[Diagram: Browser (http://example.wix.com), LB, Public, Renderer, Archive, CDN, Statics, Editor Pages. Arrow labels: HTTP Request, HTML, JSON / Media, Fallback (twice)]
Summary
Identify your critical path and concerns
Build redundancy in critical path (for availability)
De-normalize data (for performance)
Minimize out-of-process hops (for performance)
Take advantage of client’s CPU power