Tapping the real-time stream with SQL
Wix’s SQL-on-Storm Platform
May, 2015
Gregory Bondar, Igal Shilman
Wix Company
• Wix.com is the world’s leading cloud-based web development platform that enables users to create professional HTML5 websites using online "Drag & Drop" tools
• Wix was founded in 2006 and is headquartered in Tel Aviv
• Wix has around 65M registered users and growing…
Wix’s Data Services: building blocks
• Batch-Oriented Data Processing:
  - Hadoop ecosystem: Cloudera CDH4, HBase, Pig, Oozie, etc.
• SQL-on-Hadoop interfaces:
  - Facebook’s Presto with “home-made” Parquet, HBase and MS SQL connectors
• Real-Time Stream-Oriented Analytics:
  - Storm, Esper, etc.
• And more:
  - Microsoft SQL Server 2012
  - Google Cloud (AppEngine, Datastore, Pub/Sub, Dataflow, etc.)
  - Sharded Redis cluster
Major limitations pushed us onto the data stream journey
• Latency, latency, laaaaaaaaatency…
  – Event ingestion latency (10-20 minutes on average)
  – Hadoop is optimized for batch-oriented processing of historical data
  – Latency of analytic job results (up to dozens of minutes)
  – Unpredictable consumption of Hadoop cluster resources by on-demand analytic jobs
Use Cases that require Real-Time Data Stream Analytics
• Product personalization
• Analysis of user behavior trends and anomalies
• Operational analytics (monitoring, security, etc.)
• Machine learning models applied to user activity to predict user behavior
Wix Data Stream Tube
Let’s assume that all of Wix’s events flow through a single tube named “events”
SQL-like query language
SQL-like query language (Cont.)
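As a rough illustration of what a continuous SQL-like query over the “events” tube computes, the sketch below maintains a per-type event count over a sliding 60-second window. This is roughly the result of a hypothetical query such as `SELECT event_type, COUNT(*) FROM events WHERE ts > now - 60s GROUP BY event_type`. The `Event` class, its fields, and the query text are assumptions for illustration, not Wix’s actual syntax. In the platform itself, Esper evaluates such windowed queries. The sketch also assumes events arrive in timestamp order.

```python
from collections import Counter, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape: a timestamp in seconds and an event type.
    ts: float
    event_type: str


class SlidingWindowCount:
    """Continuously answers a (hypothetical) query of the form
    SELECT event_type, COUNT(*) FROM events
    WHERE ts > now - window GROUP BY event_type
    Assumes events arrive in non-decreasing timestamp order."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.buffer = deque()    # events inside the window, oldest first
        self.counts = Counter()  # running count per event type

    def on_event(self, event: Event) -> dict:
        # Add the new event, then evict events older than the window.
        # The window is (now - window, now], so an event exactly
        # `window` seconds old is evicted.
        self.buffer.append(event)
        self.counts[event.event_type] += 1
        cutoff = event.ts - self.window
        while self.buffer and self.buffer[0].ts <= cutoff:
            old = self.buffer.popleft()
            self.counts[old.event_type] -= 1
            if self.counts[old.event_type] == 0:
                del self.counts[old.event_type]
        return dict(self.counts)


if __name__ == "__main__":
    q = SlidingWindowCount(window_seconds=60)
    q.on_event(Event(0, "page_view"))
    q.on_event(Event(30, "click"))
    print(q.on_event(Event(70, "page_view")))  # {'page_view': 1, 'click': 1}
```

Each call returns the query's current answer. That mirrors how a long-running stream query emits updated results as events arrive, instead of scanning stored history.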
Wix’s SQL-on-Storm: requirements
• Democratizing data: self-service access to, and use of, as much data as legally possible
• User-friendly interface for SQL patriots
• Flexibility to execute any kind of query
• Ability to output query results to external services
• Support for on-demand and long-running queries
• Knowledge sharing: “ready-to-use” query templates
• High throughput and maximum uptime
Integrated usage of Storm and Esper
Esper - http://www.espertech.com/esper/
• Esper: lightweight Java library for complex event processing (CEP) and event series analysis
• Why Esper?
  – Offers a rich SQL-like Event Processing Language (EPL) supporting very complex event stream analytics
  – Easy to integrate and use
  – Very stable, with strong performance
  – Actively developed
  – Open source, well documented
Storm topology reuse via the correct partition key
• Accepts events from log collectors
• Converts them to enriched objects
• Hash-partitions objects by key (e.g., user id, request id)
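Partitioning by key matters because every event with the same key (for example, the same user id) must land on the same task. A per-user query running there then sees that user’s complete stream. In Storm, this kind of routing is what a fields grouping provides. The sketch below imitates the idea outside Storm. `partition_for`, `route`, and the choice of CRC32 are assumptions for illustration, and Storm’s own hashing differs.

```python
import zlib


def partition_for(key: str, num_partitions: int) -> int:
    """Map a partition key (e.g. a user id) to one of num_partitions tasks.

    Uses CRC32 rather than Python's built-in hash(), which is salted per
    process for str and would route the same key differently after a restart.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, key_field: str, num_partitions: int):
    """Split events (dicts) into per-partition lists by the chosen key field."""
    partitions = [[] for _ in range(num_partitions)]
    for event in events:
        idx = partition_for(str(event[key_field]), num_partitions)
        partitions[idx].append(event)
    return partitions
```

Choosing a different key field, such as a request id instead of a user id, changes which queries can be answered locally within one partition. This is presumably why picking the correct key is what allows one topology to be reused across queries.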
Compute Bolt
• Manages Esper engine instances
• Deploys/undeploys queries on demand
• Routes query results to the action / aggregation layers
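Here is a minimal sketch of the Compute Bolt’s bookkeeping. It assumes queries can be added and removed at runtime and that each query carries its own list of actions. In the actual platform, the queries run inside Esper engine instances. Here, purely for illustration, a query is a Python predicate plus a projection. `ComputeBoltSketch`, `deploy`, `undeploy`, and `execute` are hypothetical names, not Storm or Esper APIs.

```python
from typing import Any, Callable, Dict, List, Tuple

EventDict = Dict[str, Any]
Action = Callable[[str, EventDict], None]
Query = Tuple[Callable[[EventDict], bool], Callable[[EventDict], EventDict], List[Action]]


class ComputeBoltSketch:
    """Hypothetical stand-in for a Compute Bolt: a registry of running
    queries that can be deployed/undeployed at runtime, with each match
    routed to the query's actions."""

    def __init__(self) -> None:
        self._queries: Dict[str, Query] = {}

    def deploy(self, query_id: str,
               predicate: Callable[[EventDict], bool],
               projection: Callable[[EventDict], EventDict],
               actions: List[Action]) -> None:
        if query_id in self._queries:
            raise KeyError(f"query {query_id!r} is already deployed")
        self._queries[query_id] = (predicate, projection, list(actions))

    def undeploy(self, query_id: str) -> None:
        # Removing the entry stops the query from seeing further events.
        self._queries.pop(query_id, None)

    def execute(self, event: EventDict) -> None:
        # Called once per incoming event; evaluates every deployed query.
        # Iterate over a snapshot so an action may undeploy a query safely.
        for query_id, (predicate, projection, actions) in list(self._queries.items()):
            if predicate(event):
                result = projection(event)
                for action in actions:
                    action(query_id, result)
```

Keeping the registry inside a long-running bolt is what lets queries come and go without redeploying the Storm topology.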
Actions
• Personalization Services
• Graphite
• Database
• New Relic
• Email
• UDP and HTTP output
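For the Graphite action, each query result has to become a Graphite data point. Graphite’s Carbon daemon accepts a plaintext protocol of one `<metric path> <value> <timestamp>` line per point, by default on port 2003. The sketch assumes that protocol over TCP. The helper names and the metric path used below are hypothetical.

```python
import socket
import time
from typing import Iterable, Optional


def graphite_line(metric_path: str, value: float, timestamp: Optional[int] = None) -> str:
    """Format one data point in Graphite's plaintext protocol:
    '<metric path> <metric value> <metric timestamp>\\n'."""
    if " " in metric_path:
        # Fields are space-separated, so a space in the path would corrupt the line.
        raise ValueError("metric path must not contain spaces")
    ts = int(time.time()) if timestamp is None else int(timestamp)
    return f"{metric_path} {value} {ts}\n"


def send_to_graphite(lines: Iterable[str], host: str = "localhost", port: int = 2003) -> None:
    """Send pre-formatted lines to a Carbon plaintext listener over TCP."""
    payload = "".join(lines).encode("utf-8")
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload)
```

For example, `graphite_line("wix.sql_on_storm.clicks", 42, 1431000000)` yields `"wix.sql_on_storm.clicks 42 1431000000\n"`.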
Wix SQL-on-Storm Dashboard: Demo
Aggregation Bolt
• Special action type that aggregates partial results from Compute Bolts
• In other words: a Map-Reduce paradigm implementation for streaming
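The map/reduce split can be sketched as follows. The sketch assumes each Compute Bolt task periodically emits a complete snapshot of its partial counts (for example, counts per event type in its current window), tagged with its task id. `AggregationSketch` and `on_partial` are hypothetical names. The sketch shows only the merge logic, not how Storm routes partials to the Aggregation Bolt.

```python
from collections import Counter
from typing import Dict, Mapping


class AggregationSketch:
    """Hypothetical reduce step: each Compute Bolt task (the 'map' side)
    emits partial counts; this keeps the latest partial from every task
    and merges them into a global result."""

    def __init__(self) -> None:
        self._partials: Dict[int, Counter] = {}

    def on_partial(self, task_id: int, partial: Mapping[str, int]) -> Dict[str, int]:
        # Replace (not add to) the task's previous snapshot, so repeated
        # snapshots from the same task are not double-counted.
        self._partials[task_id] = Counter(partial)
        total = Counter()
        for counts in self._partials.values():
            total.update(counts)
        return dict(total)
```

Replacing each task’s previous snapshot suits snapshot-style partials. An aggregator fed incremental deltas would add them instead.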
Wix SQL-on-Storm: Aggregation Queries: Demo
Wix SQL-on-Storm: Architecture Summary
Any Questions?!