Zeppelin at Twitter
Prasad Wagle, Technical Lead, Data Platform
twitter.com/prasadwagle
May 18, 2016
Apache Zeppelin Meetup, MapR, San Jose, CA
Twitter Data Pipeline Overview
[Diagram] Production systems; HDFS
Analytics engines: Presto, Vertica, MySQL, Scalding, Spark
Analytics front-ends: Custom dashboards, Tableau, Zeppelin, Command line tools
Zeppelin Usage Metrics
One company-wide server
530 notes
2300 paragraphs: 1000 Vertica, 800 Presto, 200 MySQL, 250 Markdown, 50 Hive, Scalding, and Spark (combined)
550 users
Field of Dreams
Zeppelin Users
Report creators: product managers (dashboards, product analytics), data scientists, sales analysts, engineers and SREs
Report viewers: anyone in the company
Zeppelin Testimonials
"My mind is absolutely blown away by the ease of use, speed, and power of Zeppelin. I've been wanting a tool like this at Twitter my entire time working here."
"started playing with @ApacheZeppelin. amazingly addictive!"
"Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast (and we're even using it for daily tracking of our OKRs amongst all the other metrics)"
Reasons for adoption
Very easy to create and share reports: web based
Works seamlessly with analytics engines: JDBC and non-JDBC (Scalding, Spark)
Flexible: open source (easy to add features)
Tableau
Drag-and-drop report builder
Can create complex queries without SQL knowledge (e.g., Top N)
Polished UI (for executives)
Filters and other transformations work on extracts, so they issue no new database queries (fast)
Row-level permissions (for sales reports)
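Row-level permissions restrict which rows of a shared report each viewer sees; for example, a sales analyst sees only the accounts in their own region. A minimal sketch of the idea, assuming a hypothetical region-based entitlement table. `SalesRow`, `visible_rows`, and `ENTITLEMENTS` are illustrative names, not Tableau's or Zeppelin's API.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class SalesRow:
    region: str
    account: str
    revenue: float


def visible_rows(user: str, rows: List[SalesRow],
                 entitlements: Dict[str, Set[str]]) -> List[SalesRow]:
    """Keep only rows whose region the user is entitled to see.

    Users missing from the (hypothetical) entitlement table see nothing,
    so access is denied by default.
    """
    allowed = entitlements.get(user, set())
    return [row for row in rows if row.region in allowed]


# Hypothetical entitlements: username -> regions that user may view.
ENTITLEMENTS = {"alice": {"EMEA"}, "bob": {"NA", "APAC"}}

rows = [
    SalesRow("EMEA", "acme", 120.0),
    SalesRow("NA", "globex", 80.0),
    SalesRow("APAC", "initech", 45.5),
]
print([r.account for r in visible_rows("bob", rows, ENTITLEMENTS)])  # ['globex', 'initech']
```

Filtering in the application layer keeps the sketch short; a production setup would more likely push the filter into the query itself so that restricted rows never leave the database.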
Work Done Before Production
Areas: Security, Stability, Operations, Interpreter
Work Done (Security)
Authentication: integrated with Twitter’s homegrown single sign-on system
SSL: integrated with Twitter’s homegrown key distribution system
Notebook authorization
Data source authorization
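Notebook authorization decides who may view or edit each note. A minimal sketch, assuming a hypothetical owners/writers/readers model in which write access implies read access. `NotePermissions` is an illustrative name, not Zeppelin's actual class, and the exact rules in Zeppelin may differ.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class NotePermissions:
    """Hypothetical per-note access-control list (sets of usernames)."""
    owners: Set[str] = field(default_factory=set)
    writers: Set[str] = field(default_factory=set)
    readers: Set[str] = field(default_factory=set)

    def can_write(self, user: str) -> bool:
        # Owners and writers may edit and run the note.
        return user in self.owners or user in self.writers

    def can_read(self, user: str) -> bool:
        # Anyone who can write can also read; readers can only view.
        return self.can_write(user) or user in self.readers


perms = NotePermissions(owners={"alice"}, writers={"bob"}, readers={"carol"})
print(perms.can_read("carol"), perms.can_write("carol"))  # True False
print(perms.can_read("dave"))  # False
```

Data source authorization can follow the same shape, with the ACL keyed by data source instead of by note.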
Work Done (Stability, Operations)
WebSocket deadlock issue with Jetty 8: reduce communication; remove synchronized block (risky, will move to Jetty 9)
Monitoring
Standby server
Backups
Work Done (Interpreter)
Scalding
Hive
Create a JDBC connection for every query to avoid the Vertica closed-connection issue
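The connection-per-query fix means no connection is held open between queries, so the interpreter never reuses one that the server has already closed. A minimal sketch of the pattern using Python's DB-API, with sqlite3 standing in for the Vertica JDBC driver so it runs anywhere (an assumption for illustration). `run_query` is a hypothetical helper, not Zeppelin's JDBC interpreter code.

```python
import sqlite3
from contextlib import closing
from typing import Any, Callable, List, Tuple


def run_query(connect: Callable[[], Any], sql: str) -> List[Tuple[Any, ...]]:
    """Run one query on a brand-new DB-API connection, then close it.

    No connection outlives a single query, so a connection that the
    server has already closed is never reused for the next paragraph.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql)
            return cur.fetchall()


# sqlite3 stands in for the Vertica driver so the sketch is runnable.
print(run_query(lambda: sqlite3.connect(":memory:"), "SELECT 1 + 1"))  # [(2,)]
```

The trade-off is paying connection setup cost on every query; a connection pool that validates connections before handing them out is the usual alternative when that cost matters.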
Work Contributed to Apache Project
Notebook authorization
Data source authorization
Run scheduled notes with a user
Scalding interpreter
Reduce WebSocket communication
Paragraph footer
Row-level permissions
Future / Work in Progress
Stability, scalability: Jetty 9
Interpreters: Scalding and Spark (multiuser scalability, authentication, integration with Twitter sources); R (authentication); use the JDBC interpreter instead of the Hive interpreter
Future / Work in Progress (continued)
Features and UX: notebook organization (folders), email reports and alerts, row-level permissions like Tableau
Operations: monitoring (end-to-end query), admin (view/stop running jobs, resource usage), failover, continuous integration