Zeppelin at Twitter (SF Data Science Meetup, July 2016)
Zeppelin at Twitter
Prasad Wagle, Technical Lead, Data Platform
twitter.com/prasadwagle
July 13, 2016
SF Data Science Meetup
Galvanize, San Francisco, CA
Twitter Data Pipeline Overview
Production systems
HDFS
Analytics Tools: Presto, Vertica, MySQL, Scalding, Spark, R
Analytics Front-ends: Custom Dashboards, Tableau, Zeppelin, Command line tools
One company-wide server
860 notes
4000 paragraphs: 1500 Vertica, 1500 Presto, 300 MySQL, 400 Markdown, 300 Scalding, Spark, etc.
850 users
Zeppelin Usage Metrics
Field of (Data) Dreams
Started as a hackweek project in Dec 2015. Beta: Jan 27
Number of notes created
Feb - 270, Mar - 350, Apr - 470, May - 560, Jun - 750
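The month-to-month growth behind these figures can be worked out with a short sketch. It assumes the numbers are cumulative totals (consistent with the 860 notes reported for the server) and not per-month counts; the names notes_by_month and monthly_increase are illustrative only.

```python
# Note counts from the slide, assumed to be cumulative totals at each month.
notes_by_month = [("Feb", 270), ("Mar", 350), ("Apr", 470), ("May", 560), ("Jun", 750)]


def monthly_increase(counts):
    """Return (month, new_notes) pairs for every month after the first."""
    return [
        (month, total - prev_total)
        for (_, prev_total), (month, total) in zip(counts, counts[1:])
    ]


increases = monthly_increase(notes_by_month)
for month, added in increases:
    print(f"{month}: +{added} notes")
```

Under that assumption, the largest single-month increase is May to June.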
Report Creators: product managers (dashboards, product analytics), data scientists, sales analysts, engineers and SREs
Report Viewers: anyone in the company
Zeppelin Users
My mind is absolutely blown away by the ease of use, speed, and power of Zeppelin. I've been wanting a tool like this at Twitter my entire time working here.
started playing with @ApacheZeppelin. amazingly addictive!
Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast (and we're even using it for daily tracking of our OKRs amongst all the other metrics)
Zeppelin Testimonials
Very easy to create and share reports (web based)
Works seamlessly with analytics engines: JDBC and non-JDBC (Scalding, Spark)
Open source (easy to add features)
Reasons for adoption
Drag and drop report builder: can create complex queries without SQL knowledge (e.g. Top N)
Polished UI
Filters and other transformations work on extracts: no new database queries (fast)
Row level permissions (for sales reports)
Tableau
Authentication: integrated with Twitter’s homegrown single sign-on system
SSL: integrated with Twitter’s homegrown key distribution system
Notebook authorization
Data source authorization
Work Done (Security)
Websocket deadlock issue with Jetty 8: reduce communication, remove synchronized block (risky, will move to Jetty 9)
Monitoring
Standby server
Backups
Work Done (Stability, Operations)
Scalding
JDBC: create a JDBC connection for every query to avoid the Vertica closed connection issue
Work Done (Interpreter)
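The connection-per-query idea from the JDBC item can be sketched as below. This is an illustration only, not the Zeppelin interpreter code: Python's built-in sqlite3 module stands in for a Vertica JDBC driver, and run_query is a hypothetical helper name.

```python
import sqlite3
from contextlib import closing


def run_query(database, sql, params=()):
    """Open a fresh connection, run one query, and always close the connection.

    No connection is kept between calls, so a connection that the server
    closed in the meantime can never be reused by a later query.
    """
    with closing(sqlite3.connect(database)) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql, params)
            return cur.fetchall()


# Each call gets its own short-lived connection.
rows = run_query(":memory:", "SELECT 1 + 1")
print(rows)
```

The trade-off is that every query pays the connection setup cost, in exchange for never hitting a stale connection.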
Notebook authorization
Data source authorization
Run scheduled notes with a user
Scalding interpreter
Reduce websocket communication
Paragraph footer
Row level permissions
Work Contributed to Apache Project
Stability, Scalability: Jetty 9
Interpreters: Scalding, Spark, R
Multiuser scalability, authentication, integration with Twitter sources
Use JDBC interpreter instead of Hive
Future / Work in Progress
Features and UX: notebook organization (folders), email reports and alerts, row level permissions like Tableau
Operations: monitoring (end-to-end query), admin (view/stop running jobs, resource usage), failover, continuous integration
Future / Work in Progress