Zeppelin at Twitter

Zeppelin at Twitter Prasad Wagle Technical Lead, Data Platform twitter.com/prasadwagle May 18, 2016 Apache Zeppelin Meetup MapR, San Jose, CA

Transcript of Zeppelin at Twitter

Page 1: Zeppelin at Twitter

Zeppelin at Twitter

Prasad Wagle
Technical Lead, Data Platform
twitter.com/prasadwagle

May 18, 2016
Apache Zeppelin Meetup
MapR, San Jose, CA

Page 2: Zeppelin at Twitter

Twitter Data Pipeline Overview

Production systems

HDFS

Analytics Engines
  Presto
  Vertica
  MySQL
  Scalding
  Spark

Analytics Front-ends
  Custom Dashboards
  Tableau
  Zeppelin
  Command line tools

Page 3: Zeppelin at Twitter

One company-wide server
530 notes
2300 paragraphs
  1000 Vertica, 800 Presto, 200 MySQL
  250 Markdown
  50 Hive, Scalding, Spark
550 users

Zeppelin Usage Metrics
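
The per-interpreter counts on this slide add up to the 2300 paragraphs, so they read as a breakdown of that total. A small sketch of the arithmetic, using only the numbers from the slide (the variable names are illustrative):

```python
# Paragraph counts per interpreter, exactly as listed on the slide.
paragraphs_by_interpreter = {
    "Vertica": 1000,
    "Presto": 800,
    "MySQL": 200,
    "Markdown": 250,
    "Hive, Scalding, Spark": 50,
}

# The counts should sum to the 2300 paragraphs reported on the slide.
total = sum(paragraphs_by_interpreter.values())

# Share of all paragraphs per interpreter, as a percentage rounded to one decimal.
shares = {
    name: round(100 * count / total, 1)
    for name, count in paragraphs_by_interpreter.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

By these figures, the three SQL engines (Vertica, Presto, MySQL) account for 2000 of the 2300 paragraphs.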

Page 4: Zeppelin at Twitter

Field of Dreams

Page 5: Zeppelin at Twitter

Report Creators
  Product managers (dashboards, product analytics)
  Data scientists
  Sales analysts
  Engineers and SREs

Report Viewers
  Anyone in the company

Zeppelin Users

Page 6: Zeppelin at Twitter

My mind is absolutely blown away by the ease of use, speed, and power of Zeppelin. I've been wanting a tool like this at Twitter my entire time working here.

started playing with @ApacheZeppelin. amazingly addictive!

Thanks for all the updates to Zeppelin - Fabric has fallen in love with it fast (and we're even using it for daily tracking of our OKRs amongst all the other metrics)

Zeppelin Testimonials

Page 7: Zeppelin at Twitter

Very easy to create and share reports
  Web based

Works seamlessly with analytics engines
  JDBC
  Non-JDBC: Scalding, Spark

Flexible
  Open source (easy to add features)

Reasons for adoption

Page 8: Zeppelin at Twitter

Drag and drop report builder
  Can create complex queries without SQL knowledge (e.g. Top N)
Polished UI (for executives)
Filters and other transformations work on extracts
  No new database queries (fast)
Row level permissions (for sales reports)

Tableau

Page 9: Zeppelin at Twitter

Areas:
  Security
  Stability
  Operations
  Interpreter

Work Done Before Production

Page 10: Zeppelin at Twitter

Authentication
  Integrated with Twitter’s homegrown single sign-on system

SSL
  Integrated with Twitter’s homegrown key distribution system

Notebook authorization
Data source authorization

Work Done (Security)
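
The slide does not describe how notebook authorization decides access. As a rough illustration only, here is a minimal sketch of a per-note check under an assumed owners/writers/readers model, where an empty list means the note is open to everyone. The class, the function names, and that empty-list rule are assumptions made for this example, not details from the talk.

```python
from dataclasses import dataclass, field


@dataclass
class NotePermissions:
    """Hypothetical per-note access lists of user or group names."""
    owners: set = field(default_factory=set)
    writers: set = field(default_factory=set)
    readers: set = field(default_factory=set)


def _matches(principals, acl):
    # True when the user, or any group the user belongs to, is in the list.
    return bool(set(principals) & acl)


def is_owner(principals, perms):
    # Assumption: an empty owners list means anyone may manage the note.
    return not perms.owners or _matches(principals, perms.owners)


def can_write(principals, perms):
    # Owners can always write; an empty writers list means open to all.
    return not perms.writers or _matches(principals, perms.writers | perms.owners)


def can_read(principals, perms):
    # Writers and owners can always read; an empty readers list means open to all.
    allowed = perms.readers | perms.writers | perms.owners
    return not perms.readers or _matches(principals, allowed)


# Example: a note owned by one user, editable by one team, readable by another.
perms = NotePermissions(owners={"alice"}, writers={"analytics-team"}, readers={"sales"})
print(can_read({"bob", "sales"}, perms))   # bob is in the sales group
print(can_write({"bob", "sales"}, perms))  # but not in a writer group
```

Passing the user name together with the user's groups as one set keeps the check the same for individual and group grants.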

Page 11: Zeppelin at Twitter

Websocket deadlock issue with Jetty 8
  Reduce communication
  Remove synchronized block (risky, will move to Jetty 9)

Monitoring
Standby server
Backups

Work Done (Stability, Operations)

Page 12: Zeppelin at Twitter

Scalding
Hive

Create JDBC connection for every query to avoid Vertica closed connection issue

Work Done (Interpreter)
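
The connection-per-query idea can be sketched as follows. This is not the Zeppelin interpreter code (which is Java and uses JDBC); it is a Python stand-in that uses an in-memory SQLite database in place of Vertica, and the `run_query` and `connect_in_memory` names are hypothetical.

```python
import sqlite3
from contextlib import closing


def run_query(connect, sql, params=()):
    """Open a fresh connection, run one query, and always close the connection.

    `connect` is any zero-argument function returning a DB-API connection.
    Nothing is cached between calls, so a connection that the server closed
    while idle can never be reused by a later query.
    """
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cur:
            cur.execute(sql, params)
            return cur.fetchall()


def connect_in_memory():
    # Stand-in for a real database connection (assumption for this sketch).
    return sqlite3.connect(":memory:")


rows = run_query(connect_in_memory, "SELECT 1 + 1")
print(rows)  # [(2,)]
```

The trade-off is that every query pays the connection setup cost, in exchange for never hitting a stale connection.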

Page 13: Zeppelin at Twitter

Notebook authorization
Data source authorization
Run scheduled notes with a user
Scalding interpreter
Reduce websocket communication
Paragraph footer
Row level permissions

Work Contributed to Apache Project

Page 14: Zeppelin at Twitter

Stability, Scalability
  Jetty 9

Interpreters
  Scalding and Spark
    Multiuser scalability, authentication, integration with Twitter sources
  R (authentication)
  Use JDBC interpreter instead of Hive

Future / Work in Progress

Page 15: Zeppelin at Twitter

Features and UX
  Notebook organization (folders)
  Email reports and alerts
  Row level permissions like Tableau

Operations
  Monitoring (end-to-end query)
  Admin (view/stop running jobs, resource usage)
  Failover
  Continuous integration

Future / Work in Progress

Comment from Sriram Krishnan: How about (horizontal) scaling? If you end up supporting Scalding, all of the submitters will otherwise run on the same node.