Data Science Notebook Webinar 2017-11-16

Post on 20-May-2020


Data Science Notebook Guidelines

ODPi BI & Data Science SIG: Cupid Chan

Moon Soo Lee, Frank McQuillan

• Bridging the gap so that BI tools can sit harmoniously on top of both Hadoop and RDBMS, while providing the same, or even more, business insight to BI users who also have Hadoop in the backend.

• Provide an objective guideline for evaluating the effectiveness of a BI solution and/or other related middleware technologies.

BI & Data Science Special Interest Group (SIG)

Target user persona

• Jupyter: Data science user with programming experience in one of the supported kernels

• Zeppelin: Data engineers, data scientists and business users who need to collaborate in the same data processing pipeline

Installation

• Jupyter: Easy installation with Anaconda or pip. Standalone, or Hadoop and Spark (via YARN) clusters supported.

• Zeppelin: Download binary package and start daemon script. Included in HDP.

Configuration

• Jupyter: Edit config files or use command line tool for notebook settings. Community maintained language kernels have various configuration workflows.

• Zeppelin: Edit config files. Interpreters can be configured through GUI.

User Interface

• Jupyter: Functional notebook user interface that can be used to create readable analyses combining code, images, comments, formulae and plots.

• Zeppelin: Notebook interface in which users can document, run code, and visualize outputs with a flexible layout and multiple looks and feels.

Supported languages

• Jupyter: Python, R, Julia and dozens of community maintained kernels

• Zeppelin: Support for various languages, including Spark, Python and JDBC, is included in the binary package. 3rd party interpreters are available through an online registry.

Multi-user support

• Jupyter: Native Jupyter does not support multi-user. However, JupyterHub can be used to serve notebooks to users working in separate sessions.

• Zeppelin: Multiple users can collaborate in real-time on a notebook. Multiple users can work with multiple languages in the same notebook.
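As a rough illustration of how several languages can share one Zeppelin notebook: each paragraph can start with a `%name` directive that selects the interpreter for that paragraph. The sketch below only mimics that routing decision in plain Python. The helper `split_paragraph` and the choice of `spark` as the fallback interpreter are assumptions made for this example, not Zeppelin's actual implementation.

```python
def split_paragraph(text, default_interpreter="spark"):
    """Return (interpreter_name, body) for a Zeppelin-style paragraph.

    Hypothetical helper: a paragraph that begins with "%name" is routed
    to the interpreter called "name"; anything else goes to the default
    (assumed here to be "spark").
    """
    stripped = text.lstrip()
    if not stripped.startswith("%"):
        return default_interpreter, text
    # The directive is the first whitespace-delimited token.
    parts = stripped.split(None, 1)
    name = parts[0][1:]
    body = parts[1] if len(parts) > 1 else ""
    return name, body


# Three paragraphs in one notebook, each in a different language.
paragraphs = [
    "%python\nprint(1)",
    "%md Hello world",
    "val x = 1",
]
routed = [split_paragraph(p) for p in paragraphs]
```

Here `routed` pairs each paragraph with the interpreter that would run it, which is the sense in which one notebook can mix languages.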

Support and community

• Jupyter: Mature project with active community and good support. The Jupyter project was born in 2014 but has roots going back to 2001.

• Zeppelin: Apache Zeppelin is one of the most active projects in the Apache Software Foundation. The project was born in 2013 and became a top-level project of the ASF in 2015.

Architecture

• Jupyter: The notebook server sends code to language kernels, renders in a browser, and stores code/output/Markdown in JSON files.

• Zeppelin: Zeppelin server daemon manages multiple interpreters (backend integrations). Web application communicates with the server using WebSocket for real-time communication.
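To make the Jupyter bullet concrete, here is a minimal sketch of the kind of JSON document a notebook is stored as, built with only the Python standard library. The field names follow the version 4 notebook format; the helper `make_notebook` and the cell contents are made up for illustration, and a real notebook usually carries more metadata (kernel information, saved outputs).

```python
import json


def make_notebook(markdown_text, code_text):
    """Build a minimal notebook dict with one Markdown and one code cell.

    Hypothetical helper for illustration; real notebooks are normally
    written by the notebook server, not by hand.
    """
    return {
        "nbformat": 4,
        "nbformat_minor": 2,
        "metadata": {},
        "cells": [
            {
                "cell_type": "markdown",
                "metadata": {},
                "source": [markdown_text],
            },
            {
                "cell_type": "code",
                "execution_count": None,  # not yet run by a kernel
                "metadata": {},
                "outputs": [],            # kernel output is stored here
                "source": [code_text],
            },
        ],
    }


notebook = make_notebook("# My first notebook", "print('hello')")
serialized = json.dumps(notebook, indent=1)  # text that would go in an .ipynb file
restored = json.loads(serialized)
```

Because code, output and Markdown all live in one JSON file, a notebook can be shared or version-controlled like any other text file.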

Big data ecosystem

• Jupyter: Can be connected to a variety of big data execution engines and frameworks: Spark, massively parallel processing (MPP) databases, Hadoop, etc.

• Zeppelin: Tightly integrated with Apache Spark and other big data processing engines.

Security

• Jupyter: Code executed in the notebook is trusted, like any other Python program. Token-based authentication on by default. Root use disabled by default.

• Zeppelin: User authentication (LDAP, AD integration). Notebook ACL. Interpreter ACL. SSL connection.
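For the Jupyter token-based authentication mentioned above, a client typically presents the token either as a `token` URL query parameter or in an `Authorization: token ...` request header. The sketch below shows both forms using only the standard library. The helper names, the server address and the token value are placeholders invented for this example.

```python
from urllib.parse import urlencode


def token_url(base_url, token):
    """Return a notebook URL carrying the token as a query parameter."""
    return f"{base_url.rstrip('/')}/?{urlencode({'token': token})}"


def token_header(token):
    """Return request headers carrying the token."""
    return {"Authorization": f"token {token}"}


# Placeholder values; a real token is printed by the server at startup.
url = token_url("http://localhost:8888", "abc123")
headers = token_header("abc123")
```

The URL form is convenient for opening a browser session, while the header form suits scripted access to the server.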

Data science readiness

• Jupyter: Widely used by data scientists for a variety of tasks including quick exploration, documentation of findings, reproducibility, teaching, and presentations

• Zeppelin: Data scientists can collaborate with each other. Business users can also log in and collaborate with data scientists directly on notebooks.

Jupyter

Frank McQuillan

Agenda

• What is a Jupyter notebook?
• Lightning tutorial - my first Jupyter notebook
• Data science examples
  – Python
  – SQL
• Key strengths and potential areas of improvement

What is a Jupyter Notebook?

• Tell a story with your data
• Program in a web browser
• “Multimodal”
• Favorite tool of data scientists and researchers

Support and Community

• 2001 - IPython notebook project (Fernando Perez)
• 2014 - Jupyter notebook launched
• Open source (modified BSD license)
• Steering council of ~15 members from academia and commercial companies
• Mature product with active community (https://stackoverflow.com/search?q=jupyter returns ~10,500 results)

Architecture

● IPython
● IRkernel
● IJulia
● Dozens of community maintained kernels: https://github.com/jupyter/jupyter/wiki/Jupyter-kernels

Demo

Summary

• Key strengths
  – Data science friendly
  – Mature project
  – Widely used
  – Intuitive UI
  – Nice presentation of code, images, comments, formulae
  – Lots of available kernels

• Some potential improvements
  – Multi-user support
  – Cell drag and drop
  – Hiding code/output
  – IDE type operations like syntax checking, version control, running code one line at a time

Zeppelin

Moon Soo Lee

Slide & demo notebook - https://s.apache.org/ZPLN