Future of data visualization
Future of Data Visualization
HadoopSphere Virtual Conclave
August 2015
2
Commonly understood components of data visualization
• Graphs, maps, tables, shapes
• WYSIWYG editors
• Dashboards
• HTML5 views
• Infographics
3
Defining data visualization
• Data visualization is the presentation of data in a pictorial or graphical format. - Wikipedia
• Data visualization is a visual representation of the insights gained from your analysis. - Datameer
4
Emerging Trends
• New channels: mobile, VR devices
• More interactive charts: redraw, filter, annotations
• Multidimensional visuals: VR, GL
• Network visualization: social, linkages
• Collaboration: share, review, workflow
• And we may have ‘audiolizations’ as well: audio narrations
6
Challenges
• Access to data
• Parse data
• Central data access
• Fast queries
• Complex visual types
• Linked views
• Data mining
• Collaboration
• Workflow
7
Introducing Apache Zeppelin
[Diagram: stack of HDFS / Data Store, YARN, and Spark / Flink / Tajo …, alongside Operations and Governance/Security]
• Apache Zeppelin is a web-based multi-purpose notebook for interactive data analysis.
• It is a 100% open source incubator project of the Apache Software Foundation.
• According to HadoopSphere, Apache Zeppelin is going to influence big data visualization tools for the next 2 years or more.
8
Zeppelin Notebook
• A web-based notebook that enables interactive data analytics.
• You can type in code in SQL, Scala and more in the notebook.
• Run the commands directly from the notebook.
Source for this slide and subsequent slides:
(1) http://zeppelin.apache.org
(2) Lee Moon Soo, Introduction to Zeppelin, ApacheCon 2015
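To make the "SQL, Scala and more" point concrete: a notebook paragraph can start with a `%name` directive that picks the language backend. The sketch below only models that idea in plain Python. The function name `split_paragraph` and the fallback to `spark` when no directive is given are assumptions for illustration, not Zeppelin's actual code.

```python
def split_paragraph(text, default="spark"):
    """Return (interpreter_name, code) for one notebook paragraph.

    Assumption: a paragraph may begin with "%name"; otherwise a default
    interpreter (here "spark", chosen for illustration) is used.
    """
    stripped = text.lstrip()
    if stripped.startswith("%"):
        # The directive is the first token of the first line.
        first, _, rest = stripped.partition("\n")
        name, _, same_line = first[1:].partition(" ")
        code = (same_line + "\n" + rest).strip()
        return name, code
    return default, stripped.strip()


print(split_paragraph("%sql\nSELECT 1"))
print(split_paragraph("val x = 1"))
```

The same notebook can therefore mix paragraphs in different languages, each routed by its first line.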
10
Behind the scenes
• Java-based backend
• Active development community
- Built-in Apache Spark integration
- Uses AngularJS, D3.js
- Tested on Mac OS X, Ubuntu 14.x, CentOS 6.x
11
Zeppelin features - Visualization
• Some basic charts are currently included in Zeppelin, and more will be added in the future.
• Visualizations are not limited to Spark SQL queries: relational output from many other language backends can be recognized and visualized.
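As a rough illustration of how relational output can be recognized, Zeppelin's display system treats paragraph output that begins with `%table` as tab-separated rows with a header line. The sketch below builds such a string in plain Python. The helper name `to_zeppelin_table` and the sample rows are hypothetical; in a real notebook the string would be printed from an interpreter paragraph.

```python
def to_zeppelin_table(header, rows):
    """Format rows as text that Zeppelin's display system renders as a table.

    Relies on the convention of a leading "%table" marker, tab-separated
    columns, and one record per line.
    """
    lines = ["\t".join(str(col) for col in header)]
    for row in rows:
        lines.append("\t".join(str(cell) for cell in row))
    return "%table " + "\n".join(lines)


# Hypothetical sample data, not taken from the slides.
sample = to_zeppelin_table(["age", "count"], [(25, 3), (30, 7)])
print(sample)
```

Any backend that can print text in this shape gets the built-in charts without further integration work.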
12
Zeppelin features - Pivots
• With simple drag and drop, Zeppelin aggregates the values and displays them in a pivot chart.
• You can easily create charts with multiple aggregated values, including sum, count, average, min, max.
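The aggregation behind a pivot chart can be sketched in a few lines. This is not Zeppelin's implementation, only a minimal Python model of grouping by one field and computing the five aggregates named above. The function name `pivot` and the sample records are hypothetical.

```python
from collections import defaultdict


def pivot(rows, key, value):
    """Group rows by `key` and aggregate `value` five ways."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row[value])
    return {
        k: {
            "sum": sum(v),
            "count": len(v),
            "average": sum(v) / len(v),
            "min": min(v),
            "max": max(v),
        }
        for k, v in groups.items()
    }


# Hypothetical records, for illustration only.
records = [
    {"region": "east", "sales": 10},
    {"region": "east", "sales": 30},
    {"region": "west", "sales": 5},
]
summary = pivot(records, key="region", value="sales")
print(summary["east"])
```

In the notebook, dragging a column into the keys or values area corresponds to choosing `key` and `value` here.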
13
Zeppelin features – Dynamic forms
• Zeppelin can dynamically take inputs in forms as part of the notebook.
• These dynamic forms can be used to see input-based results or render charts.
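Zeppelin's text-input forms are written inline as `${name=default}` placeholders in the paragraph. The sketch below imitates that substitution in plain Python to show the idea. It covers only the simple text-input case, and the function name `fill_form` and the query are hypothetical.

```python
import re

# Matches ${name} or ${name=default}; a simplified subset of the form syntax.
FORM_FIELD = re.compile(r"\$\{(\w+)(?:=([^}]*))?\}")


def fill_form(template, values):
    """Replace each form field with the user's input, or its default."""
    def substitute(match):
        name, default = match.group(1), match.group(2) or ""
        return str(values.get(name, default))
    return FORM_FIELD.sub(substitute, template)


# Hypothetical query; the table and column names are made up.
query = "SELECT * FROM bank WHERE age < ${maxAge=30}"
print(fill_form(query, {}))
print(fill_form(query, {"maxAge": 45}))
```

Each time the form value changes, the paragraph is re-run with the new text, which is what lets a chart follow the input.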
14
Zeppelin features – Collaboration and publishing
• The notebook URL can be shared among collaborators. Zeppelin then broadcasts any changes in real time, just like collaboration in Google Docs.
• Zeppelin provides a URL that displays only the results, which can easily be embedded as an iframe inside a web page.
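A hedged sketch of the embedding step: given a paragraph's results-only link, wrap it in an `<iframe>` tag. The URL shape used below (`#/notebook/<note>/paragraph/<paragraph>?asIframe`), the `localhost:8080` address, and the helper name `paragraph_iframe` are assumptions; copy the actual link from your own Zeppelin instance rather than constructing it.

```python
from html import escape


def paragraph_iframe(base_url, note_id, paragraph_id, width=600, height=400):
    """Build an <iframe> tag for a single paragraph's results-only view."""
    # Assumed URL shape for the results-only link; verify against the link
    # your own Zeppelin instance generates.
    src = f"{base_url}/#/notebook/{note_id}/paragraph/{paragraph_id}?asIframe"
    return (
        f'<iframe src="{escape(src, quote=True)}" '
        f'width="{width}" height="{height}"></iframe>'
    )


# Placeholder identifiers, not real notebook IDs.
snippet = paragraph_iframe("http://localhost:8080", "NOTE_ID", "PARAGRAPH_ID")
print(snippet)
```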
15
Zeppelin interpreter architecture
• A Zeppelin Interpreter is a connector between Zeppelin and a backend data processing system. For example, to use Scala code in Zeppelin, you need the Scala interpreter.
• Every Interpreter belongs to an InterpreterGroup, which is the unit in which interpreters are started and stopped. Interpreters in the same InterpreterGroup can reference each other. For example, SparkSqlInterpreter can reference SparkInterpreter to get the SparkContext from it, because they are in the same group.
[Diagram: ZeppelinServer connected via Thrift to InterpreterGroups, each a separate JVM process containing Interpreters. The Spark group (Spark, PySpark, SparkSQL, Dep) shares a single SparkDriver attached to the Spark cluster and loads libraries from a Maven repository.]
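The grouping idea can be modeled with two small classes. This is a toy Python model, not Zeppelin's Java code: the real interpreters run in a separate JVM process and talk to the server over Thrift, which the sketch leaves out. The class names mirror the slide's terms, while the attributes, methods, and the `"SparkContext#1"` stand-in are hypothetical.

```python
class Interpreter:
    """Connector between the notebook and one backend language."""

    def __init__(self, name):
        self.name = name
        self.group = None  # set when added to an InterpreterGroup

    def interpret(self, code):
        # A real interpreter would run `code`; this stub only reports
        # which shared context it would use.
        return f"{self.name} ran {code!r} with {self.group.shared_context}"


class InterpreterGroup:
    """Unit of start/stop; members can reach each other and shared state."""

    def __init__(self, name, shared_context):
        self.name = name
        self.shared_context = shared_context
        self.members = {}
        self.running = False

    def add(self, interpreter):
        interpreter.group = self
        self.members[interpreter.name] = interpreter

    def start(self):
        self.running = True

    def stop(self):
        self.running = False


# One group whose members all see the same (stand-in) SparkContext.
spark_group = InterpreterGroup("spark", shared_context="SparkContext#1")
for name in ("spark", "pyspark", "sql"):
    spark_group.add(Interpreter(name))
spark_group.start()

print(spark_group.members["sql"].interpret("SELECT 1"))
```

Starting or stopping the group affects all of its members at once, and the SQL member reaches the shared context through the group rather than creating its own.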
17
Getting involved with Zeppelin
• http://zeppelin.apache.org/
• http://github.com/apache/incubator-zeppelin
Installation reference:
• http://hortonworks.com/blog/introduction-to-data-science-with-apache-spark/
• http://nflabs.github.io/z-manager/
Mailing List
• [email protected]
18
Other Notebook options
• IPython Notebook
• Beaker
• Spark-Notebook
• Databricks Cloud Notebook