Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for...

78
Applications & Libraries VU Visual Data Science Johanna Schmidt WS 2019/20

Transcript of Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for...

Page 1: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Applications & LibrariesVU Visual Data Science

Johanna Schmidt

WS 2019/20

Page 2: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Several different tools available

[1]

Page 3: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Differentiate between• Charting libraries

• Applications

Page 4: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Differentiate between• Charting libraries

• Python, R, Matlab, …

• Embedded in programming language environment

• Require programming skills

• Often Open Source

• Applications

Page 5: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Differentiate between• Charting libraries

• Python Plotly

• Python Matplotlib

• D3

• Highcharts

• GGPlot

• Applications

Page 6: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Differentiate between• Charting libraries

• Applications• Standalone applications

• Provide means for data handling & visualization

• Usually no programming skills required

• In many cases commercial

Page 7: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Differentiate between• Charting libraries

• Applications• Excel

• Tableau

• Microsoft Power BI

• Cognos

• QlikView

• …

Page 8: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Categorization / Evaluation

Page 9: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Categorization / Evaluation• Categorization based on comparison

[2]

https://source.opennews.org/articles/what-i-learned-recreating-one-chart-using-24-tools/

Page 10: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Categorization / Evaluation• Categorization based on comparison

[2]

Page 11: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Categorization / Evaluation• Evaluation of existing applications

[3]

Page 12: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Visual Data Science Tools

• Differentiate between• Charting libraries

• Applications

Page 13: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

• Study by Lisa Charlotte Rost (Datawrapper): What I Learned Recreating One Chart Using 24 Tools

• Study in 2016

• Compared 12 charting libraries and 12 applications

https://source.opennews.org/articles/what-i-learned-recreating-one-chart-using-24-tools/

Page 14: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

[4]

Page 15: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

[5]

Excel Google Sheets

Page 16: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

[5]

Tableau Plotly

Page 17: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

• Task

[2]

Page 18: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

• Flexibility

[2]

Page 19: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

• Learning curve[2]

Page 20: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

[2]

• Environment

Page 21: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

• Conclusion:• “There Are No Perfect Tools, Just Good Tools for People with Certain Goals”

Page 22: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

• Conclusion

[2]

Page 23: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Differentiate between• Charting libraries

• Applications (next lecture)

[1]

Page 24: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

• Task

[2]

Page 25: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Tool Comparison

[2]

• Task

Page 26: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Seaborn• R• ggplot2• ggvis• matplotlib• bokeh• Vega-Lite• Vega• Processing• Highcharts• D3• D4• C3• NVD3

Page 27: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python

• R

• ggplot

• Vega

• Processing

• JavaScript

Page 28: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python (matplotlib, bokeh, Seaborn)

• R (R, ggvis)

• Vega (Vega, Vega-Lite)

• Chartbuilder

• Processing

• JavaScript (D3, D4, C3, NVD3)

Page 29: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python (matplotlib, bokeh, Seaborn)

• R (R, ggvis)

• Vega (Vega, Vega-Lite)

• Chartbuilder

• Processing

• JavaScript (D3, D4, C3, NVD3)

Page 30: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Python

• Scripting language

• Open-source development

• Well-known programming language for data science

• Many plotting libraries available (plotly, matplotlib, Bokeh, Seaborn)

• Flexible, allow explorative data analysis

Page 31: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Python

Pandas Seaborn bokeh

[6]

Page 32: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Python

Pandas

Seaborn

bokeh

[6]

Page 33: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Python

• Pandas: for simple plots

• Seaborn: more complex visualization, requires matplotlib knowledge

• ggplot: lot of promise

• bokeh: robust tool, overkill for simple scenarios

• pygal: interactive svg graphs, but not as flexible as others

• Plotly: for highly interactive graphs and rich web-based visualizations

[7]

Page 34: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Python

[2]

• Task

Page 35: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Python

• Flexibility

[2]

Page 36: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Python

• Learning curve[2]

Page 37: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python (matplotlib, bokeh, Seaborn)

• R (R, ggvis)

• Vega (Vega, Vega-Lite)

• Chartbuilder

• Processing

• JavaScript (D3, D4, C3, NVD3)

Page 38: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

R

• Scripting language

• Open-source development

• Well-known programming language for data scienceFlexible, allowexplorative data analysis

Page 39: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

R

• Graphics package for basic charts

• ggplot2 for more advanced visualizations

https://rkabacoff.github.io/datavis/Models.html

Page 40: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python (matplotlib, bokeh, Seaborn)

• R (R, ggvis)

• Vega (Vega, Vega-Lite)

• Chartbuilder

• Processing

• JavaScript (D3, D4, C3, NVD3)

Page 41: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega

• Defines a visualization grammar

• Library for creating, saving, and sharing interactive visualizations

• Defines visualizations as JSON format

• Open-source

Page 42: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega

[8]

Page 43: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega

• JSON Format

• Display format (e.g., size)

• Data

• Scales (axes scales & visual mapping)

• Axes (orientation, ticks, labels)

• Marks (visual elements)

• Signals (interaction)

Page 44: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega

• Creating Vega JSONs• Vega editors

• Export funtionality of current tools/applications

Page 45: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega

• Interpretation & display• Vega viewers

• Embedded in web applications

[8]

Page 46: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega-Lite[9]

Page 47: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega-Lite

• Version 2 released in 2017, current version: 3

• Extends Vega grammar to add view composition and interaction(selection)

Page 48: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega-Lite

• Version 2 released in 2017, current version: 3

• Extends Vega grammar to add view composition and interaction(selection)• View Composition

• Subdivide data into groups and creates chart for every group (facet)

• Combine charts into one layout (concat, repeat)

Page 49: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega-Lite

• Demo

https://vega.github.io/editor/#/examples

Page 50: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega-Lite

• Version 2.0 released in 2017

• Extends Vega grammar to add view composition and interaction(selection)• View composition

• Subdivide data into groups and creates chart for every group (facet)

• Combine charts into one layout (concat, repeat)

• Interaction (selection)• Different types of selections

• In combination with view composition, enables linking&brushing

Page 51: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega-Lite

• Advantages• All you need to create a visualization

• Grammar-based specification

• Flexibility

• Transferability

• Disadvantages• Limited possibilities to create charts

• Understanding the JSON structure

Page 52: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega / Vega-Lite

[2]

• Task

Page 53: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega / Vega-Lite

• Flexibility

[2]

Page 54: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Vega / Vega-Lite

• Learning curve[2]

Page 55: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python (matplotlib, bokeh, Seaborn)

• R (R, ggvis)

• Vega (Vega, Vega-Lite)

• Chartbuilder

• Processing

• JavaScript (D3, D4, C3, NVD3)

Page 56: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Chartbuilder

https://quartz.github.io/Chartbuilder/

Page 57: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python (matplotlib, bokeh, Seaborn)

• R (R, ggvis)

• Vega (Vega, Vega-Lite)

• Chartbuilder

• Processing

• JavaScript (D3, D4, C3, NVD3)

Page 58: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Processing

• Visualization language built on Java

[10]

Page 59: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Processing

• Visualization language built on Java

• JavaScript interpretator available for web-based usage

https://processing.org/examples/penrosetile.html

Page 60: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Charting Libraries

• Libraries• Python (matplotlib, bokeh, Seaborn)

• R (R, ggvis)

• Vega (Vega, Vega-Lite)

• Chartbuilder

• Processing

• JavaScript (D3, D4, C3, NVD3)

Page 61: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

VU Visualisierung 1 (186.827)d3 Tutorial

https://www.cg.tuwien.ac.at/courses/Visualisierung1/VU.html

Manuela Waldner

Institute of Computer Graphics and Algorithms, TU Wien, Austria

Page 62: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 63: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 64: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 65: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 66: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 67: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 68: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 69: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 70: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge
Page 71: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Processing vs. D3

• Processing easier to learn and better for making quick prototypes

• D3 not suited for making quick prototypes

• D3 has a steep learning curve, but a large community for gettinghelp/ideas

• More tools available for D3

• Both libraries allow to publish results online

Page 72: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

JavaScript

[2]

• Task

Page 73: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

JavaScript

[2]

• Flexibility

Page 74: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

JavaScript

• Learning curve [2]

Page 75: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Web-based Visualisation

• Client-server environment

• Necessary to transfer data to client

• Data processing/visualisation done on the client

Page 76: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Conclusion

• Charting libraries require programming skills

• No fully-featured applications (e.g., data loading)

• Limited support for interaction (e.g., multiple views)

• More flexibility

• Better integration of analysis and visualization

• Open source

Page 77: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

Next lecture

• Comparison / evaluation of applications for visual data science

• Please note: different lecture room

Page 78: Applications & Libraries - TU Wien › courses › VisDataScience › ... · Python •Pandas: for simple plots •Seaborn: more complex visualization, requires matplotlib knowledge

References

[1] https://practicalanalytics.files.wordpress.com/2012/01/implementingbusinessanalytics.png

[2] https://source.opennews.org/articles/what-i-learned-recreating-one-chart-using-24-tools/

[3] Michael Behrisch, Dirk Streeb, Florian Stoffel, Daniel Seebacher, Brian Matejek Stefan Hagen Weber, Sebastian Mittelstadt, Hanspeter Pfister, Dand aniel Keim: Commercial Visual Analytics Systems - Advances in the Big Data Analytics Field. VIS 2018

[4] https://lisacharlotterost.github.io/pic/160425_gapminder.png

[5] https://lisacharlotterost.github.io/2016/05/17/one-chart-tools/

[6] https://codeburst.io/overview-of-python-data-visualization-tools-e32e1f716d10

[7] http://pbpython.com/visualization-tools-1.html

[8] https://vega.github.io/vega/tutorials/bar-chart/

[9] Arvind Satyanarayan, Dominik Moritz, Kanit Wongsuphasawat, and Jeffrey Heer: Vega-Lite: A Grammar of Interactive Graphics. IEEE TVCG 23(1), 2017

[10] https://processing.org/examples/pattern.html