Interactive Data Visualization · fenyolab.org/presentations/Methods_2019/slides...

Interactive Data Visualization 11/19/19 Mark Grivainis

Interactive Data Visualization

11/19/19 Mark Grivainis

Overview

What is Interactive Data Visualization

Common Interactive Visualization Techniques

What Tools Exist for Interactive Visualization

Working with Bokeh

What is Interactive Data Visualization

Interactive data visualization lets readers query a plot in real time

The underlying visualizations tend to be standard figures: bar plots, scatter plots, heatmaps, etc.

Adding interactions allows the data to be explored more thoroughly

Start with a solid static figure before adding interactions

Different Types of Interaction

Identification (Hovering)

Scaling (Zooming)

Selection (Brushing)

Linking

Available Tools for Interactive Visualization

Python: Bokeh, Plotly, Matplotlib

R: Shiny

JavaScript: D3

Most of these tools rely on HTML and JavaScript to render plots

If you want to create a non-standard plot:

Learn JavaScript

Use D3

What is Bokeh

Interactive visualization library for Python

Works with large datasets

Simplifies the process of creating:

Interactive plots

Dashboards

Data applications

Installing Bokeh

Bokeh is not part of the Python Standard Library

It can be installed using pip or conda (conda is preferred)

conda install bokeh

You can either install into your base environment or create a new environment

conda create -n vis python=3.6 bokeh jupyter pandas numpy

Using Bokeh Output

Bokeh has three output modes:

Server mode

Static HTML: output_file()

Notebook: output_notebook()

https://docs.bokeh.org/en/1.4.0/docs/reference/server.html?highlight=server#module-bokeh.server
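
As a rough sketch of the static HTML mode: output_file() only registers the target file, and the file is written when save() or show() is called. The file name scatter.html and the temporary directory below are assumptions made for illustration, not part of the slides.

```python
import os
import tempfile

from bokeh.io import output_file, save
from bokeh.plotting import figure

# Hypothetical output location, chosen only for this sketch.
html_path = os.path.join(tempfile.mkdtemp(), "scatter.html")

# output_file() registers where the document should be written ...
output_file(html_path, title="Static HTML example")

p = figure(title="Static HTML output")
p.scatter(x=[1, 2, 3, 4, 5], y=[6, 7, 2, 3, 6])

# ... and save() writes it without opening a browser.
# show(p) would write the same file and also open it.
written_path = save(p)
```

In a notebook, replacing the output_file() call with output_notebook() and save() with show() should render the same figure inline.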

Defining a figure

    from bokeh.plotting import figure, show
    from bokeh.io import output_notebook

    output_notebook()

    p = figure()

    show(p)

Bokeh Input Data

Providing Data Directly

    from bokeh.plotting import figure, show
    from bokeh.io import output_notebook

    output_notebook()

    # Plain Python lists can be passed straight to a glyph method
    x_values = [1, 2, 3, 4, 5]
    y_values = [6, 7, 2, 3, 6]

    p = figure()

    p.scatter(x=x_values, y=y_values)
    show(p)

Using ColumnDataSource

    from bokeh.plotting import figure, show
    from bokeh.io import output_notebook
    from bokeh.models import ColumnDataSource

    output_notebook()

    data = {'x_values': [1, 2, 3, 4, 5],
            'y_values': [6, 7, 2, 3, 6]}

    # Columns are referenced by name once the data lives in a source
    source = ColumnDataSource(data=data)

    p = figure()
    p.scatter(x='x_values', y='y_values', source=source)
    show(p)

Using ColumnDataSource and Pandas

    from bokeh.plotting import figure, show
    from bokeh.io import output_notebook
    from bokeh.models import ColumnDataSource
    import pandas as pd

    output_notebook()

    data = {'x_values': [1, 2, 3, 4, 5],
            'y_values': [6, 7, 2, 3, 6]}

    df = pd.DataFrame.from_dict(data)

    # A DataFrame can be passed directly; its columns become source columns
    source = ColumnDataSource(df)

    p = figure()
    p.scatter(x='x_values', y='y_values', source=source)
    show(p)

Built in Plot Types

line, multi_line, vbar, scatter, hbar, image, hex_tile

A full list is available in the documentation

There are no prebuilt statistical plots

E.g. box plots, heatmaps

Many of these plots are not complicated to generate

Build your own package defining them that can be used across projects
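
A minimal sketch of such a reusable helper, using a histogram as one example of a statistical plot assembled from basic glyphs. The function name histogram_figure, its parameters and the random test data are hypothetical, not taken from the slides.

```python
import numpy as np
from bokeh.plotting import figure


def histogram_figure(values, bins=10, title="Histogram"):
    """Return a Bokeh figure with a histogram drawn from quad glyphs."""
    counts, edges = np.histogram(values, bins=bins)
    p = figure(title=title)
    # One rectangle per bin: left/right are the bin edges, top is the count.
    p.quad(top=counts, bottom=0, left=edges[:-1], right=edges[1:],
           line_color="white")
    p.xaxis.axis_label = "value"
    p.yaxis.axis_label = "count"
    return p


# Made-up data, only to exercise the helper.
rng = np.random.default_rng(0)
hist = histogram_figure(rng.normal(size=500), bins=20)
```

Placing functions like this in a small module lets every project import the same plot definitions; show(hist) displays the result.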

Adding Hover Functionality

    from bokeh.plotting import figure, show
    from bokeh.io import output_notebook
    from bokeh.models import ColumnDataSource

    output_notebook()

    source = ColumnDataSource(data=dict(
        x=[1, 2, 3, 4, 5],
        y=[2, 5, 8, 2, 7],
        desc=['A', 'b', 'C', 'd', 'E'],
    ))

    # 'hover' adds a hover tool; with no tooltips given it shows its default fields
    TOOLS = 'hover,pan'

    p = figure(tools=TOOLS)
    p.scatter('x', 'y', source=source)
    show(p)

Adding Hover Functionality

    from bokeh.plotting import figure, show
    from bokeh.io import output_notebook
    from bokeh.models import ColumnDataSource

    output_notebook()

    source = ColumnDataSource(data=dict(
        x=[1, 2, 3, 4, 5],
        y=[2, 5, 8, 2, 7],
        desc=['A', 'b', 'C', 'd', 'E'],
    ))

    TOOLS = 'hover,pan'
    # $name fields are special variables (index, cursor position);
    # @name reads the value from a column of the data source
    TOOLTIPS = [
        ("index", "$index"),
        ("(x,y)", "($x, $y)"),
        ("desc", "@desc"),
    ]

    p = figure(tools=TOOLS, tooltips=TOOLTIPS)
    p.scatter('x', 'y', source=source)
    show(p)

The autompg Dataframe

     mpg  cyl  displ   hp  weight  accel  yr  origin  name
    18.0    8  307.0  130    3504   12.0  70       1  chevrolet chevelle malibu
    15.0    8  350.0  165    3693   11.5  70       1  buick skylark 320
    18.0    8  318.0  150    3436   11.0  70       1  plymouth satellite
    16.0    8  304.0  150    3433   12.0  70       1  amc rebel sst
    17.0    8  302.0  140    3449   10.5  70       1  ford torino
     ...  ...    ...  ...     ...    ...  ..     ...  ...
    27.0    4  140.0   86    2790   15.6  82       1  ford mustang gl
    44.0    4   97.0   52    2130   24.6  82       2  vw pickup
    32.0    4  135.0   84    2295   11.6  82       1  dodge rampage
    28.0    4  120.0   79    2625   18.6  82       1  ford ranger
    31.0    4  119.0   82    2720   19.4  82       1  chevy s-10

Summarizing a Dataframe

    from bokeh.sampledata.autompg import autompg as df

    # Summary statistics of mpg and accel for each cylinder count
    mpg = df.groupby('cyl').describe()['mpg']
    acc = df.groupby('cyl').describe()['accel']

    print(mpg.to_string(max_rows=10))
    print(acc.to_string(max_rows=10))

mpg

    cyl  count       mean       std   min    25%    50%    75%   max
    3      4.0  20.550000  2.564501  18.0  18.75  20.25  22.05  23.7
    4    199.0  29.283920  5.670546  18.0  25.00  28.40  32.95  46.6
    5      3.0  27.366667  8.228204  20.3  22.85  25.40  30.90  36.4
    6     83.0  19.973494  3.828809  15.0  18.00  19.00  21.00  38.0
    8    103.0  14.963107  2.836284   9.0  13.00  14.00  16.00  26.6

accel

    cyl  count       mean       std   min    25%   50%   75%   max
    3      4.0  13.250000  0.500000  12.5  13.25  13.5  13.5  13.5
    4    199.0  16.581910  2.383185  11.6  14.80  16.2  18.0  24.8
    5      3.0  18.633333  2.369247  15.9  17.90  19.9  20.0  20.1
    6     83.0  16.254217  2.031778  11.3  15.05  16.0  17.6  21.0
    8    103.0  12.955340  2.224759   8.0  11.50  13.0  14.0  22.2

ColumnDataSource on a Group

    from bokeh.models import ColumnDataSource
    from bokeh.sampledata.autompg import autompg as df

    # Treat the model year as a categorical (string) key
    df['yr'] = df['yr'].astype(str)
    group = df.groupby('yr')
    # A GroupBy is summarized per group; columns are named like mpg_mean
    source = ColumnDataSource(group)
    print(source.to_df().to_string(max_cols=10, index=False, max_rows=6))

    yr  mpg_count   mpg_mean   mpg_std  mpg_min  accel_min  ...  accel_25%  accel_50%  accel_75%  accel_max
    70       29.0  17.689655  5.339231      9.0        8.0  ...     10.000       12.5     15.000       20.5
    71       27.0  21.111111  6.675635     12.0       11.5  ...     13.250       14.5     15.500       20.5
    72       28.0  18.714286  5.435529     11.0       11.0  ...     13.375       14.5     16.625       23.5
    ..        ...        ...       ...      ...        ...  ...        ...        ...        ...        ...
    80       27.0  33.803704  6.885854     19.1       11.4  ...     15.150       16.5     18.750       23.7
    81       28.0  30.185714  5.635319     17.6       12.6  ...     14.700       16.3     17.425       20.7
    82       30.0  32.000000  5.232524     22.0       11.6  ...     14.775       16.3     17.900       24.6
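
To see where names such as mpg_mean come from without downloading the sample data, here is a small sketch on a made-up four-row table. The values are illustrative only; the column naming is what ColumnDataSource produces when it is given a pandas GroupBy.

```python
import pandas as pd
from bokeh.models import ColumnDataSource

# Small made-up table standing in for autompg (values are illustrative only).
df = pd.DataFrame({
    "yr": ["70", "70", "71", "71"],
    "mpg": [18.0, 15.0, 27.0, 25.0],
})

# Passing a GroupBy makes Bokeh summarize each group with describe() and
# flatten the (column, statistic) pairs into names like "mpg_mean".
source = ColumnDataSource(df.groupby("yr"))

print(sorted(source.column_names))
print(list(source.data["yr"]), list(source.data["mpg_mean"]))
```

Each row of the resulting source corresponds to one group, which is why the bar charts on the following slides can use 'yr' for x and 'mpg_mean' for the bar height.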

Categorical Data

    from bokeh.sampledata.autompg import autompg as df
    from bokeh.plotting import figure, show
    from bokeh.models import ColumnDataSource
    from bokeh.io import output_notebook

    output_notebook()

    df['yr'] = df['yr'].astype(str)
    group = df.groupby('yr')
    source = ColumnDataSource(group)

    # Passing the GroupBy as x_range uses the group keys as categorical factors
    p = figure(x_range=group)
    p.vbar(x='yr', top='mpg_mean', width=0.8, source=source)

    show(p)

Coloring Plots

    from bokeh.sampledata.autompg import autompg as df
    from bokeh.plotting import figure, show
    from bokeh.models import ColumnDataSource
    from bokeh.io import output_notebook
    from bokeh.palettes import d3
    from bokeh.transform import factor_cmap

    output_notebook()

    df['yr'] = df['yr'].astype(str)
    group = df.groupby('yr')
    source = ColumnDataSource(group)

    # Map each of the 13 model years to one color of a 13-color palette
    fm = factor_cmap('yr', palette=d3['Category20'][13],
                     factors=df['yr'].unique())

    p = figure(x_range=group)
    p.vbar(x='yr', top='mpg_mean', width=0.8, color=fm, source=source)
    show(p)

Grids

    from bokeh.sampledata.autompg import autompg as df
    from bokeh.plotting import figure, show
    from bokeh.layouts import gridplot
    from bokeh.models import ColumnDataSource
    from bokeh.io import output_notebook
    from itertools import product

    def build_figure(title, x_lab, y_lab, source):
        # plot_width/plot_height are the Bokeh 1.x names (width/height in Bokeh 3.x)
        p = figure(title=title or None, plot_width=300, plot_height=300)
        p.scatter(x=x_lab, y=y_lab, source=source)
        p.xaxis.axis_label = x_lab
        p.yaxis.axis_label = y_lab
        return p

    output_notebook()

    COMPARE = ['mpg', 'hp', 'weight']
    source = ColumnDataSource(df[COMPARE])
    GRID_W = len(COMPARE)

    # One scatter plot per (y, x) pair, laid out row by row
    plots = [build_figure('', x, y, source)
             for y, x in product(COMPARE, repeat=2)]
    grid = gridplot(plots, ncols=GRID_W)

    show(grid)

Grids

    from bokeh.sampledata.autompg import autompg as df
    from bokeh.plotting import figure, show
    from bokeh.layouts import gridplot
    from bokeh.models import ColumnDataSource
    from bokeh.io import output_notebook
    from itertools import product

    # Selection tools; all plots share one source, so a selection made
    # in one plot highlights the same rows in the others
    TOOLS = "box_select,lasso_select,help"

    def build_figure(title, x_lab, y_lab, source):
        # plot_width/plot_height are the Bokeh 1.x names (width/height in Bokeh 3.x)
        p = figure(title=title or None, plot_width=300, plot_height=300,
                   tools=TOOLS)
        p.scatter(x=x_lab, y=y_lab, source=source)
        p.xaxis.axis_label = x_lab
        p.yaxis.axis_label = y_lab
        return p

    output_notebook()

    COMPARE = ['mpg', 'hp', 'weight']
    source = ColumnDataSource(df[COMPARE])
    GRID_W = len(COMPARE)

    plots = [build_figure('', x, y, source)
             for y, x in product(COMPARE, repeat=2)]
    grid = gridplot(plots, ncols=GRID_W)

    show(grid)

Linked Plots

The code for this example was too long to fit on a slide

https://demo.bokeh.org/selection_histogram

Source Code

https://demo.bokeh.org/selection_histogram
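
For orientation, a much smaller sketch of linking than the demo above (it is not the demo's code): two plots are linked by sharing a data source and an x range. The data values and variable names are made up for illustration.

```python
from bokeh.layouts import gridplot
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up data for illustration.
source = ColumnDataSource(data=dict(
    x=[1, 2, 3, 4, 5],
    y0=[6, 7, 2, 3, 6],
    y1=[2, 5, 8, 2, 7],
))

TOOLS = "pan,wheel_zoom,box_select,reset"

left = figure(tools=TOOLS, title="y0 vs x")
left.scatter(x="x", y="y0", source=source)

# Reusing left.x_range links panning and zooming along x;
# reusing the same source links selections (brushing).
right = figure(tools=TOOLS, title="y1 vs x", x_range=left.x_range)
right.scatter(x="x", y="y1", source=source)

linked = gridplot([[left, right]])
```

Calling show(linked) should display both plots side by side; selecting points in one highlights the same rows in the other.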

Getting your Figure Online

It is easy to host static HTML content on GitHub

Call output_file('index.html') before show() or save() to write your Bokeh plot to an HTML file
- 'index.html' is loaded by default, so it must be the entry point
- In this case that is easy, as it is the only HTML file

Upload this file to the master branch of a GitHub repository

Navigate: Settings -> GitHub Pages -> set Source to ‘master branch’

Note: This will not work very well with datasets that are large as the data needs to be downloaded before it can be plotted
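
The reason is that a standalone Bokeh page carries its data inside the HTML. A rough way to see this, sketched with random data and a hypothetical helper standalone_html_size, is to compare the size of the rendered page for a small and a large dataset.

```python
import numpy as np
from bokeh.embed import file_html
from bokeh.plotting import figure
from bokeh.resources import CDN


def standalone_html_size(n_points):
    """Render a scatter of n_points to standalone HTML and return its length."""
    rng = np.random.default_rng(0)
    p = figure(title=f"{n_points} points")
    p.scatter(x=rng.random(n_points), y=rng.random(n_points))
    # file_html() returns the full page as a string instead of writing a file.
    return len(file_html(p, resources=CDN, title="size check"))


small = standalone_html_size(10)
large = standalone_html_size(100_000)
print(f"10 points: {small} characters; 100,000 points: {large} characters")
```

Every visitor has to download that whole page before anything is drawn, so for large datasets it may be worth aggregating or subsampling first, or using server mode.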

Example: Bokeh Conda Environment

Open a Terminal window (Mac) or Anaconda Prompt (Windows)

conda create -n ivis python=3.6 bokeh jupyter numpy pandas

conda activate ivis

jupyter notebook

Example: Building a Boxplot

https://en.wikipedia.org/wiki/Box_plot#/media/File:Boxplot_vs_PDF.svg
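
One possible way to assemble a box plot from Bokeh's basic glyphs is sketched below. It is a simplified version under stated assumptions: the data are made up, and the whiskers simply span the minimum to the maximum rather than applying the usual 1.5 × IQR outlier rule shown in the figure.

```python
import pandas as pd
from bokeh.models import ColumnDataSource
from bokeh.plotting import figure

# Made-up measurements for two groups (illustrative values only).
df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1.0, 2.0, 3.0, 4.0, 5.0, 2.0, 4.0, 6.0, 8.0, 10.0],
})

# Quartiles per group, renamed so they are easy to reference as columns.
stats = df.groupby("group")["value"].describe().reset_index()
stats = stats.rename(columns={"25%": "q1", "50%": "q2", "75%": "q3"})
source = ColumnDataSource(stats)

p = figure(x_range=list(stats["group"]), title="Box plot from basic glyphs")

# Whiskers: one vertical segment per group from min to max.
p.segment(x0="group", y0="min", x1="group", y1="max", source=source,
          line_color="black")

# Box: lower half (q1..q2) and upper half (q2..q3);
# the edge they share marks the median.
p.vbar(x="group", width=0.6, bottom="q1", top="q2", source=source,
       fill_color="lightsteelblue", line_color="black")
p.vbar(x="group", width=0.6, bottom="q2", top="q3", source=source,
       fill_color="steelblue", line_color="black")
```

show(p) displays the result. Outlier markers could be added with an extra scatter glyph once a whisker rule has been chosen.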

References

https://www.knowablemagazine.org/article/mind/2019/science-data-visualization

http://docs.bokeh.org/en/1.3.2/index.html

http://docs.bokeh.org/en/1.3.2/docs/user_guide/data.html