Python Visualisation for Data Science

Data Vis for Data Science Usage of Python Visualisation Libraries Amit Kapoor @amitkaps

Transcript of Python Visualisation for Data Science

Page 1: Python Visualisation for Data Science

Data Vis for Data Science
Usage of Python Visualisation Libraries

Amit Kapoor
@amitkaps


Data Science Pipeline

— Frame: Problem definition

— Acquire: Data ingestion

— Refine: Data wrangling

— Transform: Feature creation

— Explore: Feature selection

— Model: Model creation & assessment

— Insight: Solution communication


Role of Visualisation

— Frame: Structuring (issue tree, hypotheses)

— Acquire: Loading (progress, errors)

— Refine: Profiling (missing values, outliers)

— Transform: Univariate & Bivariate Vis (1D, 2D)

— Explore: Multi Dimensional Vis (3D ... ND)

— Model: Model Vis (predictions, errors, models)

— Insight: Vis Comm (chart, narrative, dashboard)
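The Refine step listed above (profiling missing values and outliers) can be sketched with pandas. This is a minimal illustration on a made-up column, not data from the talk; the 1.5 × IQR rule is one common outlier heuristic, assumed here for the example, and the `Agg` backend is only there so the box plot renders without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd

# Hypothetical toy column, invented for illustration only
df = pd.DataFrame({"amount": [10.0, 12.0, None, 11.0, 13.0, 250.0]})

# Missing values: count of NaN per column
missing = df.isna().sum()

# Outliers: values more than 1.5 * IQR beyond the quartiles (a common heuristic)
q1, q3 = df["amount"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["amount"] < q1 - 1.5 * iqr) | (df["amount"] > q3 + 1.5 * iqr)]

# A box plot shows the same outlier visually
ax = df["amount"].plot(kind="box")

print(missing)
print(outliers)
```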


Understanding Visualisation

— Domain & Task Layer e.g. Tabular Data for EDA

— Data Layer e.g. Data Types, Transformation

— Visual Layer e.g. Encoding, Marks, Coordinate

— Annotation Layer e.g. Labels, Ticks, Titles

— Interaction Layer e.g. Filtering, Highlighting, Selection


Choosing a Visualisation Library

— Ease of Learning: How hard is the API?

— Coverage: How many graphic types can it cover?

— Approach: Is it Charting or Grammar based?

— Documentation: How easy is it to make basic graphs?

— Community Support: How hard is it to make complex graphs?


Notes in Circulation

year | type  | denom | value | money   | number |
---- | ----- | ----- | ----- | ------- | ------ |
1977 | Notes | 0001  | 1     | 2.72    | 2.720  |
1977 | Notes | 1000  | 1000  | 0.55    | 0.001  |
1977 | Notes | 0002  | 2     | 1.48    | 0.740  |
1977 | Notes | 0050  | 50    | 9.95    | 0.199  |
...  | ...   | ...   | ...   | ...     | ...    |
2015 | Notes | 0500  | 500   | 7853.75 | 15.708 |
2015 | Notes | 0001  | 1     | 3.09    | 3.090  |
2015 | Notes | 0010  | 10    | 320.15  | 32.015 |
2015 | Notes | 1000  | 1000  | 6325.68 | 6.326  |
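One practical detail in this table: `denom` carries leading zeros (`0001`, `0500`), which a default CSV read would turn into integers. The sketch below uses only the rows shown above and assumes a comma-separated layout, which the slides do not confirm. It keeps `denom` as text and compares `money` with `value × number`. That relationship is an observation from these sample rows, not something the slides state.

```python
import io
import pandas as pd

# Only the rows visible on the slide; the elided rows ("...") are not filled in
csv_text = """year,type,denom,value,money,number
1977,Notes,0001,1,2.72,2.720
1977,Notes,1000,1000,0.55,0.001
1977,Notes,0002,2,1.48,0.740
1977,Notes,0050,50,9.95,0.199
2015,Notes,0500,500,7853.75,15.708
2015,Notes,0001,1,3.09,3.090
2015,Notes,0010,10,320.15,32.015
2015,Notes,1000,1000,6325.68,6.326
"""

# dtype={"denom": str} preserves the leading zeros
notes = pd.read_csv(io.StringIO(csv_text), dtype={"denom": str})

# Compare money with value * number (rounding in the source makes this approximate)
notes["implied"] = notes["value"] * notes["number"]
print(notes[["year", "denom", "money", "implied"]])
```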


Use Pandas for Base Plotting

# Loading Data
import pandas as pd
notes = pd.read_csv('notes.csv')

# Data Transformation
notes_wide = pd.pivot_table(data=notes, index="year", columns="denom", values="money")

# Plotting
notes_wide.plot(kind="line")
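For readers without `notes.csv`, the same pivot and plot steps can be tried on the handful of rows from the Notes in Circulation table. This is a sketch under that assumption: the frame is built by hand, and the `Agg` backend is a choice made for this example so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import pandas as pd

# Sample rows from the slide (year, denom, money); not the full dataset
notes = pd.DataFrame({
    "year":  [1977, 1977, 1977, 1977, 2015, 2015, 2015, 2015],
    "denom": ["0001", "1000", "0002", "0050", "0500", "0001", "0010", "1000"],
    "money": [2.72, 0.55, 1.48, 9.95, 7853.75, 3.09, 320.15, 6325.68],
})

# Long to wide: one row per year, one column per denomination
notes_wide = pd.pivot_table(data=notes, index="year", columns="denom", values="money")

# One line per denomination; year/denom pairs absent from the sample stay NaN
ax = notes_wide.plot(kind="line")
print(notes_wide)
```

The wide layout is what lets a single `plot` call draw one series per denomination: pandas treats each column as its own line and the index as the x-axis.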


Use Matplotlib for Annotation

# Basic Styling
import matplotlib.pyplot as plt
plt.rcParams['figure.figsize'] = (9,6)
plt.style.use('ggplot')

# Plotting
notes_wide.plot(kind="line")

# Adding Annotation
plt.ylabel('Value INR Bns')
plt.title('Notes in Circulation')
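The `plt.ylabel` and `plt.title` calls act on whichever axes are current. An alternative, sketched here on the assumption that explicit handles are preferred, is to keep the Axes object that pandas returns and annotate through it. The two-column table below is built by hand from values in the Notes in Circulation table, and the legend title is an addition for this example.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

plt.style.use("ggplot")

# Small wide table built by hand from values in the slide's table
notes_wide = pd.DataFrame(
    {"0001": [2.72, 3.09], "1000": [0.55, 6325.68]},
    index=pd.Index([1977, 2015], name="year"),
)

# pandas returns the Matplotlib Axes, so annotation can go through that object
ax = notes_wide.plot(kind="line", figsize=(9, 6))
ax.set_ylabel("Value INR Bns")
ax.set_title("Notes in Circulation")
ax.legend(title="denom")
```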


Ideally use ggplot like R

from ggplot import *
ggplot(notes, aes(x='year', y='money', color='denom')) + \
    geom_line()


Use Altair for Grammar Visualisation

from altair import Chart

Chart(notes).mark_line().encode(
    x='year:N',
    y='money',
    color='denom'
)
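Altair is a declarative layer over Vega-Lite, so the chart above amounts to a JSON specification of a mark plus encodings. The dictionary below is a hand-written sketch of roughly what that specification looks like, not Altair's exact output. The `:N` shorthand marks `year` as nominal. The types shown for `money` and `denom` are assumptions about what type inference would give, taking `money` as numeric and `denom` as text. The two data rows come from the Notes in Circulation table.

```python
import json

# Hand-written Vega-Lite style spec; a sketch, not Altair's exact output
spec = {
    "data": {"values": [
        {"year": 1977, "denom": "0001", "money": 2.72},
        {"year": 2015, "denom": "0001", "money": 3.09},
    ]},
    "mark": "line",
    "encoding": {
        "x": {"field": "year", "type": "nominal"},        # from x='year:N'
        "y": {"field": "money", "type": "quantitative"},  # assumed inferred type
        "color": {"field": "denom", "type": "nominal"},   # assumed inferred type
    },
}

print(json.dumps(spec, indent=2))
```

This is the sense in which the approach is grammar based: the code states which column maps to which visual channel, and the renderer works out axes, scales and legend from that mapping.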


Personal Usage

— Use Pandas for base plotting and time series

— Use Matplotlib for matrices and customisation

— Use Seaborn for 1D & 2D statistical graphs, especially for categorical variables

— Use IPython Widgets for model interaction

— Use Datashader for Big Data Visualisation

— Experimenting with Altair
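As one illustration of the "Matplotlib for matrices" point in the list above, a matrix can be drawn as a colour grid with `imshow`. This is a minimal sketch, not the author's code: the year by denomination matrix is made up for the example, and the `Agg` backend is only there so it runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical year x denomination matrix, invented for illustration
years = [2013, 2014, 2015]
denoms = ["0010", "0100", "0500"]
values = np.array([[1.0, 2.0, 3.0],
                   [1.5, 2.5, 3.5],
                   [2.0, 3.0, 4.0]])

fig, ax = plt.subplots()
im = ax.imshow(values, aspect="auto")  # each cell coloured by its value

# Label the columns and rows with the categories they represent
ax.set_xticks(range(len(denoms)))
ax.set_xticklabels(denoms)
ax.set_yticks(range(len(years)))
ax.set_yticklabels(years)
fig.colorbar(im, ax=ax, label="value")
```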


What about interactivity?

— Watch out for Altair - Interaction will be built in soon

— Use Bokeh for web-based interactive dashboards, but it requires learning a different API

— Use Plotly for creating fully interactive charts. Integration with Matplotlib is available.


Get in touch with me

Amit Kapoor @amitkaps