Data Visualization in Python - Biotec · Workflow III: construct basic plot • In our case, we...

Data Visualization in Python

Violinplot. Michael Waskom. https://blog.modeanalytics.com/images/post-images/viz-libraries-02.png

Visualizing Information

http://mapdesign.icaci.org/wp-content/uploads/2014/01/MapCarte32_ise_large.png

Guide for Visitors to Ise Shrine. Adapted from Edward R. Tufte: Envisioning Information

"All communication [...] [to] readers of an image must now take place on a two-dimensional surface."

Graphical Excellence

Tableaux Graphiques et Cartes Figuratives de M. Minard. Adapted from Edward R. Tufte: The Visual Display of Quantitative Information

"Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency."

Tools for generating plots

• R (statistics software) + ggplot2

• Python with matplotlib or seaborn or ...

• Online tools: e.g. plot.ly

• (Excel)

Since you are already familiar with Python syntax, why not use it directly for visualization?

Matplotlib

Material adapted from Matplotlib Pyplot tutorial

• 2D plotting library for Python
• Many plot types: scatter, histogram, bar...
• You have control over everything
  • Data in the plot
  • Fonts, styling
  • Axes
  • Additional elements (lines, etc.)
• Large user base offering examples, tutorials, help
• Seaborn as an add-on for more advanced and nicer layouts
• Website: matplotlib.org

Simple Usage Example

Material adapted from Matplotlib Pyplot tutorial

Code for a simple scatter plot:

# import just one module from matplotlib and give it the shorter alias plt
import matplotlib.pyplot as plt

xdata = [1, 2, 3, 4]
ydata = [1, 4, 9, 16]
plt.plot(xdata, ydata, "ro")      # "ro" = red dots
plt.axis([0, 6, 0, 20])           # axis limits: [xmin, xmax, ymin, ymax]
plt.savefig("scatterplot.pdf")    # save the generated plot as PDF
plt.show()                        # call show() last; saving after show() can produce an empty file

A short note on libraries

Material adapted from Matplotlib Pyplot tutorial

• All functions we have used so far come from the Python standard library
  • It is shipped with Python, no installation necessary
  • E.g. sys, random, os, math
• Matplotlib is an external library and needs to be installed
  • Fortunately, Python has a package system which makes this process easy

pip3 install --user matplotlib

System terminal command for installing matplotlib (already installed on the computer pool machines!)

pip = "Pip Installs Packages". This also works for other packages, e.g. seaborn
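Before running pip3, it can be handy to check from within Python whether a package is already available. The following is only a minimal sketch using the standard library; the helper name is_installed is hypothetical and not part of the original slides.

```python
import importlib.util

def is_installed(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None

# Check a standard-library module and two external packages from the slides
for name in ["math", "matplotlib", "seaborn"]:
    if is_installed(name):
        print(name, "-> available")
    else:
        print(name, "-> missing, try: pip3 install --user " + name)
```

On the computer pool machines, matplotlib should be reported as available; whether seaborn is present depends on the local setup.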

Workflow I: Choose plot type

data.csv:

x,y
1,1
2,4
3,9
4,16

http://matplotlib.org/users/screenshots.html

Which plot type fits the data?

• Bar chart -> bar()
• Scatter/line plot -> plot()
• Histogram -> hist()
• Anything else?
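To compare the candidate plot types, one option is to draw the same small data set with each of the three functions side by side. This is a rough sketch, not part of the original slides; it assumes the off-screen "Agg" backend so that no window is needed, and it uses the subplots() interface, which the slides do not cover.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

xdata = [1, 2, 3, 4]
ydata = [1, 4, 9, 16]

# One row with three panels, one per candidate plot type
fig, axes = plt.subplots(1, 3, figsize=(9, 3))

axes[0].bar(xdata, ydata)          # bar chart: one bar per x value
axes[0].set_title("bar()")

axes[1].plot(xdata, ydata, "o-")   # scatter/line plot: y against x
axes[1].set_title("plot()")

axes[2].hist(ydata, bins=4)        # histogram: distribution of the y values only
axes[2].set_title("hist()")

fig.tight_layout()
```

Calling fig.savefig() with a file name of your choice writes the comparison to disk. For x/y pairs like the ones in data.csv, the plot() panel is usually the most informative.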

Workflow II: Prepare Data

data.csv:

x,y
1,1
2,4
3,9
4,16

xdata, ydata = [], []
with open('data.csv', 'r') as f:
    for line in f:
        if not line.startswith('x'):     # skip the header line
            x, y = line.strip().split(',')
            xdata.append(int(x))         # convert the text to numbers
            ydata.append(int(y))

print(xdata, ydata)

Output: [1, 2, 3, 4] [1, 4, 9, 16]

Storing your data in lists is a good idea *
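The same preparation step could also be done with the csv module from the standard library, which takes care of the header and the splitting. This is a sketch under one assumption: the file contents are inlined as a string so that the example runs on its own.

```python
import csv
import io

# Assumption: the contents of data.csv are inlined here; with a real file,
# use open("data.csv", newline="") instead of io.StringIO(csv_text).
csv_text = "x,y\n1,1\n2,4\n3,9\n4,16\n"

xdata, ydata = [], []
reader = csv.DictReader(io.StringIO(csv_text))  # first row becomes the column names
for row in reader:
    xdata.append(int(row["x"]))  # values arrive as strings, so convert them
    ydata.append(int(row["y"]))

print(xdata, ydata)
```

Accessing columns by name keeps the code readable when the file later gains additional columns.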

Workflow III: construct basic plot

• Read the documentation of the plot function
  • E.g. for scatter plot -> pyplot.plot()
  • matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot
• The plot() function is very flexible concerning its input

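# x and y values: drawn as a line by default
import matplotlib.pyplot as plt
xdata = [1, 2, 3, 4]
ydata = [1, 4, 9, 16]
plt.plot(xdata, ydata)

# same data with a format string: red dots instead of a line
...
plt.plot(xdata, ydata, 'ro')

# a single list is used as the y values; x defaults to 0, 1, 2, ...
import matplotlib.pyplot as plt
data = [1, 2, 3, 4]
plt.plot(data)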
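To see what plot() does when it receives only one list, the returned line object can be inspected. This is a small sketch, assuming the off-screen "Agg" backend; get_xdata() and get_ydata() are methods of matplotlib's Line2D objects.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

data = [1, 4, 9, 16]
(line,) = plt.plot(data)          # plot() returns a list of Line2D objects

print(list(line.get_xdata()))     # the x values matplotlib filled in
print(list(line.get_ydata()))     # the values we passed
```

The x values start at 0, not 1, which matters when the data has its own x column such as the one in data.csv.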

Workflow III: construct basic plot

• In our case, we want to plot the X against the Y values
• Dots look best for this application
• Also, let's use some additional data

import matplotlib.pyplot as plt
xdata = [1, 2, 3, 4, 5, 6, 7, 8]
ydata = [1, 4, 9, 16, 25, 36, 49, 64]
plt.plot(xdata, ydata, 'go')      # this time, it's green dots
plt.savefig('scatter.pdf')

However, several things could be improved:

• We don't have a title
• No axis labels
• Would be nice to have larger dots and a line
• First/last data points almost hidden

Workflow IV: Add elements

import matplotlib.pyplot as plt
xdata = [1, 2, 3, 4, 5, 6, 7, 8]
ydata = [1, 4, 9, 16, 25, 36, 49, 64]
# standard line plus larger markers (dots) for the data points
plt.plot(xdata, ydata, color='g', marker='o', markersize=10)
# add title and axis labels
plt.title("Square Function")
plt.xlabel("X")
plt.ylabel("Y")
# adjust the axis range so the first/last points are no longer hidden
plt.xlim([0, 9])
plt.ylim([-1, 70])
plt.savefig('scatter.pdf')
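Matplotlib also offers an object-oriented interface in which the same elements are added through an Axes object instead of the plt functions. The slides use only the plt functions; the sketch below is an alternative shown for comparison, assuming the off-screen "Agg" backend.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

xdata = [1, 2, 3, 4, 5, 6, 7, 8]
ydata = [x ** 2 for x in xdata]   # the square function from the slide

fig, ax = plt.subplots()          # one figure containing one set of axes
ax.plot(xdata, ydata, color="g", marker="o", markersize=10)

# Title and axis labels
ax.set_title("Square Function")
ax.set_xlabel("X")
ax.set_ylabel("Y")

# Axis ranges with some room around the first and last point
ax.set_xlim(0, 9)
ax.set_ylim(-1, 70)
```

This style tends to pay off once a figure contains more than one panel, because every call names the axes it applies to.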

Practical Example: Bar Plots

import matplotlib.pyplot as plt

plt.style.use("grayscale")                  # predefined gray color scheme

# GC content data (x-axis)
data = {"Streptomyces": 72, "Halobacterium": 67, "Plasmodium": 20}

positions = range(len(data))                # bar positions (y-axis)

# horizontal bar chart; for vertical bars, use bar()
plt.barh(positions, list(data.values()), align="center", alpha=0.4)

plt.yticks(positions, list(data.keys()))    # add ticks/labels for the y-axis
plt.title("Average GC Content by Organism")
plt.xlabel("GC Content in %")
plt.tight_layout()                          # automatically optimize spacing
plt.show()
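For the vertical variant mentioned above, bar() takes the place of barh() and the labels move to the x-axis. This is a sketch of that variant, assuming the off-screen "Agg" backend and the Axes-based interface; it is not taken from the slides.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

data = {"Streptomyces": 72, "Halobacterium": 67, "Plasmodium": 20}
positions = list(range(len(data)))          # bar positions, now on the x-axis

fig, ax = plt.subplots()
bars = ax.bar(positions, list(data.values()), align="center", alpha=0.4)

ax.set_xticks(positions)                    # one tick per bar
ax.set_xticklabels(list(data.keys()))       # organism names as tick labels
ax.set_ylabel("GC Content in %")            # the value axis is now the y-axis
ax.set_title("Average GC Content by Organism")
fig.tight_layout()
```

Horizontal bars are often easier to read when the category names are long, which may be why the slide uses barh().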

Summary

• Visualization supports our understanding

• Different Python libraries and modules for visualization

• Matplotlib's popular Pyplot module for creating 2-dimensional plots (scatter, bar, ...)

• Matplotlib needs to be installed using pip3 (Python package manager)

• Many different ways to prepare data and to configure output

• Pyplot's styles for uniform plot styling
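As a brief illustration of the last point, the available style names can be listed, and a style can be applied to a single plot only. This is a sketch, assuming the off-screen "Agg" backend; plt.style.available and plt.style.context() are part of Matplotlib's style module.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no display required
import matplotlib.pyplot as plt

# A few of the style names shipped with the installed Matplotlib version
print(sorted(plt.style.available)[:5])

# Apply the grayscale style temporarily, only inside the with-block
with plt.style.context("grayscale"):
    fig, ax = plt.subplots()
    ax.plot([1, 2, 3, 4], [1, 4, 9, 16], marker="o")
    ax.set_title("Styled only inside this block")
```

In contrast, plt.style.use() changes the style for all plots created afterwards, which is what gives a set of figures a uniform look.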