
Introduction to Scientific Programming: Python for Biologists

Ian Stokes-Rees

SBGrid, Harvard Medical School

May 2009

I. Stokes-Rees (SBGrid) Intro to Programming May 2009 1 / 33


Value Proposition

What is the motivation for you, as a scientist, to develop computer programming skills?

- Data analysis, both numerical and graphical
- Develop algorithms, system models, and simulations
- Tie together multiple applications and data formats into an integrated workflow
- Eliminate time-consuming, error-prone, repetitive manual tasks
- Develop web-based interfaces to applications, algorithms, and data


Why Python?

Python is a very flexible, performant, interpreted programming language

- Can be used for quick scripting tasks, much like Perl, and is more powerful than shell scripts
- Rich object-oriented features which allow much larger applications to be developed
- Many packages (extension libraries) which assist in building graphical interfaces
- "Batteries Included" means it comes with over 380 pre-installed packages
- Freely available for all platforms (Windows, Mac, Linux, Sun, ...)
- Dynamic typing makes it much easier to write programs


Basic Model: Programs as Mathematical Functions

[Diagram: Input x -> box "Function or Filter or Program", f(x) -> Output y]

System: y = f(x)


Simple Example: Square

f(x) = x^2

x    f(x)
1    1
2    4
3    9
4    16


More Advanced Model: Programs as Functions with Control

[Diagram: Input x and Control v -> box "Function or Filter or Program", f(x,v) -> Output y]

System: y = f(x,v)


Simple Example: Power

f(x, v) = x^v

x    v    f(x, v)
1    2    1
2    2    4
3    2    9
1    3    1
2    3    8
3    3    27
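
The power table (and the earlier square table, which is the v = 2 case) can be reproduced with a short Python function. This is only a sketch: the function name `power` and the default value for `v` are illustrative choices, not taken from the slides, and `print(...)` is written with parentheses so that the snippet also runs on Python 3.

```python
def power(x, v=2):
    """Return x raised to the power v; v plays the role of the control input."""
    return x ** v

# Print one row per (x, v) pair, in the same order as the table.
for v in (2, 3):
    for x in (1, 2, 3):
        print("%d %d %d" % (x, v, power(x, v)))
```

Calling `power(4)` without a second argument falls back to squaring, which mirrors the simpler y = f(x) model.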


What is a programming language? (I)

It defines a grammar for manipulating data, forming (mathematical) expressions, and specifying a defined sequence of logical operations

- data: the most basic kinds of data are textual (also called strings), integers, and decimal values (also called floating point or floats)
- expressions: rules for constructing basic mathematical expressions
    2 + 3
    2 × 3.14
    (2 + 7 − 12 + 8)/4.0
- logic: operations can be used inside expressions (and, or, not) and to provide control flow within a program (if, while, for)
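
As a rough illustration of the three ingredients above, the following sketch evaluates the slide's expressions in Python and then uses logic both inside an expression and for control flow. The variable names (`total`, `product`, `mean`, `in_range`, `label`) are made up for this example.

```python
# Arithmetic expressions from the slide ("×" is written "*" in Python).
total = 2 + 3                    # integer addition
product = 2 * 3.14               # an integer times a float gives a float
mean = (2 + 7 - 12 + 8) / 4.0    # brackets group the sum before dividing

# Logic inside an expression: and, or, not combine boolean values.
in_range = (total > 0) and not (total > 10)

# Logic as control flow: if/else chooses which block runs.
if in_range:
    label = "small"
else:
    label = "large"

print(total)
print(product)
print(mean)
print(label)
```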


What is a programming language? (II)

There are other important aspects that make up a “complete” programming language

- variables: a mechanism to assign a name to an expression
    pi = 3.14
    username = "john doe"
- functions: ways to generalize and abstract a set of expressions and operations so they can be reused
- data sets: support for sets of data items (sometimes called arrays, or sequences, or lists)
- user defined data types: mechanisms to define data types beyond string, integer, float
- modules: ways to encapsulate groups of functions and data types for reuse
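
To make these five terms concrete, here is a minimal sketch that touches each one once. Everything in it apart from the standard `math` module is hypothetical (`circle_area`, `radii`, `Sample`, `dish`), and classes are not covered in this lecture, so treat the user-defined type as a preview only.

```python
import math  # module: a reusable group of functions and constants

# Variables: names bound to values.
radius = 9.7
username = "john doe"

# Function: a named, reusable set of expressions.
def circle_area(r):
    return math.pi * r ** 2

# Data set: a list of values.
radii = [9.7, 5.4]

# User-defined data type: a class that groups related data.
class Sample(object):
    def __init__(self, name, radius):
        self.name = name
        self.radius = radius

dish = Sample("dish A", radii[0])
print(circle_area(dish.radius))
```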


Simple Literal Expressions

String literal            "Here we are."
Integer literal           42
Floating point literal    3.14
Boolean literal           True
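
One way to check what kind of literal Python sees is the built-in `type()` function. The sketch below loops over the four literals from the slide; the list and loop syntax are explained later in the lecture, and the names `literals` and `type_names` are illustrative.

```python
literals = ["Here we are.", 42, 3.14, True]

for value in literals:
    # type(value).__name__ is the short name of the value's type
    print("%r is of type %s" % (value, type(value).__name__))

# Collect just the type names, in the same order as the literals.
type_names = [type(value).__name__ for value in literals]
```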


Hold on, I thought we were learning about computer programming ...

It is possible to see (and perhaps even write) a computer program without really understanding the components of the language.

By way of analogy, for spoken languages, many native speakers only implicitly understand the grammar from years of learning by example. New speakers want (and usually need) to understand the grammar rules to properly use the language.

Grammars in computer programming languages are, generally, very strict. You have to know exactly what each part of the syntax is referring to, and “statements” (expressions) must always be correctly constructed.

Most programming languages contain the same elements with slightly different syntax. My goal in this introductory lecture is to present these common elements.


Compound Literal Expressions

These are only a little more interesting than what you can do on a calculator

String concatenation    "Today is:" + "Wednesday"
Addition                3 + 8
Area of two circles     (3.14 * 9.7**2) + (3.14 * 5.4**2)
Boolean expression      2009 > 2000

In Python, the power operator is **

Notice the compound expressions use operators (+, *, >, **) and also special characters such as brackets to join simple literal expressions together
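
The four compound expressions can be tried directly in Python. In this sketch the results are stored under illustrative names (`greeting`, `total`, `area`, `is_recent`) so they can be inspected afterwards.

```python
greeting = "Today is:" + "Wednesday"            # + joins strings; no space is added
total = 3 + 8                                   # + adds numbers
area = (3.14 * 9.7 ** 2) + (3.14 * 5.4 ** 2)    # ** is applied before *
is_recent = 2009 > 2000                         # a comparison gives True or False

print(greeting)
print(total)
print(area)
print(is_recent)
```

Note that the same `+` symbol means concatenation for strings and addition for numbers; mixing the two, as in `"Today is:" + 3`, is an error.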


Program Parameters

The first step towards writing a program that is more interesting than a pocket calculator is to introduce parameterization

- Program parameters are also called named variables or just variables
- In the most basic form, a variable provides a named reference to a simple literal expression
- The two parts of a variable are its name and value
- Variables can be used in expressions in place of the value they contain
- The value of a variable, as the name implies, can be changed


Simple Variables

The = operator is used to assign the value of the expression on the right to the named variable on the left

    a = 3
    b = 8
    a + b


Some More Examples of Variables

    a = 3
    b = 8
    a + b + 7
    3 + b

    c = a * 6

    a = 5
    c = a * 6
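
A point these examples appear to be making is that assignment evaluates the right-hand side once, at that moment; `c` does not track later changes to `a`. The sketch below makes that explicit. The name `c_before` is added here purely to record the intermediate value.

```python
a = 3
c = a * 6       # the expression is evaluated now, using a = 3
a = 5           # changing a afterwards does not change c
c_before = c    # keep a copy of c before recomputing it
c = a * 6       # re-evaluate to pick up the new value of a

print(c_before)
print(c)
```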


A Full Program: square.py

Python:

    x = 3

    y = x**2
    print y

Perl:

    $x = 3;

    $y = $x**2;
    print "$y\n";

C:

    #include <stdio.h>
    #include <math.h>

    int main() {
        double x, y;

        x = 3.0;

        y = pow(x, 2.0);
        printf("%3.1f\n", y);

        return 0;
    }


A Full Program: power.py

Python:

    x = 3
    v = 2

    y = x**v
    print y

Perl:

    $x = 3;
    $v = 2;

    $y = $x**$v;
    print "$y\n";

C:

    #include <stdio.h>
    #include <math.h>

    int main() {
        double x, v, y;

        x = 3.0;
        v = 2.0;

        y = pow(x, v);
        printf("%3.1f\n", y);

        return 0;
    }


Conditional Statements

After variables, the next step in developing more interesting programs is introducing conditional statements that can affect the flow of control

- The if statement, and its associated elif and else statements, control whether or not the program executes the nested program blocks
- Conditional statements test a boolean expression, which is an expression that can be interpreted as True or False
- Boolean comparators such as >, <, >=, <=, ==, !=, and operators such as and, or, not are used to form these expressions
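
As a small sketch of how comparators and boolean operators combine, the snippet below classifies a number with an if/elif/else chain. The names `is_positive`, `is_even` and `kind` are invented for the example, and the remainder operator `%` is not otherwise covered in these slides.

```python
x = 7

is_positive = x > 0          # comparator: greater than
is_even = (x % 2) == 0       # remainder after dividing by 2, compared with ==

if is_positive and not is_even:
    kind = "positive odd"
elif is_positive:
    kind = "positive even"
else:
    kind = "zero or negative"

print(kind)
```

Only the first branch whose condition is True runs; the remaining elif and else blocks are skipped.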


Control Flow Example

- This program checks to see if the power is 2
- If so, it simply multiplies the base by itself
- If not, it uses the ** operator to raise the base to the specified power
- Notice the colon: this indicates the end of the boolean expression and the start of the nested program block
- Notice the indentation: this is how Python associates a nested program block with a particular condition

    x = 3
    v = 2

    if v == 2:
        y = x*x
    else:
        y = x**v

    print y


Control Flow Example 2

- This program demonstrates a single conditional block
- It prints out Still winter if the date is between December 21 and March 21
- It always prints out the (approximate) day of the year, and (approximate) days since 0 AD

    year = 2009
    month = 5
    day = 1

    day_of_year = (month * 30) + day
    print day_of_year

    if day_of_year > 354 or day_of_year < 79:
        print "Still winter"

    approx_days = (year * 365) + (month * 30) + day
    print approx_days
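
The 30-day-month formula is deliberately rough: for 1 May it gives 151, because it counts the current month as already elapsed, whereas the calendar day of the year is 121. As a sketch of an exact alternative, the standard `datetime` module can do the calendar arithmetic. The thresholds are copied unchanged from the slide, and the names `date`, `is_winter` and `days_into_year` are illustrative.

```python
import datetime

date = datetime.date(2009, 5, 1)

# tm_yday is the exact day of the year, counting January 1 as day 1.
day_of_year = date.timetuple().tm_yday
print(day_of_year)

# Same test as the slide, now applied to the exact day of the year.
is_winter = day_of_year > 354 or day_of_year < 79
if is_winter:
    print("Still winter")

# toordinal() counts days from 1 January of year 1 (which is day 1),
# so differences between ordinals are exact day counts.
days_into_year = date.toordinal() - datetime.date(2009, 1, 1).toordinal()
print(days_into_year)
```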


Control Flow Example 3

- This program demonstrates multiple conditional blocks
- It sorts and counts users by age and gender
- We’ll have to wait until we’ve discussed lists and loops for this to be useful

    name = "Jane"
    gender = "female"
    age = 17

    child_cnt = 0
    female_cnt = 0
    male_cnt = 0
    error_cnt = 0

    if age <= 18:
        child_cnt = child_cnt + 1
    elif gender == "female":
        female_cnt = female_cnt + 1
    elif gender == "male":
        male_cnt = male_cnt + 1
    else:
        error_cnt = error_cnt + 1

    print child_cnt, female_cnt, male_cnt, error_cnt


A Note on the Importance of Semantics

Can you tell what this program does?

Why not?

    n = "Jane"
    g = "f"
    a = 17

    c1 = 0
    c2 = 0
    c3 = 0
    c4 = 0

    if a <= 18:
        c1 = c1 + 1
    elif g == "f":
        c2 = c2 + 1
    elif g == "m":
        c3 = c3 + 1
    else:
        c4 = c4 + 1

    print c1, c2, c3, c4


Sets of Data

- All programming languages have a concept of sets of data
- In Python, the most common kind of data set is a list
- You can add and remove data from a Python list
- Elements in the list are referenced by their positional index, starting from ZERO

    grades = [73.5, 88.2, 67.8, 80.5, 75.8]
    fruit = ["apple", "orange", "pear"]

    print grades[0]
    print fruit[2]
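
The slide mentions adding and removing data but does not show it. A minimal sketch using the standard list methods `append()` and `remove()` follows; the names `first`, `last` and `count` are illustrative, and `print(...)` is written so the snippet also runs on Python 3.

```python
grades = [73.5, 88.2, 67.8, 80.5, 75.8]

grades.append(91.0)    # add a value to the end of the list
grades.remove(67.8)    # remove the first element equal to 67.8

first = grades[0]      # indexing starts at zero
last = grades[-1]      # a negative index counts back from the end
count = len(grades)    # number of elements currently in the list

print(first)
print(last)
print(count)
```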


Loops

- All programming languages have a concept of looping
- In Python, the most common kind of loop is a for loop
- A Python for loop iterates over elements of a list and assigns them in turn to a variable

    for g in grades:
        print g
        if g > 80.0:
            print "Grade A student"
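
With lists and loops available, the counting logic from Control Flow Example 3 can be applied to several users instead of one. The sketch below assumes each user is stored as a (name, gender, age) tuple; the records themselves are made up, and unpacking a tuple in the `for` line is a convenience not covered in these slides.

```python
# Hypothetical records: (name, gender, age)
users = [
    ("Jane", "female", 17),
    ("Raj", "male", 34),
    ("Ana", "female", 52),
    ("Sam", "unknown", 40),
]

child_cnt = 0
female_cnt = 0
male_cnt = 0
error_cnt = 0

for name, gender, age in users:
    # Same if/elif/else chain as Example 3, run once per user.
    if age <= 18:
        child_cnt = child_cnt + 1
    elif gender == "female":
        female_cnt = female_cnt + 1
    elif gender == "male":
        male_cnt = male_cnt + 1
    else:
        error_cnt = error_cnt + 1

print("%d %d %d %d" % (child_cnt, female_cnt, male_cnt, error_cnt))
```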


Files

- All programming languages have mechanisms to work with files
- In Python, the built-in open() function is used
  - This is the first time we’ve seen a function!
- When the program is done working with a file, it must be closed with the close() method
  - This is the first time we’ve seen a method! (which is only subtly different from a function)

    data_fh = open("myinput.dat", "r")
    for line in data_fh:
        print line
    data_fh.close()
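
An alternative to calling close() by hand is the `with` statement, which closes the file automatically when its block ends, even if an error occurs. This is a sketch under a few assumptions: it needs Python 2.6 or later (Python 2.5 requires `from __future__ import with_statement`), and it first writes a small throwaway file in a temporary directory so that it is self-contained. The file contents and the `lines` list are illustrative.

```python
import os
import tempfile

# Create a small input file so the example does not depend on existing data.
path = os.path.join(tempfile.mkdtemp(), "myinput.dat")
out_fh = open(path, "w")
out_fh.write("first line\nsecond line\n")
out_fh.close()

lines = []
with open(path, "r") as data_fh:          # closed automatically after the block
    for line in data_fh:
        lines.append(line.rstrip("\n"))   # strip the trailing newline

print(lines)
```

Each line read from a file keeps its trailing newline, which is why the slide's `print line` shows a blank line between records.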


Reading and Processing Data

We now have almost all the pieces necessary to write a small but useful program that reads data from a file, collects it, and processes it

The program reads in a table of data where each row corresponds to a student record

Grades are extracted using the split() method, which divides a string based on white space

The grades are collected into a list, and when the program is done reading the file it will calculate the class average

NOTE: There are some new aspects here which will be discussed later

    grades = []

    data_fh = open("classrecord.dat", "r")
    for student_record in data_fh:
        columns = student_record.split()
        grade = float(columns[3])
        grades.append(grade)

    data_fh.close()

    total = 0.0
    for g in grades:
        total = total + g

    average = total/len(grades)

    print average
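
The program above assumes that classrecord.dat exists, that every row has at least four columns, and that the file is not empty (otherwise the division fails). A more defensive sketch of the same idea is shown below. The function `class_average`, its `column` parameter, and the sample records are all hypothetical; the layout of the real classrecord.dat is not given in the slides. It uses the `with` statement, so it needs Python 2.6 or later.

```python
import os
import tempfile

def class_average(path, column=3):
    """Average the float values found in one whitespace-separated column."""
    grades = []
    with open(path, "r") as data_fh:
        for student_record in data_fh:
            columns = student_record.split()
            if len(columns) <= column:
                continue                  # skip blank or short rows
            grades.append(float(columns[column]))
    if not grades:
        return None                       # avoid dividing by zero
    return sum(grades) / len(grades)

# Made-up records: id, first name, last name, grade.
path = os.path.join(tempfile.mkdtemp(), "classrecord.dat")
with open(path, "w") as out_fh:
    out_fh.write("1001 Jane Doe 73.5\n1002 John Roe 88.5\n\n")

average = class_average(path)
print(average)
```

Wrapping the logic in a function means the same code can be reused for another file or another column without copying it.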


Getting Python

- Python Download: http://www.python.org/download/
- Already available on almost all Linux, Unix, and Sun systems
  - Try python -V from the command line to check
  - If not, ask your sys admin to install it, or do it yourself (10 MB source distribution)
- Pre-installed on Macs with OS X since 10.3.9
  - May be an old version, so consider upgrading (2.5.2 is the latest)
  - Otherwise, you can get it from http://www.python.org/download/mac/ (18 MB DMG)
- Windows users will need to download the installer (11 MB)


Getting Python

- Downloads from Python.org also include IDLE, a basic graphical editor for Python
- ActiveState provides a commercially packaged distribution of Python with enhanced Help files and some extra packages; otherwise it is identical to the Python.org distribution
  - http://www.activestate.com/Products/activepython/
  - Base version available for free for most platforms


Why Python?

- Very powerful language that is easy to learn and develop small, medium, and large programs with
- Object oriented language provides bridge to strongly typed compiled languages such as Java, C++, and C#
- Rich set of “batteries-included” extension modules (about 400)
- Rich set of community contributed extension modules (about 4000)


Great Features to Explore

- Interactive interpreter
- Matplotlib: Matlab-like plotting module for Python
- Biopython: Python module and tools to support computational biology
- Sage Math and Sage Notebook


Summary

- Many different reasons why programming skills are valuable to the 21st century scientist
- Programming languages share many of the same features
- Programming is a way of thinking about problems, models, data and algorithms
  - Not just memorizing the syntax of a specific language
  - That said, the grammar/syntax of a language is necessary to learn in order to use it
- Programs can be divided into data and logic
  - The data part consists of a type system with built-in and user-defined types
  - The logic part consists of expressions, operators, comparators, conditionals, loops, and functions


Python Code Editors
a.k.a. Integrated Development Environments (IDEs)

IDLE        Basic Python GUI editor, comes with Python distribution (free)
PyDev       Extension for Eclipse http://pydev.sourceforge.net/
XCode       From Apple, only for OS X (free) http://developer.apple.com/tools/xcode/
Wing 101    Free for educators, otherwise $35-$180 http://www.wingware.com/
Komodo      From ActiveState, $300 http://www.activestate.com/Products/komodo ide

... or just use a text editor: vim, emacs, SciTE, pico

Recommendations:

- For small jobs, a text editor or IDLE is sufficient
- For big jobs/applications, PyDev with Eclipse is a powerful free option, but Eclipse has a steep learning curve http://eclipse.org http://pydev.sf.net
- If you have the cash, Komodo is a very nice Python IDE (I have no experience with Wing)


Key Resources

http://python.org Python central: downloads, tutorials, APIs, package references, language reference

http://docs.python.org/modindex Package index for API documentation of all included packages

http://pypi.python.org/pypi Package repository for community contributed packages (almost 4000)

http://docs.python.org/tut Official Python Tutorial – highly recommended

http://docs.python.org/ref Python language reference

http://aspn.activestate.com/ASPN/Cookbook/Python/ Python Cookbook

http://numpy.scipy.org/ NumPy numeric python package

http://www.scipy.org/ SciPy scientific python package

http://biopython.org/ BioPython

http://www.pasteur.fr/formation/infobio/python/ Introduction to Python for Biologists (Pasteur Institute, France)

http://www.greenteapress.com/thinkpython/ How to Think Like a (Python) Programmer
