Python for Scientific Research Bram Kuijper University of Exeter, Penryn Campus, UK February 11, 2020

Python for Scientific Research

Bram Kuijper

University of Exeter, Penryn Campus, UK

February 11, 2020

Acknowledgements

- This course is funded by the University of Exeter’s Institute for Data Science and Artificial Intelligence (IDSAI)
- Big thanks to JJ Valletta, who developed these lectures
- Big thanks to Deepak Kumar Panda for helping out this afternoon

Course Schedule

- Tuesday Feb 4: The basics of programming in Python
    - how to run Python code
    - data types
    - flow control
    - functions and modules
- Tuesday Feb 11: Applying Python to simplify your life
    - text manipulation and regular expressions
    - working with files and streams
    - number crunching with numpy and scipy
- Tuesday Feb 25: Advanced subjects
    - working with data using pandas
    - making graphs using matplotlib
    - data visualisation with seaborn

Schedule Tue Feb 4

- Morning session, 0900 - 1100, DDM IT 3.037
    - 0900 - 0930: How to run Python
    - 0930 - 1000: Data types
    - 1000 - 1010: Break
    - 1010 - 1100: Data types practical
- Afternoon session, 1300 - 1600, DDM IT 3.037
    - 1300 - 1330: Flow control
    - 1330 - 1400: Flow control practical
    - 1400 - 1410: Break
    - 1410 - 1430: Flow control practical continued
    - 1430 - 1500: Functions
    - 1500 - 1510: Break
    - 1510 - 1600: Functions practical

Some important websites

- Course website: https://exeter-data-analytics.github.io
- Python documentation: https://docs.python.org

What is Python?

- A high-level scripting language created by Guido van Rossum and named after Monty Python’s Flying Circus
- Easy to use, highly standardized and with an emphasis on readability of code

Why use Python?

The TIOBE index is a measure of the popularity of programming languages:

[Figure: TIOBE index chart]

Why Python?

- It is free! No licence costs
- Runs on many platforms (Mac, Windows, Linux)
- Because of its ease of programming, Python minimises development effort
- A huge number of libraries, written by an active community
- Python can “glue” together functions written in C/C++ and Fortran to speed things up (we can also call R and MATLAB functions)
- Compared to other high-level scientific languages such as MATLAB and R, Python offers a much wider range of additional functionality (e.g. web and GUI development)

Horses for courses

- Python is becoming the de facto standard for exploratory and interactive scientific research

BUT

- Python is no programming silver bullet
- Your application will dictate the tool (and a mixture of more than one language is OK). For example:
    - MATLAB excels at interfacing with hardware, e.g. generating hardware description language (HDL) code to configure an integrated circuit board or connecting to a data acquisition card
    - R is great for data wrangling and visualisation, and statistical modelling
    - C achieves the fastest runtimes, at the expense of a long development time

Why do you want to learn Python?

Some reasons:

- Python has a simple syntax and is widely used; an ideal language for beginners
- Development time is much quicker than for compiled languages like C or Java
- Broad uptake: Python (and R) have replaced Perl as the key programming language in bioinformatics
- Major GIS applications like ArcGIS use Python as their main scripting language
- Language of choice for machine learning (PyTorch, scikit-learn, TensorFlow)

Python version 2 vs 3

- Many systems (e.g., Mac OS X) still use Python 2 as the default
- Python 3 differs in various ways from Python 2
- Often, Python 3 code cannot be run using a Python 2 interpreter and vice versa
- Python 2 is a legacy version and will ultimately be replaced by Python 3
- This course focuses on Python 3
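To make the difference between the two versions concrete, here is a minimal sketch (not part of the original slides) that reports which major version is running and contrasts the two division operators. The function name `true_and_floor_division` is made up for this illustration.

```python
import sys


def true_and_floor_division(a, b):
    """Return (a / b, a // b) to contrast the two division operators."""
    return a / b, a // b


# In Python 3, / always performs true division and // floors the result.
# In Python 2, / applied to two integers floored the result instead.
quotient, floored = true_and_floor_division(7, 2)

# print is a function in Python 3; the Python 2 statement form
# (print "text" without parentheses) is a syntax error in Python 3.
print("Python major version:", sys.version_info.major)
print("7 / 2 =", quotient, "and 7 // 2 =", floored)
```

Checking `sys.version_info` at the top of a script is one simple way to find out which interpreter a system's default `python` command actually starts.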

Different Python distributions

Various distributions of Python are available:

- Official version from python.org
    - Caveat: libraries and other tools need to be installed separately via pip (Python’s package manager)
- Enthought Canopy: Python, libraries & tools in a single installer
- Anaconda: Python, libraries & tools in a single installer (used in this course)
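Whichever distribution is installed, it can be useful to check which libraries are actually available before starting. The following is a small sketch, not taken from the slides: the list of names comes from the course schedule, and the helper `find_missing` is a hypothetical name chosen for this example.

```python
import importlib.util

# Library names taken from the course schedule; adjust the list as needed.
COURSE_LIBRARIES = ["numpy", "scipy", "pandas", "matplotlib", "seaborn"]


def find_missing(module_names):
    """Return the names in module_names that cannot be imported here."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = find_missing(COURSE_LIBRARIES)
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All course libraries are available.")
```

With an Anaconda installation the list of missing libraries would normally be expected to be empty, whereas a fresh python.org installation would likely report all five until they are installed with pip.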

Executing Python code: Spyder IDE

- Windows: Start Menu > Anaconda3 > Spyder
- Mac: Applications > Spyder
- Spyder is an integrated development environment (IDE) for scientific computing, akin to RStudio and MATLAB
- One place to write, execute and debug code, and explore variables

[Figure: Spyder window with labelled Text Editor, IPython Console, Variable Explorer and Run Script button]

Running standalone Python scripts without the IDE - I

Being able to run Python scripts without the Spyder IDE is particularly important when running scripts on computing clusters.

- Windows: don’t bother, use the Spyder IDE
- Mac/Linux:
    - Write your code in a plain text file, say my_script.py
    - In a terminal, run:

        python3 my_script.py
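As an illustration of what such a file might contain, here is a minimal sketch of a possible my_script.py. The contents are hypothetical (the slides do not prescribe them), and `describe_interpreter` is a name invented for this example.

```python
# my_script.py: a hypothetical example script (contents are illustrative only)
import sys


def describe_interpreter():
    """Return a one-line description of the interpreter running this file."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return "Running under Python " + version


if __name__ == "__main__":
    # This branch runs when the file is executed directly,
    # e.g. with `python3 my_script.py`, but not when it is imported.
    print(describe_interpreter())
```

The `if __name__ == "__main__":` guard is optional for short scripts, but it keeps the file usable both as a standalone script and as a module that other code can import.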

Running standalone Python scripts without the IDE - II

By adding a so-called ‘shebang’ to the top of a Python script, one can run scripts as a standalone programme on Mac / Linux.

- Add a shebang, #!/usr/bin/env python3, as the first line of the Python script file, here called my_script.py:

    #!/usr/bin/env python3

    print("This Python script prints something.")
    ...

- Close the file and make it executable by typing in a terminal:

    chmod +x my_script.py

- Run the script by typing in a terminal:

    ./my_script.py
    This Python script prints something.
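The two ingredients above (a shebang line and the execute permission) can also be checked from within Python. The sketch below is an illustration under stated assumptions rather than part of the course material: `has_shebang` and `add_execute_permission` are hypothetical helper names, and the permission change only approximates `chmod +x`, since it ignores the shell's umask.

```python
import os
import stat
import tempfile


def has_shebang(path):
    """Return True if the first line of the file starts with '#!'."""
    with open(path, "r") as handle:
        return handle.readline().startswith("#!")


def add_execute_permission(path):
    """Set the execute bits for user, group and others, roughly like chmod +x."""
    current_mode = os.stat(path).st_mode
    os.chmod(path, current_mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


# Demonstrate on a throwaway copy of the example script.
with tempfile.TemporaryDirectory() as folder:
    script = os.path.join(folder, "my_script.py")
    with open(script, "w") as handle:
        handle.write("#!/usr/bin/env python3\n\n")
        handle.write('print("This Python script prints something.")\n')
    add_execute_permission(script)
    print("shebang present:", has_shebang(script))
    print("executable:", os.access(script, os.X_OK))
```

If running ./my_script.py fails with a "Permission denied" message, a missing execute permission is the likely cause; a missing or misspelt shebang tends to show up as the shell trying to interpret the Python code itself.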
