Earth & Environmenthomepages.see.leeds.ac.uk/~earsro/SOEE1160/Lecture_N…  · Web viewData is...


SOEE1160 - 1 -

Computers and Programming in Geosciences

Semester 2

Dr. Sebastian Rost

INTRODUCTION

This module is about computers and how we can teach a computer to do what we want it to do. This will include different programming languages (Fortran and Matlab) and tools (e.g. shell scripting) that are commonly used in Geosciences. Computers are great at doing the same thing over and over again. They don’t get bored, they don’t need vacations and they don’t know weekends either (although it sometimes seems different). So if you want to calculate the same thing over and over again, it is probably worth investing the time to write a small piece of code or a script that does the work for you while you are drinking coffee (or doing something more productive), instead of trying to do things by hand with a calculator.

We will focus on how to use a UNIX system. The power of a UNIX system lies in using the command line. This means you are typing something into a terminal window (it seems old-fashioned, but I hope you will see the advantages soon). So traditionally in a UNIX system there is less pointing and clicking than you are probably used to from a Windows system. Although there are now many UNIX tools that work similarly to tools on Windows, you should spend the time to learn how to deal with a UNIX system without using these graphical (point+click) tools. Why? (1) After a while you become faster by typing commands by hand, (2) shell scripts (short recipes that tell the computer what to do; more about that later) are built from command line commands, so it is better to know them anyway, and (3) when you are logging into a computer remotely (e.g. you are logging into the SEE computer system from home) you might not have a graphical user interface, but you will always have the command line. Therefore, I will not introduce you to these tools (but some are really useful, so look around the Linux system of the school in your spare time).

In the practicals we will be using the computers in the ENVI EFC Lab. These are located on the second floor in the environment building. Go up the stairs one floor, at the end of the stairs turn left, go through the glass doors at the end of the corridor and turn left immediately. You are standing right in front of the ENVI EFC Lab. This lab uses so-called thin clients. These are not full computers but work as a kind of smart terminal to a server in the background. You can use these with a Windows operating system or under Linux (which is a version of UNIX for PCs). More about how to log in to (and log out of) these machines is in the hand-out for the first practical.

So let’s take a brief look at computer hardware.

COMPUTER HARDWARE

Many of you will likely have taken a look into a computer before. I think it is important to have an idea of what such a box looks like inside and what the components do. Why should we care? There are several reasons: (1) Most likely you will use a computer a lot in your future career, especially when you stay in some kind of research job. In a way we are computing scientists, thus computing citizens. Often you have to understand a computer better to get the maximum performance from your processes; and (2) in Earth Sciences we often deal with computers in our labs and out in the field. If you are stuck in northern Scotland and something breaks in your box, you should be able to find out which part broke and how to repair things (this is true not only for hardware, but also for software). So, sometimes you have to go beyond the role of user.

So what is in a computer? Computers vary widely in size, power and function, but their basic layout tends to be the same. The centerpiece of any computer is the CPU (Central Processing Unit). This contains the logic circuitry needed to carry out basic arithmetic, e.g. adding two numbers (that is pretty much all a computer can do, but it can do it really fast). Additionally the CPU contains registers (very fast storage locations), usually some fast on-chip memory known as cache, and the Program Counter, which controls the order in which the CPU fetches its instructions. The clock speed of a computer, now quoted in GHz, sets the rate at which the CPU works through its instructions. There are two general CPU architectures: CISC (Complex Instruction Set Computer) and RISC (Reduced Instruction Set Computer). Examples for these are the common Intel processors used in PCs (CISC architecture) and Motorola’s PowerPC used so far in Macs (RISC architecture). The architecture determines which operating system and software can run on a computer.

The computer communicates with us through input devices (mouse, keyboard) and output devices (monitor). These are some of the peripherals. Hard disks, USB thumb drives and perhaps tapes are also used as input and output devices and are also peripherals. Peripherals work much more slowly than the CPU, so a lot of input/output from a hard disk can really slow a process down. One of the most common ways to attach external peripherals is USB (Universal Serial Bus). USB 2.0 has a data rate of 480 Mbit/s (megabits per second) and is therefore pretty fast (USB 1.1 had a data rate of 12 Mbit/s). Other external systems such as FireWire (IEEE 1394) or SCSI exist. Internal communication is done through other technologies such as SATA, PCI or PCI-Express.

Because peripherals are so slow, the computer stores incoming and outgoing information in the memory (RAM, Random Access Memory). The memory size of a computer determines the size of the biggest task it can handle without having to resort to using the hard disk to store information. RAM is slower than the cache, since the information has to travel outside of the CPU, but RAM is much (much) faster than the disk. If a computer starts to store information of a program on disk, we call this swapping, and you normally notice a huge drop in performance.

Most computers are now networked, i.e. they can talk to other computers and the internet. This is normally done through an Ethernet card with varying speeds (normally 100 Mbit/s or 1000 Mbit/s, i.e. Gigabit).

SELF STUDY

There are some good internet resources for learning about computer basics. A few are:

The PC Tech Guide page http://www.pctechguide.com

The Tech Fest page http://www.techfest.com

OPERATING SYSTEM

The operating system is a resource manager. It balances the software that is needed to run the hardware (drivers) against the resources needed by applications and users. There are several layers before the thing you just typed into the keyboard actually arrives at the CPU. The commands typed will be converted by the operating system into binary coded instructions and passed to the CPU for execution (in reality there is another layer in between: the shell). For example, the OS allows safe access to a printer by allowing only one application program to send data to the printer at a time. It also suspends programs that are waiting for I/O to make way for programs that can use the CPU productively, so that the CPU is used efficiently.

The most important part of the operating system is the kernel, which is in direct control of the underlying hardware. The kernel provides low-level device, memory and processor management functions. Operating systems differ in the system calls, system utilities and user interface they provide.

There are several common operating systems, e.g. Windows, MacOS (now virtually another UNIX system) and UNIX (and its different flavors). Normally there are strong opinions about the pros and cons of the different operating systems. I will not go into this, since we will focus on UNIX in this module.

UNIX

Unix has been popular for more than three decades, with its current incarnation as open source Linux trying to balance the MS-Windows monopoly. Here is a brief overview of the history (from http://www.doc.ic.ac.uk/~wjk/UnixIntro/Lecture1.html).

Things actually were much more complicated (go to http://www.levenez.com/unix/ to see how it really went).

In the practicals we will be using a PC version of Unix called Linux, developed by Linus Torvalds in the early 1990s. The distribution we will be using is CentOS (http://www.centos.org), which is a free open source distribution building upon RedHat (http://www.redhat.com). There are several other Linux distributions (a distribution is a package of the kernel and specific utilities and applications). The distributions vary slightly in scope and in what they are good for. Other common Linux distributions are OpenSuSE (http://www.opensuse.org) and Ubuntu (http://www.ubuntu.com). Most of these distributions are free (!) and are actually pretty easy to install. Most computers can be set up as dual-boot machines that are able to boot both Linux and Windows (if you are scared about partitioning your hard disk, check out the Wubi Installer (http://www.wubi-installer.org)).

There is also a Unix shell for a Windows environment called Cygwin (http://www.cygwin.com). Cygwin is not a full Linux OS, but gives you a Linux-like environment and makes it possible to run Unix applications under Windows.

Unix is a true multi-user system. This means that several users can use a computer at the same time. The thin clients you are using during the practicals, for example, are tied into just two servers in the background. Unix is also able to do real multi-tasking: the CPU power is split between several processes depending on how many resources are needed for each task and on a prioritization scheme. This makes Unix systems perfect for a university/research environment.

FILES AND DIRECTORIES

Data is stored on disks in files. A file is a related set of data with a name attached. Files can store data or programs, and they can be in clear text form (called ASCII), in which case they can normally be read by a word processor or printed out, or they can be binary data (which makes reading the data a bit more complicated). As disks can store a very large number of files, we need some kind of order for them. This is done by hierarchical file systems that contain files and folders (or directories in UNIX). File systems are normally visualized as an upside-down tree, consisting of directories, with the root directory at the top, and subdirectories spreading out below. This is pretty much the same for Unix or Windows.

This kind of sorting helps to keep order in your files. In my research I work with tens of thousands of different data files. Without some kind of order it would be nearly impossible to find things. So when you start working on the practicals, please think about the file system structure for a moment.

Here is a plot of a Windows file system (some of these figures have been taken from Ed Garnero’s Computers and Geology class at Arizona State University).

A Unix file system tree looks slightly different, but in principle it is the same:

The top directory (/) is called the root directory since it is the beginning of the tree. You as a user will not be allowed to write into this directory. Beneath the root directory are others, which normally have special uses. The bottom directory (/nfs/see-fs-01_users/earsro) is my home directory, where I can store my own data and programs.

On a Windows system, if you want to go from where you are in the C: folder to a specific file (e.g. EQ_plane.xls) you will either double click your way through the tree (C: - My documents - Classes - Glg410 - Lab 6 - EQ_planes.xls = 5 double clicks) or you can type the full path into the address box of the Windows Explorer:

The command line way of doing things in UNIX is similar to the latter. In the image above you will notice that folders are separated by backslashes “\”. UNIX has a similar approach except UNIX uses forward slashes “/” to separate folders in a UNIX file system.

In UNIX you type in a command that tells the OS that you want to relocate your active window, i.e. your present working directory, to another directory. The command is:

cd location        (change directory)

Here location can be anything from nothing (i.e. a blank) to elaborate descriptions of a directory deep in the file system tree. Here are some simple examples.

cd examples

cd (blank)             change directory to your main home directory
cd ~                   change directory to your main home directory (same as above)
cd ~earsro             change to rost’s main home directory (earsro is my login)
cd ~/HW                change to a directory called “HW” just below your main home directory
cd ../                 go UP one level to the directory above you
cd ../../              go UP 2 directories (and so on …)
cd ../graphics         go up one directory, then down into a directory called “graphics”
cd /nfs/see-fs-01_t1   go to the networked file system see-fs-01_t1
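If you want to try relative moves without touching your own files, something like the following sketch should work. It assumes a Bourne-type shell (sh or bash) rather than the C shell, and it assumes the helper commands mktemp and basename are installed; the directory name "week1" is made up for illustration.

```shell
# Build a small throwaway directory tree so your real files stay untouched.
# "HW" and "graphics" echo the examples above; "week1" is a made-up name.
top=$(mktemp -d)
mkdir -p "$top/HW/week1" "$top/graphics"

cd "$top/HW/week1"            # go down into a directory deep in the tree
cd ../                        # go UP one level, into HW
up_one=$(basename "$(pwd)")   # pwd prints where you are; basename keeps the last part
cd ../graphics                # up one level, then down into graphics
sideways=$(basename "$(pwd)")

echo "after cd ../         : $up_one"
echo "after cd ../graphics : $sideways"

cd /                          # step out of the scratch tree before deleting it
rm -r "$top"
```

The two echo lines report the name of the directory you ended up in after each move.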

Some special characters in UNIX

.       The present (active) working directory
..      The directory above the present working directory
/       A divider between hierarchical directories when listing paths
*       A wildcard that matches any sequence of characters
?       A wildcard that matches any single character
;       Separates UNIX commands on one line
!       Relates to history (past typed commands)
~       Location of a main home directory
&       Causes a command or program to run in the "background"
|       Routes standard output from a command to the next command ("pipe")
>       Routes standard output from a command to create a new specified file
>!      Same as above; however, if the file already exists, replace it
>>      Routes standard output from a command to append to the specified file
<       Routes a specified file to be input to a command
SPACE   Yes, a space... spaces are important in UNIX, they act as field separators
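A few of these characters can be tried out safely in a scratch directory. The sketch below assumes a Bourne-type shell (sh or bash) and the helper mktemp; the file names are invented, and wc (word count) is used here only to count lines.

```shell
work=$(mktemp -d)             # scratch directory (assumed helper: mktemp)
cd "$work"
touch data1.txt data2.txt data10.txt notes.md   # four empty, made-up files

n_star=$(ls data*.txt | wc -l)       # * matches any sequence of characters
n_question=$(ls data?.txt | wc -l)   # ? matches exactly one character
n_all=$(ls | wc -l)                  # | pipes the listing into the line counter

echo "data*.txt matched:" $n_star; echo "data?.txt matched:" $n_question   # ; separates two commands
echo "entries in directory:" $n_all

cd /
rm -r "$work"
```

Note that data10.txt is matched by data*.txt but not by data?.txt, because "10" is two characters.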

Listing: looking at what’s there

You can’t really type your way around a computer if you don’t yet know what things are called. The way we look at a listing of what is in any given directory is with the list command:

ls location and/or filename info        (list)

As with cd, the location and/or filename part of the command is optional. If it is left out, then you will get a listing of what is in the current directory (.). ls is quite powerful with many options. Look at the manual page for it: type “man ls” (literally: give me the manual page for the command “ls”). The man pages can be very short and confusing. Another source of help is the info tool: type “info ls”. Here are some options. We just use them for our working directory, but you could use them anywhere.

Some ls flag options

ls -a    list all entries, including those that begin with a dot (.) that are normally hidden
ls -l    list in long format, giving mode, ACL indication (see below), number of links, owner, group, size in bytes, and time of last modification for each file
ls -s    give size in blocks, including indirect blocks, for each entry
ls -sa   combines the above flags -s and -a (you can combine as many flags as you wish)
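To see the effect of the -a flag for yourself, a small experiment along these lines may help. It is only a sketch: it assumes a Bourne-type shell (sh or bash) and the helper mktemp, and the two file names are made up.

```shell
work=$(mktemp -d)             # scratch directory (assumed helper: mktemp)
cd "$work"
touch visible.txt .hidden     # two empty files with made-up names

plain=$(ls)                   # the dot file is left out
all=$(ls -a)                  # -a adds .hidden, plus the entries . and ..

echo "ls shows:"; echo "$plain"
echo "ls -a shows:"; echo "$all"
ls -l                         # long format: permissions, links, owner, group, size, date, name

cd /
rm -r "$work"
```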

Multi-user operating systems such as UNIX include the notion of ownership of files and directories: users have a home directory (~) owned by themselves and they can normally only create or destroy files in directories below this home directory. The ownership and permissions can be viewed by typing ls -l. Here is a figure from Sobell’s book:

where:

type is a single character which is either 'd' (directory), '-' (ordinary file), 'l' (symbolic link), 'b' (block-oriented device) or 'c' (character-oriented device).

permissions is a set of characters describing access rights. There are 9 permission characters, describing 3 access types given to 3 user categories. The three access types are read ('r'), write ('w') and execute ('x'), and the three user categories are the user who owns the file, users in the group that the file belongs to and other users (the general public). An 'r', 'w' or 'x' character means the corresponding permission is present; a '-' means it is absent.

links refers to the number of filesystem links pointing to the file/directory (see the discussion on hard/soft links later).

owner is usually the user who created the file or directory.

group denotes a collection of users who are allowed to access the file according to the group access rights specified in the permissions field.

size is the length of a file, or the number of bytes used by the operating system to store the list of files in a directory.

date is the date when the file or directory was last modified (written to). The -u option displays the time when the file was last accessed (read).

name is the name of the file or directory.
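The type and permission characters are easiest to understand on a file whose permissions you have set yourself. The following sketch assumes a Bourne-type shell (sh or bash) and the helpers mktemp and cut; the names report.txt and maps are invented, and the symbolic form of chmod used here (u=, g=, o=) is just one way to set permissions.

```shell
work=$(mktemp -d)             # scratch directory (assumed helper: mktemp)
cd "$work"
touch report.txt              # an ordinary file (made-up name)
mkdir maps                    # a directory (made-up name)

chmod u=rw,g=r,o= report.txt  # owner: read+write, group: read only, others: nothing

# Characters 1-10 of the long listing are the type plus the 9 permission characters.
perms=$(ls -l report.txt | cut -c1-10)
kind=$(ls -ld maps | cut -c1)     # -d lists the directory itself, not its contents

echo "report.txt: $perms"
echo "maps starts with: $kind"

cd /
rm -r "$work"
```

Reading the result for report.txt from left to right: '-' for an ordinary file, then 'rw-' for the owner, 'r--' for the group and '---' for everybody else.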

Both ls and cd work hand in hand. ls shows you what’s there, and cd takes you to the next place. ls can list directories or files that are in other places. Here is a small subset of examples. With ls you can look at something that is many, many directories away with only a few keystrokes. Using a clicking tool would take much longer to get the same information.

ls examples

ls ~          list what's in your main home directory
ls ~/HW       list what's in your "HW" directory, which is in your main home directory
ls ..         list what is in the directory directly above you
ls ../..      list what's UP 2 directories
ls ~earsro    list what's in my main home directory

Viewing file contents/attributes

There are many commands that let you look at the size, type or content of a file. Here is a brief (and incomplete) list of some simple examples. You can also use the special characters with these (e.g. ls ~/e* lists all files in your home directory starting with e)

ls -flags                    We just covered this one.
more location/filename(s)    Display the contents of a file 1 page at a time on the standard output, the terminal screen. Hit the space bar to go to the next page if there is one, hit b to go back a page at a time.
cat location/filename(s)     Scroll the whole file to the screen.
head location/filename(s)    Display the first 10 lines of a file (see man page for more).
wc location/filename(s)      Word count: displays the number of lines, words, and characters in a file.
tail location/filename       Display the last 10 lines of a file.
file location/filename(s)    Display the type of the file if determinable.
emacs location/filename      Very powerful editor. Works both graphically and in a non-graphical shell. You can use the mouse to do things, but everything is available through keystrokes. Possibility to define macros.
vi location/filename         Edit the file with "vi" (the VIsual editor). I only mention it briefly in this class. It is very powerful, but comes with a steep learning curve.
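A quick way to get a feeling for head, tail and wc is to run them on a file whose contents you know. This sketch assumes a Bourne-type shell (sh or bash) and the helper mktemp; the file numbers.txt and the loop that fills it are only there to have something to look at.

```shell
work=$(mktemp -d)             # scratch directory (assumed helper: mktemp)
cd "$work"

# Write a 25-line test file (the loop is just a quick way to make some content).
i=1
while [ "$i" -le 25 ]; do
    echo "line $i"
    i=$((i + 1))
done > numbers.txt

wc numbers.txt                               # lines, words and characters in the file
n_head=$(head numbers.txt | wc -l)           # head prints the first 10 lines
tail_start=$(tail numbers.txt | head -n 1)   # tail prints the last 10 lines; keep the first of them

echo "head printed this many lines:" $n_head
echo "tail started at: $tail_start"

cd /
rm -r "$work"
```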

Copying and moving files

You might want to copy a file to another file or move a file to another location in the file system tree. This can be done with the commands cp and mv.

cp present_location/present_name new_location/new_name        (copy)

mv present_location/present_name new_location/new_name        (move)

In a sense mv is a cp followed by a remove (rm) of the original file.

You might want to add new folders to your file system tree. This can be done with

mkdir newdirectory        (make directory)

moving/copying examples

mkdir test
First we will make a directory called "test", then we can copy files there.

cp ~earsro/.cshrc ~/test/.cshrc_earsro
Assuming you successfully created a directory called "test" in your main directory, you are copying the file ".cshrc" from rost’s main directory to your directory "test", and you are renaming the file to be ".cshrc_earsro". Thus, here you copied and renamed in one fell swoop. Now check to make sure it worked. Look at the contents of directory test with ls test.

mv ~/test/.cshrc_earsro ~/junk
Now you are moving the file you just copied to the test directory into your main home directory (which is also your present working directory if you have not changed directory in between), and renaming it "junk". You will notice that directory test is now empty. You can delete this directory with rmdir test.
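The same sequence can be rehearsed without touching anybody's home directory. In this sketch a scratch directory stands in for your home directory, and original.txt stands in for the .cshrc file; both are made up. It assumes a Bourne-type shell (sh or bash) and the helper mktemp.

```shell
home=$(mktemp -d)             # stand-in for your home directory (hypothetical)
cd "$home"
echo "some settings" > original.txt   # stand-in for the file you want to copy

mkdir test                                   # make the directory first
cp original.txt test/copy_of_original.txt    # copy and rename in one step
ls test                                      # check that the copy arrived

mv test/copy_of_original.txt junk            # move it up one level and rename it
rmdir test                                   # works only because test is empty again

remaining=$(echo *)                          # names left in the stand-in home directory
echo "left over: $remaining"

cd /
rm -r "$home"
```

After the mv, the original file is still in place and the copy lives on under its new name.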

How to find out where you are?

pwd        (print working directory)

NOTES:

From anywhere, you can copy or move a file to anywhere, as long as file permissions allow it. You are only allowed to create, modify or remove your own files, in your own space. But you can copy from other users' accounts to yours. This is actually very cool: on Windows or a Mac, you are probably used to going to the source folder, copying, then clicking around to the destination folder, then pasting. The UNIX way is ultimately much easier, especially if you already know the path.

Some files or folders of other users might be file protected, not letting you copy or view them.

When you move or copy some file to a place that already has something with the same name, you OVERWRITE IT! So, always be careful with what you're putting where -- you don't always get warnings from UNIX asking if it is okay to do something.

SPACES are very important... UNIX filenames or directory names should not have spaces (they are technically allowed, but then every space has to be quoted or escaped on the command line). Thus, establish a "no space etiquette" for your Windows files too, in case you need to use any of the Windows files on the UNIX system.

UNIX is also case sensitive. Hence upper and lower case letters are uniquely recognized, in stark contrast to Windows. Thus you need to be diligent about your file names and remember what you name things.
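The warning about overwriting is worth seeing once on files that do not matter. This is a sketch under the same assumptions as before (a Bourne-type shell and the helper mktemp); the file names and contents are invented.

```shell
work=$(mktemp -d)             # scratch directory (assumed helper: mktemp)
cd "$work"
echo "important results" > results.txt
echo "scratch" > scratch.txt

cp scratch.txt results.txt    # no question asked: the old contents are gone
after=$(cat results.txt)
echo "results.txt now contains: $after"

# cp -i scratch.txt results.txt   would ask for confirmation first (-i = interactive)

cd /
rm -r "$work"
```

If you are unsure whether the destination exists, the -i flag of cp and mv is a cheap safety net.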

SELF STUDY

Finally, here are some web resources for your self study to learn more about the UNIX system. Please use these in your self study time to get a better grip on these simple concepts.

University of Surrey http://www.ee.surrey.ac.uk/Teaching/Unix

University of Utah http://www.math.utah.edu/lab/unix/unix-tutorial.html

Linux Tutorial Site http://www.ctssn.com/

University of Edinburgh http://unixhelp.ed.ac.uk/

SHELLS AND REDIRECTION OF IN/OUTPUT

What is the Shell? It is simply the UNIX system command processor, the part of the UNIX OS that deals with interpreting what you type. In a sense the shell is a layer or the liaison between you, the user, and all programs/processes/resources of the computer. There are many different shells. The most common are: C shell, Bourne shell and the Korn shell. They have slightly different syntax. I will mainly discuss the C shell simply because that is the one I am using most. There will be more about shell scripting in the next lecture. Let’s take a look at how the shell interprets anything you type into one of the terminal windows.

Some of these figures are taken from Sobell’s book. Let’s review what happens when you type into the terminal … how is the OS responding to what you are typing?

This also relates to the syntax importance of blank spaces in UNIX (i.e., in comparison to Windows). A blank space signifies the separation of words, which may be commands, programs, and arguments.

Take a step back now. Typed input, i.e., from something that you type from the shell, is called standard input. This input gets processed (as above) and output goes to the screen, which is called standard output.

Redirection

If your process produces a lot of output it might not be suitable to spit things out to the screen (the standard output). UNIX has an easy way to redirect output to a file by using the “>” symbol (and, in the C shell, to overwrite a possibly existing file with “>!”), or to append to an existing file using two of the greater-than symbols: “>>”. When you use these, the standard output no longer spews to the screen; rather it goes to a file.

In a shell this would look like

command {arguments} > output_file

Let’s combine this with some of the other things we learned. To list the files in one of my directories and write this information into a file you would use:

ls ~earsro/BMP/* > images

Now you can take a look at the contents of the file “images” with more, cat or a text editor or you can erase it with “rm images”.

What works with output also works with input. Sometimes it might be convenient to feed a command its input from some input file. You might want to do this for bookkeeping reasons (you can check the input later) or as a way to minimize typing mistakes. The diagram below shows how an input file is used for a command. Here we use the less-than symbol “<” to redirect the input from standard input to the file input_file.

command < input_file

Very frequently it is convenient to have input coming from and output going to files. The syntax would be:

command < input_file > output_file
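As a concrete instance of this pattern, the sketch below sorts a small file. It assumes a Bourne-type shell (sh or bash), where a plain ">" overwrites an existing file, and the helpers mktemp, printf and sort; the file names and the fruit list are made up.

```shell
work=$(mktemp -d)             # scratch directory (assumed helper: mktemp)
cd "$work"

printf 'pear\napple\nmango\n' > fruit.txt   # > creates the file fruit.txt
sort < fruit.txt > sorted.txt               # input comes from a file, output goes to a file
echo "kiwi" >> sorted.txt                   # >> appends one more line at the end

first=$(head -n 1 sorted.txt)
last=$(tail -n 1 sorted.txt)
echo "first line: $first"
echo "last line:  $last"

cd /
rm -r "$work"
```

The appended line stays at the end because it was added after sort had already written its output.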

More redirection and appending output and simple C-shells

Suppose you want to know who is on the UNIX computer you are logged onto right this minute. You can easily find out using the who command: type "who" and hit return. If you wanted to document the time and who is on the system, you can type: "date" (return), then "who" (return). You can do this all on one line by: "date; who" (return). You separate the commands by the semi-colon so that the shell does not interpret the second command as an argument of the first command.

Let's direct the output of these efforts into a new file. Then we type: "date > sneaky". This stuck the output of the "date" command into a new file called "sneaky". We can append to that with the "who" command by typing: "who >> sneaky". Now, if you do a "cat" or a "more" on the file "sneaky", you will see both the date and who is logged on at the moment. Perhaps you want to know the day of the week, so you also type the UNIX command "cal >> sneaky". You have just appended a calendar for the present month. Look at sneaky.

Let's assume you are a snoopy system administrator, and you want to see who is on the system at different times quite frequently. Then you can put these commands into a file, and make that file "executable" by "changing its mode" (very easy!). First, go to your favorite text editor and put in the exact info you just typed:

date >! sneaky
who >> sneaky
cal >> sneaky

(Notice I put the exclamation after the first ">". why?) Then save that file as something called, for example, "snoop". Now we need to change the mode of "snoop" so UNIX understands that you want to take the contents of snoop and feed it to the shell. You have to make the file executable. Here's how:

chmod +x snoop

Now, snoop is an executable file. If you type "snoop" (or "./snoop" if the current directory is not in your command search path), it will create a new file called sneaky. If you want that info to go to the screen, you can simply erase the redirection information from the file snoop.
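The whole cycle (write the file, make it executable, run it) can also be tried from the command line. The sketch below is a Bourne-shell variant, not the C shell version from the text: it uses a plain ">" instead of ">!", adds a first line "#!/bin/sh" to say which shell should run the script, leaves out cal, and writes the file with a so-called here-document instead of a text editor. All of these are my assumptions for the sake of a self-contained example; mktemp is assumed to be installed.

```shell
work=$(mktemp -d)             # scratch directory (assumed helper: mktemp)
cd "$work"

# Write the script file; everything up to the line EOF ends up in "snoop".
cat > snoop <<'EOF'
#!/bin/sh
date > sneaky
who >> sneaky
EOF

chmod +x snoop                # change the mode: make snoop executable
is_exec=no
[ -x snoop ] && is_exec=yes   # -x tests whether the file is executable

./snoop                       # run it; ./ means "the snoop in this directory"
lines=$(wc -l < sneaky)       # date contributes one line, who one line per login

echo "executable: $is_exec"
echo "lines in sneaky:" $lines

cd /
rm -r "$work"
```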

Shell scripts come in many sizes from simple ones with only a few lines (as above) to many thousand lines of code. In the next lecture we will take a look at some more tools available in the shell and how to write more complicated shell scripts.