Essential UNIX skills for biologists

Lane Medical Library & Knowledge Management Center http://lane.stanford.edu Essential UNIX Skills for Biologists Yannick Pouliot, PhD Bioresearch Informationist Lane Medical Library & Knowledge Management Center 1/14/2009

description

Learning UNIX for biomedical researchers

Transcript of Essential UNIX skills for biologists

Page 1: Essential UNIX skills for biologists

Lane Medical Library & Knowledge Management Center
http://lane.stanford.edu

Essential UNIX Skills for Biologists

Yannick Pouliot, PhD
Bioresearch Informationist
Lane Medical Library & Knowledge Management Center

1/14/2009

Page 2: Essential UNIX skills for biologists


The Bioresearch Informationist: At Your Service

• Yannick Pouliot, PhD, Lane Medical Library & Knowledge Management Center
• Bioresearch Informationist ≈ computational biologist in residence
• Lane Library service
• Closely coordinated with CMGM

Role: Support laboratory researchers regarding biocomputational resources and their use

…especially postdocs

Contact: [email protected]

Page 3: Essential UNIX skills for biologists


Goals

• Deliver basic understanding of core UNIX commands
• Tips on running UNIX on Mac and Windows

… and on a procedural note, we’ll be using anonymous polling to determine whether you’re happy with the material and speed of delivery …

Page 4: Essential UNIX skills for biologists


But First: LaneConnex -- Your Key to Finding Resources Quickly

Page 5: Essential UNIX skills for biologists


So, Why UNIX?

UNIX is good for:

1. performing complex operations with very few keystrokes

2. operating on large numbers of objects, e.g.,
• searching file contents very specifically
• renaming files
• moving/copying files

UNIX is fast: fast to run and fast to invoke

LINUX (≈ UNIX) is free and runs on everything

Page 6: Essential UNIX skills for biologists


UNIX Trip-Ups

• UNIX is capitalization-sensitive: ls ≠ Ls
• What you type is what you get
  • no mistyping!
  • mind those commands, e.g., rm -fr * = delete everything in the current directory and its subdirectories, without asking for confirmation! → DON’T DO THIS AT HOME!
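One way to stay out of trouble is to preview what a pattern matches before deleting anything. The sketch below assumes a scratch directory and made-up file names (old1.tmp, old2.tmp, keep.txt); it is only an illustration of the habit, not a complete safety net.

```shell
# Work in a throwaway directory so nothing real can be deleted.
workdir=$(mktemp -d)
cd "$workdir"
touch old1.tmp old2.tmp keep.txt      # hypothetical files for the demo

# Dry run: echo prints the command as the shell would run it,
# with the pattern already expanded to real file names.
echo rm old*.tmp

# Once the list looks right, run the real command.
rm old*.tmp
ls                                    # only keep.txt is left
```

Adding -i (rm -i old*.tmp) makes rm ask for confirmation before each deletion.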

Page 7: Essential UNIX skills for biologists


So How Does One Access UNIX?

• Mac: UNIX underlies Mac’s graphical interface: Applications → Utilities → Terminal
• Windows: Must install code (more later)

Page 8: Essential UNIX skills for biologists


Exploring UNIX

Page 9: Essential UNIX skills for biologists


Key UNIX Concepts

• UNIX is command-line based (no cute icons).
• There are flavors of UNIX: “Mac” UNIX ≈ Linux ≈ UNIX
• “Shell” = command line interface; different shells exist, all with the same basic functionality
• Anything you can imagine, UNIX can do… but you may have to think about it…
• In UNIX, anything can be done in at least three different ways…
• UNIX has:
  • commands (built-in) → most of today’s workshop
  • utilities ≈ “super-commands”, e.g., grep, for parsing text; not built-in but usually there

Page 10: Essential UNIX skills for biologists


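Concept: Redirection ***

Redirection operators:
• “>” : write output to a file (overwrite)
• “>>” : add output to the end of a file (don’t overwrite)
• “<” : read input from a file
• NB: “<<” is not an append operator; it introduces inline input (a “here-document”)

Applies to both input and output:
prog.exe < file.txt
prog.exe > file.txt
prog.exe < file.txt > file1.txt
prog.exe >> file.txt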
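A minimal sketch of redirection in practice, using standard commands (echo, wc, sort) in place of prog.exe. The scratch directory and the file names notes.txt and sorted.txt are made up for the demo.

```shell
# Work in a scratch directory so nothing important is overwritten.
workdir=$(mktemp -d)
cd "$workdir"

# ">" sends output to a file, replacing whatever the file held before.
echo "first line" > notes.txt
echo "replacement line" > notes.txt    # notes.txt now holds only this line

# ">>" appends to the end of the file instead.
echo "appended line" >> notes.txt

# "<" feeds a file to a command as its input.
wc -l < notes.txt                      # counts the lines in notes.txt

# Input and output redirection can be combined in one command.
sort < notes.txt > sorted.txt
cat sorted.txt
```

Note that the command name always comes first; the file names follow the redirection operators.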

Page 11: Essential UNIX skills for biologists


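Concept: Metacharacters ***

• “*” = 0 or more characters of any kind
• “?” or “.” = exactly one character of any kind
  • The exact character depends on the tool: the shell uses “?” in filename patterns, grep uses “.” in regular expressions
  • In grep’s regular expressions, “*” means “0 or more of the preceding character”, so “any characters” is written “.*”
• Metacharacters can be used with nearly any other command, e.g.,
ls file?.txt
ls file*.txt
ls *.*
more *.txt
grep omics *.txt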

NB: There are lots of other kinds of metacharacters…
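The sketch below shows how the shell expands filename metacharacters before the command runs. The file names are invented for the demo and live in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir"
touch file1.txt file2.txt file10.txt notes.txt   # hypothetical files

ls file?.txt      # "?" = exactly one character: file1.txt and file2.txt
ls file*.txt      # "*" = zero or more characters: also matches file10.txt

# Quoting switches the expansion off, so the pattern reaches the
# command unchanged. echo makes the difference visible.
echo file*.txt    # prints the matching file names
echo 'file*.txt'  # prints the pattern itself
```

Quoting matters most with grep, where the pattern should reach grep untouched rather than be expanded into file names by the shell.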

Page 12: Essential UNIX skills for biologists


Concept: Stringing Commands Together Using Pipes

“|” = pipe, e.g.: ls -1 | more
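Pipes can chain more than two commands. As a small sketch, the example below finds the most frequent entry in a list; the file genes.txt and its contents are made-up sample data.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical input: one gene symbol per line, with repeats.
printf 'BRCA1\nTP53\nBRCA1\nEGFR\nTP53\nBRCA1\n' > genes.txt

# Each "|" hands the output of the command on its left
# to the command on its right as input:
#   sort      groups identical lines together
#   uniq -c   collapses each group and prefixes it with a count
#   sort -rn  orders the counts from largest to smallest
#   head -n 1 keeps only the top line
sort genes.txt | uniq -c | sort -rn | head -n 1
```

No intermediate files are created; each command starts working on the previous command's output directly.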

Page 13: Essential UNIX skills for biologists


Polling Time: How’s the speed?

1. Too fast

2. Too slow

3. More or less OK

4. I feel nauseous

Page 14: Essential UNIX skills for biologists


Overview of Selected UNIX Commands

Page 15: Essential UNIX skills for biologists


ls [options] [names] ****

• Lists contents of directories, including directories themselves. Basically, lists files…
• When names are provided, lists the files contained in a directory of that name or that match a file name.
• names can include filename metacharacters.
• The options display information in different formats. The most useful options include -F, -R, -l, and -s.

Examples
1. list all details of all files in current directory
ls -l
2. list just the filenames
ls -1
3. create a file that contains a list of the filenames
ls -1 > mylist.txt
4. list files named “example” followed by a single character, e.g., example1.txt, etc.
ls -1 example?.txt
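The ls options mentioned on this slide can be tried safely in a scratch directory. The directory layout and file names below are invented for the demo.

```shell
workdir=$(mktemp -d)
cd "$workdir"
mkdir results
touch example1.txt example2.txt example10.txt results/run1.log

ls -l                 # long format: permissions, owner, size, date, name
ls -s                 # size of each file, in blocks
ls -F                 # marks directories with a trailing "/"
ls -R                 # also lists the contents of subdirectories
ls -1 example?.txt    # example1.txt and example2.txt, but not example10.txt
ls -1 > mylist.txt    # saves the listing in a file
cat mylist.txt
```

mylist.txt appears in its own listing, because the shell creates the output file before ls starts.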

Page 16: Essential UNIX skills for biologists


cat/more/head/tail → commands to look at content of files

• cat: returns everything
• more: same but one page at a time ****
• head: returns top x lines
• tail: returns bottom x lines
• all can operate on multiple files

Examples
1. show contents of all txt files
cat *.txt
2. show first 100 lines of file
head -n 100 file.txt
3. show first 1000 lines of file and paginate:
head -n 1000 file.txt | more
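A short sketch of head and tail on a test file. The file numbers.txt is generated with seq so that each line's content is its own line number; the -n option gives the number of lines to return.

```shell
workdir=$(mktemp -d)
cd "$workdir"
seq 1 1000 > numbers.txt               # hypothetical test file: lines 1..1000

head -n 3 numbers.txt                  # first 3 lines
tail -n 3 numbers.txt                  # last 3 lines

# head and tail combine through a pipe to cut out a slice:
# the first 100 lines, then the last 5 of those (lines 96 to 100).
head -n 100 numbers.txt | tail -n 5

# To page through long output interactively, pipe it into more:
#   head -n 1000 numbers.txt | more
```

The last command is left as a comment because more waits for keystrokes when run in a terminal.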

Page 17: Essential UNIX skills for biologists


grep: Searching File Contents Using “Regular Expressions” ****

grep [options] pattern [files]

• Very powerful: searches file contents for presence of a string
grep protein *.txt
• about a million options…
• Also searches using regular expressions
  • Definition: a formal expression that describes the characteristics of one or more strings, e.g.:
  te?xt (with grep -E, matches “txt” and “text”)
  .*omics (anything followed by “omics”)

Examples
1. Find all text files whose contents contain words ending in “omics” (“genomics”, “proteomics”, “transcriptomics”):
grep -l -w -E '[A-Za-z]+omics' *.txt
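A few grep variations on small sample files, as a sketch. The files a.txt, b.txt and c.txt and their contents are made up. The -o option is not part of the POSIX standard but is assumed here because both GNU and BSD (Mac) grep provide it.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical sample files.
printf 'Genomics core facility\nWet lab protocols\n' > a.txt
printf 'Proteomics and transcriptomics\n' > b.txt
printf 'No matching words here\n' > c.txt

# The pattern is quoted so that the shell passes it to grep unchanged.
grep 'omics' *.txt                   # matching lines, prefixed by file name
grep -l 'omics' *.txt                # names of matching files only
grep -c 'omics' *.txt                # number of matching lines per file
grep -i 'GENOMICS' *.txt             # -i ignores upper/lower case

# -E enables extended regular expressions; -o prints only the matched text.
# "[A-Za-z]+omics" = one or more letters followed by "omics".
grep -o -E '[A-Za-z]+omics' *.txt
```

Here the last command prints the three matched words, each prefixed by the name of the file it came from.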

Page 18: Essential UNIX skills for biologists


Polling Time: How’s the speed?

1. Too fast

2. Too slow

3. More or less OK

4. Need coffee

Page 19: Essential UNIX skills for biologists


uniq [options] filename **

• Very handy for listing unique (or duplicate) lines in a file
• Has options to…
  • ignore the first n fields delimited by tabs or spaces
  • compare only the first n characters
• Compares only adjacent lines, so it operates correctly ONLY on sorted files

Examples
1. List unique lines using unsorted file
sort test1.txt | uniq

2. Count the number of instances of each unique line using sorted file
uniq -c test2.txt
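The sketch below shows why sorting matters: uniq only compares neighboring lines. The file species.txt and its contents are made-up sample data.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical input with repeats that are not next to each other.
printf 'human\nmouse\nhuman\nyeast\nmouse\nhuman\n' > species.txt

uniq species.txt              # unsorted: no identical neighbors, nothing removed
sort species.txt | uniq       # sorted first: each line appears once
sort species.txt | uniq -c    # each line once, prefixed by its count
sort species.txt | uniq -d    # only lines that occur more than once
sort species.txt | uniq -u    # only lines that occur exactly once
```

With this input, the first command returns all six lines, while the sorted version returns three.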

Page 20: Essential UNIX skills for biologists


find [pathnames] [conditions] ***

• Very powerful: can specify anything, including exclusions and negations
• Descends the directory tree beginning at each pathname and locates files that meet the specified conditions. The default pathname is the current directory.
• Most useful conditions are -name and -type (for general use)
• Can search very large numbers of file names, if slowly…

Examples
1. List all files named chapter1 in the /work directory:
find /work -name chapter1 -print

2. Look for filenames in the current directory and below that don't begin with a capital letter:
find . ! -name '[A-Z]*' -print
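A sketch of -name and -type on a small invented directory tree. Note that -name takes a filename pattern (the same metacharacters as ls), not a regular expression. The class [[:upper:]] is used here instead of [A-Z] on the assumption that it behaves the same way regardless of language settings.

```shell
workdir=$(mktemp -d)
cd "$workdir"
# Hypothetical directory tree.
mkdir -p work/drafts
touch work/chapter1 work/drafts/chapter1 work/Notes.txt work/data.csv

find work -name chapter1 -print           # both files named chapter1, at any depth
find work -type d -print                  # directories only
find work -type f -name '*.txt' -print    # regular files ending in .txt

# "!" negates the condition that follows it:
# regular files whose names do not start with a capital letter.
find work -type f ! -name '[[:upper:]]*' -print
```

The patterns are quoted so that find, not the shell, does the matching.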

Page 21: Essential UNIX skills for biologists


UNIX on Windows

• Easy: UnxUtils
  • = UNIX “light”
  • Excellent for most tasks
  • Not a complete emulation of UNIX
  • Download here; make sure to follow installation instructions
  • More later…
• Hard: Cygwin
  • difficult to make it behave perfectly
  • can run in parallel with Windows
• Easier: create a dual boot
  • Provides ability to boot either Windows or Linux
  • Requires reboot to switch…

Page 22: Essential UNIX skills for biologists


Resources

• UNIX commands: http://en.wikibooks.org/wiki/Guide_to_Unix/Commands

• Another list of UNIX utilities: http://en.wikipedia.org/wiki/List_of_Unix_utilities

Page 23: Essential UNIX skills for biologists


Everything You Need to Know About UNIX in Short Form: eBooks from Lane

• The ultimate quick reference for LINUX

• More than you typically need, but you can zoom into what you need

Page 24: Essential UNIX skills for biologists


UnxUtils Installation: The MiniMe of UNIX

• Download
• Installation instructions

→ Let’s do it together if you have a PC and want it
