File Processing


Transcript of File Processing

Page 1: File Processing

File Processing

Page 2: File Processing

Introduction

• More UNIX commands for handling files

• Regular Expressions and Searching files

• Redirection and pipes
• Bash facilities

Page 3: File Processing

In UNIX everything is a file!

• Directories and programs are all files!
• Devices (keyboard, mouse, screen, memory, hard disks, etc.) are files
• Input and output channels are read and written like files
• All of these things can be manipulated like files

Page 4: File Processing

More UNIX commands

Page 5: File Processing

Review of commands seen so far

• who, date, finger, passwd
• man
• pwd, cd, mkdir, rmdir
• cp, mv, rm, ls
• cat, more, less, head, tail
• lpr
• chmod, umask

Page 6: File Processing

Getting Information about Files

• file reports the content type of each specified file (e.g., text, binary, directory, program), for example:
  file /bin/*
• wc counts the lines, words and characters in a file:
  – -l for lines
  – -w for words
  – -c for characters (strictly, bytes; -m counts multibyte characters)
• With no file argument, wc reads from standard input (the keyboard by default)
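A minimal sketch of wc on a small file, assuming bash and GNU coreutils; sample.txt and its contents are made up for illustration. Feeding the file on standard input (the < redirection, covered later) makes wc print just the count, without the file name.

```bash
#!/usr/bin/env bash
# Work in a throwaway directory; it is deleted when the script exits.
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical two-line file: "one two" and "three".
printf 'one two\nthree\n' > sample.txt

wc sample.txt                 # lines, words, bytes, then the file name

lines=$(wc -l < sample.txt)   # 2 (wc -l counts newline characters)
words=$(wc -w < sample.txt)   # 3
bytes=$(wc -c < sample.txt)   # 14: "one two\n" is 8 bytes, "three\n" is 6

# With no file argument, wc counts whatever arrives on standard input.
piped_words=$(echo "a b c d" | wc -w)   # 4
echo "lines=$lines words=$words bytes=$bytes piped_words=$piped_words"
```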

Page 7: File Processing

Finding Files
• find searches a directory hierarchy:
  find <starting directory> -name <filename> -print

  $ find /usr/share/doc/ -name 'post*' -print
  /usr/share/doc/postgresql-7.4.13
  /usr/share/doc/postgresql-7.4.13/html/postmaster-shutdown.html
  /usr/share/doc/postgresql-7.4.13/html/postmaster-start.html
  $ find . -name '*.txt' -print
• In current versions of find, -print is the default action when no other action is given, so it can be omitted
• which finds commands on your PATH: which <command> shows which program would be executed if you typed <command>
  $ which tail
  /usr/bin/tail
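A minimal sketch of find, assuming bash and a find that supports -name (GNU and BSD versions both do); the directory tree is created on the fly and its names are hypothetical. Quoting '*.txt' matters: unquoted, the shell could expand the pattern against the current directory before find ever sees it.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical tree: two .txt files at different depths and one .md file.
mkdir -p "$dir/notes/old"
touch "$dir/notes/todo.txt" "$dir/notes/old/ideas.txt" "$dir/notes/readme.md"

# Explicit -print, as on the slide.
find "$dir" -name '*.txt' -print

# Same search without -print (printing is the default action); count the results.
txt_count=$(find "$dir" -name '*.txt' | wc -l)
echo "found $txt_count .txt files"
```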

Page 8: File Processing

Sorting and Comparing Files
• sort – prints the lines of a file sorted into alphabetical order
  – can sort on fields within lines (-k)
  – can sort numerical entries (-n)
  – has flags to remove duplicates (-u), reverse the order (-r), etc.
• cmp – tests whether two files are identical and reports the position (byte and line) of the first difference; if the files are identical it prints nothing and exits with status 0
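A minimal sketch of sort and cmp, assuming bash and GNU coreutils; nums.txt and the other file names are hypothetical. cmp exits with status 1 when the files differ, so the script records that status instead of treating it as an error.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf '10\n9\n100\n9\n' > nums.txt

# LC_ALL=C makes plain text ordering predictable regardless of locale.
text_order=$(LC_ALL=C sort nums.txt | tr '\n' ' ')   # "10 100 9 9 "  (character by character)
num_order=$(sort -n nums.txt | tr '\n' ' ')          # "9 9 10 100 "  (numeric)
unique_desc=$(sort -nru nums.txt | tr '\n' ' ')      # "100 10 9 "    (numeric, reversed, no duplicates)

# Identical files: cmp prints nothing and exits with status 0.
cp nums.txt copy.txt
if cmp -s nums.txt copy.txt; then same=yes; else same=no; fi

# A file that differs on line 3; cmp reports the first differing byte and line.
printf '10\n9\n101\n9\n' > changed.txt
cmp nums.txt changed.txt || true
status=0
cmp -s nums.txt changed.txt || status=$?   # -s: no output; status 1 means "differ"
echo "same=$same differ_status=$status"
```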

Page 9: File Processing

Sorting and Comparing Files (2)

• comm – gives three-column output: lines in the first file but not the second; in the second but not the first; and in both (both files should be sorted)

• diff -c – gives the differences with context: by default, 3 unchanged lines either side of each change
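A minimal sketch of comm and diff -c, assuming bash, GNU coreutils and GNU diffutils (some minimal diff implementations lack -c); the two fruit lists are made up. diff exits with status 1 when the files differ, so the script records that status explicitly.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# comm expects sorted input; these hypothetical lists already are.
printf 'apple\nbanana\ncherry\n' > first.txt
printf 'banana\ncherry\ndate\n' > second.txt

comm first.txt second.txt   # column 1: only first, column 2: only second, column 3: both

# -1, -2 and -3 suppress the corresponding column.
only_first=$(comm -23 first.txt second.txt)              # "apple"
only_second=$(comm -13 first.txt second.txt)             # "date"
in_both=$(comm -12 first.txt second.txt | tr '\n' ' ')   # "banana cherry "

# Context diff: "***" marks lines from the first file, "---" lines from the second.
diff_status=0
diff -c first.txt second.txt > changes.diff || diff_status=$?
cat changes.diff
```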

Page 10: File Processing

Regular Expressions and Searching Inside Files

Page 11: File Processing

Searching Inside Files

• grep pattern <pathname…> searches the specified files for the specified pattern and prints every line that contains it, e.g.:
  grep "that" poem
  prints every line in poem containing the string that (including inside longer words such as "thatch")
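A minimal sketch of a plain grep search, assuming bash and GNU grep; the four-line poem file is a hypothetical stand-in for the lecture's. It shows that grep matches substrings and is case-sensitive by default.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

# Hypothetical stand-in for the lecture's "poem" file.
cat > poem <<'EOF'
That was the night
and that was the day
under a thatched roof
nothing to see here
EOF

grep "that" poem                 # lines 2 and 3: "thatched" also contains "that"
matches=$(grep -c "that" poem)   # 2; line 1 has "That", which does not match
echo "$matches matching lines"
```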

Page 12: File Processing

Regular Expressions

• grep "That" poem only finds the string "That" in poem if it has an upper-case 'T' followed by lower-case 'hat'

• Regular expressions are a much more powerful notation for matching many different text fragments with a single expression
  – e.g. you might wish to find "That", "that", "tHaT", etc.

Page 13: File Processing

Regular Expressions (2)

• Search expressions can be very complex, and several characters have special meanings:
  – to insist that That matches only at the start of the line use grep "^That" poem
  – to insist that it matches only at the end use grep "That$" poem
  – a dot matches any single character, so grep "c.t" poem matches cat, cbt, cct, etc.
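A minimal sketch of the anchors ^ and $ and the dot, assuming bash and GNU grep; the poem file is again hypothetical. grep -c is used only so the counts can be checked.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

cat > poem <<'EOF'
That is where it starts
and it ends with That
cat and cut
the cot
nothing else here
EOF

grep "^That" poem                 # line 1 only: That at the start of the line
grep "That$" poem                 # line 2 only: That at the end of the line
grep "c.t" poem                   # lines 3 and 4: cat, cut and cot
at_start=$(grep -c "^That" poem)  # 1
at_end=$(grep -c "That$" poem)    # 1
dot=$(grep -c "c.t" poem)         # 2
```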

Page 14: File Processing

Regular Expressions (3)

• Square brackets allow alternatives:
  – grep "[Tt]hat$" poem

• An asterisk allows zero or more repetitions of the preceding match:
  – grep "^-*$" poem for lines containing only -'s, or empty lines
  – grep "^--*$" poem for lines containing only -'s and at least one -
  – grep "Bengal.*Sumatra" poem for lines with Bengal followed somewhere later by Sumatra

• Many flags, for example to:
  – display only the number of matching lines (-c)
  – ignore case (-i)
  – precede each line by its line number in the file (-n)
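A minimal sketch of brackets, the asterisk and the -c, -i and -n flags, assuming bash and GNU grep; the poem contents are invented so that each pattern from the slide has something to match.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

cat > poem <<'EOF'
Is that
Is That
---

-x-
The tiger of Bengal and the tiger of Sumatra
Sumatra came before Bengal
EOF

alternatives=$(grep -c "[Tt]hat$" poem)                # 2: lines 1 and 2
dashes_or_empty=$(grep -c "^-*$" poem)                 # 2: "---" and the empty line
dashes_only=$(grep -c "^--*$" poem)                    # 1: "---" only
bengal_then_sumatra=$(grep -c "Bengal.*Sumatra" poem)  # 1: line 6 only
ignore_case=$(grep -ci "THAT" poem)                    # 2: -i ignores case
grep -n "Sumatra" poem                                 # each match prefixed by its line number
```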

Page 15: File Processing

Redirection and Pipes

Connecting all the tools!

Page 16: File Processing

Input and Output in UNIX

• UNIX considers input and output to programs to be "streams of data":
  – could be from/to the user
  – could be from/to a file
  – could be from/to another program

Page 17: File Processing

Redirecting Input and Output

• Input and output need not only involve the keyboard and screen; it is possible to redirect them to and from files

• Each UNIX command has at least one input channel and two output channels:
  – STDIN (0) input channel
  – STDOUT (1) output channel
  – STDERR (2) output channel

• Commands usually open further input and output channels when they read and write files named in their arguments

Page 18: File Processing

STDIN

• Stands for standard input
• This is where programs expect to find their input
• STDIN is set by default to read input from the keyboard
• If you want to read the input from a file instead, use <
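A minimal sketch of input redirection with <, assuming bash and GNU coreutils; fruit.txt is a hypothetical file. Because no file name is passed, sort and wc read STDIN, which the shell has connected to the file.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

printf 'pear\napple\nfig\n' > fruit.txt

# sort gets no file argument, so it reads STDIN, which < has connected to fruit.txt.
sort < fruit.txt
sorted=$(sort < fruit.txt)

# The same redirection works for any command that reads STDIN.
line_count=$(wc -l < fruit.txt)   # 3
```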

Page 19: File Processing

STDOUT

• Stands for standard output
• This is where programs (usually) write any output they generate. By default STDOUT appears in your terminal window

• If you want to save the output to a file instead, use >
  – the file will be created
  – if the file already exists then it will be overwritten

• You can also use >>, which appends the output to a file's contents
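A minimal sketch contrasting > and >>, assuming bash; log.txt is a hypothetical file name.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

echo "first run" > log.txt    # > creates log.txt
echo "second run" > log.txt   # > again: the previous contents are overwritten
echo "third run" >> log.txt   # >> appends to the end instead
cat log.txt                   # second run, then third run
```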

Page 20: File Processing

STDERR

• STDERR stands for standard error
• This is where programs usually write error messages, so even if you are redirecting the normal output to a file, you can still see error messages on the screen

• You can redirect STDERR using 2>
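A minimal sketch of splitting the two output channels with > and 2>, assuming bash and a standard ls; present.txt exists and absent.txt deliberately does not (both names are hypothetical).

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

touch present.txt
# ls exits non-zero when a name is missing; "|| true" just keeps the script going.
ls present.txt absent.txt > out.txt 2> err.txt || true
cat out.txt    # normal output (STDOUT): present.txt
cat err.txt    # error output (STDERR): the "No such file or directory" message
```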

Page 21: File Processing

Redirecting Input and Output (2)

By default, UNIX attaches STDIN to the keyboard and STDOUT and STDERR to the screen

[Diagram: a command reading from stdin and input files, and writing to stdout, stderr and output files]

Page 22: File Processing

Redirecting Input and Output (3)

• Use > to redirect STDOUT to a file:
  ls mydir > temp
  temp will be created or overwritten to contain the normal output of the ls command, although error messages still go to the screen

• Use 2>&1 to redirect both outputs:
  ls mydir > temp 2>&1
  (the order matters: 2>&1 must come after > temp, because it copies wherever STDOUT points at that moment)

• >> is like > except that it appends to an existing file instead of overwriting it:
  ls anotherdir >> temp

• < redirects the standard input
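A minimal sketch of why the position of 2>&1 matters, assuming bash; present.txt and absent.txt are hypothetical names chosen so that ls produces one normal line and one error.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

touch present.txt

# STDOUT goes to both.txt first, then 2>&1 points STDERR at the same file.
ls present.txt absent.txt > both.txt 2>&1 || true

# Reversed order: STDERR is copied from STDOUT while STDOUT still points at the
# terminal, so the error stays on screen and only the normal output reaches the file.
ls present.txt absent.txt 2>&1 > only_out.txt || true
```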

Page 23: File Processing

Piping
• Like redirection, except that it attaches input and output to other commands instead of files

• The user can build a pipeline of connected commands, each of which operates on the output of the one before

• This is why so many commands take input from STDIN when no files are given as arguments (e.g., cat, more, sort, grep, wc)

• Uses the pipe symbol, '|'

Page 24: File Processing

Redirecting Input and Output with Pipes

• ls | more gives paged output from ls
• What about:
  who | grep zlizmj
  who | grep zlizmj | wc -l
  who | sort
  file /etc/* | grep -i "ascii"

• Complex pipes can be saved permanently as shell scripts or aliases

[Diagram: keyboard (stdin) → ls → (stdout) → more → screen; the stderr of each command goes to the screen]
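A minimal sketch of the slide's who pipelines, assuming bash and GNU coreutils. Real who output depends on who is logged in, so a hypothetical function, fake_who, prints a fixed list in a similar "user terminal" shape; the user names are invented.

```bash
#!/usr/bin/env bash
# Hypothetical stand-in for who: one "user terminal" line per login.
fake_who() {
    printf 'alice pts/0\nbob pts/1\nalice pts/2\n'
}

fake_who | grep alice                                # like: who | grep zlizmj
sessions=$(fake_who | grep alice | wc -l)            # like: who | grep zlizmj | wc -l
fake_who | sort                                      # like: who | sort

# Longer pipelines chain the same way: first field, sorted without duplicates, counted.
users=$(fake_who | cut -d' ' -f1 | sort -u | wc -l)
echo "alice has $sessions sessions; $users distinct users"
```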

Page 25: File Processing

Hidden Files (1)
• Files whose names start with a dot do not show up in a plain ls listing
• Instead use the -a flag (i.e., ls -a)
• These are often special files for configuring the system or different applications, e.g.:

.login
.bash_logout
.bashrc
.profile
.bash_profile
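A minimal sketch of ls versus ls -a, assuming bash and a standard ls; .hidden_config and visible.txt are hypothetical names. When ls writes to a pipe it prints one name per line, so wc -l counts entries.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT
cd "$dir" || exit 1

touch visible.txt .hidden_config

ls                            # visible.txt only
ls -a                         # ., .., .hidden_config and visible.txt
plain_count=$(ls | wc -l)     # 1
all_count=$(ls -a | wc -l)    # 4 (includes the . and .. entries)
```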

Page 26: File Processing

Hidden Files (2)

• You can permanently customise your environment by editing your .profile

• Once you've edited it, you can apply your changes immediately:
  – type source ~/.profile (source reads and executes a file in the current shell)
  OR
  – log out and log in again; the commands in .profile are executed every time you log in
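A minimal sketch of why source is needed, assuming bash; demo_profile and the GREETING variable are hypothetical stand-ins for ~/.profile and its settings. Running the file as a program sets the variable only in a child shell, while source runs it in the current shell.

```bash
#!/usr/bin/env bash
dir=$(mktemp -d)
trap 'rm -rf "$dir"' EXIT

# Hypothetical settings file standing in for ~/.profile.
cat > "$dir/demo_profile" <<'EOF'
GREETING="hello from demo_profile"
EOF

unset GREETING

# Executed as a separate program: the variable lives and dies in a child shell.
bash "$dir/demo_profile"
before=${GREETING:-unset}      # "unset"

# source runs the same commands in the current shell, so the variable sticks.
source "$dir/demo_profile"
after=$GREETING
echo "before: $before, after: $after"
```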

Page 27: File Processing

UNIX Shell

• The UNIX command line interface is called the ‘shell’

• There are many different shells, for example csh, bash and tcsh, and usually you will run only one type of shell in a login session

• Different types of shell have different built-in commands, although the core commands are common

Page 28: File Processing

Review of Lecture 1

• Editing the command line:
  – DELETE or backspace to delete the last character (also ^H)
  – ^D to delete the next character
  – ^W to delete the last word
  – ^U to delete the entire line (strictly, everything before the cursor)
  – ^C to interrupt most commands
  – ^A and ^E to go to the beginning or end of the line

(^X means press the Control key and X at the same time)

Page 29: File Processing

Bash facilities
• There is a history of previously entered commands (called events) that you can see with the command history

• You can recall and modify these with:
  !!         previous event again
  !! string  previous event with string added
  !n         event number n
  !-n        the nth previous event
  !prefix    last event that began with prefix
  !*         all the arguments of the last event
  !^, !$     first and last arguments of the last event
  !:5        fifth argument of the last event
  many more: see the bash manual

• The most useful of them all: ^R
  – ^R searches interactively in the history
  – press Enter when you have found the one you like (or the right arrow to edit it)

Page 30: File Processing

Aliases (bash)

• To define shorthand for complex commands

• alias name=definition defines an alias, e.g.:
  alias hist=history
  alias ls='ls -F'

• alias alone shows you the current aliases
• unalias name removes an alias
• unalias -a removes all aliases
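A minimal sketch of defining, listing and removing aliases, assuming bash; lsf is a hypothetical alias name chosen so the real ls is left alone. Non-interactive scripts do not expand aliases unless shopt -s expand_aliases is set, but defining, listing and removing them works as shown.

```bash
#!/usr/bin/env bash
alias hist=history
alias lsf='ls -F'            # hypothetical name, so the real ls is left untouched

alias                        # no arguments: list all current aliases
shown=$(alias lsf)           # bash prints the definition back as: alias lsf='ls -F'

unalias hist                 # remove a single alias
unalias -a                   # remove all remaining aliases
left=$(alias | wc -l)        # 0
```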

Page 31: File Processing

The .bash_profile file

• Whenever a login shell starts up it executes this file. It can be used to create aliases automatically and set the history length. It could contain:

  HISTSIZE=200
  alias h=history
  alias print='lpr -Pmyprint'

Page 32: File Processing

Directory stacks (bash)

• Lets you remember old directories when you change to new ones:
  – pushd dir changes to dir and puts it on top of the stack
  – popd removes the top entry and goes back to the previous directory
  – dirs shows the stack

Page 33: File Processing

Directory stacks (bash) (2)

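[Diagram: a directory tree rooted at / with subdirectories including /x/y and /a/b. The directory stack changes as follows:
  cd /x/y      stack: /x/y
  pushd /a/b   stack: /a/b /x/y
  popd         stack: /x/y]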
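A minimal sketch of the cd, pushd, popd sequence from the diagram, assuming bash; a temporary directory stands in for /, and its x/y and a/b subdirectories stand in for /x/y and /a/b. pushd and popd print the stack themselves, so their output is discarded here and dirs is used instead.

```bash
#!/usr/bin/env bash
root=$(mktemp -d)
trap 'rm -rf "$root"' EXIT
mkdir -p "$root/x/y" "$root/a/b"   # stand-ins for /x/y and /a/b

cd "$root/x/y" || exit 1           # stack: x/y
pushd "$root/a/b" > /dev/null      # stack: a/b x/y (now in a/b)
after_push=$PWD
dirs -v                            # numbered view of the stack, top entry first

popd > /dev/null                   # stack: x/y (back in x/y)
after_pop=$PWD
depth=$(dirs -p | wc -l)           # dirs -p prints one entry per line
```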

Page 34: File Processing

More UNIX …
• UNIXhelp for Users: http://unixhelp.ed.ac.uk/
• CERN Unix users guide: http://consult.cern.ch/writeup/unixguide/unix_2.html
• Commonly Used Unix Commands: http://infohost.nmt.edu/tcc/help/unix/unix_cmd.html
• Unix Fundamentals: http://infohost.nmt.edu/tcc/help/unix/fund.html
• Unix 101: http://www.ugu.com/sui/ugu/show?I=help.articles.unix101
• Introduction to Unix Systems Administration: http://8help.osu.edu/wks/sysadm_course/html/sysadm-1.html

Page 35: File Processing

Summary

• More UNIX commands
• Redirection and pipes
• Bash facilities