9 The sed Editor

Mauro Jaskelioff (based on slides by Gail Hopkins)



Introduction

• sed is a stream editor
• Designed to edit files in a batch fashion
  – Not interactive
• Often used for text substitution
• When you have multiple changes to make to one or more files:
  – Write down the changes in an editing script
  – Apply the script to all the files


What does sed do?

• Used to edit input streams
  – The input stream can come from a file, from a pipe or from the keyboard
• Produces results on standard output
  – …but the results can be put in a file or sent through a pipe


Typical Uses of sed

• Editing one or more files automatically
  – E.g. replace all occurrences of a string within a file with a different string
• Simplifying repetitive edits to multiple files
  – E.g. perform the same operation on lots of similar files


How Does sed Work?

• Each line of input is copied into an internal buffer known as the "pattern space"
• All editing commands in a sed script are applied, in order, to the line in the pattern space; sed then outputs the result and reads the next line of input
• Editing commands are therefore applied to every line of input
  – Unless line addressing is used to restrict the lines affected


How does sed Work? (2)

• If a sed command changes the input, the next command will apply to this new (changed) line of input, not the original one
  – More on this later!

sed script:
s/caterpillars/spiders/
s/crawl/run/

Pattern space:
Furry caterpillars crawl slowly      (the input line)
Furry spiders crawl slowly           (after s/caterpillars/spiders/)
Furry spiders run slowly             (after s/crawl/run/)
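The two-command script can be tried directly. This is a minimal sketch assuming a POSIX shell with printf and sed; the sample line is fed on standard input rather than from a file. The second command (s/spiders/beetles/) is a hypothetical addition that only works because it sees the already-changed line.

```shell
#!/bin/sh
# The script from the slide: both commands run, in order, on the one line.
printf 'Furry caterpillars crawl slowly\n' |
  sed -e 's/caterpillars/spiders/' -e 's/crawl/run/'
# prints: Furry spiders run slowly

# Hypothetical variant: the second command matches "spiders", text that
# only exists because the first command has already changed the line.
printf 'Furry caterpillars crawl slowly\n' |
  sed -e 's/caterpillars/spiders/' -e 's/spiders/beetles/'
# prints: Furry beetles crawl slowly
```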


How does sed Work? (3)
• When sed edits an input file, the original input file is unchanged
  – The editing commands modify a copy of each original line of input
  – When sed outputs the result, it is the copy that is sent to STDOUT (or redirected to a file)
• sed keeps a separate buffer, known as the "hold space"
  – Can be used to save data for later retrieval
  – For most edits this isn't needed: it is only used if a command refers to it


How to Run sed from the Command Line

• sed [-n] [-e] 'command' file(s)
  – For specifying an editing command on the command line
  – E.g.:
    • sed 's/ant/flea/g' myCreaturesFile
    • sed -e 's/ant/flea/g' -e 's/worm/slug/g' myCreaturesFile
    • (what does this mean??? More about sed commands shortly…)
• sed [-n] -f scriptfile file(s)
  – For specifying a script file containing sed commands
  – E.g.:
    • sed -f myScript myCreaturesFile
• If no file is specified, sed reads from STDIN
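The ways of running sed listed above can be compared side by side. This is a sketch assuming a POSIX shell with mktemp; the file contents and the script file myScript are hypothetical stand-ins created in a temporary directory.

```shell
#!/bin/sh
# Hypothetical stand-in for myCreaturesFile.
dir=$(mktemp -d)
printf 'ant\nworm\nbee\n' > "$dir/myCreaturesFile"

# One command given on the command line.
sed 's/ant/flea/g' "$dir/myCreaturesFile"                        # flea, worm, bee

# Several commands, each introduced by -e, applied in order.
sed -e 's/ant/flea/g' -e 's/worm/slug/g' "$dir/myCreaturesFile"   # flea, slug, bee

# The same commands kept in a (hypothetical) script file, run with -f.
printf 's/ant/flea/g\ns/worm/slug/g\n' > "$dir/myScript"
sed -f "$dir/myScript" "$dir/myCreaturesFile"                     # flea, slug, bee

# No file name: sed reads standard input instead.
printf 'ant\n' | sed 's/ant/flea/g'                               # flea

rm -r "$dir"
```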


The -n flag
• sed can be given a -n option
  – This tells sed NOT to write the contents of the pattern space to stdout by default, so this prints nothing:
    • sed -n 's/ant/flea/g' myCreaturesFile
  – Another way of specifying this is to put #n at the start of a sed script file (as its first line)
• Why do we want to stop sed's output?
  – We can then tell sed to print specific lines of output, rather than the whole pattern space:
    • sed -n 's/swan/coot/p' myCreaturesFile
  – NOTE the p in the above example…
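The effect of -n, with and without p, can be seen on a small made-up input. A sketch assuming a POSIX shell; the three lines stand in for myCreaturesFile.

```shell
#!/bin/sh
# Hypothetical stand-in for myCreaturesFile.
creatures='swan
ant
swan and ant'

# Without -n: every line is printed, changed or not.
printf '%s\n' "$creatures" | sed 's/swan/coot/'

# With -n and no p: nothing at all is printed.
printf '%s\n' "$creatures" | sed -n 's/swan/coot/'

# With -n and the p flag: only lines where a substitution happened.
printf '%s\n' "$creatures" | sed -n 's/swan/coot/p'   # coot, coot and ant
```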


sed Regular Expressions

• sed uses (basic) regular expressions
• Their format is very similar to that used by grep


sed Regular Expressions

Symbol  Matches                                         Example
^       Beginning of line                               /^He/ : line starts with He
$       End of line                                     /nd$/ : line ends in nd
.       Any single character                            /./ : would match a, b, 1, 2, and so on…
*       0 or more occurrences of preceding character    /we*/ : matches w, we, wee, weee, etc…
\?      0 or 1 occurrence of preceding character        /we\?/ : matches w or we
        (\? is a GNU sed extension; a plain ? is an ordinary character in sed's basic regular expressions, so the portable POSIX form is \{0,1\})
[ ]     Any character enclosed in [ ]                   [abc] : matches a, b or c
[^ ]    Any character NOT enclosed in [ ]               [^abc] : matches d, e, f, etc. but NOT a, b or c
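Each row of the table can be checked by printing only the lines a pattern matches (-n plus p). A sketch assuming a POSIX shell; the word list is made up. The "0 or 1" case uses the portable \{0,1\} spelling, which works in any POSIX sed.

```shell
#!/bin/sh
words='Hello
wend
w
weee
cab
dog'
printf '%s\n' "$words" | sed -n '/^He/p'      # Hello
printf '%s\n' "$words" | sed -n '/nd$/p'      # wend
printf '%s\n' "$words" | sed -n '/^we*$/p'    # w, weee (anchored, so the whole line must match)
printf '%s\n' "$words" | sed -n '/^[abc]/p'   # cab
printf '%s\n' "$words" | sed -n '/^[^abc]/p'  # every line not starting with a, b or c

# "0 or 1 e": a plain ? would be a literal character here.
printf 'w\nwe\nwe?\n' | sed -n '/^we\{0,1\}$/p'   # w, we
```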


sed Regular Expressions (2)

Symbol          Matches                                              Example
\{m,n\}         m to n repetitions of preceding character            x\{1,3\} : matches x, xx or xxx
\{m,\}          m or more repetitions of preceding character         y\{4,\} : matches yyyy, yyyyy, yyyyyy, etc…
\{,n\}          n or fewer (possibly 0) repetitions of preceding     we\{,5\} : matches weeeee, weeee, weee, wee, we or w
                character (accepted by GNU sed; POSIX requires the lower bound, so write \{0,n\} for portability)
\{n\}           Exactly n repetitions of preceding character         z\{6\} : matches zzzzzz
\(expression\)  Group operator or region of interest                 SEE LATER EXAMPLE
\n              The text matched by the nth group (n = 1 to 9)        SEE LATER EXAMPLE
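The interval rows can be checked the same way; anchoring with ^ and $ makes the whole line count. A sketch assuming a POSIX shell; the inputs are made up, and the "n or fewer" case uses the portable \{0,n\} spelling.

```shell
#!/bin/sh
# Interval expressions use backslashed braces in sed's basic regular expressions.
printf 'x\nxx\nxxx\nxxxx\n'       | sed -n '/^x\{1,3\}$/p'   # x, xx, xxx
printf 'yyy\nyyyy\nyyyyy\n'       | sed -n '/^y\{4,\}$/p'    # yyyy, yyyyy
printf 'zzzzz\nzzzzzz\n'          | sed -n '/^z\{6\}$/p'     # zzzzzz
printf 'w\nwe\nweeeee\nweeeeee\n' | sed -n '/^we\{0,5\}$/p'  # w, we, weeeee
```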


sed Commands - Syntax
• sed instructions consist of addresses and editing commands
• They have the general form:
  – [address[,address]][!]command [arguments]
  – NOTE: here, [] denotes something is optional
  – Therefore:
    • If the address of the command matches the line in the pattern space (internal buffer), the command is applied to that line

[address[,address]][!]command [arguments]
  [address[,address]]   zero, one or two addresses
  [!]                   if ! is present, it means anything NOT in the address(es) stated
  command               the sed command to be executed
  [arguments]           optional arguments to the command


sed Addresses

• A sed command can have 0, 1 or 2 addresses
• An address in a sed command can be:
  – A line number
  – The symbol $ (meaning the last line)
  – A regular expression enclosed in slashes (/regex/)
• Therefore, an address can be thought of as "something that matches" in the pattern space


sed Addresses (2)

• If no address is specified:
  – The command applies to each input line
• If one address is specified:
  – The command applies to any line matching the address
  – REMEMBER: an address can be a regular expression!
• If two comma-separated addresses are specified:
  – The command applies to the first matching line and all succeeding lines up to and including a line matching the second address
  – Once the range has ended, sed looks for the first address again, so the same range can apply more than once
• If an address followed by ! is specified:
  – The command applies to all lines that DO NOT match the address
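The four addressing cases can be compared on five numbered lines. A sketch assuming a POSIX shell; the input is made up, and p with -n is used only so that each command shows which lines it selected.

```shell
#!/bin/sh
lines='one
two
three
four
five'
printf '%s\n' "$lines" | sed -n 'p'            # no address: all five lines
printf '%s\n' "$lines" | sed -n '2p'           # line number: two
printf '%s\n' "$lines" | sed -n '$p'           # last line: five
printf '%s\n' "$lines" | sed -n '/^t/p'        # regex: two, three
printf '%s\n' "$lines" | sed -n '2,4p'         # range: two, three, four
printf '%s\n' "$lines" | sed -n '/two/,/four/p' # the same range, using regexes
printf '%s\n' "$lines" | sed -n '2,4!p'        # ! inverts the range: one, five
```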


sed Commands
• Consist of a single letter or symbol
  – They tell sed to "do something" to the text at the address specified
  – E.g.:
    • s means substitute
    • g is a flag to the s command. It means global, or all occurrences of… (more on this later)
    • sed 's/ant/flea/g' myCreaturesFile
    • …means substitute all occurrences of the text ant (including inside longer words such as ants) with the text flea in the file myCreaturesFile
  – …in this example, no address is specified, so sed applies the command to every input line


sed Commands (2)
• Another example:
  – sed -n '/^squirrel/,/^swift/p' myCreaturesFile
    • Print everything between the line starting squirrel and the line starting swift, inclusive
    • Here, there are 2 addresses, both are regular expressions:
      • /^squirrel/
        – The first address is the first line with "squirrel" at the start of the line
      • /^swift/
        – The second address is the first following line with "swift" at the start of the line
    – REMEMBER: regular expressions are written between / and /
    • sed therefore prints from the first matching line (with squirrel at the start) through all succeeding lines up to and including a line matching the second address (with swift at the start)
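Running the range command on a made-up creature list shows two details of ranges: after a range ends, a later squirrel line starts a new one, and a range with no closing swift line runs to the end of the input. A sketch assuming a POSIX shell.

```shell
#!/bin/sh
# Hypothetical creature list; squirrel appears twice, swift only once.
printf '%s\n' ant squirrel stoat swift wasp squirrel vole |
  sed -n '/^squirrel/,/^swift/p'
# prints: squirrel, stoat, swift, then squirrel, vole (second range, unterminated)
```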


sed Commands (3)

• An example using !
  – sed '/aardvark/!d' myCreaturesFile
  – Delete any line that doesn't contain the text "aardvark" in the file myCreaturesFile
• An example using line numbers:
  – sed '5s/wombat/womble/g' myCreaturesFile
  – Substitute all occurrences of wombat with womble on line 5


Putting more than one sed Element in a Command

• An example of two elements together:
  – Input file:
      a, a, ants on my arm
      a, a, ants on my arm
      a, a, ants on my arm
      they're causing me alarm!
  – sed -e 's/ant/flea/g' -e 's/alarm/to itch/g' myCreaturesFile
  – Output:
      a, a, fleas on my arm
      a, a, fleas on my arm
      a, a, fleas on my arm
      they're causing me to itch!
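The song example can be reproduced with a temporary file standing in for myCreaturesFile. A sketch assuming a POSIX shell with mktemp.

```shell
#!/bin/sh
# Hypothetical stand-in for myCreaturesFile.
tmp=$(mktemp)
printf '%s\n' \
  'a, a, ants on my arm' \
  'a, a, ants on my arm' \
  'a, a, ants on my arm' \
  "they're causing me alarm!" > "$tmp"

# Both -e commands are applied, in order, to every line.
sed -e 's/ant/flea/g' -e 's/alarm/to itch/g' "$tmp"
rm -f "$tmp"
```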


Putting more than one sed Element in a Command (2)

• Input file:
    At the top of the tree there were 4 parrots and 2 lizards
• sed -e 's/parrot/lizard/g' -e 's/lizard/koala/g' myCreaturesFile
• Output from sed:
    At the top of the tree there were 4 koalas and 2 koalas
• Why???


…because
• sed read in the line in the file and executed:
  – s/parrot/lizard/g
• …to produce the text:
    At the top of the tree there were 4 lizards and 2 lizards
• sed then performed the command:
  – s/lizard/koala/g
  – …on this new edited line to produce:
    At the top of the tree there were 4 koalas and 2 koalas
• REMEMBER from previously:
  – If a sed command changes the input, the next command will apply to this new (changed) line of input, not the original one


Summary of sed Commands (4)

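Basic Editing
a\   append text after a line
c\   change: replace the selected line(s) with new text
i\   insert text before a line
d    delete lines
s    substitute
y    translate characters (e.g. y/abc/xyz/ turns every a into x, b into y and c into z)

Line Information
=    display the line number of a line
p    display (print) the line
l    display the line with control (non-printing) characters shown in a visible form

Yanking and Putting
h    copy the pattern space into the hold space, replacing what's there
H    append a newline and the pattern space to what's in the hold space
g    get the hold space back: copy it into the pattern space, replacing the current line
G    get the hold space back: append a newline and the hold space to the pattern space
x    exchange the contents of the hold space and the pattern space

Input/Output Processing
n    print the current line (unless -n is given) and replace it with the next input line
r    read another file's contents into the output stream
w    write the current line (the pattern space) to another file
q    quit the sed script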
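A few of the commands from this summary in action. A sketch assuming a POSIX shell; the inputs are made up. The last line is a well-known sed idiom for printing lines in reverse order with the hold space; it is not taken from these slides.

```shell
#!/bin/sh
# y: translate characters one-for-one (lower-case vowels to upper case).
printf 'caterpillar\n' | sed 'y/aeiou/AEIOU/'     # cAtErpIllAr

# =: print the line number before each line.
printf 'ant\nbee\n' | sed '='                      # 1, ant, 2, bee

# i\ and a\: insert a line before line 1, append one after the last line.
printf 'ant\nbee\n' | sed -e '1i\
START' -e '$a\
END'                                               # START, ant, bee, END

# Hold space: reverse the order of the lines.
#   1!G  on every line but the first, append the hold space to the pattern space
#   h    copy the pattern space into the hold space
#   $!d  delete (do not print) every line except the last
printf 'one\ntwo\nthree\n' | sed '1!G;h;$!d'       # three, two, one
```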


Examples of commonly used sed Commands

s (substitute)
• sed 's/dog/cat/' myfile
  – substitute the first occurrence of dog with cat on each line of myfile
• sed 's/dog/cat/g' myfile
  – substitute all occurrences of dog with cat in myfile
• sed 's/dog/cat/4' myfile
  – on every line of myfile with at least 4 "dog" strings, substitute the 4th occurrence of dog with cat
• sed '1,2s/dog/cat/g' myfile
  – substitute all occurrences of dog with cat in the first 2 lines of myfile ONLY
• sed '/dog/,/cat/s/.*//' myfile
  – look for a line containing dog and the next line after it containing cat. Empty those lines plus all lines (possibly more than one) in between; they remain in the output as blank lines, because s does not delete lines. Repeat until the end of myfile. s/.*// means substitute all text found with an empty string
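The substitution examples above can be run on a small made-up stand-in for myfile. A sketch assuming a POSIX shell; note in the last command that the emptied lines are still printed, as blank lines.

```shell
#!/bin/sh
# Hypothetical stand-in for myfile.
text='dog dog dog dog
dog and dog
a cat
no pets here'

printf '%s\n' "$text" | sed 's/dog/cat/'        # first dog on each line only
printf '%s\n' "$text" | sed 's/dog/cat/g'       # every dog
printf '%s\n' "$text" | sed 's/dog/cat/4'       # only line 1 has a 4th dog
printf '%s\n' "$text" | sed '1,2s/dog/cat/g'    # lines 1 and 2 only
printf '%s\n' "$text" | sed '/dog/,/cat/s/.*//' # lines 1-3 become blank; line 4 unchanged
```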


Examples of commonly used sed Commands (2)

d (delete)
• sed '1,2d' myfile
  – delete lines 1 and 2 of myfile
• sed '5d' myfile
  – delete the fifth line from myfile
• sed '/^#/d' myfile
  – delete all lines starting with # in myfile

p (print)
• sed -n '/BEGIN/,/END/p' myfile
  – find a line containing BEGIN and print that line and all following lines up to and including a line containing END. Note: if there is no END, sed will still print all text after BEGIN due to its stream-oriented nature: it doesn't know there is no END until it gets to the end of the file!
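The delete and print examples on a made-up input, including the case where no END line follows BEGIN. A sketch assuming a POSIX shell.

```shell
#!/bin/sh
# Hypothetical stand-in for myfile.
text='# a comment
line two
BEGIN
inside
END
# another comment
after'
printf '%s\n' "$text" | sed '1,2d'               # drops lines 1 and 2
printf '%s\n' "$text" | sed '5d'                 # drops line 5 (END)
printf '%s\n' "$text" | sed '/^#/d'              # drops both comment lines
printf '%s\n' "$text" | sed -n '/BEGIN/,/END/p'  # BEGIN, inside, END

# No END line: the range simply runs to the end of the input.
printf 'BEGIN\nx\ny\n' | sed -n '/BEGIN/,/END/p' # BEGIN, x, y
```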


Flags to commands

• sed commands can be given flags. We have already seen the substitute command with the g flag:
  – s/lizard/koala/g
  – g is a flag to the s command. It tells s to substitute ALL occurrences of the pattern on each line, not just the first
• Other flags to s are:
  – n - replace the nth occurrence of the pattern with the replacement text
    • e.g. sed 's/dog/cat/4' myfile
  – p - print the pattern space to stdout if the substitution was successful
    • e.g. sed -n 's/dog/cat/p' myfile


Flags to Commands (2)

– w filename - write the pattern space to the file filename if the substitution was successful
  • e.g. sed 's/dog/cat/w resultsfile' myfile
  • NOTE: put a single space between the w and resultsfile; everything after that space, up to the end of the line, is taken as the file name, so w must be the last flag
  • resultsfile will contain only those lines that sed applied the substitution to
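The w flag example can be tried in a scratch directory, so that resultsfile is created next to a made-up myfile. A sketch assuming a POSIX shell with mktemp; the subshell keeps the cd from affecting the caller.

```shell
#!/bin/sh
dir=$(mktemp -d)
(
  cd "$dir" || exit 1
  printf 'a dog\na bird\nhot dog\n' > myfile

  # Standard output still gets every line: a cat, a bird, hot cat
  sed 's/dog/cat/w resultsfile' myfile

  # resultsfile holds only the changed lines: a cat, hot cat
  cat resultsfile
)
rm -r "$dir"
```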


Running sed from a Script

• sed commands can be put in a file called a script
• E.g. script.sed:
    # this is my sed script
    s/horse/cow/g
    s/chicken/duck/g
    s/newt/lizard/g
  (the first line, starting with #, is a comment in sed)
• …and run from the command line:
    $ sed -f script.sed myCreaturesFile
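The script-file workflow end to end: write script.sed, then apply it with -f. A sketch assuming a POSIX shell with mktemp; the one-line myCreaturesFile is made up.

```shell
#!/bin/sh
dir=$(mktemp -d)

# script.sed: a comment followed by one substitution per line.
cat > "$dir/script.sed" <<'EOF'
# this is my sed script
s/horse/cow/g
s/chicken/duck/g
s/newt/lizard/g
EOF

# Hypothetical stand-in for myCreaturesFile.
printf 'horse chicken newt\n' > "$dir/myCreaturesFile"

sed -f "$dir/script.sed" "$dir/myCreaturesFile"   # cow duck lizard
rm -r "$dir"
```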


Piping to and from sed (and a much more complicated example!)

• The UNIX who command gives an output:

$ who
zliybbs pts/5 Apr 8 19:11
zliybsj2 pts/6 Apr 8 18:42 (ss-226-host39.nottingham.edu.cn)
zliybyk2 pts/9 Apr 6 14:30
zliybyk2 pts/10 Apr 6 14:31
zliybbs pts/11 Apr 8 19:15 (10.20.50.15)
zliybyy2 pts/12 Apr 8 20:10
zliybwj pts/15 Apr 6 14:34
zuczpd pts/17 Apr 6 14:44
zuczpd pts/18 Apr 6 14:44
zuczpd pts/19 Apr 6 14:44 (ss-226-host67.nottingham.edu.cn)
zuczpd pts/20 Apr 6 14:45 (ss-226-host67.nottingham.edu.cn)
zlizmj pts/1 Apr 9 08:49 (10.20.10.85)


Piping to and from sed (2) (and a much more complicated example!)

• If we wanted to extract only the machine names from this output, we could use the following command:

• who | sed -n 's/.*(\(.*\))/\1/p'

What ON EARTH does this mean???? ☺


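who | sed -n 's/.*(\(.*\))/\1/p'

• "who |" : take the output from the UNIX who command and pipe it into sed
• ".*(" : take everything up to and including the open bracket "(" (.* is greedy, so this is the last "(" on the line; in the who output above there is only one)
• "\(" : this denotes the start of a region of interest
• ".*" : take everything after the open bracket "(" up to, but not including, the close bracket ")", and keep it for future referencing in the region of interest
• "\)" : this denotes the end of the region of interest
• ")" : the close bracket itself, so it is removed too
• "\1" : …and substitute the whole matched text with the region of interest that was saved earlier, referenced as number 1 (REMEMBER from earlier: \n means nth group)
• "-n … p" : only lines where a substitution was made are printed, so lines with no machine name in brackets are dropped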


Piping to and from sed (3)

• If we then wanted to sort the result into alphabetical order, we could pipe it into sort:
  – who | sed -n 's/.*(\(.*\))/\1/p' | sort
• We could then redirect the whole output to a file:
  – who | sed -n 's/.*(\(.*\))/\1/p' | sort > machines.txt
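Because real who output differs on every machine, this sketch feeds a few lines copied from the who output shown earlier into the same sed command and sort. It assumes a POSIX shell; LC_ALL=C is an added assumption that pins sort to plain byte order so the result does not depend on the locale.

```shell
#!/bin/sh
# Lines copied from the who output above, so the result is repeatable.
sample='zliybbs pts/5 Apr 8 19:11
zliybsj2 pts/6 Apr 8 18:42 (ss-226-host39.nottingham.edu.cn)
zliybbs pts/11 Apr 8 19:15 (10.20.50.15)
zuczpd pts/19 Apr 6 14:44 (ss-226-host67.nottingham.edu.cn)
zlizmj pts/1 Apr 9 08:49 (10.20.10.85)'

# The first line has no "(...)", is not substituted, and so is not printed.
printf '%s\n' "$sample" | sed -n 's/.*(\(.*\))/\1/p' | LC_ALL=C sort
```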


An Example of Data Manipulation using sed

• Suppose we had a file names.txt in the form forename:surname (with a colon in between):
    Steve:Bradford
    Saun:Higgins
    Gail:Hopkins
    Sara:Mead
    Fred:Smith
    Henry:Taylor
• …and we wanted to reverse the names so that they were in the order surname,forename (with a comma in between)…


An Example of Data Manipulation using sed (2)

• sed -e 's/\(.*\):\(.*\)/\2,\1/' names.txt
• …would produce the following output:
    Bradford,Steve
    Higgins,Saun
    Hopkins,Gail
    Mead,Sara
    Smith,Fred
    Taylor,Henry

EXPLANATION:

This uses regions of interest. It puts the forename in a region of interest and then puts the surname in another region of interest. It then outputs the second region of interest, a comma, and the first.
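The name swap can be run on a few of the lines from names.txt. A sketch assuming a POSIX shell. The second command shows a consequence of greedy matching that the example data does not exercise: with more than one colon, the first region takes everything up to the last colon.

```shell
#!/bin/sh
printf '%s\n' Steve:Bradford Saun:Higgins Gail:Hopkins |
  sed -e 's/\(.*\):\(.*\)/\2,\1/'
# Bradford,Steve
# Higgins,Saun
# Hopkins,Gail

# Made-up input with two colons: the split happens at the last one.
printf 'a:b:c\n' | sed 's/\(.*\):\(.*\)/\2,\1/'   # c,a:b
```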


Using Different Delimiters
• Often, / is used in sed scripts as a delimiter
• However, other characters can be used as delimiters instead
  – In an s command, sed takes the character immediately after the s as the delimiter
• All of these are therefore equally viable:
    s/horse/cow/g    s,horse,cow,g
    s:horse:cow:g    s$horse$cow$g
• Why would we want a different delimiter?


Using Different Delimiters (2)
• Suppose we had an HTML file which we wanted to convert to XHTML
  – We therefore want to change
    • all occurrences of <H1> to <h1>
    • all occurrences of <H2> to <h2>
    • all occurrences of </H1> to </h1>
    • all occurrences of </H2> to </h2>
    • and so on…

s/<H1>/<h1>/g
s/<H2>/<h2>/g
s:</H1>:</h1>:g
s:</H2>:</h2>:g

Here we have used : as a delimiter because there are slashes in the data
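The four heading edits applied to a made-up two-line HTML fragment. A sketch assuming a POSIX shell. The last command shows the alternative the : delimiter avoids: with / as the delimiter, every / in the data has to be escaped as \/.

```shell
#!/bin/sh
printf '<H1>Title</H1>\n<H2>Section</H2>\n' |
  sed -e 's/<H1>/<h1>/g' -e 's/<H2>/<h2>/g' \
      -e 's:</H1>:</h1>:g' -e 's:</H2>:</h2>:g'
# <h1>Title</h1>
# <h2>Section</h2>

# The same closing-tag edit with / as the delimiter needs escaped slashes.
printf '</H1>\n' | sed 's/<\/H1>/<\/h1>/g'   # </h1>
```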


sed Tries to Match the Longest Expression!

• Suppose we had an HTML file and we wanted to remove all the markup:
    <b>Welcome</b> to the <i>UST</i> website.
• We could instruct sed to find a '<' character followed by zero or more other characters until a '>' character:
  • sed -e 's/<.*>//g' UST.html
• This would produce:
    website.
• Why??


sed Tries to Match the Longest Expression! (2)

• …because sed tries to find the longest expression that matches:
  • <b>Welcome</b> to the <i>UST</i>
• …instead, we need to specify that sed looks for a '<' character followed by zero or more non-'>' characters followed by a '>' character:
  • sed -e 's/<[^>]*>//g' UST.html
  • sed will then match <b>, </b>, <i>, and so on…
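Both commands on the UST line, fed on standard input instead of from UST.html. A sketch assuming a POSIX shell; note the leading space left by the greedy version.

```shell
#!/bin/sh
line='<b>Welcome</b> to the <i>UST</i> website.'

# Greedy: .* runs from the first < to the LAST >, so almost everything goes.
printf '%s\n' "$line" | sed -e 's/<.*>//g'      # " website."

# [^>]* cannot cross a >, so each tag is matched and removed on its own.
printf '%s\n' "$line" | sed -e 's/<[^>]*>//g'   # "Welcome to the UST website."
```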


Character Classes - POSIX Compliant sed

• Often in sed you want to specify a regular expression that contains white space (TABs, spaces, etc.)
• POSIX compliant sed offers a simple way of doing this with a character class:
  • sed 's/[[:space:]]//g' myfile (removes all white space from every line)
  • NOTE: the class name goes inside a bracket expression, hence the double brackets
• Character classes give you a way of specifying, within a regular expression, types of characters to search for


Character Classes (2)

• [:alnum:]   Alphanumeric [a-zA-Z0-9]
• [:alpha:]   Alphabetic [a-zA-Z]
• [:blank:]   Spaces or tabs
• [:cntrl:]   Any control characters
• [:digit:]   Numeric digits [0-9]
• [:graph:]   Any visible characters (no whitespace)
• [:lower:]   Lower-case [a-z]
• [:print:]   Non-control characters
• [:punct:]   Punctuation characters
• [:space:]   Whitespace
• [:upper:]   Upper-case [A-Z]
• [:xdigit:]  Hex digits [0-9a-fA-F]
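A few classes used inside bracket expressions on made-up input. A sketch assuming a POSIX shell; printf turns \t into a real tab.

```shell
#!/bin/sh
# [[:space:]] matches the space and the tab alike.
printf 'a b\tc\n' | sed 's/[[:space:]]//g'                # abc

# Replace every digit, then every upper-case letter.
printf 'Room 101, Floor 3\n' | sed 's/[[:digit:]]/#/g'    # Room ###, Floor #
printf 'sed Editor\n' | sed 's/[[:upper:]]/_/g'           # sed _ditor
```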


Summary

• An introduction to sed
• Format of sed statements
• Addresses
• Types of command
• Putting sed inside a script
• Some more advanced examples of sed