Advanced Text Processing

Lecture Overview

Character manipulation commands: cut, paste, tr

Line manipulation commands: sort, uniq, diff

Regular expressions and grep

Text replacement using sed

Cutting Lines – cut

The cut command extracts sections from each line of the input file

Command line options for cut:
  -c – output only these characters
  -f – output only these fields
  -d – use this character as the field delimiter

cut options [files]

Cutting Lines – cut

With cut, at least one of the selection options (-c or -f) must be specified

The value given with -c or -f can be:
  A number – specifies a single character or field position
  A range – specifies a sequence of positions
  A comma separated list – specifies multiple positions or ranges
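A minimal sketch of the three kinds of values. It assumes a hypothetical colon-separated record rather than the lecture's phone file, and the variable names are illustrative only.

```shell
# Hypothetical sample record: name:office:extension
line='ADAMS:B12:7583'

# A single number selects one position.
first_char=$(printf '%s\n' "$line" | cut -c1)

# A range selects consecutive positions.
first_five=$(printf '%s\n' "$line" | cut -c1-5)

# A comma separated list selects several fields; -d sets the delimiter.
name_and_ext=$(printf '%s\n' "$line" | cut -d: -f1,3)

printf '%s\n' "$first_char" "$first_five" "$name_and_ext"
```

When several fields are selected, cut joins them in the output with the same delimiter it read.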

cut – Examples

Given a file called 'my_phones.txt':

ADAMS, Andrew 7583
BARRETT, Bruce 6466
BAYES, Ryan 6585
BECK, Bill 6346
BENNETT, Peter 7456
GRAHAM, Linda 6141
HARMER, Peter 7484
MAKORTOFF, Peter 7328
MEASDAY, David 6494
NAKAMURA, Satoshi 6453
REEVE, Shirley 7391
ROSNER, David 6830

cut – Examples

head -3 my_phones.txt | cut -c3-16

AMS, Andrew 75
RRETT, Bruce 6
YES, Ryan 6585

head -3 my_phones.txt | cut -d" " -f2

Andrew
Bruce
Ryan

head -3 my_phones.txt | cut -c1-3,10,12,15-18

ADAde7583
BARBu 646
BAYa 85

Merging Files – paste

The paste command merges multiple files by concatenating corresponding lines

Command line options for paste:
  -d – provide a list of separator characters
  -s – paste one file at a time instead of in parallel (each file becomes a single line)

paste [options] [files]

paste – Examples

Assume that we are given 3 input files:

first.txt   last.txt    num.txt
Andrew      ADAMS       7583
Bruce       BARRETT     6466
Ryan        BAYES       6585
Bill        BECK        6346
Peter       BENNETT     7456
Linda       GRAHAM      6141
Peter       HARMER      7484
Peter       MAKORTOFF   7328
David       MEASDAY     6494
Satoshi     NAKAMURA    6453

paste – Examples

paste first.txt last.txt num.txt | head -3

Andrew ADAMS 7583
Bruce BARRETT 6466
Ryan BAYES 6585

paste -d" :" first.txt last.txt num.txt | head -3

Andrew ADAMS:7583
Bruce BARRETT:6466
Ryan BAYES:6585

paste -s last.txt first.txt num.txt | cut -f1-5,10

ADAMS BARRETT BAYES BECK BENNETT NAKAMURA
Andrew Bruce Ryan Bill Peter Satoshi
7583 6466 6585 6346 7456 6453

Translating Characters – tr

The tr command is used to translate between one character set and another

Input is read from standard input and written to standard output (no files)

With no options, tr accepts two character sets of equal length, and replaces each character of set1 with the corresponding character of set2

tr [options] set1 [set2]

Deleting or Squeezing Characters – tr

Sets contain literal characters, or character ranges, such as: 'a-z' or 'DEFa-z'

With command line options, tr can also be used to delete or squeeze characters

Command line options for tr:
  -d – delete characters in set1
  -s – replace each sequence of a repeated character with a single occurrence

Defining Sets for tr

tr has some interpreted sequences to simplify the definition of sets:
  [:alpha:] – all letters
  [:digit:] – all digits
  [:alnum:] – all letters and digits
  [:space:] – all whitespace
  [:punct:] – all punctuation characters
  [CHAR*REPEAT] – REPEAT copies of CHAR
  [CHAR*] – as many copies of CHAR as needed to reach the length of set1
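A short sketch of how these sequences behave in practice. The sample text and variable names are assumptions made for illustration.

```shell
# Hypothetical sample text.
text='Call 555-0199, ext. 42!'

# Delete every punctuation character.
no_punct=$(printf '%s\n' "$text" | tr -d '[:punct:]')

# Map every digit to '#': [#*] repeats '#' up to the length of set1.
masked=$(printf '%s\n' "$text" | tr '[:digit:]' '[#*]')

# Convert lowercase letters to uppercase.
upper=$(printf '%s\n' "$text" | tr '[:lower:]' '[:upper:]')

printf '%s\n' "$no_punct" "$masked" "$upper"
```

The quotes matter here: without them the shell could treat the square brackets as a filename pattern before tr sees them.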

tr – Examples

Change lower case to capital, and replace the digits 6, 7, 8 with the letters x, y, z

head -3 padded_phones.txt

ADAMS      Andrew    7583
BARRETT    Bruce     6466
BAYES      Ryan      6585

head -3 padded_phones.txt | tr 'a-z678' 'A-Zxyz'

ADAMS      ANDREW    y5z3
BARRETT    BRUCE     x4xx
BAYES      RYAN      x5z5

tr – Examples

Delete spaces, and digits 7 and 8:

head -3 padded_phones.txt | tr -d " 78"

ADAMSAndrew53
BARRETTBruce6466
BAYESRyan655

Squeeze sequences of spaces into one:

head -3 padded_phones.txt | tr -s " "

ADAMS Andrew 7583
BARRETT Bruce 6466
BAYES Ryan 6585

Reading from Standard Input

Many UNIX commands accept one or more input files listed in the command line (tr is one of the few that don't)

If no input file is given, these commands will read from the standard input

Alternately, if the file list contains a '-', the standard input will be inserted in its place
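A small self-contained sketch of the '-' convention, using paste. The two-line files are hypothetical stand-ins created in a temporary directory.

```shell
tmpdir=$(mktemp -d)
printf 'Andrew\nBruce\n' > "$tmpdir/first.txt"
printf '7583\n6466\n'    > "$tmpdir/num.txt"

# '-' marks the place where paste reads standard input,
# here the last names after tr has lowercased them.
merged=$(printf 'ADAMS\nBARRETT\n' | tr 'A-Z' 'a-z' |
         paste -d_ "$tmpdir/first.txt" - "$tmpdir/num.txt")

printf '%s\n' "$merged"
rm -rf "$tmpdir"
```

Because '-' is the second argument, the piped text becomes the middle column.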

Standard Input – Example

cat last.txt | tr "A-Z" "a-z" | \
    paste -d"_" first.txt - number.txt | head -10

Andrew_adams_7583
Imelda_aguilar_6518
Daniel_albers_7540
Pierre_amaudruz_7567
Friedhelm_ames_7581
Willy_andersson_6238
Andrei_andreyev_6491
Jonathan_aoki_6820
Donald_arseneau_6295
Danny_ashery_6188

Sorting Files – sort

The sort command reorders the lines in a file (or files), and sends the result to the standard output

Command line options for sort:
  -f – ignore case (fold lowercase to uppercase)
  -r – sort in reverse order
  -n – sort in numeric order

sort [options] [files]

Sorting Files – sort

With no options given, lines are compared as character strings using the current locale's collating order; in the C (POSIX) locale this is the ASCII code order

The sort command has many more options for selecting which fields to sort by, and for changing the way input is treated

As always, you should read the man pages for the full details
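As one illustration of the field-selection options, the sketch below uses -t (field separator) and -k (sort key). The colon-separated records are hypothetical, and LC_ALL=C is set only to make the ordering predictable.

```shell
# Hypothetical records: name:extension
data='beck:6346
Adams:7583
barrett:6466'

# -t sets the field separator; -k2,2n sorts on field 2 only, numerically.
by_number=$(printf '%s\n' "$data" | LC_ALL=C sort -t: -k2,2n)

# -f folds case, so "Adams" sorts before "barrett" instead of by byte value.
by_name=$(printf '%s\n' "$data" | LC_ALL=C sort -f)

printf '%s\n' "$by_number" "$by_name"
```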

sort – Example: Using Ignore-Case

Input    sort     sort -f
Bruce    Andrew   Andrew
Ryan     Bruce    bill
peter    Ryan     Bruce
Andrew   bill     peter
bill     peter    Ryan

sort – Example: Sorting Numbers

Input     sort      sort -n
38        1256875   18
18        18        38
1256875   38        66
66        575       575
575       66        1256875

Removing Duplicate Lines – uniq

The uniq command removes adjacent duplicate lines from its input file

If the input is sorted, all duplicate lines are removed

Command line options for uniq:
  -i – ignore case
  -c – prefix lines by the number of occurrences
  -d – only print duplicate lines
  -u – only print unique lines
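The "adjacent" restriction is easy to overlook, so here is a small sketch on unsorted input. The list of names is an assumption chosen for illustration.

```shell
# Hypothetical unsorted input with non-adjacent repeats.
names='Peter
David
Peter
Peter
David'

# uniq alone only collapses adjacent repeats.
adjacent_only=$(printf '%s\n' "$names" | uniq)

# Sorting first makes all duplicates adjacent.
all_removed=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq)

# -d prints one copy of each line that is repeated.
repeated=$(printf '%s\n' "$names" | LC_ALL=C sort | uniq -d)

printf '%s\n' "$adjacent_only" "$all_removed" "$repeated"
```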

uniq – Example

Input    uniq     uniq -c
Andrew   Andrew   1 Andrew
Bill     Bill     1 Bill
David    David    2 David
David    Peter    3 Peter
Peter    Ryan     1 Ryan
Peter
Peter
Ryan

uniq – Example

Input    uniq -u   uniq -d
Andrew   Andrew    David
Bill     Bill      Peter
David    Ryan
David
Peter
Peter
Peter
Ryan

Example – File Processing Using Pipes

Task – go over the book "War and Peace" and count the appearances of each word

Step 1: remove all punctuation marks

cat war_and_peace.txt | tr -d '[:punct:]'

Step 2: put each word in a separate line

cat war_and_peace.txt | tr -d '[:punct:]' | tr " " "\n"

Step 3: sort words

cat war_and_peace.txt | tr -d '[:punct:]' | tr " " "\n" | sort

Example – File Processing Using Pipes

Step 4: count appearances of each word

cat war_and_peace.txt | tr -d '[:punct:]' | tr " " "\n" | sort | uniq -c

Step 5: sort result by number of appearances

cat war_and_peace.txt | tr -d '[:punct:]' | tr " " "\n" | sort | uniq -c | sort -nr

Step 6: write output to file

cat war_and_peace.txt | tr -d '[:punct:]' | tr " " "\n" | sort | uniq -c | sort -nr > words.txt
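The same steps can be tried on a single sentence. This is a sketch, not the lecture's exact pipeline: the sample text is hypothetical, and two changes are assumptions added here (folding case, and tr -s so that repeated spaces do not produce empty "words").

```shell
# Hypothetical sample text standing in for war_and_peace.txt.
text='The war, the peace. The end of the war!'

counts=$(printf '%s\n' "$text" |
  tr -d '[:punct:]' |           # step 1: remove punctuation
  tr '[:upper:]' '[:lower:]' |  # added assumption: fold case so The and the count together
  tr -s ' ' '\n' |              # step 2: one word per line, squeezing repeated separators
  LC_ALL=C sort |               # step 3: sort words
  uniq -c |                     # step 4: count each word
  sort -nr |                    # step 5: most frequent first
  head -2)                      # keep the top two for display

printf '%s\n' "$counts"
```

The first line of the result reports "the" with a count of 4 and the second "war" with a count of 2; the amount of padding in front of the counts depends on the uniq implementation.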

Comparing Text Files – diff

The diff command takes two input files, and compares them

The output contains only the different lines, with their line numbers

Command line options for diff:
  -i – ignore case
  -b – ignore changes in amount of white space
  -B – ignore insertion or deletion of blank lines

diff – Examples

First file            Second file
ADAMS Andrew 7583     ADAMS Andrew 7583
BARRETT Bruce 6466    BARRETT Bruce 3333
BAYES Ryan 6585       BAYES Ryan 6585
BECK Bill 6346        BECK Bill 6346
BENNETT Peter 7456    Bennett peter 7456

The two BAYES lines differ only in the amount of white space

diff

2,3c2,3
< BARRETT Bruce 6466
< BAYES Ryan 6585
---
> BARRETT Bruce 3333
> BAYES Ryan 6585
5c5
< BENNETT Peter 7456
---
> Bennett peter 7456

diff – Examples

First file            Second file
ADAMS Andrew 7583     ADAMS Andrew 7583
BARRETT Bruce 6466    BARRETT Bruce 3333
BAYES Ryan 6585       BAYES Ryan 6585
BECK Bill 6346        BECK Bill 6346
BENNETT Peter 7456    Bennett peter 7456

diff -b

2c2
< BARRETT Bruce 6466
---
> BARRETT Bruce 3333
5c5
< BENNETT Peter 7456
---
> Bennett peter 7456

diff -bi

2c2
< BARRETT Bruce 6466
---
> BARRETT Bruce 3333

Maintaining Output Consistency

During program development, assume that we have reached the correct output

We want to verify that it does not change

Create reference output file:

prog > prog.out

After changing the program, compare output:

prog | diff - prog.out
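A runnable sketch of this check. The shell function prog is a hypothetical stand-in for the real program, and the reference file is kept in a temporary directory only so the example cleans up after itself.

```shell
tmpdir=$(mktemp -d)

# Hypothetical stand-in for the program under development.
prog() { printf 'result: %s\n' 42; }

# Create the reference output once, when the output is known to be correct.
prog > "$tmpdir/prog.out"

# Later: compare fresh output (read from standard input via '-') with the reference.
# diff exits with status 0 when there are no differences.
if prog | diff - "$tmpdir/prog.out" > /dev/null; then
    status='unchanged'
else
    status='changed'
fi

printf 'output %s\n' "$status"
rm -rf "$tmpdir"
```

Testing the exit status of diff, rather than reading its output, makes the check easy to use from a script.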

Searching For Matching Patterns – grep

The grep command searches files for patterns, and prints matching lines

The mandatory regexp argument defines a regular expression

A regular expression is a formula for matching strings that follow some pattern

grep [options] regexp [files]

Searching For Matching Patterns – grep

The simplest regular expression is just a sequence of characters

This regular expression matches only a single string – itself

The following command prints all lines that contain word, from any of the given files:

grep word files

Searching For Matching Patterns – grep

The power of grep lies in using more sophisticated regular expressions

Command line options for grep:
  -v – print all lines that don't match
  -c – print only a count of matched lines
  -n – print line numbers
  -h – don't print file names (for multiple files)
  -l – print file name but not matching line
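A brief sketch of several of these options. The two files are hypothetical and are created in a temporary directory just for this example.

```shell
tmpdir=$(mktemp -d)
printf 'big boy\nbad bug\nbetter\n' > "$tmpdir/bugs.txt"
printf 'no match here\n'            > "$tmpdir/other.txt"

match_count=$(grep -c 'b.g' "$tmpdir/bugs.txt")    # count of matching lines
numbered=$(grep -n 'b.g' "$tmpdir/bugs.txt")       # matching lines with line numbers
non_matching=$(grep -v 'b.g' "$tmpdir/bugs.txt")   # lines that do not match
files=$(cd "$tmpdir" && grep -l 'b.g' bugs.txt other.txt)  # names of files with a match

printf '%s\n' "$match_count" "$numbered" "$non_matching" "$files"
rm -rf "$tmpdir"
```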

Regular Expressions

Regular expressions are a powerful tool for searching and selecting text

They reached UNIX through the ed editor and the grep command (and go further back to automata theory)

They have since been copied into many other tools and languages such as awk, sed, perl and Java

Regular Expressions vs. Filename Expansion

Note that regular expressions are different from filename expansion

Filename expansion uses some regular expression concepts and symbols, but:
  Filename expansion is done by the shell
  Regular expressions are passed as arguments to specific commands or utilities
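The difference shows up in what '*' means in each case. The sketch below assumes a few hypothetical files in a temporary directory.

```shell
tmpdir=$(mktemp -d)
touch "$tmpdir/a.c" "$tmpdir/b.c"
printf 'abc\nac\nxyz\n' > "$tmpdir/notes.txt"

# Filename expansion: the shell itself replaces *.c with the matching file names.
expanded=$(cd "$tmpdir" && echo *.c)

# Regular expression: the quotes stop the shell from expanding ab*c,
# so grep receives the pattern unchanged and interprets it.
matched=$(grep 'ab*c' "$tmpdir/notes.txt")

printf '%s\n' "$expanded" "$matched"
rm -rf "$tmpdir"
```

In the filename pattern '*' stands for any string, while in the regular expression it means zero or more repetitions of the preceding 'b'. Quoting regular expressions is a good habit for this reason.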

Matching a Single Character

A period (.) matches any single character

For example:

Regular Expression   Matches              Doesn't Match
b.g                  bag, debug, bigger   brag, bg, bad
U..X                 UNIX                 unix
.                    a, b, c              An empty line

Matching a Character Class

Square brackets ([]) match any single character within the brackets

If the first character following the left bracket is a '^', the expression matches any character not in the brackets

A '-' can be used to indicate a range, such as: [a-z]

Matching a Character Class

Regular Expression   Matches                 Doesn't Match
[Bb]ill              Bill, bill, got billed  Dill, ill, kill
t[aeiou].k           talk, stack, stink      track, take
number [^0-5]        number xxx, number 8:   number 59

Matching a Character Class

The same predefined character classes used for tr can also be used here

For portability, prefer [:alpha:] to [A-Za-z]: the characters covered by a range such as A-Z depend on the locale

Note: the brackets are part of the symbolic names, and must be included in addition to the enclosing brackets, i.e. [[:alpha:]]
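A minimal sketch of the double-bracket form with grep. The three input lines are hypothetical.

```shell
# Hypothetical input lines.
data='ADAMS
7583
a:b'

# [[:alpha:]] is a bracket expression containing the class: lines with at least one letter.
with_letter=$(printf '%s\n' "$data" | grep '[[:alpha:]]')

# [^[:alpha:]] negates the class: lines made up entirely of non-letters.
no_letters=$(printf '%s\n' "$data" | grep '^[^[:alpha:]]*$')

printf '%s\n' "$with_letter" "$no_letters"
```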

Matching Repetitions

An asterisk (*) represents zero or more matches of the regular expression it follows

Regular Expression   Matches                   Doesn't Match
ab*c                 ac, abc, aaabbbc          aba, ca, cb
t.*ing               thing, string, thinking   king

Matching Special Characters

Sometimes we want to literally match a character that has a special meaning, such as '*' or '['

There are two ways to do that:
  Precede the character with a '\'
  Use square brackets – any character inside is taken literally

Matching Special Characters

Regular Expression   Matches                 Doesn't Match
a\.c                 a.c                     abc
\.\.\.*              the end..., more.....   abc, stop.
[*.]                 *, start *, Sys.print   Hello world, abc
C:\\bin              C:\bin                  C:\\bin

Matching the Beginning or the End of a Line

A regular expression that begins with a caret (^) can match a string only at the beginning of a line

Similarly, a regular expression that ends with a dollar sign ($) can match a string only at the end of a line

Matching the Beginning or the End of a Line

Regular Expression   Matches                Doesn't Match
^T                   This line, That bug    START, My Tag
^num.*[0-9]$         num5, num99, number 1  my num1, the number 6, num 6a
^t.*k$               talk, track, tk        stack, take

Using Regular Expressions with grep – Examples

cat bugs.txt

big boy
bad bug
bag
bigger bag
better
boogie nights

grep 'b.g' bugs.txt

big boy
bad bug
bag
bigger bag

grep 'b.g.' bugs.txt

big boy
bigger bag

grep 'b.*g.' bugs.txt

big boy
bigger bag
boogie nights

Using Regular Expressions with grep – Examples

cat f.txt

ADAMS,
Andrew
7583
BARRETT,
Bruce
6466
BAYES,
Ryan
6585

grep '[[:alpha:]],' f.txt

ADAMS,
BARRETT,
BAYES,

grep '^[C-Z][[:lower:]]*$' f.txt

Ryan

grep '^[^[:alpha:]0-3]*$' f.txt

6466
6585

Pipes and Regular Expressions – Example

Task: create a file containing the names of all source files in the current directory, sorted by the number of lines in each file

Step 1: count lines in each file

wc -l *

Step 2: leave only '.c' and '.h' files

wc -l * | grep '\.[ch]$'

Step 3: sort in reverse order (largest first)

wc -l * | grep '\.[ch]$' | sort -nr

Pipes and Regular Expressions – Example

Step 4: squeeze leading spaces (into one)

wc -l * | grep '\.[ch]$' | sort -nr | tr -s " "

Step 5: remove number field

wc -l * | grep '\.[ch]$' | sort -nr | tr -s " " | cut -d" " -f3

Step 6: write output to file

wc -l * | grep '\.[ch]$' | sort -nr | tr -s " " | cut -d" " -f3 > sorted_source_files.txt
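Selecting field 3 with cut assumes that every line still begins with one space after squeezing, so that field 1 is empty and field 2 is the count. Whether wc pads its counts varies between implementations. The sketch below is a variation, not the lecture's method: it swaps the tr and cut steps for a sed substitution that drops the count whether or not it is padded. The files are hypothetical and are created in a temporary directory.

```shell
tmpdir=$(mktemp -d)
# Hypothetical source tree; sizes are chosen so the ordering is unambiguous.
printf '1\n2\n3\n' > "$tmpdir/main.c"
printf '1\n'       > "$tmpdir/util.h"
printf '1\n2\n'    > "$tmpdir/notes.txt"

sorted=$(cd "$tmpdir" && wc -l * |
  grep '\.[ch]$' |         # keep only .c and .h files
  sort -nr |               # largest first
  sed 's/^ *[0-9]* *//')   # drop the count and the blanks around it

printf '%s\n' "$sorted"
rm -rf "$tmpdir"
```

The grep step also removes the "total" line that wc prints when given several files.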

Which grep to Use?

In addition to grep itself, there are two more variants of it: egrep and fgrep
  Use grep for most standard text finding tasks
  Use egrep for complex tasks, where basic regular expressions are just not enough, and you need to use extended regular expressions
  Use fgrep when only fixed strings are searched, and speed is of the essence
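POSIX specifies the same behaviours as options of grep itself: grep -E for extended regular expressions and grep -F for fixed strings. Recent GNU grep releases mark the egrep and fgrep names as obsolescent. A short sketch of fixed-string matching with -F, on hypothetical input:

```shell
# Hypothetical input lines.
data='a.c
abc
a*c'

# -F treats the pattern as a fixed string, so '.' and '*' need no escaping.
literal_dot=$(printf '%s\n' "$data" | grep -F 'a.c')
literal_star=$(printf '%s\n' "$data" | grep -F 'a*c')

# Without -F the same pattern is a basic regular expression: '.' matches any character.
regex_dot=$(printf '%s\n' "$data" | grep 'a.c')

printf '%s\n' "$literal_dot" "$literal_star" "$regex_dot"
```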

Extended Regular Expressions – egrep

Extended regular expressions support all basic regular expression syntax, plus some additional special characters:
  + – similar to '*', but at least one appearance
  ? – similar to '*', but zero or one appearances
  () – grouping
  a|b – the OR operator – matches either regular expression a or regular expression b
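These operators can be tried with grep -E, which accepts the same extended syntax as egrep. The input lines below are hypothetical.

```shell
# Hypothetical input lines.
data='num6
num666
num5
Barret
Bennet
Burnet'

# '+' requires at least one 6.
one_or_more=$(printf '%s\n' "$data" | grep -E 'num6+')

# '?' makes the 6 optional; the anchors force the whole line to match.
optional=$(printf '%s\n' "$data" | grep -E '^num6?5$')

# Grouping limits the alternation to the middle of the name.
grouped=$(printf '%s\n' "$data" | grep -E 'B(arr|enn)et')

printf '%s\n' "$one_or_more" "$optional" "$grouped"
```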

Extended Regular Expressions – egrep

Regular Expression   Matches          Doesn't Match
num6+                num666, num654   num566, number
num6?5               num65, num555    num6, num665
Barret|Bennet        Barret, Bennet
B(arr|enn)et         Barret, Bennet

Stream Editor – sed

sed is a stream editor: it edits a text stream according to a script, and supports basic regular expressions

It performs transformations on an input stream, based on simple instructions

sed has many commands, but the most commonly used is the substitute command:

sed 's/pattern/replacement/[g]' [file]

Stream Editor – sed

pattern is any basic regular expression

replacement is a string that will replace one or more matches of pattern

The optional g flag defines whether the operation is global – without it only the first match in every line is replaced

The special character '&' can be used inside replacement to refer to the matched text

Using Regular Expressions with sed – Examples

cat bugs.txt

big boy
bad bug
bag
bigger bag
better

sed 's/b.g/XXX/' bugs.txt

XXX boy
bad XXX
XXX
XXXger bag
better

sed 's/b.g/XXX/g' bugs.txt

XXX boy
bad XXX
XXX
XXXger XXX
better

sed – Examples

head -2 my_phones.txt

ADAMS, Andrew 7583
BARRETT, Bruce 6466

head -2 my_phones.txt | sed 's/ [[:upper:]]/<&>/g'

ADAMS,< A>ndrew 7583
BARRETT,< B>ruce 6466

head -2 my_phones.txt | sed 's/[[:digit:]]*$/###/g'

ADAMS, Andrew ###
BARRETT, Bruce ###

Matching and Reusing Portions of a Pattern in sed

It is also possible to reuse portions of the matched text in the replacement

Within the pattern, portions should be enclosed between '\(' and '\)'

In replacement, the special sequences '\1', '\2', etc. can be used to refer to the matched portions

Matching and Reusing Portions of a Pattern in sed – Examples

Remove the first name from each line:

head -2 my_phones.txt | sed 's/ [[:upper:]][[:lower:]]* / /'

ADAMS, 7583
BARRETT, 6466

Replace first name with initial:

head -2 my_phones.txt | sed 's/ \([[:upper:]]\)[[:lower:]]* / \1. /'

ADAMS, A. 7583
BARRETT, B. 6466

Matching and Reusing Portions of a Pattern in sed – Examples

Switch between first and last names:

head -2 my_phones.txt | sed 's/\(.*\), \(.*\) /\2 \1 /'

Andrew ADAMS 7583
Bruce BARRETT 6466

Switch names and parenthesize number:

head -2 my_phones.txt | sed 's/\(.*\), \(.*\) \(.*\)/\2 \1: (03-555\3)/'

Andrew ADAMS: (03-5557583)
Bruce BARRETT: (03-5556466)