Lecture 5 sed and awk. Last week Regular Expressions –grep (BRE) –egrep (ERE) Sed - Part I.
Grep Awk Sed
-
Upload
vishnuselva -
Category
Documents
-
view
217 -
download
0
Transcript of Grep Awk Sed
-
8/10/2019 Grep Awk Sed
1/9
grep, awk and sed
three VERY useful command-line utilities
Matt Probert, Uni of York
grep = global regular expression printIn the simplest terms, grep (global regular expression print) will search input files for a searchstring, and print the lines that match it. eginning at the first line in the file, grep copies a line into abuffer, compares it against the search string, and if the comparison passes, prints the line to thescreen. !rep will repeat this process until the file runs out of lines. "otice that nowhere in thisprocess does grep store lines, change lines, or search onl# a part of a line.
Example data file
Please cut $ paste the following data and sa%e to a file called &a'file&
bootbookboozemachinebootsbungiebark
aardvarkbroken$tuffrobots
A imple Example
he simplest possible example of grep is simpl#
grep "boo" a_file
In this example, grep would loop through e%er# line of the file *a'file* and print out e%er# line that
contains the word &boo&bootbookboozeboots
!seful "ptions
his is nice, but if #ou were working with a large fortran file of something similar, it wouldprobabl# be much more useful to #ou if the lines identified which line in the file the# were, whatwa# #ou could track down a particular string more easil#, if #ou needed to open the file in an editor
to make some changes. his can be accomplished b# adding the +n parameter
grep -n "boo" a_file
-
8/10/2019 Grep Awk Sed
2/9
his #ields a much more useful result, which explains which lines matched the search string
1:boot2:book3:booze5:boots
nother interesting switch is +%, which will print the negati%e result. In other words, grep will printall of the lines that do not match the search string, rather than printing the lines that match it. In thefollowing case, grep will print e%er# line that does not contain the string *boo,* and will displa# theline numbers, as in the last example
grep -vn "boo" a_file
In this particular case, it will print
:machine!:bungie:bark#:aaradvark
:robotshe +c option tells grep to supress the printing of matching lines, and onl# displa# the number oflines that match the -uer#. or instance, the following will print the number /, because there are /occurences of *boo* in a'file.
grep -c "boo" a_file
he +l option prints onl# the filenames of files in the -uer# that ha%e lines that match the searchstring. his is useful if #ou are searching through multiple files for the same string. like so
grep -l "boo" %
n option more useful for searching through non+code files is +i, ignore case. his option will treatupper and lower case as e-ui%alent while matching the search string. In the following example, thelines containg *boo* will be printed out, e%en though the search string is uppercase.
grep -i "&''" a_file
he +x option looks for e0act matches onl#. In other words, the following command will printnothing, because there are no lines that onl# contain the pattern *boo*
grep -( "boo" a_file
inall#, + allows #ou to specif# additional lines of context file, so #ou get the search string plus anumber of additional lines, e.g.
grep -)2 *mach+ a_filemachinebootsbungie
Regular Expressions
regular expression is a compact wa# of describing complex patterns in text. 1ith grep, #ou canuse them to search for patterns. 2ther tools let #ou use regular expressions (3regexps4) to modif#the text in complex wa#s. he normal strings we ha%e been using so far are in fact 5ust %er# simple
regular expressions. You ma# also come across them if #ou use wildcards such as &6& or &7& whenlisting filenames etc. You ma# use grep to search using basic regexps such as to search the file forlines ending with the letter e
-
8/10/2019 Grep Awk Sed
3/9
grep "e$" a_file
his will, of course, print
boozemachinebungie
If #ou want a wider range of regular expression commands then #ou must use &grep +8& (also knownas the egrep command). or instance, the regexp command 7 will match 9 or : occurences of thepre%ious character
grep -, "boots" a_file
his -uer# will return
bootboots
You can also combine multiple searches using the pipe (;) which means &or& so can do things like
grep -, "boot.boots" a_filebootboots
pecial characters
1hat if the thing #ou want to search for is a special character7 If #ou wanted to find all linescontaining the dollar character &
-
8/10/2019 Grep Awk Sed
4/9
A$%
text pattern scanning and processing language, created b# ho, 1einberger $ @ernighan (hencethe name). It can be -uite sophisticated so this is "2 a complete guide, but should gi%e #ou a taste
of what awk can do. It can be %er# simple to use, and is strongl# recommended. here are man# on+line tutorials of %ar#ing complexit#, and of course, there is alwa#s &man awk&.
A$% basics
n awk program operates on each line of an input file. It can ha%e an optional 8!I"AB section ofcommands that are done before processing an# content of the file, then the main AB section workson each line of the file, and finall# there is an optional 8"CAB section of actions that happen afterthe file reading has finished
&, 4 6 initialization a7k commands 84 6 a7k commands for each line of the file8,9 4 6 finalization a7k commands 8
or each line of the input file, it sees if there are an# pattern+matching instructions, in which case itonl# operates on lines that match that pattern, otherwise it operates on all lines. hese &pattern+matching& commands can contain regular expressions as for grep. he awk commands can do some-uite sophisticated maths and string manipulations, and awk also supports associati%e arra#s.
1@ sees each line as being made up of a number of fields, each being separated b# a &field
separator&. # default, this is one or more space characters, so the line
this is a line of te(t
contains D fields. 1ithin awk, the first field is referred to as
-
8/10/2019 Grep Awk Sed
5/9
-r7------- 1 mi;p1 mi;p1 3!2#! 'ct 21 12:@5 uniform_rand_period6agr
I can see the file siHe is reported as the thcolumn of data. >o if I wanted to know the total siHe of allthe files in this director# I could do
mi;p1andomumbers?$ ls -l . a7k /&, 4sumA@8 4sumAsumB$58 ,9
4print sum8/2!!#2!
"ote that &print sum& prints the %alue of the %ariable sum, so if sumFE then &print sum& gi%es theoutput &E& whereas &print
-
8/10/2019 Grep Awk Sed
6/9
A$% input(output statements include'
closeCfile G ho7?F Hlose fileG pipe or co-process6
getline Iet $@ from ne(t input record6
getline Efile Iet $@ from ne(t record of file6
getline var Iet var from ne(t input record6
getline var Efile Iet var from ne(t record of file6
ne(t Itop processing the current input record6 Jhe ne(tinput record is read and processing starts over7ith the first pattern in the )KL program6 f theend of the input data is reachedG the ,9 blockCsFGif an=G are e(ecuted6
ne(tfile Itop processing the current input file6 f the endof the input data is reachedG the ,9 blockCsFG if
an=G are e(ecuted6
print Mrints the current record6
print e(pr-list Mrints e(pressions6
print e(pr-listNfile
Mrints e(pressions on file6
printf fmtG e(pr-list
Oormat and print6
" he printf command lets #ou specif# the output format more closel#, using a +like s#ntax, for
example, #ou can specif# an integer of gi%en width, or a floating point number or a string, etc.
A$% numeric functions include'
atan2C=G (F Geturns the arctangent of #?x in radians.
cosCe(prF Geturns the cosine of expr, which is in radians.
e(pCe(prF he exponential function.
intCe(prF runcates to integer.
logCe(prF he natural logarithm function.
>andCF Geturns a random number ", between : and 9, such that : NF " N 9.
sinCe(prF Geturns the sine of expr, which is in radians.
sPrtCe(prF he s-uare root function.
srandCe(pr?F Uses expr as a new seed for the random number generator. If no expr ispro%ided, the time of da# is used.
-
8/10/2019 Grep Awk Sed
7/9
A$% string functions include'
gsubCrG s G t?F or each substring matching the regular expression r in the string t,substitute the string s, and return the number of substitutions. If t is not
supplied, use is used instead.
sprintfCfmtG e(pr-
listF
Prints expr+list according to fmt, and returns the resulting string.
strtonumCstrF 8xamines str, and returns its numeric %alue.
subCrG s G t?F Oust like gsub(), but onl# the first matching substring is replaced.
substrCsG i G n?F Geturns the at most n+character substring of s starting at i. If n isomitted, the rest of s is used.
tolo7erCstrF Geturns a cop# of the string str, with all the upper+case characters in strtranslated to their corresponding lower+case counterparts. "on+alphabetic characters are left unchanged.
toupperCstrF Geturns a cop# of the string str, with all the lower+case characters in str
translated to their corresponding upper+case counterparts. "on+alphabetic characters are left unchanged.
A$% command-line and usage
You can pass %ariables into an awk program using the &+%& flag as man# times as necessar#, e.g.
a7k -v skipA3 /4for CiA1DiEskipDiBBF 4getline8D print $@8/ a_file
You can also write an awk program using an editor, and then sa%e it as a special scripting file, e.g.
mi;p1
-
8/10/2019 Grep Awk Sed
8/9
-
8/10/2019 Grep Awk Sed
9/9
"ther E) commands
he general form is
sed -e /TpatternT command/ m=_file
where &pattern& is a regexp and &command& can be one of &s& F search $ replace, or &p& F print, or &d& F
delete, or &i&Finsert, or &a&Fappend, etc. "ote that the default action is to print all lines that do notmatch an#wa#, so if #ou want to suppress this #ou need to in%oke sed with the &+n& flag and then #oucan use the &p& command to control what is printed. >o if #ou want to do a listing of all the sub+directories #ou could use
ls -l . sed -n -e /TVdT p/
as the long+listing starts each line with the &d& s#mbol if it is a director#, so this will onl# print outthose lines that start with a &d& s#mbol.
>imilarl#, if #ou wanted to delete all lines that start with the comment s#mbol &S& #ou could use
sed -e /TVRT d/ m=_file
i.e. #ou can achie%e the same effect in different wa#s=
You can also use the range form
sed -e /1G1@@ command/ m=_file
to execute &command& on lines 9,9::. You can also use the special line number &o if #ou wanted to delete all but the first 9: lines of a file, #ou could use
sed -e /11G$ d/ m=_file
You can also use a pattern+range form, where the first regexp defines the start of the range, and thesecond the stop.
*urther Reading
here is much more #ou can do with sed K see http??www.gr#moire.com?Unix?>ed.htmlfor a nicetutorial.
T9.E Matt Probert E:?99?E:9/
http://www.grymoire.com/Unix/Sed.htmlhttp://www.grymoire.com/Unix/Sed.html