Grep Awk Sed

download Grep Awk Sed

of 9

Transcript of Grep Awk Sed

  • 8/10/2019 Grep Awk Sed

    1/9

    grep, awk and sed

    three VERY useful command-line utilities

    Matt Probert, Uni of York

    grep = global regular expression printIn the simplest terms, grep (global regular expression print) will search input files for a searchstring, and print the lines that match it. eginning at the first line in the file, grep copies a line into abuffer, compares it against the search string, and if the comparison passes, prints the line to thescreen. !rep will repeat this process until the file runs out of lines. "otice that nowhere in thisprocess does grep store lines, change lines, or search onl# a part of a line.

    Example data file

    Please cut $ paste the following data and sa%e to a file called &a'file&

    bootbookboozemachinebootsbungiebark

    aardvarkbroken$tuffrobots

    A imple Example

    he simplest possible example of grep is simpl#

    grep "boo" a_file

    In this example, grep would loop through e%er# line of the file *a'file* and print out e%er# line that

    contains the word &boo&bootbookboozeboots

    !seful "ptions

    his is nice, but if #ou were working with a large fortran file of something similar, it wouldprobabl# be much more useful to #ou if the lines identified which line in the file the# were, whatwa# #ou could track down a particular string more easil#, if #ou needed to open the file in an editor

    to make some changes. his can be accomplished b# adding the +n parameter

    grep -n "boo" a_file

  • 8/10/2019 Grep Awk Sed

    2/9

    his #ields a much more useful result, which explains which lines matched the search string

    1:boot2:book3:booze5:boots

    nother interesting switch is +%, which will print the negati%e result. In other words, grep will printall of the lines that do not match the search string, rather than printing the lines that match it. In thefollowing case, grep will print e%er# line that does not contain the string *boo,* and will displa# theline numbers, as in the last example

    grep -vn "boo" a_file

    In this particular case, it will print

    :machine!:bungie:bark#:aaradvark

    :robotshe +c option tells grep to supress the printing of matching lines, and onl# displa# the number oflines that match the -uer#. or instance, the following will print the number /, because there are /occurences of *boo* in a'file.

    grep -c "boo" a_file

    he +l option prints onl# the filenames of files in the -uer# that ha%e lines that match the searchstring. his is useful if #ou are searching through multiple files for the same string. like so

    grep -l "boo" %

    n option more useful for searching through non+code files is +i, ignore case. his option will treatupper and lower case as e-ui%alent while matching the search string. In the following example, thelines containg *boo* will be printed out, e%en though the search string is uppercase.

    grep -i "&''" a_file

    he +x option looks for e0act matches onl#. In other words, the following command will printnothing, because there are no lines that onl# contain the pattern *boo*

    grep -( "boo" a_file

    inall#, + allows #ou to specif# additional lines of context file, so #ou get the search string plus anumber of additional lines, e.g.

    grep -)2 *mach+ a_filemachinebootsbungie

    Regular Expressions

    regular expression is a compact wa# of describing complex patterns in text. 1ith grep, #ou canuse them to search for patterns. 2ther tools let #ou use regular expressions (3regexps4) to modif#the text in complex wa#s. he normal strings we ha%e been using so far are in fact 5ust %er# simple

    regular expressions. You ma# also come across them if #ou use wildcards such as &6& or &7& whenlisting filenames etc. You ma# use grep to search using basic regexps such as to search the file forlines ending with the letter e

  • 8/10/2019 Grep Awk Sed

    3/9

    grep "e$" a_file

    his will, of course, print

    boozemachinebungie

    If #ou want a wider range of regular expression commands then #ou must use &grep +8& (also knownas the egrep command). or instance, the regexp command 7 will match 9 or : occurences of thepre%ious character

    grep -, "boots" a_file

    his -uer# will return

    bootboots

    You can also combine multiple searches using the pipe (;) which means &or& so can do things like

    grep -, "boot.boots" a_filebootboots

    pecial characters

    1hat if the thing #ou want to search for is a special character7 If #ou wanted to find all linescontaining the dollar character &

  • 8/10/2019 Grep Awk Sed

    4/9

    A$%

    text pattern scanning and processing language, created b# ho, 1einberger $ @ernighan (hencethe name). It can be -uite sophisticated so this is "2 a complete guide, but should gi%e #ou a taste

    of what awk can do. It can be %er# simple to use, and is strongl# recommended. here are man# on+line tutorials of %ar#ing complexit#, and of course, there is alwa#s &man awk&.

    A$% basics

    n awk program operates on each line of an input file. It can ha%e an optional 8!I"AB section ofcommands that are done before processing an# content of the file, then the main AB section workson each line of the file, and finall# there is an optional 8"CAB section of actions that happen afterthe file reading has finished

    &, 4 6 initialization a7k commands 84 6 a7k commands for each line of the file8,9 4 6 finalization a7k commands 8

    or each line of the input file, it sees if there are an# pattern+matching instructions, in which case itonl# operates on lines that match that pattern, otherwise it operates on all lines. hese &pattern+matching& commands can contain regular expressions as for grep. he awk commands can do some-uite sophisticated maths and string manipulations, and awk also supports associati%e arra#s.

    1@ sees each line as being made up of a number of fields, each being separated b# a &field

    separator&. # default, this is one or more space characters, so the line

    this is a line of te(t

    contains D fields. 1ithin awk, the first field is referred to as

  • 8/10/2019 Grep Awk Sed

    5/9

    -r7------- 1 mi;p1 mi;p1 3!2#! 'ct 21 12:@5 uniform_rand_period6agr

    I can see the file siHe is reported as the thcolumn of data. >o if I wanted to know the total siHe of allthe files in this director# I could do

    mi;p1andomumbers?$ ls -l . a7k /&, 4sumA@8 4sumAsumB$58 ,9

    4print sum8/2!!#2!

    "ote that &print sum& prints the %alue of the %ariable sum, so if sumFE then &print sum& gi%es theoutput &E& whereas &print

  • 8/10/2019 Grep Awk Sed

    6/9

    A$% input(output statements include'

    closeCfile G ho7?F Hlose fileG pipe or co-process6

    getline Iet $@ from ne(t input record6

    getline Efile Iet $@ from ne(t record of file6

    getline var Iet var from ne(t input record6

    getline var Efile Iet var from ne(t record of file6

    ne(t Itop processing the current input record6 Jhe ne(tinput record is read and processing starts over7ith the first pattern in the )KL program6 f theend of the input data is reachedG the ,9 blockCsFGif an=G are e(ecuted6

    ne(tfile Itop processing the current input file6 f the endof the input data is reachedG the ,9 blockCsFG if

    an=G are e(ecuted6

    print Mrints the current record6

    print e(pr-list Mrints e(pressions6

    print e(pr-listNfile

    Mrints e(pressions on file6

    printf fmtG e(pr-list

    Oormat and print6

    " he printf command lets #ou specif# the output format more closel#, using a +like s#ntax, for

    example, #ou can specif# an integer of gi%en width, or a floating point number or a string, etc.

    A$% numeric functions include'

    atan2C=G (F Geturns the arctangent of #?x in radians.

    cosCe(prF Geturns the cosine of expr, which is in radians.

    e(pCe(prF he exponential function.

    intCe(prF runcates to integer.

    logCe(prF he natural logarithm function.

    >andCF Geturns a random number ", between : and 9, such that : NF " N 9.

    sinCe(prF Geturns the sine of expr, which is in radians.

    sPrtCe(prF he s-uare root function.

    srandCe(pr?F Uses expr as a new seed for the random number generator. If no expr ispro%ided, the time of da# is used.

  • 8/10/2019 Grep Awk Sed

    7/9

    A$% string functions include'

    gsubCrG s G t?F or each substring matching the regular expression r in the string t,substitute the string s, and return the number of substitutions. If t is not

    supplied, use is used instead.

    sprintfCfmtG e(pr-

    listF

    Prints expr+list according to fmt, and returns the resulting string.

    strtonumCstrF 8xamines str, and returns its numeric %alue.

    subCrG s G t?F Oust like gsub(), but onl# the first matching substring is replaced.

    substrCsG i G n?F Geturns the at most n+character substring of s starting at i. If n isomitted, the rest of s is used.

    tolo7erCstrF Geturns a cop# of the string str, with all the upper+case characters in strtranslated to their corresponding lower+case counterparts. "on+alphabetic characters are left unchanged.

    toupperCstrF Geturns a cop# of the string str, with all the lower+case characters in str

    translated to their corresponding upper+case counterparts. "on+alphabetic characters are left unchanged.

    A$% command-line and usage

    You can pass %ariables into an awk program using the &+%& flag as man# times as necessar#, e.g.

    a7k -v skipA3 /4for CiA1DiEskipDiBBF 4getline8D print $@8/ a_file

    You can also write an awk program using an editor, and then sa%e it as a special scripting file, e.g.

    mi;p1

  • 8/10/2019 Grep Awk Sed

    8/9

  • 8/10/2019 Grep Awk Sed

    9/9

    "ther E) commands

    he general form is

    sed -e /TpatternT command/ m=_file

    where &pattern& is a regexp and &command& can be one of &s& F search $ replace, or &p& F print, or &d& F

    delete, or &i&Finsert, or &a&Fappend, etc. "ote that the default action is to print all lines that do notmatch an#wa#, so if #ou want to suppress this #ou need to in%oke sed with the &+n& flag and then #oucan use the &p& command to control what is printed. >o if #ou want to do a listing of all the sub+directories #ou could use

    ls -l . sed -n -e /TVdT p/

    as the long+listing starts each line with the &d& s#mbol if it is a director#, so this will onl# print outthose lines that start with a &d& s#mbol.

    >imilarl#, if #ou wanted to delete all lines that start with the comment s#mbol &S& #ou could use

    sed -e /TVRT d/ m=_file

    i.e. #ou can achie%e the same effect in different wa#s=

    You can also use the range form

    sed -e /1G1@@ command/ m=_file

    to execute &command& on lines 9,9::. You can also use the special line number &o if #ou wanted to delete all but the first 9: lines of a file, #ou could use

    sed -e /11G$ d/ m=_file

    You can also use a pattern+range form, where the first regexp defines the start of the range, and thesecond the stop.

    *urther Reading

    here is much more #ou can do with sed K see http??www.gr#moire.com?Unix?>ed.htmlfor a nicetutorial.

    T9.E Matt Probert E:?99?E:9/

    http://www.grymoire.com/Unix/Sed.htmlhttp://www.grymoire.com/Unix/Sed.html