Time to talk about your class projects! Shell Scripting: Awk (lecture 2)
Basic structure of AWK use
The essential organization of an AWK program follows the form:
pattern { action }
The pattern specifies when the action is performed.
Like most UNIX utilities, AWK is line oriented.
That is, the pattern specifies a test that is performed with each line read as input.
If the condition is true, then the action is taken.
The default pattern is something that matches every line.
This is the blank or null pattern.
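A minimal sketch of the forms described above, using made-up three-field lines (name, course, score) and an arbitrary threshold of 50. The first awk program has both a pattern and an action. The second has only an action, so the null pattern applies it to every line. The third has only a pattern, in which case awk's default action is to print the whole line. The shell function names are hypothetical wrappers for running the demos.

```shell
#!/bin/sh
# Hypothetical sample data: name, course, score (space separated).
sample_data() {
    printf '%s\n' 'ann math 72' 'bob math 45' 'cy art 88'
}

# pattern { action }: the action runs only on lines where $3 > 50 is true.
pattern_demo() {
    sample_data | awk '$3 > 50 { print $1 }'
}

# Null (blank) pattern: the action runs on every input line.
null_pattern_demo() {
    sample_data | awk '{ print $1 }'
}

# Pattern with no action: the default action prints the whole line.
default_action_demo() {
    sample_data | awk '$3 > 50'
}

pattern_demo
null_pattern_demo
default_action_demo
```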
Program syntax
BEGIN { } : the BEGIN block holds changes to built-in variables (such as FS) and anything else you want done before awk starts processing input lines
{ } : the list of procedures carried out on every input line
END { } : the END block holds final calculations and printed summaries
As you might expect, BEGIN and END specify actions to be taken before any lines are read and after the last line is read.
The AWK program:
BEGIN { print "START" }
{ print }
END { print "STOP" }
copies the input file to the output with one line added before it and one line added after it.
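Running the START/STOP program on a hypothetical two-line input shows where the BEGIN and END output lands. The second function is an extra illustration, not from the lecture, of an END summary using the built-in variable NR (the number of records read).

```shell
#!/bin/sh
# The START/STOP program from the text, run on a hypothetical two-line input.
begin_end_demo() {
    printf 'first line\nsecond line\n' |
        awk 'BEGIN { print "START" }
             { print }
             END { print "STOP" }'
}

# END is also where summaries go: NR still holds the number of lines read.
count_demo() {
    printf 'first line\nsecond line\n' | awk 'END { print NR " lines read" }'
}

begin_end_demo
count_demo
```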
Example:
#!/usr/bin/nawk -f
BEGIN {
    FS=":" #the -F of the command line becomes FS in a script
}
{ print $1 }
END {
    print "Finished working on this file"
}
%chmod 755 example.awk
%./example.awk /etc/passwd | tail
noaccess
nobody4
Finished working on this file
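Because the script only sets FS in its BEGIN block, passing -F: on the command line should give the same result. This sketch compares the two on two made-up passwd-style lines rather than the real /etc/passwd, whose contents differ from system to system; the function names are hypothetical.

```shell
#!/bin/sh
# Two hypothetical /etc/passwd-style lines (fields separated by ':').
passwd_sample() {
    printf '%s\n' 'root:x:0:0:root:/root:/bin/sh' \
                  'nobody:x:65534:65534:nobody:/:/usr/sbin/nologin'
}

# Separator given on the command line with -F ...
with_flag() {
    passwd_sample | awk -F: '{ print $1 }'
}

# ... or set as FS in a BEGIN block, as in example.awk.
with_begin() {
    passwd_sample | awk 'BEGIN { FS=":" } { print $1 }'
}

with_flag
with_begin
```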
Input file:
Jimmy the Weasel
100 Pleasant Drive
San Francisco, CA 12345

Big Tony
200 Incognito Ave.
Suburbia, WA 67890

Cousin Vinnie
Vinnie's Auto Shop
300 City Alley
Sosueme, OR 76543
Awk script:
#!/usr/bin/awk -f
BEGIN {
FS="\n"
RS=""
ORS=""
}
{
x=1
while ( x<NF ) {
print $x "\t"
x++
}
print $NF "\n"
}
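To see what the script does, this sketch writes the address file (assuming the three records are separated by blank lines, which RS="" requires) and the script to a temporary directory and runs it. Each multi-line record should come out as a single line with its fields separated by tabs. The function name join_records is hypothetical.

```shell
#!/bin/sh
# Write the address file and the script from the text to a temporary
# directory, run the script on the file, then clean up.
join_records() {
    dir=$(mktemp -d)
    cat > "$dir/address.txt" <<'EOF'
Jimmy the Weasel
100 Pleasant Drive
San Francisco, CA 12345

Big Tony
200 Incognito Ave.
Suburbia, WA 67890

Cousin Vinnie
Vinnie's Auto Shop
300 City Alley
Sosueme, OR 76543
EOF
    cat > "$dir/address.awk" <<'EOF'
BEGIN {
    FS="\n"
    RS=""
    ORS=""
}
{
    x=1
    while ( x<NF ) {
        print $x "\t"
        x++
    }
    print $NF "\n"
}
EOF
    awk -f "$dir/address.awk" "$dir/address.txt"
    rm -r "$dir"
}

join_records
```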
Looping constructs
awk loop syntax is very similar to that of C and Perl
while: continues to loop as long as the condition is true
while ( x==y ) {
commands
}
do/while
do the following set of commands while the condition is true
do {
commands
} while ( x==y )
The difference between while and do/while is when the condition is tested. A while loop tests it before running the commands, so the commands may never run; a do/while loop tests it after the commands have run, so they always run at least once.
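A small sketch of that difference: both loops below start with x=5 and the condition x < 3, which is already false. The values and function names are arbitrary; the point is that the while body is skipped entirely while the do/while body runs once.

```shell
#!/bin/sh
# Condition is false from the start: the while body never runs.
while_demo() {
    awk 'BEGIN {
        x = 5
        while ( x < 3 ) {   # tested first
            print "while body ran"
            x++
        }
        print "while done, x =", x
    }'
}

# Same condition: the do/while body runs once before the test.
do_while_demo() {
    awk 'BEGIN {
        x = 5
        do {
            print "do/while body ran"
            x++
        } while ( x < 3 )   # tested after the body
        print "do/while done, x =", x
    }'
}

while_demo
do_while_demo
```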
for loops
one of the most common loop structures is the for loop, which steps a counter through a range of values, such as the fields of the current line
for ( x=1; x<=NF; x++) { #in awk, fields are numbered from 1 ($1 to $NF)
commands
}
* if you take anything away from this lecture, memorize the above for loop syntax
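As one possible use of that loop, this sketch adds up every field on each line of some made-up numeric input. NF tells the loop where to stop, and NR (the current line number) is used only to label the output; sum_fields is a hypothetical name.

```shell
#!/bin/sh
# Sum every field on each line with the for loop from the text.
sum_fields() {
    printf '1 2 3\n10 20\n' | awk '{
        total = 0
        for ( x=1; x<=NF; x++ ) {   # fields run from $1 to $NF
            total += $x
        }
        print "line", NR, "sum", total
    }'
}

sum_fields
```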
break and continue
break: exits the loop immediately
continue: skips the rest of the current pass and goes back to the top of the loop (a while loop tests its condition again)
x=1
while (1) {
if ( x == 4 ) {
x++
continue
}
print "iteration",x
if ( x > 20 ) {
break
}
x++
}
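The fragment above has no pattern, so one way to try it is to wrap it in a BEGIN block, which runs without reading any input. The expected result is iteration 1 through iteration 21 with iteration 4 skipped by continue, 20 lines in all. The wrapper name is hypothetical.

```shell
#!/bin/sh
# The break/continue fragment from the text inside a BEGIN block.
break_continue_demo() {
    awk 'BEGIN {
        x=1
        while (1) {
            if ( x == 4 ) {
                x++
                continue      # skip the print for x == 4
            }
            print "iteration",x
            if ( x > 20 ) {
                break         # leave the loop after printing 21
            }
            x++
        }
    }'
}

break_continue_demo
```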
if/else/else if
if statements work much like they did in bash, but the syntax is a bit different (no then or fi)
if ( conditional1 ) {
commands
} else if ( conditional2 ) { #optional
commands
} else { #optional
commands
}
you can have an if statement without an else if or else, but you can't have an else if or else without an if
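A hedged example of the full if / else if / else chain, classifying made-up scores; the cut-offs 90 and 60, the labels, and the function name are arbitrary choices for illustration.

```shell
#!/bin/sh
# Classify hypothetical scores with if / else if / else.
grade() {
    printf '%s\n' 95 72 40 | awk '{
        if ( $1 >= 90 ) {
            print $1, "A"
        } else if ( $1 >= 60 ) {   # optional branch
            print $1, "pass"
        } else {                   # optional branch
            print $1, "fail"
        }
    }'
}

grade
```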
Arrays
numeric array indices in awk conventionally start at 1 (split(), for example, fills elements 1, 2, 3, ...); in most computer programming languages, except Fortran and MATLAB, arrays start at 0
mis-indexing arrays is one of the most common bugs in any code
arrays are commonly indexed by numbers, but in awk, they can be indexed by strings
to explicitly set an array element, use brackets to specify which index of the array you are setting
myarray[1]="jim" #note, strings appear in quotes
myarray[2]=456
or
myarray["name"]="jim" #index strings appear in quotes too
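to reference an array element, use brackets to specify which index you want, as in myarray["name"]; to visit every element, loop over the indices with in
for ( x in myarray ) {
print myarray[x]
}
#x is set to each index in turn by the in operator, but the order in which the indices are visited is unspecified (not necessarily the order in which they were added)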
to delete an array element, use the delete command
delete myarray[1]
to test if an element exists, use an if statement with the in operator
if ( 1 in myarray ) {
print "It's there"
} else {
print "It's missing"
}
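A short sketch, using a few made-up words, of a string-indexed array used as a counter, followed by the in test and delete. It deliberately avoids printing with for ( x in ... ) so that the output does not depend on the unspecified visiting order; array_demo is a hypothetical name.

```shell
#!/bin/sh
# Count words with a string-indexed array, then test for and delete elements.
array_demo() {
    printf 'jim\nann\njim\n' | awk '
    { count[$1]++ }                  # index by string; new elements start at 0
    END {
        if ( "jim" in count ) print "jim seen", count["jim"], "times"
        delete count["jim"]
        if ( "jim" in count ) {
            print "jim still there"
        } else {
            print "jim deleted"
        }
        if ( "ann" in count ) print "ann seen", count["ann"], "times"
    }'
}

array_demo
```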
you can also set arrays using the split command
split("string",destinationarray,separator)
split returns the number of elements it created
numelements=split("Jan,Feb,Mar,Apr,May",mymonths,",")
so that numelements=5 and mymonths[1]="Jan"
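Running the split example from the text and then walking the result with a numeric for loop, which, unlike for ( x in mymonths ), visits the elements in index order. The wrapper name split_demo is hypothetical.

```shell
#!/bin/sh
# split() fills mymonths[1..n] and returns n.
split_demo() {
    awk 'BEGIN {
        numelements=split("Jan,Feb,Mar,Apr,May",mymonths,",")
        print "numelements =", numelements
        for ( x=1; x<=numelements; x++ ) {   # numeric loop keeps the order
            printf("%d:%s\n", x, mymonths[x])
        }
    }'
}

split_demo
```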
Formatted output
printf : the formatted print function, which uses the standard C format syntax
%s specifies strings
%d specifies integers
%f specifies floating point values
printf("%s %s version %d\n", "Hello", "world", 2)
Hello world version 2
you can control how many spaces are reserved for each formatted item by adding numbers after the %
%10s - reserves 10 characters for a string
%5d - reserves 5 spaces for the integer
%10.2f - reserves 10 spaces for the float and prints it to two decimal places (hundredths), so 9.05 is printed as six spaces followed by 9.05
the default format is right justified. To make formatted text left justified, add a - after the %
%-10.2f prints 9.05 followed by six spaces
sprintf sends formatted print to a string variable rather than to stdout
n=sprintf("%d plus %d is %d", a, b, a+b);
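This sketch prints the width examples between brackets so that the padding spaces become visible, then stores a sprintf result in a variable. The values a=2 and b=3 and the function names are arbitrary.

```shell
#!/bin/sh
# Brackets make the padding visible.
printf_demo() {
    awk 'BEGIN {
        printf("[%10.2f]\n", 9.05)    # right justified in 10 columns
        printf("[%-10.2f]\n", 9.05)   # left justified in 10 columns
        printf("[%5d]\n", 42)
        printf("[%10s]\n", "awk")
    }'
}

# sprintf stores the formatted text instead of printing it.
sprintf_demo() {
    awk 'BEGIN {
        a = 2; b = 3
        n = sprintf("%d plus %d is %d", a, b, a+b)
        print n
    }'
}

printf_demo
sprintf_demo
```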
Sub-strings
substr : extracts specific characters from a string
a similar function is also available in Perl
substr(string,startcharacter,numberofcharacters)
oldstring="How are you?"
newstr=substr(oldstring,9,3)
What is newstr in this example?
Other string functions
length : returns the number of characters in a string
length(oldstring) returns 12
index : returns the position at which one string first occurs in another (0 if it does not occur)
index(oldstring,"you") returns 9
tolower/toupper : converts a string to all lower or all upper case
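Applying these functions to oldstring from the text is a quick way to check the results quoted above; the extra search for "xyz" is only an illustration of the not-found case, and string_demo is a hypothetical name.

```shell
#!/bin/sh
string_demo() {
    awk 'BEGIN {
        oldstring = "How are you?"
        print substr(oldstring,9,3)    # 3 characters starting at position 9
        print length(oldstring)
        print index(oldstring,"you")
        print index(oldstring,"xyz")   # 0 when not found
        print toupper(oldstring)
        print tolower(oldstring)
    }'
}

string_demo
```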
subroutines (aka functions)
• Format -- "function", then the name, and then the parameters separated by commas, inside parentheses.
• "{ }" code block contains the code that you'd like this function to execute.
function monthdigit(mymonth) {
    return (index(months,mymonth)+3)/4
}
nawk provides a "return" statement that allows the function to return a value.
This function converts a month name in a 3-letter string format into its numeric equivalent. For example, this:
print monthdigit("Mar")
...will print this: 3
What does this do?
index(months,mymonth)
The built-in string function index returns the starting position of the first occurrence of a substring (the second parameter) in another string (the first parameter), or 0 if the substring isn't found.
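months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
(character positions within the string, tens digit above units digit:)
Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
000000000111111111122222222223333333333444444444
123456789012345678901234567890123456789012345678
print index(months,"Aug")
29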
To get the number associated with the month (based on the string with the 12 months), add 3 to the index (29+3=32) and divide by 4 (32/4=8; Aug is the 8th month).
The string months was designed so that this calculation gives the month number: each month name starts 4 characters after the previous one.
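The monthdigit function can be tried on its own by setting months in a BEGIN block. The last call, with the made-up name "Foo", shows a limitation of the trick: index returns 0, so the function returns 0.75 rather than signalling an error. The wrapper name is hypothetical.

```shell
#!/bin/sh
monthdigit_demo() {
    awk '
    function monthdigit(mymonth) {
        return (index(months,mymonth)+3)/4
    }
    BEGIN {
        months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
        print index(months,"Aug")
        print monthdigit("Mar")
        print monthdigit("Aug")
        print monthdigit("Dec")
        print monthdigit("Foo")    # index is 0, so (0+3)/4 = 0.75
    }'
}

monthdigit_demo
```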
Matching Regular Expressions
match : searches a string for a regular expression and sets the built-in variables RSTART to the start character and RLENGTH to the length of the matched string
match returns the start character (0 if there is no match, in which case RLENGTH is set to -1)
start=match(oldstring,/you/) #note, regexp format
print start, RSTART, RLENGTH
9 9 3
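A sketch of match on the same oldstring, including a search that fails and a second, arbitrarily chosen pattern whose RSTART and RLENGTH are passed to substr to pull out the matched text. The function name is hypothetical.

```shell
#!/bin/sh
match_demo() {
    awk 'BEGIN {
        oldstring = "How are you?"
        start = match(oldstring, /you/)
        print start, RSTART, RLENGTH      # commas put spaces between values
        start = match(oldstring, /xyz/)
        print start, RSTART, RLENGTH      # no match
        if ( match(oldstring, /[a-z]+e/) )
            print substr(oldstring, RSTART, RLENGTH)
    }'
}

match_demo
```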
String substitution
sub and gsub : serve as single search and replace (first match only) or global search and replace (every match) functions that work with regular expressions
sub(regexp,replacestring,oldstring)
oldstring="How are you doing today?"
sub(/o/,"O",oldstring) #this changes the given string
print oldstring
HOw are you doing today?
oldstring="How are you doing today?"
gsub(/o/,"O",oldstring)
print oldstring
HOw are yOu dOing tOday?
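sub and gsub also return the number of replacements they made, and an & in the replacement string stands for the matched text. This sketch prints the counts alongside the results; the "cat hat" string and the function name are made up for illustration.

```shell
#!/bin/sh
subst_demo() {
    awk 'BEGIN {
        s = "How are you doing today?"
        n = sub(/o/, "O", s)      # first match only
        print n, s
        s = "How are you doing today?"
        n = gsub(/o/, "O", s)     # every match
        print n, s
        t = "cat hat"
        gsub(/at/, "[&]", t)      # & stands for the matched text
        print t
    }'
}

subst_demo
```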
Input file:
23 Aug 2000	food	-	-	Y	Jimmy's Buffet	30.25
23 Aug 2000	-	inco	-	Y	Boss Man	2001.00
Note, there are tabs between the fields, which you can't really see with this screen copy
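Before reading the full script, it may help to check how one line is split. This stripped-down sketch rebuilds the two sample lines with printf (tabs written as \t) and prints the pieces the main program uses. The field meanings (month from the date in $1, expense category in $2, income category in $3, amount in $7) are inferred from how the script uses them, not stated in the lecture, and parse_demo is a hypothetical name.

```shell
#!/bin/sh
# Split the two sample lines the way the main program does.
parse_demo() {
    printf "23 Aug 2000\tfood\t-\t-\tY\tJimmy's Buffet\t30.25\n23 Aug 2000\t-\tinco\t-\tY\tBoss Man\t2001.00\n" |
    awk '
    function monthdigit(mymonth) {
        return (index(months,mymonth)+3)/4
    }
    BEGIN {
        FS="\t+"    # one or more tabs separate fields
        months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
    }
    {
        curmonth = monthdigit(substr($1,4,3))   # "Aug" from "23 Aug 2000"
        print "month", curmonth, "exp", $2, "inc", $3, "amount", $7
    }'
}

parse_demo
```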
Example Script
#!/usr/bin/awk -f
BEGIN { #set global variables and built-in variables
    FS="\t+"
    months="Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec"
}
function monthdigit(mymonth) { #set subroutines (aka functions)
    return (index(months,mymonth)+3)/4
}
function doincome(mybalance) {
    mybalance[curmonth,$3] += amount
    mybalance[0,$3] += amount
}
function doexpense(mybalance) {
    mybalance[curmonth,$2] -= amount
    mybalance[0,$2] -= amount
}
function dotransfer(mybalance) {
    mybalance[0,$2] -= amount
    mybalance[curmonth,$2] -= amount
    mybalance[0,$3] += amount
    mybalance[curmonth,$3] += amount
}
#main program
{
    curmonth=monthdigit(substr($1,4,3))
    amount=$7
    #record all the categories encountered
    if ( $2 != "-" ) globcat[$2]="yes"
    if ( $3 != "-" ) globcat[$3]="yes"
    #tally up the transaction properly
    if ( $2 == "-" ) {
        if ( $3 == "-" ) {
            print "Error: inc and exp fields are both blank!"
            exit 1
        } else { #this is income
            doincome(balance)
            if ( $5 == "Y" ) doincome(balance2)
        }
    } else if ( $3 == "-" ) { #this is an expense
        doexpense(balance)
        if ( $5 == "Y" ) doexpense(balance2)
    } else { #this is a transfer
        dotransfer(balance)
        if ( $5 == "Y" ) dotransfer(balance2)
    }
} #end of main program
END {
    bal=0
    bal2=0
    for (x in globcat) {
        bal=bal+balance[0,x]
        bal2=bal2+balance2[0,x]
    }
    printf("Your available funds: %10.2f\n", bal)
    printf("Your account balance: %10.2f\n", bal2)
}
Output to the screen:
Your available funds:    1174.22
Your account balance:    2399.33