Review of Awk Principles

41
Review of Awk Principles Awk’s purpose: to give Unix a general purpose programming language that handles text (strings) as easily as numbers This makes Awk one of the most powerful of the Unix utilities Awk process fields while ed/sed process lines nawk (new awk) is the new standard for Awk Designed to facilitate large awk programs Awk gets it’s input from files redirection and pipes directly from standard input

description

Review of Awk Principles. Awk’s purpose: to give Unix a general purpose programming language that handles text (strings) as easily as numbers This makes Awk one of the most powerful of the Unix utilities Awk process fields while ed/sed process lines nawk (new awk) is the new standard for Awk - PowerPoint PPT Presentation

Transcript of Review of Awk Principles

Page 1: Review of Awk Principles

Review of Awk Principles

Awk’s purpose: to give Unix a general purpose programming language that handles text (strings) as easily as numbersThis makes Awk one of the most powerful of the Unix

utilities Awk process fields while ed/sed process lines nawk (new awk) is the new standard for Awk

Designed to facilitate large awk programs Awk gets it’s input from

files redirection and pipes directly from standard input

Page 2: Review of Awk Principles

History

Originally designed/implemented in 1977 by Al Aho, Peter Weinberger, and Brian Kernigan In part as an experiment to see how grep and sed could

be generalized to deal with numbers as well as textOriginally intended for very short programsBut people started using it and the programs kept

getting bigger and bigger!

In 1985, new awk, or nawk, was written to add enhancements to facilitate larger program developmentMajor new feature is user defined functions

Page 3: Review of Awk Principles

Other enhancements in nawk include:Dynamic regular expressions

Text substitution and pattern matching functionsAdditional built-in functions and variablesNew operators and statements Input from more than one fileAccess to command line arguments

nawk also improved error messages which makes debugging considerably easier under nawk than awk

On most systems, nawk has replaced awkOn ours, both exist

Page 4: Review of Awk Principles

Running an AWK Program

There are several ways to run an Awk programawk ‘program’ input_file(s)

program and input files are provided as command-line arguments

awk ‘program’program is a command-line argument; input is taken

from standard input (yes, awk is a filter!)awk -f program_file_name input_files

program is read from a file

Page 5: Review of Awk Principles

Awk as a Filter

Since Awk is a filter, you can also use pipes with other filters to massage its output even further

Suppose you want to print the data for each employee along with their pay and have it sorted in order of increasing pay

awk ‘{ printf(“%6.2f %s\n”, $2 * $3, $0) }’ emp.data | sort

Page 6: Review of Awk Principles

Errors If you make an error, Awk will provide a diagnostic

error messageawk '$3 == 0 [ print $1 }' emp.dataawk: syntax error near line 1awk: bailing out near line 1

Or if you are using nawknawk '$3 == 0 [ print $1 }' emp.datanawk: syntax error at source line 1 context is $3 == 0 >>> [ <<< 1 extra } 1 extra [nawk: bailing out at source line 1 1 extra } 1 extra [

Page 7: Review of Awk Principles

BEGIN{action}

pattern {action}

pattern {action}

.

.

.

pattern { action}

END {action}

Structure of an AWK Program

An Awk program consists of:An optional BEGIN segment

For processing to execute prior to reading input

pattern - action pairsProcessing for input dataFor each pattern matched,

the corresponding action is taken

An optional END segmentProcessing after end of input

data

Page 8: Review of Awk Principles

BEGIN and END

Special pattern BEGIN matches before the first input line is read; END matches after the last input line has been read

This allows for initial and wrap-up processingBEGIN { print “NAME RATE HOURS”; print “” }

{ print }

END { print “total number of employees is”, NR }

Page 9: Review of Awk Principles

Pattern-Action Pairs

Both are optional, but one or the other is requiredDefault pattern is match every recordDefault action is print record

PatternsBEGIN and ENDexpressions

$3 < 100$4 == “Asia”

string-matching/regex/ - /^.*$/string - abc

– matches the first occurrence of regex or string in the record

Page 10: Review of Awk Principles

compound$3 < 100 && $4 == “Asia”

– && is a logical AND– || is a logical OR

rangeNR == 10, NR == 20

– matches records 10 through 20 inclusive

Patterns can take any of these forms and for /regex/ and string patterns will match the first instance in the record

Page 11: Review of Awk Principles

Selection

Awk patterns are good for selecting specific lines from the input for further processing

Selection by Comparison$2 >=5 { print }

Selection by Computation$2 * $3 > 50 { printf(“%6.2f for %s\n”, $2 * $3, $1) }

Selection by Text Content$1 == “Susie” /Susie/

Combinations of Patterns$2 >= 4 || $3 >= 20

Page 12: Review of Awk Principles

Data Validation

Validating data is a common operation Awk is excellent at data validation

NF != 3 { print $0, “number of fields not equal to 3” }$2 < 3.35 { print $0, “rate is below minimum wage” }$2 > 10 { print $0, “rate exceeds $10 per hour” }$3 < 0 { print $0, “negative hours worked” }$3 > 60 { print $0, “too many hours worked” }

Page 13: Review of Awk Principles

Regular Expressions in Awk

Awk uses the same regular expressions we’ve been using^ $ - beginning of/end of field . - any character [abcd] - character class [^abcd] - negated character class [a-z] - range of characters (regex1|regex2) - alternation * - zero or more occurrences of preceding expression+ - one or more occurrences of preceding expression? - zero or one occurrence of preceding expressionNOTE: the min max {m, n} or variations {m}, {m,} syntax

is NOT supported

Page 14: Review of Awk Principles

Awk Variables

$0, $1, $2, … ,$NF NR - Number of records read FNR - Number of records read from current file NF - Number of fields in current record FILENAME - name of current input file FS - Field separator, space or TAB by default OFS - Output field separator, space by default ARGC/ARGV - Argument Count, Argument Value

arrayUsed to get arguments from the command line

Page 15: Review of Awk Principles

Arrays

Awk provides arrays for storing groups of related data values# reverse - print input in reverse order by line

{ line[NR] = $0 } # remember each line

END { i = NR # print lines in reverse order

while (i > 0) {

print line[i]

i = i - 1

}

}

Page 16: Review of Awk Principles

Operators

= assignment operator; sets a variable equal to a value or string

== equality operator; returns TRUE is both sides are equal

!= inverse equality operator && logical AND || logical OR ! logical NOT <, >, <=, >= relational operators +, -, /, *, %, ^ String concatenation

Page 17: Review of Awk Principles

Control Flow Statements

Awk provides several control flow statements for making decisions and writing loops

If-Else if (expression is true or non-zero){

statement1

}

else {

statement2

}

where statement1 and/or statement2 can be multiple statements enclosed in curly braces { }s

the else and associated statement2 are optional

Page 18: Review of Awk Principles

Loop Control

Whilewhile (expression is true or non-zero) {

statement1

}

Page 19: Review of Awk Principles

Forfor(expression1; expression2; expression3) {

statement1

}This has the same effect as:

expression1

while (expression2) {

statement1

expression3

} for(;;) is an infinite loop

Page 20: Review of Awk Principles

Do Whiledo {

statement1

}

while (expression)

Page 21: Review of Awk Principles

Computing with AWK

Counting is easy to do with Awk$3 > 15 { emp = emp + 1}

END { print emp, “employees worked more than 15 hrs”}

Computing Sums and Averages is also simple { pay = pay + $2 * $3 }

END { print NR, “employees”

print “total pay is”, pay

print “average pay is”, pay/NR

}

Page 22: Review of Awk Principles

Handling Text

One major advantage of Awk is its ability to handle strings as easily as many languages handle numbers

Awk variables can hold strings of characters as well as numbers, and Awk conveniently translates back and forth as needed

This program finds the employee who is paid the most per hour

$2 > maxrate { maxrate = $2; maxemp = $1 }

END { print “highest hourly rate:”, maxrate, “for”, maxemp }

Page 23: Review of Awk Principles

String ConcatenationNew strings can be created by combining old ones

{ names = names $1 “ “ }

END { print names }

Printing the Last Input LineAlthough NR retains its value after the last input line

has been read, $0 does not

{ last = $0 }

END { print last }

Page 24: Review of Awk Principles

Command Line Arguments

Accessed via built-ins ARGC and ARGV ARGC is set to the number of command line

arguments ARGV[ ] contains each of the arguments

For the command lineawk ‘script’ filename

ARGC == 2ARGV[0] == “awk”ARGV[1] == “filenamethe script is not considered an argument

Page 25: Review of Awk Principles

ARGC and ARGV can be used like any other variable

They can be assigned, compared, used in expressions, printed

They are commonly used for verifying that the correct number of arguments were provided

Page 26: Review of Awk Principles

ARGC/ARGV in Action

#argv.awk – get a cmd line argument and display

BEGIN {if(ARGC != 2)

{print "Not enough arguments!"}

else

{print "Good evening,", ARGV[1]}

}

Page 27: Review of Awk Principles

BEGIN {if(ARGC != 3)

{print "Not enough arguments!"

print "Usage is awk -f script in_file field_separator"

exit}

else

{FS=ARGV[2]

delete ARGV[2]}

}

$1 ~ /..3/ {print $1 "'s name in real life is", $5; ++nr}

END {print; print "There are", nr, "students registered in your class."}

Page 28: Review of Awk Principles

getline

How do you get input into your awk script other than on the command line?

The getline function provides input capabilities getline is used to read input from either the

current input or from a file or pipe getline returns 1 if a record was present, 0 if an

end-of-file was encountered, and –1 if some error occurred

Page 29: Review of Awk Principles

getline Function

Expression Sets

getline $0, NF, NR, FNR

getline var var, NR, FNR

getline <"file" $0, NF

getline var <"file" var

"cmd" | getline $0, NF

"cmd" | getline var var

Page 30: Review of Awk Principles

getline from stdin

#getline.awk - demonstrate the getline function

BEGIN {print "What is your first name and major? "

while (getline > 0)

print "Hi", $1 ", your major is", $2 "."

}

Page 31: Review of Awk Principles

getline From a File

#getline1.awk - demo getline with a file

BEGIN {while (getline <"emp.data" >0)

print $0}

Page 32: Review of Awk Principles

getline From a Pipe

#getline2.awk - show using getline with a pipe

BEGIN {{while ("who" | getline)

nr++}

print "There are", nr, "people logged on clyde right now."}

Page 33: Review of Awk Principles

Simple Output From AWK

Printing Every Line If an action has no pattern, the action is performed for

all input lines{ print } will print all input lines on stdout{ print $0 } will do the same thing

Printing Certain FieldsMultiple items can be printed on the same output line

with a single print statement { print $1, $3 }Expressions separated by a comma are, by default,

separated by a single space when output

Page 34: Review of Awk Principles

NF, the Number of FieldsAny valid expression can be used after a $ to indicate a

particular fieldOne built-in expression is NF, or Number of Fields { print NF, $1, $NF } will print the number of fields, the

first field, and the last field in the current record

Computing and PrintingYou can also do computations on the field values and

include the results in your output { print $1, $2 * $3 }

Page 35: Review of Awk Principles

Printing Line NumbersThe built-in variable NR can be used to print line

numbers { print NR, $0 } will print each line prefixed with its line

number

Putting Text in the OutputYou can also add other text to the output besides what

is in the current record { print “total pay for”, $1, “is”, $2 * $3 }Note that the inserted text needs to be surrounded by

double quotes

Page 36: Review of Awk Principles

Formatted Output

printf provides formatted output Syntax is printf(“format string”, var1, var2, ….) Format specifiers

%c – single character %d - number %f - floating point number %s - string \n - NEWLINE \t - TAB

Format modifiers - left justify in column n column width .n number of decimal places to print

Page 37: Review of Awk Principles

printf Examples

printf(“I have %d %s\n”, how_many, animal_type) format a number (%d) followed by a string (%s)

printf(“%-10s has $%6.2f in their account\n”, name, amount) prints a left justified string in a 10 character wide field and a float

with 2 decimal places in a six character wide field printf(“%10s %-4.2f %-6d\n”, name, interest_rate,

account_number > "account_rates") prints a right justified string in a 10 character wide field, a left

justified float with 2 decimal places in a 4 digit wide field and a left justified decimal number in a 6 digit wide field to a file

printf(“\t%d\t%d\t%6.2f\t%s\n”, id_no, age, balance, name >> "account") appends a TAB separated number, number, 6.2 float and a string

to a file

Page 38: Review of Awk Principles

Built-In Functions

Arithmeticsin, cos, atan, exp, int, log, rand, sqrt

String length, substitution, find substrings, split strings

Outputprint, printf, print and printf to file

Specialsystem - executes a Unix command

system(“clear”) to clear the screenNote double quotes around the Unix command

exit - stop reading input and go immediately to the END pattern-action pair if it exists, otherwise exit the script

Page 39: Review of Awk Principles

Built-In Arithmetic Functions

Function Return Value

atan2(y,x) arctangent of y/x (-to

cos(x) cosine of x, with x in radians

sin(x) sine of x, with x in radians

exp(x) exponential of x, ex

int(x) integer part of x

log(x) natural (base e) logarithm of x

rand() random number between 0 and 1

srand(x) new seed for rand()

sqrt(x) square root of x

Page 40: Review of Awk Principles

Built-In String Functions

Function Description

gsub(r, s) substitute s for r globally in $0, return number of substitutions made

gsub(r, s, t) substitute s for r globally in string t, return number of substitutions made

index(s, t) return first position of string t in s, or 0 if t is not present

length(s) return number of characters in s

match(s, r) test whether s contains a substring matched by r, return index or 0

sprint(fmt, expr-list) return expr-list formatted according to format string fmt

Page 41: Review of Awk Principles

Built-In String Functions

Function Description

split(s, a) split s into array a on FS, return number of fields

split(s, a, fs) split s into array a on field separator fs, return number of fields

sub(r, s) substitute s for the leftmost longest substring of $0 matched by r

sub(r, s, t) substitute s for the leftmost longest substring of t matched by r

substr(s, p) return suffix of s starting at position p

substr(s, p, n) return substring of s of length n starting at position p