Awk Introduction

Post on 18-Dec-2014

A quick introduction to the awk command-line tool.

Colloquium - awk, v1.0

A. Magee

April 4, 2010

Outline

1 Introduction
    What does awk offer?
    When should I use awk?

2 Learning by example
    Sample File
    Polling a Field
    Doing a Little Math

Introduction What?

What does awk offer?

awk is a text processor that works well on database types of files.

It operates on a file or stream of characters where a newline character terminates a line.

It works best on files with unique text item delimiters like whitespace, comma, colon, etc.

It can operate on specific lines that you describe.

It can make programmatic text manipulation quick and painless.

Introduction When?

When should I use awk?

For parsing well-structured data.

For editing a file at precisely defined places.

When you are too lazy (or smart) to open a WYSIWYG editor.

Examples Sample File

A sample file

Here’s a short file from an ls listing that we can play with; let’s call it sample.txt.

drwxr-xr-x 22 root root 4096 2010-02-15 12:59 .

drwxr-xr-x 22 root root 4096 2010-02-15 12:59 ..

drwxr-xr-x 2 root root 4096 2010-02-27 19:25 bin

drwxr-xr-x 3 root root 4096 2010-02-27 19:27 boot

lrwxrwxrwx 1 root root 11 2008-03-08 08:56 cdrom -> media/cdrom

drwxr-xr-x 14 root root 3200 2010-01-17 11:45 dev

drwxr-xr-x 85 root root 12288 2010-04-04 22:16 etc

lrwxrwxrwx 1 root root 22 2010-02-10 12:09 home -> /usr/bob

Examples Sample File

Another sample file

Here’s a short file from a database that we can play with; let’s call it sample2.txt.

psmith01 CLASS2B YEAR2 1 N ADVANCED STAFF 1 Y Y

smehta CLASS3G LOCAL 1 Y STANDARD PUPIL 2.1 N Y

mrsjohns SNHOJ UNRESTRICTED -1 Y ADVANCED STAFF 2 Y N

psmith02 CLASS4D UKSCHOOLS 0 N ADVANCED STAFF 10 Y Y

scohen CLASS3G LOCAL 2 Y STANDARD PUPIL 1 N N

swright CLASS1J YEAR1 1 N STANDARD PUPIL 1 N Y

amarkov CLASS4E UKSCHOOLS 3 Y STANDARD PUPIL 1 N N

Examples Polling

Example 1

> awk '{print NF}' sample.txt

8

8

8

8

10

8

8

10

Each line awk processes is called a record.

As with many commands, we generally want to wrap the awk program in single quotes so the shell passes it through unchanged.

{...}: A command group.

NF: The number of fields in the record.

Examples Polling

Example 2

> awk '/^l/ {print $NF}' sample.txt

media/cdrom

/usr/bob

/.../: Selects every record that matches the regex. In this case, ^l matches any line that starts with the letter l.

{...}: A command group.

$NF: The last field of the line.

This command prints the destinations of all the symbolic links in the listing.

What’s another way to get the same results?
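One possible answer, offered as a sketch rather than the answer the speaker had in mind: test only the first field instead of the whole record, or rely on the field count, since in this listing only the symbolic links have ten fields. The here-document copies two lines of sample.txt so the block runs on its own; the file name links.txt is made up for the demo.

```shell
# Two lines copied from sample.txt; links.txt is a hypothetical scratch file.
cat > links.txt <<'EOF'
drwxr-xr-x 2 root root 4096 2010-02-27 19:25 bin
lrwxrwxrwx 1 root root 11 2008-03-08 08:56 cdrom -> media/cdrom
EOF

# Apply the regex to the first field only, not to the whole record.
awk '$1 ~ /^l/ {print $NF}' links.txt

# Use the field count: in this listing only symlink lines have 10 fields.
awk 'NF == 10 {print $10}' links.txt
```

Both commands print media/cdrom. The field-count version is the more fragile of the two, because a file name containing spaces would change NF.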

Examples Polling

Example 3

> awk '{print NR,$0}' sample.txt

1 drwxr-xr-x 22 root root 4096 2010-02-15 12:59 .

2 drwxr-xr-x 22 root root 4096 2010-02-15 12:59 ..

3 drwxr-xr-x 2 root root 4096 2010-02-27 19:25 bin

4 drwxr-xr-x 3 root root 4096 2010-02-27 19:27 boot

5 lrwxrwxrwx 1 root root 11 2008-03-08 08:56 cdrom -> media/cdrom

6 drwxr-xr-x 14 root root 3200 2010-01-17 11:45 dev

7 drwxr-xr-x 85 root root 12288 2010-04-04 22:16 etc

8 lrwxrwxrwx 1 root root 22 2010-02-10 12:09 home -> /usr/bob

NR: The current record number.

$0: The entire current record (the whole line, before it is split into fields).

This simply prints each line preceded by its record number.
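NR can also serve as the pattern, which is one way to "operate on specific lines that you describe". The sketch below copies four lines of sample.txt into a scratch file (head.txt is a made-up name) and prints only records 3 and later, each labelled with its record number.

```shell
# Four lines copied from sample.txt; head.txt is a hypothetical scratch file.
cat > head.txt <<'EOF'
drwxr-xr-x 22 root root 4096 2010-02-15 12:59 .
drwxr-xr-x 22 root root 4096 2010-02-15 12:59 ..
drwxr-xr-x 2 root root 4096 2010-02-27 19:25 bin
drwxr-xr-x 3 root root 4096 2010-02-27 19:27 boot
EOF

# The pattern NR >= 3 selects records by position instead of by content.
awk 'NR >= 3 {print NR ": " $NF}' head.txt
```

This prints "3: bin" and "4: boot".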

Examples Polling

Example 4

> awk '{print $NR}' sample.txt

drwxr-xr-x

22

root

root

11

2010-01-17

22:16

home

What does this silly command do?

Could it be useful?

Examples Math

Example 5

> awk -F, 'BEGIN {prod = 1} {prod *= $NR} END {print prod}' diag.dat

24

The file diag.dat contains a square upper-triangular matrix.

The determinant of such a matrix is simply the product of the diagonal entries.

prod must be initialized to 1; otherwise it is assumed to be 0.

Initializations are done in the BEGIN {...} block.

The END keyword marks the commands that run after all records have been processed.

-F: Sets the field separator, here a comma.
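The slides do not show the contents of diag.dat, so the matrix below is an assumption chosen only because its diagonal (1, 4, 6) multiplies to the 24 reported above. With it, the command can be tried end to end.

```shell
# Hypothetical 3x3 upper-triangular matrix, comma separated.
# The real diag.dat from the talk is not shown; these values are made up.
cat > diag.dat <<'EOF'
1,2,3
0,4,5
0,0,6
EOF

# On record number NR, field $NR is the diagonal entry,
# so prod accumulates 1 * 4 * 6.
awk -F, 'BEGIN {prod = 1} {prod *= $NR} END {print prod}' diag.dat
```

This prints 24. It also shows that $NR, the "silly" expression from Example 4, walks the diagonal of any square table.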

Examples Math

Non-explicit Details

> awk '{sum += $5; print $5} END {print "total: " sum}' sample.txt

4096

4096

4096

4096

11

3200

12288

22

total: 31905

Variables need no declaration; an uninitialized variable is the empty string, which acts as 0 in arithmetic.

This C-like syntax sums the fifth column of each record.

Commands in a {...} are separated by semicolons (;).

General structure is

BEGIN {...} pattern {...} pattern {...} ... END {...}

Variables are not strongly typed. A variable may be a string or a number depending on how you operate on it.
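A small sketch of the typing rule, using made-up variable names (x, n, s): the same value acts as a number or a string depending on the operator applied to it, and a variable that was never assigned acts as 0 or as the empty string.

```shell
awk 'BEGIN {
    x = "3"          # assigned as a string
    print x + 4      # arithmetic context: x is treated as the number 3
    print x "4"      # concatenation context: x is treated as the string "3"
    print n + 0      # n was never assigned; in arithmetic it acts as 0
    print "[" s "]"  # s was never assigned; as a string it is empty
}'
```

The four lines printed are 7, 34, 0 and [].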

Examples Math

Example 6 & 7

> awk '{sum += $8} END {print sum/NR}' sample2.txt

2.2625

This is not correct! (Compute by hand to verify.)

Examine the file carefully to understand why.

> awk '!/^#/ {sum += $8; cnt++} END {print sum/cnt}' sample2.txt

2.58571

Here the problem has been resolved by keeping a count of the lines matched.

Notice that lines starting with a # have been excluded.
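Both results can be reproduced if sample2.txt holds the seven rows from the listing plus one line that starts with #. The wording of that comment line below is an assumption, since the slides never show it; only its presence matters, because it raises NR to 8 without adding anything to the sum.

```shell
# The seven data rows are copied from the listing; the first line is a
# hypothetical comment, since the real one is not shown in the slides.
cat > sample2.txt <<'EOF'
# user table (hypothetical comment line)
psmith01 CLASS2B YEAR2 1 N ADVANCED STAFF 1 Y Y
smehta CLASS3G LOCAL 1 Y STANDARD PUPIL 2.1 N Y
mrsjohns SNHOJ UNRESTRICTED -1 Y ADVANCED STAFF 2 Y N
psmith02 CLASS4D UKSCHOOLS 0 N ADVANCED STAFF 10 Y Y
scohen CLASS3G LOCAL 2 Y STANDARD PUPIL 1 N N
swright CLASS1J YEAR1 1 N STANDARD PUPIL 1 N Y
amarkov CLASS4E UKSCHOOLS 3 Y STANDARD PUPIL 1 N N
EOF

# Divides by NR = 8, although only 7 records carry a value in field 8.
awk '{sum += $8} END {print sum/NR}' sample2.txt

# Skips the comment line and divides by the number of records counted.
awk '!/^#/ {sum += $8; cnt++} END {print sum/cnt}' sample2.txt
```

The first command prints 2.2625 (18.1 / 8) and the second prints 2.58571 (18.1 / 7).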

Examples Math

Example 8

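Recall the sed addressing model x~y.

> awk '(1+NR)%3 == 0 {print $0}' sample2.txt

psmith01 CLASS2B YEAR2 1 N ADVANCED STAFF 1 Y Y

psmith02 CLASS4D UKSCHOOLS 0 N ADVANCED STAFF 10 Y Y

amarkov CLASS4E UKSCHOOLS 3 Y STANDARD PUPIL 1 N N

NB: NR is one-indexed; the first record has NR equal to 1.

The condition holds when NR is 2, 5, 8, ..., so in sed terms x is 2 and y is 3. The rows printed above are records 2, 5 and 8 only if sample2.txt has one more line ahead of the listing shown earlier, which would fit the # line that Example 7 excludes.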
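A quick way to check which records a modulo pattern selects is to feed awk lines whose text equals their own line number. The sketch below assumes the seq utility is available, as it is on most Linux and macOS systems.

```shell
# seq prints 1..10, one number per line, so each line's text equals its NR.
# A pattern with no action prints the matching record.
seq 1 10 | awk '(1 + NR) % 3 == 0'

# The same selection written in first~step shape: first = 2, step = 3.
seq 1 10 | awk 'NR % 3 == 2'
```

Both commands print 2, 5 and 8, which is what sed -n '2~3p' selects in GNU sed.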

Appendix

3 Appendix
    Tons of Control

Appendix Tons of Control

More Built-Ins

FILENAME - Input file name.

FS - The field separator.

RS - The record separator (default is newline).

OFS - Output field separator.

ORS - Output record separator.

OFMT - Output format for numbers.
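A brief sketch of three of these variables in use. The input is generated with printf, so no file is needed; the sample letters are arbitrary.

```shell
# FS splits input on ":", OFS joins output fields with "-".
# Assigning $1 = $1 forces awk to rebuild $0 using OFS.
printf 'a:b:c\nd:e:f\n' | awk 'BEGIN {FS = ":"; OFS = "-"} {$1 = $1; print}'

# ORS replaces the newline that print normally writes after each record.
# printf ignores ORS, so the END block adds one final newline.
printf 'a\nb\nc\n' | awk 'BEGIN {ORS = ","} {print} END {printf "\n"}'
```

The first command prints a-b-c and d-e-f; the second prints a,b,c, on a single line.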

Appendix Tons of Control

Math Functions

Relationals: <, <=, !=, ==, >=, >

Operators: +, -, *, /, ^, %

Also pre- and post-increment and decrement: ++, --

Assignment: =, +=, -=, *=, /=, %=

Many other math operations: sqrt(), log(), exp(), int(), etc.
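A few of these operators and functions can be tried in a BEGIN block, which needs no input file. The numbers are arbitrary.

```shell
awk 'BEGIN {
    print 2 ^ 10      # exponentiation
    print 17 % 5      # remainder
    print int(7 / 2)  # int() truncates toward zero
    print sqrt(16)
    n = 5; n++; n *= 2
    print n           # 5 -> 6 -> 12
}'
```

The output is 1024, 2, 3, 4 and 12, one value per line.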

Appendix Tons of Control

String Functions

substr(string, begin, length)

split(string, array, separator)

index(string, substring)
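The three functions in a minimal sketch. The strings and the array name parts are placeholders picked for the demo.

```shell
awk 'BEGIN {
    s = "hello world"
    print substr(s, 1, 5)        # character positions count from 1
    print index(s, "world")      # position of the first match, 0 if absent
    n = split("a:b:c", parts, ":")
    print n, parts[1], parts[3]  # split returns the piece count; indices start at 1
}'
```

This prints hello, then 7, then "3 a c".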

Appendix Tons of Control

Control Structures

if ... else

while

for
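All three structures follow C syntax. A small sketch with arbitrary loop bounds:

```shell
awk 'BEGIN {
    # for loop with an if ... else inside
    for (i = 1; i <= 3; i++) {
        if (i % 2 == 0) {
            print i, "even"
        } else {
            print i, "odd"
        }
    }
    # while loop: halve n until it reaches 1
    n = 8
    while (n > 1) {
        n = n / 2
    }
    print "n is now", n
}'
```

The output is "1 odd", "2 even", "3 odd" and "n is now 1".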
