An Introduction to Perl

Sources and inspirations:

http://www.cs.utk.edu/~plank/plank/classes/cs494/494/notes/Perl/lecture.html

Randal L. Schwartz and Tom Christiansen, “Learning Perl” 2nd ed., O’Reilly

Randal L. Schwartz and Tom Phoenix, “Learning Perl” 3rd ed., O’Reilly

Dr. Nathalie Japkowicz, Dr. Alan Williams

Go O'Reilly! CSI 3125, Perl, page 1

Perl overview (1)

• Perl = Practical extraction and report language

• Perl = Pathologically eclectic rubbish lister

• It is a powerful general-purpose language, which is particularly useful for writing “quick and dirty” programs.

• Invented by Larry Wall, with no apologies for its lack of elegance (!).

• If you know C and a fair bit of Unix (or Linux), you can learn Perl in days (well, some of it...).

Perl overview (2)

• In the hierarchy of programming languages, Perl is located half-way between high-level languages such as Pascal, C and C++, and shell scripts (languages that add control structure to the Unix command line instructions) such as sh, sed and awk.

• By the way:

– awk = Aho, Weinberger, Kernighan

– sed = Stream Editor.

Advantages of Perl (1)

• Perl combines the best (according to its admirers) features of:

– Unix/Linux shell programming,

– the commands sed, grep, awk and tr,

– C,

– Cobol.

• Shell scripts are usually written in many small files that refer to each other. Perl achieves the functionality of such scripts in a single program file.

Advantages of Perl (2)

• Perl offers extremely strong regular expression capabilities, which allow fast, flexible and reliable string handling operations, especially pattern matching.

As a result, Perl works particularly well in text processing applications.

• As a matter of fact, it is Perl that allowed a lot of text documents to be quickly moved to the HTML format in the early 1990s, allowing the Web to expand so rapidly.

Disadvantages of Perl

• Perl is a jumble! It contains many, many features from many languages and tools.

• It contains different constructs for the same functionality (for example, there are at least 5 ways to perform a one-line if statement).

It is not a very readable language.

• You cannot distribute a Perl program as an opaque binary. That is, you cannot really commercialize products you develop in Perl.

Perl resources and versions

• http://www.perl.org tells you everything that you want to know about Perl.

• What you will see here is Perl 5.

• Perl 5.8.0 was released in July 2002.

• Perl 6 (http://dev.perl.org/perl6/) is the next version, still under development, but moving along nicely. The first book on Perl 6 is in stores (http://www.oreilly.com/catalog/perl6es).

Scalar data: strings and numbers

Scalars need not be defined or have their types declared: Perl understands from context.

    % cat hellos.pl
    #!/usr/bin/perl -w
    print "Hello" . " " . "world\n";
    print "hi there " . 2 . " worlds!" . "\n";
    print (("5" + 6) . " eggs\n" . " in " . " 3 + 2 = " . ("3" + "2") . " baskets\n" );

    % hellos.pl                (invoke Perl)
    Hello world
    hi there 2 worlds!
    11 eggs
     in  3 + 2 = 5 baskets

Scalar variables

Scalar variable names start with a dollar sign. They do not have to be declared.

    % cat scalar.pl
    #!/usr/bin/perl -w
    $i = 1;
    $j = "2";
    print "$i and $j \n";
    $k = $i + $j;
    print "$k\n";
    print $i . $j . "\n";
    print '$k\n' . "\n";

    % scalar.pl
    1 and 2
    3
    12
    $k\n

Quotes and substitution

Suppose $x = 3.

Single-quotes ' ' allow no substitution except for the escape sequences \\ and \'.

    print('$x\n');    gives $x\n and no new line.

Double-quotes " " allow substitution of variables like $x and control codes like \n (newline).

    print("$x\n");    gives 3 (and a new line).

Back-quotes ` ` also allow substitution, then try to execute the result as a system command, returning as the final value whatever the system command outputs.

    $y = `date`; print($y);    results in Sun Aug 10 07:04:17 EDT 2003

Control statements: if, else, elsif

    % cat names.pl
    #!/usr/bin/perl -w
    $name = <STDIN>;        # standard input
    chomp($name);           # cut newline
    if ($name gt 'fred') {
        print "'$name' follows 'fred'\n";
    }
    elsif ($name eq 'fred') {
        print "both names are 'fred'\n";
    }
    else {
        print "'$name' precedes 'fred'\n";
    }

    % names.pl
    stan                        (my input)
    'stan' follows 'fred'       (Perl's output)

    % names.pl
    Stan
    'Stan' precedes 'fred'

Control statements: loops (1)

    % cat oddsum_while.pl
    #!/usr/bin/perl -w
    # Add up some odd numbers
    $max = <STDIN>;
    $n = 1;
    while ($n < $max) {
        $sum += $n;
        $n += 2;    # On to the next odd number
    }
    print "The total is $sum.\n";

    % oddsum_while.pl
    10                                                                   (my input)
    Use of uninitialized value at oddnums.pl line 6, <STDIN> chunk 1.    (a warning)
    The total is 25.                                                     (Perl's output)

Control statements: loops (2)

• End-line comments begin with #

• It is okay, though not nice, to use a variable without initialization (like $sum). Such a variable is initialized to 0 if it is first used as a number or to the empty string "" if it is first used as a string. In fact, it is always undef, variously converted.

• Perl can, if asked, issue a warning (use the -w flag).

• Of course, while is only one of many looping constructs in Perl. Read on...
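The conversion of an uninitialized variable described above can be tried out with a short script. This is only a sketch: the helper names as_number and as_string are invented for the illustration, and the `no warnings` lines merely silence the warning that -w would otherwise print.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helpers (not from the lecture): use a value as a number or as a string.
sub as_number { my ($v) = @_; no warnings 'uninitialized'; return $v + 0; }
sub as_string { my ($v) = @_; no warnings 'uninitialized'; return "" . $v; }

my $unset;    # declared but never assigned, so its value is undef

print as_number($unset), "\n";                        # prints 0
print "[", as_string($unset), "]\n";                  # prints []
print defined($unset) ? "defined" : "undef", "\n";    # prints undef
```

The variable itself stays undef throughout; only the value used in each expression is converted.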

Control statements: loops (3)

    % cat oddsum_until.pl
    #!/usr/bin/perl -w
    # Add up some odd numbers
    $max = <STDIN>;
    $n = 1;
    $sum = 0;
    until ($n >= $max) {
        $sum += $n;
        $n += 2;    # On to the next odd number
    }
    print "The total is $sum.\n";

    % oddsum_until.pl
    10
    The total is 25.

Control statements: loops (4)

    % cat oddsum_for.pl
    #!/usr/bin/perl -w
    # Add up some odd numbers
    $max = <STDIN>;
    $sum = 0;
    for ($n = 1 ; $n < $max ; $n += 2) {
        $sum += $n;
    }
    print "The total is $sum.\n";

    % oddsum_for.pl
    10
    The total is 25.

We also have do-while and do-until, and we have foreach. Read on.

Control statements: loops (5)

    % cat oddsum_foreach.pl
    #!/usr/bin/perl -w
    # Add up some odd numbers
    $max = <STDIN>;
    $sum = 0;
    foreach $n ( (1 .. $max) ) {
        if ( $n % 2 != 0 ) { $sum += $n; }
    }
    print "The total is $sum.\n";

    % oddsum_foreach.pl
    10
    The total is 25.

Control constructs compared

                C                         Perl (braces required)

    the same    if () { ... }             if () { ... }
                if (! ) { ... }           unless () { ... }
    different   } else if () { ... }      } elsif () { ... }
    the same    while () { ... }          while () { ... }
    the same    for (aa;bb;cc) {...}      for (aa;bb;cc) {...}
                                          foreach $v (@array) {... }
    different   break                     last
    different   continue                  next
    similar     0 is FALSE                0, "0", and "" are FALSE
    similar     != 0 is TRUE              anything not false is TRUE
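The Perl column of this comparison can be checked with a short script. It is a minimal sketch: the sub name is_true and the sample values are invented for the illustration and do not come from the lecture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: report how Perl treats a value in a Boolean test.
sub is_true { my ($v) = @_; return $v ? "TRUE" : "FALSE"; }

# The strings "0.0", "00" and " " are neither "0" nor empty, so they count as TRUE.
foreach my $v (0, "0", "", "0.0", "00", " ") {
    print "'$v' is ", is_true($v), "\n";
}

my @found;
foreach my $n (1 .. 10) {
    next if $n % 2 == 0;    # like C's continue: skip the even numbers
    last unless $n < 8;     # like C's break: stop at the first odd number >= 8
    push @found, $n;
}
print "@found\n";           # prints 1 3 5 7
```

Note that `unless` and the statement modifiers `if` and `unless` written after a statement are two of the several ways to express a one-line conditional.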

Lists and arrays

• A list is an ordered collection of scalars. An array is a variable that contains a list.

• Each element is an independent scalar value. A list can hold numbers, strings, undef values—any mixture of kinds of scalar values.

• To use an array element, prefix the array name with a $; place a subscript in square brackets.

• To access the whole array, prefix its name with a @.

• You can copy an array into another. You can use the operators sort, reverse, push, pop, split.
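The operations listed above can be seen together in a small script. This is a sketch with made-up data; scalar(@names), which gives the number of elements, is a standard Perl built-in that the slides do not mention.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @names = split(/,/, "wilma,fred,betty");   # split returns a list of three strings
push(@names, "barney");                        # add an element at the end
my $last = pop(@names);                        # remove it again: $last is "barney"

my @copy     = @names;                         # copying an array copies all its elements
my @sorted   = sort(@copy);                    # betty fred wilma
my @reversed = reverse(@sorted);               # wilma fred betty

print "first element: $names[0]\n";                    # prints first element: wilma
print "number of elements: ", scalar(@names), "\n";    # prints number of elements: 3
print "@sorted\n";                                     # prints betty fred wilma
print "@reversed\n";                                   # prints wilma fred betty
```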

Command-line arguments

Suppose that a Perl program stored in the file cleanUp is invoked in Unix/Linux with the command:

cleanUp -o result.htm data.htm

The built-in list named @ARGV then contains three elements:

('-o', 'result.htm', 'data.htm')

These three elements can be accessed as:

    $ARGV[0]
    $ARGV[1]
    $ARGV[2]
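A program such as cleanUp might walk through @ARGV along the following lines. The lecture does not show what cleanUp actually does, so the sub parse_args and its treatment of -o are assumptions made only for this sketch.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical argument handling: the name after -o is taken as the output file,
# every other argument as an input file.
sub parse_args {
    my @args = @_;
    my ($output, @inputs);
    while (@args) {
        my $arg = shift @args;
        if ($arg eq '-o') { $output = shift @args; }
        else              { push @inputs, $arg; }
    }
    return ($output, @inputs);
}

# Simulate the command line from the text when no arguments are given.
@ARGV = ('-o', 'result.htm', 'data.htm') unless @ARGV;

my ($output, @inputs) = parse_args(@ARGV);
print "output: ", (defined $output ? $output : "(none)"), "\n";
print "inputs: @inputs\n";
```

With the simulated arguments the script reports result.htm as the output and data.htm as the only input.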

Array examples (1)

    % cat arraysort.pl
    #!/usr/bin/perl -w
    $i = 0;
    while ($k = <STDIN>) {
        $a[$i++] = $k;
    }
    print "===== sorted =====\n";
    print sort(@a);

    % arraysort.pl
    Nathalie
    Frank
    hello
    John
    Zebra
    notary
    nil
    (control-D here)
    ===== sorted =====
    Frank
    John
    Nathalie
    Zebra
    hello
    nil
    notary

Array examples (2A)

Reversing a text file (whole lines).

    % cat whole_rev.pl
    #!/usr/bin/perl -w
    while ($k = <STDIN>) {
        push(@a, $k);
    }
    print "== reversed ==\n";
    while ($oldval = pop(@a)) {
        print $oldval;
    }

    % whole_rev.pl
    a b c d
    e f
    g h i
    (control-D here)
    == reversed ==
    g h i
    e f
    a b c d

Array examples (2B)

Reversing each line in a text file

    % cat each_rev.pl
    #!/usr/bin/perl -w
    while($k = <STDIN>) {
        @a = split(/\s+/, $k);    # split cuts the line on white space
        $s = "";
        for ($i = @a; $i > 0; $i--) {
            $s = "$s$a[$i-1] ";
        }
        chop($s);
        print "$s\n";
    }

    % each_rev.pl
    a bc d efg
    efg d bc a              (output)
    hi j
    j hi
    klm nopq st
    st nopq klm
    (control-D)

(we will see regular expressions soon)

Array examples (3)

Reversing a text file (whole lines)

    print reverse(<STDIN>);

Reversing each line in a text file

    while($k = <STDIN>) {
        $s = "";
        foreach $i (reverse(split(/\s+/, $k))) {
            $s = "$s$i ";
        }
        chop($s);
        print "$s\n";
    }

A digression: Perl's favourite default variable

    while(<STDIN>) {                    # by default, Perl reads into $_
        $s = "";
        foreach $i (reverse(split(/\s+/, $_))) {
            $s = "$s$i ";
        }
        chop($s);
        print "$s\n";
    }

    while(<STDIN>) {
        $s = "";
        foreach $i (reverse(split(/\s+/ ))) {    # by default, Perl splits $_ too!
            $s = "$s$i ";
        }
        chop($s);
        print "$s\n";
    }

Hashes

• A hash is similar to an array, but instead of subscripts, we can have anything as a key, and we use curly brackets rather than square brackets.

• The official name is associative array (known to be implemented by hashing).

• Keys and values can be any scalars; keys are always converted to strings.

• To refer to a hash as a whole, prefix its name with a %.

• If you assign a hash to an array, it becomes a simple list.
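These points can be illustrated with a few lines of Perl. The hash %age and its contents are invented for this sketch; exists and delete are standard built-ins.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %age = ("fred", 35, "wilma", 33);   # a list of key, value, key, value
$age{"betty"} = 31;                    # curly brackets select one element
$age{2.5}     = "a numeric key";       # the key is stored as the string "2.5"

my @flat = %age;                       # a simple list of keys and values, in no particular order
print scalar(@flat), "\n";             # prints 8: four keys plus four values

foreach my $name (sort keys %age) {    # sort the keys to get a predictable order
    print "$name => $age{$name}\n";
}

delete $age{"fred"};                   # remove one key together with its value
print exists($age{"fred"}) ? "fred is still there\n" : "fred is gone\n";
```

Since the order of elements in a hash is not predictable, sorting the keys is the usual way to get stable output.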

Hash examples I (1)

    % cat hash_array.pl
    #!/usr/bin/perl -w
    %some_hash = ("foo", 35, "bar", 12.4, 2.5, "hello",
                  "wilma", 1.72e30, "betty", "bye\n");
    @an_array = %some_hash;
    print "@an_array\n========\n";
    foreach $key (keys %some_hash) {
        print "$key: ";
        print delete $some_hash{$key};
        print "\n";
    }

Hash examples I (2)

    % hash_array.pl
    betty bye
     wilma 1.72e+30 foo 35 2.5 hello bar 12.4
    ========
    betty: bye

    wilma: 1.72e+30
    foo: 35
    2.5: hello
    bar: 12.4

Hash examples II

    % cat hash_arrows.pl
    #!/usr/bin/perl -w
    my %hash = ( "a" => 1, "b" => 2, "c" => 3);
    foreach $key (sort keys %hash) {
        $value = $hash{$key};
        print "$key => $value\n";
    }

    % hash_arrows.pl
    a => 1
    b => 2
    c => 3

A brief interlude: the diamond operator

<> loops over the files listed as command-line arguments; $_ is the current input line.

    % cat concat
    #!/usr/bin/perl -w
    while ( <> ) {
        print $_;
    }

    % cat a
    one-a
    two-a
    % cat b
    three-b
    four-b
    five-b
    % concat a b
    one-a
    two-a
    three-b
    four-b
    five-b

    % concat a b >c
    % cat c
    one-a
    two-a
    three-b
    four-b
    five-b

Hash examples III: character frequency count

    % cat frequency.pl
    #!/usr/bin/perl -w
    while (<>) {
        # split $_ into single characters, loop
        foreach $c (split //) {
            # Increment $count of $c
            ++$count{$c};
        }
    }
    # end of input, print %count
    for $c (sort keys %count) {
        print "$c\t$count{$c}\n";
    }

Character frequency count (2)

    % frequency.pl
    Nathalie
    Fran
    hello
    John
    rather
    Notary
    F 1
    J 1
    ^D

            8        (\n: the newline character itself is printed, then its count)
            2        (space)
    1       2
    F       2
    J       2
    N       2
    a       5
    e       3
    h       4
    i       1
    l       3
    n       2
    o       3
    r       4
    t       3
    y       1

Subroutines

• A subroutine is a user-defined function. The syntax is very simple; so is the semantics.

    #!/usr/bin/perl
    sub max {
        if ( $x > $y ) { $x } else { $y }
    }
    $x = 10; $y = 11;
    print &max . "\n";

• There are no arguments; the script accesses two global variables. The subroutine call is marked with &. The value returned is that of the last expression evaluated.

Subroutines (2)

A few housekeeping rules.

• You can place your definitions anywhere in the file, though it is recommended to have them at the beginning.

• Perl always uses the latest definition in the file; any preceding one is ignored.

• Certain elements of the syntax are optional.

– The & might sometimes be omitted (but it is not a good idea).

– The return operator may precede a value to be returned (this can be useful):

    if ( $x > $y ) { return $x }
    else { return $y }

Subroutines (3)

• Clearly, the use of global variables is much too limited. Subroutines take arguments, and work on them via a predefined list variable @_ or its elements $_[0], $_[1] and so on.

    #!/usr/bin/perl
    sub max {
        if ( $_[0] > $_[1] ) { $_[0] } else { $_[1] }
    }
    print &max ( 12, 13 ) . "\n";

Subroutines (4)

• $_[0], $_[1] are not fun to work with. We can rename them locally, using the my operator, which creates a sub's private variables. Here, we declare two such variables and right away initialize them.

    #!/usr/bin/perl
    sub max {
        my ( $a, $b ) = @_;
        if ( $a > $b ) { $a } else { $b }
    }
    print &max ( 15, 14 ) . "\n";

Subroutines (5)

• But: this is not a safe max calculation.

    #!/usr/bin/perl
    sub max {
        my ( $a, $b ) = @_;
        if ( $a > $b ) { $a } else { $b }
    }
    print &max ( 16, 19, 23 ) . "\n";
    print &max ( 26 ) . "\n";

• This produces 19 (23 gets ignored) and 26 (the second value is undef, that is, 0).

Subroutines (6)

• We could stop the subroutine if the number of arguments is wrong. The (generally very useful!) operator die does that for us.

    #!/usr/bin/perl
    sub max {
        if ( @_ != 2 ) {
            die "max needs two arguments: @_\n";
        }
        my ( $a, $b ) = @_;
        if ( $a > $b ) { $a } else { $b }
    }
    print &max ( 16, 19, 23 ) . "\n";

The script is stopped after printing this:

    max needs two arguments: 16 19 23

Subroutines (7)

• We can have just a warning, if we use the operator warn instead.

    #!/usr/bin/perl
    sub max {
        if ( @_ != 2 ) {
            warn "max needs two arguments: @_\n";
        }
        my ( $a, $b ) = @_;
        if ( $a > $b ) { $a } else { $b }
    }
    print &max ( 16, 19, 23 ) . "\n";

The script prints this:

    max needs two arguments: 16 19 23
    19

Subroutines (8)

• It is, by the way, not a bad idea to generalize max by allowing it to take any number of arguments.

    #!/usr/bin/perl
    sub max {
        my ( $curr_max ) = shift @_;
        foreach ( @_ ) {
            if ( $_ > $curr_max ) { $curr_max = $_; }
        }
        $curr_max
    }
    print &max ( 15, 14 ) . "\n";
    print &max ( 16, 19, 23 ) . "\n";
    print &max ( 26 ) . "\n";

Subroutines (9)

• This even works for empty lists.

    #!/usr/bin/perl
    sub max {
        my ( $curr_max ) = shift @_;
        foreach ( @_ ) {
            if ( $_ > $curr_max ) { $curr_max = $_; }
        }
        $curr_max
    }
    $z = &max ( );
    if ( defined $z ) { print $z . "\n"; }
    else { print "undefined\n"; }

Regular expressions (1)

• A regular expression (also called a pattern) is a template that describes a class of strings. A string can either match or not match the pattern.

• The simplest pattern is one character.

• A character class (the pattern matches any of these characters) is written in square brackets:

    [01234567]    an octal digit
    [0-7]         an octal digit
    [0-9A-F]      a hex digit
    [^A-Za-z]     not a letter (^ "negates")
    [0-9-]        a decimal digit or a minus

Regular expressions (2)

• Metacharacters:

    . (dot)    any character except \n

• Anchors:

    ^          the beginning of a string
    $          the end of a string

• Multipliers:

    *          repeat the preceding item 0 or more times
    +          repeat the preceding item 1 or more times
    ?          make the preceding item optional
    {n}        repeat n times
    {n,m}      repeat n to m times (n <= m)
    {n,}       repeat n or more times
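The effect of the anchors and multipliers can be checked with a small test script. This is a sketch: the helper matches is invented for the illustration, and it wraps the pattern in (?: ), Perl's non-capturing group, which the slides do not cover.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: does the whole string match the pattern?
# The pattern is anchored with ^ and $ and grouped with (?: ).
sub matches {
    my ($string, $pattern) = @_;
    return $string =~ /^(?:$pattern)$/ ? 1 : 0;
}

print matches("",      'a*'),      "\n";   # 1: * allows zero repetitions
print matches("",      'a+'),      "\n";   # 0: + needs at least one a
print matches("color", 'colou?r'), "\n";   # 1: the u is optional
print matches("aaa",   'a{2}'),    "\n";   # 0: exactly two, and the anchors forbid a third
print matches("aaa",   'a{2,4}'),  "\n";   # 1: two to four repetitions
print matches("a\nb",  'a.b'),     "\n";   # 0: the dot does not match \n
```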

Regular expressions (3)

$x = "01239876AGH";

if ( $x =~ /^0[1-9]{4,}/ ){ print "yes1\n"; }

if ( $x =~ /[A-Z]{3}$/ ){ print "yes2\n"; }

if ( $x =~ /^.*[A-Z]{4}$/ ){ print "yes3\n"; }

• The Boolean operator =~ tries to match a string with a regular expression written inside slashes.

Regular expressions (4)

$x = "01239876AGH";

if ( $x =~ /([0-9]{4}|[A-Z]{3}){2,}/ ){ print "yes4\n"; }

if ( $x =~ /(0?|4)(5|[1abc]{1,})/ ){ print "yes5\n"; }

• Patterns can be grouped by parentheses (the whole pattern becomes one item).

Alternative is denoted by the bar |.

Regular expressions (5)

• The precedence of pattern elements:

    parentheses          ( )
    multipliers          * + ? {n} {n,m} {n,}
    sequence, anchors    ^ $
    alternation          |

• Some character classes are predefined:

                                 class    not class
    digit                        \d       \D
    word char [a-zA-Z0-9_]       \w       \W
    whitespace                   \s       \S

• Some additional anchors:

    word boundary                \b       \B
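The predefined classes and the word-boundary anchor might be used as follows. The sample line is made up, and the sketch relies on two features not covered at this point in the slides: the g modifier and a match in list context, which together return all matches.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $line = "Order 66: ship 3 crates";     # invented sample text

my @numbers = $line =~ /\d+/g;            # every run of digits
my @words   = $line =~ /\b\w+\b/g;        # every complete word (digits count as word chars)
my @fields  = split(/\s+/, $line);        # the pieces between runs of whitespace

print "@numbers\n";                       # prints 66 3
print scalar(@words), "\n";               # prints 5
print "$fields[1]\n";                     # prints 66:
```

The colon stays attached to the second field because split only cuts on whitespace, whereas \w+ stops at the colon.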

Regular expression examples (1)

    $i = "Jim";

                        match
    $i =~ /Jim/;        yes
    $i =~ /J/;          yes
    $i =~ /j/;          no
    $i =~ /j/i;         yes
    $i =~ /\w/;         yes
    $i =~ /\W/;         no

Case is ignored in matching if the postfix i is used.

Regular expression examples (2)

$j = "JjJjJjJj";

$j =~ /j*/; yes: j* can match zero characters, so it matches any string

$j =~ /j+/; yes: matches the first j

$j =~ /j?/; yes: matches the empty string at the start (the j is optional)

$j =~ /j{2}/; no

$j =~ /j{2}/i; yes: ignores case

$j =~ /(Jj){3}/; yes

Regular expression examples (3)

$k = "Boom Boom, out go the lights!";

$k =~ /Jim|Boom/; # yes: matches Boom

$k =~ /(Boom){2}/; # no: a space between Booms

$k =~ /(Boom ){2}/; # no: fails on the comma

$k =~ /(Boom\W){2}/; # yes: \W is space, comma

$k =~ /\bBoom\b/; # yes

$k =~ /\bBoom.*the\b/; # yes

$k =~ /\Bgo\B/; # no: "go" is a complete word

$k =~ /\Bgh\B/; # yes: the "gh" inside "lights"

Regular expression substitution (1)

We can modify a string variable by applying a substitution. The operator is =~ and the substitution is written as:

s/pattern1/pattern2/

$v = "a string to play with";

$v =~ s/^\w+/just a single/;

print "$v\n";

just a single string to play with

Regular expression substitution (2)

Matched patterns are remembered in built-in variables $1, $2, $3 etc. These variables keep their values till the next successful matching operation. Each set of parentheses in a pattern corresponds to a "memory" variable.

# $v == "just a single string to play with"

$v =~ s/(\b\w*\b)(.*)/'$1'$2/;

print "$v\n";

print "$2, $1 $1\n";

'just' a single string to play with

a single string to play with, just just
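The memory variables are also available after an ordinary match, not only inside a substitution. The following sketch rearranges a date; the sub name iso_date and the day-month-year input format are assumptions made for the illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn "12-09-2000" (day-month-year) into "2000-09-12".
sub iso_date {
    my ($text) = @_;
    if ($text =~ /^(\d{1,2})-(\d{1,2})-(\d{4})$/) {
        # $1, $2 and $3 hold what the three pairs of parentheses matched.
        return sprintf("%04d-%02d-%02d", $3, $2, $1);
    }
    return undef;    # no match: $1, $2 and $3 must not be relied on here
}

print iso_date("12-09-2000"), "\n";   # prints 2000-09-12
print iso_date("25-8-2003"), "\n";    # prints 2003-08-25
```

Testing the result of the match before using $1, $2 and $3 matters, because a failed match leaves them holding values from an earlier successful one.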

Regular expression substitution (3)

A substitution can be applied to all occurrences of the pattern, that is, globally:

s/pattern1/pattern2/g

# $v == "'just' a single string to play with"

$v =~ s/\b\w+\b/word/g;

print "$v\n";

'word' word word word word word word

$v =~ s/\b\w*\b$/last/;

print "$v\n";

'word' word word word word word last

Regular expression substitution (4)

$v = "This is a double double word.";

$v =~ s/(\b\w+\b) \1/$1/;

print "$v\n";

This is a double word.

$v = "This is a triple triple triple word.";

$v =~ s/(\b\w+\b) \1 \1/$1/;

print "$v\n";

This is a triple word.

Parentheses as memory can help construct powerful patterns with "instant repetition". Inside the pattern, \1, \2 etc. refer to the substrings matched so far; in the replacement, write $1, $2 etc.

Regular expression substitution (5)

Here is a more realistic example (last year's homework).

    $Day = '0[1-9]|[12][0-9]|3[01]|[1-9]';
    $Month = '0[1-9]|1[012]|[1-9]';
    # Year number up to 31 must have a leading zero or two.
    $Year = '[0-9]{4}|[0-9]{3}|3[2-9]|[4-9][0-9]';
    while(<>){
        # Find all dates, selecting and reinserting the context.
        # $1 and $6 match the context. Superfluous digits,
        # as 43 and 55 in 432001-01-2255, belong in the context.
        # "Dates" such as April 31 or February 30 are allowed.
        # There are no provisions for leap years.
        s/(\D*)(($Year)-($Month)-($Day))(\D|.*$)/$1<date>$2<\/date>$6/g;
        s/(\D*)(($Day)-($Month)-($Year))(\D|.*$)/$1<date>$2<\/date>$6/g;
        print $_;
    }

You will probably need explanations: in class, please.

Regular expression substitution (6)

DATA

Both 12-09-2000 and 25-8-324 are good dates,

but 30-14-1955 and 10-10-10 are not. OTOH, 10-10-010 is.

RESULTS

Both <date>12-09-2000</date> and <date>25-8-324</date> are good dates,

but 30-14-1955 and 10-10-10 are not. OTOH, <date>10-10-010</date> is.

One example run, to show how it works.

In another course

• Predefined variables (lots!)

• More on lists, arrays and hashes

• More on regular expressions

• File management

• Directory management

• Process management

• Perl database facilities

• CGI programming

• ... and more, and much more

Mistakes that novices make (1)

Thanks to Alan Williams for this list. Adapted from Programming Perl, page 361.

1. Testing "all-at-once" instead of incrementally, either bottom-up or top-down.

2. Optimistically skipping print scaffolding to dump values and show progress.

3. Not running with the perl -w switch to catch obvious typographical errors.

4. Leaving off $ or @ or % from the front of a variable.

5. Forgetting the trailing semicolon.

6. Forgetting curly braces around a block.

Mistakes that novices make (2)

7. Unbalanced (), {}, [], "", '', ``, and sometimes <>.

8. Confusing '' and "", or / and \.

9. Using == instead of eq, != instead of ne, = instead of ==, and so on.

   • ('White' == 'Black') and ($x = 5) evaluate as (0 == 0) and (5) and thus are true!

10. Using "else if" instead of "elsif".

11. Putting a comma after the file handle in a print statement.

Mistakes that novices make (3)

12. Not chopping the output of backquotes `date` or not chopping input:

    print "Enter y to proceed: ";
    $ans = <STDIN>;
    chop $ans;
    if ($ans eq 'y') { print "You said y\n";}
    else { print "You did not say 'y'\n";}

13. Forgetting that Perl array subscripts and string indexes normally start at 0, not 1.

14. Using $_, $1, or other side-effect variables, then modifying the code in a way that unknowingly affects or is affected by these.

15. Forgetting that regular expressions are greedy, seeking the longest match not the shortest match.
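Items 9 and 15 are easy to reproduce. The sketch below uses an invented sample string; the non-greedy multiplier +? is standard Perl but is not covered in these slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $html = "<b>bold</b> and <i>italic</i>";    # invented sample text

my ($greedy) = $html =~ /<(.+)>/;     # .+ takes as much as it can
my ($lazy)   = $html =~ /<(.+?)>/;    # .+? stops at the first possible >

print "$greedy\n";    # prints b>bold</b> and <i>italic</i
print "$lazy\n";      # prints b

# Item 9: eq compares strings, == compares numbers.
print "eq: the strings differ\n" unless 'White' eq 'Black';
{
    no warnings 'numeric';    # both strings convert to 0, which -w would report
    print "==: the numbers are equal\n" if 'White' == 'Black';
}
```

Both messages are printed: the strings differ, but as numbers both are 0.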