Programming in Perl regular expressions and m,s operators
Peter Verhás, January 2002.
Pattern Matching Operator
expression =~ m/regexp/options;
$a = "apple";
print "yes!" if $a =~ m/pp/;
The result is TRUE (1) or FALSE (an empty string, which counts as 0 in numeric context).
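A minimal sketch of the two possible results of a match in scalar context. The helper name `match_result` and the second test string are assumptions made for illustration; they do not come from the slides.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the raw result of a scalar-context match.
sub match_result {
    my ($string) = @_;
    my $result = ( $string =~ m/pp/ );
    return $result;
}

my $hit  = match_result("apple");    # contains "pp"
my $miss = match_result("orange");   # does not contain "pp"

print "hit:  '$hit'\n";    # hit:  '1'
print "miss: '$miss'\n";   # miss: ''
```

The failed match prints as an empty string, but it still behaves as 0 when used as a number.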
M operator options
• g global search
• i case insensitive search
• m multi-line string
• s single line string
• o evaluate once only
• x extended regular expression
Now let’s see what a regular expression is; then we will return to the fine points of the m operator.
Regular Expressions
• A regular expression is a string with joker characters and joker expressions.
• We will look at examples to explain it.
Regular Expression to Verify Email (1)
@mail = ( '[email protected]', 'hab.akukk%mikkamakka@jeno', );
for( @mail ){
  if( /^.*\@\w+\..+$/ ){
    print "$_ seems to be a good eMail\n";
  }else{
    print "$_ bad address\n";
  }
}
OUTPUT:
[email protected] seems to be a good eMail
hab.akukk%mikkamakka@jeno bad address
NOTES:
$_ is used as default
m/ is default when / is used
$_ =~ m/^.*@\w+\..+$/
@ would also work instead of \@ but \@ is safe
Regular Expression to Verify Email (2)
/^.*\@\w+\..+$/
• ^ at the start of the string
• .* zero or more of any character
– * means zero or more of what stands before
• \@ a single @ character
• \w+ one or more word characters (letters, digits or underscore)
– + means one or more of what stands before
• \. one . (dot) character
– a special regexp character is escaped with \
• .+ one or more of any character
• $ until the end of the string
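The same pattern can be written with the x option so that each piece carries its own comment. This is only a sketch: the helper name `looks_like_email` and the address `user@example.com` are assumptions for illustration, and the pattern is the loose check from the slide, not a real email validator.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern from the slide.
sub looks_like_email {
    my ($address) = @_;
    return $address =~ m/
        ^      # start of the string
        .*     # zero or more of any character
        \@     # a literal at sign
        \w+    # one or more word characters
        \.     # a literal dot
        .+     # one or more of any character
        $      # end of the string
    /x ? 1 : 0;
}

for my $address ( 'user@example.com', 'hab.akukk%mikkamakka@jeno' ) {
    my $verdict = looks_like_email($address) ? 'seems good' : 'bad address';
    printf "%-28s %s\n", $address, $verdict;
}
```

The second address is rejected because nothing after its @ contains a dot.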
Search and Replace Example of Regular Expressions
$text = 'JavaScript is not used on island Java.';
$text =~ s/Java(?!Script)/Borneo/;
print $text;
OUTPUT:
JavaScript is not used on island Borneo.
NOTES:
Operator s will be discussed later in detail.
(?! ) is a zero-length negative look-ahead, detailed later.
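To see what the look-ahead contributes, the sketch below runs the replacement with and without it. The variable names are illustrative, and the g option is added so that every occurrence is considered.

```perl
use strict;
use warnings;

my $original = 'JavaScript is not used on island Java.';

# Copy the string, then modify the copy in place.
(my $plain   = $original) =~ s/Java/Borneo/g;              # no look-ahead
(my $guarded = $original) =~ s/Java(?!Script)/Borneo/g;    # skips "JavaScript"

print "$plain\n";     # BorneoScript is not used on island Borneo.
print "$guarded\n";   # JavaScript is not used on island Borneo.
```

The look-ahead only inspects the text after "Java"; it does not consume it, so "Script" is never part of the match.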
Meta (joker) Character
• . any character but new line
• ^ start of string
• $ end of string
• \ escaping the next character
• \w any word character (letter, digit or underscore)
• \W any non-word character
• \s any white space
• \S any non-white space
These are only examples; there are other meta characters, see the Perl manual.
Parentheses (1)
$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";
#
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((l|s)(a|l))/;
print "$1 $2 $3 $4 $5 $6\n";
OUTPUT:
Hook ok is la l a
Hook ok i sl s l
NOTES:
Numbering is in the order of the opening parentheses
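A smaller illustration of the numbering rule, using a date string chosen for this sketch (it is not from the slides). The outer group opens first, so it becomes $1 even though it closes after $2 and $3.

```perl
use strict;
use warnings;

my $date = '2002-01-15';

# Groups are numbered by the position of their opening parenthesis:
#   $1 = ((\d{4})-(\d\d))   $2 = (\d{4})   $3 = (\d\d)   $4 = (\d\d)
my @groups = $date =~ /((\d{4})-(\d\d))-(\d\d)/;

print join(' | ', @groups), "\n";   # 2002-01 | 2002 | 01 | 15
```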
Parentheses without $n
$text = 'Hook is not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";
$text = 'Hook i not used on island Java.';
$text =~ /(Ho(ok))\s(is?).*\3((?:l|s)(a|l))/;
print "$1 $2 $3 $4 $5 .$6.\n";
OUTPUT:
Hook ok is la a ..
Hook ok i sl l ..
NOTES:
(?: ) groups a sub-expression without creating a reference
$6 is undefined here, so it prints as an empty string
Character classes
• List of characters between [ and ]
• Interval, e.g. [a-f]
• Negative character set [^a-f]
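A short sketch of the three forms in use. The helper name `classify` and the sample words are assumptions for illustration only.

```perl
use strict;
use warnings;

# Hypothetical helper that labels a string using character classes.
sub classify {
    my ($s) = @_;
    return 'hex digits' if $s =~ /^[0-9a-fA-F]+$/;   # intervals inside [ ]
    return 'no vowels'  if $s =~ /^[^aeiou]+$/;      # negated character set
    return 'other';
}

for my $word (qw(cafe 2002 rhythm apple)) {
    print "$word: ", classify($word), "\n";
}
```

Here `cafe` and `2002` consist only of hexadecimal digits, `rhythm` contains none of the listed vowels, and `apple` falls through to the last case.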
Repetitions
• * zero or more times
• + one or more times
• ? zero or one time
• {n} exactly n times
• {n,} at least n times
• {n,m} at least n times, at most m times
NOTES:
There is {n,} but there is not {,m} (Perl accepts {,m} only from version 5.34 on).
Why? (hint: {0,m} works, but {n,???}??)
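The sketch below tries each repetition form against strings with zero to three `b` characters. The helper `matching_quantifiers`, the pattern table and the test strings are all assumptions made for illustration.

```perl
use strict;
use warnings;

# Each entry pairs a label with a pattern that applies one quantifier to 'b'.
my @patterns = (
    [ 'b*',     qr/^ab*c$/     ],
    [ 'b+',     qr/^ab+c$/     ],
    [ 'b?',     qr/^ab?c$/     ],
    [ 'b{2}',   qr/^ab{2}c$/   ],
    [ 'b{2,}',  qr/^ab{2,}c$/  ],
    [ 'b{0,2}', qr/^ab{0,2}c$/ ],   # {0,m} stands in for the missing {,m}
);

# Hypothetical helper: returns the labels of all patterns that match $s.
sub matching_quantifiers {
    my ($s) = @_;
    return map { $_->[0] } grep { $s =~ $_->[1] } @patterns;
}

for my $s (qw(ac abc abbc abbbc)) {
    printf "%-6s %s\n", $s, join(' ', matching_quantifiers($s));
}
```

For example, `abbc` is accepted by every form except `b?`, while `abbbc` is accepted only by `b*`, `b+` and `b{2,}`.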
Greedy repetition
• Repetitions are greedy: they eat as many characters as possible. A ? after the repetition makes it non-greedy.
$text = 'Hook is not used on island Java.';
$text =~ /(.*)is/;       #1
print "$1.\n";
$text =~ /(.*?)is/;      #2
print "$1.\n";
$text =~ /(.*?)is.*n/;   #3
print "$1.\n";
OUTPUT:
Hook is not used on .
Hook .
Hook .
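Greedy matching is a common source of surprises when a pattern has a closing delimiter. The sketch below uses a made-up string with angle-bracket tags to contrast the two behaviours.

```perl
use strict;
use warnings;

my $html = '<b>bold</b> and <i>italic</i>';

my ($greedy) = $html =~ /<(.+)>/;    # .+ runs to the last > in the string
my ($lazy)   = $html =~ /<(.+?)>/;   # .+? stops at the first > it can

print "greedy: $greedy\n";   # greedy: b>bold</b> and <i>italic</i
print "lazy:   $lazy\n";     # lazy:   b
```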
Other extensions
• Other UNIX tools also use simpler, similar regular expressions
• Perl regular expressions are more powerful
List of some extensions on the next slides
Regular expression comment
(?# comment comes here)
• Use comments!
Regular Expression Parentheses
• (?: sub expression w/o $n)
(?: we have discussed it already beforehand as it came up in an example, but this is the proper
place to discuss this construct.)
Positive look forward
(?= subregexp)
$t = 'jamaica rum rum kingston rum';
$t =~ s/([aeoui])(?=\w)/uc($1)/ge;
print $t;
• OUTPUT:
jAmAIca rUm rUm kIngstOn rUm
Example: convert to upper case every vowel that is followed by another word character, i.e. that does not stand at the end of a word.
Negative look forward
(?! subregexp)
$t = 'jamaica rum rum kingston rum';
$t =~ s/([aeoui])(?!\w)/uc($1)/ge;
print $t;
• OUTPUT:
jamaicA rum rum kingston rum
Example: convert to upper case every vowel that stands at the end of a word.
Option change inside the regular expression
(?imsx)
• This can be used inside the m/ or s/ operator.
• The g and o options cannot be set this way.
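One situation where an embedded option helps is when the pattern arrives as a string and the operator's own options cannot be changed. The sketch below assumes such a scenario; the variable names and the test string are illustrative.

```perl
use strict;
use warnings;

# Hypothetical scenario: the pattern is supplied as text, e.g. from a config file.
my $pattern = '(?i)apple';

my $plain  = ( 'Apple pie' =~ /apple/ )    ? 'match' : 'no match';
my $inline = ( 'Apple pie' =~ /$pattern/ ) ? 'match' : 'no match';

print "without (?i): $plain\n";    # without (?i): no match
print "with (?i):    $inline\n";   # with (?i):    match
```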
Now we go back to operator m/ and discuss some details.
M operator array result
@k = "abbabaa" =~ m/(bb).+(a.)/;
print $#k; print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:
1 bb aa
NOTES:
The parts of the expression to capture are enclosed in ( ).
$1, $2 ... are the default variables where the substrings are put; in list context the match also returns them as a list.
M operator option g (1)
@k = "abbabaa" =~ m/(b)(a)/g;
print $#k,' ',$k[0],' ',$k[1],' ',$k[2],' ',$k[3],"\n";
OUTPUT:
3 b a b a
M operator option g (2)
$t = "abbabaa";
while( $t =~ m/(ab)(b|a)/g ){
print pos($t)," $1 $2\n";
}
OUTPUT:
3 ab b
6 ab a
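The same loop shape can be used to collect where each match starts. This is a sketch with an illustrative string and variable names; `pos` returns the offset just after the last match, so the match length is subtracted.

```perl
use strict;
use warnings;

my $t = 'jamaica rum rum kingston rum';
my @starts;

# In scalar context each m//g match resumes where the previous one ended.
while ( $t =~ m/(rum)/g ) {
    push @starts, pos($t) - length($1);
}

print "rum starts at offsets: @starts\n";   # rum starts at offsets: 8 12 25
```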
M operator option i
• Case insensitive match
print '.',"apple" =~ /AppLe/,".\n";
print '.',"apple" =~ /AppLe/i,".\n";
• prints
..
.1.
M operator options m and s
$t = "mah\na\nb";
while( $t =~ /(.?.)$/mg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/sg ){ print '.',$1; }
print ".\n";
while( $t =~ /(.?.)$/g ){ print '.',$1; }
print ".\n";
• OUTPUT:
.ah.a.b.
.
b.
.b.
m lets $ match before every \n in the string
s lets . match \n as well (otherwise . is any character but \n)
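The two options are independent of each other, which is easier to see when they are tested separately. A minimal sketch with an illustrative three-line string:

```perl
use strict;
use warnings;

my $t = "one\ntwo\nthree";

my @last_only = $t =~ /(\w+)$/g;    # the anchor matches only at the end of the string
my @each_line = $t =~ /(\w+)$/mg;   # with m it also matches before every newline

my $dot_plain  = ( $t =~ /one.two/ )  ? 1 : 0;   # . does not match a newline
my $dot_single = ( $t =~ /one.two/s ) ? 1 : 0;   # with s it does

print "without m: @last_only\n";   # without m: three
print "with m:    @each_line\n";   # with m:    one two three
print "without s: $dot_plain, with s: $dot_single\n";   # without s: 0, with s: 1
```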
M operator option o
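• Evaluate (interpolate and compile) the regular expression only once to save processor time
$t = "al brab";
$a = 'al'; $b = 'rab';
&q;&p;
$b = 'fe';
&q;&p;
sub q { print ' q',$t =~ /$a\sb$b/o }
sub p { print ' p',$t =~ /$a\sb$b/ }
• prints
 q1 p1 q1 p
q still matches after $b has changed, because its regular expression was built only once.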
M operator option x
@k = "abbabaa" =~ m/(bb) #exactly two 'b' characters, captured as the first group
.+ #one or more any-character
(a.) #a letter 'a' and exactly one any-character
/x; #space and comment allowed
print $#k;
print ' ',$k[0],' ',$k[1],"\n";
OUTPUT:
1 bb aa
This option allows white space and comments inside the regular expression to ease readability (a literal space is then written as "\ ").
Operator s
$text =~ s/regexp/replace/egimosx
• Options:
– e replace is interpreted as expression
– g global search and replace
– i case insensitive search
– m string is treated as multi-line
– o regular expression is evaluated only once
– s string is treated as single-line
– x extended syntax for the regexp
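The e option is the one that has no counterpart in the m operator, so a small sketch may help. The string and variable names are illustrative; the same substitution is run with and without e.

```perl
use strict;
use warnings;

my $order = 'pay 3 apples and 12 pears';

# Copy the string, then modify the copy in place.
(my $as_text = $order) =~ s/(\d+)/$1 * 2/g;    # replacement is a plain string
(my $as_code = $order) =~ s/(\d+)/$1 * 2/ge;   # replacement is evaluated as Perl code

print "$as_text\n";   # pay 3 * 2 apples and 12 * 2 pears
print "$as_code\n";   # pay 6 apples and 24 pears
```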
Global Search and Replace
$t = "abbab" ;
$t =~ s/ab/aa/g;
print $t;
OUTPUT:
aabaa
The g option replaces all occurrences of the search regular expression with the replacement string.
m and s operators with different delimiters
• / is the default, but you can use
• ' to have a non-interpolated string
• Other non-alphanumeric characters
• () {} [] with matching character pairs
– In this case s{search}{replace}
m and s operators with different delimiters example
$text = 'a@bba@bbabb';
@b = ('bba');
$text =~ s{@b}{q}g;
print "$text\n";
$text = 'a@bba@bbabb';
$text =~ s'@b'q'g;
print "$text\n";
OUTPUT:
a@q@qbb
aqbaqbabb
@b is evaluated in the first search but not in the second
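A frequent practical reason to pick another delimiter is a pattern that itself contains slashes. The sketch below uses made-up paths to compare the two spellings of the same substitution.

```perl
use strict;
use warnings;

my $path = '/usr/local/bin';

# With / as the delimiter every slash in the pattern must be escaped.
(my $escaped = $path) =~ s/\/usr\/local/\/opt/;

# With braces as delimiters the slashes need no escaping.
(my $braced = $path) =~ s{/usr/local}{/opt};

print "$escaped\n";   # /opt/bin
print "$braced\n";    # /opt/bin
```

Both forms do the same work; the braced one is simply easier to read.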
Thank you for your kind attention.