PERL Regular Expression

Unit 6: Regular Expression, VSRivera, IBM Learning Services, Worldwide Certified Manual

Objectives

After completing this unit, you should understand the concepts of regular expressions and be able to use:
  Single-character patterns
  Grouping patterns
  Anchoring patterns
  Pattern precedence
  Matching and substitution operators

Regular Expression

Used to describe a string of characters

Perl’s match operator uses regular expressions to search for matching text

The substitute operator uses regular expressions to select the text to be replaced
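As a minimal sketch of the two operators just described, the following uses the match operator to test for a pattern and the substitute operator to replace the matched text. The sample string and the variable names ($text, $found, $changed) are assumptions made up for illustration, not part of the original slides.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = "Perl makes easy things easy";

# Match operator: true if the pattern occurs anywhere in $text.
my $found = ($text =~ /easy/) ? 1 : 0;

# Substitute operator: copy $text, then replace the first occurrence
# of the pattern in the copy.
(my $changed = $text) =~ s/easy/simple/;

print "found: $found\n";
print "changed: $changed\n";
```

Without a trailing g modifier, the substitution replaces only the first match, so the second "easy" is left alone.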

Regular Expression

A regular expression is made up of patterns of ordinary characters and special metacharacters

Regular expressions are usually shown between slashes: /pattern/

Perl’s regular expressions are similar to those of other UNIX programs

Types of Patterns

Single-character patterns
  A single alphanumeric character matches itself
  A single dot (.) matches any single character except a newline
  A list of characters in [ ] matches any single character contained in the list; this is called a character class

Types of Patterns

Multiple-character patterns
Anchoring patterns: match a position rather than characters

Types of Patterns

To match a metacharacter literally (for example . [ ]), protect it with a backslash: \. \[ \]

The simplest pattern is an alphanumeric character, which matches itself. /a/ matches the first “a” in a string, /S/ matches the first “S”, and so on.
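The difference between an unescaped dot and a backslash-protected dot can be sketched as follows. The sample strings and the names @samples and @results are hypothetical, chosen only to show one case where each pattern matches or fails.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample strings for illustration.
my @samples = ("a.b", "axb", "a\nb");

my @results;
for my $s (@samples) {
    my $any     = ($s =~ /a.b/)  ? 1 : 0;   # dot: any character except a newline
    my $literal = ($s =~ /a\.b/) ? 1 : 0;   # escaped dot: only a literal "."
    push @results, "$any$literal";
}

# One two-digit flag pair per sample: first digit for /a.b/, second for /a\.b/.
print join(" ", @results), "\n";   # prints "11 10 00"
```

The third sample fails both patterns because the character between "a" and "b" is a newline, which the dot does not match.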

Character Class Examples

/[AEIOU]/       an uppercase English vowel
/[2468]/        a single, non-zero, even digit
/[0-5]/         define a range using a hyphen
/[789-]/        put the hyphen first or last to match a hyphen
/[a-z]/         a lowercase alphabetic
/[^0-9]/        an initial caret specifies a negated character class
/[^a-z]/        NOT a lowercase alphabetic English letter
/[A-Za-z0-9]/   an alphanumeric; multiple ranges are OK

Character Class

A character class matches any single character in the brackets

Ascending ranges are allowed. To match a hyphen, make it the first or last character in the brackets so that it cannot be mistaken for a range, or escape it with a backslash.

Character Class

To include a right bracket in a class, put the close bracket immediately after the open bracket: [][a-z] matches [ or ] or a lowercase letter. Alternatively, escape it with a backslash.

If the first character of a class is a ^, the whole class is negated: it matches a single character not in the class.

Multiple ranges are also possible
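The character class rules above (ranges, a trailing hyphen, negation, and a leading right bracket) can be checked with a short sketch like this one. The test characters and the names @checks and @results are assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry pairs a character class with a hypothetical test character.
my @checks = (
    [ qr/[0-5]/,   "3" ],   # inside the range 0-5
    [ qr/[789-]/,  "-" ],   # hyphen placed last is a literal hyphen
    [ qr/[^a-z]/,  "Q" ],   # negated class: "Q" is not a lowercase letter
    [ qr/[^a-z]/,  "q" ],   # negated class: "q" is excluded, so no match
    [ qr/[][a-z]/, "]" ],   # "]" right after "[" is a literal right bracket
);

# 1 where the character matches its class, 0 where it does not.
my @results = map { $_->[1] =~ $_->[0] ? 1 : 0 } @checks;
print "@results\n";   # prints "1 1 1 0 1"
```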

Character Class Shortcuts

Perl provides a set of shortcuts for character classes

Syntax:

\d   digits: [0-9]
\D   non-digits: [^0-9]
\w   word characters: [A-Za-z0-9_]
\W   non-word characters: [^A-Za-z0-9_]
\s   whitespace: [ \t\n\r\f]
\S   non-whitespace: [^ \t\n\r\f]

Character Class Shortcuts

Shortcuts are “locale aware” if locales are available and configured

Example: if (/\d/) { print "found a digit!"; }

Character Class Shortcuts

#!/usr/bin/perl -w
print "\nEnter a value\n";
$_ = <>;
if (/\d/) { print "Digit\n"; }
else { print "not a digit!\n"; }

Character Class Shortcuts

#!/usr/bin/perl -w
print "\nEnter a value\n";
$_ = <>;
# Matches if the input contains a digit, an underscore, or a dot
if (/[\d_.]/) { print "Digit, or underscore, or real\n"; }
else { print "Not a digit, or underscore, or real!\n"; }

#!/usr/bin/perl -w
print "\nEnter a value\n";
$_ = <>;
# [A-Za-z0-9_] is the class that \w stands for
if (/[A-Za-z0-9_]/) { print "Valid Characters\n"; }
else { print "Not a Valid Character!\n"; }

Multiple Character Patterns

We usually need to match more than one character

There are four ways of grouping patterns to match multiple characters:
  Sequence
  Alternation
  Multipliers
  Parentheses as memory and for precedence
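Parentheses used as memory can be sketched as follows: the text matched inside each pair of parentheses becomes available in $1, $2, and so on after a successful match. The input line and the names $line, $major, and $minor are hypothetical, chosen only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input line for illustration.
my $line = "Linux Version 2.4.7";

my ($major, $minor) = ("", "");
if ($line =~ /(\d)\.(\d)\.\d/) {
    # $1 and $2 hold the text matched by the first and second
    # pairs of parentheses.
    ($major, $minor) = ($1, $2);
}

print "major=$major minor=$minor\n";   # prints "major=2 minor=4"
```

The third digit is matched but not remembered, because it is not enclosed in parentheses.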

Sequence of Patterns

A sequence of patterns matches the sequence of characters matched by the patterns

Example:

if (/112/) { print "Emergency number!"; }
  Matches “1” followed by “1” followed by “2”

if (/XyzzY/) { print "Knows the magic word!"; }
  Matches “X” then “y” then “z” then “z” then “Y”

if (/2\.4\.\d/) { print "Reasonably current"; }
  Matches 2.4.any-single-digit

Sequence of Patterns

#!/usr/bin/perl -w
print "Enter a value\n";
$_ = <>;
if (/911/) { print "Emergency Number\n"; }
else { print "Not an Emergency Number\n"; }

Sequence of Patterns

The simplest grouping pattern is a sequence of patterns

The target string must match all the patterns, in the same order, for the match to succeed.

Sequence of Patterns

#!/usr/bin/perl -w
print "Enter Linux Version Number: ";
$_ = <>;
if (/Linux Version 2\.4\.\d/) { print "Valid Linux Version\n"; }
else { print "Unknown Linux Version\n"; }

Sequence of Patterns

#!/usr/bin/perl -w
print "Enter a word: ";
$_ = <>;
# $ anchors the match at the end of the string, ^ at the beginning
if (/abc$/) { print "abc at the end of the string\n"; }
elsif (/^abc/) { print "abc at the beginning of the string\n"; }
else { print "No abc\n"; }

Sequence of Patterns

#!/usr/bin/perl -w
print "Enter a word: ";
$_ = <>;
# The trailing i makes the match case-insensitive
if (/abc$/i) { print "abc at the end of the string\n"; }
elsif (/^abc/i) { print "abc at the beginning of the string\n"; }
else { print "No abc\n"; }

Matching Alternatives

Alternation
  Enables multiple patterns to be tested
  Separate the alternatives with a |

Syntax:

pattern1|pattern2

Matching Alternatives

#!/usr/bin/perl -w
print "Enter a number/letter: ";
$_ = <>;
# Matches 911, or an uppercase letter, or a digit followed by any character
if (/911|[A-Z]|\d./) { print "Valid Number/Letter\n"; }
else { print "Not a Valid Number/Letter\n"; }
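Because alternation has lower precedence than a sequence (including anchors), parentheses are often needed to limit what the | applies to. The sketch below compares the two forms; the sample word and the names $word, $loose, and $grouped are assumptions made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample word for illustration.
my $word = "abcdef";

# Without parentheses the alternatives are "^abc" and "xyz$":
# the string may start with abc OR end with xyz.
my $loose = ($word =~ /^abc|xyz$/) ? 1 : 0;

# With parentheses both anchors apply to each alternative:
# the whole string must be exactly abc or exactly xyz.
my $grouped = ($word =~ /^(abc|xyz)$/) ? 1 : 0;

print "loose=$loose grouped=$grouped\n";   # prints "loose=1 grouped=0"
```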

Pattern Multipliers

A Multiplier is a special character applied to the immediately preceding pattern

A pattern and its multiplier can match a variable number of characters.

Multipliers, or quantifiers, enable you to match variable-length patterns

Pattern Multipliers

Syntax:

?       zero or one
*       zero or more
+       one or more
{m,n}   from m to n occurrences
{m,}    m or more
{,n}    at most n (accepted from Perl 5.34; write {0,n} in earlier versions)
{i}     exactly i occurrences

Pattern Multipliers

Example:

/ca?t/       # ct cat
/ye*s/       # ys yes yees yeees ...
/wh+y/       # why whhy whhhy ...
/wh{5}0/     # w, exactly five h, then 0
/ca{3,}r/    # caaar caaaar ...
/co{1,4}w/   # cow coow cooow coooow

Pattern Multipliers

/A*/
/y.{5}wdg.{2,3}s/   # the dot matches any character except a newline
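The multipliers above can be tried out with a small sketch like the following. The test strings and the names @tests and @results are hypothetical; the ^ and $ anchors are added here so that each pattern has to account for the whole string.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Each entry pairs an anchored pattern with a hypothetical test string.
my @tests = (
    [ qr/^ca?t$/,     "ct"    ],   # ? allows zero "a"
    [ qr/^ca?t$/,     "caat"  ],   # ? allows at most one "a": no match
    [ qr/^ye*s$/,     "yeees" ],   # * allows any number of "e"
    [ qr/^wh+y$/,     "wy"    ],   # + needs at least one "h": no match
    [ qr/^co{1,4}w$/, "cooow" ],   # three "o" is within 1 to 4
    [ qr/^ca{3,}r$/,  "caar"  ],   # two "a" is fewer than 3: no match
);

# 1 where the string matches its pattern, 0 where it does not.
my @results = map { $_->[1] =~ $_->[0] ? 1 : 0 } @tests;
print "@results\n";   # prints "1 0 1 0 1 0"
```

Without the anchors, a pattern such as /ca?t/ would also succeed on any longer string that merely contains "ct" or "cat".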

Midterm Quiz 1

Get ¼ sheet of yellow pad