Objectives
After completing this unit, you should understand the concepts of regular expressions and be able to use:
Single-character patterns
Grouping patterns
Anchoring patterns
Pattern precedence
Matching and substitution operators
Regular Expression
Used to describe a string of characters
Perl’s match operator uses regular expressions to search for matching text
The substitute operator uses regular expressions to select the text to be replaced
Regular Expression
A regular expression is made up of patterns of ordinary characters and special metacharacters
Regular expressions are usually shown between slashes: /pattern/
Perl’s regular expressions are similar to those of other UNIX programs
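The two operators named above can be sketched as follows. This is a minimal illustration, not taken from the slides: the sample string and the variable names are assumptions, and =~ is Perl's binding operator, which applies a pattern to a variable other than $_.

```perl
use strict;
use warnings;

# Hypothetical sample text, chosen only for illustration.
my $text = "Perl is a practical language";

# Match operator: /pattern/ is true when the pattern occurs in the string.
my $found = ($text =~ /practical/) ? 1 : 0;
print "match found\n" if $found;

# Substitution operator: s/pattern/replacement/ replaces the matched text.
# The copy-then-modify idiom leaves $text unchanged.
(my $changed = $text) =~ s/practical/pragmatic/;
print "$changed\n";
```

Without =~, both operators work on the default variable $_, which is the style the later examples in this unit use.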
Types of Patterns
Single character patterns
A single alphanumeric character matches itself
A single dot (.) matches any single character except a newline
A list of characters in [ ] matches any single character contained; this is called a character class
Types of Patterns
Multiple character patterns
Anchoring patterns: match a position rather than characters
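A small sketch of anchoring, assuming the two most common anchors: ^ for the start of the string and $ for the end. The helper name abc_position is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Report where "abc" occurs, using anchors to test positions.
sub abc_position {
    my ($s) = @_;
    return "whole"  if $s =~ /^abc$/;   # nothing before or after abc
    return "start"  if $s =~ /^abc/;    # abc at the beginning
    return "end"    if $s =~ /abc$/;    # abc at the end
    return "inside" if $s =~ /abc/;     # abc somewhere, unanchored
    return "none";
}

for my $s ("abc", "abcdef", "xyzabc", "xabcx", "xyz") {
    print "$s: ", abc_position($s), "\n";
}
```

The anchors consume no characters; they only constrain where the rest of the pattern may match. Note that $ also matches just before a trailing newline, which is why it works on unchomped input.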
Types of Patterns
To match a metacharacter literally (for example . [ ]), protect it with a backslash: \. \[ \]
The simplest pattern is an alphanumeric character, which matches itself. /a/ matches the first “a” in a string, /S/ matches the first “S”, and so on.
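The difference between a metacharacter and its escaped form can be sketched like this. The helper names are hypothetical; the patterns /\./ and /./ are the ones described above.

```perl
use strict;
use warnings;

# Returns 1 if the string contains a literal dot, 0 otherwise.
sub has_literal_dot {
    my ($s) = @_;
    return $s =~ /\./ ? 1 : 0;
}

# Returns 1 if the string contains any character other than a newline.
sub has_any_char {
    my ($s) = @_;
    return $s =~ /./ ? 1 : 0;
}

print has_literal_dot("file.txt"), "\n";   # escaped dot needs a real "."
print has_literal_dot("filetxt"), "\n";
print has_any_char("filetxt"), "\n";       # unescaped dot matches any character
print has_any_char("\n"), "\n";            # except a newline
```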
Character Class Examples
/[AEIOU]/       an uppercase English vowel
/[2468]/        a single, non-zero, even digit
/[0-5]/         define a range using a hyphen
/[789-]/        put the hyphen first or last to match a hyphen
/[a-z]/         a lowercase alphabetic
/[^0-9]/        an initial caret specifies a negated character class
/[^a-z]/        NOT a lowercase alphabetic English letter
/[A-Za-z0-9]/   an alphanumeric; multiple ranges are OK
Character Class
A character class matches any single character in the brackets
Ascending ranges are allowed. To match a hyphen, make it the first or last character in the brackets so that it cannot be mistaken for a range, or escape it with a backslash.
Character Class
To match a class that includes a right bracket, put the close bracket immediately after the open bracket: [][a-z] matches [ or ] or a lowercase letter. Or escape the brackets with a backslash.
If the first character of a class is a ^, the whole class is negated: it matches a single character not in the class.
Multiple ranges are also possible
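The special cases for character classes (a trailing hyphen, a leading right bracket, a leading caret) can be tried out with a sketch like the one below. The helper names are hypothetical; the three patterns come from the slides.

```perl
use strict;
use warnings;

# Each helper returns 1 on a match and 0 otherwise.

# Hyphen placed last, so it is a literal hyphen and not a range.
sub has_hyphen_or_789 { my ($s) = @_; return $s =~ /[789-]/  ? 1 : 0; }

# Right bracket placed first, so it is a member of the class.
sub has_bracket_or_lc { my ($s) = @_; return $s =~ /[][a-z]/ ? 1 : 0; }

# Leading caret negates the class: any character that is not a digit.
sub has_non_digit     { my ($s) = @_; return $s =~ /[^0-9]/  ? 1 : 0; }

print has_hyphen_or_789("a-b"), "\n";
print has_bracket_or_lc("]"), "\n";
print has_non_digit("12345"), "\n";
```

A negated class still has to match one character, so /[^0-9]/ fails on an empty string and succeeds on a digit string that ends in a newline.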
Character Class Shortcuts
Perl provides a set of shortcuts for character classes
Syntax:
\d   digits: [0-9]
\D   non-digits: [^0-9]
\w   word characters: [A-Za-z0-9_]
\W   non-word characters: [^A-Za-z0-9_]
\s   whitespace: [ \t\n\r\f]
\S   non-whitespace: [^ \t\n\r\f]
Character Class Shortcuts
Shortcuts are “locale aware” if locales are available and configured
Example: if (/\d/) { print "found a digit!"; }
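As a rough illustration of the shortcuts, the sketch below classifies single characters. The function name classify and the order of the tests are assumptions made for this example; digits are tested first because \w also matches digits.

```perl
use strict;
use warnings;

# Classify a single character using the shortcut classes.
sub classify {
    my ($ch) = @_;
    return "digit"      if $ch =~ /\d/;
    return "word"       if $ch =~ /\w/;   # letters and underscore reach here
    return "whitespace" if $ch =~ /\s/;
    return "other";
}

for my $ch ("7", "_", " ", "-") {
    print "'$ch' is ", classify($ch), "\n";
}
```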
Character Class Shortcuts
#!/usr/bin/perl -w
print "\nEnter a value\n";
$_ = <>;
if (/\d/) {
    print "Digit\n";
} else {
    print "not a digit!\n";
}
Character Class Shortcuts
#!/usr/bin/perl -w
print "\nEnter a value\n";
$_ = <>;
# The class matches a digit, an underscore, or a period (decimal point)
if (/[\d_.]/) {
    print "Digit, underscore, or period\n";
} else {
    print "Not a digit, underscore, or period!\n";
}
#!/usr/bin/perl -w
print "\nEnter a value\n";
$_ = <>;
if (/[A-Za-z0-9_]/) {
    print "Valid Characters\n";
} else {
    print "Not a Valid Character!\n";
}
Multiple Character Patterns
We usually need to match more than one character
There are four ways of grouping patterns to match multiple characters:
Sequence
Alternation
Multipliers
Parentheses as memory and for precedence
Sequence of Patterns
A sequence of patterns matches the sequence of characters matched by the patterns
Example:
if (/112/) { print "Emergency number!"; }
Matches "1" followed by "1" followed by "2"
if (/XyzzY/) { print "Knows the magic word!"; }
Matches "X" then "y" then "z" then "z" then "Y"
if (/2\.4\.\d/) { print "Reasonably current"; }
Matches 2.4.any-single-digit
Sequence of Patterns
#!/usr/bin/perl -w
print "Enter a value\n";
$_ = <>;
if (/911/) {
    print "Emergency Number\n";
} else {
    print "Not an Emergency Number\n";
}
Sequence of Patterns
The simplest grouping pattern is a sequence of patterns
The target string must match all the patterns, in the same order, for the match to succeed.
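The ordering rule can be checked with a short sketch. The helper name is_2_4_series is hypothetical; the pattern is the 2.4.any-single-digit sequence from the earlier slide, with every dot escaped.

```perl
use strict;
use warnings;

# Returns 1 if the string contains "2.4." followed by a single digit.
sub is_2_4_series {
    my ($v) = @_;
    return $v =~ /2\.4\.\d/ ? 1 : 0;
}

for my $v ("2.4.7", "2.6.1", "2x4x7", "2.4.x", "kernel 2.4.20") {
    print "$v: ", is_2_4_series($v), "\n";
}
```

Because the pattern is not anchored, the sequence may appear anywhere in the string, so "kernel 2.4.20" matches as well.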
Sequence of Patterns
#!/usr/bin/perl -w
print "Enter Linux Version Number: ";
$_ = <>;
if (/Linux Version 2\.4\.\d/) {
    print "Valid Linux Version\n";
} else {
    print "Unknown Linux Version\n";
}
Sequence of Patterns
#!/usr/bin/perl -w
print "Enter a word: ";
$_ = <>;
if (/abc$/) {
    print "abc at the end of the string\n";
} elsif (/^abc/) {
    print "abc at the beginning of the string\n";
} else {
    print "No abc\n";
}
Sequence of Patterns
#!/usr/bin/perl -w
print "Enter a word: ";
$_ = <>;
# The i modifier makes the match case-insensitive
if (/abc$/i) {
    print "abc at the end of the string\n";
} elsif (/^abc/i) {
    print "abc at the beginning of the string\n";
} else {
    print "No abc\n";
}
Matching Alternatives
Alternation enables multiple patterns to be tested
Separate the alternatives with a |
Syntax: pattern1|pattern2
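A minimal sketch of alternation, assuming two emergency numbers as the alternatives. The helper name is hypothetical.

```perl
use strict;
use warnings;

# Returns 1 if the string contains either 112 or 911.
sub mentions_emergency {
    my ($s) = @_;
    return $s =~ /112|911/ ? 1 : 0;
}

for my $s ("call 911 now", "112", "123") {
    print "$s: ", mentions_emergency($s), "\n";
}
```

The match succeeds as soon as any one alternative matches; the alternatives are tried from left to right at each position in the string.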
Matching Alternatives
#!/usr/bin/perl -w
print "Enter a number/letter: ";
$_ = <>;
if (/911|[A-Z]|\d./) {
    print "Valid Number/Letter\n";
} else {
    print "Not a Valid Number/Letter\n";
}
Pattern Multipliers
A multiplier is a special character applied to the immediately preceding pattern
A pattern and its multiplier can match a variable number of characters.
Multipliers, or quantifiers, enable you to match variable-length patterns
Pattern Multipliers
Syntax:
?       zero or one
*       zero or more
+       one or more
{m,n}   from m to n occurrences
{m,}    m or more
{,n}    at most n (accepted from Perl 5.34; write {0,n} in older versions)
{i}     exactly i occurrences
Pattern Multipliers
Example:
/ca?t/       # ct cat
/ye*s/       # ys yes yees yeees ...
/wh+y/       # why whhy whhhy ...
/wh{5}0/     # whhhhh0
/ca{3,}r/    # caaar caaaar ...
/co{1,4}w/   # cow coow cooow coooow
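The multiplier examples can be verified with a sketch like the following. Two things here are assumptions added for illustration: the helper name matches, and the ^ and $ anchors, which force the whole string to match so that the repetition counts are exact. The qr// operator, which compiles a pattern into a value that can be passed around, is standard Perl.

```perl
use strict;
use warnings;

# Returns 1 if the string matches the compiled pattern, 0 otherwise.
sub matches {
    my ($s, $re) = @_;
    return $s =~ $re ? 1 : 0;
}

my $optional_a = qr/^ca?t$/;       # a may appear zero times or once
my $any_e      = qr/^ye*s$/;       # e may appear any number of times
my $some_h     = qr/^wh+y$/;       # h must appear at least once
my $three_plus = qr/^ca{3,}r$/;    # a must appear three or more times
my $one_to_4   = qr/^co{1,4}w$/;   # o must appear one to four times

print matches("ct",      $optional_a), "\n";
print matches("ys",      $any_e),      "\n";
print matches("wy",      $some_h),     "\n";
print matches("caar",    $three_plus), "\n";
print matches("cooooow", $one_to_4),   "\n";
```

Without the anchors, /co{1,4}w/ would still reject "cooooow", because the w must follow directly after at most four o's that follow the c; with longer surrounding text, however, unanchored quantified patterns can match a substring, so anchoring is the safer way to test a count.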