Regular Expressions
Topics
• What?
• Why?
• History - Who?
• Flavors
• Grammar
• Meta Chars
• Character Classes
• Shorthand Char Classes
• Anchors
• Repeaters or Quantifiers
• Grouping & Capturing
• Alternation
• Match Float
• Atomic Grouping
• Look Around
• Conditional Expr.
• Recursive Regex
• Code Evaluation
• Code Expr.
• Inline Modifiers
• Regex Tools
• Q&A
What are Regular Expressions?
• A regular expression is a pattern that describes a certain amount of text.
• A regular expression, often called a pattern, is an expression that describes a set of strings. - Wikipedia
Why do we need them?
• Regular expressions allow matching and manipulation of textual data.
• Requirements
  • Matching/finding
  • Doing something with the matched text
  • Validation of data
  • Case-insensitive matching
  • Parsing data (e.g. HTML)
  • Converting data into a different form, etc.
History
• Stephen Kleene - the mathematician who discovered 'regular sets'.
• Ken Thompson - 1968, Regular Expression Search Algorithm. QED -> ed -> g/re/p
• Henry Spencer - 1986, wrote a regex library in C.
Regex Flavors
BRE - Basic Regular Expressions
• Operators are written with a backslash: \?, \+, \{, \|, \(, and \)
• ed, g/re/p, sed
ERE - Extended Regular Expressions
• Operators are written bare: ?, +, {, |, (, and )
• grep -E == egrep, awk
PCRE - Perl Compatible Regular Expressions (Philip Hazel)
• Perl-style syntax: Perl, PHP, Tcl etc.
Grammar of Regex
RE = one or more non-empty 'branches' separated by '|'
Branch = one or more 'pieces'
Piece = atom, optionally followed by a quantifier
Quantifier = '*', '+', '?' or a 'bound'
Bound = {n}, {n,} or {m,n} following an atom
Atom = (RE) or
       () or
       '^', '$', '.' or
       \ followed by one of `^.[$()|*+?{\' or
       any-char or
       'bracket expression'
Bracket Expression = a list of characters enclosed in `[ ]'
Meta Chars?
In a normal expression like:
2 + 4
Here '+' has a special meaning.
Meta Chars
\      Quote the next metacharacter
^      Match the beginning of the line
.      Match any character (except newline) = [^\n]
$      Match the end of the line (or before newline at the end)
|      Alternation
( )    Grouping
[ ]    Character class
{ }    Match m to n times
*      Match 0 or more times
+      Match 1 or more times
?      Match 1 or 0 times
Non-printable Chars
\t          tab (HT, TAB)
\n          newline (LF, NL)
\r          return (CR)
\f          form feed (FF)
\a          alarm (bell) (BEL)
\e          escape (think troff) (ESC)
\033        octal char (example: ESC)
\x1B        hex char (example: ESC)
\x{263a}    long hex char (example: Unicode SMILEY)
\cK         control char (example: VT)
\N{name}    named Unicode character
Character Classes – [ ]
• A set of characters placed inside square brackets. Inside the brackets, meta characters lose their special meaning (except '] \ ^ -').
• Requirements
  • Matches one and only one character out of the specified characters.
  • A range can be specified using '-'.
    • a-z matches the 26 lower-case English letters.
    • 0-9 matches any digit.
  • Negation can be specified using '^' at the beginning of the class.
  • To match the exceptional characters above literally, either escape them or place them where they are not special (for example '-' at the end).
[0-9]           Matches any one of 0,1,2,3,4,5,6,7,8,9.
[aeiou]         Matches one English vowel char.
[^aeiou]        Matches any non-vowel char.
[a-z-]          Matches a to z and '-'.
[a-z0-9]        Union: matches a to z and 0 to 9.
[a-z&&[m-z]]    Intersection: matches m to z (Java syntax).
[a-z-[m-z]]     Subtraction: matches a to l (.NET syntax).
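The basic bracket-expression rules above can be tried out with a small sketch. Python's re module is used here only as a convenient test bed (it is not the flavor the slides focus on), and the sample strings are made up for illustration. The intersection and subtraction forms are left out because Python's re does not support them.

```python
import re

# One vowel per match: a bracket expression matches exactly one character.
vowels = re.findall(r"[aeiou]", "Regex-101: a b c!")

# '^' right after '[' negates the class (here: not a vowel and not a space).
non_vowels = re.findall(r"[^aeiou ]", "a bc")

# A '-' at the end of the class is literal, so this is "a to z, or '-'".
words = re.findall(r"[a-z-]+", "foo-bar BAZ")

# Union of two ranges inside one class.
alnum = re.findall(r"[a-z0-9]+", "abc123 XYZ")

print(vowels, non_vowels, words, alnum)
```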
POSIX Character Classes – [: … :]
[^[:digit:]] = \D = [^0-9]
Shorthand Chars
\w    word character            [A-Za-z0-9_]
\d    decimal digit             [0-9]
\s    whitespace                [ \n\r\t\f]
\W    not a word character      [^A-Za-z0-9_]
\D    not a decimal digit       [^0-9]
\S    not whitespace            [^ \n\r\t\f]
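A quick way to see the shorthand classes at work is sketched below in Python. The re.ASCII flag is an assumption added here so that \w and \d behave like the ASCII sets listed in the table; without it Python also accepts non-ASCII letters and digits. The sample string is made up.

```python
import re

sample = "id_42 = 3.14\tok"

# re.ASCII restricts \w, \d, \s to the ASCII sets shown in the table.
words = re.findall(r"\w+", sample, re.ASCII)
digits = re.findall(r"\d+", sample, re.ASCII)
spaces = re.findall(r"\s", sample, re.ASCII)
non_words = re.findall(r"\W", sample, re.ASCII)

print(words, digits, spaces, non_words)
```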
Anchors/Assertions
• An anchor matches a certain position in the subject string and does not consume any characters.
^         Match the beginning of the line
$         Match the end of the line (or before newline at the end)
\A        Matches only at the very beginning
\z        Matches only at the very end
\Z        Matches like $ used in single-line mode (at the end, or before a final newline)
\b        Matches when the current position is a word boundary
\<, \>    Match at the start and at the end of a word, respectively
\B        Matches when the current position is not a word boundary
^ Anchor
^    Match the beginning of the line
/^a/    Matches a string that begins with 'a'
$ Anchor
$    Match the end of the line (or before newline at the end)
/s$/    Matches a string that ends with 's'
\A Anchor
\A    Matches only at the very beginning
^ vs \A: in multiline mode ^ also matches right after every newline; \A never does.
\z, \Z Anchors
\z    Matches only at the very end
\Z    Matches like $ used in single-line mode
$ vs \z, \Z: in multiline mode $ also matches before every newline; \Z matches only at the end of the string or before a final newline; \z matches only at the very end.
\b, \B Anchors
\b    = between \W\w or \w\W = matches a word boundary
\B    Matches when the current position is not a word boundary
Subject: $ xl2twiki file 2 > /dev/null
/\b2\b/    matches the standalone '2'
/\B2\B/    matches the '2' inside 'xl2twiki'
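The difference between ^ and \A, and between \b and \B, can be checked with a short Python sketch. The two-line sample text is invented for illustration; the shell command is the one from the slide. The \z and \Z anchors are deliberately not shown, since Python's spelling of them differs from Perl's.

```python
import re

text = "apple\navocado"

# Without MULTILINE, ^ matches only at the very start of the string.
start_only = re.findall(r"^a\w+", text)

# With MULTILINE, ^ also matches right after each newline.
each_line = re.findall(r"^a\w+", text, re.MULTILINE)

# \A ignores MULTILINE and still matches only at the very beginning.
very_start = re.findall(r"\Aa\w+", text, re.MULTILINE)

command = "xl2twiki file 2 > /dev/null"

# \b2\b: a '2' with a word boundary on both sides (the standalone 2).
isolated = [m.start() for m in re.finditer(r"\b2\b", command)]

# \B2\B: a '2' with word characters on both sides (inside xl2twiki).
embedded = [m.start() for m in re.finditer(r"\B2\B", command)]

print(start_only, each_line, very_start, isolated, embedded)
```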
Quantifiers (repetition)
• Why? – Because we are not sure about the text in advance. A quantifier specifies how many times a regex component must repeat.
{m,n}         = Matches a minimum of m and a maximum of n occurrences.
*  = {0,}     = Matches zero or more occurrences (any amount).
+  = {1,}     = Matches one or more occurrences.
?  = {0,1}    = Matches zero or one occurrence (means optional).
Quantifiers
• By default quantifiers are greedy: they match as much as they can.
/\d{2,4}/     2010 -> matches '2010'
/<.+>/        My first <strong> regex </strong> test. -> matches '<strong> regex </strong>'
/\w+sion/     Expression -> matches 'Expression'
• If the entire match fails because a greedy quantifier consumed too much, it is forced to give up as much as needed to make the rest of the regex succeed.
Non Greedy Quantifiers
{m,n}?
*?
+?
??
• To make a quantifier non-greedy, append '?'.
<.+?>      My first <strong> regex </strong> test. -> first match is '<strong>'
• Or use negated classes.
<[^>]+>    My first <strong> regex </strong> test. -> first match is '<strong>'
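The greedy, non-greedy and negated-class variants can be compared side by side. The following is a small Python sketch using the sample sentence from the slides; the variable names are illustrative only.

```python
import re

html = "My first <strong> regex </strong> test."

# Greedy: .+ runs to the end, then backs off only as far as the last '>'.
greedy = re.findall(r"<.+>", html)

# Lazy: .+? stops at the first '>' it can reach.
lazy = re.findall(r"<.+?>", html)

# Negated class: [^>]+ can never run past a '>', so no backing off is needed.
negated = re.findall(r"<[^>]+>", html)

# Greedy quantifiers still give characters back when the rest must match.
year = re.search(r"\d{2,4}", "2010").group()
word = re.search(r"\w+sion", "Expression").group()

print(greedy, lazy, negated, year, word)
```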
Grouping – ( )
• Why? – To create sub patterns, so that you can apply regex operators to whole sub patterns or reference them by their sub group numbers.
\d{2}-\d{2}-\d{2}(\d{2})?
Will match 01-01-10 and also 01-01-2010.
• Grouping can also be used to limit the scope of alternation.
Alternation - |
• Why? – Lets you match more than one sub-expression at the same point.
/\b(get|set)Value\b/    Matches either getValue or setValue.
• Branches are tried from left to right.
• Eagerness - the first alternative that matches wins, so order matters; put the most likely pattern first.
  • (and|android) -> 'robot and an android fight': only 'and' is ever matched, even inside 'android'.
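The left-to-right eagerness of alternation is easy to observe. This Python sketch reuses the sentence from the slide; the second test string with resetValue is an invented addition to show the effect of \b.

```python
import re

sentence = "robot and an android fight"

# 'and' is tried first, so it also wins inside 'android'.
short_first = re.findall(r"and|android", sentence)

# Putting the longer alternative first lets 'android' match as a whole.
long_first = re.findall(r"android|and", sentence)

# Grouping limits the alternation to get|set; \b rejects 'resetValue'.
accessors = [m.group(0) for m in
             re.finditer(r"\b(get|set)Value\b", "getValue setValue resetValue")]

print(short_first, long_first, accessors)
```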
Capturing – ( )
• Allows us to access sub-parts of the pattern for later processing.
• All captured sub patterns are stored in memory.
• Captured patterns are numbered from left to right, by opening parenthesis.
/\b((\d{2})-(\d{2})-(\d{2}(\d{2})?))\b/
Today is '18-08-2010'.
\1 -> date -> 18-08-2010
\2 -> day -> 18
\3 -> month -> 08
\4 -> year -> 2010
\5 -> year, last two digits -> 10
Non-Capturing sub patterns – (?: )
• If you don't need back references, make sub expressions non-capturing; it saves memory and processing time.
\d{2}-\d{2}-\d{2}(?:\d{2})?
Will match 01-01-10 and also 01-01-2010.
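The group numbering of the date example, and the effect of (?: ), can be verified with a few lines of Python. This is only a sketch; the variable names are illustrative.

```python
import re

capturing = re.compile(r"\b((\d{2})-(\d{2})-(\d{2}(\d{2})?))\b")
match = capturing.search("Today is '18-08-2010'.")

# Groups are numbered by the position of their opening parenthesis.
parts = [match.group(i) for i in range(1, 6)]

# The same shape with (?: ) keeps the grouping but stores nothing.
non_capturing = re.compile(r"\d{2}-\d{2}-\d{2}(?:\d{2})?")

print(parts, capturing.groups, non_capturing.groups)
```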
Named Capture – (?<> )
• We can give names to sub patterns instead of numbers.
(?P<name>pattern)                        -> Python style, Perl 5.12
(?P=name)                                -> Back reference
(?<name>pattern) or (?'name'pattern)     -> Perl 5.10
\k<name> or \k'name' or \g{name}         -> Back reference
\g{-1}, \g{-2}                           -> Relative back reference
/(?<vowel>[ai]).\k<vowel>.\1/    abracadabra !!
/(\w+)\s+\g{-1}/                 "Thus joyful Troy Troy maintained the the watch of night..."
$date = "18-08-2010";
$date =~ s/(?<day>\d{2})-(?<month>\d{2})-(?<year>\d{4})/$+{year}-$+{month}-$+{day}/;
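For comparison, here is a sketch of the same two ideas in Python, which uses the (?P<name>...) and (?P=name) spellings. The date rewrite mirrors the Perl substitution on the slide; the variable names are illustrative.

```python
import re

date_re = re.compile(r"(?P<day>\d{2})-(?P<month>\d{2})-(?P<year>\d{4})")

# \g<name> in the replacement refers back to a named group.
iso_date = date_re.sub(r"\g<year>-\g<month>-\g<day>", "18-08-2010")

# (?P=word) is a named back reference: it finds doubled words.
line = "Thus joyful Troy Troy maintained the the watch of night..."
doubled = re.findall(r"(?P<word>\w+)\s+(?P=word)", line)

print(iso_date, doubled)
```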
Before Evaluating Regex
• Hits
  • Lines that I want to match.
• Misses
  • Lines that I don't want to match.
• Omissions
  • Lines that I didn't match but wanted to match.
• False alarms
  • Lines that I matched but didn't want to match.
Matching a float number
Basic principle – split your task into sub tasks.
Float number = integerpart.fractionalpart
Integer part       = \d+    -> will match one or more digits
Literal dot        = \.
Fractional part    = \d+    -> will match one or more digits
Combine all of them = \d+\.\d+
/\d+\.\d+/ -> is generic.
It won't match the sign in -123.45 or +123.45.
/[+-]?\d+\.\d+/ -> will match.
But it won't match - 123.45 or + 123.45.
/[+-]? *\d+\.\d+/ -> will match.
But it won't match 123. or .45
/[+-]? *(?:\d+\.\d+|\d+\.|\.\d+)/ -> will match.
But it won't match an exponent, as in 10e2 or 101E5.
/[+-]? *(?:\d+\.\d+|\d+\.|\.\d+)(?:[eE]\d+)?/ -> adds an optional exponent (a mantissa without a dot, as in 10e2, additionally needs the \d+ branch of the final version).
But it won't match a signed exponent, as in 10e-2.
/^[+-]? *(?:\d+\.\d+|\d+\.|\.\d+)(?:[eE][+-]?\d+)?$/ -> allows a sign in the exponent.
Match a float number (final version, written with /x so that whitespace and comments are allowed)
/^
  [+-]?\ *              # first, match an optional sign
  (?:                   # then match integers or f.p. mantissas:
      \d+\.\d+          # mantissa of the form a.b
     |\d+\.             # mantissa of the form a.
     |\.\d+             # mantissa of the form .b
     |\d+               # integer of the form a
  )
  (?:[eE][+-]?\d+)?     # finally, optionally match an exponent
$/x;
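The final pattern carries over almost unchanged to Python, where re.VERBOSE plays the role of Perl's /x. The sketch below is one possible transcription; the helper name is_float is hypothetical and not part of the original slides.

```python
import re

FLOAT_RE = re.compile(r"""
    ^[+-]?\ *               # optional sign, then optional spaces
    (?:
        \d+\.\d+            # mantissa of the form a.b
      | \d+\.               # mantissa of the form a.
      | \.\d+               # mantissa of the form .b
      | \d+                 # integer of the form a
    )
    (?:[eE][+-]?\d+)?       # optional exponent
    $
""", re.VERBOSE)


def is_float(text):
    """Return True if text matches the float pattern built up above."""
    return FLOAT_RE.match(text) is not None


for candidate in ["123.45", "+ 123.45", "123.", ".45", "10e-2", "1.2.3", "."]:
    print(repr(candidate), is_float(candidate))
```

Building the pattern in small, commented pieces follows the "split your task into sub tasks" principle: each alternative can be tested on its own before it is combined with the rest.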
Atomic Grouping – (?> )
• Before looking into atomic grouping, we need to know about backtracking.
• Backtracking – if you don't succeed, try and try again...
Matching \d+99 against 19999:
\d       19999 -> Add 1 to match -> 1
\d+      19999 -> Add 9 to match -> 19
\d+      19999 -> Add 9 to match -> 199
\d+      19999 -> Add 9 to match -> 1999
\d+      19999 -> Add 9 to match -> 19999
\d+      19999 -> Still need to match 99
\d+99    19999 -> Give up a 9
\d+99    19999 -> Give up one more 9
\d+99    19999 -> Success
Matching \d+xx against 199Rs:
\d       199Rs -> Add 1 to match -> 1
\d+      199Rs -> Add 9 to match -> 19
\d+      199Rs -> Add 9 to match -> 199
\d+x     199Rs -> x not matched with R
\d+x     199Rs -> Give up 9, still cannot match x
\d+x     199Rs -> Give up 9, still cannot match x
\d+x     199Rs -> Cannot give up 1 due to \d+
\d+xx    199Rs -> Failure
Atomic Grouping:
• Atomic grouping disables backtracking into the group, which speeds up failure.
• (?>pattern): here pattern is treated as an atomic token.
• (?>\d+)xx: here (?>\d+) won't give up any digits; it locks them.
  • Fails right at matching x with R.
• Atomic groups are not captured and can be nested.
Possessive Quantifiers:
• Use possessive quantifiers for single items to avoid backtracking.
• Appending '+' makes a quantifier possessive.
• (?>\d+)xx == \d++xx
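The effect of giving up backtracking can be seen by running the \d+99 example in both forms. The sketch below uses Python, which accepts (?>...) and possessive quantifiers only from version 3.11 on, so the atomic part is guarded by a version check; the function name atomic_demo is hypothetical.

```python
import re
import sys

# A plain greedy quantifier backtracks, so \d+ hands back two 9s.
backtracking = re.search(r"\d+99", "19999").group()

# No amount of backtracking helps here: there is no 'x' in the subject.
no_match = re.search(r"\d+xx", "199Rs")


def atomic_demo():
    """Return the atomic and possessive results, or None on older Pythons."""
    if sys.version_info < (3, 11):
        return None
    # The atomic group keeps every digit it took, so '99' can never match.
    atomic = re.search(r"(?>\d+)99", "19999")
    # \d++ is the possessive spelling of the same idea.
    possessive = re.search(r"\d++99", "19999")
    return atomic, possessive


print(backtracking, no_match, atomic_demo())
```

Note that the atomic version fails on 19999 even though the backtracking version succeeds: atomic grouping is an optimisation only when the locked-in text is never needed by what follows.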
Look Around
             Ahead       Behind
Positive     (?=...)     (?<=...)
Negative     (?!...)     (?<!...)
(?=...)     Zero-width positive lookahead assertion
(?!...)     Zero-width negative lookahead assertion
(?<=...)    Zero-width positive lookbehind assertion
(?<!...)    Zero-width negative lookbehind assertion
*Note: Assertions can be nested. Example: /(?<=,(?!(?<=\d,)(?=\d)))/
Subject: "I catch the housecat 'Tom-cat' with catnip"
/cat(?=\s+)/          matches 'cat' in 'housecat'
/(?<=\s)cat\w+/       matches 'catch' and 'catnip'
/\bcat\b/             matches 'cat' in 'Tom-cat'
/(?<=\s)cat(?=\s)/    no match: there is no isolated 'cat'
/cat(?!\s)/           matches 'cat' in 'catch', 'Tom-cat' and 'catnip'
/(?<!\s)cat/          matches 'cat' in 'housecat' and 'Tom-cat'
*Note: look-behind expressions cannot be of variable length. This means you cannot use quantifiers (?, *, +, or {1,5}) or alternation of different-length items inside them.
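Since the original slides marked the matches with highlighting, a small script is a handy way to see where each lookaround pattern matches. This Python sketch reports the start offset of every match in the sample sentence; the helper name starts is hypothetical.

```python
import re

text = "I catch the housecat 'Tom-cat' with catnip"


def starts(pattern):
    """Return the start offset of every match of pattern in text."""
    return [m.start() for m in re.finditer(pattern, text)]


# 'cat' occurs at offsets 2 (catch), 17 (housecat), 26 (Tom-cat), 36 (catnip).
followed_by_space = starts(r"cat(?=\s+)")
not_followed_by_space = starts(r"cat(?!\s)")
not_after_space = starts(r"(?<!\s)cat")
isolated = starts(r"(?<=\s)cat(?=\s)")
after_space_words = re.findall(r"(?<=\s)cat\w+", text)

print(followed_by_space, not_followed_by_space, not_after_space,
      isolated, after_space_words)
```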
Conditional expressions
• A conditional expression is a form of if-then-else statement that allows one to choose which patterns are to be matched, based on some condition.
• (?(condition)yes-regexp) is like an 'if () {}' statement
• (?(condition)yes-regexp|no-regexp) is like an 'if () {} else {}' statement
• The condition can be
  • The number of a sub pattern (true if that group matched)
  • A lookaround assertion
  • A recursive call
Match a (quoted)? string -> /^("|')?[^"']*(?(1)\1)$/
Matches 'blah blah'
Matches "blah blah"
Matches blah blah
Won't match 'blah blah"
/(.)\1(?(?<=AA)G|C)$/
Matches: ATGAAG, TAGBBC, GATGGC
/usr/share/dict/words -> /^(.+)(.+)?(?(2)\2\1|\1)$/
Matches: aa, baba, beriberi, maam, vetitive
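The optionally quoted string pattern also works in Python, which supports the group-number form of the conditional. The sketch below checks the four cases listed above; the helper name is_consistently_quoted is hypothetical.

```python
import re

# (?(1)\1): if group 1 (the opening quote) matched, require the same quote again.
QUOTED = re.compile(r"""^("|')?[^"']*(?(1)\1)$""")


def is_consistently_quoted(text):
    """True for unquoted text or text wrapped in one matching pair of quotes."""
    return QUOTED.match(text) is not None


for sample in ["'blah blah'", '"blah blah"', "blah blah", "'blah blah\""]:
    print(sample, is_consistently_quoted(sample))
```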
Recursive Patterns – (?1)
• Example of nested text: (x(x)y(x)x)
• Palindrome -> /^((.)(?:(?1)|\w)*(\2))$/
  • Note: because of the \w branch, this also accepts strings such as 'abca' that merely start and end with the same character.
• Balanced parentheses:
qr/
  ^                # Start of string
  (                # Start capture group 1
    \(             # Open paren
    (?>            # Atomic (non-capturing) subgroup
      [^()]++      # Grab all the non parens we can
      |            # or
      (?1)         # Recurse into group 1
    )*             # Zero or more times
    \)             # Close paren
  )                # End capture group 1
  $                # End of string
/x;
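Python's standard re module has no (?1) recursion, so the following sketch does not translate the pattern. Instead it checks the same language (one balanced parenthesised group spanning the whole string) with a plain depth counter, which can serve as a reference when testing the recursive regex. The function name is hypothetical.

```python
def is_balanced_group(text):
    """True if text is a single balanced (...) group covering the whole string."""
    if not text.startswith("("):
        return False
    depth = 0
    for index, char in enumerate(text):
        if char == "(":
            depth += 1
        elif char == ")":
            depth -= 1
            # The outer group may only close on the very last character.
            if depth == 0 and index != len(text) - 1:
                return False
    return depth == 0


for sample in ["(x(x)y(x)x)", "()", "(a)(b)", "(()", "x(y)"]:
    print(sample, is_balanced_group(sample))
```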
Code Evaluation – (?{ })
• Perl code can be evaluated inside regular expressions using the (?{ }) construct.
$x = "aaaa";
$x =~ /(a(?{print "Yow\n";}))*aa/;
produces
Yow
Yow
Yow
Yow
Pattern Code Expression – (??{ })
• Pattern code expression - the result of the code evaluation is treated as a regular expression and matched immediately.
• The construct is (??{ })
$length = 5;
$char = 'a';
$str = 'aaaaabb';
$str =~ /(??{$char x $length})/x; # matches, there are 5 of 'a'
Inline modifiers & Comments
Matching can be modified inline by placing modifiers inside the pattern.
(?i)    enables case-insensitive mode
(?m)    enables multiline matching for ^ and $
(?s)    makes the dot metacharacter match newline also
(?x)    ignores literal whitespace
(?U)    makes quantifiers ungreedy (lazy) by default
$answers =~ /(?i)y(?-i)(?:es)?/ -> Will match 'y', 'Y', 'yes', 'Yes' but not 'YES'.
Comments can be inserted inline using the (?#) construct.
/^(?#begin)\d+(?#match integer part)\.(?#match dot)\d+(?#match fractional part)$/
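Python does not allow switching a flag off again with (?-i) in the middle of a pattern, but its scoped form (?i:...) expresses the same yes/no example. The sketch below is an adaptation under that assumption, not a transcription of the Perl pattern; the (?#...) comment syntax works unchanged.

```python
import re

# Only the 'y' is case-insensitive; the optional 'es' must be lower case.
YES = re.compile(r"(?i:y)(?:es)?")

answers = ["y", "Y", "yes", "Yes", "YES"]
accepted = [a for a in answers if YES.fullmatch(a)]

# Inline comments are ignored by the regex engine.
commented = re.compile(
    r"^(?#begin)\d+(?#match integer part)\.(?#match dot)\d+(?#match fractional part)$"
)

print(accepted, bool(commented.match("3.14")))
```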
Regex Tools
Editors: Vim, TextMate, EditPad Pro, NoteTab, UltraEdit
Testers and tools:
RegexBuddy
Reggy – http://reggyapp.com
http://rubular.com (Ruby)
RegexPal (JavaScript) - http://www.regexpal.com
http://www.gskinner.com/RegExr/
http://www.spaweditor.com/scripts/regex/index.php
http://regex.larsolavtorvik.com/ (PHP, JavaScript)
http://www.nregex.com/ (.NET)
http://www.myregexp.com/ (Java)
http://osteele.com/tools/reanimator (NFA graphic repr.)
Expresso - http://www.ultrapico.com/Expresso.htm (.NET)
Regulator - http://sourceforge.net/projects/regulator (.NET)
RegexRenamer - http://regexrenamer.sourceforge.net/ (.NET)
PowerGREP - http://www.powergrep.com/
Windows Grep - http://www.wingrep.com/
Regex Resources
$ perldoc perlre perlretut perlreref
$ man re_format
"Mastering Regular Expressions" by Jeffrey Friedl
http://oreilly.com/catalog/9780596528126/
"Regular Expressions Cookbook" by Jan Goyvaerts & Steven Levithan
http://oreilly.com/catalog/9780596520694
Questions?
Thank You!
Java Regex
import java.util.regex.*;

public class MatchTest {
    public static void main(String[] args) throws Exception {
        String date = "12/30/1969";
        // Month, day and a 2- or 4-digit year, separated by '-' or '/'.
        Pattern p = Pattern.compile("^(\\d\\d)[-/](\\d\\d)[-/](\\d\\d(?:\\d\\d)?)$");
        Matcher m = p.matcher(date);
        if (m.find()) {
            String month = m.group(1);
            String day = m.group(2);
            String year = m.group(3);
            System.out.printf("Found %s-%s-%s\n", year, month, day);
        }
    }
}
PHP Regex
$date = "12/30/1969";
// '!' is used as the pattern delimiter so that '/' needs no escaping.
$p = "!^(\\d\\d)[-/](\\d\\d)[-/](\\d\\d(?:\\d\\d)?)$!";
if (preg_match($p, $date, $matches)) {
    $month = $matches[1];
    $day = $matches[2];
    $year = $matches[3];
}
$text = "Hello world. <br>";
$pattern = "{<br>}i";
echo preg_replace($pattern, "<br />", $text);
JavaScript Regex
var date = "12/30/1969";
var p = new RegExp("^(\\d\\d)[-/](\\d\\d)[-/](\\d\\d(?:\\d\\d)?)$");
var result = p.exec(date);
if (result != null) {
    var month = result[1];
    var day = result[2];
    var year = result[3];
}
var text = "Hello world. <br>";
var pattern = /<br>/ig;
// replace() returns a new string; the original is left unchanged.
text = text.replace(pattern, "<br />");
.NET Regex
using System.Text.RegularExpressions;

class MatchTest {
    static void Main() {
        string date = "12/30/1969";
        Regex r = new Regex(@"^(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)$");
        Match m = r.Match(date);
        if (m.Success) {
            string month = m.Groups[1].Value;
            string day = m.Groups[2].Value;
            string year = m.Groups[3].Value;
        }
    }
}
Python Regex
import re

date = '12/30/1969'
regex = re.compile(r'^(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)$')
match = regex.match(date)
if match:
    month = match.group(1)  # 12
    day = match.group(2)    # 30
    year = match.group(3)   # 1969
Ruby Regex
date = '12/30/1969'
regexp = Regexp.new('^(\d\d)[-/](\d\d)[-/](\d\d(?:\d\d)?)$')
if md = regexp.match(date)
  month = md[1]  # 12
  day = md[2]    # 30
  year = md[3]   # 1969
end
Unicode Properties
Pattern Code Expression – (??{ })
Find incremental numbers:
$str="abc 123hai cde 34567 efg 1245 a132 123456789 10adf";
print "$1\n" while($str=~/\D( (\d) (?{$x=$2}) ( (??{++$x%10}) )* ) \D/gx);
Commify a number
$no=123456789;
substr($no,0,length($no)-1)=~s/(?=(?<=\d)(?:\d\d)+$)/,/g;
print $no;
Produces 12,34,56,789
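The same lookaround trick can be sketched in Python. The substitution runs on everything except the last digit, exactly like the substr() call above, and inserts a comma wherever a digit is followed by an even number of digits up to the end. The function name commify_indian is hypothetical.

```python
import re


def commify_indian(number):
    """Group digits as in 12,34,56,789 (last three, then pairs)."""
    digits = str(number)
    head, last = digits[:-1], digits[-1]
    # Zero-width match: after a digit, before an even count of remaining digits.
    head = re.sub(r"(?<=\d)(?=(?:\d\d)+$)", ",", head)
    return head + last


print(commify_indian(123456789))
print(commify_indian(1234))
```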
A non-capturing group inside a capture group does not hide its text from the outer capture ($1 here is 'cat ' including the trailing space):
perl -e '$x="cat cat cat";$x=~/(cat(?:\s+))/;print ":$1:";'