Understanding advanced regular expressions
Deeper down the rabbit hole: Advanced Regular Expressions
Jakob Westhoff <[email protected]>, @jakobwesthoff
PHPBarcamp.at, May 3, 2010
http://westhoffswelt.de [email protected] slide: 1 / 26
About Me
Jakob Westhoff
PHP developer for several years
Computer science student at the TU Dortmund
Co-Founder of the PHP Usergroup Dortmund
Active in various open source projects
Asking the audience
Who already works with regular expressions?
Regular expressions like this:
/[a-zA-Z]+/
Or like this:
(?P<image>(?:none|inherit)|(?:url\(\s*(?:'|")?(?:\\['"\\)]|\\[^\'"\\)]|[^'"\\)])*(?:'|")?\s*\)))
Goals of this session
Learn advanced techniques to use in (PCRE) regular expressions:
Assertions
Once only subpatterns
Conditional subpatterns
Pattern recursion
. . .
Learn how to handle Unicode in your regular expressions
What Regular Expressions are. . .
In theoretical computer science:
Express regular languages
Languages which can be described by deterministic finite state automata
Type 3 grammars in the Chomsky hierarchy
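To make the link to finite automata concrete, here is a hand-written deterministic automaton that accepts exactly the strings matched by /\A[a-z]+\z/. It is a hypothetical illustration (the function name is made up), not how PHP's preg functions work internally; PCRE is a backtracking matcher, as later slides show.

```php
<?php
// Hypothetical DFA equivalent to /\A[a-z]+\z/.
// States: 0 = start, 1 = accepting (at least one letter read), -1 = dead.
function dfaAcceptsLowercaseWord(string $input): bool
{
    $state = 0;
    for ($i = 0, $n = strlen($input); $i < $n; $i++) {
        $code = ord($input[$i]);
        $isLetter = $code >= ord('a') && $code <= ord('z');
        // The next state depends only on the current state and one input character.
        $state = ($state !== -1 && $isLetter) ? 1 : -1;
    }
    return $state === 1;
}

var_dump(dfaAcceptsLowercaseWord('regex')); // bool(true)
var_dump(dfaAcceptsLowercaseWord('Regex')); // bool(false)
```

The automaton needs no memory beyond its current state, which is exactly what limits classic regular expressions to regular languages.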
What Regular Expressions are. . .
In practical day to day usage:
“[. . . ] regular expressions provide concise and flexible means for identifying strings of text of interest, such as particular characters, words, or patterns of characters.”
– Wikipedia [1]
Building Blocks of a Regular Expression
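Basic structure of every regular expression in PHP's PCRE functions
/[a-z]+/im
Delimiter: /
The same character at both ends, freely chosen from the non-alphanumeric, non-backslash, non-whitespace characters (must be escaped inside the expression)
Bracket pairs such as ( and ) may also be used in PCRE
Expression: [a-z]+
Modifier: im
A sequence of characters providing processing instructions (here i: case-insensitive, m: multiline)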
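A short PHP sketch of the three building blocks using preg_match and preg_match_all; the subjects are made-up examples.

```php
<?php
// The delimiter is freely chosen: '/' must be escaped inside a '/'-delimited
// pattern, but not inside a '#'-delimited one.
$slashDelimited = preg_match('/\/usr\/bin/', '/usr/bin/php'); // 1
$hashDelimited  = preg_match('#/usr/bin#', '/usr/bin/php');   // 1

// Bracket-style delimiters: opening '(' and closing ')' (also [], {}, <>).
$bracketDelimited = preg_match('([a-z]+)i', 'HELLO');         // 1

// Modifiers after the closing delimiter change processing.
// Without 'i' the lowercase class does not match uppercase input ...
$caseSensitive   = preg_match('/^[a-z]+$/', 'Hello');          // 0
// ... with 'i' it does.
$caseInsensitive = preg_match('/^[a-z]+$/i', 'Hello');         // 1

// 'm' lets ^ and $ match at every line start/end, not only at the subject's ends.
$withoutM = preg_match_all('/^[a-z]+$/i', "foo\nbar");         // 0
$withM    = preg_match_all('/^[a-z]+$/im', "foo\nbar");        // 2
```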
Getting everybody up to speed
., .*, .+, .?, .{1,2} - Arbitrary characters and repetitions
^, $ - Start and end of subject (or line in multiline mode)
foo|bar - Logical Or
(foo)(bar) - Subpattern grouping
/(foo|bar)baz(\1)/ - Backreferences
[a-z], [^a-z] - Character classes
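A few of these basics exercised in PHP; the subjects are arbitrary examples chosen to show one success and one failure each.

```php
<?php
// Backreference: \1 has to repeat exactly the text captured by group 1.
$same = preg_match('/(foo|bar)baz(\1)/', 'barbazbar', $m);   // 1
// $m === ['barbazbar', 'bar', 'bar']
$different = preg_match('/(foo|bar)baz(\1)/', 'foobazbar'); // 0: \1 is "foo"

// Bounded repetition: {1,2} allows one or two occurrences.
$two   = preg_match('/^a{1,2}$/', 'aa');  // 1
$three = preg_match('/^a{1,2}$/', 'aaa'); // 0

// Negated character class: [^a-z] matches any character except a-z.
preg_match_all('/[^a-z]/', 'ab1c2', $nonLetters);
// $nonLetters[0] === ['1', '2']
```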
Grouping Without Subpattern Creation
Grouping is sometimes needed (e.g. to apply a quantifier to several characters) without creating a subpattern
/(?:foobar)*/
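Comparing a capturing and a non-capturing group in PHP shows the difference in the returned matches (the subject is an arbitrary example):

```php
<?php
// Capturing group: the quantified text is also reported as subpattern 1
// (holding the last repetition).
preg_match('/(foobar)*/', 'foobarfoobar', $capturing);
// $capturing === ['foobarfoobar', 'foobar']

// Non-capturing group: same grouping for the quantifier, no subpattern.
preg_match('/(?:foobar)*/', 'foobarfoobar', $nonCapturing);
// $nonCapturing === ['foobarfoobar']
```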
Subpattern identification
Subpatterns are numbered in the order of their opening parentheses
/(foo(bar)(baz))/
1: foobarbaz
2: bar
3: baz
Matches available from within PHP:
$matches = array(
    0 => "foobarbaz",
    1 => "foobarbaz",
    2 => "bar",
    3 => "baz",
)
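The array above is what preg_match fills in; the sketch below produces it and, as an optional extra, uses the PREG_OFFSET_CAPTURE flag to show where each subpattern starts. The subject with surrounding "xx" is a made-up example.

```php
<?php
// Opening parentheses, left to right: (1 foo (2 bar) (3 baz) )
preg_match('/(foo(bar)(baz))/', 'xxfoobarbazxx', $matches);
// Index 0 is always the complete match; 1..3 follow the parenthesis order.
print_r($matches);

// PREG_OFFSET_CAPTURE additionally reports the byte offset of each subpattern.
preg_match('/(foo(bar)(baz))/', 'xxfoobarbazxx', $withOffsets, PREG_OFFSET_CAPTURE);
// $withOffsets[3] === ['baz', 8]
```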
Subpattern Naming
PCRE allows custom names for subpatterns
/(?P<firstname>[A-Za-z]+) (?P<lastname>[A-Za-z]+)/
Result with input Jakob Westhoff
array(
    0 => 'Jakob Westhoff',
    'firstname' => 'Jakob',
    1 => 'Jakob',
    'lastname' => 'Westhoff',
    2 => 'Westhoff',
)
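A sketch of using the named subpatterns from PHP. The preg_replace_callback part is an illustrative use case, not from the talk, and the arrow function requires PHP 7.4 or later. PCRE also accepts the (?<name>...) spelling.

```php
<?php
$pattern = '/(?P<firstname>[A-Za-z]+) (?P<lastname>[A-Za-z]+)/';

preg_match($pattern, 'Jakob Westhoff', $m);
// Named subpatterns are reported under their name and their number.
echo $m['firstname'], ' / ', $m[2], PHP_EOL; // Jakob / Westhoff

// In a callback the names keep the intent readable.
$swapped = preg_replace_callback(
    $pattern,
    fn(array $match): string => $match['lastname'] . ', ' . $match['firstname'],
    'Jakob Westhoff'
);
// $swapped === 'Westhoff, Jakob'
```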
Assertions
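Formulate assertions about the text around the current position without consuming it
Example
/foo(?=foo)/
Input
foofoofoo
Matches (searching for all occurrences)
foofoofoo: foo at offsets 0 and 3, each followed by another foo; the final foo at offset 6 does not match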
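The example can be reproduced with preg_match_all and PREG_OFFSET_CAPTURE; the second call, added for contrast, shows what happens when the following foo is consumed instead of only asserted.

```php
<?php
// Lookahead: foo counts only if another foo follows; that following foo
// is not consumed, so it can start the next match.
preg_match_all('/foo(?=foo)/', 'foofoofoo', $m, PREG_OFFSET_CAPTURE);
// $m[0] === [['foo', 0], ['foo', 3]]

// Without the assertion the second foo is consumed: only one match.
$consuming = preg_match_all('/foofoo/', 'foofoofoo'); // 1
```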
Negative Assertions
Negative assertions are possible
foo not followed by another foo
/foo(?!foo)/
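Applied to the same subject as the positive assertion, the negative lookahead selects only the last foo; "foobar" is an extra made-up subject.

```php
<?php
// Negative lookahead: only a foo that is NOT followed by another foo.
preg_match_all('/foo(?!foo)/', 'foofoofoo', $m, PREG_OFFSET_CAPTURE);
// $m[0] === [['foo', 6]]

$followedByBar = preg_match('/foo(?!foo)/', 'foobar'); // 1
```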
Backward Assertions
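bar preceded by foo
/(?=foo)bar/ does not work: a lookahead checks the text starting at the current position, so foo and bar would have to start at the same place and the pattern can never match
Backward assertion (lookbehind)
/(?<=foo)bar/
Negative backward assertion
bar not preceded by foo
/(?<!foo)bar/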
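All three patterns from this slide run against one made-up subject containing a bar after foo and a bar after baz:

```php
<?php
$subject = 'foobar bazbar';

// Lookbehind: bar preceded by foo; foo itself is not part of the match.
preg_match_all('/(?<=foo)bar/', $subject, $pos, PREG_OFFSET_CAPTURE);
// $pos[0] === [['bar', 3]]

// Negative lookbehind: bar not preceded by foo.
preg_match_all('/(?<!foo)bar/', $subject, $neg, PREG_OFFSET_CAPTURE);
// $neg[0] === [['bar', 10]]

// The lookahead attempt never matches: foo and bar cannot start at the same position.
$lookahead = preg_match('/(?=foo)bar/', $subject); // 0
```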
Inner workings of the PCRE matcher
PCRE uses backtracking to find matches
Pattern: /\d+foo/
Subject: 123456789bar
1 Eat up all the digits: 123456789
2 Try to match foo
3 Backtrack one digit and try to match foo again
4 Repeat step 3 until a match is found or \d+ is down to a single digit
5 If there is still no match, start over at the next position in the subject
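The steps above can be modelled in a few lines of PHP. This is a hypothetical toy (the function name is made up), not PCRE's real code, and recent PCRE versions may optimize simple cases like \d+foo automatically, so the attempt count only illustrates the naive procedure.

```php
<?php
// Toy model of a backtracking search for /\d+foo/.
// $attempts counts how often "foo" is tried after the digits.
function toyMatchDigitsFoo(string $subject, ?int &$attempts): ?string
{
    $attempts = 0;
    $length = strlen($subject);
    for ($start = 0; $start < $length; $start++) {
        // Step 1: greedily eat up all digits from this start position.
        $end = $start + strspn($subject, '0123456789', $start);
        // Steps 2-4: try foo, then give back one digit at a time
        // (\d+ has to keep at least one digit).
        for ($cut = $end; $cut > $start; $cut--) {
            $attempts++;
            if (substr($subject, $cut, 3) === 'foo') {
                return substr($subject, $start, $cut - $start + 3);
            }
        }
        // Step 5: nothing found here, retry from the next position.
    }
    return null;
}

var_dump(toyMatchDigitsFoo('123456789bar', $attempts), $attempts); // NULL, int(45)
```

Nine digits already cost 9 + 8 + ... + 1 = 45 attempts before the search fails.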
Once only subpattern
Once only subpatterns prevent backtracking into a subpattern once it has matched.
Applying a once only subpattern to the example above
/(?>\d+)foo/
After matching the digits and determining that the following text is not foo, the matcher gives up at this position instead of backtracking into the digits
123456789bar
Can massively improve regex speed if used correctly
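A sketch of what "used correctly" means in practice. For /\d+foo/ the once only group never changes the result, but in a pattern like /a+ab/ the giving back is required, so the once only version silently stops matching. The possessive quantifier \d++ is PCRE shorthand for a once only group around a single item; the subjects are made-up examples.

```php
<?php
// Same results as /\d+foo/: giving back digits could never produce foo anyway.
$ok   = preg_match('/(?>\d+)foo/', '123456789foo'); // 1
$fail = preg_match('/(?>\d+)foo/', '123456789bar'); // 0

// Possessive quantifier: shorthand for (?>\d+).
$possessive = preg_match('/\d++foo/', '123456789foo'); // 1

// Here backtracking is needed: a+ must give back one "a" for "ab" to match.
$greedy = preg_match('/a+ab/', 'aaab');     // 1
$atomic = preg_match('/(?>a+)ab/', 'aaab'); // 0: the group keeps every "a"
```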
Conditional subpattern
If statement equivalent in PCRE
/(?(condition)yes-pattern|no-pattern)/
Conditions can be subpattern references (has subpattern n matched?) or assertions
Digits need to be followed by foo, while everything else needs to be followed by bar
/(?(?=\d)\d+foo|\D+bar)/
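PCRE accepts only a subpattern reference or an assertion (plus a few special forms) as the condition, so a test for digits has to be written as a lookahead such as (?=\d). The sketch below assumes the intent is "digits followed by foo, otherwise non-digits followed by bar" and anchors the pattern with ^ and $ so the whole subject is checked. The second pattern, not from the talk, shows a subpattern reference as condition: ">" is required only if group 1 matched "<".

```php
<?php
// Condition as assertion: if a digit comes next, require digits + foo,
// otherwise require non-digits + bar.
$digitsOrWord = '/^(?(?=\d)\d+foo|\D+bar)$/';
$a = preg_match($digitsOrWord, '42foo');  // 1
$b = preg_match($digitsOrWord, 'xyzbar'); // 1
$c = preg_match($digitsOrWord, '42bar');  // 0
$d = preg_match($digitsOrWord, 'xyzfoo'); // 0

// Condition as subpattern reference: (?(1)>) demands ">" only if "<" was matched.
$bracketed = '/^(<)?\w+(?(1)>)$/';
$e = preg_match($bracketed, '<abc>'); // 1
$f = preg_match($bracketed, 'abc');   // 1
$g = preg_match($bracketed, '<abc');  // 0
```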
![Page 75: Understanding advanced regular expressions](https://reader033.fdocuments.in/reader033/viewer/2022052822/554dc529b4c905bd488b503b/html5/thumbnails/75.jpg)
Conditional subpattern
If statement aquivalent in PCRE
/(?(condition)yes-pattern|no-pattern)/
Conditions can be direct matches or assertions
Numbers need to be followed by foo, while everything elseneeds to be followed by bar
/(?(\d+)foo|bar)/
http://westhoffswelt.de [email protected] slide: 17 / 26
![Page 76: Understanding advanced regular expressions](https://reader033.fdocuments.in/reader033/viewer/2022052822/554dc529b4c905bd488b503b/html5/thumbnails/76.jpg)
Conditional subpattern
If statement aquivalent in PCRE
/(?(condition)yes-pattern|no-pattern)/
Conditions can be direct matches or assertions
Numbers need to be followed by foo, while everything elseneeds to be followed by bar
/(?(\d+)foo|bar)/
http://westhoffswelt.de [email protected] slide: 17 / 26
![Page 77: Understanding advanced regular expressions](https://reader033.fdocuments.in/reader033/viewer/2022052822/554dc529b4c905bd488b503b/html5/thumbnails/77.jpg)
Unicode: Characters, code points and graphemes
Unicode text is made up of code points
The letter a: U+0061
The combining grave accent: U+0300
One character might consist of multiple code points
The letter a with the grave accent (à): U+0061 U+0300
Some of these combinations exist as single code points
The letter à: U+00E0
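A short Python check of the code points above, using only the standard `unicodedata` module. Python strings are sequences of code points, so the two spellings of à have different lengths and compare unequal. Normalization (NFC/NFD) is not mentioned in the talk; it is added here only as the standard way to convert between the two forms.

```python
import unicodedata

decomposed = "a\u0300"   # U+0061 followed by U+0300
precomposed = "\u00e0"   # U+00E0

def code_points(text):
    """Format every code point of text as U+XXXX."""
    return [f"U+{ord(ch):04X}" for ch in text]

print(code_points(decomposed))    # ['U+0061', 'U+0300']
print(code_points(precomposed))   # ['U+00E0']
print([unicodedata.name(ch) for ch in decomposed])
# ['LATIN SMALL LETTER A', 'COMBINING GRAVE ACCENT']
print(decomposed == precomposed)  # False: different code points

# Normalization converts between the two spellings of the same character.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True
```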
Unicode: Pattern matching
Unicode processing is enabled using the u modifier
With u, PCRE works on UTF-8 encoded pattern and subject strings
Each code point is handled as one character
Match a specific code point by its hexadecimal value: \x{hhhh}, e.g. \x{00E0}
Remember the letter à, written as a followed by the combining grave accent:
/\x{0061}\x{0300}/u
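A rough Python analogue of matching U+0061 U+0300 by code point, as a sketch only: Python `str` patterns are always code point based, which loosely corresponds to PCRE with the u modifier, and `\uXXXX` plays the role of PCRE's `\x{XXXX}`. Matching on `bytes` is used here to imitate byte-wise matching without u.

```python
import re

decomposed = "a\u0300"   # a + combining grave accent
precomposed = "\u00e0"   # à as a single code point

# \uXXXX in a str pattern names one code point, like \x{XXXX} in PCRE.
pattern = re.compile(r"\u0061\u0300")

print(bool(pattern.fullmatch(decomposed)))    # True
print(bool(pattern.fullmatch(precomposed)))   # False: U+00E0 is another code point

# "." sees one code point in the str, but two bytes in its UTF-8 encoding,
# similar to PCRE without the u modifier working byte by byte.
utf8 = precomposed.encode("utf-8")
print(utf8)                                   # b'\xc3\xa0'
print(len(re.findall(r".", precomposed)))     # 1
print(len(re.findall(rb".", utf8)))           # 2
```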
Unicode: Extended unicode sequences
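How can one pattern match both the single and the multi code point form of a character?
Remember: à = U+0061 U+0300 or U+00E0
Using the escape for extended Unicode sequences: \X
\X is equivalent to (?>\P{M}\p{M}*) in the PCRE of that time; newer releases match a full extended grapheme cluster
Wait. What? → Unicode character properties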
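Python's standard `re` has no `\X` and no `\p{M}` (the third-party `regex` package supports both; it is not used here so the sketch stays dependency-free). The hypothetical helper below spells out the rule `(?>\P{M}\p{M}*)` by hand with `unicodedata.category`: one non-mark code point followed by every mark attached to it.

```python
import unicodedata

def is_mark(ch):
    r"""True for \p{M}: general categories Mn, Mc and Me."""
    return unicodedata.category(ch).startswith("M")

def extended_sequences(text):
    r"""Split text the way (?>\P{M}\p{M}*) would. Unlike the regex, a mark
    with no preceding base character is kept here as an item of its own."""
    items = []
    for ch in text:
        if is_mark(ch) and items:
            items[-1] += ch      # \p{M}*: attach the mark to the current base
        else:
            items.append(ch)     # \P{M}: start a new sequence
    return items

for text in ("a\u0300b", "\u00e0b"):
    # Both spellings of "àb" yield two sequences, even though the
    # decomposed form has three code points.
    print(len(text), len(extended_sequences(text)))
```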
Unicode: Character properties
Every Unicode code point has a general category property assigned
Characters may be matched by these properties
The escapes \p and \P are used for this:
\p{xx}: All code points with the property xx
\P{xx}: All code points without the property xx
Possible properties:
L: Letter
M: Mark
P: Punctuation
Sc: Currency symbol
. . .
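A minimal sketch of what `\p{xx}` and `\P{xx}` select, again done by hand in Python because the standard `re` lacks `\p`. The helper names and the sample string are hypothetical; `unicodedata.category` returns two-letter general categories such as "Lu" or "Sc", so a one-letter property like "L" is treated as a prefix.

```python
import unicodedata

def has_property(ch, prop):
    r"""Stand-in for \p{prop}: prop is a major class such as "L" or a
    full general category such as "Sc"."""
    return unicodedata.category(ch).startswith(prop)

def select(text, prop, negate=False):
    r"""All characters of text matching \p{prop}, or \P{prop} if negate."""
    return [ch for ch in text if has_property(ch, prop) != negate]

sample = "Preis: 5\u20ac, \u00e0!"   # "Preis: 5€, à!"

print(select(sample, "L"))               # letters, including à
print(select(sample, "P"))               # [':', ',', '!']
print(select(sample, "Sc"))              # ['€']
print(select(sample, "L", negate=True))  # everything that is not a letter
```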
Pattern Recursion
Recursion in regular expressions?
Possible with PCRE
Validate BB-Code using PCRE
[b]Hello [i]World[/i]![/b]
BB-Code Recursion Example
[b]Hello [i]World[/i]![/b]
Recursive regular expression pattern
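/([^\[]*\[(b|i)\]
(?:[^\[]+|(?R))\[\/\2\]
[^\[]*)/x
The x modifier ignores the line breaks, \2 refers to the tag name captured by (b|i), and (?R) re-enters the whole pattern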
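Python's standard `re` has no `(?R)` (the third-party `regex` package does). To show what the recursive pattern accepts, here is a hypothetical hand-written mirror of it: each call of `match_tag` plays the role of one `(?R)`, and the closing tag has to repeat the tag name captured by `(b|i)`. Requiring the match to end at the end of the string is an added assumption, so `is_valid` checks the whole input.

```python
import re

TEXT = re.compile(r"[^\[]*")       # [^\[]*  text up to the next "["
OPEN = re.compile(r"\[(b|i)\]")    # \[(b|i)\]

def match_tag(s, pos=0):
    """Return where one tagged section starting at pos ends, or None.
    Mirrors: text, opening tag, (text | recursion), closing tag, text."""
    pos = TEXT.match(s, pos).end()             # leading text
    opening = OPEN.match(s, pos)
    if opening is None:
        return None
    tag, pos = opening.group(1), opening.end()
    closing = f"[/{tag}]"                      # must repeat the captured tag
    inner = TEXT.match(s, pos).end()
    if inner > pos and s.startswith(closing, inner):
        pos = inner                            # [^\[]+ branch
    else:
        pos = match_tag(s, pos)                # (?R) branch
        if pos is None or not s.startswith(closing, pos):
            return None
    return TEXT.match(s, pos + len(closing)).end()   # trailing text

def is_valid(s):
    return match_tag(s) == len(s)

print(is_valid("[b]Hello [i]World[/i]![/b]"))   # True
print(is_valid("[b]Hello [i]World[/b]![/i]"))   # False
```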
Do NOT Parse Using Regular Expressions
Even though it is possible, you do NOT want to do it
It is not maintainable
It is nearly impossible to find errors
Useful information extraction (building an AST) is not possible
Use regular expressions for:
Matching patterns (not recursive structures)
Tokenizing strings
Validating tightly restricted input values
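Following that advice, a hypothetical sketch of the alternative: a regular expression only tokenizes the BB-Code, and a small stack-based loop checks the nesting and builds a tree, which the recursive pattern cannot return. The token grammar (b and i tags only, a lone "[" treated as text) and the nested-list tree format are assumptions made for this example.

```python
import re

# Token grammar (assumed): [b] [i] [/b] [/i], or a run of plain text.
# A "[" that does not start a known tag becomes a text token of its own.
TOKEN = re.compile(r"\[(?P<close>/)?(?P<tag>b|i)\]|(?P<text>[^\[]+|\[)")

def parse_bbcode(source):
    """Return a nested-list tree such as ["root", ["b", "Hello ", ...]]."""
    root = ["root"]
    stack = [root]                   # open elements, innermost last
    for m in TOKEN.finditer(source):
        tag = m.group("tag")
        if m.group("text") is not None:
            stack[-1].append(m.group("text"))
        elif m.group("close"):
            if len(stack) == 1 or stack[-1][0] != tag:
                raise ValueError(f"unexpected [/{tag}] at offset {m.start()}")
            stack.pop()
        else:
            node = [tag]
            stack[-1].append(node)
            stack.append(node)
    if len(stack) > 1:
        raise ValueError(f"unclosed [{stack[-1][0]}]")
    return root

print(parse_bbcode("[b]Hello [i]World[/i]![/b]"))
# ['root', ['b', 'Hello ', ['i', 'World'], '!']]
```

Errors now carry an offset, and the result is ordinary data that a renderer can walk, which is the maintainability argument made above.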
Thanks for listening
Questions, comments or annotations?
Slides: http://westhoffswelt.de/portfolio.htm
Contact: Jakob Westhoff <[email protected]>
Twitter: @jakobwesthoff
Please leave comments and vote at: http://joind.in/1620