Post on 03-Jan-2016
Regular Expression
Dr. Tran, Van Hoai
Faculty of Computer Science and Engineering HCMC Uni. of Technology
hoai@cse.hcmut.edu.vn
Dr. Tran, Van Hoai, 2007
Text pattern
Used by text-processing utilities.
Text pattern = normal characters + metacharacters = regular expression.
Metacharacters in regular expressions are different from those in file name expansion.
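
To make the last point concrete, here is a minimal sketch in Python (chosen only for illustration; the slides themselves use grep and vi). It applies the same pattern text once as a file name pattern and once as a regular expression. The file names in `names` are hypothetical.

```python
import fnmatch
import re

# Hypothetical file names, used only for this illustration.
names = ["script1.sh", "script2.sh", "scriptxsh", "a.txt"]

pattern = "script*.sh"

# File name expansion: "*" stands for any run of characters
# and "." is an ordinary dot.
glob_hits = [n for n in names if fnmatch.fnmatchcase(n, pattern)]

# Regular expression: "*" repeats the item before it (the letter "t")
# and "." matches any single character.
regex = re.compile(pattern)
regex_hits = [n for n in names if regex.fullmatch(n)]

print(glob_hits)   # names selected by file name expansion
print(regex_hits)  # names that match the whole regular expression
```

The two readings select different names, which is why a pattern meant for grep should not be left for the shell to expand.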
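Example (1)
grep [a-z]* script*.sh
  The shell expands the unquoted [a-z]* as a file name pattern first, so the command becomes, for example: grep a.txt abc script1.sh script2.sh
grep "[a-z]*" script*.sh
  Searches for the pattern [a-z]* in the files matching script*.sh
Good and safe solutions: quote the pattern with "" or ''.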
Metacharacter sets
Depend on the usage context: searching, replacing
Also depend on the program (different engines): Perl, PHP, .NET regular expression library, Java JDK
Searching patterns (1)
Character  Pattern
.          any single character, except newline
*          zero or more occurrences of the single character (or regex) immediately preceding it
^          the following regex at the beginning of the line
$          the preceding regex at the end of the line
[ ]        any one of the enclosed characters, which can be given as a range (-). "^" right after "[" means any character not in the set
{n,m}      a range of occurrences of the regex preceding it. {n} matches exactly n occurrences. {n,} matches at least n occurrences. {n,m} matches between n and m occurrences
\          turn off the special meaning of the following character
\b         word boundary
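
The table can be tried out with a short sketch. Python's `re` module is used here as a stand-in engine (an assumption of this example, not something the slides prescribe). It writes the interval as `{n,m}` without backslashes, and the sample strings are made up.

```python
import re

def found(pattern, text):
    """Return the first substring of text matched by pattern, or None."""
    m = re.search(pattern, text)
    return m.group(0) if m else None

dot = found(r"b.g", "a big bag")              # "." matches any one character
star = found(r"bugs*", "bugsss!")             # "*" repeats the preceding "s"
caret = found(r"^bag", "handbag")             # "^" requires the start of the line
char_class = found(r"[Bb]ag", "Bag")          # one character out of the set
negated = found(r"b[^aeiou]g", "bag bXg")     # "^" inside [ ] negates the set
interval = found(r"[0-9]{2,3}", "id 12345")   # between 2 and 3 digits
escaped = found(r"\.", "a.b")                 # "\" makes "." a literal dot

# "\b" is a word boundary: the "bag" inside "handbag" is skipped.
boundary_pos = re.search(r"\bbag\b", "handbag, bag").start()

print(dot, star, caret, char_class, negated, interval, escaped, boundary_pos)
```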
Searching patterns (2)
Character  Pattern
\{n,m\}    the same range of occurrences as {n,m}, written with escaped braces in basic regular expressions (grep, sed). \{n\} matches exactly n occurrences. \{n,\} matches at least n occurrences. \{n,m\} matches between n and m occurrences
+          one or more instances of the preceding regex
?          zero or one instance of the preceding regex
|          alternation: match either the regex before or the regex after
( )        group the enclosed regex so that an operator applies to the whole group
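
A small sketch of these operators, again using Python's `re` module as an assumed stand-in engine. The sample words are invented for the illustration.

```python
import re

# "+" needs at least one "o"; "ggle" has none.
plus = re.findall(r"go+gle", "ggle gogle google")

# "?" makes the "u" optional, but allows at most one.
optional = re.findall(r"colou?r", "color colour colouur")

# "|" matches either alternative.
either = re.findall(r"cat|dog", "cat, dog, cow")

# With ( ), "+" repeats the whole group "ab"; without, only the "b".
grouped = re.fullmatch(r"(ab)+", "ababab") is not None
ungrouped = re.fullmatch(r"ab+", "ababab") is not None

print(plus, optional, either, grouped, ungrouped)
```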
Example (2)
Pattern What does it match?
bag bag
^bag bag at the beginning of line
bag$ bag at the end of line
^bag$ only bag on the line
[Bb]ag Bag or bag
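
The rows above can be checked with a minimal grep-like filter. This is a sketch in Python, not the grep utility itself. The helper `grep` and the sample lines are hypothetical.

```python
import re

def grep(pattern, lines):
    """Return the lines that contain a match for pattern (a minimal grep)."""
    regex = re.compile(pattern)
    return [line for line in lines if regex.search(line)]

# Hypothetical input lines.
lines = ["bag", "bag of tricks", "a handbag", "Bag"]

anywhere = grep(r"bag", lines)      # "bag" anywhere on the line
at_start = grep(r"^bag", lines)     # "bag" at the beginning of the line
at_end = grep(r"bag$", lines)       # "bag" at the end of the line
alone = grep(r"^bag$", lines)       # only "bag" on the line
either_case = grep(r"[Bb]ag", lines)

print(anywhere, at_start, at_end, alone, either_case, sep="\n")
```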
Example (3)
Pattern         What does it match?
b[aeiou]g       second letter is a lowercase vowel
b[^aeiou]g      second character is not a lowercase vowel (consonant, uppercase letter, symbol)
b.g             second character is any character
^\.             any line that begins with a dot
^\.[a-z][a-z]   ?
^[^\.]          ?
bugs*           bug, bugs, bugss, etc.
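
One way to explore the rows marked "?" is to run the patterns against a few sample lines. The sketch below uses Python's `re` module; the helper `matching` and the sample lines are hypothetical.

```python
import re

# Hypothetical lines to test against.
lines = [".profile", ".x", "..", "notes.txt", "bug", "bugss"]

def matching(pattern):
    """Return the sample lines in which pattern finds a match."""
    return [line for line in lines if re.search(pattern, line)]

dot_lines = matching(r"^\.")               # lines that begin with a dot
dot_two = matching(r"^\.[a-z][a-z]")       # dot, then two lowercase letters
no_dot = matching(r"^[^\.]")               # first character is not a dot
bugs = matching(r"bugs*")                  # "bug" followed by any number of "s"

print(dot_lines, dot_two, no_dot, bugs, sep="\n")
```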
Example (4)
Pattern What does it match?
"word" ?
"*word"* ?
[A-Z][A-Z]* one or more uppercase letters
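
The quoting patterns on this slide can be tried with a short Python sketch. The input strings are made up, and Python's `re` is only an assumed stand-in for the engines discussed in the slides.

```python
import re

# The quotes are ordinary characters: only the quoted form matches.
quoted = re.findall(r'"word"', 'word "word"')

# Each '"*' allows zero or more quote characters around "word".
loose = re.findall(r'"*word"*', 'word "word" ""word')

# One uppercase letter followed by zero or more uppercase letters.
upper = re.findall(r"[A-Z][A-Z]*", "NASA and Esa")

print(quoted, loose, upper, sep="\n")
```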
Example (5)
Pattern What does it match?
floating point number
Java identifier
Java simple arithmetic expression
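
The slide gives no patterns for these three items. The sketch below shows one possible set of answers in Python syntax, under stated assumptions: a floating point number must contain a decimal point, identifiers are restricted to ASCII (Java itself also allows other Unicode letters), and a "simple" expression means operands joined by + - * / without parentheses. The names `FLOAT`, `IDENT`, `OPERAND` and `EXPR` are hypothetical.

```python
import re

# Assumption: optional sign, digits with a decimal point, optional exponent.
FLOAT = r"[+-]?(?:[0-9]+\.[0-9]*|\.[0-9]+)(?:[eE][+-]?[0-9]+)?"

# Assumption: ASCII-only identifiers (letter, "_" or "$", then letters/digits).
IDENT = r"[A-Za-z_$][A-Za-z0-9_$]*"

# Assumption: operands are identifiers or integers, no parentheses.
OPERAND = rf"(?:{IDENT}|[0-9]+)"
EXPR = rf"{OPERAND}(?:\s*[-+*/]\s*{OPERAND})*"

def whole(pattern, text):
    """True when pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

print(whole(FLOAT, "3.14"), whole(FLOAT, "-0.5e10"), whole(FLOAT, "42"))
print(whole(IDENT, "count1"), whole(IDENT, "1abc"))
print(whole(EXPR, "a + 2*b"), whole(EXPR, "a +"))
```

Other answers are equally valid; for example, a stricter floating point pattern could also accept integers with an exponent.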
Replacing patterns (1)
Character  Pattern
\          turn off the special meaning of the following character
\n         reuse the text matched by the nth subpattern previously saved by \( and \). Subpatterns are numbered from 1 to 9
&          reuse the text matched by the search pattern
~, %       reuse the previous replacement pattern
\u         convert the first character of the replacement pattern to uppercase
\U         convert the entire replacement pattern to uppercase
\l, \L     same as \u and \U, but convert to lowercase
\e         turn off the previous \u or \l
\E         turn off the previous \U or \L
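
Replacement syntax differs between engines, so the following is only a rough Python counterpart of the table, not the vi/sed syntax itself. In Python's `re.sub`, `\1` and `\2` reuse saved groups, `\g<0>` plays the role of `&`, and case conversion needs a function because there is no `\u` or `\U`. The input strings are made up.

```python
import re

# \1 and \2 reuse the text saved by the first and second group.
swapped = re.sub(r"(\w+) (\w+)", r"\2 \1", "hello world")

# \g<0> stands for the whole match, like "&" in the table.
wrapped = re.sub(r"\w+", r"<\g<0>>", "a bc")

def upper_first(match):
    """Uppercase only the first character of the match, like \\u."""
    text = match.group(0)
    return text[:1].upper() + text[1:]

capitalized = re.sub(r"\w+", upper_first, "van hoai")

print(swapped, wrapped, capitalized, sep="\n")
```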
Example (6)
Command                        Result
s/.*/( & )/                    surround the line with parentheses and spaces
s/.*/mv & &.old/               ?
/^$/d                          delete a blank line (in vi, :g/^$/d applies to all lines)
%s/  */ /g                     turn one or more spaces into one space
%s/.*/\L&/                     lowercase the entire file
%s/yes/No/g                    replace yes with No
%s/Yes/~/g                     replace Yes with No (reuses the previous replacement)
s/\(F\)\(ORTRAN\)/\1\L\2/g     FORTRAN to Fortran
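
Some of these commands can be approximated in Python to see their effect. This is a sketch with made-up inputs, not a reimplementation of vi or sed: `count=1` mimics a substitution without the g flag, and lambdas stand in for `\L`.

```python
import re

# s/.*/mv & &.old/ : the whole line is reused twice in the replacement.
rename_cmd = re.sub(r".*", r"mv \g<0> \g<0>.old", "report.txt", count=1)

# g/^$/d : drop every empty line.
kept = [line for line in ["a", "", "b", ""] if not re.search(r"^$", line)]

# %s/  */ /g : a space followed by zero or more spaces becomes one space.
squeezed = re.sub(r"  *", " ", "a   b  c")

# %s/.*/\L&/ : lowercase the matched line.
lowered = re.sub(r".*", lambda m: m.group(0).lower(), "MiXed Case", count=1)

# s/\(F\)\(ORTRAN\)/\1\L\2/g : keep group 1, lowercase group 2.
fortran = re.sub(r"(F)(ORTRAN)",
                 lambda m: m.group(1) + m.group(2).lower(),
                 "FORTRAN 77")

print(rename_cmd, kept, squeezed, lowered, fortran, sep="\n")
```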
Applications
Pattern                                  Matched text
<TAG\b[^>]*>\(.*?\)</TAG>                grab a specific HTML tag
[0-9]\{1,3\}\. ????                      IP address
[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}    Email address
(not given)                              Valid dates (day-month-year)
(not given)                              WeWe, does not match Wee
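
The HTML tag and email patterns can be tried in Python with two small adjustments that are assumptions of this sketch: groups are written `( )` instead of `\( \)`, and the email pattern is compiled with `re.IGNORECASE` so that lowercase addresses match. The sample HTML string is made up.

```python
import re

# Lazy ".*?" stops at the first closing tag instead of the last one.
TAG = re.compile(r"<TAG\b[^>]*>(.*?)</TAG>")

# The pattern lists only uppercase letters, so ignore case when matching.
EMAIL = re.compile(r"[A-Z0-9._%-]+@[A-Z0-9.-]+\.[A-Z]{2,4}", re.IGNORECASE)

html = '<TAG id="x">hello</TAG><TAG>bye</TAG>'
first_body = TAG.search(html).group(1)
all_bodies = TAG.findall(html)

valid = EMAIL.fullmatch("hoai@cse.hcmut.edu.vn") is not None
invalid = EMAIL.fullmatch("user@host") is not None

print(first_body, all_bodies, valid, invalid)
```

The email pattern is a pragmatic filter rather than a full validator: it rejects top-level domains longer than four letters, for instance.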
Next: text-processing utilities