Regular Expression
Mohsen Mollanoori
What is RegeX?
“A notation to describe regular languages.”
“Not necessarily (and not usually) regular”
“A Powerful String Processing Tool”
“A pattern that can be matched against a string”
“A Language But Not A Language”
What Does RegeX Do?
String Processing
Matching Strings against a Specific Pattern
Splitting Strings
Changing Substrings
Extracting Substrings
What Programming Languages Support RegeX?
Almost All of Them
Perl, Java, .NET (C#, VB.NET, …), PHP, Ruby, JavaScript, …
And Even Many IDEs, Editors & Utilities
grep, Eclipse, Visual Studio .NET, vim, emacs, …
The Notation
Symbol   Meaning                                          Example
.        Any single char                                  /.at/ matches “cat”, “bat”, “pat”, “mat”
*        Zero or more occurrences of the preceding char   /a*b/ matches “b”, “aaaaab”
+        One or more occurrences of the preceding char    /a+b/ matches “ab”, “aaaaab”
?        Zero or one occurrence of the preceding char     /a?b/ matches “ab” and “b”
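A minimal Java sketch of the four symbols above. The class and method names are made up for illustration. It checks whole-string matches, which appears to be the convention the following examples use.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // True only if the WHOLE input matches the pattern.
    static boolean fullMatch(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(fullMatch(".at", "cat"));      // true: '.' stands for 'c'
        System.out.println(fullMatch("a*b", "b"));        // true: zero 'a's are allowed
        System.out.println(fullMatch("a+b", "b"));        // false: at least one 'a' is required
        System.out.println(fullMatch("a?b", "aab"));      // false: at most one 'a' is allowed
    }
}
```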
Example 1
String: “Term”, “Term1”, “Term2”
Pattern: /Term./
Result: “Term1”, “Term2”
Example 2
String: “Term”, “Term1”, “Term2”
Pattern: /Term.?/
Result: “Term”, “Term1”, “Term2”
Example 3
String: “Term”, “Term1”, “Term2”
Pattern: /Term1?/
Result: “Term”, “Term1”
Example 4
String: “Term1”, “Term11”, “Term2”, “Term”
Pattern: /Term1+/
Result: “Term1”, “Term11”
Example 5
String: “Term1”, “Term11”, “Term2”, “Term”
Pattern: /Term1*/
Result: “Term1”, “Term11”, “Term”
Character Classes
Example          Meaning
[pnm]            “p” or “n” or “m”
[Qq]             “Q” or “q”
[A-Z]            Upper case letters
[A-Za-z]         Letters
[^A-Z]           Every char EXCEPT A-Z
[A-Z&&[^C-E]]    A-Z but NOT C-E
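The table can be tried out with a short Java sketch like the one below (helper names are hypothetical). Note that the `&&` intersection form is accepted by java.util.regex but not by every regex flavor, so treat the last row as Java syntax.

```java
class CharClassDemo {
    // True if the input is exactly one character belonging to the given class.
    static boolean inClass(String charClass, String input) {
        return input.matches(charClass);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[pnm]", "n"));          // true
        System.out.println(inClass("[^A-Z]", "A"));         // false: A-Z is excluded
        System.out.println(inClass("[A-Z&&[^C-E]]", "B"));  // true: upper case, outside C-E
        System.out.println(inClass("[A-Z&&[^C-E]]", "D"));  // false: D lies in C-E
    }
}
```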
Example 6
String: “CAT”, “Cat”, “cat”
Pattern: /[Cc]at/
Result: “Cat”, “cat”
Example 7
String: “CAT”, “Cat”, “cat”
Pattern: /[Cc][Aa][Tt]/
Result: “CAT”, “Cat”, “cat”
Example 8
String: “Term”, “Term1”, “Term2”
Pattern: /[A-Za-z]+/
Result: “Term”
Example 9
String: “Term”, “Term1”, “Term222”
Pattern: /.*[0-9]+/
Result: “Term1”, “Term222”
Example 10
String: “Term”, “Term1”, “Term222”
Pattern: /[^0-9]+/
Result: “Term”
Repeating Chars (Intervals)
Example   Description
a{3}      Matches “aaa”
a{3,5}    Matches “aaa”, “aaaa”, “aaaaa”
a{3,}     Matches “aaa”, “aaaa”, …
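The interval rows above, checked as whole-string matches in a small Java sketch (names are illustrative only):

```java
class IntervalDemo {
    // True if the input consists of a run of 'a's allowed by the interval pattern.
    static boolean fits(String regex, String input) {
        return input.matches(regex);
    }

    public static void main(String[] args) {
        System.out.println(fits("a{3}", "aaa"));       // true: exactly three
        System.out.println(fits("a{3,5}", "aa"));      // false: fewer than three
        System.out.println(fits("a{3,5}", "aaaaaa"));  // false: more than five
        System.out.println(fits("a{3,}", "aaaaaaa"));  // true: no upper bound
    }
}
```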
Predefined Character Classes
Class   Description
\d      Digit
\D      Non-digit
\s      Whitespace
\S      Non-whitespace
\w      Word char (letter, digit, or underscore)
\W      Non-word char
\b      Word boundary
\B      Non-word boundary
\A      The beginning of the input
\z      The end of the input
Example 11
String: “This is some text !”
Pattern: /is/
Result: two matches, the “is” inside “This” and the word “is”
Example 12
String: “This is some text !”
Pattern: /\bis\b/
Result: one match, the standalone word “is”
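To see the effect of \b, the sketch below counts how often each pattern is found in the same sentence. It uses java.util.regex.Matcher.find(); the class and method names are assumptions made for this illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordBoundaryDemo {
    // Counts non-overlapping occurrences of the pattern anywhere in the input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "This is some text !";
        System.out.println(countMatches("is", text));        // 2: inside "This" and the word "is"
        System.out.println(countMatches("\\bis\\b", text));  // 1: only the standalone word
    }
}
```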
Example 13
Variable Names
Pattern: /[A-Za-z]\w{0,15}/
Groups
Email addresses:
/[A-Za-z0-9_]+@.+\.\w+/
/([A-Za-z0-9_]+)@(.+)\.(\w+)/
$1  Username
$2  Server
$3  Domain
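In Java the same groups are read with Matcher.group(n) rather than $1, $2, $3. A minimal sketch, assuming a made-up address and helper name:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmailGroupsDemo {
    private static final Pattern EMAIL =
            Pattern.compile("([A-Za-z0-9_]+)@(.+)\\.(\\w+)");

    // Returns {username, server, domain}, or null if the whole input is not an address.
    static String[] parts(String email) {
        Matcher m = EMAIL.matcher(email);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3) };
    }

    public static void main(String[] args) {
        // Hypothetical address, used only for illustration.
        String[] p = parts("mohsen@mail.example.com");
        System.out.println(p[0]);  // mohsen
        System.out.println(p[1]);  // mail.example
        System.out.println(p[2]);  // com
    }
}
```

Because (.+) is greedy, the server group takes everything up to the last dot.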
RegeX & Perl
open(IN, "File.txt") or die "Cannot open File.txt: $!";  # open file
while ($line = <IN>) {                                   # read line by line
    if ($line =~ /([A-Za-z0-9_]+)@(.+)\.(\w+)/) {
        # $1 is the username, $2 the server
        print 'User = ', $1, ', Server = ', $2, "\n";
    }
}
close(IN);
RegeX & Ruby
open('in.txt', 'r').readlines.each do |line|
puts line if line =~ /^([a-z0-9_]+)@(.+)\.(.+)$/i
end
RegeX & Java
java.util.regex.Pattern
java.util.regex.Matcher
java.util.Scanner
java.lang.String
    replaceAll(regex, replacement)
    replaceFirst(regex, replacement)
    matches(regex)
    split(regex)
Example 16
String email = readEmailFromSomewhere();
if (email.matches("([A-Za-z0-9_]+)@(.+)\\.(\\w+)")) {
    System.out.println("valid email");
} else {
    System.out.println("invalid email");
}
Example 17
String str = "098 123-456-789";
String[] nums = str.split("[\\s-]");
for (String num : nums) {
System.out.println(num);
}
Example 18
// Remove Tags from HTML
String html = "<html><head><title>This is a title.</title></head>"
        + "<body>This is <b>body</b> of a <i>HTML</i> file"
        + "!</body></html>";
String text = html.replaceAll("<[^>]+>", " ");
String normalizedText = text.replaceAll("\\s+", " ");
System.out.println(normalizedText);
Example 19
// hyperlink URLs
String html = "<html>Please Visit http://myhomepage.com</html>";
html = html.replaceAll("https?://([-.A-Za-z]+)", "<a href='$0'>$1</a>");
System.out.println(html);
Example 20
Convert MixedCase to underlined_format
String MixedCase = "ThisIsSomeTextInMixedCaseFormat";
String temp = MixedCase.replaceAll("([a-z])([A-Z])", "$1_$2");
String underlined_format = temp.toLowerCase();
System.out.println(underlined_format);
// result: this_is_some_text_in_mixed_case_format
Convert underlined_format to MixedCase
?
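One possible answer, offered as a sketch rather than the presenter's own solution: String.replaceAll cannot upper-case a captured group, so this version walks the matches with Matcher.appendReplacement and builds the result by hand. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaseConverter {
    // Upper-cases the first letter of the input and every letter that follows an underscore,
    // dropping the underscore itself.
    static String toMixedCase(String underlined) {
        Matcher m = Pattern.compile("(?:^|_)([a-z])").matcher(underlined);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // The replacement is a single upper-case letter, so it needs no escaping.
            m.appendReplacement(sb, m.group(1).toUpperCase());
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(toMixedCase("this_is_some_text_in_mixed_case_format"));
        // prints: ThisIsSomeTextInMixedCaseFormat
    }
}
```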
Example 21
The Pipe Sign
Find strings of 0s & 1s that have an even number of 1s or an even number of 0s
str = '110100101'
puts str =~ /^(1*(01*01*)*|0*(10*10*)*)$/ ? 'Yes' : 'No'
Example 22
Finding Unintentionally Repeated Words
text = 'hello, this is some some text!'
?
Back References
\i refers to the i-th matched group
Example: /(.)\1/ matches “aa”, “bb”, “11”, “##”
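The same back reference in Java, as a small sketch with made-up names. The backslash must be doubled inside a Java string literal.

```java
import java.util.regex.Pattern;

class BackReferenceDemo {
    private static final Pattern DOUBLED = Pattern.compile("(.)\\1");

    // True if some character is immediately followed by a copy of itself.
    static boolean hasDoubledChar(String input) {
        return DOUBLED.matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDoubledChar("hello"));  // true: "ll"
        System.out.println(hasDoubledChar("world"));  // false
        System.out.println(hasDoubledChar("##"));     // true
    }
}
```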
Example 22
Finding Unintentionally Repeated Words
text = 'hello, this is some some text!'
# the trailing \b keeps "some something" from counting as a repeat
if text =~ /\b(\w+)\W+\1\b/
  puts $1 + " is repeated more than once"
end
# some is repeated more than once
You Don't Even Need to Write Code
An editor that supports RegeX
Eclipse find/replace dialog box
Microsoft VS.NET Quick Replace
Use Regular Expression
Extracting Timestamps From a log file
Some Rewriting System!
rewrite(input)
    temp = input
    do
        before = temp
        temp = rewrite temp using rule1
        temp = rewrite temp using rule2
        ...
        after = temp
    while (before != after)
    return temp
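The pseudocode above could be written in Java roughly as follows. The two bracket-removing rules are invented for this sketch, and the loop only terminates when the rules eventually stop changing the text.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.UnaryOperator;

class Rewriter {
    // Applies every rule in order, repeating until a full pass changes nothing.
    static String rewrite(String input, List<UnaryOperator<String>> rules) {
        String temp = input;
        String before;
        do {
            before = temp;
            for (UnaryOperator<String> rule : rules) {
                temp = rule.apply(temp);
            }
        } while (!before.equals(temp));
        return temp;
    }

    // Hypothetical rule set: delete empty "()" and "[]" pairs.
    static String stripBalancedPairs(String input) {
        List<UnaryOperator<String>> rules = Arrays.asList(
                s -> s.replaceAll("\\(\\)", ""),
                s -> s.replaceAll("\\[\\]", ""));
        return rewrite(input, rules);
    }

    public static void main(String[] args) {
        System.out.println(stripBalancedPairs("([()])x"));  // prints: x
    }
}
```

Each pass peels off the innermost pairs, which is how a regex-based rewriter can handle nesting that a single regular expression cannot.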
XML
<Students >
<Student faculty="Computer Engineering" student-id="8017024">
<Name first="Mohsen" last="Mollanoori"/>
<Terms >
<Term num="1">
<Lesson name="Statistics" mark="10"/>
<Lesson name="Math" mark="10"/>
</Term>
</Terms>
</Student>
</Students>
MML
@Students
{
@Student(faculty="Computer Engineering" student-id="8017024")
{
@Name(first="Mohsen" last="Mollanoori");
@Terms
{
@Term(num="1")
{
@Lesson(name="Statistics" mark="10");
@Lesson(name="Math1" mark="10");
}
}
}
}
Example 23
MML 2 XML
do {
    before = mml;
    mml = mml.replaceAll(
        "@([A-Za-z]+)(\\(([^)]*)\\))?;",
        "<$1 $3/>");
    mml = mml.replaceAll(
        "@([A-Za-z]+)(\\(([^)]*)\\))?\\{([^\\{\\}]*)\\}",
        "<$1 $3>$4</$1>");
    after = mml;
} while (!before.equals(after));
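A runnable wrapper around the loop above, as a sketch. The class name and the sample input are assumptions. The input is written compactly, with each '{' directly after the element name or its ')' and no whitespace between elements, because the second pattern does not allow whitespace before the brace.

```java
class MmlToXml {
    // Repeats both rewrite rules until the text stops changing.
    static String convert(String mml) {
        String before;
        do {
            before = mml;
            // Leaf elements: @Name(attrs);  becomes  <Name attrs/>
            mml = mml.replaceAll(
                    "@([A-Za-z]+)(\\(([^)]*)\\))?;",
                    "<$1 $3/>");
            // Innermost blocks: @Name(attrs){body}  becomes  <Name attrs>body</Name>
            mml = mml.replaceAll(
                    "@([A-Za-z]+)(\\(([^)]*)\\))?\\{([^\\{\\}]*)\\}",
                    "<$1 $3>$4</$1>");
        } while (!before.equals(mml));
        return mml;
    }

    public static void main(String[] args) {
        String mml = "@Terms{@Term(num=\"1\"){@Lesson(name=\"Math\" mark=\"10\");}}";
        System.out.println(convert(mml));
        // prints: <Terms ><Term num="1"><Lesson name="Math" mark="10"/></Term></Terms>
    }
}
```

An element without attributes leaves group 3 empty, which is why the output contains "<Terms >" with a space before the '>'.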
Example 24
Remove Text from XML (Keep Tags Only)
Is this Correct?
String xml = "<b><em>Text Here</em></b>";
xml = xml.replaceAll(">[^<]*<", "");
Matches: “><”, “>Text Here<”, “><”
Result: “<bem/em/b>” (the angle brackets are removed along with the text)
Look Ahead & Look Behind
String xml = "<b><em>Text Here</em></b>";
xml = xml.replaceAll("(?<=>)[^<]*(?=<)", "");
Look behind using ?<= to see a ‘>’
Look ahead using ?= to see a ‘<’
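A small Java sketch of the lookaround version (class and method names are illustrative). The lookbehind and lookahead only check for the brackets and do not consume them, so the tags survive.

```java
class LookaroundDemo {
    // Removes the text between a '>' and the next '<' while leaving both brackets in place.
    static String stripText(String xml) {
        return xml.replaceAll("(?<=>)[^<]*(?=<)", "");
    }

    public static void main(String[] args) {
        System.out.println(stripText("<b><em>Text Here</em></b>"));
        // prints: <b><em></em></b>
    }
}
```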
Example 25
Over Matching
xml = "<a> aaa </a><b> bbb </b>";
xml = xml.replaceFirst(">.*<", "");
Match: “> aaa </a><b> bbb <” (from the first ‘>’ to the last ‘<’)
Result: xml = "<a/b>";
Greedy & Non Greedy
Greedy   Non Greedy
*        *?
+        +?
?        ??
{a,b}    {a,b}?
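The difference shows up in what the first match captures. A minimal Java sketch with a hypothetical helper:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    // Returns the text of the first match, or null if the pattern is not found.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstMatch("<.+>", "<a><b>"));   // <a><b>  (greedy: as much as possible)
        System.out.println(firstMatch("<.+?>", "<a><b>"));  // <a>     (non greedy: as little as possible)
    }
}
```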
Example 26
Solution to Over Matching
xml = "<a> aaa </a><b> bbb </b>";
xml = xml.replaceFirst(">.*?<", "");
Match: “> aaa <” (from the first ‘>’ to the nearest ‘<’)
Result: xml = "<a/a><b> bbb </b>";
Example 27
String xml = "aabb";
xml = xml.replaceAll(".{2,3}", "-");
System.out.println(xml); // result = ‘-b’

xml = "aabb";
xml = xml.replaceAll(".{2,3}?", "-");
System.out.println(xml); // result = ‘--’
Example 28
String xml = "aabb";
xml = xml.replaceAll(".?", "-");
System.out.println(xml); // result: -----

xml = "aabb";
xml = xml.replaceAll(".??", "-");
System.out.println(xml); // result: -a-a-b-b-
Further Reading & Works
“Teach Yourself Regular Expressions in 10 Minutes”, Sams Publishing, February 28, 2004, ISBN: 0-672-32566-7
“Mastering Regular Expressions, 3rd Edition”, by Jeffrey E. F. Friedl, O'Reilly, August 2006, ISBN: 0-596-52812-4
Java Regular Expression Documents
Practice, Practice, Practice
Thanks