Outline
1. Regex vocabulary
2. Segmentation rules
3. Regex tagger
4. Regex text filter
5. Auto-translatables
Wildcards...
Wildcards used in regular search:
• * – any text string
• ? – any single character
...but somewhat different.
Regular expressions
• . – any character (letter, symbol, digit...)
• [ ] – a set or range of characters
  [123] – digit 1 or 2 or 3
  [1-3] – any digit from 1 to 3
  [A-Za-z] – any letter from A to Z, upper or lower case
  [^A] – any character except "A"
• | – or
  1|2|3 – 1 or 2 or 3
Ranges
• Both [ ] and | mean "or". What is the difference?
• [USDEUR] matches U or S or D or E or U or R (one single character)
• USD|EUR matches USD or EUR (a whole string)
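The difference can be checked with a small sketch. It uses Python's re module only for illustration; the sample sentence is made up, and memoQ's own regex engine may differ in details.

```python
import re

# A character class matches exactly ONE character out of the set.
char_class = re.compile(r"[USDEUR]")
# Alternation matches one of the complete alternatives.
alternation = re.compile(r"USD|EUR")

text = "Price: 100 USD or 90 EUR"  # made-up sample text

class_hits = char_class.findall(text)
word_hits = alternation.findall(text)

print(class_hits)  # ['U', 'S', 'D', 'E', 'U', 'R']
print(word_hits)   # ['USD', 'EUR']
```

The class also finds these letters inside any other upper-case word, which is why currency codes are better written as alternatives.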
Special symbols
• \ – modifier ("escape" character)
  . means any character, but \. means a literal dot
  \\ matches a backslash
• \d – digit [0-9]
• \s – white space
• \w – any "word" character [A-Za-z0-9_]
• \u#### – Unicode character, e.g. \u2212: − (minus sign)
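A short sketch of the special symbols, again in Python for illustration only. The sample strings are invented; note that Python's \d and \w also accept non-ASCII digits and letters, so they are slightly broader than the [0-9] and [A-Za-z0-9_] shown above.

```python
import re

# An unescaped dot matches any character; an escaped dot matches only ".".
any_char = re.findall(r"1.5", "1.5 1,5 1x5")
literal_dot = re.findall(r"1\.5", "1.5 1,5 1x5")

# \\ in the pattern matches one literal backslash in the text.
backslashes = re.findall(r"\\", r"c:\util\putty.exe")

# \d is a shorthand for a digit.
digits = re.findall(r"\d", "a1 b22")

# \u2212 is the Unicode minus sign, which is not the same as the hyphen "-".
minus = re.findall(r"\u2212\d+", "\u22125 and -7")

print(any_char)          # ['1.5', '1,5', '1x5']
print(literal_dot)       # ['1.5']
print(len(backslashes))  # 2
print(digits)            # ['1', '2', '2']
```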
Quantifiers
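• ? – 0 or 1: \d? means zero or one digit
• * – 0 or more: \d* means zero or more digits
• + – 1 or more: \d+ means at least one digit
• *? – zero or more, but as few as possible
• +? – one or more, but as few as possible
• ?, * and + are greedy: they match as much as possible
• *? and +? are lazy: they match as little as possible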
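Greedy and lazy matching are easiest to see side by side. A minimal Python sketch with an invented sample string:

```python
import re

text = "<b>bold</b> and <i>italic</i>"  # made-up sample text

# Greedy: .+ grabs as much as possible, so one match spans the whole line.
greedy = re.findall(r"<.+>", text)

# Lazy: .+? stops at the first ">" it can reach, so every tag is a separate match.
lazy = re.findall(r"<.+?>", text)

print(greedy)  # ['<b>bold</b> and <i>italic</i>']
print(lazy)    # ['<b>', '</b>', '<i>', '</i>']
```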
Quantifiers cont.
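• {num} – exact count or range
  \d{4} = exactly 4 digits
  \d{2,4} = 2, 3 or 4 digits
  \d{1,4} = from 1 to 4 digits (write the lower bound explicitly: {,4} is not read the same way by every regex engine)
  \d{4,} = 4 or more digits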
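The counted quantifiers can be tried out with a small helper. The function name matches_fully is made up for this sketch; it reports whether the whole string fits the pattern.

```python
import re

def matches_fully(pattern, text):
    """Return True if the ENTIRE text matches the pattern."""
    return re.fullmatch(pattern, text) is not None

# Exactly four digits.
four = [matches_fully(r"\d{4}", s) for s in ("201", "2012", "20123")]

# Two to four digits.
two_to_four = [matches_fully(r"\d{2,4}", s) for s in ("1", "12", "1234", "12345")]

# Four or more digits.
four_or_more = [matches_fully(r"\d{4,}", s) for s in ("123", "1234", "123456")]

print(four)          # [False, True, False]
print(two_to_four)   # [False, True, True, False]
print(four_or_more)  # [False, True, True]
```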
Groups
• ( ) – creates a group ($num recalls it)
• (?: ) – passive (non-capturing) group (not numbered)
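A sketch of both group types in Python. One difference to keep in mind: Python recalls a group as \1 in the replacement, where the rules in this presentation write $1. The sample text is invented.

```python
import re

# Two numbered groups: (1) the currency code, (2) the amount.
# The replacement recalls them in reverse order.
swapped = re.sub(r"(USD|EUR)\s(\d+)", r"\2 \1", "USD 100, EUR 250")

# A passive group (?: ) groups the alternatives but gets no number,
# so the amount becomes group 1.
amount = re.search(r"(?:USD|EUR)\s(\d+)", "EUR 250").group(1)

print(swapped)  # 100 USD, 250 EUR
print(amount)   # 250
```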
Assertions
• (?= ) – lookahead assertion
  memo(?=Q) will match "memo" in memoQ, but not in memory
• (?! ) – negative lookahead assertion
  memo(?!Q) will match "memo" in memory, but not in memoQ
• (?<! ) – negative lookbehind assertion
  (?<!s)and will match "and" in band, but not in sand
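The three examples above can be verified directly. A minimal Python sketch; the helper name found is made up.

```python
import re

def found(pattern, text):
    """Return True if the pattern matches somewhere in the text."""
    return re.search(pattern, text) is not None

lookahead = (found(r"memo(?=Q)", "memoQ"), found(r"memo(?=Q)", "memory"))
negative_lookahead = (found(r"memo(?!Q)", "memory"), found(r"memo(?!Q)", "memoQ"))
negative_lookbehind = (found(r"(?<!s)and", "band"), found(r"(?<!s)and", "sand"))

# An assertion only checks the neighbouring text, it is not part of the match:
matched_text = re.search(r"memo(?=Q)", "memoQ").group(0)

print(lookahead)            # (True, False)
print(negative_lookahead)   # (True, False)
print(negative_lookbehind)  # (True, False)
print(matched_text)         # memo
```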
#lists#
A list is a named set of alternatives, referenced in a rule as #name#:
#currency# = (EUR|USD|GBP|HUF)
#cap# = (A|B|C|D) = [ABCD]
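Outside memoQ the same idea can be imitated by expanding the placeholders before compiling the regex. This is only a sketch of the concept, not memoQ's implementation; the function expand_lists and the LISTS dictionary are hypothetical names.

```python
import re

# Hypothetical list definitions, copied from the examples above.
LISTS = {
    "currency": ["EUR", "USD", "GBP", "HUF"],
    "cap": ["A", "B", "C", "D"],
}

def expand_lists(pattern, lists=LISTS):
    """Replace every #name# placeholder with a group (item1|item2|...)."""
    def to_group(match):
        items = lists[match.group(1)]
        return "(" + "|".join(re.escape(item) for item in items) + ")"
    return re.sub(r"#(\w+)#", to_group, pattern)

expanded = expand_lists(r"\d+\s#currency#")
currency = re.search(expanded, "Total: 500 HUF").group(1)

print(expanded)  # \d+\s(EUR|USD|GBP|HUF)
print(currency)  # HUF
```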
Regular expressions in memoQ
• Segmentation rules
• Regex tagger
• Regex text filter
• Auto-translatables
Segmentation rules
• #end##!#[\s]+#cap#
• #end##!#[\s]+[\d]
• #end##!#[\s]+#lpar#[\s]*#cap#
• #end##!#[\s]+#lpar#[\s]*[\d]
• #end#[\s]*#rpar##!#[\s]+#cap#
• #end#[\s]*#rpar##!#[\s]+[\d]
#end##!#[\s]+#cap# =
[:\!\?\.]#!#\s+[A-Z]
(#!# marks the position of the segment break)
• #end##!#[\s]+#cap#
Unless:
• #abbr_long##!#[\s]+#cap#
• [\s]#abbr_short##!#[\s]+#cap#
• \s#cap#\.#!#[\s]+#cap#
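How such a rule cuts text can be imitated in a few lines of Python. This is a rough sketch of the idea, assuming #!# is the break position; it is not how memoQ implements segmentation, the function split_segments is a made-up name, and the "unless" exceptions are left out.

```python
import re

# Expanded form of #end##!#[\s]+#cap#, as shown above.
RULE = r"[:!?.]#!#\s+[A-Z]"

def split_segments(text, rule=RULE):
    """Break the text wherever the part before #!# is followed by the part after it."""
    before, after = rule.split("#!#")
    # Consume the text before the break; only look ahead at the text after it.
    boundary = re.compile(f"(?:{before})(?={after})")
    segments, start = [], 0
    for match in boundary.finditer(text):
        segments.append(text[start:match.end()].strip())
        start = match.end()
    segments.append(text[start:].strip())
    return segments

segments = split_segments("Open the file. Then save it! Done? e.g. this stays together.")
print(segments)
# ['Open the file.', 'Then save it!', 'Done? e.g. this stays together.']

# Without the exception rules an abbreviation causes a wrong break:
print(split_segments("See Dr. Smith."))  # ['See Dr.', 'Smith.']
```

The second call shows why the exception rules with #abbr_long# and #abbr_short# are needed.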
Regex tagger
<c:0xFF00FFFF>
\<C:.*\>
0990-4905 / N537-0392
\d{4}-\d{4}
[A-Z]\d{3}-\d{4}
ERR_GRP_NO_SAMPLE
[A-Z]+(_[A-Z]+)+
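The tagger patterns for the two codes and the error identifier can be tried against a sample sentence. In this sketch the patterns are written in compact form, the sentence is invented, and curly braces merely stand in for the tags memoQ would create; the function tag is a made-up name.

```python
import re

# One pattern per kind of text that should be protected.
TAG_PATTERNS = [
    r"\d{4}-\d{4}",         # 0990-4905
    r"[A-Z]\d{3}-\d{4}",    # N537-0392
    r"[A-Z]+(?:_[A-Z]+)+",  # ERR_GRP_NO_SAMPLE
]
TAGGER = re.compile("|".join(f"(?:{p})" for p in TAG_PATTERNS))

def tag(text):
    """Wrap every match in braces to show what would become a tag."""
    return TAGGER.sub(lambda m: "{" + m.group(0) + "}", text)

tagged = tag("Order 0990-4905 / N537-0392 failed with ERR_GRP_NO_SAMPLE")
print(tagged)
# Order {0990-4905} / {N537-0392} failed with {ERR_GRP_NO_SAMPLE}
```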
Tip: Regex tagger without regex
Regex text filter
*Popup "Putty" "c:\util\putty.exe"
\s*\*(.*)
*Popup .icon="$IconDir$\Fav_Star.ico" "Quick" "!DynamicFolder:$QuickLaunch$*.lnk"
"\w+(\s+\w+)*"
\w = [A-Za-z0-9_]
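A sketch of what the two filter patterns pick out of the sample lines, assuming the first pattern selects the lines starting with * and the second one finds the quoted labels made of plain words. Python is used for illustration only; the function labels is a made-up name.

```python
import re

LINES = [
    r'*Popup "Putty" "c:\util\putty.exe"',
    r'*Popup .icon="$IconDir$\Fav_Star.ico" "Quick" "!DynamicFolder:$QuickLaunch$*.lnk"',
]

LINE_PATTERN = re.compile(r"\s*\*(.*)")        # a line starting with *
LABEL_PATTERN = re.compile(r'"\w+(\s+\w+)*"')  # quoted text made of words only

def labels(line):
    """Return the quoted word-only strings found in one line."""
    return [m.group(0) for m in LABEL_PATTERN.finditer(line)]

line_content = LINE_PATTERN.match(LINES[0]).group(1)
found_labels = [labels(line) for line in LINES]

print(line_content)  # Popup "Putty" "c:\util\putty.exe"
print(found_labels)  # [['"Putty"'], ['"Quick"']]
```

Paths and commands contain characters such as : \ $ or !, which \w does not match, so only the labels remain.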
Auto-translatables
Rule for EN/DE/FR → HU number format conversion
(?<!(,|\.|\d|\d\s|\d'|\d’))([-|\u2212]?[\d]{2,3})(?:\.|,|\s|'|’)(\d\d\d)(?:\.|,)([\d]{1,2}|[\d]{4,})(?!(,\d|\.\d|\d|\s\d|'\d|’\d))
$2 $3,$4
Matched and converted:
12,345,67 → 12 345,67
12,345.67 → 12 345,67
12.345,67 → 12 345,67
12.345.67 → 12 345,67
12 345,67 → 12 345,67
12 345.67 → 12 345,67
12’345,67 → 12 345,67
12’345.67 → 12 345,67
Not matched (excluded by the assertions):
.12,345,67
,12,345.67
0 12.345,67
0’12.345.67
12 345,67,0
12 345.67.0
12’345,67 0
12’345.67’0
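The rule can be tried on these samples in Python, with two adaptations that are assumptions of this sketch and not part of the rule itself: Python's re module does not accept lookbehind alternatives of different lengths, so the lookbehind is split in two, and the capturing parentheses inside the assertions are dropped, so the replacement is \1 \2,\3 where the rule says $2 $3,$4. The stray | inside the sign class is also left out.

```python
import re

NUMBER = re.compile(
    r"(?<![,.\d])(?<!\d[\s'’])"  # not preceded by , . a digit, or digit + separator
    r"([-\u2212]?\d{2,3})"       # optional sign and 2-3 leading digits
    r"[.,\s'’]"                  # thousands separator
    r"(\d{3})"                   # thousands group
    r"[.,]"                      # decimal separator
    r"(\d{1,2}|\d{4,})"          # decimals (never exactly 3 digits)
    r"(?![,.\s'’]?\d)"           # not followed by more digits or separator + digit
)

def convert(text):
    """Rewrite matched numbers in the Hungarian format: 12 345,67."""
    return NUMBER.sub(r"\1 \2,\3", text)

MATCHING = ["12,345,67", "12,345.67", "12.345,67", "12.345.67",
            "12 345,67", "12 345.67", "12’345,67", "12’345.67"]
NOT_MATCHING = [".12,345,67", ",12,345.67", "0 12.345,67", "0’12.345.67",
                "12 345,67,0", "12 345.67.0", "12’345,67 0", "12’345.67’0"]

converted = [convert(s) for s in MATCHING]
untouched = [convert(s) for s in NOT_MATCHING]

print(set(converted))             # {'12 345,67'}
print(untouched == NOT_MATCHING)  # True
```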
Red elements are not necessary (redundant parentheses, the brackets around \d and the | inside [ ]). Without them the rule reads:
(?<!,|\.|\d|\d\s|\d'|\d’)([-\u2212]?\d{2,3})(?:\.|,|\s|'|’)(\d\d\d)(?:\.|,)(\d{1,2}|\d{4,})(?!,\d|\.\d|\d|\s\d|'\d|’\d)
$1 $2,$3
The same rule for EN → HU only
(?<!\d,|\d\.|\d)([-–]?\d{2,3}),(\d{3})\.(\d+)(?!,\d|\.\d|\d)
12,345.67 → 12 345,67
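The EN-only rule in a Python sketch. As an adaptation for Python's re module, the single lookbehind with alternatives of different lengths is written as three separate lookbehinds, and the replacement uses \1 \2,\3 in place of $1 $2,$3. The extra sample strings are invented.

```python
import re

EN_NUMBER = re.compile(
    r"(?<!\d,)(?<!\d\.)(?<!\d)"  # not in the middle of a longer number
    r"([-–]?\d{2,3}),(\d{3})\.(\d+)"
    r"(?!,\d|\.\d|\d)"           # no further digits or groups after it
)

def en_to_hu(text):
    """Convert English 12,345.67 to Hungarian 12 345,67."""
    return EN_NUMBER.sub(r"\1 \2,\3", text)

print(en_to_hu("12,345.67"))              # 12 345,67
print(en_to_hu("Total: 123,456.7 units")) # Total: 123 456,7 units
print(en_to_hu("1,012,345.67"))           # 1,012,345.67 (left alone)
```

Numbers with more than one thousands group are left alone, because the rule expects exactly one.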
Source (English): Day of the week, Month Day number (st, nd, rd, th) Year
Target (Polish): day of the week day number. month year
(#day#),?\s(#month#)\s(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})
$1 $3. $2 $4
#day#: Friday → piątek ($1)
#month#: May → maja ($2)
11th → 11 ($3)
2012 → 2012 ($4)
Result: Friday, May 11th 2012 → piątek 11. maja 2012
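The date rule as a Python sketch. The two dictionaries stand in for the #day# and #month# translation lists and contain only the pairs shown here; a real list would hold all days and months. The names DAYS, MONTHS, alternation and convert_date are made up for this illustration.

```python
import re

# Stand-ins for the #day# and #month# lists (source word -> target word).
DAYS = {"Friday": "piątek"}
MONTHS = {"May": "maja"}

def alternation(words):
    """Build word1|word2|... from the source side of a list."""
    return "|".join(re.escape(word) for word in words)

DATE = re.compile(
    "(" + alternation(DAYS) + r"),?\s"
    "(" + alternation(MONTHS) + r")\s"
    r"(\d{1,2})(?:st|nd|rd|th)?\s(\d{4})"
)

def convert_date(text):
    """Reorder and translate the parts as in the replacement $1 $3. $2 $4."""
    def replace(match):
        day, month, number, year = match.groups()
        return f"{DAYS[day]} {number}. {MONTHS[month]} {year}"
    return DATE.sub(replace, text)

print(convert_date("Friday, May 11th 2012"))  # piątek 11. maja 2012
```

The ordinal ending sits in a passive group, so it is matched but not carried over to the target.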
• http://www.cheatography.com/davechild/cheat-sheets/regular-expressions/
• http://www.regular-expressions.info/tutorial.html
• http://regexlib.com