Pedagogic application of regular expressions
John Blake
Japan Advanced Institute of Science and Technology
/\bbetween\W+(?:\w+\W+){1,2}?to\b/gi;
Overview
Introduction
• Probabilistic parsing
• Rule-based pattern matching
• Regular expressions
Pedagogic applications
• Modality detector
• Error detector
• Other: tagged corpora, pronunciation of “ed”
Probabilistic parsing
• Dynamic algorithms
• Machine learning
• Training sets
(e.g. Stanford POS tagger)
Extremely powerful, but requires significant knowledge of computational linguistics and a huge investment of time, so…
Rule-based pattern matching
1. There is a man on your left. T / F
If true, a man is on your left. Stop.
If false, proceed to 2.
2. There is a woman on your left. T / F
If true, there is a woman on your left. Stop.
If false, there is nobody on your left. Stop.
True/false statements
Rule-based pattern matching
Decision-tree algorithm
There is a man on your left.
  Yes. STOP
  No. Proceed to the next statement.
There is a woman on your left.
  Yes. STOP
  No. There is nobody on your left. STOP
Assumptions:
1. Only adults are present.
2. There is no third gender.
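The decision tree translates directly into ordered if statements, one per true/false question. The sketch below is illustrative only: the function name whoIsOnYourLeft and its two true/false inputs are hypothetical, and it inherits the same two assumptions.

```javascript
// Hypothetical sketch of the two-question decision tree.
// Each question is a true/false test; the first "Yes" stops the process.
function whoIsOnYourLeft(isMan, isWoman) {
  // Question 1: There is a man on your left.
  if (isMan) return "A man is on your left."; // Yes: STOP
  // Question 2 (only reached if the answer to Question 1 was No)
  if (isWoman) return "A woman is on your left."; // Yes: STOP
  // Both answers were No: STOP
  return "There is nobody on your left.";
}

console.log(whoIsOnYourLeft(false, true)); // A woman is on your left.
```

Regex-based detectors follow the same shape: each pattern plays the role of one true/false question.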
Rule-based pattern matching
There is a man. /\bman\b/;
There is a woman. /\bwoman\b/;
Regular expressions (regexp|regex)
The discrete words “man” and “woman” will be identified, generating a “true” result. Because \b marks a word boundary, /\bman\b/ does not match the “man” inside “woman”.
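The two patterns can be tried directly in JavaScript, whose /…/ regex literal syntax the slides use. This is a minimal sketch; the variable names are illustrative. RegExp.prototype.test returns true or false, mirroring the T/F statements.

```javascript
// The two patterns from the slide; \b marks a word boundary.
const manRe = /\bman\b/;
const womanRe = /\bwoman\b/;

const statements = ["There is a man.", "There is a woman.", "There is nobody."];
for (const s of statements) {
  // test() returns true or false, like a T/F statement
  console.log(`${s} man: ${manRe.test(s)}, woman: ${womanRe.test(s)}`);
}
// There is a man. man: true, woman: false
// There is a woman. man: false, woman: true
// There is nobody. man: false, woman: false
```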
Regular expressions (Regex)
e.g. /\bmaybe\b/gi;
\ – escape (from normal characters)
\b – word boundary
i – case insensitive
g – global (find every match, not just the first)
1. I think that maybe he can understand. T/F
2. He may be able to understand. T/F
3. Maybe, he can understand. T/F
4. Maybelline is a company name. T/F
5. Maybe, he said maybe. T/F
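The five items can be checked by running the slide's pattern over each sentence. This is a sketch and the variable names are illustrative. Items 2 and 4 come out false: “may be” is two words, and in “Maybelline” there is no word boundary after “maybe”. The g flag matters in item 5, where both occurrences are found.

```javascript
// Pattern from the slide: \b = word boundary, g = global, i = case insensitive
const maybeRe = /\bmaybe\b/gi;

const maybeItems = [
  "I think that maybe he can understand.",
  "He may be able to understand.",
  "Maybe, he can understand.",
  "Maybelline is a company name.",
  "Maybe, he said maybe.",
];

// With the g flag, String.prototype.match returns every match (or null)
const maybeCounts = maybeItems.map((s) => (s.match(maybeRe) || []).length);

maybeCounts.forEach((n, i) => {
  console.log(`${i + 1}. ${n > 0 ? "T" : "F"} (${n} match${n === 1 ? "" : "es"})`);
});
// 1. T (1 match)
// 2. F (0 matches)
// 3. T (1 match)
// 4. F (0 matches)
// 5. T (2 matches)
```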
Pedagogic applications
Modality detector
Online error detectors
- Common error detector (Morrall, 2000-14)
- Corpus-based error detector (Blake, 2012-15)
Other applications
- Annotation highlighter
- Ideas for pronunciation, grammar and vocabulary
Situation (App. 1)
Students: graduate students, researchers
Aim: write research articles
Problems: lack of familiarity with the genre, lack of language, lack of content
Tentative language & approximation
Modal verbs: may, might, would, can
Lexical verbs: seem, appear, suggest
Modal adverbs: perhaps, probably, possibly
Modal adjectives: probable, possible, uncertain
Modal nouns: assumption, claim, possibility
Approximation (number: examples)
49%: almost half, nearly 50%, less than 1 in 2
Material mismatch
Students from different faculties studying tentative language (hedging) and approximation in academic writing use generic materials prepared by the teacher.
Lack of face validity
Some students do not want to “waste time” dealing with materials not appropriate to their major. They expect materials tailored to their exact needs.
Solution: Modality detector
Individualized instruction
• Student selects appropriate text
• Student inputs relevant text
• Regex identifies hedges & approximation
• Execute command labels & highlights
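A minimal sketch of the identify, label and highlight steps follows. It assumes the word lists from the tentative-language table (with a few plural and third-person forms added) and treats “almost”, “nearly” and “less than” as approximation markers. The names HEDGE_PATTERNS, detectModality and highlight are hypothetical; this is not the detector's actual code.

```javascript
// Hypothetical patterns based on the tentative-language table; all use the g flag
// so that matchAll and replace find every occurrence.
const HEDGE_PATTERNS = {
  "modal verb": /\b(may|might|would|can)\b/gi,
  "lexical verb": /\b(seems?|appears?|suggests?)\b/gi,
  "modal adverb": /\b(perhaps|probably|possibly)\b/gi,
  "modal adjective": /\b(probable|possible|uncertain)\b/gi,
  "modal noun": /\b(assumptions?|claims?|possibilit(?:y|ies))\b/gi,
  approximation: /\b(almost|nearly|less than)\b/gi,
};

// Identify and label: list every hit with its type, in reading order.
function detectModality(text) {
  const findings = [];
  for (const [type, re] of Object.entries(HEDGE_PATTERNS)) {
    for (const m of text.matchAll(re)) {
      findings.push({ type, word: m[0], index: m.index });
    }
  }
  return findings.sort((a, b) => a.index - b.index);
}

// Highlight: wrap every hit in [[...]] for display.
function highlight(text) {
  let out = text;
  for (const re of Object.values(HEDGE_PATTERNS)) {
    out = out.replace(re, (m) => `[[${m}]]`);
  }
  return out;
}

const sampleText = "The results may suggest that nearly 50% of cases are possibly linked.";
console.log(detectModality(sampleText).map((f) => `${f.word} (${f.type})`).join(", "));
// may (modal verb), suggest (lexical verb), nearly (approximation), possibly (modal adverb)
console.log(highlight(sampleText));
// The results [[may]] [[suggest]] that [[nearly]] 50% of cases are [[possibly]] linked.
```

Simple word lists like these produce false positives: the noun “can” in “a metal can”, for instance, is labelled as a modal verb.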
Warning: false positives. More complex regexes can reduce false positives.
Piles of unmarked homework
Responding to written work takes too much time and is repetitive, since many students make the same surface-level mistakes.
App. 2
No time to respond
Teachers are expected to:
• Identify the location of errors
• Explain the errors (if necessary)
• Correct the errors (if necessary)
All of these take a lot of time.
Solution: Error detector
Identification: Student inputs own work; regex identifies expected errors.
Explanation: Execute command selects and displays prepared explanation.
Correction: Student corrects work and submits improved version.
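The identification and explanation steps can be sketched as a table of rules, each pairing a regex with a prepared explanation. The rule set is hypothetical: the first rule reuses the regex from the title slide, assuming it is meant to catch “between X to Y” where “between X and Y” is expected, and the second is an illustrative contraction check for formality. Correction stays with the student.

```javascript
// Hypothetical rules: each pairs a regex with a prepared explanation.
const RULES = [
  {
    type: "Accuracy",
    // Regex from the title slide: "between" followed within two words by "to"
    re: /\bbetween\W+(?:\w+\W+){1,2}?to\b/gi,
    explanation: 'Use "between X and Y", not "between X to Y".',
  },
  {
    type: "Formality",
    // Contractions ending in n't, with a straight or curly apostrophe
    re: /\b\w+n['’]t\b/gi,
    explanation: "Avoid contractions such as \"don't\" in research articles.",
  },
];

// Identification and explanation: report each match with its location.
function checkText(text) {
  const feedback = [];
  for (const { type, re, explanation } of RULES) {
    for (const m of text.matchAll(re)) {
      feedback.push({ type, found: m[0], index: m.index, explanation });
    }
  }
  return feedback.sort((a, b) => a.index - b.index);
}

const draft = "Results don't vary between 10 to 20 trials.";
for (const f of checkText(draft)) {
  console.log(`${f.type} at ${f.index}: "${f.found}". ${f.explanation}`);
}
// Formality at 8: "don't". Avoid contractions such as "don't" in research articles.
// Accuracy at 19: "between 10 to". Use "between X and Y", not "between X to Y".
```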
Error classification
Accuracy: factual and language errors
Brevity: too many words
Clarity: vague or ambiguous terms
Objectivity: emotive language
Formality: abbreviations, contractions and informal terms
An ethnographic survey of the literature on writing scientific research articles revealed five key criteria (Blake & Blake, 2015).
Specific example
Error
• one of the + singular noun
Regex
• /\bone of the\b/gi;
Execute
• Check that the phrase “one of the” is followed by a plural noun.
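One way to support the Execute step is to capture the word that follows “one of the”, so the student can see what to check. This sketch extends the slide's regex with a capture group; its plural test (a word ending in “s”) is a deliberately naive assumption, and the function name checkOneOfThe is hypothetical.

```javascript
// Extends /\bone of the\b/gi with a capture group for the next word.
// Naive heuristic (assumption): a next word ending in "s" is treated as plural.
function checkOneOfThe(text) {
  const re = /\bone of the\s+(\w+)/gi;
  return [...text.matchAll(re)].map((m) => ({
    phrase: m[0],
    nextWord: m[1],
    looksPlural: /s$/i.test(m[1]),
  }));
}

const essay =
  "One of the reason is cost. One of the children agreed. One of the methods failed.";
for (const r of checkOneOfThe(essay)) {
  console.log(`${r.phrase}: ${r.looksPlural ? "looks plural" : "CHECK"}`);
}
// One of the reason: CHECK
// One of the children: CHECK
// One of the methods: looks plural
```

“children” is flagged although it is a correct plural, and “one of the most important methods” would be flagged because of “most”; the student still has to decide.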
False positives are harnessed in the learning process: because every flagged phrase must be checked, students are forced to engage with their own writing.
Difficult-to-read tags
Introduction Purpose Method Results Discussion
<segment features='problem;introduction;rhetorical_moves' state='active'>We address the problem of model-based object recognition.</segment> <segment features='purpose;rhetorical_moves' state='active'>Our aim is to localize and recognize road vehicles from monocular images or videos in calibrated traffic scenes.</segment> <segment features='method;rhetorical_moves' state='active'>A 3-D deformable vehicle model with 12 shape parameters is set up as prior information, and its pose is determined by three parameters, which are its position on the ground plane and its orientation about the vertical axis under ground-plane constraints.</segment> <segment features='purpose;rhetorical_moves' state='active'>An efficient local gradient-based method is proposed to evaluate the fitness between the projection of the vehicle model and image data, which is combined into a novel evolutionary computing framework to estimate the 12 shape parameters and three pose parameters by iterative evolution.</segment> <segment features='background;introduction;rhetorical_moves' state='active'>The recovery of pose parameters achieves vehicle localization, whereas the shape parameters are used for vehicle recognition.</segment> <segment features='method;rhetorical_moves' state='active'>Numerous experiments are conducted in this paper to demonstrate the performance of our
App. 3
Easy-to-read tags
Introduction Purpose Method Results Discussion
http://www.jaist.ac.jp/~johnb/Movehighlighter.html
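The conversion from tagged text to readable labels can itself be done with a regex. The sketch below is hypothetical and is not the code behind the Move highlighter page. It assumes the first item in each features attribute names the move, and it handles only complete, non-nested segment elements, so a truncated final segment is simply skipped.

```javascript
// Hypothetical sketch: turn <segment features='...'> tags into labelled lines.
function readableMoves(tagged) {
  // Group 1: features attribute; group 2: the sentence inside the segment
  const segRe = /<segment features='([^']*)'[^>]*>([\s\S]*?)<\/segment>/g;
  const lines = [];
  for (const m of tagged.matchAll(segRe)) {
    const label = m[1].split(";")[0].toUpperCase(); // e.g. "problem" -> "PROBLEM"
    lines.push(`[${label}] ${m[2].trim()}`);
  }
  return lines;
}

const taggedSample =
  "<segment features='problem;introduction;rhetorical_moves' state='active'>" +
  "We address the problem of model-based object recognition.</segment> " +
  "<segment features='purpose;rhetorical_moves' state='active'>" +
  "Our aim is to localize and recognize road vehicles from monocular images " +
  "or videos in calibrated traffic scenes.</segment>";

console.log(readableMoves(taggedSample).join("\n"));
// [PROBLEM] We address the problem of model-based object recognition.
// [PURPOSE] Our aim is to localize and recognize road vehicles from monocular images or videos in calibrated traffic scenes.
```

A regex is adequate here only because the markup is flat and regular; nested or irregular markup would call for a proper parser.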
Ideas for you and your students
Pronunciation:
• Regular “ed”: /t/, /d/, /id/
• th: voiced or voiceless
Grammar:
• Tenses: e.g. perfect continuous: been + ing
• Quantifiers: [U] much, little; [C] many, few; [U/C] lots of, a lot of
Vocabulary:
• Colours: red, blue; crimson red, cobalt blue
• Body parts: hand, eyes, leg; hand out, eye up, leg it
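Two of the grammar ideas translate into short patterns. This sketch is a starting point built on assumptions: the perfect continuous pattern adds has/have/had in front of “been + ing”, and the quantifier patterns simply match the listed words, so they will also fire on non-quantifier uses such as “a little boy”. GRAMMAR_PATTERNS and findGrammar are hypothetical names.

```javascript
// Hypothetical patterns for two of the grammar ideas.
const GRAMMAR_PATTERNS = {
  perfectContinuous: /\b(has|have|had)\s+been\s+\w+ing\b/gi, // been + ing
  uncountable: /\b(much|little)\b/gi, // [U]
  countable: /\b(many|few)\b/gi, // [C]
  either: /\b(lots of|a lot of)\b/gi, // [U/C]
};

// Collect every match for each pattern (empty array if none).
function findGrammar(text) {
  const found = {};
  for (const [name, re] of Object.entries(GRAMMAR_PATTERNS)) {
    found[name] = text.match(re) || [];
  }
  return found;
}

const grammarText =
  "She has been working here for years, but she has few friends and little time. " +
  "We had been collecting a lot of data.";
console.log(findGrammar(grammarText));
```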
Regular “ed”
False positives:
• learned: /d/ as a verb, /id/ as an adjective (e.g. “a learned scholar”)
Pronunciation (preceding sound): potential regex
/id/ (d, t): /(d|t)ed\b/gi;
/t/ (voiceless consonants): /(s|f)ed\b/gi;
/d/ (voiced consonants): /(z|v)ed\b/gi;
/d/ (vowel): /(ow|i|ay)ed\b/gi;
Pronunciation of “ed” is dictated by the sound of the preceding letter(s).
| – Boolean “or”, so x|y means either x or y
d|ted means d or ted, but by adding brackets, (d|t)ed means ded or ted
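The table can be applied word by word, trying the rows in order and stopping at the first match. In this sketch (ED_RULES and guessEd are hypothetical names) the g flag is dropped, because RegExp.prototype.test on a g-flagged regex remembers lastIndex between calls and can give inconsistent results when the same regex is reused. “?” means no row matched.

```javascript
// The four rows of the table, tried in order; the first match wins.
// (d|t) is an unescaped group meaning "d or t".
const ED_RULES = [
  { pron: "/id/", re: /(d|t)ed\b/i },
  { pron: "/t/", re: /(s|f)ed\b/i },
  { pron: "/d/", re: /(z|v)ed\b/i },
  { pron: "/d/", re: /(ow|i|ay)ed\b/i },
];

function guessEd(word) {
  const rule = ED_RULES.find((r) => r.re.test(word));
  return rule ? rule.pron : "?"; // "?" = no row matched
}

const edWords = ["wanted", "needed", "passed", "loved", "played", "walked", "raised"];
console.log(edWords.map((w) => `${w} ${guessEd(w)}`).join(", "));
// wanted /id/, needed /id/, passed /t/, loved /d/, played /d/, walked ?, raised /t/
```

“walked” needs a row for k, and “raised” is given /t/ although it is pronounced with /d/ (the s is voiced), so spelling-based rules produce false positives that students can investigate.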
Pronunciation of “th”
Pronunciation (feature): potential regex
/ð/ (voiced initial th): /\btha(n|t|)\b/gi; /\bthe(\b|ir|m|re|se|y)\b/gi; /\bthis\b/gi; /\btho(se|ugh|)\b/gi; /\bthus\b/gi;
/θ/ (voiceless initial th): /\bth/gi; (apply last, because it also matches the voiced and /t/ words)
/t/ (th pronounced as t): /\b(thomas|thames|thyme)\b/gi;
Pronunciation of “th” can be predicted by the rule that in function words, initial th is pronounced as a voiced sound.
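The rules are order-sensitive: /\bth/ matches every initial th, so the named exceptions and the voiced function words must be tried before it. The sketch below (TH_RULES and guessTh are hypothetical names) writes the patterns without stray spaces and groups the /t/ names as \b(thomas|thames|thyme)\b, so that the word boundaries apply to all three names rather than only the first.

```javascript
// Rules tried in order: /t/ exceptions, then voiced function words,
// then the catch-all voiceless rule. No g flag, so test() is stateless.
const TH_RULES = [
  { pron: "/t/", re: /\b(thomas|thames|thyme)\b/i },
  { pron: "/ð/", re: /\btha(n|t|)\b/i },
  { pron: "/ð/", re: /\bthe(\b|ir|m|re|se|y)\b/i },
  { pron: "/ð/", re: /\bthis\b/i },
  { pron: "/ð/", re: /\btho(se|ugh|)\b/i },
  { pron: "/ð/", re: /\bthus\b/i },
  { pron: "/θ/", re: /\bth/i },
];

function guessTh(word) {
  const rule = TH_RULES.find((r) => r.re.test(word));
  return rule ? rule.pron : "-"; // "-" = word does not start with th
}

const thWords = ["the", "them", "theory", "think", "those", "Thomas", "bath"];
console.log(thWords.map((w) => `${w} ${guessTh(w)}`).join(", "));
// the /ð/, them /ð/, theory /θ/, think /θ/, those /ð/, Thomas /t/, bath -
```

“theory” is correctly left voiceless because none of the function-word patterns match it, whereas “them” is caught by the m alternative.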
References
Blake, J. (2012, November 28-30). Corpus-based academic written error detector. In Proceedings of the 20th International Conference on Computers in Education. Nanyang Technological University, Singapore.
Blake, X., & Blake, J. (2015, January 29-31). Academic literacy: Mentor and mentee perspectives. Poster presented at the 35th International Conference of ThaiTESOL, Bangkok, Thailand.
Morrall, A. (2000-2014). Common Error Detector [Online tool]. http://www2.elc.polyu.edu.hk/cill/errordetector.htm
Any questions, comments or suggestions?