Pedagogic application of regular expressions

Transcript of Pedagogic application of regular expressions

Page 1: Pedagogic application of regular expressions

John Blake Japan Advanced Institute of Science and Technology

Pedagogic application of regular expressions

/\bbetween\W+(?:\w+\W+){1,2}?to\b/gi;

Page 2: Pedagogic application of regular expressions

Overview

Introduction

• Probabilistic parsing

• Rule-based pattern matching

• Regular expressions

Pedagogic applications

• Modality detector

• Error detector

• Other: tagged corpora, pronunciation of “ed”

Page 3: Pedagogic application of regular expressions

Probabilistic parsing

• Dynamic algorithms

• Machine learning

• Training sets

(e.g. Stanford Parser, Stanford POS Tagger)

Extremely powerful, but requires significant knowledge of computational linguistics and a huge investment of time, so…

Page 4: Pedagogic application of regular expressions

Rule-based pattern matching

1. There is a man on your left. T / F

If true, a man is on your left. Stop.

If false, proceed to 2.

2. There is a woman on your left. T / F

If true, there is a woman on your left. Stop.

If false, there is nobody on your left. Stop.

True/false statements
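The two true/false statements can be sketched as a short chain of checks. This is only an illustration: the function name whoIsOnYourLeft and the shape of the observation object are assumptions, not part of the talk.

```javascript
// Hypothetical helper: walks the two true/false statements in order.
// `observation` is an assumed input shape, e.g. { man: false, woman: true }.
function whoIsOnYourLeft(observation) {
  // 1. There is a man on your left. If true, stop.
  if (observation.man) {
    return "a man";
  }
  // 2. There is a woman on your left. If true, stop.
  if (observation.woman) {
    return "a woman";
  }
  // Both statements were false.
  return "nobody";
}

console.log(whoIsOnYourLeft({ man: false, woman: true })); // a woman
```

Each rule is tested only if every earlier rule was false, so the order of the rules matters.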

Page 5: Pedagogic application of regular expressions

Rule-based pattern matching

Decision-tree algorithm

There is a man on your left.
  Yes: STOP
  No: There is a woman on your left.
    Yes: STOP
    No: There is nobody on your left. STOP

Assumptions:
1. Only adults are present
2. There is no third gender

Page 6: Pedagogic application of regular expressions

Rule-based pattern matching

There is a man. /\bman\b/;

There is a woman. /\bwoman\b/;

Regular expressions (regexp|regex)

The discrete words “man” and “woman” will be identified, generating a “true” result.
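A minimal sketch of why the word boundaries matter, using the two patterns from this slide. The sample sentences other than "There is a man." and "There is a woman." are assumptions added for contrast.

```javascript
const manPattern = /\bman\b/;
const womanPattern = /\bwoman\b/;

console.log(manPattern.test("There is a man."));     // true
console.log(womanPattern.test("There is a woman.")); // true
// "man" inside "woman" follows the letter "o", so there is no boundary before it.
console.log(manPattern.test("There is a woman."));   // false
// "man" inside "manager" runs on into "ager", so there is no boundary after it.
console.log(manPattern.test("The manager left."));   // false
```

Without the \b markers, /man/ would report "true" for all four sentences.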

Page 7: Pedagogic application of regular expressions

Regular expressions (Regex)

e.g. /\bmaybe\b/gi;

\ – escape (from normal characters)
i – case insensitive
b – boundary
g – global (find every match, not only the first)

1. I think that maybe he can understand. T/F
2. He may be able to understand. T/F
3. Maybe, he can understand. T/F
4. Maybelline is a company name. T/F
5. Maybe, he said maybe. T/F
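The five sentences can be checked by counting matches. This is a sketch; the helper name countMaybe is an assumption.

```javascript
const sentences = [
  "I think that maybe he can understand.",
  "He may be able to understand",
  "Maybe, he can understand.",
  "Maybelline is a company name.",
  "Maybe, he said maybe.",
];

// With the g flag, match() returns every match, or null when there is none.
function countMaybe(sentence) {
  const matches = sentence.match(/\bmaybe\b/gi);
  return matches === null ? 0 : matches.length;
}

const counts = sentences.map(countMaybe);
console.log(counts); // [ 1, 0, 1, 0, 2 ]
```

Sentence 2 fails because of the space in "may be", and sentence 4 fails because "Maybe" is followed directly by another letter, so there is no boundary.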

Page 8: Pedagogic application of regular expressions

Pedagogic applications

Modality detector

Online error detectors
- Common error detector (Morrall, 2000-14)
- Corpus-based error detector (Blake, 2012-15)

Other applications
- Annotation highlighter
- Ideas for pronunciation, grammar and vocab

Page 9: Pedagogic application of regular expressions

Situation

App. 1

Students: graduate students, researchers

Aim: write research articles

Problems: lack of familiarity with the genre, lack of language, lack of content

Page 10: Pedagogic application of regular expressions

Tentative language & approximation

Type               Examples
Modal verbs        may, might, would, can
Lexical verbs      seem, appear, suggest
Modal adverbs      perhaps, probably, possibly
Modal adjectives   probable, possible, uncertain
Modal nouns        assumption, claim, possibility

#      Approximation
49%    almost a half, nearly 50%, less than 1 in 2

App. 1

Page 11: Pedagogic application of regular expressions

Material mismatch

Students from different faculties studying tentative language (hedging) and approximation in academic writing use generic materials prepared by the teacher.

App. 1

Page 12: Pedagogic application of regular expressions

Lack of face validity

Some students do not want to “waste time” dealing with materials not appropriate to their major. They expect materials tailored to their exact needs.

App. 1

Page 13: Pedagogic application of regular expressions

Solution: Modality detector

App. 1

Page 14: Pedagogic application of regular expressions

Solution: Modality detector

Individualized instruction
• Student selects appropriate text
• Student inputs relevant text
• Regex identifies hedges & approximation
• Execute command labels & highlights

App. 1
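The identify-then-label steps might look like the sketch below. It is not the code of the modality detector itself: the word list is a small subset of the tentative-language examples in this talk, the helper name labelHedges is an assumption, and square-bracket labels stand in for highlighting.

```javascript
// Assumed word list: a subset of the tentative-language examples.
const hedges = {
  "modal verb": ["may", "might", "would", "can"],
  "lexical verb": ["seem", "appear", "suggest"],
  "modal adverb": ["perhaps", "probably", "possibly"],
};

// Look-up from each hedge word to its type.
const typeByWord = new Map();
for (const [type, words] of Object.entries(hedges)) {
  for (const word of words) typeByWord.set(word, type);
}

// One pattern for all hedges: \b(may|might|...)\b, global and case insensitive.
const hedgePattern = new RegExp("\\b(" + [...typeByWord.keys()].join("|") + ")\\b", "gi");

// Hypothetical helper: labels every hedge in the text the student pasted in.
function labelHedges(text) {
  return text.replace(hedgePattern, (match) =>
    `[${typeByWord.get(match.toLowerCase())}: ${match}]`);
}

console.log(labelHedges("These results suggest that the effect may be temporary."));
// These results [lexical verb: suggest] that the effect [modal verb: may] be temporary.
```

Because of the word boundaries, inflected forms such as "seems" or "suggested" are not found by this sketch; they would need their own alternatives in the pattern.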

Page 15: Pedagogic application of regular expressions

Warning: False positives

More complex regexes reduce false positives

App. 1
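One possible illustration of the trade-off, assuming the hedge "may" and a text that also mentions the month of May. The refined pattern is an assumption for this example, not a pattern from the detector.

```javascript
const text = "The survey ran in May 2014. The effect may be temporary.";

// Simple pattern: also finds the month.
const simple = /\bmay\b/gi;
// Assumed refinement: skip "may" when whitespace and a digit follow it.
const refined = /\bmay\b(?!\s+\d)/gi;

console.log(text.match(simple));  // [ 'May', 'may' ]
console.log(text.match(refined)); // [ 'may' ]
```

The refinement removes this false positive but still accepts "May" in a sentence such as "In May, the survey ended", so each extra condition only narrows the problem.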

Page 16: Pedagogic application of regular expressions

Piles of unmarked homework

Responding to written work takes too much time, and is repetitive since many students make the same surface-level mistakes.

App. 2

Page 17: Pedagogic application of regular expressions

No time to respond

Teachers are expected to:

• Identify the location of errors
• Explain the errors (if necessary)
• Correct the errors (if necessary)

All of which take lots of time.

App. 2

Page 18: Pedagogic application of regular expressions

Solution: Error detector

Identification
• Student inputs own work
• Regex identifies expected errors

Explanation
• Execute command selects and displays prepared explanation

Correction
• Student corrects work and submits improved version

App. 2
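The identification and explanation steps could be sketched as a list of rules, each pairing a regex with a prepared explanation. The two rules, their wording and the helper name detectErrors are all hypothetical; they are not taken from either error detector cited in this talk.

```javascript
// Hypothetical rules: pattern plus prepared explanation.
const rules = [
  {
    type: "Formality",
    pattern: /\b\w+n't\b/gi,
    explanation: "Contractions are informal; write the full form.",
  },
  {
    type: "Brevity",
    pattern: /\bin order to\b/gi,
    explanation: 'Consider shortening this to "to".',
  },
];

// Hypothetical helper: returns one finding per match, in rule order.
function detectErrors(text) {
  const findings = [];
  for (const rule of rules) {
    for (const match of text.matchAll(rule.pattern)) {
      findings.push({
        type: rule.type,
        found: match[0],
        index: match.index,
        explanation: rule.explanation,
      });
    }
  }
  return findings;
}

for (const f of detectErrors("The model doesn't converge in order to save time.")) {
  console.log(`${f.type}: "${f.found}" at position ${f.index}. ${f.explanation}`);
}
```

Correction is left to the student: the sketch only locates and explains, which matches the division of labour on this slide.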

Page 19: Pedagogic application of regular expressions

Error classification

App. 2

Type          Description
Accuracy      factual and language errors
Brevity       too many words
Clarity       vague or ambiguous terms
Objectivity   emotive language
Formality     abbreviations, contractions, & informal terms

An ethnographic survey of the literature on writing scientific research articles revealed five key criteria (Blake & Blake, 2015).

Page 20: Pedagogic application of regular expressions

App. 2

Page 21: Pedagogic application of regular expressions

Specific example

Error
• One of the + singular noun

Regex
• /\bone of the\b/gi;

Execute
• Check that the phrase “one of the” is followed by a plural noun

App. 2
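A sketch of this specific rule. The regex cannot tell singular from plural, so the assumed helper findOneOfThe only reports where the phrase occurs and which words follow, leaving the check to the student. The capture of up to two following words is an addition for this example.

```javascript
// Hypothetical helper: finds "one of the" and up to two following words.
function findOneOfThe(text) {
  const pattern = /\bone of the\b((?: \w+){0,2})/gi;
  return [...text.matchAll(pattern)].map((m) => ({
    index: m.index,
    following: m[1].trim(),
  }));
}

const sample = "This is one of the main problem. One of the methods works.";
for (const hit of findOneOfThe(sample)) {
  console.log(`Position ${hit.index}: is the noun in "${hit.following}" plural?`);
}
```

Both occurrences are flagged, although only the first is an error; the second is a false positive that the student dismisses after checking.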

Page 22: Pedagogic application of regular expressions

False positives are harnessed in the learning process by forcing student engagement

App. 2

Page 23: Pedagogic application of regular expressions

Difficult-to-read tags

Introduction Purpose Method Results Discussion

<segment features='problem;introduction;rhetorical_moves' state='active'>We address the problem of model-based object recognition.</segment> <segment features='purpose;rhetorical_moves' state='active'>Our aim is to localize and recognize road vehicles from monocular images or videos in calibrated traffic scenes.</segment> <segment features='method;rhetorical_moves' state='active'>A 3-D deformable vehicle model with 12 shape parameters is set up as prior information, and its pose is determined by three parameters, which are its position on the ground plane and its orientation about the vertical axis under ground-plane constraints.</segment> <segment features='purpose;rhetorical_moves' state='active'>An efficient local gradient-based method is proposed to evaluate the fitness between the projection of the vehicle model and image data, which is combined into a novel evolutionary computing framework to estimate the 12 shape parameters and three pose parameters by iterative evolution.</segment> <segment features='background;introduction;rhetorical_moves' state='active'>The recovery of pose parameters achieves vehicle localization, whereas the shape parameters are used for vehicle recognition.</segment> <segment features='method;rhetorical_moves' state='active'>Numerous experiments are conducted in this paper to demonstrate the performance of our

App. 3

Page 25: Pedagogic application of regular expressions

Easy-to-read tags

Introduction Purpose Method Results Discussion

http://www.jaist.ac.jp/~johnb/Movehighlighter.html

App. 3
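A rough sketch of how the segment tags could be turned into readable labels with a regex. It is not the code behind the Move highlighter page: the helper name labelSegments, the bracketed output and the choice of the first matching feature as the label are all assumptions.

```javascript
// The five moves in the legend, in lower case as they appear in the tags.
const MOVES = ["introduction", "purpose", "method", "results", "discussion"];

// Hypothetical helper: one "[move] sentence" line per complete segment.
function labelSegments(tagged) {
  const pattern = /<segment features='([^']*)'[^>]*>([\s\S]*?)<\/segment>/g;
  const lines = [];
  for (const [, features, sentence] of tagged.matchAll(pattern)) {
    // Use the first feature that names one of the five moves.
    const move = features.split(";").find((f) => MOVES.includes(f)) ?? "other";
    lines.push(`[${move}] ${sentence.trim()}`);
  }
  return lines;
}

const taggedSample =
  "<segment features='problem;introduction;rhetorical_moves' state='active'>" +
  "We address the problem of model-based object recognition.</segment> " +
  "<segment features='purpose;rhetorical_moves' state='active'>" +
  "Our aim is to localize and recognize road vehicles.</segment>";

console.log(labelSegments(taggedSample).join("\n"));
```

A segment with no closing tag, such as a truncated final sentence, is skipped by this pattern.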

Page 26: Pedagogic application of regular expressions

Ideas for you and your students

Pronunciation:

• Regular “ed”: /t/, /d/, /id/

• th [voiced or voiceless]

Grammar:

• Tenses: e.g. perfect continuous: been + ing

• Quantifiers: [U] much, little; [C] many, few; [U/C] lots of, a lot of

Vocabulary:

• Colours: red, blue; crimson red, cobalt blue

• Body parts: hand, eyes, leg; hand out, eye up, leg it

Page 27: Pedagogic application of regular expressions

Regular “ed”

False positives:
• learned: /d/ or /id/

Pron   Preceding sound        Potential regex
/id/   d, t                   /(d|t)ed\b/gi;
/t/    voiceless consonants   /(s|f)ed\b/gi;
/d/    voiced consonants      /(z|v)ed\b/gi;
/d/    vowel                  /(ow|i|ay)ed\b/gi;

Pronunciation of “ed” is dictated by the sound of the preceding letter(s).

| – Boolean “or”, so x|y means either x or y

d|ted means d or ted, but by adding brackets, (d|t)ed means ded or ted
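A sketch of the “ed” rules as a look-up, using the grouped form (d|t)ed explained above. The helper name guessEdSound is an assumption, and only the letter groups listed on this slide are covered, so many words come back as "unknown".

```javascript
// The g flag is dropped here because test() on a global regex
// remembers its position between calls.
const edRules = [
  { sound: "/id/", pattern: /(d|t)ed\b/i },
  { sound: "/t/", pattern: /(s|f)ed\b/i },
  { sound: "/d/", pattern: /(z|v)ed\b/i },
  { sound: "/d/", pattern: /(ow|i|ay)ed\b/i },
];

// Hypothetical helper: the first rule that matches decides the sound.
function guessEdSound(word) {
  const rule = edRules.find((r) => r.pattern.test(word));
  return rule ? rule.sound : "unknown";
}

for (const word of ["wanted", "passed", "loved", "played", "walked"]) {
  console.log(word, guessEdSound(word));
}
```

The rules work on spelling, not sound, so "walked" is not recognised until k is added to the voiceless group, and "learned" stays ambiguous whatever is added.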

Page 28: Pedagogic application of regular expressions

Pronunciation of “th”

Pron   Feature                Potential regex
/ð/    Voiced initial th      /\btha(n|t|)\b/gi;
                              /\bthe(\b|ir|m|re|se|y)\b/gi;
                              /\bthis\b/gi;
                              /\btho(se|ugh|)\b/gi;
                              /\bthus\b/gi;
/θ/    Voiceless initial th   /\bth/gi;
/t/    th pronounced as t     /\b(thomas|thames|thyme)/gi;

Pronunciation of “th” can be predicted by the rule that, in function words, the initial th is pronounced as a voiced sound.
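A sketch of the “th” rule as a three-step check. The order is an assumption that follows from the patterns: /\bth/ matches every initial th, so the /t/ exceptions and the voiced function words have to be tested first. The helper name guessThSound is hypothetical, and the word lists are the short ones from this slide.

```javascript
// Exceptions are grouped so that the boundary applies to every alternative.
const tExceptions = /\b(thomas|thames|thyme)/i;
// Function words with a voiced initial th.
const voicedWords = [
  /\btha(n|t)\b/i,
  /\bthe(|ir|m|re|se|y)\b/i,
  /\bthis\b/i,
  /\btho(se|ugh)\b/i,
  /\bthus\b/i,
];
const initialTh = /\bth/i;

// Hypothetical helper: most specific rule first, most general last.
function guessThSound(word) {
  if (tExceptions.test(word)) return "/t/";
  if (voicedWords.some((p) => p.test(word))) return "/ð/";
  if (initialTh.test(word)) return "/θ/";
  return "no initial th";
}

for (const word of ["Thomas", "the", "though", "think", "theory"]) {
  console.log(word, guessThSound(word));
}
```

The closing \b in the voiced patterns is what keeps "theory" and "thought" out of the voiced group.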

Page 29: Pedagogic application of regular expressions

References

Blake, J. (2012, November 28-30). Corpus-based academic written error detector. Conference proceedings of the 20th International Conference on Computers in Education. Nanyang Technological University, Singapore.

Blake, X. and Blake, J. (2015, January 29-31). Academic literacy: Mentor and mentee perspectives. Poster presented at 35th International Conference of ThaiTESOL, Bangkok, Thailand.

Morrall, A. (2000-2014). Common Error Detector. [Online tool] http://www2.elc.polyu.edu.hk/cill/errordetector.htm

Page 30: Pedagogic application of regular expressions

Any questions, comments or suggestions?

[email protected]