Jason Eisner (Johns Hopkins) and Noah A. Smith (Carnegie Mellon), TeachCL Workshop @ ACL – June 20, 2008...
Jason Eisner (Johns Hopkins), Noah A. Smith (Carnegie Mellon)
TeachCL Workshop @ ACL – June 20, 2008
Competitive Grammar Writing
1. Welcome to the lab exercise!
Please form teams of ~3 people …
Programmers, get a linguist on your team (and vice versa).
Undergrads, get a grad student on your team (and vice versa).
We always run this exercise on the 1st day of the Johns Hopkins Summer School in Human Language Technology (every summer since 2002). Thank you JHU, NSF, and NAACL …
We’ve also run variants in our JHU & CMU classes.
2. Okay, team, please log in
The 3 of you should use adjacent workstations.
Log in as individuals.
Your secret team directory: cd …/03-turbulent-kiwi
You can all edit files there.
Publicly readable & writeable.
No one else knows the secret directory name.
Minimizes permissions fuss.
3. Now write a grammar of English
You have 2 hours. (Actually, as the deadline approaches, the teams usually vote to stay an extra hour.)
What’s a grammar? Here’s one to start with. Each rule is written as: weight, left-hand side → right-hand side.
  1   S1 → NP VP .
  1   VP → VerbT NP
  20  NP → Det N’
  1   NP → Proper
  20  N’ → Noun
  1   N’ → N’ PP
  1   PP → Prep NP
Plus initial terminal rules:
  1  Noun → castle
  1  Noun → king
  …
  1  Proper → Arthur
  1  Proper → Guinevere
  …
  1  Det → a
  1  Det → every
  …
  1  VerbT → covers
  1  VerbT → rides
  …
  1  Misc → that
  1  Misc → bloodier
  1  Misc → does
  …
Any PCFG is okay.
Sample a sentence on the blackboard (any PCFG is okay)
Start at S1 and expand each nonterminal by picking one of its rules with probability proportional to the rule’s weight:
  S1 → NP VP .    (probability 1: it is the only S1 rule)
  NP → Det N’     (probability 20/21; NP → Proper has probability 1/21)
  Det → every
  N’ → Noun
  Noun → castle
  VP ⇒ drinks [[Arthur [across the [coconut in the castle]]] [above another chalice]]
Sampled sentence: every castle drinks Arthur across the coconut in the castle above another chalice .
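The blackboard procedure can be sketched in Python. This is a minimal sketch, not the workshop's actual sampling script: the function names are hypothetical, only the terminal rules visible on the slides are included (the slides elide the rest with "…"), and the three Prep entries are an assumption taken from the prepositions in the sampled sentence.

```python
import random
from collections import defaultdict

# (weight, left-hand side, right-hand side), copied from the starter grammar.
RULES = [
    (1, "S1", ["NP", "VP", "."]),
    (1, "VP", ["VerbT", "NP"]),
    (20, "NP", ["Det", "N'"]),
    (1, "NP", ["Proper"]),
    (20, "N'", ["Noun"]),
    (1, "N'", ["N'", "PP"]),
    (1, "PP", ["Prep", "NP"]),
    # Terminal rules shown on the slides (the full lexicon is not shown).
    (1, "Noun", ["castle"]), (1, "Noun", ["king"]),
    (1, "Proper", ["Arthur"]), (1, "Proper", ["Guinevere"]),
    (1, "Det", ["a"]), (1, "Det", ["every"]),
    (1, "VerbT", ["covers"]), (1, "VerbT", ["rides"]),
    # Assumption: Prep entries inferred from the sampled sentence.
    (1, "Prep", ["across"]), (1, "Prep", ["in"]), (1, "Prep", ["above"]),
]


def build_index(rules):
    """Group the weighted right-hand sides by their left-hand side."""
    index = defaultdict(list)
    for weight, lhs, rhs in rules:
        index[lhs].append((weight, rhs))
    return index


def rule_probability(index, lhs, rhs):
    """Probability of a rule = its weight / total weight of rules with the same lhs."""
    total = sum(weight for weight, _ in index[lhs])
    matching = sum(weight for weight, candidate in index[lhs] if candidate == rhs)
    return matching / total


def sample(index, symbol, rng):
    """Expand `symbol` top-down; symbols with no rules are treated as words."""
    if symbol not in index:
        return [symbol]
    weights = [weight for weight, _ in index[symbol]]
    expansions = [rhs for _, rhs in index[symbol]]
    chosen = rng.choices(expansions, weights=weights, k=1)[0]
    words = []
    for child in chosen:
        words.extend(sample(index, child, rng))
    return words


if __name__ == "__main__":
    index = build_index(RULES)
    rng = random.Random(0)
    print(rule_probability(index, "NP", ["Det", "N'"]))  # 20/21
    for _ in range(5):
        print(" ".join(sample(index, "S1", rng)))
```

Because N’ → N’ PP carries only 1/21 of the N’ weight, the recursion stops quickly in practice; raising that weight makes long prepositional-phrase chains more likely.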
4. Okay – go!
“How will we be tested on this?”
5. Evaluation procedure
We’ll sample 20 random sentences from your PCFG.
Human judges will vote on whether each sentence is grammatical.
By the way, y’all will be the judges (double-blind). This is educational.
You probably want to use the sampling script to keep testing your grammar along the way.
“Ok, we’re done! All our sentences are already grammatical.”
You’re right: This only tests precision. How about recall?
Development set
We provide a file of 27 sample sentences illustrating a range of grammatical phenomena: questions, movement, (free) relatives, clefts, agreement, subcat frames, conjunctions, auxiliaries, gerunds, sentential subjects, appositives …
You might want your grammar to generate …
Arthur is the king .
Arthur rides the horse near the castle .
riding to Camelot is hard .
do coconuts speak ?
what does Arthur ride ?
who does Arthur suggest she carry ?
why does England have a king ?
are they suggesting Arthur ride to Camelot ?
five strangers are at the Round Table .
Guinevere might have known .
Guinevere should be riding with Patsy .
it is Sir Lancelot who knows Zoot !
either Arthur knows or Patsy does .
neither Sir Lancelot nor Guinevere will speak of it .
[Slide callout: covered by initial grammar]
the Holy Grail was covered by a yellow fruit .
Zoot might have been carried by a swallow .
Arthur rode to Camelot and drank from his chalice .
they migrate precisely because they know they will grow .
do not speak !
Arthur will have been riding for eight nights .
Arthur , sixty inches , is a tiny king .
Arthur knows Patsy , the trusty servant .
Arthur and Guinevere migrate frequently .
he knows what they are covering with that story .
Arthur suggested that the castle be carried .
the king drank to the castle that was his home .
when the king drinks , Patsy drinks .
5’. Evaluation of recall (= productivity!!)
What we could have done: Cross-entropy on a similar, held-out test set
every coconut of his that the swallow dropped sounded like a horse .
“How should we parse sentences with OOV words?”
No OOVs allowed in the test set. Fixed vocabulary.
What we actually did, to heighten competition & creativity: Test set comes from the participants!
You should try to generate sentences that your opponents can’t parse.
In Boggle, you get points for finding words that your opponents don’t find.
Use the fixed vocabulary creatively.
What we could have done (good for your class?): Cross-entropy on a similar, held-out test set
Initial terminal rules
  1  Noun → castle
  1  Noun → king
  …
  1  Proper → Arthur
  1  Proper → Guinevere
  …
  1  Det → a
  1  Det → every
  …
  1  VerbT → covers
  1  VerbT → rides
  …
  1  Misc → that
  1  Misc → bloodier
  1  Misc → does
  …
Use the fixed vocabulary creatively.
The initial grammar sticks to 3rd-person singular transitive present-tense forms. All grammatical.
But we provide 183 Misc words (not accessible from initial grammar) that you’re free to work into your grammar …
pronouns (various cases), plurals, various verb forms, non-transitive verbs, adjectives (various forms), adverbs & negation, conjunctions & punctuation, wh-words, …
5’. Evaluation of recall (= productivity!!)
We’ll score your cross-entropy when you try to parse the sentences that the other teams generate. (Only the ones judged grammatical.)
You probably want to use the parsing script to keep testing your grammar along the way.
“What if my grammar can’t parse one of the test sentences?”
0 probability?? You get the infinite penalty. So don’t do that.
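The infinite penalty follows directly from how cross-entropy is computed: a sentence with probability 0 contributes -log2(0) bits. Here is a minimal sketch, assuming the sentence probabilities have already been obtained from a parser. The function name is hypothetical, and normalizing by the total word count (bits per word) is an assumption; the slides only say "cross-entropy".

```python
import math


def cross_entropy_bits_per_word(sentence_probs, sentence_lengths):
    """Cross-entropy of a grammar on a test set, in bits per word.

    sentence_probs[i]   -- probability the grammar assigns to test sentence i
    sentence_lengths[i] -- number of words in test sentence i
    Returns math.inf if any sentence gets probability 0 (it cannot be parsed).
    """
    if len(sentence_probs) != len(sentence_lengths):
        raise ValueError("need one length per sentence probability")
    total_words = sum(sentence_lengths)
    if total_words == 0:
        raise ValueError("test set contains no words")
    total_bits = 0.0
    for prob in sentence_probs:
        if prob <= 0.0:
            return math.inf  # the "infinite penalty" for an unparsable sentence
        total_bits -= math.log2(prob)
    return total_bits / total_words


if __name__ == "__main__":
    # Two sentences with probabilities 1/4 and 1/2, three words in total.
    print(cross_entropy_bits_per_word([0.25, 0.5], [2, 1]))  # 1.0
    # One unparsable sentence makes the whole score infinite.
    print(cross_entropy_bits_per_word([0.25, 0.0], [2, 1]))  # inf
```

Lower is better, so a team does well by assigning high probability to the grammatical sentences that its opponents generate.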
Use a backoff grammar
Initial backoff grammar: Bigram POS HMM
  S2 → (empty)
  S2 → _Noun
  S2 → _Misc
  _Noun → Noun
  _Noun → Noun _Noun
  _Noun → Noun _Misc
  _Misc → Misc
  _Misc → Misc _Noun
  _Misc → Misc _Misc
  (etc.)
Here _Verb is something that starts with a Verb, _Misc is something that starts with a Misc, and so on. Example derivation of “rides ’s ! swallow”:
  S2 → _Verb
  _Verb → Verb _Misc    (Verb → rides)
  _Misc → Misc _Punc    (Misc → ’s)
  _Punc → Punc _Noun    (Punc → !)
  _Noun → Noun          (Noun → swallow)
Init. linguistic grammar
  S1 → NP VP .
  VP → VerbT NP
  NP → Det N’
  NP → Proper
  N’ → Noun
  N’ → N’ PP
  PP → Prep NP
Initial master grammar (mixture model)
  START → S1
  START → S2
Choose these weights wisely!
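The backoff rules follow a regular pattern, so they can be generated from a list of tags, and the effect of the START weights can be checked numerically. Below is a minimal sketch under stated assumptions: both function names are hypothetical, the rule pattern is read off the nine rules shown for Noun and Misc, and the probabilities in the usage example are made-up illustrative numbers.

```python
def backoff_rules(tags, start="S2"):
    """Generate bigram-HMM-style backoff rules as (lhs, rhs) pairs.

    For each tag T, the nonterminal _T means "something that starts with a T".
    """
    rules = [(start, [])]                              # S2 -> (empty)
    rules += [(start, ["_" + tag]) for tag in tags]    # S2 -> _T
    for tag in tags:
        rules.append(("_" + tag, [tag]))               # _T -> T (sentence ends)
        rules += [("_" + tag, [tag, "_" + nxt])        # _T -> T _U (bigram step)
                  for nxt in tags]
    return rules


def mixture_prob(p_s1, p_s2, w_s1, w_s2):
    """Sentence probability under START -> S1 (weight w_s1) | S2 (weight w_s2)."""
    total = w_s1 + w_s2
    return (w_s1 * p_s1 + w_s2 * p_s2) / total


if __name__ == "__main__":
    for lhs, rhs in backoff_rules(["Noun", "Misc"]):
        print(lhs, "->", " ".join(rhs) if rhs else "(empty)")
    # Illustrative numbers only: a sentence S1 cannot parse (probability 0)
    # still gets nonzero probability as long as S2 has some weight.
    print(mixture_prob(0.0, 1e-6, w_s1=99, w_s2=1) > 0)  # True
```

The trade-off behind "choose these weights wisely" is visible in `mixture_prob`: weight on S2 protects against the infinite penalty, but the judged sample is drawn from the same master grammar, so weight on S2 also produces more ungrammatical sample sentences.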
6. Discussion
What did you do? How?
Was CFG expressive enough? (features, gapping)
How would you improve the formalism?
Would it work for other languages?
How should one pick the weights? And how could you build a better backoff grammar?
Is grammaticality well-defined? How is it related to probability?
What if you had 36 person-months to do it right?
What other tools or data do you need?
What would the resulting grammar be good for?
What evaluation metrics are most important?
7. Winners announced
Of course, no one finishes their ambitious plans.
Alternative: Allow 2 weeks (see paper) …
Anyway, a lot of work!
[Results figure. Annotations: “Helps to favor backoff grammar”, “yay”, “unreachable”]
What did they do? (see paper)
More fine-grained parts of speech
do-support for questions & negation
Movement using gapped categories
X-bar categories (following the initial grammar)
Singular/plural features
Pronoun case
Verb forms
Verb subcategorization; selectional restrictions (“location”)
Comparative vs. superlative adjectives
Appositives (must avoid double comma)
A bit of experimentation with weights
One successful attempt to game scoring system (ok with us!)
Why do we recommend this lesson?
Good opening activity
  No programming
  Only very simple probability
  No background beyond linguistic intuitions
    Though w/ time constraints, helps to have a linguist on the team
  Works great with diverse teams
  Social, intense, good mixer, sets the pace
http://www.clsp.jhu.edu/grammar-writing
Introduces many topics – touchstone for later teaching
  Grammaticality
    Grammaticality judgments, formal grammars, parsers
  Specific linguistic phenomena
    Desperate need for features, morphology, gap-passing
  Generative probability models: PCFGs and HMMs
    Backoff, inside probability, random sampling, …
  Recovering latent variables: Parse trees and POS taggings
  Evaluation (sort of)
    Annotation, precision, recall, cross-entropy, …
  Manual parameter tuning
  Why learning would be valuable, alongside expert knowledge
A final thought
The CS curriculum starts with programming
  Accessible and hands-on
  Necessary to motivate or understand much of CS
In CL, the equivalent is grammar writing
  It was the traditional (pre-statistical) introduction
  Our contributions: competitive game, statistics, finite-state backoff, reusable instructional materials
Much of CL work still centers around grammar formalisms (akin to programming languages)
  We design expressive formalisms for linguistic data
  Solve linguistic problems within these formalisms
  Enrich them with probabilities
  Process them with algorithms
  Learn them from data
  Connect them to other modules in the pipeline