![Page 1: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/1.jpg)
Evaluation Measure in Language Acquisition
Charles Yang
University of Pennsylvania
LING50, MIT
From LSLT to Aspects: A Psychological Turn
• LSLT
• “any simplification along these lines is immediately reflected in the length of the grammar” (§26: MDL)
• “defined the best analysis as the one that minimizes information per word in the generated language of grammatical discourses” (§35.4: entropy)
• 23.771 Mathematical Backgrounds for Communication (Hall/Partee)
• Aspects
• “... theories require supplementation by an evaluation measure if language acquisition is to be accounted for ... such a measure is not given a priori, in some manner. Rather, any proposal concerning such a measure is an empirical hypothesis about the nature of language” (p37)
![Page 8: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/8.jpg)
Three Psychological Conditions
• Formal sufficiency: Does the evaluation measure choose the “correct” grammar?
• Distributional methods: Harris (1951), Fowler (1952), Holt (1953)
• Ecological validity: Does the evaluation measure operate under reasonable assumptions about the learning data and mechanisms?
• Developmental compatibility: Does the evaluation measure employed by the learner produce similar developmental patterns in language acquisition?
![Page 13: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/13.jpg)
MDL: An Evaluation Measure
• MDL is similar to (or equivalent to) many other approaches such as Bayesian inference
• A method for hypothesis selection rather than hypothesis proposing
• Three conditions for choosing alternative methods, e.g., reinforcement learning, Fourier transform
• The composition of data
• Subset Principle (Berwick 1985): the first Evaluation Measure to influence empirical work in language acquisition
$$DL(D, G) = |G| + \log \frac{1}{p(D \mid G)}$$
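The trade-off in the description-length formula can be made concrete with a minimal Python sketch. It assumes base-2 logarithms (so lengths come out in bits), and the two candidate grammars, their sizes, and their likelihoods are hypothetical values chosen only for illustration, not figures from the talk.

```python
import math


def description_length(grammar_bits: float, data_likelihood: float) -> float:
    """DL(D, G) = |G| + log2(1 / p(D | G)), in bits (base 2 is an assumption)."""
    if not 0.0 < data_likelihood <= 1.0:
        raise ValueError("p(D | G) must lie in (0, 1]")
    return grammar_bits + math.log2(1.0 / data_likelihood)


# Hypothetical candidates: name -> (|G| in bits, p(D | G)).
candidates = {
    "small_grammar": (100.0, 2.0 ** -40),  # short grammar, poor fit to the data
    "large_grammar": (130.0, 2.0 ** -5),   # longer grammar, better fit
}

# Score every candidate and select the one with the shortest total description.
scores = {name: description_length(*spec) for name, spec in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

The sketch only selects among hypotheses that are already given, which matches the point that MDL is a method for hypothesis selection rather than hypothesis proposing.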
![Page 18: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/18.jpg)
Evaluation Measuring Parameters
• Aspects: “We want the hypotheses compatible with fixed data to be “scattered” in value, so that choice among them can be made relatively easily” (p61)
• Parameters can be viewed as a low-dimensional description of syntactic variation, or MDL
• CUNY CoLAG Parameter Domain (Sakas & J. D. Fodor in press): 13 parameters, 3072 grammars, 48086 distinct degree-0 sentences
• Most parameters are favorable for the learner and can be set independently (thus “scattered” well)
• Evidence for parameters in language acquisition
![Page 20: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/20.jpg)
MDL in Action
Stem → Past tense (word-by-word associations)
fly → flew
bring → brought
blow → blew
think → thought
catch → caught
draw → drew
walk (Regulars) → walked (Regulars), by "Add -d"
![Page 24: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/24.jpg)
MDL in Action
Stem → Past tense (rules)
fly, blow, draw → flew, blew, drew: add -ø & Rime → /u/
bring, think, catch → brought, thought, caught: add -t & Rime → /a/
walk (Regulars) → walked (Regulars): Add -d
“... the acceptance of these Laws (Grimm’s and Verner’s) as historical fact is based wholly on considerations of simplicity”
Halle (1961: On the role of simplicity in linguistic descriptions)
![Page 27: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/27.jpg)
Evidence from Children
• Largest past tense analysis to date (Gorman & Yang 2011)
• Free-rider effect: verbs belonging to larger rules learned better

| | Spearman ρ | Kendall τ | G-K γ |
|---|---|---|---|
| ☞ Abstract rules | 0.276 | 0.191 | 0.202 |
| Surface rules | 0.267 | 0.180 | 0.190 |
| Words only | 0.128 | 0.133 | 0.140 |

Results and discussion
• The effects of verb and pattern frequency on CUR (aggregated across all subjects) can be compared with rank correlation coefficients
• Similarly, verbs with low verb frequency and high pattern frequency have significantly higher CURs than verbs with high verb frequency and low pattern frequency (two-tailed Mann-Whitney W = 193, p = 0.035)
• Both these results indicate that pattern frequency trumps verb frequency
• The relative effects of verb and pattern frequency are finally analyzed by fitting a logistic mixed effects regression model with the maximal random effects structure; as verb and pattern frequency are closely correlated (R = 0.635), log verb frequency is first residualized [14] against log pattern frequency
• Increases in both pattern and verb frequency result in fewer overregularizations
• The results are consistent with linguistic models [3, 6, 7, 15] which derive irregular pasts by stem change and suffixation, but storage models have no locus for the free-rider effect
• Future work will use this data to evaluate similarity-based models [5, 15, 16]
English irregular past tense errors and the free-rider effect
[1] S. Kuczaj. 1977. JoVL&VB 16.
[2] S. Prasada, S. Pinker. 1993. L&CP 8.
[3] M. Halle, K.P. Mohanan. 1985. LI 16.
[4] J. Bybee, D. Slobin. 1982. Lg 58.
[5] J. McClelland, K. Patterson. 2002. TiCS 6.
[6] L. Stockall, A. Marantz. 2006. Mental Lexicon 1.
[7] C. Yang. 2002. Knowledge and learning in natural language. Oxford: OUP.
[8] R. Brown. 1973. A first language: The early stages. Cambridge: HUP.
[9] K. Demuth, J. Culbertson, J. Alter. 2006. L&S 49.
[10] W. Hall, W. Nagy, R. Linn. 1984. Spoken words: Effects of situation and social group on oral word usage and frequency. Hillsdale, NJ: Erlbaum.
[11] G. Marcus, S. Pinker, M. Ullman, M. Hollander, J. Rosen, F. Xu. 1992. Overregularization in language acquisition. Chicago: UCP.
[12] R. Maslen, A. Theakston, E. Lieven, M. Tomasello. 2004. JoSL&HR 47.
[13] E. Brill. 1995. Computational Linguistics 21.
[14] K. Gorman. 2010. In PWPL 16.2.
[15] A. Albright, B. Hayes. 2003. Cognition 90.
[16] S. Chandler. 2010. Cognitive Linguistics 21.
Kyle Gorman (a, d) • David Faber (a) • Elika Bergelson (b, d) • Charles Yang (a, b, c, d)
(a) Dept. of Linguistics • (b) Dept. of Psychology • (c) Dept. of Computer & Information Science • (d) Institute for Research in Cognitive Science
The debate
• SOME BACKGROUND ON THE SUBJECT
• Storage accounts [1, 2]
• Other accounts, both symbolic [3, 4] and connectionist [5]
The free-rider effect
• (definition of overregularization w/ example)
• (say we’ll ignore *broughted and similar errors)
• Yang [7] identifies the free-rider effect: children are quick to master low-frequency irregulars (e.g., wake/woke, forget/forgot) if they follow the same pattern as high-frequency irregulars (break/broke, get/got)
• This requires some representation to be shared by wake and break, and thus is antithetical to the storage account
• Recent experiments on adults also provide support for the derivational account: irregular past tense verbs (e.g., woke) fully prime their base (e.g., wake) in lexical decision [6]
• This is the largest ever study of children’s overregularization errors
Methods
• The raw data are CHILDES transcripts of the spontaneous speech of 50 children ages 2-5 who are acquiring monolingual English and have no known cognitive or hearing deficits [8-12]
• A part-of-speech tagger [13] is used to locate verbs in the transcripts; correct and incorrect past tense tokens of irregular verbs are automatically tabulated and then verified by hand; however, immediate repetitions of a verb by a child are counted only one time
• A separate script aggregates the speech of parents and computes verb input frequency
• This coding is validated by comparing the resulting per-verb correct usage rate (CUR) for each of the children studied by Marcus et al. [11] using the Spearman rank correlation coefficient; the results indicate a close match (Adam: ρ = 0.92, Eve: ρ = 0.99, Sarah: ρ = 0.97)
• The procedure yields 11,735 tokens of 97 irregular verb types, and 839 overregularization tokens across 75 verb types (CUR = 92.9%)
• Irregular verbs are grouped into 22 patterns according to the stem change and/or suffix found in the past tense (relative to the present)
• Unlike prior attempts to categorize the English irregular verbs [4, 7, 11], this larger set of patterns uniquely determines the observed past tense from the present form

| Pattern | Example | N |
|---|---|---|
| No change | burst/burst | 13 |
| [oʊ] | drive/drove | 10 |
| [ʌ] | dig/dug | 9 |
| [æ] | sit/sat | 8 |
| [u:] | blow/blew | 7 |
| [ɛ] | breed/bred | 7 |
| [ɛ...-t] | keep/kept | 7 |
| [-t] | build/built | 5 |
| [eɩ] | eat/ate | 5 |
| rime→[ɔt] | buy/bought | 5 |
| [ɑ] | shoot/shot | 3 |
| [ɩ] | slide/slid | 3 |
| [ɔ] | wear/wore | 3 |
| [oʊ...-d] | sell/sold | 2 |
| [aʊ] | find/found | 2 |
| [ɛ...-d] | say/said | 1 |
| [ɔ...-t] | lose/lost | 1 |
| [ɛɚ...-d] | hear/heard | 1 |
| was | be/was | 1 |
| went | go/went | 1 |
| make | make/made | 1 |
| stood | stand/stood | 1 |

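
| | Spearman ρ | Kendall τb | G-K γ |
|---|---|---|---|
| Pattern frequency | 0.267 | 0.180 | 0.190 |
| Verb frequency | 0.128 | 0.133 | 0.144 |

| | Estimate | Std. err. | p-value |
|---|---|---|---|
| Log pattern frequency | 0.488 | 0.160 | 0.002 |
| Resid. log verb frequency | 0.413 | 0.081 | 4.2e-7 |

[Figure: verbs with below-average rule frequency and above-average verb frequency compared with verbs with above-average rule frequency and below-average verb frequency; two-tailed Mann-Whitney W = 156.5, p = 0.019]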
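
The correct usage rate reported in the poster's Methods can be checked with a short Python sketch. It assumes that the 11,735 tokens count every irregular past tense attempt, correct and overregularized together; that reading is an assumption, adopted here because it reproduces the reported 92.9%. The function name is hypothetical.

```python
def correct_usage_rate(total_tokens: int, overregularized_tokens: int) -> float:
    """Share of irregular past tense tokens that are not overregularized.

    Assumes total_tokens already includes the overregularized tokens.
    """
    if total_tokens <= 0 or not 0 <= overregularized_tokens <= total_tokens:
        raise ValueError("token counts are inconsistent")
    return 1.0 - overregularized_tokens / total_tokens


# Counts taken from the Methods section of the poster.
cur = correct_usage_rate(11_735, 839)
print(f"CUR = {cur:.1%}")
```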
![Page 33: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/33.jpg)
Yesterday he ____
• The forgotten Wug test (Berko 1958)
• Only one out of 86 children produced bing-bang, gling-glang
• Children over-regularize: 8-10%
• Children over-irregularize: <0.2%
• Children’s Evaluation Measure produces a binary outcome: productive or lexical
• probability spreading insufficient
![Page 37: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/37.jpg)
Exceptions in Evaluation Measure
• SPE: “Clearly, we must design our linguistic theory in such a way that the existence of exceptions does not prevent the systematic formulation of those regularities that remain ... Finally, an overriding consideration is that the evaluation measure must be designed in such a way that the wider and more varied the class of exceptions to a rule, the less highly valued is the grammar” (p172)
• But majority doesn’t rule: 90% of English words in speech are stress initial (Cutler & Carter 1987); Legate & Yang poster
“core”, “basic word order”, “default case”, “unmarked form”
vs.
“periphery”, “lexical listing”, “exceptional marking”, “diacritics”
![Page 43: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/43.jpg)
Measuring Rules
• Optimization again: instead of space, it’s time
• Exceptions = Real time processing slowdown
• “kicked the bucket” faster than “lifted the bucket” by 51ms (Swinney & Cutler 1979)
• Production: German irregular past participle (-n) faster than regular (-t) by 38ms (Clahsen & Fleischhauer 2011)
• Lexical decision: English irregular verbs faster than regulars by 19ms (English Lexicon Project; Lignos 2011)
• Exceptions delay rule computation
• Exception 1
• Exception 2
• Exception 3
• ...
• Rule
![Page 48: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/48.jpg)
Tolerance Principle
To be productive, the maximum number of exceptions to a rule/process applicable to N items is

$$\frac{N}{\ln N}$$

• If English has 150 irregular verbs, we need 900 regulars to have a productive -ed rule: 1050/ln(1050) ≈ 151, just above 150
• Children start to over-regularize when they reach the tipping point
• N (e.g., vocabulary size) and the number of exceptions may vary from speaker to speaker, accounting for certain individual patterns in language acquisition and sociolinguistic variation
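
The threshold on this slide is easy to compute directly. The following is a minimal Python sketch, not code from the talk: the function names are hypothetical, and treating a rule as productive when its exceptions are at most N/ln N (rather than strictly fewer) is an assumption based on the phrase "maximum exceptions".

```python
import math


def tolerance_threshold(n_items: int) -> float:
    """Maximum number of exceptions tolerated by a rule over n_items: N / ln N."""
    if n_items < 2:
        raise ValueError("n_items must be at least 2 so that ln N is positive")
    return n_items / math.log(n_items)


def is_productive(n_items: int, n_exceptions: int) -> bool:
    """Assumed reading: productive when exceptions do not exceed N / ln N."""
    return n_exceptions <= tolerance_threshold(n_items)


# The past tense example from the slide: 900 regulars plus 150 irregulars.
regulars, irregulars = 900, 150
n = regulars + irregulars
print(f"N = {n}, threshold = {tolerance_threshold(n):.1f}, "
      f"productive = {is_productive(n, irregulars)}")
```

Because N and the number of exceptions are plain arguments, the same check can be rerun for an individual speaker's vocabulary, which is how the slide suggests individual variation would arise.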
![Page 53: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed](https://reader030.fdocuments.in/reader030/viewer/2022040617/5f1fffec2267f62dce74d175/html5/thumbnails/53.jpg)
A Birth-er Problem
• The suffix -er is productive and segmented in real time even for broth-er (Rastle, Davis & New 2004) resulting in slowdown (Lignos 2011)
• While some -er’s are real (hunt-hunter), some are not (corn-corner, cent-center, sock-soccer): children need to learn -er despite exceptions
• English Lexicon Project (Balota et al. 2007)
• hunt-hunter type: 94, cent-center type: 18
• The suffix -er is productive: N = 94 + 18 = 112, and 18 < 112/ln(112) ≈ 24
• The suffix -th fails to reach productivity: warmth, width, depth etc. are overwhelmed by tooth, booth, filth, forth, ...
stride-strode-???
• Mere majority is not sufficient; a filibuster-proof majority is required
• [-lexical insertion] (Halle 1972, esp. fn1)
• gaps only arise in unproductive corners of morphology
• 102 out of 161 irregular verbs show preterite and past participle syncretism; the other 59 (about 37%) do not
• Tolerance Principle only allows 1/ln(161) ≈ 20% exceptions
• *forwent, *sightsaw, *stridden
N/ln(N)
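The syncretism case can be restated as proportions. The sketch below is illustrative only: it assumes that the 59 irregular verbs without preterite/past-participle syncretism are the exceptions to a "participle = preterite" rule, and the helper name `tolerated_share` is hypothetical.

```python
import math


def tolerated_share(n: int) -> float:
    """Largest tolerable proportion of exceptions among n items: (n/ln n)/n = 1/ln(n)."""
    return 1 / math.log(n)


n_irregulars = 161                      # irregular verbs considered (from the slide)
syncretic = 102                         # preterite and past participle identical
exceptions = n_irregulars - syncretic   # assumed to be the exceptions to the rule

exception_share = exceptions / n_irregulars
allowed_share = tolerated_share(n_irregulars)
rule_is_productive = exception_share <= allowed_share

print(f"exceptions: {exceptions} of {n_irregulars} ({exception_share:.1%})")
print(f"tolerated:  {allowed_share:.1%}")
print(f"productive: {rule_is_productive}")
```

On these counts the majority pattern falls short of the threshold, which is the sense in which a mere majority is not enough and gaps such as *stridden can arise.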
Evaluation Metrics
• What not to do: Computer chess
• Resource bounded optimization
• Convergence of methods and disciplines
• Simple theories are usually the right ones
Thank you, to my teachers
• Bob Berwick
• Noam Chomsky
• Morris Halle