Workshop on Derivational Morphology
and Spoken Language
MorphoQuantics: researching complex
words in spoken English and its relevance
to educational and clinical applications
Jacqueline Laws Department of English Language & Applied Linguistics
University of Reading, United Kingdom
Overview
• Derivational Morphemes:
  o what types are there?
  o how many are there?
• Aims of the Project and the BA’s contribution
• MorphoQuantics: Database of complex words
• A Case Study: Verb-forming Suffixation
• Applications:
  o Sociolinguistics, Psycholinguistics, Morphology
  o First and second Language acquisition
  o Educational and Clinical settings
Types of Derivational Morphemes
Affixes:
• Word-initial affixes: un-kind, re-play
• Word-final affixes: peer-age, south-ern
Combining Forms (neoclassical):
• Word-initial combining forms: aristo-crat, mono-logue
• Word-final combining forms: micro-cosm, philo-sophy
Neutral affixes (native):
• Germanic origins: un-kind, delight-ful
  o Transparent, no stress change, more informal
Non-neutral affixes (non-native):
• Latinate and Greek origins: re-play, atom-ic
  o Opaque, stress/sound change, more formal
How many Derivational Morphemes?

Source                          Word-Initial DMs   Word-Final DMs   Total
Quinion (2002)                  836                453              1,289
Stein (2007)                    547                296              843
Dixon (2014)                    100                119              219
Totals: all sources,            974                525              1,499
  containing Combining Forms
OED                             1,947              930              2,877
MorphoQuantics                  554                281              847
  Combining Forms:              419                141              560
  Affixes:                      141                146              287
The History of MorphoQuantics
• Acquisition of derivational morphology in children:
  o CHILDES database: children aged 2, 3, 4, and 5
  o 100,000 tokens per age group (400K in total) plus 985K adult tokens; 85 suffixes
[Figure: suffix type frequency (y-axis, 0 to 160) by age group (x-axis: 20H, 30H, 40H, 50H) for the suffix categories -ie2/-y4 (Prop-N), -y1 (Adjective), -ie1/-y3 (Noun), -er2/-or2 (inanimate), -er1/-or1 (animate), -ly (Adverb), -ette/-et, -ion (all types), -ic (Adj & N), -ous, -an (Adj & N), -ful (Adj & N). Asterisks mark a significant type increase from 2 to 5 years: * p<0.05, ** p<0.01, *** p<0.001, **** p<0.0001. From: Laws, J.V. (in preparation)]
• Comparison with adult speech:
  o Spoken component of the British National Corpus
  o MorphoQuantics built from its 10,409,858 tokens
• First database of complex words in any language
• Website released in 2014 (http://morphoquantics.co.uk)
• 18,000 complex word types
• 1,093,000 complex word tokens
• Licence agreement with OED for etymology in 2015
• Users in Finland, Spain, Germany, Japan, France, Slovakia, Mexico, New Zealand, UK, Korea
Target Set of Derivational Morphemes
• Combining Forms, Word-Initial: 419
• Combining Forms, Word-Final: 141
• Prefixes: 141
• Suffixes: 146
How many Combining Forms / Affixes occurred in the BNC?
Number of Target Combining Forms Observed in the Spoken BNC
• Word-initial: 151 of 419 (36%)
• Word-final: 96 of 141 (68%)
• Overall: 44% of target CFs occurred in the BNC
Number of Target Affixes Observed in the Spoken BNC
• Prefixes: 123 of 141 (87%)
• Suffixes: 131 of 146 (90%)
• Overall: 89% of target affixes occurred in the BNC
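The coverage percentages on the two slides above can be recomputed from the target and observed counts. The following is a minimal sketch in Python; the names `TARGETS`, `coverage` and `pooled_coverage` are illustrative and not part of MorphoQuantics itself.

```python
# Target-set sizes and numbers observed in the spoken BNC, as reported on the slides.
# Each value is (target, observed). The dictionary and function names are illustrative.
TARGETS = {
    "Combining forms, word-initial": (419, 151),
    "Combining forms, word-final": (141, 96),
    "Prefixes": (141, 123),
    "Suffixes": (146, 131),
}


def coverage(target, observed):
    """Percentage of target morphemes attested at least once in the corpus."""
    return 100.0 * observed / target


def pooled_coverage(names):
    """Coverage over several categories combined (e.g. all combining forms)."""
    target = sum(TARGETS[name][0] for name in names)
    observed = sum(TARGETS[name][1] for name in names)
    return coverage(target, observed)


for name, (target, observed) in TARGETS.items():
    print(f"{name}: {observed}/{target} = {coverage(target, observed):.0f}%")

combining = ["Combining forms, word-initial", "Combining forms, word-final"]
affixes = ["Prefixes", "Suffixes"]
print(f"All combining forms: {pooled_coverage(combining):.0f}%")
print(f"All affixes: {pooled_coverage(affixes):.0f}%")
```

Rounded to whole numbers, these reproduce the slide figures: 36%, 68%, 87% and 90% for the four categories, and 44% and 89% for combining forms and affixes overall.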
Initial Results for Spoken Language
• Combining Forms, Word-Initial: 36%
• Combining Forms, Word-Final: 68%
• Prefixes: 87%
• Suffixes: 90%
To what degree does register affect complex word use?
Register and Distribution of Affixed Lexemes
[Figures adapted from Schmid (2011), showing by register the percentage of:]
• prefixed tokens
• suffixed noun tokens
• suffixed adjective tokens
• suffixed verb tokens
• suffixed adverb tokens
• affixed tokens
Derivational Morphology & Register
Previous studies:
• Biber’s (1988) Multidimensional Analysis:
  o Grammatical, lexical, discourse linguistic features
  o Nominalisation (-ion, -ment, -ness, -ity) for abstract concepts is the only DM marker of academic language
• Biber et al. (1999); Schmid (2011): A selection of N, Adj, Adv and V-forming suffixes compared between conversation and written registers:
  o Personal letters (Sch only)
  o Fiction (B & Sch)
  o Newspapers (B & Sch)
  o Academic prose (B & Sch)
Refinement of MorphoQuantics
Split the BNC into the two spoken Components:
• Demographically Sampled (DS): everyday speech
• Context Governed (CG): speech in formal contexts
• Only 2 studies have compared DS/CG for 16 suffixes
Funding Application to British Academy (July 2015):
• Small Grant from September 2015 to July 2016:
  o Split dataset into DS and CG components
  o Workshop on Derivational Morphology:
    Morphology, Sociolinguistics, Corpus Linguistics;
    Clinical and Educational applications
Two Spoken Components of the BNC
Context-governed (CG), categorized by domain:
• Educational & informative (e.g., educational demonstrations, news commentaries)
• Business (e.g., company talks, interviews and sales demos)
• Public or Institutional (e.g., political speeches, sermons)
• Leisure (e.g., sports commentaries, club meetings, chat shows)
Total CG token count: 6,175,896 words
Demographically Sampled (DS), sampled according to:
• Respondent age
• Respondent gender
• Respondent social class
• Geographical region
Total DS token count: 4,233,962 words
Initial Results
Target morphemes and number observed in the spoken BNC:

Category                         Target   Observed
Combining Forms, Word-Initial    419      151
Combining Forms, Word-Final      141      96
Prefixes                         141      123
Suffixes                         146      131

Prefixes in the DS and CG components:

                    MQ     DS     CG     Sig
Total Prefixes      141    71%    87%    NS
Normalised Types           436    607    p<0.0001

From: Laws & Ryder (under review)
• Focus on Suffixes:
  o Suffixation Usage Patterns across word classes
  o Case Study of verb-forming suffixes & register
Semantics of Verb-forming Suffixes

Semantic       Meaning /          Examples
Category       Paraphrase         -ate          -en                  -ify          -ize
Locative       put in(to) X                                          codify        containerize
Ornative       provide with X     chlorinate                         glorify       patinize
Causative      make (more) X                    darken (trans)       diversify     stabilize (trans)
Resultative    make into X        gelate                             yuppify       crystallize
Inchoative     become X                         darken (intrans)     calcify       stabilize (intrans)
Performative   perform X                                             speechify     economize
Similative     act like X                                            Shelleyfy     Powellize
RQs: Verb-forming Suffixation
1) To what extent does speech formality affect suffix
category diversity (number of types) and suffix category
density (number of tokens)?
2) To what extent do the register-based vocabulary sets
overlap for each verb-forming suffix?
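One way to make the second research question concrete is to compare, for a given suffix, the set of word types found in DS with the set found in CG. The sketch below is only an illustration: the slides do not state how overlap was measured, so the Jaccard index is an assumption, as are the function name `overlap_profile` and the two hypothetical word lists.

```python
def overlap_profile(ds_types, cg_types):
    """Compare two sets of word types for one suffix.

    Returns the shared types, the types unique to each register, and the
    Jaccard index (shared types divided by all distinct types). The Jaccard
    index is an assumed measure, not one taken from the slides.
    """
    shared = ds_types & cg_types
    union = ds_types | cg_types
    return {
        "shared": sorted(shared),
        "ds_only": sorted(ds_types - cg_types),
        "cg_only": sorted(cg_types - ds_types),
        "jaccard": len(shared) / len(union) if union else 0.0,
    }


# Hypothetical -ize vocabularies, invented for illustration only.
ds_ize = {"realize", "organize", "apologize", "criticize"}
cg_ize = {"realize", "organize", "criticize", "privatize", "prioritize", "standardize"}

profile = overlap_profile(ds_ize, cg_ize)
print(profile["shared"])
print(profile["ds_only"])
print(profile["cg_only"])
print(round(profile["jaccard"], 2))
```

Keeping the register-specific lists alongside the single overlap score matters here, because the interesting result is often which words appear only in formal contexts, not just how many.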
Category diversity and density across DS/CG

                      Types / million       Types    Tokens / million        Tokens
Suffix   Example      DS     CG     Sig     CG/DS    DS     CG      Sig      CG/DS
-ize     criticize    16     27     ***     1.71     263    640     ****     2.44
-en      frighten     11     9      NS      0.80     81     119     ****     1.14
-ify     classify     8      8      NS      0.94     35     275     ****     7.87
-ate     activate     2      4      (*)     1.68     6      43      ****     7.60
Totals                37     47     *       1.27     385    1,077   ****     2.80

* p<0.05; *** p<0.001; **** p<0.0001
From: Laws & Ryder (under review)
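Because the DS and CG components differ in size, type and token counts are compared as rates per million words. The sketch below shows that normalisation and the CG/DS ratio under stated assumptions: the corpus sizes are the ones given on the slides, while the raw counts and the function names `per_million` and `cg_ds_ratio` are hypothetical and do not come from Laws & Ryder.

```python
# Component sizes as reported on the slides.
DS_TOKENS = 4_233_962
CG_TOKENS = 6_175_896


def per_million(raw_count, corpus_tokens):
    """Normalise a raw count to a rate per million corpus tokens."""
    return raw_count * 1_000_000 / corpus_tokens


def cg_ds_ratio(ds_rate, cg_rate):
    """Ratio of the CG rate to the DS rate; values above 1 mean the item is
    relatively more frequent in the formal (CG) component."""
    if ds_rate == 0:
        raise ValueError("DS rate is zero, so the ratio is undefined")
    return cg_rate / ds_rate


# Hypothetical raw token counts for one suffix, invented for illustration only.
HYPOTHETICAL_DS_COUNT = 1_000
HYPOTHETICAL_CG_COUNT = 3_000

ds_rate = per_million(HYPOTHETICAL_DS_COUNT, DS_TOKENS)
cg_rate = per_million(HYPOTHETICAL_CG_COUNT, CG_TOKENS)
print(f"DS: {ds_rate:.1f} per million, CG: {cg_rate:.1f} per million")
print(f"CG/DS ratio: {cg_ds_ratio(ds_rate, cg_rate):.2f}")
```

Note that a raw count three times larger in CG gives a ratio of only about two once the larger size of the CG component is taken into account, which is why the comparison is made on normalised rates.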
Summary of Results
Register effects stronger than anticipated
• Formality affects category diversity and density
• Use of verbal derivatives increases with speech formality
Each verbal category has an individual profile with respect to register variation (RV)
• -ize is most productive
• -ify and -ate are most relevant (by token frequency) in more formal contexts
• -en is relatively unaffected by speech register
• No relationship between polysemy and diversity/density
• Treating the BNC spoken corpus as a single dataset disguises strong register effects
DM and the National Curriculum
Year 1 – Statutory Requirements:
• Adding prefixes and suffixes. Using un-
Year 2 – Statutory Requirements:
• Add suffixes to spell longer words, including -ment, -ness,
-ful, -less, -ly
Years 3 and 4 – Statutory Requirements:
• Apply knowledge of root words, prefixes and suffixes
(etymology and morphology) to understand meaning.
More affixes: dis-, mis- and in-; -ation, -sion, -ly, -ous, -cian
Years 5 and 6 – Statutory Requirements:
• Use knowledge of etymology and morphology in spelling.
More affixes: de-, over-, re-, -ible, -ate, -ize, -ify, -er, -ish
Implications of Findings
Characteristics of DM categories:
• Transparency of stem and affix(es)
• Polysemy of common affixes
• Impact of register on experimental design
• What norms to use when developing stimuli / materials for participants who are:
  o Children at Key Stages 1-5
  o L2 learners of English (proficiency levels)
  o Clinical populations
DM and continuing research
MQ provides a benchmark for comparing:
• Changes in language over time - BNC2014: Diachronic comparisons over 20 years (CR)
• Register variation: spoken vs. written language (CR/JL)
Cross-disciplinary aspects of DM research:
• Sociolinguistic aspects of spoken language (TS)
• Child language development research (LB)
• Second Language acquisition (LD)
• Literacy programmes in academic writing (JC)
References
Biber, D. (1988). Variation across speech and writing. Cambridge: CUP.
Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman
grammar of spoken and written English. London: Longman.
BNCweb (CQP-Edition) Version 4.3, November 2013. https://bncweb.lancs.ac.uk.
British National Corpus, version 3, 2007. http://www.natcorp.ox.ac.uk.
Dixon, R.M.W. (2014). Making new words: morphological derivation in English. Oxford: OUP.
Laws, J.V. & Ryder, C. (under review). Register variation in spoken language: The case
of verb-forming suffixation. International Journal of Corpus Linguistics.
Laws, J.V. (in preparation). The order of acquisition of derivational morphology in 2-5
year old children with normally developing language.
Quinion, M. (2002). Ologies and isms: word beginnings and endings. Oxford: OUP.
Schmid, H-J. (2011). English morphology and word-formation. An introduction. Berlin: Erich
Schmidt Verlag.
Stein, G. (2007). A Dictionary of English Affixes: Their Function and Meaning. Munich:
Lincom Europa.