Workshop on Derivational Morphology and Spoken Language

MorphoQuantics: researching complex words in spoken English and its relevance to educational and clinical applications

Jacqueline Laws
Department of English Language & Applied Linguistics
University of Reading, United Kingdom


Overview

• Derivational Morphemes:
  o what types are there?
  o how many are there?
• Aims of the Project and the BA’s contribution
• MorphoQuantics: Database of complex words
• A Case Study: Verb-forming Suffixation
• Applications:
  o Sociolinguistics, Psycholinguistics, Morphology
  o First and second Language acquisition
  o Educational and Clinical settings

Types of Derivational Morphemes

Affixes:
• Word-initial affixes: un-kind, re-play
• Word-final affixes: peer-age, south-ern

Combining Forms (neoclassical):
• Word-initial combining forms: aristo-crat, mono-logue
• Word-final combining forms: micro-cosm, philo-sophy

Neutral affixes (native):
• Germanic origins: un-kind, delight-ful
  o Transparent, no stress change, more informal

Non-neutral affixes (non-native):
• Latinate and Greek origins: re-play, atom-ic
  o Opaque, stress/sound change, more formal

How many Derivational Morphemes?

Source                                           Word-Initial DMs   Word-Final DMs   Total
Quinion (2002)                                   836                453              1,289
Stein (2007)                                     547                296              843
Dixon (2014)                                     100                119              219
Totals: all sources containing Combining Forms   974                525              1,499
OED                                              1,947              930              2,877
MorphoQuantics                                   554                281              847
  Combining Forms:                               419                141              560
  Affixes:                                       141                146              287

The History of MorphoQuantics

• Acquisition of derivational morphology in children:
  o CHILDES database: children aged 2, 3, 4, and 5
  o 100,000 tokens per age group: 400K child tokens + 985K adult tokens; 85 suffixes

[Figure: suffix type frequency (y-axis, 0 to 160) by age group (20H, 30H, 40H, 50H) for the suffixes -ie2/-y4 (Prop-N), -y1 (Adjective), -ie1/-y3 (Noun), -er2/-or2 (inanimate), -er1/-or1 (animate), -ly (Adverb), -ette/-et, -ion (all types), -ic (Adj & N), -ous, -an (Adj & N) and -ful (Adj & N). Asterisks mark suffixes with a significant type increase from 2 to 5 years: * p<0.05; ** p<0.01; *** p<0.001; **** p<0.0001. From: Laws, J.V. (in preparation)]

The History of MorphoQuantics (continued)

• Comparison with adult speech:
  o Spoken component of the British National Corpus
  o From 10,409,858 tokens
• MorphoQuantics:
  o First database of complex words in any language
  o Website released in 2014 (http://morphoquantics.co.uk)
  o 18,000 complex word types
  o 1,093,000 complex word tokens
  o Licence agreement with OED for etymology in 2015
  o Users in Finland, Spain, Germany, Japan, France, Slovakia, Mexico, New Zealand, UK, Korea

Target Set of Derivational Morphemes

• Word-initial combining forms: 419
• Word-final combining forms: 141
• Prefixes: 141
• Suffixes: 146

How many Combining Forms / Affixes occurred in BNC?

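Number of Target Combining Forms Observed in the Spoken BNC

[Chart: 151 of the 419 target word-initial combining forms (36%) and 96 of the 141 target word-final combining forms (68%) were observed; overall, 44% of target CFs occurred in BNC]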

Number of Target Affixes Observed in the Spoken BNC

[Chart: 123 of the 141 target prefixes (87%) and 131 of the 146 target suffixes (90%) were observed; overall, 89% of target affixes occurred in BNC]
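The coverage percentages on these two slides follow directly from the target and observed counts. A minimal Python sketch that recomputes them, assuming the pairing of counts read off the charts (419/151 and 141/96 for combining forms, 141/123 and 146/131 for affixes); the dictionary and function names are illustrative only:

```python
# (target, observed) counts as read off the slides for the spoken BNC.
# Names below are illustrative assumptions, not part of MorphoQuantics.
COUNTS = {
    "word-initial combining forms": (419, 151),
    "word-final combining forms": (141, 96),
    "prefixes": (141, 123),
    "suffixes": (146, 131),
}


def coverage(target: int, observed: int) -> float:
    """Percentage of target morphemes attested at least once."""
    return 100.0 * observed / target


def pooled_coverage(categories: list[str]) -> float:
    """Coverage over several categories pooled together."""
    target = sum(COUNTS[c][0] for c in categories)
    observed = sum(COUNTS[c][1] for c in categories)
    return coverage(target, observed)


if __name__ == "__main__":
    for name, (target, observed) in COUNTS.items():
        print(f"{name}: {coverage(target, observed):.0f}%")
    cfs = ["word-initial combining forms", "word-final combining forms"]
    affixes = ["prefixes", "suffixes"]
    print(f"all combining forms: {pooled_coverage(cfs):.0f}%")
    print(f"all affixes: {pooled_coverage(affixes):.0f}%")
```

Pooling the counts before dividing, rather than averaging the two percentages, is what reproduces the overall 44% and 89% figures, because the categories differ in size.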

Initial Results for Spoken Language

• Word-initial combining forms: 36%
• Word-final combining forms: 68%
• Prefixes: 87%
• Suffixes: 90%

To what degree does register affect complex word use?

Register and Distribution of Affixed Lexemes

[Figures, one per slide, adapted from Schmid (2011): percentage of prefixed tokens; percentage of suffixed noun tokens; percentage of suffixed adjective tokens; percentage of suffixed verb tokens; percentage of suffixed adverb tokens; percentage of affixed tokens]

Derivational Morphology & Register

Previous studies:
• Biber’s (1988) Multidimensional Analysis:
  o Grammatical, lexical, discourse linguistic features
  o Nominalisation (-ion, -ment, -ness, -ity) for abstract concepts is the only DM marker of academic language
• Biber et al. (1999); Schmid (2011): A selection of N, Adj, Adv and V-forming suffixes compared between conversation and written registers:
  - Personal letters (Sch only)
  - Fiction (B & Sch)
  - Newspapers (B & Sch)
  - Academic prose (B & Sch)

Refinement of MorphoQuantics

Split the BNC into the two spoken Components:
• Demographically Sampled (DS): everyday speech
• Context Governed (CG): speech in formal contexts
• Only 2 studies have compared DS/CG for 16 suffixes

Funding Application to British Academy (July 2015):
• Small Grant from September 2015 to July 2016:
  o Split dataset into DS and CG components
• Workshop on Derivational Morphology:
  o Morphology, Sociolinguistics, Corpus Linguistics
  o Clinical and Educational applications

Two Spoken Components of the BNC

Context-governed (CG), categorized by domain:
• Educational & informative (e.g., educational demonstrations, news commentaries)
• Business (e.g., company talks, interviews and sales demos)
• Public or Institutional (e.g., political speeches, sermons)
• Leisure (e.g., sports commentaries, club meetings, chat shows)

Total CG token count: 6,175,896 words

Demographically Sampled (DS), sampled according to:
• Respondent age
• Respondent gender
• Respondent social class
• Geographical region

Total DS token count: 4,233,962 words
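Because the two components differ in size, raw counts have to be normalised before DS and CG can be compared. A minimal sketch of per-million normalisation using the token totals given above; the function name is illustrative, and the raw count of 1,000 is a made-up value that does not come from the study:

```python
DS_TOKENS = 4_233_962  # Demographically Sampled component
CG_TOKENS = 6_175_896  # Context-governed component


def per_million(raw_count: int, corpus_tokens: int) -> float:
    """Normalise a raw frequency to occurrences per million tokens."""
    return raw_count * 1_000_000 / corpus_tokens


if __name__ == "__main__":
    # The two components together give the size of the spoken sample.
    print(DS_TOKENS + CG_TOKENS)  # 10409858

    # Hypothetical raw count, identical in both components.
    raw = 1_000
    print(round(per_million(raw, DS_TOKENS), 1))
    print(round(per_million(raw, CG_TOKENS), 1))
```

The same raw count yields a higher normalised rate in DS than in CG, since DS is the smaller component.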

Initial Results

                                Target   Observed
Combining Forms, Word-Initial   419      151
Combining Forms, Word-Final     141      96
Prefixes                        141      123
Suffixes                        146      131

                   MQ    DS    CG    Sig
Total Prefixes     141   71%   87%   NS
Normalised Types         436   607   p<0.0001

From: Laws & Ryder (under review)

• Focus on Suffixes:
  o Suffixation Usage Patterns across word classes
  o Case Study of verb-forming suffixes & register


Semantics of Verb-forming Suffixes

Semantic       Meaning /        Examples
Category       Paraphrase       -ate         -en                 -ify         -ize
Locative       put in(to) X                                      codify       containerize
Ornative       provide with X   chlorinate                       glorify      patinize
Causative      make (more) X                 darken (trans)      diversify    stabilize (trans)
Resultative    make into X      gelate                           yuppify      crystallize
Inchoative     become X                      darken (intrans)    calcify      stabilize (intrans)
Performative   perform X                                         speechify    economize
Similative     act like X                                        Shelleyfy    Powellize

RQs: Verb-forming Suffixation

1) To what extent does speech formality affect suffix category diversity (number of types) and suffix category density (number of tokens)?

2) To what extent do the register-based vocabulary sets overlap for each verb-forming suffix?
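One possible way to quantify the overlap asked about in the second question is the Jaccard index over the sets of word types attested in each register. This is an assumption made for illustration and not necessarily the measure used in the study; the two word sets below are invented examples, not corpus data:

```python
def jaccard(a: set[str], b: set[str]) -> float:
    """Shared types divided by all types found in either set."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented -ize vocabularies for the two registers (not corpus data).
ds_ize = {"criticize", "realize", "apologize", "organize"}
cg_ize = {"criticize", "realize", "organize", "stabilize", "economize", "prioritize"}

shared = ds_ize & cg_ize   # types used in both registers
only_cg = cg_ize - ds_ize  # types attested only in formal speech

if __name__ == "__main__":
    print(sorted(shared))
    print(sorted(only_cg))
    print(round(jaccard(ds_ize, cg_ize), 2))
```

A value near 1 would mean the two registers draw on largely the same vocabulary for a suffix, and a value near 0 would mean largely distinct vocabularies.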

Category diversity and density across DS/CG

                     Types / million           Tokens / million
Suffix   Example     DS   CG   Sig   CG/DS     DS    CG      Sig    CG/DS
-ize     criticize   16   27   ***   1.71      263   640     ****   2.44
-en      frighten    11   9    NS    0.80      81    119     ****   1.14
-ify     classify    8    8    NS    0.94      35    275     ****   7.87
-ate     activate    2    4    (*)   1.68      6     43      ****   7.60
Totals               37   47   *     1.27      385   1,077   ****   2.80

*p<0.05; *** p<0.001; **** p<0.0001 From: Laws & Ryder (under review)
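The CG/DS columns express how many times more frequent a suffix category is in formal speech than in everyday speech. A minimal sketch of that ratio, using the rounded per-million figures printed in the table; the names are illustrative. Because the inputs are already rounded, the recomputed ratios and column sums can deviate from the printed values, which were presumably derived from unrounded frequencies:

```python
# Per-million frequencies (DS, CG) as printed in the table, already rounded.
TYPES = {"-ize": (16, 27), "-en": (11, 9), "-ify": (8, 8), "-ate": (2, 4)}
TOKENS = {"-ize": (263, 640), "-en": (81, 119), "-ify": (35, 275), "-ate": (6, 43)}


def cg_ds_ratio(ds: float, cg: float) -> float:
    """How many times more frequent an item is in CG than in DS."""
    return cg / ds


def totals(table: dict[str, tuple[int, int]]) -> tuple[int, int]:
    """Column sums (DS, CG) over all suffixes."""
    ds_total = sum(ds for ds, _ in table.values())
    cg_total = sum(cg for _, cg in table.values())
    return ds_total, cg_total


if __name__ == "__main__":
    for suffix in TOKENS:
        type_ratio = cg_ds_ratio(*TYPES[suffix])
        token_ratio = cg_ds_ratio(*TOKENS[suffix])
        print(f"{suffix}: types {type_ratio:.2f}, tokens {token_ratio:.2f}")
    print(totals(TYPES), totals(TOKENS))
```

A ratio above 1 indicates greater diversity (types) or density (tokens) in the context-governed component; a ratio below 1 indicates the reverse.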

Summary of Results

Register effects stronger than anticipated
• Formality affects category diversity and density
• Use of verbal derivatives increases speech formality

Each verbal category has an individual profile re: register variation (RV)
• -ize is most productive
• -ify and -ate are most relevant (token frequencies) in more formal contexts
• -en relatively unaffected by speech register
• No relationship between polysemy and diversity/density
• BNC spoken corpus disguises strong register effects

DM and the National Curriculum

Year 1 – Statutory Requirements:
• Adding prefixes and suffixes. Using un-

Year 2 – Statutory Requirements:
• Add suffixes to spell longer words, including -ment, -ness, -ful, -less, -ly

Years 3 and 4 – Statutory Requirements:
• Apply knowledge of root words, prefixes and suffixes (etymology and morphology) to understand meaning. More affixes: dis-, mis- and in-; -ation, -sion, -ly, -ous, -cian

Years 5 and 6 – Statutory Requirements:
• Use knowledge of etymology and morphology in spelling. More affixes: de-, over-, re-, -ible, -ate, -ize, -ify, -er, -ish

Implications of Findings

Characteristics of DM categories:
• Transparency of stem and affix(es)
• Polysemy of common affixes
• Impact of register on experimental design
• What norms to use when developing stimuli / materials for participants who are:
  o Children at Key Stages 1-5
  o L2 learners of English (proficiency levels)
  o Clinical populations

DM and continuing research

MQ provides a benchmark for comparing:
• Changes in language over time - BNC2014: Diachronic comparisons over 20 years (CR)
• Register variation: spoken vs. written language (CR/JL)

Cross-disciplinary aspects of DM research:
• Sociolinguistic aspects of spoken language (TS)
• Child language development research (LB)
• Second Language acquisition (LD)
• Literacy programmes in academic writing (JC)

References

Biber, D. (1988). Variation across speech and writing. Cambridge: CUP.

Biber, D., Johansson, S., Leech, G., Conrad, S., & Finegan, E. (1999). The Longman grammar of spoken and written English. London: Longman.

BNCweb (CQP-Edition) Version 4.3, November 2013. https://bncweb.lancs.ac.uk.

British National Corpus, version 3, 2007. http://www.natcorp.ox.ac.uk.

Dixon, R.M.W. (2014). Making new words: morphological derivation in English. Oxford: OUP.

Laws, J.V. & Ryder, C. (under review). Register variation in spoken language: The case of verb-forming suffixation. International Journal of Corpus Linguistics.

Laws, J.V. (in preparation). The order of acquisition of derivational morphology in 2-5 year old children with normally developing language.

Quinion, M. (2002). Ologies and isms: word beginnings and endings. Oxford: OUP.

Schmid, H-J. (2011). English morphology and word-formation. An introduction. Berlin: Erich Schmidt Verlag.

Stein, G. (2007). A Dictionary of English Affixes: Their Function and Meaning. Munich: Lincom Europa.