
1 Verbal Synonymy in Practice: Combining Corpus-based and Psycholinguistic Evidence Antti Arppe antti.arppe@helsinki.fi Department of General Linguistics University of Helsinki Juhani Järvikivi juhani.jarvikivi@joensuu.fi General Linguistics University of Joensuu



1

Verbal Synonymy in Practice: Combining Corpus-based and Psycholinguistic Evidence

Antti Arppe

antti.arppe@helsinki.fi

Department of General Linguistics

University of Helsinki

Juhani Järvikivi

juhani.jarvikivi@joensuu.fi

General Linguistics

University of Joensuu


2

Table of contents

• Background

• Goals of this research

• Some words about synonymy

• Corpus-based results

• Psycholinguistic test results

• Combining and interpreting the evidence

• Conclusion


3

Background: traditional descriptions of synonyms and their usage

• lexical descriptions that contain information about synonyms, i.e. general dictionaries or specific synonym dictionaries/thesauri, rarely provide extensive and/or explicit information on the usage or contextual limitations of these synonyms or on their interchangeability

• synonyms are in fact used to describe each other

• Examples: cognitive verbs ~ think/ponder

– Collins Cobuild English Dictionary (2001): corpus-based

– Comprehensive dictionary of Finnish (PSK 1990/1997): based on a word-card corpus


4

Collins

– ponder

• If you ponder something, you think about it carefully

• I found myself constantly pondering the question: ’How could anyone do these things’ ... The prime minister pondered on when to go to the polls ... I’m continually pondering how to improve the team

• V n | V on/over n | V wh | ALSO V

– deliberate [3/3]

• If you deliberate, you think about something carefully, especially before making a very important decision

• She deliberated over the decision for a good few years before she finally made up her mind ... The six-person jury deliberated about two hours before returning with the verdict ... The Court of Appeals has been deliberating his case for almost two weeks

• V prep | V | V n


5

What are the indicated differences between ponder and deliberate?

• frequency

– ponder vs. deliberate

• description

– if you deliberate/ponder, you think about something carefully ...

– deliberate: ... especially before making a very important decision

• syntax

– common: V n | V

– ponder: V on/over n | V wh

– deliberate: V prep


6

PSK

– [1/2] miettiä

• ajatella, harkita, pohtia, punnita, tuumia, aprikoida, järkeillä, mietiskellä

• Mitä mietit? ... Asiaa täytyy vielä miettiä ... Mietin juuri, kannattaako ollenkaan lähteä ... Vastasi sen enempää miettimättä. ... Mietti päänsä puhki.

– pohtia

• ajatella jotakin perusteellisesti, eri mahdollisuuksia arvioiden, harkita, miettiä, tuumia, ajatella, järkeillä, punnita, aprikoida

• Pohtia arvoitusta, ongelmaa ... Pohtia kysymystä joka puolelta ... Pohtia keinoja asian auttamiseksi.


7

A rough English approximation of the PSK examples for pohtia and miettiä

– miettiä ~ M-think

• think, consider, ponder, weigh, muse, wonder, think rationally, contemplate

• What are you thinking about? ... One still has to think about the issue ... I’m just now thinking whether it is worth going at all ... Answered without any further thought ... Pondered his head ”off”

– pohtia ~ ponder

• consider something thoroughly, evaluating every possibility, consider, M-think, muse, think, think rationally, weigh, wonder

• ponder a puzzle, problem ... Consider the issue from every angle ... Consider ways to improve the situation


8

What are the differences between miettiä and pohtia?

• descriptions

– common: ajatella ~ think, harkita ~ consider, tuumia ~ muse, järkeillä ~ think rationally, punnita ~ weigh, aprikoida ~ wonder

– miettiä: mietiskellä ~ contemplate, meditate

– pohtia: ajatella jotakin perusteellisesti, eri mahdollisuuksia arvioiden ~ consider something thoroughly, evaluating the different possibilities

• no differences indicated in grammatical usage


9

Background: Linguistic studies on synonym usage

• numerous studies have shown that a wide range of factors influence which word in a synonym group is actually chosen

– synonyms are not as fully interchangeable as they have naively been assumed to be

– these studies are typically corpus-based


10

Linguistic studies cont’d

• These factors include e.g.

– register, intended style, situation (Zgusta 1971, Biber 1998)

– lexical and syntactic context (Biber 1998)

– functional context (Atkins 199x)

– (word-internal) morphological context, i.e. inflected form (Arppe 2002)

• Sinclair (1991) has further argued that each inflected form of a lexeme could in principle have independent usage contexts, e.g. concerning collocates


11

Goals of this study

• The factors noted to influence the selection and usage of synonyms have been observed mainly in large corpora

• Do the corpus-based results on differences in the usage of synonyms match the linguistic intuitions of native speakers, i.e. their subjective acceptability ratings?

• How could combining these two types of linguistic evidence be used to enhance existing lexicographical descriptions of word usage?


12

A few words on synonymy

• as a premise, absolute synonymy, i.e. full interchangeability in all possible contexts, is not expected to exist in practice or to be found in corpora or elsewhere

• on a naive level, synonymy is believed to exist, as speakers of a language feel that some words can be interchanged with each other without an essential change in the meaning and connotations of an utterance

• synonymy is interpreted as near-synonymy in this study


13

A description of synonymy (Cruse 2000: 156-160)

• ”based on empirical, contextual evidence”

• ”synonyms are words

1) whose semantic similarities are more salient than their differences

2) that do not primarily contrast with each other; and

3) whose permissible differences must in general be either minor, backgrounded, or both”


14

The corpus-based study

• A refinement of Arppe (2002)

• based on lexicographical sources (descriptions, examples) and frequency information, a pair of Finnish cognitive verbs was chosen: miettiä and pohtia ~ think, consider, ponder

• approximately 2 million words of Finnish newspaper text

• automatically morphosyntactically analyzed using Connexor’s Functional Dependency Grammar (FDG) parser


15

The corpus-based study (cont’d)

• all instances of the two selected verbs and of the selected argument type (agent) were manually identified, and the analyses were corrected where necessary

• the agents were manually classified semantically according to WordNet (Miller et al. 1991)

• the t-score (Church et al. 1991) is used to highlight differences in the frequencies of contextual features; morpho-syntactic features are treated in the same way as the lexemes that Church et al. studied
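The slides do not spell out the t-score formula. A minimal sketch, assuming the usual Church et al. (1991) form t ≈ (O − E)/√O, where O is a verb's count with a feature and E is the count expected if the feature were spread over the two verbs in proportion to their overall frequencies (410 miettiä and 445 pohtia tokens in this corpus). The function name and parameters are illustrative, not the authors' code.

```python
import math


def t_score(observed: int, feature_total: int, verb_total: int, grand_total: int) -> float:
    """Church-style t-score for one verb-feature pair (illustrative sketch).

    observed      -- occurrences of the verb with the feature, n(feature,verb)
    feature_total -- occurrences of the feature with either verb, n(feature,total)
    verb_total    -- all occurrences of this verb in the sample
    grand_total   -- all occurrences of both verbs together
    """
    if observed == 0:
        # The sqrt(O) denominator is undefined here; return 0.0,
        # as the agent-type results table does for 0/n rows.
        return 0.0
    # Expected count if the feature followed the verbs' overall proportions.
    expected = feature_total * verb_total / grand_total
    # sqrt(observed) approximates the standard deviation (Church et al. 1991).
    return (observed - expected) / math.sqrt(observed)


# miettiä with a 1st person singular agent: 24 of 26 cases, 410 of 855 tokens.
print(round(t_score(24, 26, 410, 410 + 445), 2))
```

With these totals the sketch gives about 2.35 for miettiä with 0_SG1, close to but not exactly the 2.358 reported; the authors' totals or exact formula may have differed slightly, so treat this as an approximation.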


16

Judgements in synonymy: pohtia

• Hallitus pohtii lähiviikkoina, pitääkö se kiinni lupauksestaan painaa valtion menot vuonna 1995 reaalisesti vuoden 1991 tasolle.

The government is considering in the coming weeks whether it will keep its promise to push public spending in 1995 down to the level of 1991.

• ??? Hallitus miettii lähiviikkoina, pitääkö se kiinni lupauksestaan painaa valtion menot vuonna 1995 reaalisesti vuoden 1991 tasolle.


17

Judgements in synonymy: pohtia

• Työryhmässä oli erillinen jaos, joka pohti moottorikelkkailua Lapin läänissä.

There was a separate subgroup in the working group which was considering snowmobiling in the province of Lapland.

• ??? Työryhmässä oli erillinen jaos, joka mietti moottorikelkkailua Lapin läänissä.


18

Judgements in synonymy: pohtia

• Nato pohtii laajentamiskysymystä kokouksessaan Brysselissä.

Nato is considering the issue of expansion in its meeting in Brussels.

• ??? Nato miettii laajentamiskysymystä kokouksessaan Brysselissä.


19

Judgements in synonymy: miettiä

• Mietin muuttoa pari vuotta, laskin yhteen plussia ja miinuksia.

I considered moving for a couple of years, adding up the pluses and minuses.

• ??? Pohdin muuttoa pari vuotta, laskin yhteen plussia ja miinuksia.


20

Judgements in synonymy: miettiä

• Aina kun mietin, että synnyttäisin lapsen, ajatus tuntui mahdottomalle.

Whenever I considered that I might give birth to a child, the thought seemed inconceivable.

• Aina kun pohdin, että synnyttäisin lapsen, ajatus tuntui mahdottomalle.


21

Obvious conclusions?

• pohtia is tilted toward collective human subjects such as eduskunta ’parliament’, jaos ’subdivision’ or Nato ’NATO’

• miettiä is tilted towards individual, personal subjects, as in the 1st person singular


22

The corpus strikes back I

• ... miksi Suomessa jopa eduskunta miettii milloin kaupan ovi saa olla auki?

... why in Finland is even the Parliament considering when a shop may keep its doors open?

• MTK miettii ehtoja tänään.

MTK is considering its negotiation terms today.


23

The corpus strikes back II

• Liikenneministeriön työryhmä miettii parhaillaan, miten tunnuksettomia puheluita pitäisi kohdella.

A working group in the Transport Ministry is presently considering how non-prefixed calls should be treated.

• Yhtä kuitenkin pohdin.

There is one issue that I’m considering.


24

Preliminary conclusions

• the two verbs are more interchangeable, i.e. synonymous, than one would at first suspect

– collective human subjects can also be used with miettiä

– individual, personal subjects can also be used with pohtia


25

Data on the occurrences of the two verbs

• 410 occurrences of miettiä, representing 49 unique word forms

• 445 occurrences of pohtia, representing 45 unique word forms

• 25 of the morphological analyses were common to both verbs

• the active indicative present tense third person singular was the most frequent form

– 85 occurrences of miettii

– 145 occurrences of pohtii


26

Corpus-based results – morphological preferences

t-score   Fisher’s exact test   Verb      n(feature,verb)/n(feature,total)   Morpho-syntactic feature
 2.358    1.000000              miettiä   24/26                              0_SG1
 2.148    1.000000              pohtia    206/336                            0_SG3
-2.705    0.000013              miettiä   130/336                            0_SG3
-8.170    0.000001              pohtia    2/26                               0_SG1
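The Fisher's exact test column appears to report one-sided (lower-tail) probabilities: values near 1 where a verb is over-represented with a feature and very small values where it is under-represented. Assuming that reading, a minimal standard-library sketch computes P(X ≤ observed) from the hypergeometric distribution for the verb × feature 2×2 table, again taking 410 miettiä and 445 pohtia tokens as the totals. The function name is hypothetical; with SciPy available, `scipy.stats.fisher_exact(table, alternative='less')` should give the same one-sided value.

```python
from math import comb


def fisher_lower_tail(observed: int, verb_total: int, feature_total: int, grand_total: int) -> float:
    """One-sided (lower-tail) Fisher's exact test for a verb x feature 2x2 table.

    Returns P(X <= observed), where X is the number of the verb's tokens that
    carry the feature, under the null hypothesis that feature and verb are independent.
    """
    # Hypergeometric model: verb_total tokens are drawn from grand_total tokens,
    # of which feature_total carry the feature.
    denominator = comb(grand_total, verb_total)
    numerator = sum(
        comb(feature_total, k) * comb(grand_total - feature_total, verb_total - k)
        for k in range(0, observed + 1)
    )
    # Exact integer arithmetic; true division of Python ints yields a float.
    return numerator / denominator


N = 410 + 445
print(f"{fisher_lower_tail(2, 445, 26, N):.2e}")   # pohtia with 0_SG1: 2/26
print(f"{fisher_lower_tail(24, 410, 26, N):.6f}")  # miettiä with 0_SG1: 24/26
```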


27

Corpus-based results – preferences of agent types

t-score   Fisher’s exact test   Verb      n(feature,verb)/n(feature,total)   Semantic category of subject/agent
 1.908    1.0000                pohtia    34/44                              SEM_HUMAN_GROUP
 1.844    1.0000                pohtia    155/254                            SEM_HUMAN_INDIVIDUAL
 0.679    1.0000                pohtia    2/2                                SEM_COGNITION
 0.560    0.9089                miettiä   4/6                                SEM_LOCATION
 0.480    1.0000                pohtia    1/1                                SEM_ACTIVITY
 0.0      0.2700                miettiä   0/2                                SEM_COGNITION
 0.0      0.5199                miettiä   0/1                                SEM_ACTIVITY
-0.791    0.3067                pohtia    2/6                                SEM_LOCATION
-2.307    0.0004                miettiä   99/254                             SEM_HUMAN_INDIVIDUAL
-3.518    0.0004                miettiä   10/44                              SEM_HUMAN_GROUP


28

Corpus-based results - summary

• there appeared to be statistically significant differences in the preferences of the two verbs according to the person of the agent and whether it is an individual or a collective

• 1st person singular frames prefer miettiä

• 3rd person singular collective human frames prefer pohtia


29

Psycholinguistic Experiments

• Two off-line experiments

– Forced Choice

– Acceptability Rating

• Hypotheses based on the corpus-based results

– 1st person singular agents (1SG) prefer miettiä

– 3rd person collective agents (3COLL) prefer pohtia


30

XP 1: Forced choice

Materials

• 31 sentence triplets with 31 sentence frames and three different verbs for each triplet, e.g.,

– Anu Joutsasta pohti hetken ~ Anu from Joutsa thought for a moment

– Anu Joutsasta mietti hetken

– Anu Joutsasta ajatteli hetken

• The materials were constructed using (slightly edited) natural instances containing either experimental verb as the sentence frame for the other verb(s); the natural instances came from the same corpus as in the corpus-based study


31

Forced choice (cont’d)

• The two experimental verbs (pohtia vs. miettiä) and the fillers (ajatella) were presented semi-randomized within each triplet in the appropriate inflected form.

• The participants were instructed to select the most natural sentence from each triplet and check the appropriate box on the experimental sheet.

• 21 Finnish native speakers participated in the experiment
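The slides say the verbs were semi-randomized within each triplet but do not list the constraints. One plausible reading, sketched here purely as an assumption, is blocked randomization: the six possible orders of the three verbs are shuffled and used in turn, so each order occurs about equally often across the 31 triplets. The function name and seed are hypothetical.

```python
import itertools
import random

VERBS = ("pohtia", "miettiä", "ajatella")


def triplet_orders(n_triplets: int, seed: int = 1) -> list[tuple[str, ...]]:
    """Assign a presentation order of the three verbs to each triplet.

    Orders are drawn block-wise: every block contains each of the 3! = 6
    possible orders once, in random sequence, so no order is over-used.
    """
    rng = random.Random(seed)
    all_orders = list(itertools.permutations(VERBS))
    orders: list[tuple[str, ...]] = []
    while len(orders) < n_triplets:
        block = all_orders[:]
        rng.shuffle(block)
        orders.extend(block)
    return orders[:n_triplets]


for frame_no, order in enumerate(triplet_orders(31)[:3], start=1):
    print(frame_no, " / ".join(order))
```

Blocking trades a little randomness for balance: with 31 triplets, each of the six orders is used five or six times.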


32

Results: XP 1(1) (N=520)

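%        miettiä   pohtia
1sg      45.0      10.4
3sg      35.8      31.9
3coll    19.2      57.7

(Percentages are within each verb, i.e. each column sums to 100%.)

[Figure: bar chart of the percentages above for miettiä and pohtia by agent type (1sg, 3sg, 3coll); y-axis: %]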


33

Results: XP1(2)

• The overall distribution of responses differed significantly from chance (χ², p < .0001)

• The 1SG agent clearly preferred the verb miettiä (χ², p < .001)

• The 3SG-COLLECTIVE agent had a clear preference for the verb pohtia (χ², p < .001)

• There was no preference either way in the 3SG (non-collective) category (χ², n.s.)
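The slides report χ² tests without the underlying counts or expected frequencies. Assuming the comparisons were made against chance, i.e. equal expected frequencies for the alternatives, a minimal sketch of the goodness-of-fit test follows. It uses the closed-form χ² survival functions for 1 degree of freedom (two alternatives, e.g. miettiä vs. pohtia) and 2 degrees of freedom (all three verbs). The function name and the counts in the usage lines are hypothetical, not the experiment's data; `scipy.stats.chisquare` covers any number of categories.

```python
import math


def chi_square_vs_chance(counts: list[int]) -> tuple[float, float]:
    """Chi-square goodness-of-fit test against equal expected frequencies.

    Returns (statistic, p_value). The p-value uses closed forms of the
    chi-square survival function, so only 2 categories (df = 1) or
    3 categories (df = 2) are supported.
    """
    k = len(counts)
    if k not in (2, 3):
        raise ValueError("closed-form p-value implemented only for 2 or 3 categories")
    expected = sum(counts) / k
    statistic = sum((c - expected) ** 2 / expected for c in counts)
    if k == 2:
        p_value = math.erfc(math.sqrt(statistic / 2))  # df = 1
    else:
        p_value = math.exp(-statistic / 2)             # df = 2
    return statistic, p_value


# Hypothetical counts of miettiä vs. pohtia choices in one agent-type condition.
stat, p = chi_square_vs_chance([60, 20])
print(f"chi2 = {stat:.1f}, p = {p:.1e}")
```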


34

XP 2: Acceptability rating

• Sentence frames with each Agent Type (1SG, 3SG & 3COLL), 21 frames each, were used to construct the experimental sentences with both the verbs miettiä and pohtia as well as the closely related filler verb ajatella ~ think (generic)

– the sentence frames were based on natural instances extracted from the corpus used in the corpus-based study

– 1/3 of the sentences had the original verb in the corpus, 2/3 had another verb in the corresponding form

• this amounted to 63 test sentences per test subject

• 40 filler sentences were constructed with the verbs käsittää and ymmärtää ~ understand (20 + 20)


35

Acceptability rating (cont’d)

• The experimental sentences as well as the sentences with ajatella were counter-balanced over three experimental lists

• Each list included the same 40 filler sentences

• Altogether 103 sentences were presented in randomized order on three experimental sheets

• The verbs were presented in angle brackets, e.g.,

Anu Joutsasta <ajatteli> hetken
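The slides state that the sentences were counter-balanced over three lists but not how. A common way to do this is a Latin-square rotation, and the sketch below assumes one: on sheet s, frame f of every agent type gets verb number (f + s) mod 3, the same 40 fillers are added, and the order is shuffled. Frames are represented only by an index; all names are hypothetical.

```python
import random

VERBS = ("miettiä", "pohtia", "ajatella")
AGENT_TYPES = ("1SG", "3SG", "3COLL")
FRAMES_PER_TYPE = 21
FILLERS = [("FILLER", i, "käsittää") for i in range(20)] + [
    ("FILLER", i, "ymmärtää") for i in range(20)
]


def build_sheets(seed: int = 0) -> list[list[tuple[str, int, str]]]:
    """Latin-square rotation of verbs over frames, one sheet per rotation.

    On sheet s, frame f of every agent type gets VERBS[(f + s) % 3], so each
    sheet shows every frame once, and across the three sheets every frame
    occurs once with each verb.
    """
    rng = random.Random(seed)
    sheets = []
    for sheet_no in range(len(VERBS)):
        items = [
            (agent, frame, VERBS[(frame + sheet_no) % len(VERBS)])
            for agent in AGENT_TYPES
            for frame in range(FRAMES_PER_TYPE)
        ]
        items += FILLERS    # the same 40 fillers on every sheet
        rng.shuffle(items)  # random presentation order per sheet
        sheets.append(items)
    return sheets


sheets = build_sheets()
print([len(sheet) for sheet in sheets])
```

Under this rotation each sheet has 63 test items (21 per verb) plus 40 fillers, i.e. 103 sentences, matching the counts on the slides.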


36

Acceptability rating (cont’d)

• The three sheets were distributed as evenly as possible among 54 Finnish native speakers

• The participants were instructed to evaluate the acceptability of each verb in the sentence frame on a scale of 1-7 by checking the appropriate box on the sheet.


37

Mean Acceptability Scores XP2

         miettiä   pohtia
1SG      5.6       5.2
3SG      5.3       5.6
3COLL    4.5       5.4
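The slides give only the cell means and the outcomes of the significance tests. As a minimal sketch (not the authors' analysis, which the slides do not specify), the cell means and the per-agent difference between the two verbs can be computed as follows; a change in this difference across agent types is what an Agent Type × Verb interaction reflects. The inferential tests themselves are not reproduced, and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def cell_means(ratings):
    """Mean rating per (agent type, verb) cell from (agent, verb, score) triples on the 1-7 scale."""
    cells = defaultdict(list)
    for agent, verb, score in ratings:
        if not 1 <= score <= 7:
            raise ValueError(f"rating outside the 1-7 scale: {score}")
        cells[(agent, verb)].append(score)
    return {cell: mean(scores) for cell, scores in cells.items()}


def verb_difference(means, agent, first="miettiä", second="pohtia"):
    """Mean rating of `first` minus that of `second` for one agent type."""
    return means[(agent, first)] - means[(agent, second)]


# The reported cell means, used directly as input to verb_difference.
reported = {
    ("1SG", "miettiä"): 5.6, ("1SG", "pohtia"): 5.2,
    ("3SG", "miettiä"): 5.3, ("3SG", "pohtia"): 5.6,
    ("3COLL", "miettiä"): 4.5, ("3COLL", "pohtia"): 5.4,
}
for agent in ("1SG", "3SG", "3COLL"):
    print(agent, round(verb_difference(reported, agent), 1))
```

On the reported means, the miettiä minus pohtia difference moves from positive (1SG) through slightly negative (3SG) to clearly negative (3COLL).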


38

Mean Acceptability Scores XP2

[Figure: mean acceptability scores (MAS) of miettiä and pohtia by agent type (1sg, 3sg, 3coll); y-axis from 3.0 to 6.0]


39

• Significant main effect of Agent Type

• Significant interaction of Agent Type and Verb

– Agent Type was significant with miettiä but not with pohtia

• miettiä: 3COLL significantly less acceptable than either 1SG or 3SG (p<.001), no difference between 1SG and 3SG (p>.2)


40

• Within the three Agent Types:

– 1SG: miettiä significantly more acceptable than pohtia (p < .01)

– 3SG: no significant difference (p > .1)

– 3COLL: miettiä significantly less acceptable than pohtia (p < .001)


41

Discussion

• the corpus-based evidence and the psycholinguistic test results converge

• the psycholinguistic test results deepen the picture that the corpus provides and offer an explanation for the mechanism that drives the selection of either verb in a particular context/frame

– a word can be selected simply because the alternative is dispreferred: pohtia is rated about equally acceptable with all agent types, so its preference with collective agents stems from miettiä being rated lower there


42

Relationships between the different types of evidence

• the forced-choice tests reflect actual usage situations (~ performance) and thus mirror the corpus-based results

• the acceptability tests reflect general linguistic intuitions about what is considered possible and what is not (~ competence); these sound like building blocks for generative descriptions


43

Conclusions

• the two types of empirical evidence clearly show that the two near-synonymous verbs differ in usage with respect to the studied features

• combining two types of empirical linguistic evidence can be used to enhance and enrich lexical descriptions


Questions, Comments, Critique, Discussion