CoreLex: Systematic Polysemy, Underspecification and Coercion

Paul Buitelaar

Unit for Natural Language Processing

Digital Enterprise Research Institute - National University of Ireland, Galway

Copyright 2010 Digital Enterprise Research Institute. All rights reserved, Paul Buitelaar

What is this talk about?

Lexical semantics
– Analysis & representation of word meaning

A generative model of lexical semantics
– Representation of word meaning that enables dynamic creation of word meanings (‘senses’) on demand

An empirical foundation of the generative model
– Analysis of sense distribution across a large-scale semantic lexicon

An ontological view of lexical semantics
– Reasoning over the ontology enables sense derivation

Lexical Semantics
– word meaning, senses
– lexical semantic ambiguity
– systematic polysemy
– type coercion, metonymy, bridging

Word Meaning

What is the meaning of ‘ball’? (as a noun)
http://dictionary.reference.com/browse/ball

ball1

1. a spherical or approximately spherical body or shape. He rolled the piece of paper into a ball.
2. a round or roundish body, of various sizes and materials, either hollow or solid, for use in games, as baseball, football, tennis, or golf.
3. a game played with a ball, esp. baseball: The boys are out playing ball.
4. Military. a. a solid, usually spherical projectile for a cannon, rifle, pistol, etc., as distinguished from a shell. b. projectiles, esp. bullets, collectively.
5. Horticulture. a compact mass of soil covering the roots of an uprooted tree or other plant.
6. Literary. a planetary or celestial body, esp. the earth.
7. Mathematics. (in a metric space) the set of points whose distance from the zero element is less than, or less than or equal to, a specified number.

ball2

1. a large, usually lavish, formal party featuring social dancing and sometimes given for a particular purpose, as to introduce debutantes or benefit a charitable organization.
2. Informal. a thoroughly good time: Have a ball on your vacation!

Word Meaning

What is the meaning of ‘ball’? (as a noun)
http://dictionary.reference.com/browse/ball

spherical body or shape

lavish, formal party featuring social dancing

Lexical Semantic Ambiguity

Artifact: The ball went over the fence

Event: The ball went on into the late hours

unrelated senses – “homonymy”

Lexical Semantic Ambiguity

Artifact: The ball went over the fence

Event: The boys are out playing ball

related senses – “systematic polysemy”

Systematic Polysemy

Building: The Boston office has been newly decorated

Organization: The Boston office was founded in 1985

Group-of-People: The Boston office called

related senses – “systematic polysemy”

Systematic Polysemy

Referred to in the literature as:
– ‘regular polysemy’ (Apresjan 1973)
– ‘logical polysemy’ (Pustejovsky 1991, 1995)
– ‘systematic polysemy’ (Nunberg & Zaenen 1992)

Systematic Polysemy

Bierwisch, Manfred: 1983, ‘Semantische und konzeptuelle Repraesentation lexikalischer Einheiten’, in R. Ruzicka and W. Motsch (eds.), Untersuchungen zur Semantik (Studia Grammatika 22), pp. 61–99. Akademie Verlag, Berlin

A group of people

The school went for an outing

A learning process

School starts at 8:30

An institution

The school was founded in 1910

A building

The school has a new roof

Systematic Polysemy

Hobbs, J. R. (1992). Metaphor and abduction. In A. Ortony, J. Slack, O. Stock (eds.) Communication from an Artificial Intelligence Perspective: Theoretical and Applied Issues, pp. 35–58. Springer, Berlin

The Boston office called.

office:Organization coerced-into office:Group-of-People

Type Coercion & Metonymy

The Boston office called.

Coerce type of ‘office’ from Organization into Person

Metonymy – interpret a part as representing the whole

Person works-at Organization (person part-of office)
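The coercion step above can be sketched as a lookup over typed relations. This is only a minimal illustration: the table `RELATIONS` and the function `coerce` are hypothetical names, and the only fact taken from the slide is that Person works-at Organization.

```python
# Hypothetical relation table: (source type, relation, target type).
# Only "Person works-at Organization" comes from the slide.
RELATIONS = [("Person", "works-at", "Organization")]


def coerce(actual_type, required_type):
    """Return the relation that licenses reading `actual_type` as `required_type`,
    or None if this ontology fragment offers no such link."""
    if actual_type == required_type:
        return "identity"
    for source, relation, target in RELATIONS:
        # A Person who works-at an Organization can stand in for it.
        if source == required_type and target == actual_type:
            return relation
    return None


# 'call' wants a Person as subject, but 'office' is typed Organization.
print(coerce("Organization", "Person"))  # works-at
```

A failed lookup returns `None`, which a caller could treat as a genuine type clash rather than a metonymic reading.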

Coercion & Discourse Analysis

The Boston office called. They signed a new contract.

Co-reference resolution between ‘office’ and ‘they’

Coerce referent of ‘they’ to metonymic person of ‘office’

Bridging

Peter bought a car. The engine runs well.

Accommodation of ‘the engine’ to ‘a car’

Lexical semantic inference: engine part-of car
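A minimal sketch of this bridging inference, assuming the discourse is already reduced to a list of earlier nouns. The set `PART_OF` and the function `bridge` are hypothetical; only the fact "engine part-of car" comes from the slide.

```python
# Hypothetical part-of facts; only "engine part-of car" comes from the slide.
PART_OF = {("engine", "car")}


def bridge(definite_noun, previous_nouns):
    """Return the most recent earlier noun that `definite_noun` is part of, else None."""
    for antecedent in reversed(previous_nouns):
        if (definite_noun, antecedent) in PART_OF:
            return antecedent
    return None


# "Peter bought a car. The engine runs well."
print(bridge("engine", ["Peter", "car"]))  # car
```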

Underspecified Discourse Referents

A long book heavily weighted with military technicalities, in this edition it is neither so long nor so technical as it was originally.

[A long book heavily weighted with military technicalities]NP:event-physical_object-content

Event: “a long book...”, “it is neither so long...”
– takes long to read, not physical length

Physical-object: “heavily weighted...”
– the physical weight of the book

Content: “military technicalities...”, “nor so technical...”
– the content is technical

Generative Lexical Semantics

‘Generative Lexicon Theory’

‘Qualia Structure’

Type Coercion

I began the book

Type coercion: direct-object of ‘begin’ requires an event

Infer an event from the lexical semantics of “book” as represented by its ‘Qualia Structure’ (Pustejovsky 1995)

For example: I began (reading) the book

“there is a system of relations that characterizes the semantics of nominals very much like the argument structure of a verb … Essentially the qualia structure of a noun determines its meaning as much as the list of arguments determines a verb’s meaning.” (Pustejovsky 1989)

Qualia Structure for ‘book’

Formal (inheritance: is-a, hyponymy)
– physical-object, content, …

Constitutive (modification: part-of, meronymy)
– section, …

Telic (purpose: ‘what is the object used for’)
– read, …

Agentive (causality: ‘how did the object originate’)
– write, …
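The four qualia roles for ‘book’ can be written down as a small record, and the coercion in “I began the book” then amounts to reading an event off the telic and agentive roles. This is a sketch under assumptions: the class `Qualia` and the function `event_readings` are hypothetical names, and the paraphrase uses an infinitive (“to read”) only to avoid handling English morphology.

```python
from dataclasses import dataclass, field


@dataclass
class Qualia:
    """The four qualia roles, each holding a list of type or predicate names."""
    formal: list = field(default_factory=list)
    constitutive: list = field(default_factory=list)
    telic: list = field(default_factory=list)
    agentive: list = field(default_factory=list)


# Values for 'book' as listed on the slide.
BOOK = Qualia(
    formal=["physical-object", "content"],
    constitutive=["section"],
    telic=["read"],
    agentive=["write"],
)


def event_readings(verb, noun, qualia):
    """Coerce `noun` into an event by inserting its telic and agentive predicates."""
    predicates = qualia.telic + qualia.agentive
    return [f"{verb} to {predicate} the {noun}" for predicate in predicates]


print(event_readings("began", "book", BOOK))
# ['began to read the book', 'began to write the book']
```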

Qualia Structure for ‘book’

[Diagram: qualia structure of „book“. Formal: phys-obj, content. Constitutive: section. Telic: read. Agentive: write.]

Problematic Issues - Formal

[Diagram: qualia structure of „book“. Constitutive: section. Agentive: write. Telic: read. Formal: content, phys-obj.]

Problematic Issues - Constitutive

[Diagram: qualia structure of „book“. Constitutive: section, …, …. Agentive: write. Telic: read. Formal: content, phys-obj.]

Problematic Issues – Telic/Agentive

[Diagram: qualia structure of „book“. Constitutive: section. Agentive: write. Telic: read. An added ‘event’ node is marked ‘? Formal’. Formal: content, phys-obj.]

Two Approaches

Treat QS as a ‘condensed ontology’
– QS provides a gateway in meaning potential
– QS roles as ‘shortcuts’ for ontology inference paths

Condense QS even further into a ‘complex class’
– Aggregate all types that can be reached through the QS (ontology) into a ‘systematic polysemous class’
– Each systematic polysemous class introduces a set of underspecified lexical semantic objects
– CoreLex approach (‘sense clustering’)

QS as ‘Condensed Ontology’

[Diagram: book linked by isa to communication and phys-obj; both links labelled Formal]

QS as ‘Condensed Ontology’

[Diagram, Constitutive: nodes book, content, cover, pages, lining, title, chapter, index, communication and phys-object; links labelled hasPart and isa]

QS as ‘Condensed Ontology’

Agentive & Telic

[Diagram: nodes book, communication, phys-obj, reading, writing, printing and event; links labelled isa, isUsedFor, hasCreationProcess and hasProductionProcess]

QS as ‘Condensed Ontology’

Agentive & Telic

[Diagram: nodes book, communication, phys-obj, reading, writing, printing and event; links labelled isa, isUsedFor, hasCreationProcess and hasProductionProcess]

“They printed some very interesting books.”

QS as ‘Complex Class’

[Diagram: qualia structure of „book“. Constitutive: cover, section. Agentive: write. Telic: read. Formal: phys-obj, communication, event, combined as phys-obj_communication_event.]

QS as ‘Complex Class’

[Diagram: qualia structure of „book“. Constitutive: cover, section. Agentive: write. Telic: read. Formal: phys-obj_communication_event.]
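Aggregating every type reachable from a word into one complex class can be sketched as a graph traversal. The graph `ONTOLOGY` below is a hypothetical fragment that borrows node and relation names from the ‘book’ diagrams; the exact edges are an assumption. The label is sorted alphabetically, so it comes out as `communication_event_phys-obj` rather than in the order used on the slide.

```python
# Hypothetical ontology fragment for 'book'; node and relation names follow
# the diagrams, but which node each relation connects is an assumption.
ONTOLOGY = {
    "book": [
        ("isa", "phys-obj"),
        ("isa", "communication"),
        ("isUsedFor", "reading"),
        ("hasCreationProcess", "writing"),
    ],
    "reading": [("isa", "event")],
    "writing": [("isa", "event")],
}
BASIC_TYPES = {"phys-obj", "communication", "event"}


def complex_class(word):
    """Collect every basic type reachable from `word` and join them into one label."""
    seen, stack, found = set(), [word], set()
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        if node in BASIC_TYPES:
            found.add(node)
        stack.extend(target for _, target in ONTOLOGY.get(node, []))
    return "_".join(sorted(found))


print(complex_class("book"))  # communication_event_phys-obj
```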

Empirical foundation of generative model
– CoreLex, WordNet
– Systematic Polysemous Classes
– Basic Types

CoreLex (PhD thesis, Buitelaar 1998)

Automatic Qualia Structure Acquisition
– CoreLex is an attempt to automatically acquire underspecified lexical semantic representations that reflect systematic polysemy
– These representations can be viewed as shallow Qualia Structures

Sense Distribution in WordNet
– Systematic polysemy can be empirically studied in WordNet by observing sense distributions
– If more than two words share the same sense distribution (i.e. have the same set of senses), then this may indicate a pattern of systematic polysemy (adapted from Apresjan 1973)
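The sense-distribution criterion can be sketched as grouping words by their set of basic types and keeping the groups shared by more than two words. The toy table `SENSE_TYPES` uses words and classes that appear elsewhere in the slides, but the function `polysemous_classes` and its `min_words` parameter are hypothetical, and this is not claimed to be the procedure used to build CoreLex.

```python
from collections import defaultdict

# Toy input: each noun mapped to the set of basic types its senses fall under.
SENSE_TYPES = {
    "book": {"artifact", "communication"},
    "novel": {"artifact", "communication"},
    "pamphlet": {"artifact", "communication"},
    "bat": {"action", "animal", "artifact"},
}


def polysemous_classes(sense_types, min_words=3):
    """Group words by identical sense distribution.

    Only groups with at least `min_words` members are kept,
    i.e. "more than two words" when min_words is 3.
    """
    groups = defaultdict(list)
    for word, types in sense_types.items():
        groups[" ".join(sorted(types))].append(word)
    return {cls: sorted(words) for cls, words in groups.items() if len(words) >= min_words}


print(polysemous_classes(SENSE_TYPES))
# {'artifact communication': ['book', 'novel', 'pamphlet']}
```

The class shared by only one word (‘bat’) is dropped, which is the intended effect of the threshold.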

WordNet

Lexical Semantic Resource

Semantic Lexicon
– Maps words to meanings (senses)

Lexical Database
– Machine readable (has a formal structure)

Freely available
– http://wordnet.princeton.edu/

WordNet - Origins

“In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database … The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically … WordNet instantiates hypotheses based on results of psycholinguistic research … expose such hypotheses to the full range of the common vocabulary”

In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter ‘‘apple,’’ even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided. (Caramazza/Berndt 1978)

Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. Introduction to WordNet: an on-line lexical database. In: International Journal of Lexicography 3 (4), 1990, pp. 235–244.

WordNet Synsets

– WordNet is organized around word meaning (not word forms as with traditional lexicons)
– Word meaning is represented by “synsets”
– Synset is a “Set of Synonyms”

Example

{board, plank}
– Piece of lumber

{board, committee}
– Group of people

Synsets and Senses

– Synsets represent word meaning
– Words that occur in several synsets have a corresponding number of meanings (senses)
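The relation between synsets and senses can be sketched with plain sets: a word has one sense for every synset it belongs to. The list `SYNSETS` holds the two synsets and glosses from the example slide; the function `senses` is a hypothetical helper, not a WordNet API.

```python
# The two synsets from the example slide, with their glosses.
SYNSETS = [
    ({"board", "plank"}, "piece of lumber"),
    ({"board", "committee"}, "group of people"),
]


def senses(word):
    """Return the gloss of every synset that contains `word`."""
    return [gloss for members, gloss in SYNSETS if word in members]


print(senses("board"))  # ['piece of lumber', 'group of people']
print(senses("plank"))  # ['piece of lumber']
```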

Synset Hierarchy

Synsets are organized in hierarchies

Defines:
– generalization (hypernymy)
– specialization (hyponymy)

Example, from most general to most specific (arrows labelled hypernymy and hyponymy)

{entity}
{whole, unit}
{building material}
{lumber, timber}
{board, plank}
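Walking up the hierarchy can be sketched as following hypernym links until none is left. The table `HYPERNYM` treats the five synsets on the slide as a direct chain, which is an assumption: the actual WordNet hierarchy may contain intermediate synsets that the slide omits. The function `hypernym_path` is a hypothetical helper.

```python
# Child-to-parent (hypernym) links, assuming the slide's five synsets
# form a direct chain.
HYPERNYM = {
    "board, plank": "lumber, timber",
    "lumber, timber": "building material",
    "building material": "whole, unit",
    "whole, unit": "entity",
}


def hypernym_path(synset):
    """Follow hypernym links upward until a synset without a parent is reached."""
    path = [synset]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path


print(hypernym_path("board, plank"))
# ['board, plank', 'lumber, timber', 'building material', 'whole, unit', 'entity']
```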

Hierarchy Example (WordNet 2.1)

From WordNet to CoreLex

[Diagram: nouns (Noun1 … Nounn) are mapped to basic types, which are combined into systematic polysemous classes (Systematic Polysemous Class1 … Systematic Polysemous Classn)]

From Synsets to Basic Types

book
1. {publication} => artifact
2. {product, production} => artifact
3. {fact} => communication
4. {dramatic_composition, dramatic_work} => communication
5. {record} => communication
6. {section, subdivision} => communication
7. {journal} => artifact
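Collapsing the seven senses of ‘book’ into a class label can be sketched as taking the set of distinct basic types. The list `BOOK_SENSES` copies the synset-to-type pairs from the slide; the function `class_label` and the choice to sort the types alphabetically are assumptions made for the illustration.

```python
# Senses of 'book' and the basic type each one maps to, as listed on the slide.
BOOK_SENSES = [
    ("publication", "artifact"),
    ("product, production", "artifact"),
    ("fact", "communication"),
    ("dramatic_composition, dramatic_work", "communication"),
    ("record", "communication"),
    ("section, subdivision", "communication"),
    ("journal", "artifact"),
]


def class_label(senses):
    """Collapse a word's senses into the sorted set of distinct basic types."""
    return " ".join(sorted({basic_type for _, basic_type in senses}))


print(class_label(BOOK_SENSES))  # artifact communication
```

Seven fine-grained senses reduce to the single underspecified label “artifact communication”.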

… to Systematic Polysemous Classes

“artifact communication”

amulet annals armband arrow article ballad bauble beacon bible birdcall blank blinker boilerplate book bunk cachet canto catalog catalogue chart chevron clout compact compendium convertible copperplate copy cordon corker ... guillotine homophony horoscope indicator journal laurels lay ledger loophole marker memorial nonsense novel obbligato obelisk obligato overture pamphlet pastoral paternoster pedal pennant phrase platform portrait prescription print puzzle radiogram rasp recap riddle rondeau … statement stave stripe talisman taw text tocsin token transcription trophy trumpery wand well whistle wire wrapper yardstick

Other Examples

animal natural_object
– alligator broadtail chamois ermine leopard muskrat ...

natural_object plant
– algarroba almond anise baneberry butternut candlenut ...

action artifact group_social
– artillery assembly band dance gathering institution ...

action attribute event psychological
– appearance decision deviation impulse outrage …

possession quantity_definite
– cent centime dividend gross penny real shilling

Problematic Cases

Non-Systematic Classes

action animal artifact
– bat drill fly hobby ruff solitaire spat

Partly-Systematic Classes

action geographical_location
– bolivia caliphate charleston chicago clearing emirate michigan prefecture repair ...
– Systematic: clearing, repair, wheeling
– Homonyms: bolivia, charleston, chicago, michigan
– ??: caliphate, emirate, prefecture

Basic Types

8283  art  artifact artefact
6606  act  act human action human activity
6303  hum  person individual someone mortal human soul
4933  grb  biological group
4137  atr  attribute
3456  psy  psychological feature
3336  com  communication
2703  anm  animal animate being beast brute creature fauna
2311  plt  plant flora plant life
2266  sta  state
1541  fod  food nutrient
1282  log  region geographical location
1277  nat  natural object water land
1189  sub  substance matter
1082  evt  event
 992  prt  part piece
 940  grs  social group people
 777  qud  definite quantity
 773  pro  process
 699  chm  compound chemical compound chemical element element
 628  tme  time period period period of time amount of time time unit unit of time time
 624  agt  causal agent cause causal agency
 571  pos  possession
 567  loc  location any other location
 506  rel  relation
 420  frm  shape form
 345  grp  group grouping any other group
 342  phm  phenomenon
 295  qui  indefinite quantity
 186  pho  object inanimate object physical object
 178  mic  microorganism
 100  lme  linear measure long measure
  61  lfr  life form organism being living thing
  57  cel  cell
  38  mea  measure quantity amount quantum
  28  ent  entity
  21  con  consequence effect outcome result upshot
  21  spc  space
   8  abs  abstraction

Systematic Polysemous Classes

acp  act attribute process psychological-feature state
acr  act attribute event relation state
acs  act state
aes  act event state
aev  act event
age  act causal agent
agh  causal agent human
agl  causal agent location
age  causal agent animal
agp  causal agent psychological-feature
agt  causal agent
anf  animal food
ann  animal artifact natural-object
anp  animal psychological-feature
aqu  artifact quantity-definite quantity-indefinite
ara  artifact attribute psychological-feature state
arg  artifact group
arh  artifact human
arp  artifact psychological-feature state
art  artifact state
atc  attribute communication phenomenon psychological-feature state
…

CoreLex vs. WordNet

animal artifact natural-object

act state substance contamination dirt; dilution emanation infusion kindling lick packing rime rinse; alloy carbuncle impurity plasma soil

CoreLex Semantic Lexicon

CoreLex is available from http://www.cs.brandeis.edu/~paulb/CoreLex/corelex.html

Provides a coarse-grained semantic lexicon
– Covers around 40,000 nouns, assigned to 126 underspecified semantic classes

Allows for coarse-grained semantic tagging
– 126 underspecified semantic tags vs. 60,000 synset-based senses
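Coarse-grained semantic tagging can be sketched as a dictionary lookup that attaches one underspecified class to each noun instead of choosing a single sense. The dictionary `LEXICON` is a tiny hypothetical excerpt: its class memberships mirror examples from the slides, but the data format, the function `tag` and the `"UNK"` fallback are assumptions and do not reflect the distributed CoreLex files.

```python
# Tiny hypothetical excerpt of a noun-to-class lexicon.
LEXICON = {
    "book": "artifact communication",
    "novel": "artifact communication",
    "leopard": "animal natural_object",
    "almond": "natural_object plant",
}


def tag(nouns, lexicon=LEXICON, unknown="UNK"):
    """Attach one underspecified class tag to each noun; unknown nouns get `unknown`."""
    return [(noun, lexicon.get(noun, unknown)) for noun in nouns]


print(tag(["book", "leopard", "sofa"]))
# [('book', 'artifact communication'), ('leopard', 'animal natural_object'), ('sofa', 'UNK')]
```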

Ontologies and lexical semantics
– ontology-driven sense derivation
– ontological/semantic & linguistic/lexical structure
– integration of ontologies and lexicons
– ‘lemon’: lexicon model for ontologies

Ontology-driven sense derivation

[Diagram: the word „book“, the concept book, and the classes communication and phys-obj, with links labelled isa]


Ontology-driven sense derivation

[Diagram: the word „book“ and the concept book, with nodes communication, phys-obj, reading, writing, printing and event; links labelled isa, isUsedFor, hasCreationProcess and hasProductionProcess]



Ontology-driven sense derivation

[Diagram: the word „office“ and the nodes office, organization, building and person; links labelled has-address, located-at, representation-of, works-at and works-for]

Coercion (Metonymy)

“The Boston office called. They asked for a new price.”
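Deriving the person reading of ‘office’ from the ontology can be sketched as a shortest-path search over relations. The slide names the nodes and relations, but the wiring of the diagram is not recoverable from the transcript, so the graph `EDGES` below is a guess; the function `coercion_path` and the `^-1` notation for an inverse relation are likewise hypothetical.

```python
from collections import deque

# Assumed edges: node and relation names come from the slide,
# but which nodes each relation connects is a guess.
EDGES = {
    "office": [("representation-of", "organization"), ("located-at", "building")],
    "person": [("works-for", "organization"), ("works-at", "office")],
}


def coercion_path(start, goal):
    """Breadth-first search for the shortest relation path, following edges in either direction."""
    neighbours = {}
    for source, links in EDGES.items():
        for relation, target in links:
            neighbours.setdefault(source, []).append((relation, target))
            # Record the inverse so the search can walk an edge backwards.
            neighbours.setdefault(target, []).append((relation + "^-1", source))
    queue, seen = deque([(start, [])]), {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [relation]))
    return None


# 'call' needs a person; 'office' reaches one through the inverse of works-at.
print(coercion_path("office", "person"))  # ['works-at^-1']
```

With these assumed edges the shortest path is the inverse of works-at, which matches the earlier reading that a person who works at the office is the one who called.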


Mapping Lexical to Semantic Structure

“The Boston office called.”

[Diagram: lexical entry Term-1 (hasOrthographicForm: call; hasLang: EN; hasMorphSynInfo: WordForm-1; hasPoS: V; hasArg: Arg-1, Arg-2; hasGramFunc: SUBJ; hasPhraseType: NP) mapped to the ontology concepts CALL, PERSON, ORGANIZATION and OFFICE (links labelled hasAgent, worksFor and isa)]
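The link between the lexical entry and the ontology can be sketched with two small records. Field names paraphrase the property labels in the diagram; the classes `Arg` and `Term`, the table `PROPERTY_RANGE` (which assumes that hasAgent expects a PERSON) and the function `expected_type` are all hypothetical, and none of this is the lemon model's actual vocabulary or API.

```python
from dataclasses import dataclass


@dataclass
class Arg:
    gram_func: str      # hasGramFunc, e.g. "SUBJ"
    phrase_type: str    # hasPhraseType, e.g. "NP"
    semantic_role: str  # ontology property this argument fills (assumed link)


@dataclass
class Term:
    orthographic_form: str  # hasOrthographicForm
    lang: str               # hasLang
    pos: str                # hasPoS
    args: list              # hasArg
    concept: str            # ontology class the term is linked to


CALL = Term("call", "EN", "V", [Arg("SUBJ", "NP", "hasAgent")], "CALL")

# Assumed range of each ontology property.
PROPERTY_RANGE = {"hasAgent": "PERSON"}


def expected_type(term, gram_func):
    """Look up which ontology class the argument with `gram_func` must denote."""
    for arg in term.args:
        if arg.gram_func == gram_func:
            return PROPERTY_RANGE[arg.semantic_role]
    return None


# The subject of 'call' must be a PERSON, so 'office' has to be coerced.
print(expected_type(CALL, "SUBJ"))  # PERSON
```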

Mapping Lexical to Semantic Structure

[Diagram: the lexical entry Term-1 for ‘call’ mapped to the ontology concepts CALL, PERSON, ORGANIZATION and OFFICE, annotated ‘Type Coercion’]

Mapping Lexical to Semantic Structure

“The Boston office called.”

[Diagram: the lexical entry Term-1 for ‘call’ mapped to the ontology concepts CALL, PERSON, ORGANIZATION and OFFICE, annotated ‘Type Coercion’ (Hobbs: ‘abductive reasoning’)]

Mapping Lexical to Semantic Structure

“The Boston office called.”

[Diagram: the lexical entry for ‘call’ (Term-1, WordForm-1, V, Arg-1, Arg-2, SUBJ, NP) attached through a LingInfo node (links labelled instanceOf and hasLingInfo) to the ontology concepts CALL, PERSON, ORGANIZATION and SCHOOL (links labelled hasAgent, worksFor and isa)]

Connect Ontological and Lexical Structure

Lexicon Model for Ontologies

Acknowledgements & Further Info

CoreLex

http://pages.cs.brandeis.edu/~paulb/CoreLex/corelex.html

lemon (http://lexinfo.net/)

http://greententacle.techfak.uni-bielefeld.de/drupal/sites/default/files/ontologies/lemon.owl

http://greententacle.techfak.uni-bielefeld.de/drupal/sites/default/files/lemon-cookbook.pdf

Grant support

Science Foundation Ireland Grant No. SFI/08/CE/I1380 for Lion-2 http://nlp.deri.ie/

EU FP7 Grant No. 248458 for the Monnet project on Multilingual Ontologies for Networked Knowledge http://www.monnet-project.eu