Natural Language Processing (CSEP 517): Predicate-Argument and Compositional Semantics

Noah Smith
© 2017
University of Washington
nasmith@cs.washington.edu

May 8, 2017

To-Do List

- Online quiz: due Sunday
- Readings: Jurafsky and Martin (2016); Jurafsky and Martin (2008, ch. 18); Steedman (1996)
- A4 due May 14 (Sunday)

Semantics vs. Syntax

Syntactic theories and representations focus on the question of which strings in V† are in the language.

Semantics is about understanding what a string in V† means.

Sidestepping a lengthy and philosophical discussion of what "meaning" is, we'll consider two meaning representations:

- Predicate-argument structures, also known as event frames
- Truth conditions represented in first-order logic

Motivating Example: Who did What to Who(m)?

- Warren bought the stock.
- They sold the stock to Warren.
- The stock was bought by Warren.
- The purchase of the stock by Warren surprised no one.
- Warren's stock purchase surprised no one.

In this buying/purchasing event/situation, Warren played the role of the buyer, and there was some stock that played the role of the thing purchased.

Also, there was presumably a seller, only mentioned in one example.

In some examples, a separate "event" involving surprise did not occur.

Semantic Roles: Breaking

- Jesse broke the window.
- The window broke. (?)
- Jesse is always breaking things.
- The broken window testified to Jesse's malfeasance.

A breaking event has a Breaker and a Breakee.

Semantic Roles: Eating

- Eat! (you, listener) (?)
- We ate dinner.
- We already ate. (?)
- The pies were eaten up quickly. (?)
- Our gluttony was complete. (?)

An eating event has an Eater and Food, neither of which needs to be mentioned explicitly.

Abstraction?

Breaker ≟ Eater

Both are actors that have some causal responsibility for changes in the world around them.

Breakee ≟ Food

Both are greatly affected by the event, which "happened to" them.

Thematic Roles
(Jurafsky and Martin, 2016, with modifications)

- Agent: The waiter spilled the soup.
- Experiencer: John has a headache.
- Force: The wind blows debris from the mall into our yards.
- Theme: Jesse broke the window.
- Result: The city built a regulation-size baseball diamond.
- Content: Mona asked, "You met Mary Ann at a supermarket?"
- Instrument: He poached catfish, stunning them with a shocking device.
- Beneficiary: Ann Callahan makes hotel reservations for her boss.
- Source: I flew in from Boston.
- Goal: I drove to Portland.

Verb Alternation Examples: Breaking and Giving

Breaking:

- Agent/subject; Theme/object; Instrument/PP[with]
- Instrument/subject; Theme/object
- Theme/subject

Giving:

- Agent/subject; Goal/object; Theme/second-object
- Agent/subject; Theme/object; Goal/PP[to]

Levin (1993) codified English verbs into classes that share patterns (e.g., verbs of throwing: throw/kick/pass).

Remarks

- Fillmore (1968), among others, argued for semantic roles in linguistics.
- By now, it should be clear that the expressiveness of NL (at least English) makes semantic analysis rather distinct from syntax.
- General challenges to analyzing semantic roles:
  - What are the predicates/events/frames/situations?
  - What are the roles/participants for each one?
  - What algorithms can accurately identify and label all of them?

Semantic Role Labeling

Input: a sentence x

Output: a collection of predicates, each consisting of:

- a label, sometimes called the frame
- a span
- a set of arguments, each consisting of:
  - a label, usually called the role
  - a span

In principle, spans might have gaps, though most conventions disallow them.
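To make that output structure concrete, here is a minimal sketch in Python; the class names, field names, and token-index spans are illustrative assumptions, not part of any shared-task format.

from dataclasses import dataclass, field

@dataclass
class Argument:
    role: str                  # e.g., "arg1" (PropBank) or "Item" (FrameNet)
    span: tuple[int, int]      # token indices [start, end), assumed convention

@dataclass
class Predicate:
    frame: str                 # e.g., "fall.01"
    span: tuple[int, int]
    arguments: list[Argument] = field(default_factory=list)

# "Sales fell to $251.2 million from $278.8 million." (one predicate, fall.01;
# the exact token indices depend on tokenization and are illustrative)
fell = Predicate(frame="fall.01", span=(1, 2), arguments=[
    Argument("arg1", (0, 1)),   # Sales
    Argument("arg4", (2, 6)),   # to $251.2 million
    Argument("arg3", (6, 10)),  # from $278.8 million
])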

The Importance of Lexicons

Like syntax, any annotated dataset is the product of extensive development of conventions.

Many conventions are specific to particular words, and this information is codified in structured objects called lexicons.

You should think of every semantically annotated dataset as both the data and the lexicon.

We consider two examples.

PropBank
(Palmer et al., 2005)

- Frames are verb senses (later extended, though).
- The lexicon maps verb-sense-specific roles onto a small set of abstract roles (e.g., Arg0, Arg1, etc.).
- Annotated on top of the Penn Treebank, so that arguments are always constituents.

fall.01 (move downward)

- arg1: logical subject, patient, thing falling
- arg2: extent, amount fallen
- arg3: starting point
- arg4: ending point
- argM-loc: medium

Examples:

- Sales fell to $251.2 million from $278.8 million.
- The average junk bond fell by 4.2%.
- The meteor fell through the atmosphere, crashing into Palo Alto.

fall.08 (fall back, rely on in emergency)

- arg0: thing falling back
- arg1: thing fallen back on

Example: World Bank president Paul Wolfowitz has fallen back on his last resort.

fall.10 (fall for a trick; be fooled by)

- arg1: the fool
- arg2: the trick

Example: Many people keep falling for the idea that lowering taxes on the rich benefits everyone.

FrameNet
(Baker et al., 1998)

- Frames can be evoked by any content word (verb, noun, adjective, adverb).
- About 1,000 frames, each with its own roles.
- Both frames and roles are hierarchically organized.
- Annotated without syntax, so that arguments can be anything.

https://framenet.icsi.berkeley.edu

change position on a scale

- Item: entity that has a position on the scale
- Attribute: scalar property that the Item possesses
- Difference: distance by which an Item changes its position
- Final state: Item's state after the change
- Final value: position on the scale where the Item ends up
- Initial state: Item's state before the change
- Initial value: position on the scale from which the Item moves
- Value range: portion of the scale along which values of the Attribute fluctuate
- Duration: length of time over which the change occurs
- Speed: rate of change of the value
- Group: the group in which an Item changes the value of an Attribute

FrameNet Example

[Attacks on civilians]_Item decreased [over the last four months]_Duration
(frame: change position on a scale)

The Attribute is left unfilled but is understood from context (i.e., "frequency").

change position on a scale

Verbs: advance, climb, decline, decrease, diminish, dip, double, drop, dwindle, edge, explode, fall, fluctuate, gain, grow, increase, jump, move, mushroom, plummet, reach, rise, rocket, shift, skyrocket, slide, soar, swell, swing, triple, tumble

Nouns: decline, decrease, escalation, explosion, fall, fluctuation, gain, growth, hike, increase, rise, shift, tumble

Adverb: increasingly
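For hands-on browsing, NLTK ships a FrameNet reader; the sketch below assumes the framenet_v17 corpus has been downloaded, and the attribute names follow NLTK's documented reader interface.

# Browsing the frame with NLTK's FrameNet reader (a sketch; requires
# nltk.download('framenet_v17') first).
from nltk.corpus import framenet as fn

f = fn.frame('Change_position_on_a_scale')
print(sorted(f.FE.keys()))       # frame elements: Attribute, Difference, Item, ...
print(sorted(f.lexUnit.keys()))  # lexical units: 'advance.v', ..., 'increasingly.adv'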

change position on a scale (in the frame hierarchy)

The frame sits in a hierarchy under the abstract frame "event"; related frames shown on the slide include "birth scenario", "change of temperature", "proliferating in number", and "waking up".

("birth scenario" also inherits from "sexual reproduction scenario".)

Semantic Role Labeling Tasks

The paper that started it all: Gildea and Jurafsky (2002) used the FrameNet lexicon (which includes prototypes, not really a corpus).

- When FrameNet started releasing corpora, the task was reformulated. Example open-source system: SEMAFOR (Das et al., 2014).

The PropBank corpus is used directly for training/testing.

Conference on Computational Natural Language Learning (CoNLL) shared tasks in 2004, 2005, 2008, and 2009 were all PropBank-based.

- In 2008 and 2009, the task was cast as a kind of dependency parsing.
- In 2009, seven languages were included in the task.

Methods

The task boils down to labeling spans (with frames and roles).

It's mostly about features.

Example: Path Features

Consider the parse:

(S (NP-SBJ (DT The) (NNP San) (NNP Francisco) (NNP Examiner))
   (VP (VBD issued)
       (NP (DT a) (JJ special) (NN edition))
       (PP-TMP (IN around) (NN noon))
       (NP-TMP (NN yesterday))))

- Path from NP-SBJ "The San Francisco Examiner" to issued: NP↑S↓VP↓VBD
- Path from NP "a special edition" to issued: NP↑VP↓VBD
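A minimal sketch of computing such path features over an NLTK tree; the helper names and the function-tag stripping are choices made here, not taken from Gildea and Jurafsky's system.

# Path features over an nltk.Tree: up from the argument to the lowest
# common ancestor, then down to the predicate.
from nltk import Tree

parse = Tree.fromstring(
    '(S (NP-SBJ (DT The) (NNP San) (NNP Francisco) (NNP Examiner)) '
    '(VP (VBD issued) (NP (DT a) (JJ special) (NN edition)) '
    '(PP-TMP (IN around) (NN noon)) (NP-TMP (NN yesterday))))')

def label(tree, pos):
    # strip function tags like -SBJ/-TMP so paths match the slide's notation
    return tree[pos].label().split('-')[0]

def path_feature(tree, arg_pos, pred_pos):
    """arg_pos and pred_pos are nltk tree-position tuples; the lowest common
    ancestor is the longest common prefix of the two positions."""
    k = 0
    while k < min(len(arg_pos), len(pred_pos)) and arg_pos[k] == pred_pos[k]:
        k += 1
    up = [label(tree, arg_pos[:i]) for i in range(len(arg_pos), k - 1, -1)]
    down = [label(tree, pred_pos[:i]) for i in range(k + 1, len(pred_pos) + 1)]
    return '↑'.join(up) + '↓' + '↓'.join(down)

print(path_feature(parse, (0,), (1, 0)))    # NP↑S↓VP↓VBD
print(path_feature(parse, (1, 1), (1, 0)))  # NP↑VP↓VBD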

Methods: Beyond Features

The span-labeling decisions interact a lot!

- Presence of a frame increases the expectation of certain roles.
- Roles for the same predicate shouldn't overlap.
- Some roles are mutually exclusive or require each other (e.g., "resemble").

Ensuring well-formed outputs:

- Using syntax as a scaffold allows efficient prediction; you're essentially labeling the parse tree (Toutanova et al., 2008).
- Others have formulated the problem as constrained, discrete optimization (Punyakanok et al., 2008).
- Also greedy methods (Björkelund et al., 2010) and joint methods for syntactic and semantic dependencies (Henderson et al., 2013).

Current work:

- Some recent attempts to merge FrameNet and PropBank have shown promise (FitzGerald et al., 2015; Kshirsagar et al., 2015).
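To make the non-overlap constraint concrete, here is a tiny illustrative sketch, not any cited system's method: greedily keep the highest-scoring argument spans that don't collide.

# Illustrative only: enforcing "roles for the same predicate shouldn't
# overlap" by greedy selection over scored candidate spans.
def greedy_arguments(scored_spans):
    """scored_spans: iterable of ((start, end), role, score) candidates."""
    chosen, taken = [], set()
    for (start, end), role, score in sorted(scored_spans, key=lambda t: -t[2]):
        if score <= 0 or any(i in taken for i in range(start, end)):
            continue  # drop low-scoring or overlapping candidates
        chosen.append(((start, end), role))
        taken.update(range(start, end))
    return chosen

print(greedy_arguments([((0, 2), "arg0", 0.9),
                        ((1, 3), "arg1", 0.8),   # overlaps the winner; dropped
                        ((3, 5), "arg1", 0.7)]))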

Related Problems in "Relational" Semantics

- Coreference resolution: which mentions (within or across texts) refer to the same entity or event?
- Entity linking: ground such mentions in a structured knowledge base (e.g., Wikipedia).
- Relation extraction: characterize the relation among specific mentions.

Information extraction: transform text into a structured knowledge representation.

- Classical IE starts with a predefined schema.
- "Open" IE includes the automatic construction of the schema; see http://ai.cs.washington.edu/projects/open-information-extraction

General Remarks

Criticisms of semantic role labeling:

- Semantic roles are just "syntax++" since they don't allow much in the way of reasoning (e.g., question answering).
- Lexicon building is slow and requires expensive expertise. Can we do this for every (sub)language?

We've now had a taste of two branches of semantics:

- Lexical semantics (e.g., supersense tagging)
- Relational semantics (e.g., semantic role labeling)

Next up, a third:

- Compositional semantics

Bridging the Gap between Language and the World

In order to link NL to a knowledge base, we might want to design a formal way to represent meaning. Desiderata for a meaning representation language:

- represent the state of the world, i.e., a knowledge base
- query the knowledge base (e.g., verify that a statement is true, or answer a question)
- handle ambiguity, vagueness, and non-canonical forms
  - "I wanna eat someplace that's close to UW"
  - "something not too spicy"
- support inference and reasoning
  - "can Karen eat at Schultzy's?"

Eventually (but not today):

- deal with non-literal meanings
- expressiveness across a wide range of subject matter

A (Tiny) World Model

- Domain: Adrian, Brook, Chris, Donald, Schultzy's Sausage, Din Tai Fung, Banana Leaf, American, Chinese, Thai
  a, b, c, d, ss, dtf, bl, am, ch, th
- Properties: Din Tai Fung has a long wait; Schultzy's is noisy; Adrian, Brook, and Chris are human
  Longwait = {dtf}, Noisy = {ss}, Human = {a, b, c}
- Relations: Schultzy's serves American, Din Tai Fung serves Chinese, and Banana Leaf serves Thai
  Serves = {(ss, am), (dtf, ch), (bl, th)}, Likes = {(a, ss), (a, dtf), ...}

Simple questions are easy:

- Is Schultzy's noisy?
- Does Din Tai Fung serve Thai?
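A minimal sketch of this model as Python sets, under which the simple questions reduce to membership tests:

# The tiny world model as plain sets; constant names follow the slide.
Longwait = {"dtf"}
Noisy = {"ss"}
Human = {"a", "b", "c"}
Serves = {("ss", "am"), ("dtf", "ch"), ("bl", "th")}

print("ss" in Noisy)            # Is Schultzy's noisy?          -> True
print(("dtf", "th") in Serves)  # Does Din Tai Fung serve Thai? -> False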

A Quick Tour of First-Order Logic

- Term: a constant (ss) or a variable
- Formula: defined inductively...
  - If R is an n-ary relation and t1, ..., tn are terms, then R(t1, ..., tn) is a formula.
  - If φ is a formula, then its negation, ¬φ, is a formula.
  - If φ and ψ are formulas, then binary logical connectives can be used to create formulas: φ ∧ ψ, φ ∨ ψ, φ ⇒ ψ, φ ⊕ ψ.
  - If φ is a formula and v is a variable, then quantifiers can be used to create formulas:
    - Universal quantifier: ∀v, φ
    - Existential quantifier: ∃v, φ

Note: Leaving out functions, because we don't need them in a single lecture on FOL for NL.
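The inductive definition maps directly onto a recursive evaluator. Below is a minimal model-checking sketch over the tiny world; the tuple encoding of formulas and the partial Likes relation are assumptions made here, not part of the lecture.

# Model checking FOL over the tiny world.
DOMAIN = {"a", "b", "c", "d", "ss", "dtf", "bl", "am", "ch", "th"}
RELS = {
    "Noisy": {("ss",)},
    "Human": {("a",), ("b",), ("c",)},
    "Serves": {("ss", "am"), ("dtf", "ch"), ("bl", "th")},
    "Likes": {("a", "ss"), ("a", "dtf")},  # partial, per the "..." on the slide
}

def ev(phi, env=None):
    """Evaluate a formula like ("and", p, q) under variable bindings env."""
    env = env or {}
    op, *args = phi
    if op == "rel":                         # ("rel", name, t1, ..., tn)
        name, *terms = args
        return tuple(env.get(t, t) for t in terms) in RELS[name]
    if op == "not":
        return not ev(args[0], env)
    if op == "and":
        return ev(args[0], env) and ev(args[1], env)
    if op == "or":
        return ev(args[0], env) or ev(args[1], env)
    if op == "implies":
        return (not ev(args[0], env)) or ev(args[1], env)
    if op in ("forall", "exists"):          # ("forall", v, body)
        v, body = args
        results = (ev(body, {**env, v: d}) for d in DOMAIN)
        return all(results) if op == "forall" else any(results)
    raise ValueError(f"unknown operator: {op}")

# ¬Noisy(ss): "Schultzy's is not loud" -- false in this model
print(ev(("not", ("rel", "Noisy", "ss"))))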

Translating Between FOL and NL

1. Schultzy's is not loud: ¬Noisy(ss)
2. Some human likes Chinese: ∃x, Human(x) ∧ Likes(x, ch)
3. If a person likes Thai, then they aren't friends with Donald: ∀x, (Human(x) ∧ Likes(x, th)) ⇒ ¬Friends(x, d)
4. ∀x, Restaurant(x) ⇒ (Longwait(x) ∨ ¬Likes(a, x)): Every restaurant has a long wait or is disliked by Adrian.
5. ∀x, ∃y, ¬Likes(x, y): Everybody has something they don't like.
6. ∃y, ∀x, ¬Likes(x, y): There exists something that nobody likes.
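The contrast between (5) and (6) is purely quantifier scope, and the ev() sketch above can verify it; the dense Likes relation below is hypothetical, chosen so the two formulas come apart.

# Quantifier scope, checked with ev() from the FOL sketch above.
# Hypothetical model: everyone likes everything except themselves.
RELS["Likes"] = {(x, y) for x in DOMAIN for y in DOMAIN if x != y}

f5 = ("forall", "x", ("exists", "y", ("not", ("rel", "Likes", "x", "y"))))
f6 = ("exists", "y", ("forall", "x", ("not", ("rel", "Likes", "x", "y"))))
print(ev(f5))  # True: everybody has something they don't like (themselves)
print(ev(f6))  # False: nothing is disliked by absolutely everyone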

Logical Semantics
(Montague, 1970)

The denotation of an NL sentence is the set of conditions that must hold in the (model) world for the sentence to be true.

Every restaurant has a long wait or Adrian doesn't like it.

is true if and only if

∀x, Restaurant(x) ⇒ (Longwait(x) ∨ ¬Likes(a, x))

is true.

This is sometimes called the logical form of the NL sentence.

The Principle of Compositionality

The meaning of an NL phrase is determined by the meanings of its sub-phrases.

I.e., semantics is derived from syntax.

We need a way to express the semantics of phrases, and compose them together!

λ-Calculus

(Much more powerful than what we'll see today; ask your PL professor!)

Informally, two extensions:

- λ-abstraction is another way to "scope" variables. If φ is a FOL formula and v is a variable, then λv.φ is a λ-term, meaning: an unnamed function from values (of v) to formulas (usually involving v).
- Application of such functions: if we have λv.φ and ψ, then [λv.φ](ψ) is a formula. It can be reduced by substituting ψ in for every instance of v in φ.

Examples:

- λx.Likes(x, dtf) maps things to statements that they like Din Tai Fung.
- [λx.Likes(x, dtf)](c) reduces to Likes(c, dtf).
- λx.λy.Friends(x, y) maps things x to maps of things y to statements that x and y are friends.
- [λx.λy.Friends(x, y)](b) reduces to λy.Friends(b, y).
- [[λx.λy.Friends(x, y)](b)](a) reduces to [λy.Friends(b, y)](a), which reduces to Friends(b, a).
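These reductions behave exactly like nested closures; here is a minimal Python sketch in which the "formulas" are just strings:

# λ-abstraction and application via Python closures; Likes/Friends build
# readable strings rather than real FOL objects.
Likes = lambda x, y: f"Likes({x}, {y})"
Friends = lambda x, y: f"Friends({x}, {y})"

likes_dtf = lambda x: Likes(x, "dtf")          # λx.Likes(x, dtf)
print(likes_dtf("c"))                          # Likes(c, dtf)

friends = lambda x: (lambda y: Friends(x, y))  # λx.λy.Friends(x, y)
print(friends("b")("a"))                       # Friends(b, a)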

Semantic Attachments to CFG

- NNP → Adrian: {a}
- VBZ → likes: {λf.λy.∀x, f(x) ⇒ Likes(y, x)}
- JJ → expensive: {λx.Expensive(x)}
- NNS → restaurants: {λx.Restaurant(x)}
- NP → NNP: {NNP.sem}
- NP → JJ NNS: {λx.JJ.sem(x) ∧ NNS.sem(x)}
- VP → VBZ NP: {VBZ.sem(NP.sem)}
- S → NP VP: {VP.sem(NP.sem)}

Example

Parse: (S (NP (NNP Adrian)) (VP (VBZ likes) (NP (JJ expensive) (NNS restaurants))))

Working bottom-up through the semantic attachments:

- JJ.sem = λz.Expensive(z) and NNS.sem = λw.Restaurant(w), so
  NP.sem = λv.[λz.Expensive(z)](v) ∧ [λw.Restaurant(w)](v), which reduces to λv.Expensive(v) ∧ Restaurant(v).
- Applying VBZ.sem = λf.λy.∀x, f(x) ⇒ Likes(y, x) to that NP.sem gives
  VP.sem = λy.∀x, [λv.Expensive(v) ∧ Restaurant(v)](x) ⇒ Likes(y, x), which reduces to λy.∀x, Expensive(x) ∧ Restaurant(x) ⇒ Likes(y, x).
- NNP.sem = a, so the subject NP.sem = a.
- Finally, S.sem = VP.sem(NP.sem) = ∀x, Expensive(x) ∧ Restaurant(x) ⇒ Likes(a, x).
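The same derivation can be run as Python functions; representing logical forms as strings and hard-coding the bound variable as x are simplifications (real systems manage bound variables carefully).

# The semantic attachments as Python functions.
NNP_sem = "a"                                        # Adrian
JJ_sem = lambda v: f"Expensive({v})"
NNS_sem = lambda v: f"Restaurant({v})"
VBZ_sem = lambda f: (lambda y: f"∀x, {f('x')} ⇒ Likes({y}, x)")

subj_NP_sem = NNP_sem                                # NP -> NNP
obj_NP_sem = lambda v: f"{JJ_sem(v)} ∧ {NNS_sem(v)}" # NP -> JJ NNS
VP_sem = VBZ_sem(obj_NP_sem)                         # VP -> VBZ NP
S_sem = VP_sem(subj_NP_sem)                          # S -> NP VP

print(S_sem)  # ∀x, Expensive(x) ∧ Restaurant(x) ⇒ Likes(a, x)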

Graph-Based Representations
Abstract Meaning Representation (Banarescu et al., 2013)

"The boy wants to visit New York City."

(w / want-01
   :ARG0 (b / boy)
   :ARG1 (v / visit-01
      :ARG0 b
      :ARG1 (c / city
         :name (n / name :op1 "New" :op2 "York" :op3 "City"))))

Designed for (1) annotation-ability and (2) eventual use in machine translation.
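Graphs in this notation can also be handled programmatically; for example, the third-party penman package (an assumption here, not part of the lecture) parses the string above into triples.

# Decoding the AMR above with the 'penman' package (pip install penman).
import penman

g = penman.decode("""
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (v / visit-01
      :ARG0 b
      :ARG1 (c / city
         :name (n / name :op1 "New" :op2 "York" :op3 "City"))))""")
print(g.triples)  # e.g., ('w', ':instance', 'want-01'), ('w', ':ARG0', 'b'), ...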

Combinatory Categorial Grammar
(Steedman, 2000)

CCG is a grammatical formalism that is well-suited for tying together syntax and semantics.

Formally, it is more powerful than CFG: it can represent some of the context-sensitive languages (which we do not have time to define formally).

CCG Types

Instead of the "N" of CFGs, CCGs can have an infinitely large set of structured categories (called types).

- Primitive types: typically S, NP, N, and maybe more.
- Complex types, built with "slashes," for example:
  - S/NP is "an S, except that it lacks an NP to its right"
  - S\NP is "an S, except that it lacks an NP to its left"
  - (S\NP)/NP is "an S, except that it lacks an NP to its right and an NP to its left"

You can think of complex types as functions, e.g., S/NP maps NPs to Ss.
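One minimal way to represent these types in code; the names below are choices made here, with complex categories as a recursive structure over primitives.

# CCG categories as a recursive structure (a sketch; Python 3.10+).
from dataclasses import dataclass

@dataclass(frozen=True)
class Prim:
    name: str                 # "S", "NP", "N"

@dataclass(frozen=True)
class Slash:
    result: "Prim | Slash"    # what remains once the argument is found
    direction: str            # "/" = argument to the right, "\\" = to the left
    argument: "Prim | Slash"

# (S\NP)/NP, the type of a transitive verb like "bit":
tv = Slash(Slash(Prim("S"), "\\", Prim("NP")), "/", Prim("NP"))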

CCG Combinators

Instead of the production rules of CFGs, CCGs have a very small set of generic combinators that tell us how we can put types together.

Convention writes the rule differently from CFG: X Y ⇒ Z means that X and Y combine to form a Z (the "parent" in the tree).

Application Combinator

Forward (X/Y Y ⇒ X) and backward (Y X\Y ⇒ X)

Examples:

- "the dog": NP/N (the) applies forward to N (dog), giving NP.
- "the yellow dog": N/N (yellow) applies to N (dog), giving N; then NP/N (the) applies to that N, giving NP.
- "the dog bit John": (S\NP)/NP (bit) applies forward to NP (John), giving S\NP; NP (the dog) combines backward with that S\NP, giving S.
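Using the Prim/Slash sketch above (including the transitive-verb type tv), forward application is a check-and-return:

# Forward application over the structured categories sketched earlier:
# X/Y combines with Y to give X.
def forward_apply(fn_cat, arg_cat):
    assert isinstance(fn_cat, Slash) and fn_cat.direction == "/"
    assert fn_cat.argument == arg_cat, "argument category mismatch"
    return fn_cat.result

the = Slash(Prim("NP"), "/", Prim("N"))
print(forward_apply(the, Prim("N")))   # Prim(name='NP'): "the dog" is an NP
print(forward_apply(tv, Prim("NP")))   # the S\NP category: "bit John"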

Conjunction Combinator

X and X ⇒ X

Examples:

- "cats and dogs": NP and NP ⇒ NP.
- "John ate anchovies and drank beer": ate anchovies and drank beer are each S\NP (via forward application); the conjunction yields one S\NP, which combines backward with NP (John) to give S.
- "the dog bit and infected John": (S\NP)/NP (bit) and (S\NP)/NP (infected) conjoin into a single (S\NP)/NP, which applies to NP (John) and then combines backward with NP (the dog) to give S.

Composition Combinator

Forward (X/Y Y/Z ⇒ X/Z) and backward (Y\Z X\Y ⇒ X\Z)

Example: "I would prefer olives." Two derivations of S\NP for "would prefer olives":

- With forward composition: (S\NP)/(S\NP) (would) composes with (S\NP)/NP (prefer) to give (S\NP)/NP, which then applies to NP (olives).
- With application only: (S\NP)/NP (prefer) applies to NP (olives), giving S\NP; (S\NP)/(S\NP) (would) then applies to that S\NP.

Either way, NP (I) combines backward with the resulting S\NP to give S.

Type-Raising Combinator

Forward (X ⇒ Y/(Y\X)) and backward (X ⇒ Y\(Y/X))

Example: "I love and Karen hates chocolate."

- NP (I) type-raises to S/(S\NP), which composes forward with (S\NP)/NP (love) to give S/NP; "Karen hates" forms an S/NP the same way.
- The two S/NP constituents conjoin, and the result applies to NP (chocolate), giving S.

Back to Semantics

Each combinator also tells us what to do with the semantic attachments.

- Forward application: X/Y : f and Y : g ⇒ X : f(g)
- Forward composition: X/Y : f and Y/Z : g ⇒ X/Z : λx.f(g(x))
- Forward type-raising: X : g ⇒ Y/(Y\X) : λf.f(g)
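A compact sketch of the first two rules in Python, pairing a category string with a closure for its semantics; flat string categories (rather than the structured ones sketched earlier) and the crude parenthesis handling are simplifications.

# Forward application and composition with semantic attachments; only the
# outermost slash of a category string is parsed.
from dataclasses import dataclass
from typing import Any

@dataclass
class Constituent:
    cat: str     # e.g., r"(S\NP)/NP"
    sem: Any     # a constant or a callable (the λ-term)

def split_forward(cat):
    left, right = cat.rsplit("/", 1)    # assume the outermost slash is last
    return left.strip("()"), right.strip("()")

def fapply(f, g):       # X/Y : f  +  Y : g  =>  X : f(g)
    x, y = split_forward(f.cat)
    assert y == g.cat.strip("()")
    return Constituent(x, f.sem(g.sem))

def fcompose(f, g):     # X/Y : f  +  Y/Z : g  =>  X/Z : λv.f(g(v))
    x, y = split_forward(f.cat)
    y2, z = split_forward(g.cat)
    assert y == y2
    out = f"({x})/{z}" if ("/" in x or "\\" in x) else f"{x}/{z}"
    return Constituent(out, lambda v: f.sem(g.sem(v)))

bit = Constituent(r"(S\NP)/NP", lambda obj: (lambda subj: f"Bit({subj}, {obj})"))
john = Constituent("NP", "j")
vp = fapply(bit, john)                   # S\NP : λy.Bit(y, j)
print(vp.cat, "::", vp.sem("the_dog"))   # S\NP :: Bit(the_dog, j)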

CCG Lexicon

Most of the work is done in the lexicon!

Syntactic and semantic information is much more formal here.

- Slash categories define where all the syntactic arguments are expected to be.
- λ-expressions define how the expected arguments get "used" to build up a FOL expression.

Extensive discussion: Carpenter (1997)

Some Topics We Don't Have Time For

- Tasks, evaluations, annotated datasets (e.g., CCGbank; Hockenmaier and Steedman, 2007)
- Learning for semantic parsing (Zettlemoyer and Collins, 2005) and CCG parsing (Clark and Curran, 2004a)
- Using CCG to represent other kinds of semantics (e.g., predicate-argument structures; Lewis and Steedman, 2014)
- Integrating continuous representations in semantic parsing (Lewis and Steedman, 2013; Krishnamurthy and Mitchell, 2013)
- Supertagging (Clark and Curran, 2004b) and making semantic parsing efficient (Lewis and Steedman, 2014)
- Grounding meaning in visual (or other perceptual) experience

References I

Collin F. Baker, Charles J. Fillmore, and John B. Lowe. The Berkeley FrameNet project. In Proc. of ACL-COLING, 1998.

Laura Banarescu, Claire Bonial, Shu Cai, Madalina Georgescu, Kira Griffitt, Ulf Hermjakob, Kevin Knight, Philipp Koehn, Martha Palmer, and Nathan Schneider. Abstract Meaning Representation for sembanking. In Proc. of the Linguistic Annotation Workshop and Interoperability with Discourse, 2013.

Anders Björkelund, Bernd Bohnet, Love Hafdell, and Pierre Nugues. A high-performance syntactic and semantic dependency parser. In Proc. of COLING, 2010.

Bob Carpenter. Type-Logical Semantics. MIT Press, 1997.

Stephen Clark and James R. Curran. Parsing the WSJ using CCG and log-linear models. In Proc. of ACL, 2004a.

Stephen Clark and James R. Curran. The importance of supertagging for wide-coverage CCG parsing. In Proc. of COLING, 2004b.

Dipanjan Das, Desai Chen, André F. T. Martins, Nathan Schneider, and Noah A. Smith. Frame-semantic parsing. Computational Linguistics, 40(1):9–56, 2014.

Charles J. Fillmore. The case for case. In Bach and Harms, editors, Universals in Linguistic Theory. Holt, Rinehart, and Winston, 1968.

Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. Semantic role labeling with neural network factors. In Proc. of EMNLP, 2015.

Daniel Gildea and Daniel Jurafsky. Automatic labeling of semantic roles. Computational Linguistics, 28(3):245–288, 2002.

References II

James Henderson, Paola Merlo, Ivan Titov, and Gabriele Musillo. Multilingual joint parsing of syntactic and semantic dependencies with a latent variable model. Computational Linguistics, 39(4):949–998, 2013.

Julia Hockenmaier and Mark Steedman. CCGbank: A corpus of CCG derivations and dependency structures extracted from the Penn Treebank. Computational Linguistics, 33(3):355–396, 2007.

Daniel Jurafsky and James H. Martin. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. Prentice Hall, second edition, 2008.

Daniel Jurafsky and James H. Martin. Semantic role labeling and argument structure (draft chapter), 2016. URL https://web.stanford.edu/~jurafsky/slp3/22.pdf.

Jayant Krishnamurthy and Tom Mitchell. Vector space semantic parsing: A framework for compositional vector space models. 2013.

Meghana Kshirsagar, Sam Thomson, Nathan Schneider, Jaime Carbonell, Noah A. Smith, and Chris Dyer. Frame-semantic role labeling with heterogeneous annotations. In Proc. of ACL, 2015.

Beth Levin. English Verb Classes and Alternations: A Preliminary Investigation. University of Chicago Press, 1993.

Mike Lewis and Mark Steedman. Combining distributional and logical semantics. Transactions of the Association for Computational Linguistics, 1:179–192, 2013.

Mike Lewis and Mark Steedman. A* CCG parsing with a supertag-factored model. In Proc. of EMNLP, 2014.

Richard Montague. Universal grammar. Theoria, 36:373–398, 1970.

Martha Palmer, Daniel Gildea, and Paul Kingsbury. The Proposition Bank: An annotated corpus of semantic roles. Computational Linguistics, 31(1):71–105, 2005.

References III

Vasin Punyakanok, Dan Roth, and Wen-tau Yih. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics, 34(2):257–287, 2008.

Mark Steedman. A very short introduction to CCG, 1996. URL http://www.inf.ed.ac.uk/teaching/courses/nlg/readings/ccgintro.pdf.

Mark Steedman. The Syntactic Process. MIT Press, 2000.

Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. A global joint model for semantic role labeling. Computational Linguistics, 34(2):161–191, 2008.

Luke Zettlemoyer and Michael Collins. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proc. of UAI, 2005.