CS460/626 : Natural Language Processing/Speech, NLP and the Web (Lecture 38–Universal Networking Language) Pushpak Bhattacharyya CSE Dept., IIT Bombay 14 th April, 2011


CS460/626 : Natural Language Processing/Speech, NLP and the Web

(Lecture 38–Universal Networking Language)

Pushpak Bhattacharyya, CSE Dept., IIT Bombay

14th April, 2011

A Perspective

Morphology

Lexicon

Syntax

Semantics

Pragmatics

Discourse

UNL: a United Nations project

Started in 1996
10-year program
15 research groups across continents
First goal: generators
Next goal: analysers (needs solving various ambiguity problems)

Current active language groups:
  UNL_French (GETA-CLIPS, IMAG)
  UNL_English+Hindi
  UNL_Italian (Univ. of Pisa)
  UNL_Portuguese (Univ. of São Paulo, Brazil)
  UNL_Russian (Institute of Linguistics, Moscow)
  UNL_Spanish (UPM, Madrid)

World-wide Universal Networking Language (UNL) Project

[Figure: UNL at the centre as a language-independent meaning representation, connected to English, Russian, Japanese, Hindi, Spanish, Marathi and other languages.]

Foundations and Applications

UNL Foundations
  Semantic Relations
  Universal Words
  Attributes
  How to write UNL expressions

UNL Applications
  Machine Translation: Rule based and Statistical
  Search
  Text Entailment
  Sentiment Analysis

UNL represents knowledge: John eats rice with a spoon

[Figure: UNL graph of the sentence, annotated to point out its semantic relations, attributes and universal words.]

Repository of 42 semantic relations and 84 attribute labels
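A UNL graph can be held in code as a list of labelled edges between universal words. The sketch below is only an illustration: the slide's graph is not reproduced in the text, so the three relations chosen for the sentence (agt, obj, ins) and the names `Relation`, `SENTENCE_GRAPH`, `to_unl` and `universal_words` are assumptions, not part of any UNL tool.

```python
from collections import namedtuple

# One binary UNL relation, written label(head, tail) in the slides.
Relation = namedtuple("Relation", ["label", "head", "tail"])

# Assumed encoding of "John eats rice with a spoon" (illustrative only).
SENTENCE_GRAPH = [
    Relation("agt", "eat.@entry", "John"),
    Relation("obj", "eat.@entry", "rice"),
    Relation("ins", "eat.@entry", "spoon.@indef"),
]

def to_unl(relations):
    """Render each relation in label(head, tail) notation."""
    return [f"{r.label}({r.head}, {r.tail})" for r in relations]

def universal_words(relations):
    """Collect the distinct node names with their attributes stripped."""
    words = []
    for r in relations:
        for node in (r.head, r.tail):
            word = node.split(".@")[0]
            if word not in words:
                words.append(word)
    return words
```

Calling `to_unl(SENTENCE_GRAPH)` gives one line per edge, and `universal_words(SENTENCE_GRAPH)` lists the nodes of the graph in order of first appearance.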

Sentence embeddings

Deepa claimed that she had composed a poem.

[UNL]
agt(claim.@entry.@past, Deepa)
obj(claim.@entry.@past, :01)
agt:01(compose.@past.@entry.@complete, she)
obj:01(compose.@past.@entry.@complete, poem.@indef)
[/UNL]
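The expression above has a regular shape: a relation label, an optional scope id such as :01, and two arguments. A minimal parsing sketch follows. It assumes that neither argument contains a top-level comma (true for the expressions on this slide, not for UWs with long constraint lists), and the names `RELATION_RE` and `parse_relation` are hypothetical.

```python
import re

# label, optional ":NN" scope id, then two comma-separated arguments.
RELATION_RE = re.compile(
    r"^\s*([a-z]{1,3})(?::(\d+))?\s*\(\s*(.+?)\s*,\s*(.+?)\s*\)\s*$"
)

def parse_relation(line):
    """Split one UNL relation line into label, scope, head and tail."""
    match = RELATION_RE.match(line)
    if match is None:
        raise ValueError(f"not a UNL relation: {line!r}")
    label, scope, head, tail = match.groups()
    return {"label": label, "scope": scope, "head": head, "tail": tail}

EXPRESSION = [
    "agt(claim.@entry.@past, Deepa)",
    "obj(claim.@entry.@past, :01)",
    "agt:01(compose.@past.@entry.@complete, she)",
    "obj:01(compose.@past.@entry.@complete, poem.@indef)",
]
PARSED = [parse_relation(line) for line in EXPRESSION]
```

Relations whose `scope` is `None` belong to the main clause; those with scope `"01"` make up the embedded clause that the main clause refers to as `:01`.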

English sentences: basic structure

A <verb> B: John eats bread
  agt(eat.@entry, John)
  obj(eat.@entry, bread)

A <verb>: John sleeps
  aoj(sleep.@entry, John)

A <be> B: John is good
  aoj(good.@entry, John)

[Figure: graph templates for the patterns above, with nodes verb, A and B and edge labels R1, R2 and aoj.]

Hindi sentences: basic structure

A B <verb>: John roti khaataa hai
  agt(eat.@entry, John)
  obj(eat.@entry, bread)

A <verb>: John sotaa hai
  aoj(sleep.@entry, John)

A <be> B: John acchaa hai
  aoj(good.@entry, John)

[Figure: graph templates for the patterns above, with nodes verb, A and B and edge labels R1, R2 and aoj.]
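The English and Hindi patterns differ only in word order, while the UNL expressions are the same. The toy sketch below makes that concrete for the three basic patterns. Everything in it is an assumption made for illustration: the tiny `LEXICON`, the `AUXILIARIES` set, and the rule that two content words give aoj while three give agt and obj. It is not how an actual analyser works.

```python
# Hypothetical toy lexicon mapping surface words to universal words.
LEXICON = {
    "John": "John",
    "eats": "eat", "sleeps": "sleep", "bread": "bread", "good": "good",
    "khaataa": "eat", "sotaa": "sleep", "roti": "bread", "acchaa": "good",
}
AUXILIARIES = {"is", "hai"}  # dropped before matching a pattern

def to_unl(sentence, verb_final=False):
    """Map one of the three basic patterns to UNL relations."""
    words = [w for w in sentence.split() if w not in AUXILIARIES]
    subject = LEXICON[words[0]]
    if len(words) == 2:
        # "A <verb>" and "A <be> B" both yield a single aoj relation.
        return [f"aoj({LEXICON[words[1]]}.@entry, {subject})"]
    if len(words) != 3:
        raise ValueError("only the three basic patterns are handled")
    # English: A <verb> B; Hindi: A B <verb>.
    verb, obj = (words[2], words[1]) if verb_final else (words[1], words[2])
    head = f"{LEXICON[verb]}.@entry"
    return [f"agt({head}, {subject})", f"obj({head}, {LEXICON[obj]})"]
```

With `verb_final=True` for Hindi, `to_unl("John roti khaataa hai", verb_final=True)` and `to_unl("John eats bread")` return the same two relations.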

Complex English sentences: use recursion on the basic structure

A <verb> B: John who is a good boy eats bread which is toasted

agt(eat.@entry, :01)
obj(eat.@entry, :02)
aoj:01(boy, John.@entry)
mod:01(boy, good)
obj:02(toast, bread.@entry.@focus)

[Figure: UNL graph of the sentence. eat is linked by agt to scope :01 (boy, with aoj to John and mod to good) and by obj to scope :02 (toast, with obj to bread). Red arrows indicate entry nodes.]
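The recursion works through scope ids: the main clause refers to :01 and :02, and each scope has its own entry node. The sketch below resolves each scope reference to the entry word of that scope. It assumes the toast relation belongs to scope :02 (the object scope), and the tuple layout and function names are hypothetical.

```python
# (label, scope, head, tail); scope None marks the top-level expression.
RELATIONS = [
    ("agt", None, "eat.@entry", ":01"),
    ("obj", None, "eat.@entry", ":02"),
    ("aoj", "01", "boy", "John.@entry"),
    ("mod", "01", "boy", "good"),
    ("obj", "02", "toast", "bread.@entry.@focus"),  # assumed to be scope 02
]

def entry_word(relations, scope):
    """Return the UW that carries @entry inside the given scope."""
    for _label, rel_scope, head, tail in relations:
        if rel_scope != scope:
            continue
        for node in (head, tail):
            word, *attrs = node.split(".")
            if "@entry" in attrs:
                return word
    raise LookupError(f"scope {scope} has no @entry node")

def resolve(node, relations):
    """Replace a scope reference such as ':01' by that scope's entry word."""
    if node.startswith(":"):
        return entry_word(relations, node[1:])
    return node

def top_level(relations):
    """Render the main clause with scope references resolved."""
    return [
        f"{label}({resolve(head, relations)}, {resolve(tail, relations)})"
        for label, scope, head, tail in relations
        if scope is None
    ]
```

Resolving the references recovers the basic A <verb> B structure, with John as agent and bread as object of eat.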

Constituents of Universal Networking Language

Universal Words (UWs)
Relations
Attributes
Knowledge Base

What is a Universal Word (UW)?

Words of UNL
Constitute the UNL vocabulary, the syntactic-semantic units that form UNL expressions
A UW represents a concept
  Basic UW (an English word/compound word/phrase with no restrictions or constraint list)
  Restricted UW (with a constraint list)
Examples:
  “crane(icl>device)”
  “crane(icl>bird)”

The Lexicon

Format of the dictionary entry:

[headword] {} “Universal word” (Attribute list);

e.g., [minister] {} “minister(icl>person)” (N,ANIMT,PHSCL,PRSN);
(parts, in order: head word, universal word, attributes)

Attributes
  Morphological - Pl (plural), V_ed (past tense form)
  Syntactic - V (verb), VOA (verb of action)
  Semantic - ANIMT (animate), PLACE, TIME
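The entry format is regular enough to be read with a single regular expression. The sketch below is an illustration only: `ENTRY_RE` and `parse_entry` are hypothetical names, the pattern accepts both straight and curly quotes, and anything after the attribute list (such as a trailing `<E,0,0>;` field) is ignored rather than interpreted.

```python
import re

# [headword] {} "universal word" (attribute list) ...
ENTRY_RE = re.compile(
    r"\[(?P<headword>[^\]]+)\]\s*\{[^}]*\}\s*"
    r"[\"“”](?P<uw>[^\"“”]+)[\"“”]\s*"
    r"\((?P<attrs>[^)]*)\)"
)

def parse_entry(entry):
    """Split a dictionary entry into headword, universal word and attributes."""
    match = ENTRY_RE.match(entry.strip())
    if match is None:
        raise ValueError(f"not a dictionary entry: {entry!r}")
    attributes = [a.strip() for a in match.group("attrs").split(",") if a.strip()]
    return {
        "headword": match.group("headword"),
        "uw": match.group("uw"),
        "attributes": attributes,
    }

MINISTER = parse_entry('[minister] {} "minister(icl>person)" (N,ANIMT,PHSCL,PRSN);')
```

Here `MINISTER["uw"]` is the restricted UW `minister(icl>person)` and `MINISTER["attributes"]` is the list of four attribute labels.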


The Lexicon (cntd)

Content words:

[forward] {} “forward(icl>send)” (V,VOA) <E,0,0>;

[mail] {} “mail(icl>message)” (N,PHSCL,INANI) <E,0,0>;

[minister] {} “minister(icl>person)” (N,ANIMT,PHSCL,PRSN) <E,0,0>;

Headword Universal Word Attributes

He forwarded the mail to the minister.

The Lexicon (contd.)

Function words:

[he] {} “he” (PRON,SUB,SING,3RD)
[the] {} “the” (ART,THE) <E,0,0>;
[to] {} “to” (PRE,#TO) <E,0,0>;

Headword Universal Word

Attributes

He forwarded the mail to the minister.

Multilingual dictionary

Universal word: farmer(icl>creator)

Headwords and attributes by language:
  E (English): farmer; attributes N,ANIMT,FAUNA,MML,PRSN
  M (Marathi): शेतकरी (shetkari); attributes N,M,ANIMT,FAUNA,MML,PRSN
  H (Hindi): किसान (kisaan); attributes N,M,ANIMT,FAUNA,MML,PRSN,Na


The Features of a UW

Every concept existing in any language must correspond to a UW

The constraint list should be as small as necessary to disambiguate the headword

Every UW should be defined in the UNL Knowledge-Base (now WordNet)

Restricted UWs

Examples:
  He will hold office until the spring of next year.
  The spring was broken.

Restricted UWs are headwords with a constraint list, for example:
  “spring(icl>season)”
  “spring(icl>device)”
  “spring(icl>jump)”
  “spring(icl>fountain)”

How to create UWs?

Pick up a concept:
  the concept of “crane” as “a device for lifting heavy loads”, or
  as “a long-legged bird that wades in water in search of food”

Choose an English word for the concept.
  In the case of “crane”, since it is a word of English, the corresponding word should be ‘crane’.

Choose a constraint list for the word.
  [ ] ‘crane(icl>device)’
  [ ] ‘crane(icl>bird)’
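The three steps above amount to pairing an English headword with just enough constraints to separate its senses. A minimal sketch, assuming a UW is written as the headword followed by a parenthesised constraint list; `make_uw` and `headword_of` are hypothetical helper names.

```python
def make_uw(headword, *constraints):
    """Build a basic UW (no constraints) or a restricted UW."""
    if not constraints:
        return headword
    return f"{headword}({', '.join(constraints)})"

def headword_of(uw):
    """Recover the headword from a basic or restricted UW."""
    return uw.split("(", 1)[0]

# The two senses of "crane" from the slide.
CRANE_DEVICE = make_uw("crane", "icl>device")
CRANE_BIRD = make_uw("crane", "icl>bird")
```

Both UWs share the headword `crane` but denote different concepts, which is what allows each one to map to a different word in another language.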

Example: Hindi word ghar

ghar - house
  usne garmii me ghar kii marammat kii
  he renovated the house in the summer

ghar - home
  office ke baad ghar louto
  return home after office

ghar - family
  bade ghar kii betii
  girl from a renowned family

Example: ghar (contd.)

ghar - own country
  bahut saal bidesh me kaam karke ghar louta aayaa
  returned home after working abroad for many years

ghar - astrological position
  ashtam ghar par budh hai
  Mercury is in the eighth house

House in English Wordnet

1. (1029) house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")

3. (51) house -- (a building in which something is sheltered or located; "they had a large carriage house")

4. (39) family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household")


House in English Wordnet

7. (13) house -- (aristocratic family line; "the House of York")

11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)


Unambiguous construction of UWs

Use constraints: Ontological, Semantic and Argument

Example: forward a mail to the minister

forward(icl>do, icl>send, agt>thing(icl>animate), obj>thing(icl>inanimate), gol>thing)

Constraint types:
  icl>do: ontological
  icl>send: semantic
  agt>thing, obj>thing, gol>thing: argument
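The three constraint types can be told apart mechanically once the constraint list is split at its top-level commas. The sketch below is one possible reading, not a rule from the UNL specification: it assumes that an icl constraint pointing at a small set of root categories (`TOP_LEVEL`, a made-up list) is ontological, that any other icl constraint is semantic, and that every non-icl constraint is an argument constraint.

```python
TOP_LEVEL = {"do", "occur", "be", "thing"}  # assumed ontological roots

def split_constraints(uw):
    """Return (headword, constraints), splitting only at top-level commas."""
    headword, _, rest = uw.partition("(")
    body = rest.rstrip()[:-1]  # drop the closing parenthesis of the list
    parts, depth, current = [], 0, ""
    for ch in body:
        if ch == "," and depth == 0:
            parts.append(current.strip())
            current = ""
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current += ch
    if current.strip():
        parts.append(current.strip())
    return headword, parts

def constraint_type(constraint):
    """Classify one constraint as ontological, semantic or argument."""
    relation, _, target = constraint.partition(">")
    if relation != "icl":
        return "argument"
    return "ontological" if target in TOP_LEVEL else "semantic"

FORWARD = ("forward(icl>do, icl>send, agt>thing(icl>animate), "
           "obj>thing(icl>inanimate), gol>thing)")
HEADWORD, CONSTRAINTS = split_constraints(FORWARD)
TYPES = [constraint_type(c) for c in CONSTRAINTS]
```

The depth counter keeps nested lists such as `agt>thing(icl>animate)` in one piece, so the five constraints of the example come out as one ontological, one semantic and three argument constraints.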


UNL Relations

Relations constitute the syntax of UNL
Express how concepts (UWs) constitute a sentence
Represented as strings of 3 characters or less
A set of 41 relations specified in UNL (e.g., agt, aoj, ben, gol, obj, plc, src, tim, …)
Refer to a semantic role between two lexical items in a sentence

AGT / AOJ / OBJ

AGT (Agent)
  Definition: Agt defines a thing which initiates an action

AOJ (Thing with attribute)
  Definition: Aoj defines a thing which is in a state or has an attribute

OBJ (Affected thing)
  Definition: Obj defines a thing in focus which is directly affected by an event or state

Examples

John broke the window.
  agt(break.@entry.@past, John)

This flower is beautiful.
  aoj(beautiful.@entry, flower)

He blamed John for the accident.
  obj(blame.@entry.@past, John)

Example: UNL Graph with agt, obj, ben

He carved a toy for the baby.

[Figure: UNL graph of the sentence. Nodes: carve(icl>cut) with @entry and @past, he(iof>person), toy(icl>plaything), baby(icl>child); edge labels: agt, obj, ben; one node is marked @def.]

GOL / SRC

GOL (Goal: final state)
  Definition: Gol defines the final state of an object or the thing finally associated with an object of an event

SRC (Source: initial state)
  Definition: Src defines the initial state of an object or the thing initially associated with an object of an event

GOL

I deposited my money in my bank account.

[Figure: UNL graph of the sentence. Nodes: deposit(icl>put) with @entry and @past, I, money(icl>currency), account(icl>statement), bank(icl>possession); edge labels: agt, obj, gol and three mod edges.]

SRC

They make a small income from fishing.

[Figure: UNL graph of the sentence. Nodes: make(icl>do) with @entry and @present, they(icl>persons), income(icl>gain), fishing(icl>business), small(aoj>thing); edge labels: agt, obj, src, mod.]

PUR

PUR (Purpose or objective)
  Definition: Pur defines the purpose or objectives of the agent of an event, or the purpose for which a thing exists

This budget is for food.
  pur(food.@entry, budget)
  mod(budget, this)

RSN

RSN (Reason)
  Definition: Rsn defines a reason why an event or a state happens

They selected him for his honesty.
  agt(select(icl>choose).@entry, they)
  obj(select(icl>choose).@entry, he)
  rsn(select(icl>choose).@entry, honesty)

TIM

TIM (Time)
  Definition: Tim defines the time an event occurs or a state is true

I wake up at noon.
  agt(wake up.@entry, I)
  tim(wake up.@entry, noon(icl>time))

PLC

PLC (Place)
  Definition: Plc defines the place an event occurs or a state is true or a thing exists

Temples are very famous in India.
  aoj(famous.@entry, temple.@pl)
  man(famous.@entry, very)
  plc(famous.@entry, India)

INS

INS (Instrument)
  Definition: Ins defines the instrument to carry out an event

I solved it with a computer.
  agt(solve.@entry.@past, I)
  ins(solve.@entry.@past, computer)
  obj(solve.@entry.@past, it)

INS

John covered the baby with a blanket.

[Figure: UNL graph of the sentence. Nodes: cover(icl>do) with @entry and @past, John(iof>person), baby(icl>child), blanket(icl>object); edge labels: agt, obj, ins; one node is marked @def.]

Attributes

Constitute syntax of UNL
Play the role of bridging the conceptual world and the real world in the UNL expressions
Show how and when the speaker views what is said and with what intention, feeling, and so on

Seven types:
  Time with respect to the speaker
  Aspects
  Speaker’s view of reference
  Speaker’s emphasis, focus, topic, etc.
  Convention
  Speaker’s attitudes
  Speaker’s feelings and viewpoints


Tense: @past

The past tense is normally expressed by @past

{unl}agt(go.@entry.@past, he)…{/unl}

He went there yesterday

Aspects: @progress

{unl}
man(rain.@entry.@present.@progress, hard)
{/unl}

It’s raining hard.

Speaker’s view of reference

@def (Specific concept (already referred))
  The house on the corner is for sale.

@indef (Non-specific class)
  There is a book on the desk.

@not is always attached to the UW which is negated.
  He didn’t come.
  agt(come.@entry.@past.@not, he)

Speaker’s emphasis

@emphasis
  John his name is.
  mod(name, he)
  aoj(John.@emphasis.@entry, name)

@entry denotes the entry point or main UW of a UNL expression
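Because attributes are attached to a UW as a chain of `.@name` suffixes, they can be separated from the UW with a simple split. The sketch below is illustrative: the function names and the summary fields returned by `describe` are assumptions, and it covers only attributes that appear in these slides.

```python
def split_node(node):
    """Split 'come.@entry.@past.@not' into ('come', ['@entry', '@past', '@not'])."""
    uw, *attrs = node.split(".@")
    return uw, ["@" + a for a in attrs]

def describe(node):
    """Summarise the attributes of a node (fields chosen for illustration)."""
    uw, attrs = split_node(node)
    tense = next((a[1:] for a in attrs if a in ("@past", "@present")), None)
    return {
        "uw": uw,
        "entry": "@entry" in attrs,
        "tense": tense,
        "negated": "@not" in attrs,
        "emphasised": "@emphasis" in attrs,
    }

def entry_nodes(relations):
    """Return the distinct UWs marked @entry in a list of (label, head, tail)."""
    found = []
    for _label, head, tail in relations:
        for node in (head, tail):
            uw, attrs = split_node(node)
            if "@entry" in attrs and uw not in found:
                found.append(uw)
    return found
```

For the emphasis example, `entry_nodes` picks out `John` as the entry point, since that is the node carrying @entry.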


How to generate UNL

Early Enco (1996-98)

Analysis windows - two in number
  Left Analysis Window (LAW)
  Right Analysis Window (RAW)

Condition windows - many in number
  Left Condition Window (LCW)
  Right Condition Window (RCW)

[Figure: a sentence Word1 ... Wordn with the LCW, LAW, RAW and RCW windows positioned over its words.]

UNL Rule for a Semantic Relation

;Create relation between V and N2, after resolving the preposition preceding N2

<{V,VOA,:::}{N,TIME,DAY,ONRES,PRERES::tim:}P25;

IF
  the left analysis window is on a verb (V) which is a verb of action (VOA)
AND
  the right analysis window is on a noun (N) that has the TIME and DAY attributes, and for which the preceding preposition (on) has been processed and deleted
THEN
  set up the tim relation between V and N2 (indicated by < at the start of the rule).
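The IF/THEN reading of the rule can be mimicked with attribute sets. The sketch below is a loose illustration and not the Enco rule engine: the `RULE` dictionary layout, the `apply_rule` function and the example node list (for a hypothetical sentence such as "He left on Monday", with "on" already consumed) are all assumptions, and the sketch only checks the two analysis windows and emits the relation, leaving out condition windows and any change to the node list.

```python
# Attribute requirements taken from the rule text; the layout is hypothetical.
RULE = {
    "law": {"V", "VOA"},
    "raw": {"N", "TIME", "DAY", "ONRES", "PRERES"},
    "relation": "tim",
    "priority": 25,
}

def apply_rule(rule, nodes, law_index):
    """Try the rule with LAW on nodes[law_index] and RAW on the next node."""
    if law_index + 1 >= len(nodes):
        return None
    law_word, law_attrs = nodes[law_index]
    raw_word, raw_attrs = nodes[law_index + 1]
    # Both windows must carry every attribute the rule asks for.
    if rule["law"] <= law_attrs and rule["raw"] <= raw_attrs:
        return f'{rule["relation"]}({law_word}, {raw_word})'
    return None

# Hypothetical node list: (word, attribute set) pairs in sentence order.
NODES = [
    ("he", {"PRON", "SUB", "SING", "3RD"}),
    ("leave", {"V", "VOA"}),
    ("Monday", {"N", "TIME", "DAY", "ONRES", "PRERES"}),
]
```

With the left analysis window on `leave`, the rule fires and yields `tim(leave, Monday)`; with the window on `he`, the conditions fail and nothing is produced.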

UNL generation using NLP tools and resources


SRS based system


Multi parser based system

Evaluation

Recall = (#expressions matched in gold and generated UNL) / (#expressions expected in gold UNL)

Precision = (#expressions matched in gold and generated UNL) / (#expressions in generated UNL)

F1 score = (2 * recall * precision) / (recall + precision)
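These three measures can be computed directly from two collections of UNL expressions. The sketch below assumes that "matched" means exact string equality between a gold expression and a generated one, which the slide does not specify; the function name `evaluate` and the sample data are made up for illustration.

```python
def evaluate(gold, generated):
    """Return (precision, recall, f1) over two collections of UNL expressions."""
    gold, generated = set(gold), set(generated)
    matched = len(gold & generated)  # assumed: exact string match
    recall = matched / len(gold) if gold else 0.0
    precision = matched / len(generated) if generated else 0.0
    if matched == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1

# Illustrative data, not taken from the lecture's evaluation.
GOLD = [
    "agt(eat.@entry, John)",
    "obj(eat.@entry, rice)",
    "ins(eat.@entry, spoon)",
    "mod(rice, white)",
]
GENERATED = [
    "agt(eat.@entry, John)",
    "obj(eat.@entry, rice)",
]
PRECISION, RECALL, F1 = evaluate(GOLD, GENERATED)
```

In this sample, every generated expression is correct but only half of the gold expressions are found, so precision is 1.0, recall is 0.5 and F1 falls between the two.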

Comparison between the two systems

Table name                  Accuracy of XLE parser    Accuracy of multi-parser
                            based system              based system
evalTb_OXF_V_TO_INF         0.8376                    0.8591
evalTb_OXF_VN_TO_INF        0.8369                    0.8429
evalTb_OXF_S_TO_DO_VERB     0.7833                    0.7833
evalTb_XTAG                 0.7181                    0.7835
evalTb_FRAMENET             0.6618                    0.7591
evalTb_RADFORD              0.8141                    0.8542
evalTb_V                    0.5920                    0.7587
evalTb_VN                   0.7528                    0.7625
evalTb_VNN                  0.7692                    0.7902
evalTb_VING                 0.7084                    0.7084
evalTb_VADJ                 0.5486                    0.6214
evalTb_VINF                 0.7236                    0.7772
evalTb_VTHAT                0.7988                    0.7999
evalTb_TOI_Education        0.3875                    0.3669
evalTb_test                 0.4667                    0.4667
evalTb_demo                 1.0000                    1.0000
evalTb_Test2                0.3913                    0.5116
evalTb_t3                   0.7155                    0.8553
evalTb_Barcelona            0.3194                    0.3181
Total                       0.6489                    0.7010

Use of UNL in multiple NLP tasks

Language Processing & Understanding
  Information Extraction: Part of Speech tagging, Named Entity Recognition, Shallow Parsing, Summarization
  Machine Learning: Semantic Role labeling, Sentiment Analysis, Text Entailment (web 2.0 applications), using graphical models, support vector machines, neural networks
  IR: Cross Lingual Search, Crawling, Indexing, Multilingual Relevance Feedback
  Machine Translation: Statistical, Interlingua Based, English-Indian languages, Indian languages-Indian languages, Indowordnet

Resources: http://www.cfilt.iitb.ac.in
Publications: http://www.cse.iitb.ac.in/~pb

Linguistics is the eye and computation the body

Summing up

Some NLP milestones covered:
  WSD: various approaches
  SMT
  Parsing (classical and probabilistic)
  Phonology, Phonetics, Syllabification, Transliteration
  Semantics, UNL

Important topics left out: IR, Similarity measures

Lectures: foundation and depth
Seminars: wide range of topics for breadth and exposure
Assignments: to reinforce understanding of lectures


God Bless!!