CS460/626 : Natural Language Processing/Speech, NLP and the Web
(Lecture 38–Universal Networking Language)
Pushpak Bhattacharyya
CSE Dept., IIT Bombay
14th April, 2011
A Perspective
Morphology
Lexicon
Syntax
Semantics
Pragmatics
Discourse
UNL: a United Nations project
  Started in 1996
  10-year program
  15 research groups across continents
  First goal: generators
  Next goal: analysers (needs solving various ambiguity problems)
Current active language groups
  UNL_French (GETA-CLIPS, IMAG)
  UNL_English+Hindi
  UNL_Italian (Univ. of Pisa)
  UNL_Portuguese (Univ. of Sao Paulo, Brazil)
  UNL_Russian (Institute of Linguistics, Moscow)
  UNL_Spanish (UPM, Madrid)
World-wide Universal Networking Language (UNL) Project
[Figure: UNL, a language independent meaning representation, at the centre, linked to English, Russian, Japanese, Hindi, Spanish, Marathi and others.]
Foundations and Applications
UNL Foundations
  Semantic Relations
  Universal Words
  Attributes
  How to write UNL expressions
UNL Applications
  Machine Translation: Rule based and Statistical
  Search
  Text Entailment
  Sentiment Analysis
UNL represents knowledge: John eats rice with a spoon
[Figure: UNL graph of the sentence, with callouts for semantic relations, attributes and Universal Words.]
Repository of 42 semantic relations and 84 attribute labels
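A minimal sketch of how such a graph could be held in a program: each edge is a (relation, head, dependent) triple. The three expressions for this sentence are an assumption based on the agt, obj and ins relations defined later in the lecture; the slide's own graph is not recoverable from the transcript. The names `Relation` and `to_unl` are hypothetical.

```python
from collections import namedtuple

# One edge of a UNL graph: a relation label between two Universal Words.
Relation = namedtuple("Relation", ["label", "head", "dependent"])

# Assumed analysis of "John eats rice with a spoon" (not taken from the slide).
sentence_graph = [
    Relation("agt", "eat.@entry", "John"),   # John initiates the action
    Relation("obj", "eat.@entry", "rice"),   # rice is the affected thing
    Relation("ins", "eat.@entry", "spoon"),  # spoon is the instrument
]

def to_unl(relations):
    """Render the triples in the relation(head, dependent) notation."""
    return "\n".join(f"{r.label}({r.head}, {r.dependent})" for r in relations)

print(to_unl(sentence_graph))
```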
Sentence embeddings
Deepa claimed that she had composed a poem.
[UNL]
agt(claim.@entry.@past, Deepa)
obj(claim.@entry.@past, :01)
agt:01(compose.@past.@entry.@complete, she)
obj:01(compose.@past.@entry.@complete, poem.@indef)
[\UNL]
English sentences: basic structure
A <verb> B John eats bread agt(eat.@entry,
John) obj(eat.@entry,
bread) A <verb>
John sleeps aoj(sleep.@entry,
John) A <be> B
John is good aoj(good.@entry,
John)
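[Figure: graph templates for the basic structures. In one, a verb node is linked to A by relation R1 and to B by relation R2; in the other, a verb node is linked to A by aoj.]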
Hindi sentences: basic structure
A B <verb>
  John roti khaataa hai
  agt(eat.@entry, John)
  obj(eat.@entry, bread)
A <verb>
  John sotaa hai
  aoj(sleep.@entry, John)
A <be> B
  John acchaa hai
  aoj(good.@entry, John)
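[Figure: graph templates for the basic structures. In one, a verb node is linked to A by relation R1 and to B by relation R2; in the other, a verb node is linked to A by aoj.]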
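The English and Hindi slides give the same UNL expressions for "John eats bread" and "John roti khaataa hai"; only the surface order differs (A <verb> B versus A B <verb>). A toy sketch of that idea follows. The `LEXICON` table with ready-inflected word forms and the function `linearize` are hypothetical and stand in for a real generator, which would also handle morphology.

```python
# Ready-inflected surface forms; a real generator would derive these.
LEXICON = {
    "en": {"John": "John", "eat": "eats", "bread": "bread"},
    "hi": {"John": "John", "eat": "khaataa hai", "bread": "roti"},
}

def headword(term):
    """Drop attributes such as .@entry from a UNL term."""
    return term.split(".@")[0]

def linearize(relations, lang):
    """Order agent, verb and object as A <verb> B (en) or A B <verb> (hi)."""
    verb = headword(relations[0][1])
    args = {label: headword(dep) for label, _head, dep in relations}
    words = LEXICON[lang]
    a, b, v = words[args["agt"]], words[args["obj"]], words[verb]
    return " ".join([a, v, b] if lang == "en" else [a, b, v])

unl = [("agt", "eat.@entry", "John"), ("obj", "eat.@entry", "bread")]
print(linearize(unl, "en"))
print(linearize(unl, "hi"))
```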
Complex English sentences: Use recursion on the basic structure
A <verb> B
John who is a good boy eats bread which is toasted
agt(eat.@entry, :01)
obj(eat.@entry, :02)
aoj:01(boy, John.@entry)
mod:01(boy, good)
obj:02(toast, bread.@entry.@focus)
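[Figure: UNL graph. eat is linked by agt to scope :01 (boy, John and good, with relations aoj and mod) and by obj to scope :02 (toast and bread, with relation obj). Red arrows indicate entry nodes.]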
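A sketch of how the recursion could be followed in code: relations are grouped by scope, and each scope's entry node is the UW carrying @entry. The tuple layout and the function names are hypothetical. The toast relation is placed in scope :02 here, on the assumption that it belongs to the scope referenced by obj(eat.@entry, :02).

```python
# (relation, scope, left, right); scope None is the main sentence.
relations = [
    ("agt", None, "eat.@entry", ":01"),
    ("obj", None, "eat.@entry", ":02"),
    ("aoj", "01", "boy", "John.@entry"),
    ("mod", "01", "boy", "good"),
    ("obj", "02", "toast", "bread.@entry.@focus"),  # assumption: scope :02
]

def entry_nodes(relations):
    """Map each scope to the headword of the UW marked with @entry."""
    entries = {}
    for _label, scope, left, right in relations:
        for term in (left, right):
            parts = term.split(".")
            if "@entry" in parts[1:]:
                entries.setdefault(scope or "main", parts[0])
    return entries

def resolve(term, entries):
    """Replace a scope reference such as ':01' by that scope's entry node."""
    if term.startswith(":"):
        return entries[term[1:]]
    return term.split(".")[0]

entries = entry_nodes(relations)
print(entries)
print(resolve(":01", entries), resolve(":02", entries))
```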
Constituents of Universal Networking Language
  Universal Words (UWs)
  Relations
  Attributes
  Knowledge Base
What is a Universal Word (UW)?
Words of UNL
Constitute the UNL vocabulary, the syntactic-semantic units used to form UNL expressions
A UW represents a concept
  Basic UW (an English word/compound word/phrase with no restrictions or Constraint List)
  Restricted UW (with a Constraint List)
Examples: “crane(icl>device)”, “crane(icl>bird)”
The Lexicon
Format of the dictionary entry
[headword] {} “Universal word” (Attribute list);
e.g., [minister] {} “minister(icl>person)” (N,ANIMT,PHSCL,PRSN);
  Head word: minister
  Universal word: minister(icl>person)
  Attributes: N,ANIMT,PHSCL,PRSN
Attributes
  Morphological - Pl (plural), V_ed (past tense form)
  Syntactic - V (verb), VOA (verb of action)
  Semantic - ANIMT (animate), PLACE, TIME
The Lexicon (contd.)
Content words:
[forward] {} “forward(icl>send)” (V,VOA) <E,0,0>;
[mail] {} “mail(icl>message)” (N,PHSCL,INANI) <E,0,0>;
[minister] {} “minister(icl>person)” (N,ANIMT,PHSCL,PRSN) <E,0,0>;
Headword Universal Word Attributes
He forwarded the mail to the minister.
The Lexicon (contd.)
Function words:
[he] {} “he” (PRON,SUB,SING,3RD)
[the] {} “the” (ART,THE) <E,0,0>;
[to] {} “to” (PRE,#TO) <E,0,0>;
Headword Universal Word Attributes
He forwarded the mail to the minister.
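The entry format [headword] {} “Universal word” (Attribute list) can be read mechanically. The sketch below is one possible reading, not the format's official grammar: the regular expression and `parse_entry` are assumptions, and the trailing <E,0,0> group is kept as an uninterpreted list of flags because the slides do not explain its fields.

```python
import re

ENTRY = re.compile(
    r'\[(?P<headword>[^\]]*)\]\s*\{\}\s*'   # [headword] {}
    r'["“](?P<uw>[^"”“]+)["”“]\s*'          # quoted Universal Word
    r'\((?P<attrs>[^)]*)\)\s*'              # (attribute list)
    r'(?:<(?P<flags>[^>]*)>)?\s*;?\s*$'     # optional <...> group and ';'
)

def split_list(text):
    """Split a comma-separated list, dropping empty items."""
    return [item.strip() for item in (text or "").split(",") if item.strip()]

def parse_entry(line):
    """Parse one dictionary entry into its named parts."""
    match = ENTRY.match(line.strip())
    if match is None:
        raise ValueError(f"not a dictionary entry: {line!r}")
    return {
        "headword": match.group("headword"),
        "uw": match.group("uw"),
        "attributes": split_list(match.group("attrs")),
        "flags": split_list(match.group("flags")),
    }

print(parse_entry('[forward] {} "forward(icl>send)" (V,VOA) <E,0,0>;'))
print(parse_entry('[he] {} "he" (PRON,SUB,SING,3RD)'))
```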
Multilingual dictionary
Universal word: farmer(icl>creator)
Language   Head word   Attributes
E          farmer      N,ANIMT,FAUNA,MML,PRSN
M          शेतकरी      N,M,ANIMT,FAUNA,MML,PRSN
H          किसान       N,M,ANIMT,FAUNA,MML,PRSN,Na
The Features of a UW
Every concept existing in any language must correspond to a UW
The constraint list should be as small as necessary to disambiguate the headword
Every UW should be defined in the UNL Knowledge-Base (now WordNet)
Restricted UWs
Examples
  He will hold office until the spring of next year.
  The spring was broken.
Restricted UWs, which are headwords with a constraint list, for example:
  “spring(icl>season)”
  “spring(icl>device)”
  “spring(icl>jump)”
  “spring(icl>fountain)”
How to create UWs?
Pick a concept
  the concept of “crane” as “a device for lifting heavy loads”
  or as “a long-legged bird that wades in water in search of food”
Choose an English word for the concept.
  In the case of “crane”, since it is a word of English, the corresponding word should be ‘crane’
Choose a constraint list for the word.
  ‘crane(icl>device)’
  ‘crane(icl>bird)’
Example: Hindi word ghar
ghar - house
  usne garmii me ghar kii marammat kii
  he renovated the house in the summer
ghar - home
  office ke baad ghar louto
  return home after office
ghar - family
  bade ghar kii betii
  girl from a renowned family
Example: ghar (contd.)
ghar - own country
  bahut saal bidesh me kaam karke ghar louta aayaa
  returned home after working abroad for many years
ghar - astrological position
  ashtam ghar par budh hai
  Mercury in the eighth house
House in English Wordnet
1. (1029) house -- (a dwelling that serves as living quarters for one or more families; "he has a house on Cape Cod"; "she felt she had to get out of the house")
3. (51) house -- (a building in which something is sheltered or located; "they had a large carriage house")
4. (39) family, household, house, home, menage -- (a social unit living together; "he moved his family to Virginia"; "It was a good Christian household")
House in English Wordnet
7. (13) house -- (aristocratic family line; "the House of York")
11. sign of the zodiac, star sign, sign, mansion, house, planetary house -- ((astrology) one of 12 equal areas into which the zodiac is divided)
Unambiguous construction of UWs
Use constraints: Ontological, Semantic and Argument
Example: forward a mail to the minister
forward(icl>do, icl>send, agt>thing(icl>animate), obj>thing(icl>inanimate), gol>thing)
Constraint types:
  icl>do: ontological
  icl>send: semantic
  agt>thing, obj>thing, gol>thing: argument
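A restricted UW can be taken apart into its headword and constraint list. The sketch below splits the list only at top-level commas, so a nested constraint such as agt>thing(icl>animate) stays in one piece. The function names are hypothetical. Grouping is by the label before ">"; whether an icl constraint is ontological or semantic cannot be decided from the syntax alone, so the sketch does not try.

```python
from collections import defaultdict

def split_top_level(text):
    """Split on commas that are not inside parentheses."""
    parts, current, depth = [], [], 0
    for ch in text:
        if ch == "," and depth == 0:
            parts.append("".join(current).strip())
            current = []
            continue
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        current.append(ch)
    if current:
        parts.append("".join(current).strip())
    return parts

def parse_uw(uw):
    """Return (headword, constraints); a basic UW has no constraints."""
    head, sep, rest = uw.partition("(")
    if not sep:
        return head, []
    if not rest.endswith(")"):
        raise ValueError(f"unbalanced constraint list: {uw!r}")
    return head, split_top_level(rest[:-1])

def group_by_label(constraints):
    """Group constraints by the label before '>' (icl, agt, obj, gol, ...)."""
    groups = defaultdict(list)
    for constraint in constraints:
        groups[constraint.split(">", 1)[0]].append(constraint)
    return dict(groups)

head, constraints = parse_uw(
    "forward(icl>do, icl>send, agt>thing(icl>animate), "
    "obj>thing(icl>inanimate), gol>thing)"
)
print(head, group_by_label(constraints))
```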
UNL Relations
Relations constitute the syntax of UNL
Express how concepts (UWs) constitute a sentence
Represented as strings of 3 characters or fewer
A set of 41 relations specified in UNL (e.g., agt, aoj, ben, gol, obj, plc, src, tim,…)
Refer to a semantic role between two lexical items in a sentence
AGT / AOJ / OBJ
AGT (Agent)
  Definition: Agt defines a thing which initiates an action
AOJ (Thing with attribute)
  Definition: Aoj defines a thing which is in a state or has an attribute
OBJ (Affected thing)
  Definition: Obj defines a thing in focus which is directly affected by an event or state
Examples
John broke the window.
  agt(break.@entry.@past, John)
This flower is beautiful.
  aoj(beautiful.@entry, flower)
He blamed John for the accident.
  obj(blame.@entry.@past, John)
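Example: UNL Graph with agt, obj, ben
He carved a toy for the baby.
[Figure: UNL graph. carve(icl>cut), marked @entry @past, has the relations agt, obj and ben over the nodes he(iof>person), toy(icl>plaything) and baby(icl>child); the graph also carries a @def attribute.]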
GOL / SRC
GOL (Goal: final state)
  Definition: Gol defines the final state of an object or the thing finally associated with an object of an event
SRC (Source: initial state)
  Definition: Src defines the initial state of an object or the thing initially associated with an object of an event
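GOL
I deposited my money in my bank account.
[Figure: UNL graph. deposit(icl>put), marked @entry @past, has the relations agt, obj and gol over the nodes I, money(icl>currency) and account(icl>statement); three mod relations attach the modifiers bank(icl>possession), I and I.]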
SRC
They make a small income from fishing.
[Figure: UNL graph. make(icl>do), marked @entry @present, has the relations agt, obj and src over the nodes they(icl>persons), income(icl>gain) and fishing(icl>business); a mod relation attaches small(aoj>thing).]
PUR
PUR (Purpose or objective)
  Definition: Pur defines the purpose or objectives of the agent of an event or the purpose for which a thing exists
This budget is for food.
  pur(food.@entry, budget)
  mod(budget, this)
RSN
RSN (Reason)
  Definition: Rsn defines a reason why an event or a state happens
They selected him for his honesty.
  agt(select(icl>choose).@entry, they)
  obj(select(icl>choose).@entry, he)
  rsn(select(icl>choose).@entry, honesty)
TIM
TIM (Time)
  Definition: Tim defines the time an event occurs or a state is true
I wake up at noon.
  agt(wake up.@entry, I)
  tim(wake up.@entry, noon(icl>time))
PLC
PLC (Place)
  Definition: Plc defines the place an event occurs or a state is true or a thing exists
Temples are very famous in India.
  aoj(famous.@entry, temple.@pl)
  man(famous.@entry, very)
  plc(famous.@entry, India)
INS
INS (Instrument)
  Definition: Ins defines the instrument to carry out an event
I solved it with a computer.
  agt(solve.@entry.@past, I)
  ins(solve.@entry.@past, computer)
  obj(solve.@entry.@past, it)
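INS
John covered the baby with a blanket.
[Figure: UNL graph. cover(icl>do), marked @entry @past, has the relations agt, obj and ins over the nodes John(iof>person), baby(icl>child) and blanket(icl>object); the graph also carries a @def attribute.]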
Attributes
Constitute syntax of UNL
Play the role of bridging the conceptual world and the real world in the UNL expressions
Show how and when the speaker views what is said and with what intention, feeling, and so on
Seven types:
  Time with respect to the speaker
  Aspects
  Speaker’s view of reference
  Speaker’s emphasis, focus, topic, etc.
  Convention
  Speaker’s attitudes
  Speaker’s feelings and viewpoints
Tense: @past
The past tense is normally expressed by @past
{unl}
agt(go.@entry.@past, he)
…
{/unl}
He went there yesterday
Aspects: @progress
{unl}
man(rain.@entry.@present.@progress, hard)
{/unl}
It’s raining hard.
Speaker’s view of reference
@def (Specific concept (already referred))
  The house on the corner is for sale.
@indef (Non-specific class)
  There is a book on the desk
@not is always attached to the UW which is negated.
  He didn’t come.
  agt(come.@entry.@past.@not, he)
Speaker’s emphasis
@emphasis
  John his name is.
  mod(name, he)
  aoj(John.@emphasis.@entry, name)
@entry denotes the entry point or main UW of a UNL expression
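Since attributes are appended to a UW with ".@", they can be separated from it by a simple split. The sketch below does that and looks up a short note for the attributes mentioned in these slides. The table `ATTRIBUTE_NOTES` is a hypothetical summary of the slides, not the full set of 84 attribute labels.

```python
# Notes paraphrased from the slides; only the attributes shown there.
ATTRIBUTE_NOTES = {
    "@entry": "entry point or main UW of the expression",
    "@past": "past tense",
    "@progress": "aspect: in progress",
    "@def": "specific concept (already referred)",
    "@indef": "non-specific class",
    "@not": "negation of the UW it is attached to",
    "@emphasis": "speaker's emphasis",
}

def split_attributes(term):
    """Separate a UW from the attributes attached to it."""
    uw, *attrs = term.split(".@")
    return uw, ["@" + attr for attr in attrs]

def describe(term):
    """Look up a note for each attribute on the term."""
    _uw, attrs = split_attributes(term)
    return {a: ATTRIBUTE_NOTES.get(a, "not covered in this sketch") for a in attrs}

print(split_attributes("come.@entry.@past.@not"))
print(describe("John.@emphasis.@entry"))
```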
How to generate UNL
Early Enco (1996-98)
Analysis windows - Two in number
  Left Analysis Window (LAW)
  Right Analysis Window (RAW)
Condition windows - Many in number
  Left Condition Window (LCW)
  Right Condition Window (RCW)
[Figure: a sentence shown as the word sequence Word1, Word2, Word3, Word4, …, Wordn, with the windows LCW, LAW, RAW and RCW placed over it.]
UNL Rule for a Semantic Relation
;Create relation between V and N2, after resolving the preposition preceding N2
<{V,VOA,:::}{N,TIME,DAY,ONRES,PRERES::tim:}P25;
IF
  the left analysis window is on a verb (V) which is a verb of action (VOA)
AND
  the right analysis window is on a noun (N) that has the TIME and DAY attributes and for which the preceding preposition (on) has been processed and deleted
THEN
  set up the tim relation between V and N2 (indicated by < at the start of the rule)
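The IF/THEN reading of the rule can be mimicked in a few lines. This is only an illustration of the condition check, not Enco's rule engine or its rule syntax: the dictionary layout of a window, the function `apply_tim_rule` and the example words "go" and "Monday" are all hypothetical.

```python
# Attributes required on each analysis window, copied from the rule text.
LEFT_REQUIRED = {"V", "VOA"}
RIGHT_REQUIRED = {"N", "TIME", "DAY", "ONRES", "PRERES"}

def apply_tim_rule(left_window, right_window):
    """Return a tim relation if both windows satisfy the rule, else None."""
    if (LEFT_REQUIRED <= left_window["attrs"]
            and RIGHT_REQUIRED <= right_window["attrs"]):
        return ("tim", left_window["uw"], right_window["uw"])
    return None

# Hypothetical window contents after the preposition "on" was processed.
verb = {"uw": "go", "attrs": {"V", "VOA"}}
day = {"uw": "Monday", "attrs": {"N", "TIME", "DAY", "ONRES", "PRERES"}}
place = {"uw": "school", "attrs": {"N", "PLACE"}}

print(apply_tim_rule(verb, day))
print(apply_tim_rule(verb, place))
```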
UNL generation using NLP tools and resources
SRS based system
Multi parser based system
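Evaluation
$$\text{Recall} = \frac{\#\,\text{expressions matched in gold and generated UNL}}{\#\,\text{expressions expected in gold UNL}}$$
$$\text{Precision} = \frac{\#\,\text{expressions matched in gold and generated UNL}}{\#\,\text{expressions in generated UNL}}$$
$$F_1\ \text{score} = \frac{2 \cdot \text{recall} \cdot \text{precision}}{\text{recall} + \text{precision}}$$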
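A small sketch of these three measures over sets of UNL expressions. It assumes that "matched" means exact string equality between a gold and a generated expression, which the slide does not state; the function name `evaluate` and the sample generated output are hypothetical.

```python
def evaluate(gold, generated):
    """Return (recall, precision, f1) for two collections of UNL expressions."""
    gold, generated = set(gold), set(generated)
    matched = len(gold & generated)  # assumption: exact string match
    recall = matched / len(gold) if gold else 0.0
    precision = matched / len(generated) if generated else 0.0
    total = recall + precision
    f1 = 2 * recall * precision / total if total else 0.0
    return recall, precision, f1

gold = ["agt(eat.@entry, John)", "obj(eat.@entry, bread)"]
# Hypothetical system output with one wrong relation label.
generated = ["agt(eat.@entry, John)", "aoj(eat.@entry, bread)"]

print(evaluate(gold, generated))
```

With one of two expressions matched on each side, recall, precision and F1 all come out as 0.5.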
Comparison between the two systems
Table Name                 Accuracy of XLE Parser Based System   Accuracy of Multi-parser based system
evalTb_OXF_V_TO_INF        0.8376                                0.8591
evalTb_OXF_VN_TO_INF       0.8369                                0.8429
evalTb_OXF_S_TO_DO_VERB    0.7833                                0.7833
evalTb_XTAG                0.7181                                0.7835
evalTb_FRAMENET            0.6618                                0.7591
evalTb_RADFORD             0.8141                                0.8542
evalTb_V                   0.5920                                0.7587
evalTb_VN                  0.7528                                0.7625
evalTb_VNN                 0.7692                                0.7902
evalTb_VING                0.7084                                0.7084
evalTb_VADJ                0.5486                                0.6214
evalTb_VINF                0.7236                                0.7772
evalTb_VTHAT               0.7988                                0.7999
evalTb_TOI_Education       0.3875                                0.3669
evalTb_test                0.4667                                0.4667
evalTb_demo                1.0000                                1.0000
evalTb_Test2               0.3913                                0.5116
evalTb_t3                  0.7155                                0.8553
evalTb_Barcelona           0.3194                                0.3181
Total                      0.6489                                0.7010
Language Processing & Understanding
Information Extraction: Part of Speech tagging, Named Entity Recognition, Shallow Parsing, Summarization
Machine Learning: Semantic Role labeling, Sentiment Analysis, Text Entailment (web 2.0 applications), using graphical models, support vector machines, neural networks
IR: Cross Lingual Search, Crawling, Indexing, Multilingual Relevance Feedback
Machine Translation: Statistical, Interlingua Based, English-Indian languages, Indian languages-Indian languages, Indowordnet
Resources: http://www.cfilt.iitb.ac.in
Publications: http://www.cse.iitb.ac.in/~pb
Linguistics is the eye and computation the body
Use of UNL in multiple NLP tasks
Summing up
Some NLP milestones covered
  WSD: various approaches
  SMT
  Parsing (classical and probabilistic)
  Phonology, Phonetics, Syllabification, Transliteration
  Semantics, UNL
Assignments: to reinforce understanding of lectures
Important topics left out: IR, Similarity measures
Seminars: wide range of topics for breadth and exposure
Lectures: Foundation and depth
God Bless!!