WORDNET Approach on word sense techniques - AKILAN VELMURUGAN.


What is WORDNET?

Machine-readable semantic dictionary

Interlinked by semantic relations

Developed by Princeton University

Large lexical database for the English language

Language forms a scale-free network with a small average shortest path, having words as nodes and concepts as links

source: http://wordnet.princeton.edu/
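The "small average shortest path" property can be measured with a breadth-first search. The sketch below is an illustration under stated assumptions: the three concepts in `CONCEPTS` are hypothetical toy data, not WordNet content, and two words are linked whenever they share a concept (a literal reading of "concepts as links"). It averages the shortest-path length over all ordered pairs of distinct words that can reach each other.

```python
from collections import deque
from itertools import combinations

# Hypothetical toy concepts (not WordNet data): concept id -> words expressing it
CONCEPTS = {
    "c1": ["bait", "decoy", "lure"],
    "c2": ["lure", "entice", "tempt"],
    "c3": ["tempt", "invite"],
}


def build_word_graph(concepts):
    """Words are nodes; two words are linked when they share a concept."""
    graph = {}
    for words in concepts.values():
        for w in words:
            graph.setdefault(w, set())
        for a, b in combinations(words, 2):
            graph[a].add(b)
            graph[b].add(a)
    return graph


def bfs_distances(graph, start):
    """Shortest-path length (in links) from start to every reachable word."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in graph[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + 1
                queue.append(neighbour)
    return dist


def average_shortest_path(graph):
    """Mean distance over all ordered pairs of distinct, mutually reachable words."""
    total = pairs = 0
    for source in graph:
        for target, d in bfs_distances(graph, source).items():
            if target != source:
                total += d
                pairs += 1
    return total / pairs if pairs else 0.0


graph = build_word_graph(CONCEPTS)
print(round(average_shortest_path(graph), 3))  # 1.667 for this toy graph
```

Unreachable pairs are simply skipped here; on a real lexicon one would usually restrict the measurement to the largest connected component.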

Use of WordNet:
- Easily navigable
- Used as an online dictionary for English
- Freely available to the public

Structure showing relations, organized by part of speech (noun, verb, adjective, adverb):
- synonym
- hypernym (is a kind of …)
- hyponym (… is a kind of)
- troponym (particular ways to …)
- meronym (parts of …)
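To make the relation types concrete, here is a small sketch over a hand-built, hypothetical lexicon (the synset ids and links are illustrative, not taken from WordNet). Only hypernym and part-meronym links are stored; hyponyms (for nouns) and troponyms (for verbs) are derived by inverting the hypernym link, which is how the two relations mirror each other.

```python
# Hypothetical toy lexicon, not real WordNet data: synset id -> (part of speech, synonyms)
SYNSETS = {
    "dog": ("noun", ["dog", "domestic dog"]),
    "canine": ("noun", ["canine", "canid"]),
    "animal": ("noun", ["animal", "beast"]),
    "tail": ("noun", ["tail"]),
    "walk": ("verb", ["walk"]),
    "stroll": ("verb", ["stroll", "saunter"]),
}

# Only hypernym ("is a kind of") and part-meronym links are stored;
# hyponyms and troponyms are derived by inverting the hypernym links.
HYPERNYM = {"dog": "canine", "canine": "animal", "stroll": "walk"}
PART_MERONYMS = {"dog": ["tail"]}


def synonyms(synset):
    return SYNSETS[synset][1]


def hypernym(synset):
    """X is a kind of hypernym(X); None at the top of the hierarchy."""
    return HYPERNYM.get(synset)


def _children(synset, pos):
    return sorted(c for c, p in HYPERNYM.items() if p == synset and SYNSETS[c][0] == pos)


def hyponyms(synset):
    """Nouns that are a kind of `synset`."""
    return _children(synset, "noun")


def troponyms(synset):
    """Verbs naming particular ways to do `synset`."""
    return _children(synset, "verb")


def meronyms(synset):
    """Parts of `synset`."""
    return PART_MERONYMS.get(synset, [])


print(hypernym("dog"), hyponyms("canine"), troponyms("walk"), meronyms("dog"))
# canine ['dog'] ['stroll'] ['tail']
```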

WORDNET Application

source: http://wordnet.princeton.edu/

A few representations of WORDNET:
- Schema representation
- Graph theory
- Tree structure
- Force graph structure
- WordNet Explorer

Visual interface for WordNet

Represented using RDF Schema and an OWL ontology

WordNet classes and properties are represented with terms such as wn:word and wn:wordsense

Source: www.w3.org/.../WNET/wordnet-sw-20040713.html

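The RDF idea can be sketched without any library as a list of (subject, predicate, object) triples. Only `wn:word` and `wn:wordsense` come from the slide; `rdf:type` is the standard RDF typing predicate, and every `ex:` name is hypothetical. A real system would load the published RDF/OWL files with an RDF toolkit (for example rdflib in Python) and query them with SPARQL.

```python
# A tiny RDF-style graph as (subject, predicate, object) triples.
# wn:word and wn:wordsense are the classes named on the slide; the ex: terms are hypothetical.
TRIPLES = [
    ("ex:word-fly", "rdf:type", "wn:word"),
    ("ex:word-fly", "ex:lexicalForm", "fly"),
    ("ex:sense-fly-insect", "rdf:type", "wn:wordsense"),
    ("ex:sense-fly-lure", "rdf:type", "wn:wordsense"),
    ("ex:word-fly", "ex:hasSense", "ex:sense-fly-insect"),
    ("ex:word-fly", "ex:hasSense", "ex:sense-fly-lure"),
]


def objects(triples, subject, predicate):
    """All o such that (subject, predicate, o) is in the graph."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def instances(triples, cls):
    """All subjects typed as cls."""
    return [s for s, p, o in triples if p == "rdf:type" and o == cls]


print(objects(TRIPLES, "ex:word-fly", "ex:hasSense"))
print(instances(TRIPLES, "wn:wordsense"))
```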

Represented using graph theory

Can be a directed or an undirected graph

Source: nodebox.net/code/index.php/Graph

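The practical difference between the two graph types shows up in reachability. In this sketch the hypernym edges are hypothetical, borrowed from the fish-lure example used later in the slides: followed as directed links, "fly" only reaches its ancestors; treated as undirected, it also reaches its siblings.

```python
# Hypothetical hypernym edges (child -> parent) based on the fish-lure example
EDGES = [("fly", "fish lure"), ("spinner", "fish lure"),
         ("troll", "fish lure"), ("fish lure", "lure")]


def build_graph(edges, directed=True):
    """Adjacency sets; an undirected graph stores every edge in both directions."""
    graph = {}
    for child, parent in edges:
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set())
        if not directed:
            graph[parent].add(child)
    return graph


def reachable(graph, start):
    """Depth-first search: every node reachable from start (start included)."""
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        for neighbour in graph[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


print(sorted(reachable(build_graph(EDGES), "fly")))                  # following hypernyms only
print(sorted(reachable(build_graph(EDGES, directed=False), "fly")))  # any relation direction
```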

Represented using a tree structure

Uses tokens and lexical relations

Source: docs.huihoo.com/nltk/0.9.5/en/ch02.html

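The tree view can be sketched as an indented listing of hyponyms. The tree below reuses the lure / fish lure example from the later set-algebra slides and is otherwise hypothetical; it is a plain recursive traversal, not the NLTK code the slide cites.

```python
# Hypothetical hyponym tree following the slides' lure / fish lure example
TREE = {
    "lure": ["fish lure", "ground bait", "stool pigeon"],
    "fish lure": ["fly", "spinner", "troll"],
}


def render(node, depth=0):
    """Return the subtree rooted at node as indented lines, one token per line."""
    lines = ["  " * depth + node]
    for child in TREE.get(node, []):
        lines.extend(render(child, depth + 1))
    return lines


print("\n".join(render("lure")))
```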

Represented using Force Graph Structure

Presentation of words and meanings as graph nodes, and relations as edges between them

Source: code.google.com/p/synonym/

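A force graph places nodes by simulating forces. The sketch below is a generic spring embedder in the spirit of Fruchterman-Reingold, not the code of the cited synonym project: all node pairs repel, edges pull their endpoints together, and each node moves a bounded step per iteration. The word/meaning nodes and the parameters `k`, `step` and `iterations` are illustrative assumptions.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, k=1.0, step=0.05, seed=0):
    """Spring-embedder sketch: every pair of nodes repels with force k^2/d,
    every edge attracts its endpoints with force d^2/k, and each node moves
    at most `step` per iteration along its net force."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        disp = {n: [0.0, 0.0] for n in nodes}
        # Repulsion between all pairs keeps unrelated nodes apart
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                d = math.hypot(dx, dy) or 1e-9
                f = k * k / d
                disp[a][0] += dx / d * f
                disp[a][1] += dy / d * f
                disp[b][0] -= dx / d * f
                disp[b][1] -= dy / d * f
        # Attraction along edges pulls a word towards its meanings
        for a, b in edges:
            dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
            d = math.hypot(dx, dy) or 1e-9
            f = d * d / k
            disp[a][0] -= dx / d * f
            disp[a][1] -= dy / d * f
            disp[b][0] += dx / d * f
            disp[b][1] += dy / d * f
        # Move each node a bounded step in the direction of its net force
        for n in nodes:
            fx, fy = disp[n]
            length = math.hypot(fx, fy)
            if length > 0:
                move = min(step, length)
                pos[n][0] += fx / length * move
                pos[n][1] += fy / length * move
    return pos


# Hypothetical example: one word linked to two of its meanings
nodes = ["fly", "insect", "fish lure"]
edges = [("fly", "insect"), ("fly", "fish lure")]
for name, (x, y) in force_layout(nodes, edges).items():
    print(f"{name:10s} {x:6.2f} {y:6.2f}")
```

Capping the step keeps the simulation stable without a cooling schedule; production layouts usually shrink the step over time instead.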

Represented in WORDNET Explorer

Applies visual principles to lexical semantics

Source: www.cs.toronto.edu/~ccollins/research/wnVis.htm


Flow of study

Background study on word sense

word ontology

Word Sense Disambiguation

Variable lexical notation for a concept

i-level generic notation

i-level specific notation

Semantic relatedness in WSD

Experiment Results

Thesaurus as a complex network

Visual Interface for wordnet

- WORDNET
- Synsets
- Word ontology
- Set algebra
- Rules for representing lexical notations
- Semantic relatedness between concepts
- Concept distribution statistics
- Degree of semantic relatedness
- WSD (Word Sense Disambiguation)
- SemCor
- Test cases
- WSD on a complex network
- WSD in an English thesaurus
- Future work

Source: http://kylescholz.com/projects/wordnet

WordNet as a common-sense ontology:
- Symbols are words; concept meanings are synsets
- A synset is represented by one or more words, and the words used for the representation are synonyms
- A word can have synonyms and can be polysemous (appear in more than one synset)
- A synset comprises a list of words and a list of semantic relations to other synsets

Part I: a list of words, each with the list of synsets that the word represents

Part II: a set of semantic relations between synsets (is-a, part-of, substance-of, member-of)
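The two parts can be modelled directly as data. In this sketch, which is one possible encoding rather than WordNet's own file format, each `Synset` holds its words and its outgoing relations (Part II), and `build_word_index` derives Part I from them. The synset ids such as `fly#lure` and the relation targets are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """One concept: the synonymous words that express it
    and its semantic relations to other synsets (Part II)."""
    synset_id: str
    words: List[str]
    relations: Dict[str, List[str]] = field(default_factory=dict)  # e.g. "is-a" -> synset ids


def build_word_index(synsets):
    """Part I: map each word to the list of synsets it represents.
    A polysemous word ends up with more than one synset id."""
    index: Dict[str, List[str]] = {}
    for synset in synsets:
        for word in synset.words:
            index.setdefault(word, []).append(synset.synset_id)
    return index


# Hypothetical ids and links, for illustration only
SYNSETS = [
    Synset("fly#insect", ["fly"], {"is-a": ["insect#1"]}),
    Synset("fly#lure", ["fly"], {"is-a": ["fish_lure#1"]}),
    Synset("fish_lure#1", ["fish lure", "fisherman's lure"], {"is-a": ["lure#1"]}),
]

index = build_word_index(SYNSETS)
print(index["fly"])  # ['fly#insect', 'fly#lure']: "fly" is polysemous
```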

WSD: variable lexical notations for a concept

Generic concept notation:

$$D = I \cup J \cup K \;\therefore\; J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)} = D \cap (\overline{I} \cap \overline{K})$$

since

$$B = D \cup E \cup F \;\therefore\; D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)} = B \cap (\overline{E} \cap \overline{F})$$

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

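WSD: variable lexical notations for a concept

$$J = D \cap (\overline{I} \cap \overline{K}) = \big(B \cap (\overline{E} \cap \overline{F})\big) \cap (\overline{I} \cap \overline{K})$$

$$J = B \cap \big((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K})\big)$$

when J = fly, D = fish lure, I = spinner, K = troll. Introducing Boolean operators:

AND for ∩

OR for ∪

NOT for the complement (overbar, as in $\overline{I}$)

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications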
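The identities can be checked on toy sets, assuming (as $D = I \cup J \cup K$ and $B = D \cup E \cup F$ suggest) that sibling synsets are disjoint and together make up their parent. The item names (`f1`, `s1`, …) and the extra item `x1` in the universe are hypothetical; the complement (overbar) is taken relative to that universe.

```python
# Hypothetical extensions: each leaf concept is modelled as a set of items.
fly, spinner, troll = {"f1", "f2"}, {"s1"}, {"t1"}
ground_bait, stool_pigeon = {"g1"}, {"p1"}

fish_lure = fly | spinner | troll              # D = I ∪ J ∪ K
lure = fish_lure | ground_bait | stool_pigeon  # B = D ∪ E ∪ F
universe = lure | {"x1"}                       # assumed universe of discourse


def complement(s):
    """The overbar: everything in the universe that is not in s."""
    return universe - s


J, D, I, K = fly, fish_lure, spinner, troll
B, E, F = lure, ground_bait, stool_pigeon

# One level: J = D - (I ∪ K) = D ∩ complement(I) ∩ complement(K)
one_level = D & complement(I) & complement(K)
# Two levels: J = B ∩ ((complement(E) ∩ complement(F)) ∩ (complement(I) ∩ complement(K)))
two_level = B & ((complement(E) & complement(F)) & (complement(I) & complement(K)))

print(D - (I | K) == one_level == two_level == J)  # True
```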

WSD: variable lexical notations for a concept

(“fly”) becomes:

(“fisherman's lure” OR “fish lure”) AND ( (NOT “spinner”) AND (NOT “troll”) )

Then, with B = lure, E = ground bait and F = stool pigeon, (“fly”) becomes:

(“bait” OR “decoy” OR “lure”) AND ( (NOT “ground bait”) AND (NOT “stool pigeon”) AND ((NOT “spinner”) AND (NOT “troll”)) )

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
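The translation from set notation to a query string is mechanical: OR over the words of the ancestor synset, AND NOT over the excluded siblings. A minimal sketch follows; the function names are hypothetical, and the nested NOT groups of the slide are flattened into a single AND group, which is logically equivalent because AND is associative.

```python
def or_group(terms):
    """Synonyms of the ancestor synset, any of which may match."""
    return "(" + " OR ".join(f'"{t}"' for t in terms) + ")"


def not_group(terms):
    """Sibling synsets that must not match."""
    return "(" + " AND ".join(f'(NOT "{t}")' for t in terms) + ")"


def build_query(ancestor_words, excluded):
    """Hypothetical helper: combine the OR group and the NOT group with AND."""
    return f"{or_group(ancestor_words)} AND {not_group(excluded)}"


print(build_query(["fisherman's lure", "fish lure"], ["spinner", "troll"]))
print(build_query(["bait", "decoy", "lure"],
                  ["ground bait", "stool pigeon", "spinner", "troll"]))
```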

Notation for a synset: i-level generic notation

If $S_k$ is a synset and $F_i$ is the synset located $i$ links away from $S_k$ following the hypernym links, then the i-level generic notation for $S_k$ is:

Note: $F_i$ is the parent node of $F_{i-1}$, $F_{i-1}$ is the parent node of $F_{i-2}$, and so on.
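A sketch of one reading of the i-level generic notation, inferred from the fly/lure example rather than from the missing formula: it assumes the notation keeps the ancestor $F_i$ and excludes (NOTs) the siblings met at every step on the way up. The `PARENT` table mirrors that example and is otherwise hypothetical.

```python
# Hypothetical hypernym table mirroring the fly / fish lure / lure example
PARENT = {
    "fly": "fish lure", "spinner": "fish lure", "troll": "fish lure",
    "fish lure": "lure", "ground bait": "lure", "stool pigeon": "lure",
}


def children(synset):
    return [c for c, p in PARENT.items() if p == synset]


def generic_notation(synset, i):
    """Assumed reading: climb i hypernym links to F_i and collect the siblings
    met on the way, which the notation excludes with NOT."""
    current, excluded = synset, []
    for _ in range(i):
        if current not in PARENT:  # reached the root: stop early
            break
        parent = PARENT[current]
        siblings = [c for c in children(parent) if c != current]
        excluded = siblings + excluded  # higher-level siblings first, as on the slide
        current = parent
    return current, excluded


print(generic_notation("fly", 1))  # ('fish lure', ['spinner', 'troll'])
print(generic_notation("fly", 2))  # ('lure', ['ground bait', 'stool pigeon', 'spinner', 'troll'])
```

Under this reading, the 1-level and 2-level results correspond to the two “fly” queries shown earlier.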

i-level specific notation for a synset

$$J = P \cup Q \cup R$$

when $P = T$, $Q = U$ and $R = V \cup W$,

$$\therefore\; J = T \cup U \cup (V \cup W)$$

If $S$ is a synset and $L_i$ is the set of synsets $C_{ik}$ located $i$ links away from $S$ following the hyponym links, then the i-level specific regular notation for $S$ is:

Note: if $C_{ik}$ is null, then $C_{(i-1)k}$ is used ($C_{(i-1)k}$ is a leaf node in that case).

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

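The i-level specific notation can be read as replacing a synset by the union of its descendants $i$ hyponym links down, with a leaf standing in when a branch is shorter than $i$, as the note says. The sketch below is an assumption-labelled reading of that definition; the `HYPONYMS` table encodes the $J = P \cup Q \cup R$ example with its single-letter placeholder synsets.

```python
# Hypothetical hyponym table encoding J = P ∪ Q ∪ R, P = T, Q = U, R = V ∪ W
HYPONYMS = {"J": ["P", "Q", "R"], "P": ["T"], "Q": ["U"], "R": ["V", "W"]}


def specific_notation(synset, i):
    """Synsets i hyponym links below `synset`; a leaf reached early stands in
    for the missing deeper level (the C_(i-1)k rule)."""
    frontier = [synset]
    for _ in range(i):
        next_frontier = []
        for s in frontier:
            kids = HYPONYMS.get(s, [])
            next_frontier.extend(kids if kids else [s])
        frontier = next_frontier
    return frontier


print(specific_notation("J", 1))  # ['P', 'Q', 'R']
print(specific_notation("J", 2))  # ['T', 'U', 'V', 'W']
```

The notation for the synset is then the OR (union) of the lexical forms of the returned synsets.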

WSD: Semantic relatedness and word sense disambiguation

Procedure for determining the semantic relatedness of two given WordNet synsets

Conception 1: Concepts that appear together more frequently and closer to each other are "more related" than concepts that appear together less frequently and farther apart.

Conception 1 mapped to synset relatedness measurement:
- Concepts → synset lexical notation
- Close or far appearance → whether the notation exists in a web page or not
- Co-occurrence frequency → number of web pages containing the synsets

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
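The slides name the inputs (page counts for each synset notation and for both together) but not the exact formula. As a stand-in, and not necessarily the measure used in the cited paper, a Jaccard-style ratio turns those counts into a score in [0, 1]. How the counts are obtained (for example, by submitting the Boolean notations to a search engine) is left out, and the numbers in the example are made up.

```python
def cooccurrence_relatedness(pages_a, pages_b, pages_both):
    """Jaccard-style stand-in: pages with both notations / pages with either.
    Returns 0.0 when neither notation occurs."""
    if pages_both > min(pages_a, pages_b):
        raise ValueError("pages_both cannot exceed either single count")
    either = pages_a + pages_b - pages_both
    return pages_both / either if either else 0.0


# Made-up page counts, for illustration only
print(cooccurrence_relatedness(1200, 800, 300))  # 300 / 1700
```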

WSD: Semantic relatedness and word sense disambiguation

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications

WSD: tested on four random texts
- i-level generic notation: i = 1, 2, 3
- Size of the context window (target words vs. context words): 3, 5, 7

Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
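The role of the window size can be sketched as follows: take the words around the target and pick the sense whose summed relatedness to them is highest. This is a generic sketch, not the paper's exact procedure. It assumes a window of `size` tokens centred on the target (so size // 2 context words per side), and the relatedness scores and sense ids are hypothetical stand-ins for the web-based measure.

```python
def context_window(tokens, idx, size):
    """`size` tokens centred on the target: size // 2 context words on each side."""
    half = size // 2
    return tokens[max(0, idx - half):idx] + tokens[idx + 1:idx + 1 + half]


def disambiguate(tokens, idx, senses, relatedness, size=3):
    """Pick the sense with the highest summed relatedness to the context words.
    Ties go to the first sense listed."""
    context = context_window(tokens, idx, size)
    return max(senses, key=lambda s: sum(relatedness(s, w) for w in context))


# Hypothetical relatedness scores standing in for the web-based measure
SCORES = {("fly#lure", "cast"): 0.6, ("fly#lure", "trout"): 0.8, ("fly#insect", "buzz"): 0.7}


def relatedness(sense, word):
    return SCORES.get((sense, word), 0.0)


tokens = ["he", "cast", "a", "fly", "to", "the", "trout"]
for size in (3, 5, 7):
    print(size, context_window(tokens, 3, size),
          disambiguate(tokens, 3, ["fly#insect", "fly#lure"], relatedness, size))
```

With the smallest window no informative word is in range and the tie falls back to the first sense, which illustrates why the window size was varied.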

Thesaurus as a complex network

As a directed graph:
- Sinks: the 73,046 terms with kout = 0
- Sources (root words): the 30,260 terms with at least one outgoing link (kout > 0)
- Absolute sources: sources without incoming links (kin = 0)
- Normal sources: kout > 0 and kin > 0
- Bridge sources: sources without outgoing links to root words (kout(source) = 0)

Figure legend: 1 – normal source, 2 – bridge source, 3 – absolute source, 4 – sink

Source: arXiv:cond-mat/0312586 v1 2003
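The node categories above depend only on kin and kout, so they can be assigned in one pass. In this sketch the edge direction (root word pointing to its listed terms) is taken from the definitions, but the precedence among the source types (absolute first, then bridge, then normal) is an assumption, and the four-node edge list is hypothetical.

```python
def degrees(edges):
    """In- and out-degree of every node that appears in an edge."""
    kin, kout = {}, {}
    for a, b in edges:
        kout[a] = kout.get(a, 0) + 1
        kin[b] = kin.get(b, 0) + 1
        kin.setdefault(a, 0)
        kout.setdefault(b, 0)
    return kin, kout


def classify(edges):
    """Label nodes using the slide's categories (the precedence among source types is assumed)."""
    kin, kout = degrees(edges)
    sources = {n for n, k in kout.items() if k > 0}
    labels = {}
    for n in kout:
        if kout[n] == 0:
            labels[n] = "sink"
        elif kin[n] == 0:
            labels[n] = "absolute source"
        elif not any(b in sources for a, b in edges if a == n):
            labels[n] = "bridge source"  # links only to sinks, none to root words
        else:
            labels[n] = "normal source"
    return labels


# Hypothetical thesaurus fragment: A -> B -> C -> D
print(classify([("A", "B"), ("B", "C"), ("C", "D")]))
```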

Thesaurus as a complex network

Frequency of outgoing links

Frequency of incoming links

Source: arXiv:cond-mat/0312586 v1 2003

Thesaurus as a complex network

Figure: incoming vs. outgoing frequency distribution, with kout for root words and kin for all words (series: root words in kout; all words in kin; root words in kin; non-root words in kin)
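Both distributions on these slides are histograms of node degrees. The sketch below counts how many nodes have each kout or kin value, optionally restricted to a subset such as the root words; the edge list is a hypothetical fragment, and plotting on log-log axes (as is usual for scale-free networks) is left out.

```python
from collections import Counter


def degree_frequency(edges, direction="out", subset=None):
    """Frequency distribution {k: number of nodes with that degree}.
    direction is "out" (kout) or "in" (kin); subset restricts the nodes counted,
    e.g. to root words."""
    nodes = {n for edge in edges for n in edge}
    if subset is not None:
        nodes &= set(subset)
    end = 0 if direction == "out" else 1
    degree = Counter(edge[end] for edge in edges)  # missing nodes count as degree 0
    return dict(Counter(degree[n] for n in nodes))


# Hypothetical edge list
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
roots = {a for a, _ in EDGES}  # root words: nodes with kout > 0
print(degree_frequency(EDGES, "out", roots))
print(degree_frequency(EDGES, "in"))
```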

Extensions of WordNet

- Transforming a tree structure into a matrix structure
- WordNet in other languages (Japanese, Korean, Thai)
- ImageNet, interlinked with WordNet
- REBUILDER, a repository of software designs, which retrieves designs using a Bayesian network and WordNet
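The first extension, transforming a tree into a matrix, can be read as building an adjacency matrix; that reading is an assumption, since the slide gives no details. The sketch assigns each synset a row and column in first-seen order and marks parent-to-child (hyponym) links with 1; the lure hierarchy fragment is hypothetical.

```python
def tree_to_matrix(tree):
    """Turn a parent -> children mapping into (node order, adjacency matrix),
    where matrix[i][j] == 1 means node j is a child (hyponym) of node i."""
    nodes = []
    for parent, kids in tree.items():
        for n in [parent, *kids]:
            if n not in nodes:
                nodes.append(n)
    index = {n: i for i, n in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for parent, kids in tree.items():
        for child in kids:
            matrix[index[parent]][index[child]] = 1
    return nodes, matrix


# Hypothetical fragment of the lure hierarchy
nodes, matrix = tree_to_matrix({"lure": ["fish lure", "ground bait"], "fish lure": ["fly"]})
for name, row in zip(nodes, matrix):
    print(f"{name:12s}", row)
```

A dense matrix is simple to inspect but grows quadratically; for the full WordNet hierarchy a sparse representation would be the practical choice.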