WORDNET Approach on word sense techniques - AKILAN VELMURUGAN.
What is WORDNET
Machine readable semantic dictionary
interlinked by semantic relations
Developed by PRINCETON University
Large lexical database for English language
Language forms a scale-free network with a small average shortest path,
with words as nodes and concepts as links
source: http://wordnet.princeton.edu/
Use of wordnet
- Easily navigable
- Used as an online dictionary for English
- Freely available to the public
Structure shows relations in the form of
- noun, verb, adjective, adverb
- synonym
- hypernym (is a kind of …)
- hyponym (… is a kind of)
- troponym (particular ways to …)
- meronym (parts of …)
WORDNET Application
source: http://wordnet.princeton.edu/
Few representations of WORDNET
- Schema representation
- Graph theory
- Tree structure
- Force graph structure
- Wordnet explorer
Visual Interface for wordnet
Using RDF Schema and OWL ontology
Wordnet classes and properties are represented as wn:word and wn:wordsense
Source: www.w3.org/.../WNET/wordnet-sw-20040713.html
Represented using Graph theory
Can be a directed or undirected graph
Source: www.nodebox.net/code/index.php/Graph
Represented using Tree structure
Uses tokens and lexical relations
Source: www.docs.huihoo.com/nltk/0.9.5/en/ch02.html
Represented using Force Graph Structure
Presentation of words and meanings as graph nodes, and relations as edges between them
Source: www.code.google.com/p/synonym/
Represented for WORDNET Explorer
For applying visual principles to Lexical semantics
Source: www.cs.toronto.edu/~ccollins/research/wnVis.htm
Flow of study
Background study on wordsense
word ontology
Word Sense Disambiguation
Variable lexical notation for a concept
i-level generic notation
i-level specific notation
Semantic relatedness in WSD
Experiment Results
Thesaurus as a complex network
Visual Interface for wordnet
WORDNET – synsets – word ontology – set algebra – rules for representing lexical notations – semantic relatedness between concepts – concept distribution statistics – Degree of semantic relatedness :: WSD – Word Sense Disambiguation – semcor – Test cases – WSD on a complex network – WSD in English Thesaurus – Future work
Source: http://kylescholz.com/projects/wordnet
Wordnet – common sense ontology
- Symbols are words
- Concept meanings are synsets
- Each synset is represented by one or more words
- Words used for representation: synonyms
- Synonyms and polysemous words
A synset comprises a list of words and a list of semantic relations to other synsets.
- Part I – list of words, each one with a list of the synsets that the word represents
- Part II – set of semantic relations between synsets (is-a, part-of, substance-of, member-of)
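The two-part layout described above can be sketched as a small in-memory structure. This is only an illustration: the `Synset` class, the synset names such as `fly.n`, and the `build_word_index` helper are hypothetical and are not the actual WordNet file format or any library API. The words are toy data based on the "fly" example used later in these slides.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Synset:
    """One concept: its synonym words plus typed links to other synsets."""
    name: str                      # hypothetical identifier, e.g. "fly.n"
    words: List[str]               # synonyms that express the concept
    relations: Dict[str, List[str]] = field(default_factory=dict)


# Part II: synsets with their semantic relations (toy data, not real WordNet).
synsets = {
    "lure.n": Synset("lure.n", ["bait", "decoy", "lure"]),
    "fish_lure.n": Synset("fish_lure.n", ["fisherman's lure", "fish lure"],
                          {"is-a": ["lure.n"]}),
    "fly.n": Synset("fly.n", ["fly"], {"is-a": ["fish_lure.n"]}),
    "fly.v": Synset("fly.v", ["fly", "wing"]),
}


def build_word_index(all_synsets):
    """Part I: map each word to the list of synsets it can represent."""
    index = {}
    for synset in all_synsets.values():
        for word in synset.words:
            index.setdefault(word, []).append(synset.name)
    return index


word_index = build_word_index(synsets)
print(word_index["fly"])                  # a polysemous word maps to several synsets
print(synsets["fly.n"].relations["is-a"])
```

A word that appears in more than one synset, like "fly" here, is polysemous, which is the situation word sense disambiguation has to resolve.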
WSD: variable lexical notations for a concept
Generic concept notation:
$$D = I \cup J \cup K$$
$$\therefore J = D - (I \cup K) = (D - I) \cap (D - K) = D \cap \overline{(I \cup K)}$$
$$J = D \cap (\overline{I} \cap \overline{K})$$
since $B = D \cup E \cup F$,
$$D = B - (E \cup F) = (B - E) \cap (B - F) = B \cap \overline{(E \cup F)}$$
$$D = B \cap (\overline{E} \cap \overline{F})$$
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
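WSD: variable lexical notations for a concept
$$J = D \cap (\overline{I} \cap \overline{K}) = (B \cap (\overline{E} \cap \overline{F})) \cap (\overline{I} \cap \overline{K})$$
$$J = B \cap ((\overline{E} \cap \overline{F}) \cap (\overline{I} \cap \overline{K}))$$
when J = fly, D = fish lure, I = spinner, K = troll
And introducing boolean operators:
AND for ∩
OR for ∪
NOT for the overbar (complement)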
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
WSD: variable lexical notations for a concept
(“fly”) becomes:
(“fisherman's lure” OR “fish lure”) AND ((NOT “spinner”) AND (NOT “troll”))
then B = lure,
E = ground bait,
F = stool pigeon
(“fly”) becomes:
(“bait” OR “decoy” OR “lure”) AND (((NOT “ground bait”) AND (NOT “stool pigeon”)) AND ((NOT “spinner”) AND (NOT “troll”)))
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
Notation for synset
i-level generic notation for a synset
If $S_k$ is a synset and $F_i$ is the synset located i links away from $S_k$ following the hypernym links, then the i-level generic notation for $S_k$ is:
Note: $F_i$ is the parent node of $F_{i-1}$, $F_{i-1}$ is the parent node of $F_{i-2}$, …
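The "fly" example from the earlier slides suggests how an i-level generic notation can be assembled: climb i hypernym links, OR together the words of the synset reached, and AND in the negated siblings met at each level. The sketch below follows that reading. The `WORDS` and `PARENT` tables are toy data taken from the slide example, and the function names are hypothetical. It is not the paper's implementation.

```python
# Toy data from the "fly" example: words per synset and hypernym (parent) links.
WORDS = {
    "fly": ["fly"],
    "fish lure": ["fisherman's lure", "fish lure"],
    "lure": ["bait", "decoy", "lure"],
    "spinner": ["spinner"],
    "troll": ["troll"],
    "ground bait": ["ground bait"],
    "stool pigeon": ["stool pigeon"],
}
PARENT = {
    "fly": "fish lure", "spinner": "fish lure", "troll": "fish lure",
    "fish lure": "lure", "ground bait": "lure", "stool pigeon": "lure",
}


def siblings(node):
    """Other synsets that share the same hypernym as `node`."""
    parent = PARENT[node]
    return [n for n, p in PARENT.items() if p == parent and n != node]


def generic_notation(synset, i):
    """Build a boolean query by climbing up to i hypernym links from `synset`."""
    node = synset
    exclusions = []  # one NOT-group per level, innermost level first
    for _ in range(i):
        if node not in PARENT:  # reached the top of the toy hierarchy
            break
        sibs = siblings(node)
        if sibs:
            exclusions.append(
                " AND ".join(f'(NOT "{w}")' for s in sibs for w in WORDS[s]))
        node = PARENT[node]
    head = "(" + " OR ".join(f'"{w}"' for w in WORDS[node]) + ")"
    groups = ["(" + g + ")" for g in reversed(exclusions)]  # outermost first
    if not groups:
        return head
    body = groups[0] if len(groups) == 1 else "(" + " AND ".join(groups) + ")"
    return head + " AND " + body


print(generic_notation("fly", 1))
print(generic_notation("fly", 2))
```

With this toy data, levels 1 and 2 reproduce the two boolean expressions given for "fly" on the slides.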
i-level specific notation for a synset
$$J = P \cup Q \cup R$$
when $P = T$, $Q = U$, $R = V \cup W$
$$\therefore J = T \cup U \cup (V \cup W)$$
If $S$ is a synset and $L_i$ is the set of synsets $C_{ik}$ located i links away from $S$ following the hyponym links, then the i-level specific regular notation for $S$ is:
Note: if $C_{ik}$ is null, then $C_{(i-1)k}$ is used ($C_{(i-1)k}$ is a leaf node in that case)
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
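The J, P, Q, R example can be traced with a short sketch that walks hyponym links level by level and keeps a leaf when a branch ends early, as the note describes. The `HYPONYMS` table is the toy tree from the slide, the function names are hypothetical, and joining with OR follows the slides' mapping of ∪ to OR.

```python
# Toy hyponym tree from the slide: J -> P, Q, R; P -> T; Q -> U; R -> V, W.
HYPONYMS = {"J": ["P", "Q", "R"], "P": ["T"], "Q": ["U"], "R": ["V", "W"]}


def specific_level(synset, i):
    """Synsets reached i hyponym links below `synset` (leaves are kept as-is)."""
    frontier = [synset]
    for _ in range(i):
        next_frontier = []
        for node in frontier:
            children = HYPONYMS.get(node, [])
            # Leaf fallback: if a node has no hyponyms, reuse the node itself.
            next_frontier.extend(children if children else [node])
        frontier = next_frontier
    return frontier


def specific_notation(synset, i):
    """Join the synsets found at level i with OR (the slides' reading of union)."""
    return " OR ".join(specific_level(synset, i))


print(specific_notation("J", 1))
print(specific_notation("J", 2))
```

Asking for a level deeper than the tree simply returns the leaves again, which mirrors the fallback rule in the note.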
WSD: Semantic relatedness and word sense disambiguation
Procedure for determining the semantic relatedness of two given wordnet synsets
Conception 1: Concepts that appear more frequently and closer to each other are "more related" to each other than concepts that appear less frequently and farther apart.
Conception 1 → Synset relatedness measurement
- concepts → synset lexical notation
- close or far of appearance → exists in a web page or not
- co-occurrence frequency → number of web pages containing synsets
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
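The mapping above reduces relatedness to counting web pages in which both notations occur. The slides do not give the exact formula, so the sketch below swaps in a Jaccard-style ratio (pages containing both terms divided by pages containing either) purely as an assumption. The `PAGES` list is made-up toy text standing in for web search results, and single words stand in for full lexical notations.

```python
# Made-up toy "web pages"; a real system would query a search engine instead.
PAGES = [
    "tie a fly and cast the fish lure near the bank",
    "the trout took the fly before the lure sank",
    "a fly buzzed around the kitchen window",
    "spinner and troll rigs for deep water",
]


def pages_containing(term, pages):
    """Indices of pages in which `term` appears as a whole word."""
    return {idx for idx, text in enumerate(pages) if term in text.split()}


def relatedness(term_a, term_b, pages=PAGES):
    """Assumed Jaccard-style score: co-occurring pages / pages with either term."""
    a = pages_containing(term_a, pages)
    b = pages_containing(term_b, pages)
    either = a | b
    if not either:
        return 0.0
    return len(a & b) / len(either)


print(relatedness("fly", "lure"))
print(relatedness("fly", "kitchen"))
```

On this toy corpus "fly" scores higher with "lure" than with "kitchen", which is the kind of signal a disambiguation step could use to prefer one sense over another.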
WSD: Tested for four random texts
- i-level generic notation (1, 2, 3)
- Size of windows of context: target words vs. context words (3, 5, 7)
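To make the window sizes concrete, the sketch below extracts the context words around a target word. It assumes that a window of size n counts the target itself plus (n - 1) / 2 words on each side. The slides do not state how the window is defined, so this convention, the function name, and the sample sentence are all assumptions.

```python
def context_window(tokens, target_index, size):
    """Context words around tokens[target_index] for an odd window `size`.

    Assumption: the window includes the target plus (size - 1) // 2 words
    on each side; the target itself is left out of the returned context.
    """
    half = (size - 1) // 2
    lo = max(0, target_index - half)
    hi = min(len(tokens), target_index + half + 1)
    return [tok for j, tok in enumerate(tokens[lo:hi], start=lo)
            if j != target_index]


tokens = "he tied a fly to the line".split()
target = tokens.index("fly")
for size in (3, 5, 7):
    print(size, context_window(tokens, target, size))
```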
Source: Proceedings of the 20th International Conference on Advanced Information Networking and Applications
Thesaurus as a complex network
As a Directed Graph:
- sink: the 73,046 terms with kout = 0
- source: the 30,260 terms with at least one outgoing link (kout > 0) – root words
  - absolute source: without incoming links (kin = 0)
  - normal source: kout > 0 and kin > 0
  - bridge source: without outgoing links to root words (kout(source) = 0)
Figure legend: 1 – Normal source, 2 – Bridge source, 3 – Absolute source, 4 – Sink
Source: arXiv:cond-mat/0312586 v1 2003
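The node classes above can be computed from an edge list with in- and out-degree counts. The sketch below is one reading of the definitions. The slide does not say whether the source classes are mutually exclusive, so the order in which the rules are checked (sink, then absolute, then bridge, then normal) is an assumption, and the edge list is toy data.

```python
def classify(edges):
    """Label every node of a directed graph given as (source, target) pairs."""
    out_links = {}
    in_degree = {}
    nodes = set()
    for src, dst in edges:
        out_links.setdefault(src, set()).add(dst)
        in_degree[dst] = in_degree.get(dst, 0) + 1
        nodes.update((src, dst))

    sources = set(out_links)  # root words: nodes with kout > 0
    labels = {}
    for node in nodes:
        if node not in sources:
            labels[node] = "sink"             # kout = 0
        elif in_degree.get(node, 0) == 0:
            labels[node] = "absolute source"  # kin = 0
        elif not (out_links[node] & sources):
            labels[node] = "bridge source"    # no out-links to root words
        else:
            labels[node] = "normal source"    # kout > 0 and kin > 0
    return labels


# Toy graph: a -> b -> c -> d, plus b -> d.
labels = classify([("a", "b"), ("b", "c"), ("c", "d"), ("b", "d")])
for node in sorted(labels):
    print(node, labels[node])
```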
Thesaurus as a complex network
Frequency of outgoing links
Frequency of incoming links
Source: arXiv:cond-mat/0312586 v1 2003
Thesaurus as a complex network
Incoming vs. outgoing frequency distribution
- Kout – for root words
- Kin – for all words
Figure legend: root words in Kout; all words in Kin; root words in Kin; non-root words in Kin
Extension of wordnet
- Transforming a tree structure to a matrix structure
- Wordnet in other languages (Japanese, Korean, Thai)
- ImageNet interlinked with wordnet
- REBUILDER – a repository of software designs; retrieves using a Bayesian network and wordnet