Lecture 18 Ontologies and Wordnet
Topics: Ontologies, Wordnet, Overview of Meaning
Readings: Text 13.5
NLTK book Chapter 2
March 25, 2013
CSCE 771 Natural Language Processing
– 2 – CSCE 771 Spring 2013
Overview
Last Time (Programming)
• Chunking
• Chunking with NLTK
• HW 5
• Project Ideas
Today
• app.ChunkParser under NLTK
Readings:
• Chapter 7
• http://www.nltk.org/howto
• http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
Next Time:
Ontologies – the old meaning
http://www.merriam-webster.com/dictionary/ontology
1. a branch of metaphysics concerned with the nature and relations of being
2. a particular theory about the nature of being or the kinds of things that have existence
Ontologies – the new (CS) meaning
http://en.wikipedia.org/wiki/Ontology_(information_science)
"In computer science and information science, an ontology formally represents knowledge as a set of concepts within a domain, and the relationships between pairs of concepts."
"Toward Principles for the Design of Ontologies Used for Knowledge Sharing" by Tom Gruber, 1993
• "An ontology is a formal, explicit specification of a shared conceptualization."
Gruber elaborating
"An ontology is a description (like a formal specification of a program) of the concepts and relationships that can formally exist for an agent or a community of agents. This definition is consistent with the usage of ontology as set of concept definitions, but more general. And it is a different sense of the word than its use in philosophy." Gruber 2001
Focus Levels of Ontologies
Generic
Core
Domain
Task
Application
Examples of in-use Ontologies
Medical
• UMLS
• SNOMED-RT
• GALEN
• MEDLINE
Linguistics
• Wordnet (Miller, Princeton, 1990s)
• GOLD http://linguistics-ontology.org/
Early OWL versions
OWL provides three increasingly expressive sublanguages
1. OWL Lite supports those users primarily needing a classification hierarchy and simple constraints
2. OWL DL supports those users who want the maximum expressiveness while retaining computational completeness (all conclusions are guaranteed to be computable) and decidability (all computations will finish in finite time).
3. OWL Full is meant for users who want maximum expressiveness and the syntactic freedom of RDF with no computational guarantees
http://www.w3.org/TR/2004/REC-owl-features-20040210/#s1.3
OWL 2.0
The OWL 2 Web Ontology Language, informally OWL 2, is an ontology language for the Semantic Web with formally defined meaning.
OWL 2 ontologies provide classes, properties, individuals, and data values and are stored as Semantic Web documents.
OWL 2 ontologies can be used along with information written in RDF, and OWL 2 ontologies themselves are primarily exchanged as RDF documents.
http://www.w3.org/TR/owl2-overview/
OWL 2 relationships to other languages
http://www.w3.org/TR/owl2-overview/#Semantics
Ontology tools - Editors
• Protégé http://protege.stanford.edu/
Semantic Web
Web – static web pages +
Web 2.0 - http://en.wikipedia.org/wiki/Web_2.0 ~1999
Semantic Web
"The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation." It is a source to retrieve information from the web (using the web spiders from RDF files) and access the data through Semantic Web Agents or Semantic Web Services.
Source: "The Semantic Web" by Tim Berners-Lee, James Hendler, and Ora Lassila, Scientific American, 2001
Basic NLTK Corpus Functionality
Example                      Description
fileids()                    the files of the corpus
fileids([categories])        the files of the corpus corresponding to these categories
categories()                 the categories of the corpus
categories([fileids])        the categories of the corpus corresponding to these files
raw()                        the raw content of the corpus
raw(fileids=[f1,f2,f3])      the raw content of the specified files
raw(categories=[c1,c2])      the raw content of the specified categories
words()                      the words of the whole corpus
words(fileids=[f1,f2,f3])    the words of the specified fileids
words(categories=[c1,c2])    the words of the specified categories
sents()                      the sentences of the whole corpus
sents(fileids=[f1,f2,f3])    the sentences of the specified fileids
sents(categories=[c1,c2])    the sentences of the specified categories
abspath(fileid)              the location of the given file on disk
encoding(fileid)             the encoding of the file (if known)
open(fileid)                 open a stream for reading the given corpus file
root()                       the path to the root of locally installed corpus
readme()                     the contents of the README file of the corpus
Reference: NLTK Book Chapter 2
More from Chapter 2 of NLTK Book
2.2 Conditional Frequency Distributions
• Conditions and Events
• Counting Words by Genre
• Plotting and Tabulating Distributions
• Generating Random Text with Bigrams
2.3 More Python: Reusing Code
• Functions
• Modules
2.4 Lexical Resources
• Wordlist Corpora
• A Pronouncing Dictionary
• Comparative Wordlists
• Shoebox and Toolbox Lexicons
2.5 WordNet
Reference: NLTK Book Chapter 2
Wordnet
George Miller, Princeton University
NLTK includes the English WordNet, with 155,287 words and 117,659 synonym sets
Links:
• http://en.wikipedia.org/wiki/WordNet
• http://wordnet.princeton.edu/
• http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
Reference: NLTK Book Chapter 2
WordNet
WordNet distinguishes between nouns, verbs, adjectives and adverbs; it does not include prepositions, determiners etc.
Every synset contains a group of synonymous words or collocations
Different senses of a word are in different synsets.
Nouns in Wordnet
hypernyms: Y is a hypernym of X if every X is a (kind of) Y (canine is a hypernym of dog, because every dog is a member of the larger category of canines)
hyponyms: Y is a hyponym of X if every Y is a (kind of) X (dog is a hyponym of canine)
coordinate terms: Y is a coordinate term of X if X and Y share a hypernym (wolf is a coordinate term of dog, and dog is a coordinate term of wolf)
holonym: Y is a holonym of X if X is a part of Y (building is a holonym of window)
meronym: Y is a meronym of X if Y is a part of X (window is a meronym of building)
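The noun relations above can be sketched on a toy table. This is a minimal illustration that assumes each word has a single direct hypernym; the HYPERNYM data and the helper names hyponyms and coordinate_terms are hypothetical and do not come from WordNet or NLTK.

```python
# Toy hypernym table (hypothetical data, not read from WordNet):
# each word maps to its single direct hypernym.
HYPERNYM = {
    "dog": "canine",
    "wolf": "canine",
    "canine": "carnivore",
    "cat": "feline",
    "feline": "carnivore",
}


def hyponyms(word):
    """Direct hyponyms of `word`: every entry whose hypernym is `word`."""
    return sorted(w for w, h in HYPERNYM.items() if h == word)


def coordinate_terms(word):
    """Words sharing a direct hypernym with `word`, excluding `word` itself."""
    parent = HYPERNYM.get(word)
    if parent is None:
        return []
    return [w for w in hyponyms(parent) if w != word]


print(hyponyms("canine"))       # ['dog', 'wolf']
print(coordinate_terms("dog"))  # ['wolf']
```

Storing only the hypernym direction and deriving hyponyms from it reflects that the two relations are inverses of each other.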
Verbs in Wordnet
hypernym: the verb Y is a hypernym of the verb X if the activity X is a (kind of) Y (to perceive is a hypernym of to listen)
troponym: the verb Y is a troponym of the verb X if the activity Y is doing X in some manner (to lisp is a troponym of to talk)
entailment: the verb Y is entailed by X if by doing X you must be doing Y (to sleep is entailed by to snore)
coordinate terms: those verbs sharing a common hypernym (to lisp and to yell)
Adjectives/Adverbs in Wordnet
Adjectives
• related nouns
• similar to
• participle of verb
Adverbs
• root adjectives
Knowledge Structure Example
defined by hypernym or IS A relationships
Example:
dog, domestic dog, Canis familiaris
  => canine, canid
    => carnivore
      => placental, placental mammal, eutherian mammal
        => mammal
          => vertebrate, craniate
            => chordate
              => animal, animate being, beast, brute, creature, fauna
                => ...
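The IS A chain in this example can be followed mechanically. The sketch below copies the chain from the slide into a plain dictionary, using only the first lemma of each synset and stopping at animal because the slide elides the rest; hypernym_path and is_a are hypothetical helper names, not NLTK functions.

```python
# The dog chain from the slide, first lemma of each synset only.
# The slide ends with "...", so this toy chain stops at "animal".
CHAIN = {
    "dog": "canine",
    "canine": "carnivore",
    "carnivore": "placental",
    "placental": "mammal",
    "mammal": "vertebrate",
    "vertebrate": "chordate",
    "chordate": "animal",
}


def hypernym_path(word):
    """Return `word` followed by its hypernyms, most specific first."""
    path = [word]
    while path[-1] in CHAIN:
        path.append(CHAIN[path[-1]])
    return path


def is_a(word, ancestor):
    """True if `ancestor` appears strictly above `word` in the chain."""
    return ancestor in hypernym_path(word)[1:]


print(hypernym_path("dog"))
print(is_a("dog", "mammal"))   # True
print(is_a("mammal", "dog"))   # False
```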
Hypernym/Hyponym
Inverse relations
Hyponym == ISA
Hypernym == “contains the subset”
Examples
• car is a hyponym of vehicle; vehicle is a hypernym of car
• dog is a hyponym of animal; animal is a hypernym of dog
• Sometimes superordinate is used instead of hypernym
WordNet as an ontology
Hyponym == ISA
Meronymy – part of relation
wheel is part of car, so wheel is a meronym of car
Holonymy – inverse of meronymy
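Because holonymy is the inverse of meronymy, one table can be derived from the other. A minimal sketch, assuming a hand-written part table; PART_MERONYMS and invert are hypothetical names and the data is illustrative, not taken from WordNet.

```python
from collections import defaultdict

# Hypothetical part-of data: whole -> list of its parts (meronyms).
PART_MERONYMS = {
    "car": ["wheel", "engine"],
    "bicycle": ["wheel", "pedal"],
}


def invert(relation):
    """Turn a whole -> parts table into a part -> wholes table."""
    inverse = defaultdict(list)
    for whole, parts in relation.items():
        for part in parts:
            inverse[part].append(whole)
    return dict(inverse)


HOLONYMS = invert(PART_MERONYMS)
print(sorted(HOLONYMS["wheel"]))  # ['bicycle', 'car']
print(HOLONYMS["pedal"])          # ['bicycle']
```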
Senses and Synonyms
>>> from nltk.corpus import wordnet as wn
>>> wn.synsets('motorcar')
[Synset('car.n.01')]
one meaning: the first (01) noun (n) sense of car
>>> wn.synset('car.n.01').lemma_names()
['car', 'auto', 'automobile', 'machine', 'motorcar']
synonymous words (or "lemmas")
Reference: NLTK Book Chapter 2
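A synset name such as car.n.01 packs three fields: the lemma, the part of speech, and the sense number. The sketch below splits a name into those fields, assuming every name ends in .pos.NN as in the examples on this slide; parse_synset_name is a hypothetical helper and not part of NLTK.

```python
def parse_synset_name(name):
    """Split a name like 'car.n.01' into (lemma, pos, sense_number).

    Splitting from the right keeps lemmas that contain dots intact.
    """
    lemma, pos, number = name.rsplit(".", 2)
    return lemma, pos, int(number)


print(parse_synset_name("car.n.01"))        # ('car', 'n', 1)
print(parse_synset_name("cable_car.n.01"))  # ('cable_car', 'n', 1)
```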
Definitions and examples
>>> wn.synset('car.n.01').definition()
'a motor vehicle with four wheels; usually propelled by an internal combustion engine'
>>> wn.synset('car.n.01').examples()
['he needs a car to get to work']
Reference: NLTK Book Chapter 2
>>> wn.synsets('car')
[Synset('car.n.01'), Synset('car.n.02'), Synset('car.n.03'), Synset('car.n.04'), Synset('cable_car.n.01')]
>>> for synset in wn.synsets('car'):
...     print(synset.lemma_names())
...
['car', 'auto', 'automobile', 'machine', 'motorcar']
['car', 'railcar', 'railway_car', 'railroad_car']
['car', 'gondola']
['car', 'elevator_car']
['cable_car', 'car']
Reference: NLTK Book Chapter 2
The WordNet Hierarchy
Hypernyms (up)
Hyponyms (down)
Meronyms - components
Holonyms - things they are contained in
Reference: NLTK Book Chapter 2
Synonyms and Lemmas
>>> motorcar = wn.synset('car.n.01')
>>> types_of_motorcar = motorcar.hyponyms()
>>> types_of_motorcar[0]
Synset('ambulance.n.01')
>>> sorted(lemma.name() for synset in types_of_motorcar for lemma in synset.lemmas())
['Model_T', 'S.U.V.', 'SUV', 'Stanley_Steamer', 'ambulance', 'beach_waggon', … ]
Reference: NLTK Book Chapter 2
Meronyms and Holonyms
>>> wn.synset('tree.n.01').part_meronyms()
[Synset('burl.n.02'), Synset('crown.n.07'), Synset('stump.n.01'), Synset('trunk.n.01'), Synset('limb.n.02')]
>>> wn.synset('tree.n.01').substance_meronyms()
[Synset('heartwood.n.01'), Synset('sapwood.n.01')]
>>> wn.synset('tree.n.01').member_holonyms()
[Synset('forest.n.01')]
Reference: NLTK Book Chapter 2
>>> for synset in wn.synsets('mint', wn.NOUN):
...     print(synset.name() + ':', synset.definition())
...
batch.n.02: (often followed by `of') a large number or amount or extent
mint.n.02: any north temperate plant of the genus Mentha with aromatic leaves and small mauve flowers
mint.n.03: any member of the mint family of plants
mint.n.04: the leaves of a mint plant used fresh or candied
mint.n.05: a candy that is flavored with a mint oil
mint.n.06: a plant where money is coined by authority of the government
Reference: NLTK Book Chapter 2
Entailments
walking entails stepping
>>> wn.synset('walk.v.01').entailments()
[Synset('step.v.01')]
>>> wn.synset('eat.v.01').entailments()
[Synset('swallow.v.01'), Synset('chew.v.01')]
>>> wn.synset('tease.v.03').entailments()
[Synset('arouse.v.07'), Synset('disappoint.v.01')]
Reference: NLTK Book Chapter 2
Antonyms
Reference: NLTK Book Chapter 2
Semantic Similarity
>>> right = wn.synset('right_whale.n.01')
>>> orca = wn.synset('orca.n.01')
>>> minke = wn.synset('minke_whale.n.01')
>>> tortoise = wn.synset('tortoise.n.01')
>>> novel = wn.synset('novel.n.01')
>>> right.lowest_common_hypernyms(minke)
[Synset('baleen_whale.n.01')]
>>> right.lowest_common_hypernyms(orca)
[Synset('whale.n.02')]
>>> right.lowest_common_hypernyms(tortoise)
[Synset('vertebrate.n.01')]
>>> right.lowest_common_hypernyms(novel)
[Synset('entity.n.01')]
Reference: NLTK Book Chapter 2
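The idea behind a lowest common hypernym can be shown without WordNet: climb from one word to the root, then climb from the other until the two paths meet. This is a sketch on a hand-built tree in which every node has one hypernym; the intermediate nodes (rorqual, dolphin, toothed_whale) and the function names are assumptions made for illustration, and the real WordNet graph allows several hypernyms per synset.

```python
# Hypothetical single-parent fragment of a whale taxonomy.
HYPERNYM = {
    "right_whale": "baleen_whale",
    "minke_whale": "rorqual",
    "rorqual": "baleen_whale",
    "baleen_whale": "whale",
    "orca": "dolphin",
    "dolphin": "toothed_whale",
    "toothed_whale": "whale",
}


def ancestors(word):
    """Return `word` followed by its hypernyms up to the root."""
    chain = [word]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def lowest_common_hypernym(a, b):
    """First node on b's upward path that also lies on a's upward path."""
    seen = set(ancestors(a))
    for node in ancestors(b):
        if node in seen:
            return node
    return None


print(lowest_common_hypernym("right_whale", "minke_whale"))  # baleen_whale
print(lowest_common_hypernym("right_whale", "orca"))         # whale
```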
Generality/Specificity and Depth
>>> wn.synset('baleen_whale.n.01').min_depth()
14
>>> wn.synset('whale.n.02').min_depth()
13
>>> wn.synset('vertebrate.n.01').min_depth()
8
>>> wn.synset('entity.n.01').min_depth()
0
Similarity Scores from Right Whale
>>> right.path_similarity(minke)
0.25
>>> right.path_similarity(orca)
0.16666666666666666
>>> right.path_similarity(tortoise)
0.07692307692307693
>>> right.path_similarity(novel)
0.043478260869565216
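These scores follow from path length: path similarity is 1 / (1 + d), where d is the number of edges on the shortest path between the two synsets in the hypernym graph. A score of 0.25 therefore corresponds to d = 3 and 0.1666... to d = 5. The sketch below reproduces those two values on a hand-built toy graph; the intermediate nodes and all function names are assumptions for illustration, not NLTK's implementation.

```python
from collections import deque

# Hypothetical single-parent fragment of a whale taxonomy.
HYPERNYM = {
    "right_whale": "baleen_whale",
    "minke_whale": "rorqual",
    "rorqual": "baleen_whale",
    "baleen_whale": "whale",
    "orca": "dolphin",
    "dolphin": "toothed_whale",
    "toothed_whale": "whale",
}


def build_graph(hypernym):
    """Undirected adjacency sets: each hypernym link can be walked both ways."""
    graph = {}
    for child, parent in hypernym.items():
        graph.setdefault(child, set()).add(parent)
        graph.setdefault(parent, set()).add(child)
    return graph


def shortest_path_length(graph, start, goal):
    """Breadth-first search; returns the edge count, or None if unreachable."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


def path_similarity(a, b):
    """1 / (1 + shortest path length), or None when no path exists."""
    dist = shortest_path_length(build_graph(HYPERNYM), a, b)
    return None if dist is None else 1 / (1 + dist)


print(path_similarity("right_whale", "minke_whale"))  # 0.25
print(path_similarity("right_whale", "orca"))         # 0.16666666666666666
```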
Googlecode - HowTo
http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html
WordNet Interface
>>> from nltk.corpus import wordnet as wn
Reference: http://nltk.googlecode.com/svn/trunk/doc/howto/wordnet.html