Lec 13 (pptx) - LSA.303 Introduction to Computational Linguistics


LSA.303 Introduction to Computational Linguistics

Dan Jurafsky
Lecture 13: Computational Lexical Semantics

CS 124/LINGUIST 180: From Languages to Information

Outline: Computational Lexical Semantics
  Intro to Lexical Semantics
    Homonymy, Polysemy, Synonymy
    Online resources: WordNet
  Computational Lexical Semantics
    Word Similarity
      Thesaurus-based
      Distributional

Three Perspectives on Meaning
  Lexical Semantics
    The meanings of individual words
  Formal Semantics (or Compositional Semantics or Sentential Semantics)
    How those meanings combine to make meanings for individual sentences or utterances
  Discourse or Pragmatics
    How those meanings combine with each other and with other facts about various kinds of context to make meanings for a text or discourse
    Dialog or Conversation is often lumped together with Discourse

Preliminaries
  What's a word?
    Definitions we've used over the quarter: types, tokens, stems, roots, etc.
  Lexeme: an entry in a lexicon consisting of a pairing of a form with a single meaning representation
  Lexicon: a collection of lexemes

Relationships between word meanings
  Homonymy
  Polysemy
  Synonymy
  Antonymy
  Hypernymy
  Hyponymy
  Meronymy

First idea: The unit of meaning is called a sense or word sense
  One word, "bank", can have multiple different meanings:
    Instead, a bank can hold the investments in a custodial account in the client's name
    But as agriculture burgeons on the east bank, the river will shrink even more
  We say that a sense is a representation of one aspect of the meaning of a word.
  Thus bank here has two senses
    Bank1:
    Bank2:

Some more terminology
  Lemmas and wordforms
    A lexeme is an abstract pairing of meaning and form
    A lemma or citation form is the grammatical form that is used to represent a lexeme.
      Carpet is the lemma for carpets
      Dormir is the lemma for duermes
    Specific surface forms (carpets, sung, duermes) are called wordforms
  The lemma bank has two senses:
    Instead, a bank can hold the investments in a custodial account in the client's name
    But as agriculture burgeons on the east bank, the river will shrink even more.
  A sense is a discrete representation of one aspect of the meaning of a word

Homonymy
  Homonymy:
    Lexemes that share a form
      Phonological, orthographic or both
    But have unrelated, distinct meanings
    Clear example:
      Bat (wooden stick-like thing) vs Bat (flying scary mammal thing)
      Or bank (financial institution) versus bank (riverside)
    Can be homophones, homographs, or both:
      Homophones:
        Write and right
        Piece and peace

Homonymy causes problems for NLP applications
  Text-to-Speech
    Same orthographic form but different phonological form
      bass vs bass
  Information retrieval
    Different meanings, same orthographic form
      QUERY: bat care
  Machine Translation
  Speech recognition
    Why?

Polysemy
  1. The bank was constructed in 1875 out of local red brick.
  2. I withdrew the money from the bank
  Are those the same sense?
  We might call sense 2:
    A financial institution
  And sense 1:
    The building belonging to a financial institution
  Or consider the following example
    While some banks furnish sperm only to married women, others are less restrictive
  Which sense of bank is this?

Polysemy
  We call polysemy the situation when a single word has multiple related meanings (bank the building, bank the financial institution, bank the biological repository)
  Most non-rare words have multiple meanings
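
The terminology so far (lemma, wordform, sense) can be sketched as a small data structure. This is only an illustration under stated assumptions: the class names `Sense` and `Lexicon`, the labels `bank1`/`bank2`/`carpet1`, and the gloss for carpet are hypothetical, not part of the lecture or of any real lexical resource.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Sense:
    """One discrete aspect of a word's meaning."""
    label: str  # hypothetical identifier, e.g. "bank1"
    gloss: str  # short paraphrase of the meaning


@dataclass
class Lexicon:
    """Toy lexicon: lemmas map to senses, wordforms map to lemmas."""
    senses_by_lemma: dict = field(default_factory=dict)
    lemma_by_wordform: dict = field(default_factory=dict)

    def add_sense(self, lemma, sense):
        self.senses_by_lemma.setdefault(lemma, []).append(sense)

    def add_wordform(self, wordform, lemma):
        self.lemma_by_wordform[wordform] = lemma

    def senses(self, wordform):
        # A wordform with no recorded lemma is treated as its own lemma.
        lemma = self.lemma_by_wordform.get(wordform, wordform)
        return self.senses_by_lemma.get(lemma, [])

    def is_ambiguous(self, wordform):
        return len(self.senses(wordform)) > 1


lexicon = Lexicon()
lexicon.add_sense("bank", Sense("bank1", "financial institution"))
lexicon.add_sense("bank", Sense("bank2", "riverside"))
lexicon.add_sense("carpet", Sense("carpet1", "floor covering"))  # assumed gloss
lexicon.add_wordform("banks", "bank")
lexicon.add_wordform("carpets", "carpet")

for form in ("banks", "carpets"):
    labels = [s.label for s in lexicon.senses(form)]
    print(form, labels, lexicon.is_ambiguous(form))
```

Keeping senses attached to the lemma rather than to each wordform means "bank" and "banks" share one sense inventory, which mirrors the lemma/wordform distinction above.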

Polysemy: A systematic relationship between senses
  Lots of types of polysemy are systematic
    School, university, hospital
    Can all be used to mean the institution or the building.
  We might say there is a relationship:
    Building ↔ Organization
  Other such kinds of systematic polysemy:

How do we know when a word has more than one sense?
  Consider examples of the word serve:
    Which flights serve breakfast?
    Does America West serve Philadelphia?
  The zeugma test:
    ?Does United serve breakfast and San Jose?
  Since this sounds weird, we say that these are two different senses of serve

Synonyms
  Words that have the same meaning in some or all contexts.
    filbert / hazelnut
    couch / sofa
    big / large
    automobile / car
    vomit / throw up
    water / H2O
  Two lexemes are synonyms if they can be successfully substituted for each other in all situations
    If so they have the same propositional meaning

Synonyms
  But there are few (or no) examples of perfect synonymy.
    Why should that be?
    Even if many aspects of meaning are identical
    Still may not preserve the acceptability based on notions of politeness, slang, register, genre, etc.
  Example:
    Water and H2O
    Big/large
    Brave/courageous

Synonymy is a relation between senses rather than words
  Consider the words big and large
  Are they synonyms?
    How big is that plane?
    Would I be flying on a large or small plane?
  How about here:
    Miss Nelson, for instance, became a kind of big sister to Benjamin.
    ?Miss Nelson, for instance, became a kind of large sister to Benjamin.
  Why?
    big has a sense that means being older, or grown up
    large lacks this sense

Antonyms
  Senses that are opposites with respect to one feature of their meaning
  Otherwise, they are very similar!
    dark / light
    short / long
    hot / cold
    up / down
    in / out
  More formally, antonyms can
    define a binary opposition or be at opposite ends of a scale (long/short, fast/slow)
    be reversives: rise/fall, up/down

Hyponymy
  One sense is a hyponym of another if the first sense is more specific, denoting a subclass of the other
    car is a hyponym of vehicle
    dog is a hyponym of animal
    mango is a hyponym of fruit
  Conversely
    vehicle is a hypernym/superordinate of car
    animal is a hypernym of dog
    fruit is a hypernym of mango

  superordinate   vehicle   fruit   furniture   mammal
  hyponym         car       mango   chair       dog

Hypernymy more formally
  Extensional:
    The class denoted by the superordinate extensionally includes the class denoted by the hyponym
  Entailment:
    A sense A is a hyponym of sense B if being an A entails being a B
  Hyponymy is usually transitive
    (A hypo B and B hypo C entails A hypo C)

II. WordNet
  A hierarchically organized lexical database
  On-line thesaurus + aspects of a dictionary
    Versions for other languages are under development

  Category     Unique Forms
  Noun         117,097
  Verb         11,488
  Adjective    22,141
  Adverb       4,601

WordNet
  Where it is:
    http://www.cogsci.princeton.edu/cgi-bin/webwn

Applications of Ontologies
  Information Extraction
  Bioinformatics
  Medical Informatics
  Information Retrieval
  Question Answering
  Machine Translation
  Digital Libraries
  Business Process Modeling
  User Interfaces

Format of WordNet Entries

WordNet Noun Relations

WordNet Verb Relations

WordNet Hierarchies
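
A hypernym hierarchy of the kind WordNet stores can be sketched as a small graph in which each sense points to its direct hypernyms. The sketch below is a toy, not WordNet data or its API: the pairs car/vehicle, mango/fruit, chair/furniture and dog/mammal come from the hyponymy examples in this lecture, while the link mammal → animal and the function names are assumptions added for illustration. Following the links upward shows the transitivity of hyponymy.

```python
# Toy hypernym graph: each sense maps to the set of its direct hypernyms.
# "mammal" -> "animal" is an assumption added so that a two-step chain exists.
DIRECT_HYPERNYMS = {
    "car": {"vehicle"},
    "mango": {"fruit"},
    "chair": {"furniture"},
    "dog": {"mammal"},
    "mammal": {"animal"},
}


def all_hypernyms(sense, graph=DIRECT_HYPERNYMS):
    """Return every sense reachable by following hypernym links upward."""
    found = set()
    stack = [sense]
    while stack:
        current = stack.pop()
        for parent in graph.get(current, ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def is_hyponym(a, b, graph=DIRECT_HYPERNYMS):
    """True if sense a is a direct or inherited hyponym of sense b."""
    return b in all_hypernyms(a, graph)


print(sorted(all_hypernyms("dog")))  # ['animal', 'mammal']
print(is_hyponym("dog", "animal"))   # True, via dog -> mammal -> animal
print(is_hyponym("animal", "dog"))   # False, the relation is not symmetric
```

Because each sense maps to a set of parents rather than a single parent, the same structure also covers hierarchies in which a node has more than one hypernym.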

How is "sense" defined in WordNet?
  The set of near-synonyms for a WordNet sense is called a synset (synonym set); it's their version of a sense or a concept
  Example: chump as a noun to mean
    "a person who is gullible and easy to take advantage of"
  Each of these senses shares this same gloss
  Thus for WordNet, the meaning of this sense of chump is this list.
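
The synset idea can be sketched as a set of lemmas that share one gloss. This is a hedged illustration only: the `Synset` class and helper functions are hypothetical names, the member lists are abridged and illustrative rather than a copy of a WordNet entry, and the second synset and its gloss are assumptions added to show one word belonging to two synsets.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Synset:
    """A set of near-synonymous lemmas that share one gloss."""
    lemmas: frozenset
    gloss: str


SYNSETS = [
    # Abridged, illustrative member list for the "chump" sense.
    Synset(frozenset({"chump", "fool", "sucker", "patsy"}),
           "a person who is gullible and easy to take advantage of"),
    # Assumed second synset, included only to show a word with two senses.
    Synset(frozenset({"fool", "jester"}),
           "a professional clown (illustrative gloss)"),
]


def synsets_for(word, synsets=SYNSETS):
    """Return every synset that contains the word, one per sense."""
    return [s for s in synsets if word in s.lemmas]


def share_a_sense(w1, w2, synsets=SYNSETS):
    """True if some synset contains both words."""
    return any(w1 in s.lemmas and w2 in s.lemmas for s in synsets)


print(len(synsets_for("fool")))            # 2
print(share_a_sense("chump", "sucker"))    # True
print(share_a_sense("chump", "jester"))    # False
```

Checking membership per synset rather than per word reflects the earlier point that synonymy is a relation between senses: "fool" is synonymous with "chump" in one sense and with "jester" in another.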

Thesauri Examples: MeSH
  MeSH (Medical Subject Headings)
    organized by terms (~250,000) that correspond to medical subjects
    for each term, syntactic, morphological or semantic variants are given

  MeSH Heading   Databases, Genetic
  Entry Term     Genetic Databases
  Entry Term     Genetic Sequence Databases
  Entry Term     OMIM
  Entry Term     Online Mendelian Inheritance in Man
  Entry Term     Genetic Data Banks
  Entry Term     Genetic Data Bases
  Entry Term     Genetic Databanks
  Entry Term     Genetic Information Databases
  See Also       Genetic Screening

  Slide from Paul Buitelaar

MeSH (Medical Subject Headings) Thesaurus
  MeSH Descriptor
  Definition
  Synonym set
  Slide from Illhoi Yoo, Xiaohua (Tony) Hu, and Il-Yeol Song

MeSH Tree
  MeSH Ontology
    Hierarchically arranged from most general to most specific.
    Actually a graph rather than a tree
      concepts normally appear in more than one place in the tree
  MeSH Tree
  Slide from Illhoi Yoo, Xiaohua (Tony) Hu, and Il-Yeol Song
  The second part of MeSH is the MeSH Tree. This picture shows a part of the MeSH Tree. As you see, MeSH concepts are hierarchically arranged from most general to most specific. Because they normally appear in more than one place in the tree, they are represented in a graph rather than a tree.

MeSH Ontology
  Solving traditional synonym/hypernym/hyponym problems in information retrieval and text mining.
    Synonym problems
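
One way to approach the synonym problem is to map every entry term to its heading, so that queries and documents using different variants meet at the same key. The sketch below uses the heading and entry terms listed earlier for "Databases, Genetic"; the function names and the normalization rule (case folding and whitespace collapsing) are assumptions for illustration and do not describe how MeSH tooling actually works.

```python
MESH_HEADING = "Databases, Genetic"
ENTRY_TERMS = [
    "Genetic Databases",
    "Genetic Sequence Databases",
    "OMIM",
    "Online Mendelian Inheritance in Man",
    "Genetic Data Banks",
    "Genetic Data Bases",
    "Genetic Databanks",
    "Genetic Information Databases",
]


def normalize(term):
    """Case-fold and collapse whitespace so lookups ignore surface variation."""
    return " ".join(term.lower().split())


def build_index(heading, entry_terms):
    """Map the heading and each of its entry terms to the heading itself."""
    index = {normalize(heading): heading}
    for term in entry_terms:
        index[normalize(term)] = heading
    return index


TERM_TO_HEADING = build_index(MESH_HEADING, ENTRY_TERMS)


def to_heading(query_term, index=TERM_TO_HEADING):
    """Return the heading for a known term, or None if the term is not covered."""
    return index.get(normalize(query_term))


print(to_heading("genetic  databanks"))  # Databases, Genetic
print(to_heading("OMIM"))                # Databases, Genetic
print(to_heading("protein folding"))     # None
```

A retrieval system built this way would index and search on the returned heading, so a query for "OMIM" would match a document tagged "Genetic Data Banks".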