CS 4705 Morphology: Words and their Parts CS 4705.
Date posted: 22-Dec-2015
CS 4705
Morphology: Words and their Parts
Basic Uses of Morphology
• The study of how words are composed from smaller, meaning-bearing units (morphemes)
• Applications:
  – Spelling correction: referece
  – Hyphenation algorithms: refer-ence
  – Part-of-speech analysis: googler
  – Text-to-speech: grapheme-to-phoneme conversion
    • hothouse (/T/ or /D/)
  – Speech recognition: phoneme-to-grapheme conversion
  – Artificial languages in standardized tests
    • ‘Twas brillig and the slithy toves…
    • Muggles moogled migwiches
What is a word?
• In formal languages, words are arbitrary strings
• In natural languages, words are made up of meaningful subunits called morphemes
  – Allows for productivity: googled, texted
  – Subword units express concepts denoting entities or relationships in the world
    • Roots +
    • Syntactic or grammatical elements
  – Realizations of morphemes: morphs
    • Door realizes door; take and took realize take
• Allomorphs are classes of related morphs that realize a given morpheme
  – Allomorphs of s include en, men, es in English
  – Take and took are allomorphs of take
• Syntactic or grammatical morphemes can convey many things
  – In Italian, nouns are marked for gender and number

              Singular    Plural
    Masc      pomodoro    pomodori
    Fem       cipolla     cipolle

  – pomodor- cipoll- are called stems, which may or may not occur on their own as words
  – Stem may not occur as a word: derivative/deriv
  – Base form (lemma) occurs as word: derivative/derive
  – Sometimes the same: cars has stem ‘car’ and base form or lemma ‘car’ too
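The Italian table above can be read as a bare stem plus one ending that marks gender and number together. A minimal Python sketch of that reading follows; the `ENDINGS` table and the `inflect` helper are hypothetical names introduced here, and they cover only the two regular patterns shown on the slide.

```python
# Hypothetical ending table covering only the two patterns on the slide.
# Each ending marks gender and number at the same time.
ENDINGS = {
    ("masc", "sg"): "o",
    ("masc", "pl"): "i",
    ("fem", "sg"): "a",
    ("fem", "pl"): "e",
}


def inflect(stem: str, gender: str, number: str) -> str:
    """Attach the gender/number ending to a bare stem such as 'pomodor'."""
    return stem + ENDINGS[(gender, number)]


print(inflect("pomodor", "masc", "pl"))  # pomodori
print(inflect("cipoll", "fem", "sg"))    # cipolla
```

Note that the stems passed in (pomodor, cipoll) are not words on their own, which matches the stem/lemma distinction drawn above.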
What information does morphology give us?
• Differs by language
  – Spanish: hablo, hablaré / English: I speak, I will speak
  – English: book, books / Japanese: hon, hon
• Languages also differ in how they encode information
  – Isolating languages (e.g. Mandarin) have no bound forms (affixes) that attach to a word
  – Agglutinative languages (e.g. Finnish, Turkish) build words from prefixes and suffixes added to a stem like beads on a string; each feature is expressed by a single affix
  – Inflectional languages (e.g. English) merge different features into a single affix (e.g. person and tense of verbs); the same feature can be realized by different affixes
  – Polysynthetic languages (e.g. Inuit languages) express much of their syntax in their morphology, incorporating a verb’s arguments into the verb, e.g.
  – So… different languages may require very different morphological analyzers
Morphology Helps Define Word Classes
• AKA morphological classes, parts-of-speech
• Closed vs. open (function vs. content) class words
  – Pronoun, preposition, conjunction, determiner,…
  – Noun, verb, adverb, adjective,…
• Identifying word classes is useful for almost any task in NLP, from translation to speech recognition to topic detection…
Inflectional Morphology
• Word stem + grammatical morpheme --> different forms of same word
  – Usually produces word of same class
  – Usually serves a syntactic or grammatical function (e.g. agreement)
    like --> likes or liked
    bird --> birds
• Nominal morphology
  – Plural forms
    • s or es
    • Irregular forms (goose/geese)
    • Mass vs. count nouns (fish/fish(es), email or emails?)
  – Possessives (cat’s, cats’)
• Verbal inflection
  – Main verbs (sleep, like, fear) relatively regular
    • -s, -ing, -ed
    • And productive: emailed, instant-messaged, faxed, homered
    • But some are not:
      – eat/ate/eaten, catch/caught/caught
  – Primary (be, have, do) and modal verbs (can, will, must) often irregular and not productive
    » Be: am/is/are/were/was/been/being
  – Irregular verbs few (~250) but frequently occurring
• Particles occur in only one form in English
  – Prepositions: to, from
  – Adverbs: happily, quickly
  – Conjunctions: but, and
  – Articles: the, a, an
• So… English inflectional morphology is fairly easy to model… with some special cases…
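The closing point above, regular rules plus a list of special cases, can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a full model: the exception dictionaries and the function names `pluralize` and `past_tense` are hypothetical, the lists hold only examples from the slides, and spelling changes such as consonant doubling or y/ie alternation are ignored.

```python
# Hypothetical, deliberately tiny exception lists. A real system would need
# the full set of irregular forms (the slide estimates about 250 verbs).
IRREGULAR_PLURALS = {"goose": "geese", "fish": "fish"}
IRREGULAR_PAST = {"eat": "ate", "catch": "caught"}


def pluralize(noun: str) -> str:
    """Regular -s/-es plural, with a lookup for irregular nouns first."""
    if noun in IRREGULAR_PLURALS:
        return IRREGULAR_PLURALS[noun]
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    return noun + "s"


def past_tense(verb: str) -> str:
    """Regular -ed past tense, with a lookup for irregular verbs first."""
    if verb in IRREGULAR_PAST:
        return IRREGULAR_PAST[verb]
    if verb.endswith("e"):
        return verb + "d"  # like --> liked, not *likeed
    return verb + "ed"


print(pluralize("bird"), pluralize("goose"))    # birds geese
print(past_tense("email"), past_tense("eat"))   # emailed ate
```

Checking the exception list before applying the regular rule is what lets productive new verbs (emailed, faxed) fall through to the default while frequent irregulars are handled by lookup.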
Derivational Morphology
• Word stem + syntactic/grammatical morpheme --> new words
  – Usually produces word of different class
  – Incomplete process: derivational morphs cannot be applied to just any member of a class
• Verbs --> nouns
  – -ize verbs --> -ation nouns
  – generalize, realize --> generalization, realization
• Verbs, nouns --> adjectives
  – embrace, pity --> embraceable, pitiable
  – care, wit --> careless, witless
• Adjective --> adverb
  – happy --> happily
• But process is selective in unpredictable ways
  – Less productive: nerveless/*evidence-less, malleable/*sleep-able, rar-ity/*rareness
  – Meanings of derived terms harder to predict by rule
    • clueless, careless, nerveless, sleepless
• Derivation can be applied recursively:
  – Hospital --> hospitalize --> hospitalization --> prehospitalization …
  – Morphological analysis identifies concatenative process as well as morphemes
    [pre[[[hospital]ize]ation]]
  – Bracketing paradoxes
    unhappier
    [un[happier]]: not happier
    [[unhappy]er]: more unhappy
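The bracketing [pre[[[hospital]ize]ation]] shown above can be reproduced by stripping affixes recursively until a known stem is reached. The sketch below is a toy under explicit assumptions: the `STEMS`, `PREFIXES`, and `SUFFIXES` lists and the `bracket` function are hypothetical, the lexicon contains one stem, and the "restore" string is a crude stand-in for the spelling rule that drops the final e of -ize before -ation.

```python
from typing import Optional

# Hypothetical toy lexicon, just large enough for the slide's example.
STEMS = {"hospital"}
PREFIXES = ["pre"]
# (surface suffix, letters to restore on the base after stripping it);
# "ation" restores the "e" that -ize loses before it.
SUFFIXES = [("ation", "e"), ("ize", "")]


def bracket(word: str) -> Optional[str]:
    """Return one bracketed analysis of the word, or None if none is found."""
    if word in STEMS:
        return f"[{word}]"
    for prefix in PREFIXES:
        if word.startswith(prefix):
            inner = bracket(word[len(prefix):])
            if inner is not None:
                return f"[{prefix}{inner}]"
    for suffix, restore in SUFFIXES:
        if word.endswith(suffix):
            inner = bracket(word[: -len(suffix)] + restore)
            if inner is not None:
                return f"[{inner}{suffix}]"
    return None


print(bracket("prehospitalization"))  # [pre[[[hospital]ize]ation]]
```

Because the function returns the first analysis it finds, it would report only one structure for a word like unhappier. Representing a bracketing paradox would require returning every analysis instead.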
Compounding
• Two base forms join to form a new word
  – Bedtime, Wienerschnitzel, Rotwein
  – Careful? Compound or derivation?
Affixes can be attached to stems in different ways
– Prefixation
  • Immaterial
– Suffixation: more common across languages than prefixation
  • Trying
– Circumfixation: combine prefixation and suffixation
  • Gesagt
– Infixation
  • English: Absobl**dylutely
  • Bontoc: ‘um’ turns adjectives and nouns into verbs (kilad (red) --> kumilad (to be red))
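Infixation differs from prefixation and suffixation in that the affix lands inside the stem. A minimal Python sketch of the Bontoc example follows. It assumes, generalizing from the single pair kilad/kumilad on the slide, that ‘um’ is placed immediately before the first vowel; the function name `infix_um` and the vowel inventory are hypothetical.

```python
# Assumed vowel inventory, used only to locate the insertion point.
VOWELS = "aeiou"


def infix_um(stem: str) -> str:
    """Insert 'um' immediately before the first vowel of the stem."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + "um" + stem[i:]
    return stem  # no vowel found: leave the stem unchanged


print(infix_um("kilad"))  # kumilad
```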
Concatenative vs. non-concatenative morphology
• Semitic root-and-pattern morphology
  – Root (2-4 consonants) conveys basic semantics (e.g. Arabic /ktb/)
  – Vowel pattern conveys voice and aspect
  – Derivational template (binyan) identifies word class

Template    Vowel pattern            Gloss
            active      passive
CVCVC       katab       kutib        write
CVCCVC      kattab      kuttib       cause to write
CVVCVC      ka:tab      ku:tib       correspond
tVCVVCVC    taka:tab    tuku:tib     write each other
nCVVCVC     nka:tab     nku:tib      subscribe
CtVCVC      ktatab      ktutib       write
stVCCVC     staktab     stuktib      dictate
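The table above combines three ingredients: a consonantal root, a template, and a vowel pattern. The Python sketch below shows one way to interleave them. Several things here are assumptions rather than part of the slides: the function name `interdigitate`, the template notation (digits select root consonants, so CVCCVC with a doubled middle consonant is written "1V22V3"), and the rule that the last vowel slot takes the second vowel of the pattern while earlier slots take the first. That rule is simply read off the table and is not offered as a general account of Arabic.

```python
import re


def interdigitate(root: str, template: str, melody: tuple) -> str:
    """Fill a template with root consonants and a two-vowel pattern.

    Digits pick root consonants (1-based). Each run of 'V' is a vowel slot,
    and a run longer than one is written as a long vowel with ':'.
    Any other character (such as 't', 'n', 's') is copied unchanged.
    """
    first, last = melody
    tokens = re.findall(r"\d|V+|.", template)
    # Index of the final vowel slot, which receives the second vowel.
    last_slot = max((i for i, t in enumerate(tokens) if t[0] == "V"), default=-1)
    out = []
    for i, token in enumerate(tokens):
        if token.isdigit():
            out.append(root[int(token) - 1])
        elif token[0] == "V":
            vowel = last if i == last_slot else first
            out.append(vowel + (":" if len(token) > 1 else ""))
        else:
            out.append(token)
    return "".join(out)


ACTIVE, PASSIVE = ("a", "a"), ("u", "i")
print(interdigitate("ktb", "1V2V3", ACTIVE))      # katab
print(interdigitate("ktb", "1V22V3", PASSIVE))    # kuttib
print(interdigitate("ktb", "tV1VV2V3", PASSIVE))  # tuku:tib
```

The point of the sketch is that no contiguous substring of the output corresponds to the root or the vowel pattern, which is why this kind of morphology is called non-concatenative.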
Morphotactics
• What are the ‘rules’ for word construction in a language?
  – pseudointellectual vs. *intellectualpseudo
  – rationalize vs. *izerational
  – cretinous vs. *cretinly vs. *cretinacious
• Possible ‘rules’
  – Suffixes are suffixes and prefixes are prefixes
  – Certain affixes attach to certain types of stems (nouns, verbs, etc.)
  – Certain stems can/cannot take certain affixes, e.g.
• Semantics: In English, un- cannot attach to adjectives that already have a negative connotation:
  – Unhappy vs. *unsad
  – Unhealthy vs. *unsick
  – Unclean vs. *undirty
• Phonology: In English, -er cannot attach to words of more than two syllables
  – great, greater
  – Happy, happier
  – Competent, *competenter
  – Elegant, *eleganter
  – Unruly, unrulier????
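The first of the possible ‘rules’ above, that prefixes come before the stem and suffixes after it, can be checked with a very small state machine. The Python sketch below assumes the word has already been split into morphemes; the three lexicon sets and the function name `well_formed` are hypothetical. It encodes only the ordering constraint, not the semantic or phonological restrictions on un- and -er.

```python
# Hypothetical toy lexicon for the ordering examples on the slide.
PREFIXES = {"pseudo", "un"}
STEMS = {"intellectual", "rational", "happy"}
SUFFIXES = {"ize", "er"}


def well_formed(morphs: list) -> bool:
    """Accept prefix* stem suffix* and reject every other ordering."""
    seen_stem = False
    for morph in morphs:
        if not seen_stem and morph in PREFIXES:
            continue
        if not seen_stem and morph in STEMS:
            seen_stem = True
            continue
        if seen_stem and morph in SUFFIXES:
            continue
        return False
    return seen_stem  # a word needs exactly one stem in this sketch


print(well_formed(["pseudo", "intellectual"]))  # True
print(well_formed(["intellectual", "pseudo"]))  # False
print(well_formed(["ize", "rational"]))         # False
```

A fuller model would also record which stem classes each affix selects, which is the second ‘rule’ on the slide.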
Morphological Representations: Evidence from Human Performance
• Hypotheses:
  – Full listing hypothesis: words listed
  – Minimum redundancy hypothesis: morphemes listed
• Experimental evidence:
  – Priming experiments (Does seeing/hearing one word facilitate recognition of another?) suggest neither is fully correct
  – Regularly inflected forms (e.g. cars) prime their stem (car), but derived forms do not (e.g. management, manage)
  – But spoken derived words can prime stems if they are semantically close (e.g. government/govern but not department/depart)
• Speech errors suggest affixes must be represented separately in the mental lexicon
  – ‘easy enoughly’ for ‘easily enough’
Summing Up
• Different languages have different morphological systems
  – If we can discover how to decode such a system, we can identify useful information about the word class and the semantic meaning of a word
  – Morphological rules provide basis for morphological analyzers (computational morphology)
• Next time:
  – Read Ch 3.2-3.8 (new version)