From Lexical Semantics to Knowledge Systems: How to Infer Cognitive Systems from Linguistic Data
Chu-Ren Huang, Academia Sinica
http://cwn.ling.sinica.edu.tw/huang/huang.htm
2007.03.09 ISLCC Chu-Ren Huang
Outline
A generative lexicalist approach to grammar
From distributional data to the basic contrasts in a semantic field (or: the conceptual motivation for corpus distribution)
Lexical distribution as cognitive model
Radical as ontology
Language as a knowledge system
Introduction: A generative lexicalist approach to grammar
Back to Aristotle (through Pustejovsky)
How do we know, and what do we know? Through what we experience
Qualia Structure: what we experience
Formal, Constitutive, Agentive, Telic
Linguistics: What do we know about language?
Qualia Structure of the Theory of Language
Formal: from sign to structure (Structuralism)
Constitutive: from IA (Item-and-Arrangement) to IP (Item-and-Process), rule- and transformation-based theories
Agentive: UG approaches
Telic: function- and use-based theories
We need a linguistic theory that accounts for the complete knowledge structure, not just its individual aspects.
Towards Language as a Knowledge System
Atoms of knowledge: lexicalized concepts
‘Frames’ of knowledge: lexical semantic relations
Instantiation of knowledge: corpus
Lexicon-driven and corpus-based: inferring the knowledge structure underlying linguistic structure
Three Studies
The semantic field of emotion (elaborated from Chang et al. 2000)
Lexicalized model of cognition (Huang and Hong 2005)
Conventionalized ontology in writing (Chou and Huang 2005)
Semantic Field of Verbs of Emotion
Methodological issues: interpretation of distributional data; measuring and interpreting lexical choices
Linguistic issues: archetype via contrast; why change-of-state: saliency and relevance to human cognition
Distributional Contrast of Verbs of Emotion: 高興 gao1xing4 (Type A) vs. 快樂 kuai4le4 (Type B)
Category: intrans. vs. trans. state verb
Function: more predicative vs. more nominalized
Collocation: CAUSE complement vs. no CAUSE
Collocation: perfect aspect vs. no -le
Collocation (modified nouns): eventive vs. no selection
Interpretation (imperative): command vs. wish
A Natural Dichotomy of Verbs of Emotion (corpus frequencies in parentheses)
Happiness
  Type A: gao1xing4 高興 (669), kai1xin1 開心 (152), tong4kuai4 痛快 (40)
  Type B: kuai4le4 快樂 (942), yu2kuai4 愉快 (271), xi3yue4 喜悅 (156), huan1le4 歡樂 (141), huan1xi3 歡喜 (107), kuai4huo2 快活 (48)
Depression
  Type A: nan2guo4 難過 (232), tong4xin1 痛心 (48)
  Type B: tong4ku3 痛苦 (443), chen2zhong4 沈重 (83), ju3sang4 沮喪 (62)
A Natural Dichotomy of Verbs of Emotion (cont.)
Sadness
  Type A: shang1xin1 傷心 (134)
  Type B: bei1shang1 悲傷 (52)
Regret
  Type A: hou4hui3 後悔 (102)
  Type B: yi2han4 遺憾 (198)
Anger
  Type A: sheng1qi4 生氣 (307)
  Type B: fen4nu4 憤怒 (112), qi4fen4 氣憤 (49)
Fear
  Type A: hai4pa4 害怕 (261)
  Type B: kong3ju4 恐懼 (149), wei4ju4 畏懼 (40)
Worry
  Type A: dan1xin1 擔心 (609), dan1you1 擔憂 (64), you1xin1 憂心 (46)
  Type B: fan2nao3 煩惱 (199), ku3nao3 苦惱 (45)
Some Observations
Each of the seven kinds of emotion verbs shows the same dichotomy: change-of-state vs. homogeneous state.
Each side of the dichotomy is dominated by a single verb in terms of both frequency and prototypicality of meaning.
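The frequency side of this observation can be checked mechanically. The sketch below is a hypothetical illustration, not the authors' procedure: it copies the counts from the two dichotomy tables into a Python dictionary (`DICHOTOMY`, a name chosen here) and returns the most frequent verb on each side of each subtype. Prototypicality of meaning is not modeled.

```python
"""Find the most frequent (dominant) verb on each side of the Type A / Type B dichotomy.

Counts are copied from the dichotomy tables; the data layout is an illustrative assumption.
"""

# subtype -> (Type A verbs with counts, Type B verbs with counts)
DICHOTOMY = {
    "Happiness": (
        {"gao1xing4": 669, "kai1xin1": 152, "tong4kuai4": 40},
        {"kuai4le4": 942, "yu2kuai4": 271, "xi3yue4": 156,
         "huan1le4": 141, "huan1xi3": 107, "kuai4huo2": 48},
    ),
    "Depression": (
        {"nan2guo4": 232, "tong4xin1": 48},
        {"tong4ku3": 443, "chen2zhong4": 83, "ju3sang4": 62},
    ),
    "Sadness": ({"shang1xin1": 134}, {"bei1shang1": 52}),
    "Regret": ({"hou4hui3": 102}, {"yi2han4": 198}),
    "Anger": ({"sheng1qi4": 307}, {"fen4nu4": 112, "qi4fen4": 49}),
    "Fear": ({"hai4pa4": 261}, {"kong3ju4": 149, "wei4ju4": 40}),
    "Worry": (
        {"dan1xin1": 609, "dan1you1": 64, "you1xin1": 46},
        {"fan2nao3": 199, "ku3nao3": 45},
    ),
}


def dominant_verbs(table):
    """Map each subtype to (most frequent Type A verb, most frequent Type B verb)."""
    return {
        subtype: (max(side_a, key=side_a.get), max(side_b, key=side_b.get))
        for subtype, (side_a, side_b) in table.items()
    }


def dominance_share(side):
    """Fraction of one side's tokens that belong to its most frequent verb."""
    return max(side.values()) / sum(side.values())


if __name__ == "__main__":
    for subtype, (verb_a, verb_b) in dominant_verbs(DICHOTOMY).items():
        print(f"{subtype:<11} A: {verb_a:<11} B: {verb_b}")
```

The seven pairs this returns (gao1xing4/kuai4le4, nan2guo4/tong4ku3, shang1xin1/bei1shang1, hou4hui3/yi2han4, sheng1qi4/fen4nu4, hai4pa4/kong3ju4, dan1xin1/fan2nao3) are the same pairs compared in the distribution tables that follow.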
Semantic Field and Contrast Set
A semantic field consists of a unique covering term and a number of contrast sets (paraphrase of Grandy 1992).
The unique covering term may or may not occur in a contrast set.
All other members of the semantic field must be determined by entering into a contrast set relation with a known member of the semantic field.
Observation: Chinese Defines a Property by Contrast
qing1zhong4: light + heavy = weight
da4xiao3: big + small = size
gao1ai3: tall + short = height
shi4fei1 / dui4cuo4: right + wrong = affair
xiong1di4: elder + younger = brothers
zang1pi3: praise + attack = criticize
hu1xi1: exhale + inhale = breathe
Our Proposal
T (the covering term) is either a single term or a privileged contrast set, called a contrast pair.
When T is a contrast pair, the semantic field can be defined by the semantic properties shared by the pair.
The fundamental contrast relation defining a contrast pair may be shared by a superset of semantic fields.
Our Proposal
T must enter into contrast set relations with other members of the semantic field, although the contrast relation may be weakened to a marked/unmarked contrast.
The set of fundamental contrast relations is shared by all semantic fields. [cf. semantic relations]
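One way to read Grandy's condition together with this proposal is as a reachability check: starting from T (a single term, or both members of a contrast pair), every member of the field should be reachable through chains of shared contrast sets. The sketch below encodes that reading. It is an interpretation offered for illustration only; the function names and the toy field labels (`t1`, `m1`, and so on) are hypothetical, and the marked/unmarked weakening of the contrast relation is not modeled.

```python
"""Check whether every member of a semantic field is reachable from T via contrast sets."""
from collections import deque


def determined_members(covering_term, contrast_sets):
    """Return all terms reachable from T through chains of shared contrast sets.

    covering_term: a single term (str) or a contrast pair (tuple of two str).
    contrast_sets: iterable of collections of terms.
    """
    start = set(covering_term) if isinstance(covering_term, tuple) else {covering_term}
    sets = [frozenset(s) for s in contrast_sets]
    known = set(start)
    queue = deque(start)
    while queue:  # breadth-first search over "shares a contrast set with"
        term = queue.popleft()
        for contrast_set in sets:
            if term in contrast_set:
                for other in contrast_set - known:
                    known.add(other)
                    queue.append(other)
    return known


def is_well_formed(covering_term, contrast_sets):
    """True if every member of every contrast set is determined from T."""
    sets = [set(s) for s in contrast_sets]
    members = set().union(*sets)
    return members <= determined_members(covering_term, sets)


if __name__ == "__main__":
    # Hypothetical toy field; T is the contrast pair ("t1", "t2").
    field = [{"t1", "t2"}, {"t2", "m1", "m2"}, {"m2", "m3"}]
    print(is_well_formed(("t1", "t2"), field))                 # True
    print(is_well_formed(("t1", "t2"), field + [{"x", "y"}]))  # False: x and y never link to T
```

Breadth-first search is used only because it is the simplest way to compute the closure; any graph traversal would give the same set of determined members.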
Patterns of Distribution as Representational Clues
Numbers Don’t Lie
The pattern itself is proof that generalizations based on a single lexical item are replicable.
The uniformity and universality of the pattern across a broad but contiguous semantic field strongly favor a conceptual motivation.
Functional Distribution of Type A Verbs of Emotion
Type A        Pred.     Nom.     N.M.
gao1xing4     85.05%    0.30%    1.35%
nan2guo4      86.64%    2.16%    2.59%
shang1xin1    76.12%    2.99%    11.19%
hou4hui3      94.12%    0.00%    2.94%
sheng1qi4     87.82%    0.00%    4.06%
hai4pa4       93.10%    3.07%    2.68%
dan1xin1      96.72%    1.97%    1.31%
Average       88.51%    1.50%    3.73%
Functional Distribution of Type B Verbs of Emotion
Type B        Pred.     Nom.     N.M.
kuai4le4      37.79%    26.43%   24.84%
tong4ku3      25.73%    45.60%   20.54%
bei1shang1    40.38%    28.85%   19.23%
yi2han4       34.85%    33.84%   3.54%
fen4nu4       28.57%    37.50%   17.86%
kong3ju4      23.49%    68.46%   7.38%
fan2nao3      24.12%    69.85%   6.03%
Average       30.70%    44.36%   14.21%
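The averages in the two tables appear to be unweighted means over the seven verbs of each type. A short sketch reproduces them from the table values to within rounding of the last digit. It also sums Nom. and N.M. into a single non-predicative (deverbal) share; that is an assumption, consistent with, for example, kuai4le4: 26.43% + 24.84% = 51.27%, the deverbal figure listed for that verb. Names such as `TYPE_A` and `deverbal_share` are hypothetical.

```python
"""Recompute the column averages of the functional-distribution tables.

Percentages are copied from the tables; the column order is (Pred., Nom., N.M.).
"""
from statistics import mean

TYPE_A = {
    "gao1xing4": (85.05, 0.30, 1.35),
    "nan2guo4": (86.64, 2.16, 2.59),
    "shang1xin1": (76.12, 2.99, 11.19),
    "hou4hui3": (94.12, 0.00, 2.94),
    "sheng1qi4": (87.82, 0.00, 4.06),
    "hai4pa4": (93.10, 3.07, 2.68),
    "dan1xin1": (96.72, 1.97, 1.31),
}
TYPE_B = {
    "kuai4le4": (37.79, 26.43, 24.84),
    "tong4ku3": (25.73, 45.60, 20.54),
    "bei1shang1": (40.38, 28.85, 19.23),
    "yi2han4": (34.85, 33.84, 3.54),
    "fen4nu4": (28.57, 37.50, 17.86),
    "kong3ju4": (23.49, 68.46, 7.38),
    "fan2nao3": (24.12, 69.85, 6.03),
}


def column_averages(table):
    """Unweighted mean of each column across the verbs in the table, rounded to 2 places."""
    return tuple(round(mean(column), 2) for column in zip(*table.values()))


def deverbal_share(row):
    """Assumed deverbal (non-predicative) share: Nom. + N.M."""
    _pred, nom, noun_mod = row
    return round(nom + noun_mod, 2)


if __name__ == "__main__":
    print("Type A averages:", column_averages(TYPE_A))
    print("Type B averages:", column_averages(TYPE_B))
    for verb, row in TYPE_B.items():
        print(f"{verb:<11} deverbal {deverbal_share(row):.2f}%")
```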
Preference of A Verbs over B Verbs in Predicative Uses
Verbs                Pred.-Freq. (A/B)   A/B Ratio
gaoxing/kuaile       569/356             1.59
nanguo/tongku        201/114             1.76
shangxin/beishang    102/21              4.86
houhui/yihan         96/69               1.39
shengqi/fennu        238/32              7.44
haipa/kongju         243/35              6.94
danxin/fannao        589/48              12.27
Average ratio                            5.62
Preference of B Verbs over A Verbs in Nominal Uses
Verbs                Nom.-Freq. (A/B)    B/A Ratio
gaoxing/kuaile       11/483              43.91
nanguo/tongku        11/293              26.64
shangxin/beishang    19/25               1.32
houhui/yihan         3/74                24.67
shengqi/fennu        11/62               5.64
haipa/kongju         15/113              7.53
danxin/fannao        20/151              7.55
Average ratio                            16.75
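Each ratio divides the favoured type's count by its competitor's: A over B for predicative uses, B over A for nominal uses. The sketch below recomputes the ratios from the raw count pairs in the two tables; because the tables round to two decimals, a recomputed value can differ in the last digit (569/356 is about 1.598). The dictionary and function names are hypothetical.

```python
"""Recompute the lexical preference ratios from the (Type A, Type B) frequency pairs."""
from statistics import mean

# (Type A count, Type B count) for predicative uses.
PREDICATIVE = {
    "gaoxing/kuaile": (569, 356),
    "nanguo/tongku": (201, 114),
    "shangxin/beishang": (102, 21),
    "houhui/yihan": (96, 69),
    "shengqi/fennu": (238, 32),
    "haipa/kongju": (243, 35),
    "danxin/fannao": (589, 48),
}

# (Type A count, Type B count) for nominal uses.
NOMINAL = {
    "gaoxing/kuaile": (11, 483),
    "nanguo/tongku": (11, 293),
    "shangxin/beishang": (19, 25),
    "houhui/yihan": (3, 74),
    "shengqi/fennu": (11, 62),
    "haipa/kongju": (15, 113),
    "danxin/fannao": (20, 151),
}


def preference_ratios(counts, favoured):
    """Per pair, the favoured type's count divided by the other type's count.

    favoured must be "A" (A/B ratio) or "B" (B/A ratio).
    """
    if favoured not in ("A", "B"):
        raise ValueError("favoured must be 'A' or 'B'")
    return {
        pair: (a / b if favoured == "A" else b / a)
        for pair, (a, b) in counts.items()
    }


if __name__ == "__main__":
    for pair, ratio in preference_ratios(PREDICATIVE, "A").items():
        print(f"pred. {pair:<18} A/B = {ratio:.2f}")
    nominal = preference_ratios(NOMINAL, "B")
    for pair, ratio in nominal.items():
        print(f"nom.  {pair:<18} B/A = {ratio:.2f}")
    print(f"mean nominal B/A ratio = {mean(nominal.values()):.2f}")
```

The last line prints the unweighted mean of the seven nominal ratios, which comes out at 16.75, the average given in the nominal table.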
Summary of the Likelihood Ratio Data
A clear lexical preference between near-synonyms is established.
Predicative preference and deverbal preference tend to complement each other in establishing the contrast.
Overall, deverbal preference seems to be the defining feature of the dichotomy. [Note that these are all verbs.]
Deverbal Use Frequency of Type A Verbs
tong4kuai4 痛快 0.00%
gao1xing4 高興 1.65%
hou4hui3 後悔 2.94%
dan1xin1 擔心 3.28%
sheng1qi4 生氣 3.58%
tong4xin1 痛心 4.17%
nan2guo4 難過 4.75%
hai4pa4 害怕 5.75%
you1xin1 憂心 6.52%
kai1xin1 開心 7.89%
dan1you1 擔憂 9.38%
shang1xin1 傷心 14.18%
Deverbal Use Frequency of Type B Verbs
qi4fen4 氣憤 24.49%
wei4ju4 畏懼 25.00%
yu2kuai4 愉快 29.89%
huan1xi3 歡喜 30.84%
kuai4huo2 快活 33.33%
ju3sang4 沮喪 33.87%
yi2han4 遺憾 37.38%
ku3nao3 苦惱 46.67%
bei1shang1 悲傷 48.08%
chen2zhong4 沈重 48.19%
kuai4le4 快樂 51.27%
fen4nu4 憤怒 55.36%
tong4ku3 痛苦 66.14%
kong3ju4 恐懼 75.84%
fan2nao3 煩惱 75.88%
xi3yue4 喜悅 92.20%
huan1le4 歡樂 92.91%
Deverbal Use Frequency as a Benchmark for Type A/B Verbs
A gap of more than 10 percentage points separates the lowest Type B verb (qi4fen4 氣憤, 24.49%) from the highest Type A verb (shang1xin1 傷心, 14.18%).
The smallest gap within a competing pair is almost 34 percentage points (shang1xin1 傷心 14.18% vs. bei1shang1 悲傷 48.08%).
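Both benchmark figures can be recomputed directly from the deverbal percentages. In the sketch below, `separation_gap` measures the distance between the highest Type A value and the lowest Type B value, and `smallest_pair_gap` scans the seven competing pairs. The `classify` helper uses the midpoint of the gap as a cut-off; that threshold is a hypothetical choice made here for illustration, not one proposed in the study.

```python
"""Check the deverbal-frequency benchmark separating Type A from Type B verbs."""

# Deverbal use frequency (%) copied from the two lists above.
DEVERBAL_A = {
    "tong4kuai4": 0.00, "gao1xing4": 1.65, "hou4hui3": 2.94, "dan1xin1": 3.28,
    "sheng1qi4": 3.58, "tong4xin1": 4.17, "nan2guo4": 4.75, "hai4pa4": 5.75,
    "you1xin1": 6.52, "kai1xin1": 7.89, "dan1you1": 9.38, "shang1xin1": 14.18,
}
DEVERBAL_B = {
    "qi4fen4": 24.49, "wei4ju4": 25.00, "yu2kuai4": 29.89, "huan1xi3": 30.84,
    "kuai4huo2": 33.33, "ju3sang4": 33.87, "yi2han4": 37.38, "ku3nao3": 46.67,
    "bei1shang1": 48.08, "chen2zhong4": 48.19, "kuai4le4": 51.27, "fen4nu4": 55.36,
    "tong4ku3": 66.14, "kong3ju4": 75.84, "fan2nao3": 75.88, "xi3yue4": 92.20,
    "huan1le4": 92.91,
}
# The seven competing (Type A, Type B) pairs.
PAIRS = [
    ("gao1xing4", "kuai4le4"), ("nan2guo4", "tong4ku3"), ("shang1xin1", "bei1shang1"),
    ("hou4hui3", "yi2han4"), ("sheng1qi4", "fen4nu4"), ("hai4pa4", "kong3ju4"),
    ("dan1xin1", "fan2nao3"),
]


def separation_gap(type_a, type_b):
    """Percentage points between the highest Type A value and the lowest Type B value."""
    return min(type_b.values()) - max(type_a.values())


def smallest_pair_gap(pairs, type_a, type_b):
    """Return (gap, A verb, B verb) for the competing pair with the smallest difference."""
    return min((type_b[b] - type_a[a], a, b) for a, b in pairs)


def classify(rate, threshold):
    """Label a deverbal rate as Type A or Type B using a cut-off (assumed, not from the study)."""
    return "B" if rate > threshold else "A"


if __name__ == "__main__":
    gap = separation_gap(DEVERBAL_A, DEVERBAL_B)
    threshold = max(DEVERBAL_A.values()) + gap / 2  # hypothetical midpoint cut-off
    print(f"separation gap: {gap:.2f} points; midpoint threshold: {threshold:.3f}%")
    pair_gap, verb_a, verb_b = smallest_pair_gap(PAIRS, DEVERBAL_A, DEVERBAL_B)
    print(f"smallest pair gap: {verb_a} vs. {verb_b}, {pair_gap:.2f} points")
```

With the midpoint cut-off (about 19.3%), every verb in both lists falls on its own side, which is another way of stating that the two ranges do not overlap.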
The Noisy-Channel Model in the Theory of Communication
Our Proposal
Language is an information-based communication system.
An optimized communication system is one in which every redundant sign (for one piece of information) also minimally differentiates another piece of information.
Re-Interpretation of the Data
Members of the same semantic field in general, and a near-synonym pair in particular, are competing signs for expressing information pertaining to the field.
A sign is chosen to represent a piece of information because it expresses that piece of information most effectively.
Re-Interpretation of the Data
This preference for expressing certain information can be lexicalized to establish a logical implicature.
Once such a lexical preference is established, linguists can use the preferential ratio to infer the lexical information being carried.
Lexical Distribution as Cognitive Model: Senses
A further step based on properties defined by contrast, focusing on how senses are represented
A study of the sense of hearing and the basic property term sheng-yin ‘sound/voice’
We (Huang and Hong 2005) examine the distribution of these two lexical elements in all derived words
聲 Sheng vs. 音 Yin (* marks an unacceptable form)
聲樂 vs. 音樂: vocal music vs. music
發聲 vs. 發音: make a sound vs. articulate
高聲 vs. 高音: loudly vs. high pitch
*噪聲 vs. 噪音: noise
大聲 vs. *大音: loudly
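NN Compounds
N + 聲 sheng (+source): 歌 ‘song’, 掌 ‘palm’, 人 ‘person’, 腳步 ‘footstep’, 風 ‘wind’, 鐘 ‘bell’, 水 ‘water’ …
N + 音 yin (+quality): 嗓 ‘voice’, 鄉 ‘hometown’, 喉 ‘throat’, 裝飾 ‘ornament’, 尾 ‘tail’, 哨 ‘whistle’ …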
The Semantic Contrast
聲 sheng: production of sounds; often refers to the source of a sound or the manner in which it was made
音 yin: perception of a sound; often refers to the sound quality, or how a sound is perceived by an intelligent agent
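As a toy illustration of how this contrast could be operationalized, the sketch below reads only the final character of an N+聲 or N+音 compound and returns the feature the slides associate with that head (+source or +quality). It is a deliberately naive, hypothetical rule: the slides describe tendencies ("often"), and the function ignores part of speech, so it would also tag adjective-headed 大聲 'loudly' as +source.

```python
"""Toy rule: read the feature of a compound from its head, 聲 or 音."""

# Feature associated with each head, following the production/perception contrast.
HEAD_FEATURE = {
    "聲": "+source",   # 聲 sheng: production; the modifier tends to name the sound's source
    "音": "+quality",  # 音 yin: perception; the compound tends to name a perceived quality
}


def head_feature(compound):
    """Return (modifier, feature) for a compound ending in 聲 or 音, else None."""
    if len(compound) < 2:
        return None  # a bare head has no modifier to interpret
    feature = HEAD_FEATURE.get(compound[-1])
    if feature is None:
        return None  # head is neither 聲 nor 音 (e.g. 音樂)
    return compound[:-1], feature


if __name__ == "__main__":
    for word in ["掌聲", "腳步聲", "鐘聲", "嗓音", "鄉音", "裝飾音", "音樂"]:
        print(word, head_feature(word))
```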
A Lexicalized Schema for Hearing in Chinese (from Huang and Hong 2005)
Process of Hearing
聲 sheng: 起點、來源 source; 主動完成 production; 發動者 (instigator)
音 yin: 終點、結果 goal; 被動接收 reception; 經驗者 (experiencer)
A Lexicalized Schema for Sense in Chinese
Process of Sensation
word1, word0; 經驗者 (experiencer)
Goal/perception: experience of sense
感知接收 (sensation)
Lexical Semantic Analysis (7): Relations among 視覺 (vision), 觸覺 (touch), and 聽覺 (hearing)
Contrast of cognitive features (認知特徵的對比):
感覺發動者 (instigator of action): marked
感覺經驗者 (experiencer of sensation): shared and unmarked
聽覺 (hearing): 聲 (production) vs. 音 (perception)
視覺 (vision): 看 (inchoative) vs. 見 (bounded result)
觸覺 (touch): 觸 (activity) vs. 摸 (incremental theme)
Radical as Ontology
The Chinese writing system has been conventionalized and shared for over three thousand years,
and has been adopted by typologically very different languages.
If the radical system is a system of conceptualization, then it is the most robust and most widely used ontology.
Example: the horse radical (from Chou 2005)
馬 is the semantic symbol for ‘horse’. Examples:
驩: 馬名 ‘a kind of horse’
驫: 眾馬 ‘horses’
騎: 騎馬 ‘riding a horse’
驍: 良馬 ‘a good horse’
驚: 馬驚 ‘a scared horse’
Research Tool and Issue
Formal description: IEEE SUMO (Suggested Upper Merged Ontology)
http://www.ontologyportal.org
http://BOW.sinica.edu.tw
Issue: Why are Chinese radicals usually considered an imperfect and misleading taxonomy?
Knowledge System of the Radical 艸/艹 (Grass, for Plants)
Plants (IS-A): 蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草
Parts (Constitutive): 萌莖芽茄苗蓮葉
Description (Descriptive/formal): 茲蒼芳落茸茂荒薄芬蒸莊
Usage (Telic): 蕃藥蔬菜薪苑藩藉茭
Conclusion I: Corpus as Evidence
Core issue of a scientific explanation of language and cognition:
Language as a living organism allows variation and adaptation (the evolutionary view).
The coherence of language is the shared tendency of all users.
Distributional data in corpora lead to the discovery of these shared tendencies.
This should be more valuable than incidental examples.
Conclusion II: Language as a Knowledge System
The generative lexicalist approach to grammar: language as a knowledge system
All aspects of language are projected from a unified knowledge system.
Lexical semantics based on distributional data offers the best window onto the knowledge system underlying language.