From Lexical Semantics to Knowledge Systems: How to infer cognitive systems from linguistic data

Chu-Ren Huang
Academia Sinica
http://cwn.ling.sinica.edu.tw/huang/huang.htm


2007.03.09 ISLCC Chu-Ren Huang

Outline

A generative lexicalist approach to grammar
From distributional data to the basic contrasts in a semantic field (or conceptual motivation for corpus distribution)
Lexical distribution as cognitive model
Radical as ontology
Language as a knowledge system


Introduction: A generative lexicalist approach to grammar

Back to Aristotle (through Pustejovsky)
How do we know, and what do we know: through what we experience
Qualia Structure: what we experience
  Formal
  Constitutive
  Agentive
  Telic


Linguistics: What do we know about language

Qualia Structure of Theory of Language
  Formal: from Sign to Structure, Structuralism
  Constitutive: from IA to IP, rule- and transformation-based theories
  Agentive: UG approaches
  Telic: function- and use-based theories
We need a linguistic theory that accounts for the complete knowledge structure, not just its individual aspects.


Towards Language as Knowledge System

Atoms of knowledge: lexicalized concepts
‘Frames’ of knowledge: lexical semantic relations
Instantiation of knowledge: corpus
Lexicon-driven and corpus-based, to infer the knowledge structure underlying linguistic structure


Three Studies

The semantic field of emotion (elaborated from Chang et al. 2000)
Lexicalized Model of Cognition (Huang and Hong 2005)
Conventionalized Ontology in Writing (Chou and Huang 2005)


Semantic Field of Verbs of Emotion

Issues: Methodological
  Interpretation of distributional data
  Measuring and interpreting lexical choices
Issues: Linguistic
  Archetype via contrast
  Why change-of-state: saliency and relevance to human cognition


Distributional Contrast of Verbs of Emotion

高興 gao1xing4 (Type A) vs. 快樂 kuai4le4 (Type B)
  Category: intransitive vs. transitive state verb
  Function: more predicative vs. more nominalized
  Collocation: CAUSE complement vs. no CAUSE
  Collocation: perfect aspect vs. no -le
  Collocation (modified nouns): eventive vs. no selection
  Interpretation (imperative): command vs. wish


A Natural Dichotomy of Verbs of Emotion

Subtype      Type A                   Type B
Happiness    gao1xing4 高興 (669)     kuai4le4 快樂 (942)
             kai1xin1 開心 (152)      yu2kuai4 愉快 (271)
             tong4kuai4 痛快 (40)     xi3yue4 喜悅 (156)
                                      huan1le4 歡樂 (141)
                                      huan1xi3 歡喜 (107)
                                      kuai4huo2 快活 (48)
Depression   nan2guo4 難過 (232)      tong4ku3 痛苦 (443)
             tong4xin1 痛心 (48)      chen2zhong4 沈重 (83)
                                      ju3sang4 沮喪 (62)


A Natural Dichotomy of Verbs of Emotion (continued)

Subtype      Type A                   Type B
Sadness      shang1xin1 傷心 (134)    bei1shang1 悲傷 (52)
Regret       hou4hui3 後悔 (102)      yi2han4 遺憾 (198)
Anger        sheng1qi4 生氣 (307)     fen4nu4 憤怒 (112)
                                      qi4fen4 氣憤 (49)
Fear         hai4pa4 害怕 (261)       kong3ju4 恐懼 (149)
                                      wei4ju4 畏懼 (40)
Worry        dan1xin1 擔心 (609)      fan2nao3 煩惱 (199)
             dan1you1 擔憂 (64)       ku3nao3 苦惱 (45)
             you1xin1 憂心 (46)


Some Observations

Each of the seven kinds of emotion verbs shows the same dichotomy: change-of-state vs. homogeneous state.
Each side of the dichotomy is led by a dominant verb in terms of frequency and prototypicality of meaning.
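The frequency half of this observation can be checked mechanically. The following is a minimal sketch, assuming the corpus counts given in the two "Natural Dichotomy" tables; the nested-dictionary layout and the function name `dominant_verbs` are illustrative choices, not part of the original study, and prototypicality of meaning is not modeled.

```python
# Corpus frequencies copied from the two "Natural Dichotomy" tables.
# The nested-dict layout is an assumption made for illustration only.
FREQ = {
    "Happiness": {
        "A": {"gao1xing4": 669, "kai1xin1": 152, "tong4kuai4": 40},
        "B": {"kuai4le4": 942, "yu2kuai4": 271, "xi3yue4": 156,
              "huan1le4": 141, "huan1xi3": 107, "kuai4huo2": 48},
    },
    "Depression": {
        "A": {"nan2guo4": 232, "tong4xin1": 48},
        "B": {"tong4ku3": 443, "chen2zhong4": 83, "ju3sang4": 62},
    },
    "Sadness": {"A": {"shang1xin1": 134}, "B": {"bei1shang1": 52}},
    "Regret": {"A": {"hou4hui3": 102}, "B": {"yi2han4": 198}},
    "Anger": {
        "A": {"sheng1qi4": 307},
        "B": {"fen4nu4": 112, "qi4fen4": 49},
    },
    "Fear": {
        "A": {"hai4pa4": 261},
        "B": {"kong3ju4": 149, "wei4ju4": 40},
    },
    "Worry": {
        "A": {"dan1xin1": 609, "dan1you1": 64, "you1xin1": 46},
        "B": {"fan2nao3": 199, "ku3nao3": 45},
    },
}


def dominant_verbs(freq):
    """Return {subtype: {side: most frequent verb}} for Type A and Type B."""
    return {
        subtype: {side: max(verbs, key=verbs.get) for side, verbs in sides.items()}
        for subtype, sides in freq.items()
    }


DOMINANT = dominant_verbs(FREQ)
for subtype, sides in DOMINANT.items():
    print(f"{subtype:<11} A: {sides['A']:<12} B: {sides['B']}")
```

On these counts the most frequent verb on each side is the one listed first for that subtype in the tables.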


Semantic Field and Contrast Set

A semantic field consists of a unique covering term and a number of contrast sets (paraphrase of Grandy 1992).
The unique covering term may or may not occur in a contrast set.
All other members of the semantic field must be determined by entering into a contrast set relation with a known member of the semantic field.


Observation: Chinese Defines a Property by Contrast

qing1zhong4          light + heavy     = weight
da4xiao3             big + small       = size
gao1ai3              tall + short      = height
shi4fei1/dui4cuo4    right + wrong     = affair
xiong1di4            elder + younger   = brothers
zang1pi3             praise + attack   = criticize
hu1xi1               exhale + inhale   = breathe


Our Proposal

T is either a single term or a privileged contrast set, called a contrast pair.
When T is a contrast pair, the semantic field can be defined by the shared semantic properties of the pair.
The fundamental contrast relation defining a contrast pair may be shared by a superset of semantic fields.


Our Proposal (continued)

T must enter contrast set relations with other members of the semantic field, although the contrast relation may be weakened to a marked/unmarked contrast.
The set of fundamental contrast relations is shared by all semantic fields. [cf. Semantic relations]


Patterns of Distribution as Representational Clues

Numbers Don’t Lie
The pattern itself is proof that generalizations based on a single lexical item are replicable.
The uniformity and universality of the pattern across a broad but contiguous semantic field strongly favor a conceptual motivation.


Functional Distribution of Type A Verbs of Emotion

Type A        Pred.     Nom.      N.M.
gao1xing4     85.05%    0.30%     1.35%
nan2guo4      86.64%    2.16%     2.59%
shang1xin1    76.12%    2.99%     11.19%
hou4hui3      94.12%    0.00%     2.94%
sheng1qi4     87.82%    0.00%     4.06%
hai4pa4       93.10%    3.07%     2.68%
dan1xin1      96.72%    1.97%     1.31%
Average       88.51%    1.50%     3.73%


Functional Distribution of Type B Verbs of Emotion

Type B        Pred.     Nom.      N.M.
kuai4le4      37.79%    26.43%    24.84%
tong4ku3      25.73%    45.60%    20.54%
bei1shang1    40.38%    28.85%    19.23%
yi2han4       34.85%    33.84%    3.54%
fen4nu4       28.57%    37.50%    17.86%
kong3ju4      23.49%    68.46%    7.38%
fan2nao3      24.12%    69.85%    6.03%
Average       30.70%    44.36%    14.21%
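The Average rows of the Type A and Type B tables can be re-derived as plain column means. The sketch below assumes the percentages exactly as listed; the names `TYPE_A`, `TYPE_B`, and `column_means` are hypothetical and chosen only for this illustration.

```python
# Percentages copied from the Type A and Type B tables.
# Each tuple is (Pred., Nom., N.M.); the names here are illustrative only.
TYPE_A = {
    "gao1xing4": (85.05, 0.30, 1.35),
    "nan2guo4": (86.64, 2.16, 2.59),
    "shang1xin1": (76.12, 2.99, 11.19),
    "hou4hui3": (94.12, 0.00, 2.94),
    "sheng1qi4": (87.82, 0.00, 4.06),
    "hai4pa4": (93.10, 3.07, 2.68),
    "dan1xin1": (96.72, 1.97, 1.31),
}
TYPE_B = {
    "kuai4le4": (37.79, 26.43, 24.84),
    "tong4ku3": (25.73, 45.60, 20.54),
    "bei1shang1": (40.38, 28.85, 19.23),
    "yi2han4": (34.85, 33.84, 3.54),
    "fen4nu4": (28.57, 37.50, 17.86),
    "kong3ju4": (23.49, 68.46, 7.38),
    "fan2nao3": (24.12, 69.85, 6.03),
}


def column_means(table):
    """Unweighted mean of each column (Pred., Nom., N.M.) over all verbs."""
    rows = list(table.values())
    return tuple(sum(column) / len(rows) for column in zip(*rows))


MEAN_A = column_means(TYPE_A)
MEAN_B = column_means(TYPE_B)
print("Type A means:", [round(value, 2) for value in MEAN_A])
print("Type B means:", [round(value, 2) for value in MEAN_B])
```

The computed means agree with the Average rows to within rounding, which suggests those rows are unweighted means over the seven verbs, not frequency-weighted ones.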


Preference of A verbs over B verbs in Predicative Uses

Verbs                Pred.-Freq.    A/B Ratio
gaoxing/kuaile       569/356        1.59
nanguo/tongku        201/114        1.76
shangxin/beishang    102/21         4.86
houhui/yihan         96/69          1.39
shengqi/fennu        238/32         7.44
haipa/kongju         243/35         6.94
danxin/fannao        589/48         12.27
Average ratio                       5.62


Preference of B verbs over A verbs in Nominal Uses

Verbs                Nom.-Freq.     B/A Ratio
gaoxing/kuaile       11/483         43.91
nanguo/tongku        11/293         26.64
shangxin/beishang    19/25          1.32
houhui/yihan         3/74           24.67
shengqi/fennu        11/62          5.64
haipa/kongju         15/113         7.53
danxin/fannao        20/151         7.55
Average ratio                       16.75
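The preference ratios in the predicative and nominal tables are simple quotients of the paired counts. As a hedged sketch (the function `preference_ratios` and the variable names are assumptions introduced here), the Python below recomputes them from the listed frequencies. On these figures the unweighted mean of the B/A ratios reproduces the 16.75 shown for nominal uses, while the unweighted mean of the listed A/B ratios comes to about 5.18, not the 5.62 given in the predicative table; the source does not state how that average was computed.

```python
# (Type A count, Type B count) per near-synonym pair, copied from the
# predicative and nominal preference tables.
PRED = {
    "gaoxing/kuaile": (569, 356),
    "nanguo/tongku": (201, 114),
    "shangxin/beishang": (102, 21),
    "houhui/yihan": (96, 69),
    "shengqi/fennu": (238, 32),
    "haipa/kongju": (243, 35),
    "danxin/fannao": (589, 48),
}
NOM = {
    "gaoxing/kuaile": (11, 483),
    "nanguo/tongku": (11, 293),
    "shangxin/beishang": (19, 25),
    "houhui/yihan": (3, 74),
    "shengqi/fennu": (11, 62),
    "haipa/kongju": (15, 113),
    "danxin/fannao": (20, 151),
}


def preference_ratios(counts, favoured):
    """Ratio of the favoured type's count to the other type's count."""
    ratios = {}
    for pair, (a_count, b_count) in counts.items():
        ratios[pair] = a_count / b_count if favoured == "A" else b_count / a_count
    return ratios


def mean(values):
    """Unweighted arithmetic mean (an assumption about the averaging used)."""
    values = list(values)
    return sum(values) / len(values)


PRED_RATIOS = preference_ratios(PRED, favoured="A")
NOM_RATIOS = preference_ratios(NOM, favoured="B")
MEAN_PRED = mean(PRED_RATIOS.values())
MEAN_NOM = mean(NOM_RATIOS.values())

for pair in PRED:
    print(f"{pair:<18} A/B pred. {PRED_RATIOS[pair]:6.2f}   B/A nom. {NOM_RATIOS[pair]:6.2f}")
print(f"mean A/B pred. {MEAN_PRED:.2f}   mean B/A nom. {MEAN_NOM:.2f}")
```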


Summary of the Likelihood Ratio Data

A clear lexical preference between near-synonyms is established.
Predicative preference and deverbal preference tend to compensate for each other to establish contrast.
Overall, the deverbal preference seems to be the defining feature of the dichotomy. [Note that these are all verbs.]


Deverbal Use Frequency of Type A Verbs

tong4kuai4    痛快    0.00%
gao1xing4     高興    1.65%
hou4hui3      後悔    2.94%
dan1xin1      擔心    3.28%
sheng1qi4     生氣    3.58%
tong4xin1     痛心    4.17%
nan2guo4      難過    4.75%
hai4pa4       害怕    5.75%
you1xin1      憂心    6.52%
kai1xin1      開心    7.89%
dan1you1      擔憂    9.38%
shang1xin1    傷心    14.18%


Deverbal Use Frequency of Type B Verbs

qi4fen4        氣憤    24.49%
wei4ju4        畏懼    25.00%
yu2kuai4       愉快    29.89%
huan1xi3       歡喜    30.84%
kuai4huo2      快活    33.33%
ju3sang4       沮喪    33.87%
yi2han4        遺憾    37.38%
ku3nao3        苦惱    46.67%
bei1shang1     悲傷    48.08%
chen2zhong4    沈重    48.19%
kuai4le4       快樂    51.27%
fen4nu4        憤怒    55.36%
tong4ku3       痛苦    66.14%
kong3ju4       恐懼    75.84%
fan2nao3       煩惱    75.88%
xi3yue4        喜悅    92.20%
huan1le4       歡樂    92.91%


Deverbal Use Frequency as a Benchmark for Type A/B Verbs

A gap of more than 10 percentage points separates the lowest Type B verb (qi4fen4 氣憤, 24.49%) from the highest Type A verb (shang1xin1 傷心, 14.18%).
The smallest gap between a competing pair is almost 34 percentage points (shang1xin1 傷心, 14.18% vs. bei1shang1 悲傷, 48.08%).
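One way to read this benchmark is as a threshold test on deverbal use frequency. The sketch below is an illustration under stated assumptions: the source reports the gap but no cut-off value, so the midpoint threshold, the function `classify`, and the pairing list `PAIRS` (taken from the seven dominant pairs in the preference tables) are all hypothetical choices made here.

```python
# Deverbal-use percentages copied from the Type A and Type B lists.
DEVERBAL_A = {
    "tong4kuai4": 0.00, "gao1xing4": 1.65, "hou4hui3": 2.94,
    "dan1xin1": 3.28, "sheng1qi4": 3.58, "tong4xin1": 4.17,
    "nan2guo4": 4.75, "hai4pa4": 5.75, "you1xin1": 6.52,
    "kai1xin1": 7.89, "dan1you1": 9.38, "shang1xin1": 14.18,
}
DEVERBAL_B = {
    "qi4fen4": 24.49, "wei4ju4": 25.00, "yu2kuai4": 29.89,
    "huan1xi3": 30.84, "kuai4huo2": 33.33, "ju3sang4": 33.87,
    "yi2han4": 37.38, "ku3nao3": 46.67, "bei1shang1": 48.08,
    "chen2zhong4": 48.19, "kuai4le4": 51.27, "fen4nu4": 55.36,
    "tong4ku3": 66.14, "kong3ju4": 75.84, "fan2nao3": 75.88,
    "xi3yue4": 92.20, "huan1le4": 92.91,
}

# Gap between the highest Type A value and the lowest Type B value.
GAP = min(DEVERBAL_B.values()) - max(DEVERBAL_A.values())

# Assumed cut-off: the midpoint of that gap (not a value from the source).
THRESHOLD = (max(DEVERBAL_A.values()) + min(DEVERBAL_B.values())) / 2


def classify(deverbal_pct, threshold=THRESHOLD):
    """Label a verb Type B if its deverbal use exceeds the threshold."""
    return "B" if deverbal_pct > threshold else "A"


# Competing near-synonym pairs (Type A verb, Type B verb).
PAIRS = [
    ("gao1xing4", "kuai4le4"), ("nan2guo4", "tong4ku3"),
    ("shang1xin1", "bei1shang1"), ("hou4hui3", "yi2han4"),
    ("sheng1qi4", "fen4nu4"), ("hai4pa4", "kong3ju4"),
    ("dan1xin1", "fan2nao3"),
]
PAIR_GAPS = {(a, b): DEVERBAL_B[b] - DEVERBAL_A[a] for a, b in PAIRS}
SMALLEST_PAIR = min(PAIR_GAPS, key=PAIR_GAPS.get)

print(f"gap between types: {GAP:.2f} points, assumed threshold: {THRESHOLD:.3f}")
print(f"smallest pair gap: {SMALLEST_PAIR} at {PAIR_GAPS[SMALLEST_PAIR]:.2f} points")
```

Any cut-off inside the 14.18% to 24.49% interval would separate the two lists equally well; the midpoint is used only because it is the simplest choice.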


The Noisy-Channel Model of Theory of Communication

Our Proposal
Language is an information-based communication system.
An optimized communication system is one in which all redundant signs (for one piece of information) also minimally differentiate another piece of information.


Re-Interpretation of the Data

Members of the same semantic field in general, and a near-synonym pair in particular, are competing signs to express information pertaining to the field.
A sign is chosen to represent a piece of information because it expresses that piece of information most effectively.


Re-Interpretation of the Data (continued)

This preference for expressing certain information can be lexicalized to establish logical implicature.
Once that lexical preference is established, linguists can use the preferential ratio to infer the lexical information being carried.


Lexical distribution as cognitive model: Senses

A further step based on property defined by contrast, with focus on how senses are represented
Study the sense of hearing and the basic property term sheng-yin ‘sound/voice’
We (Huang and Hong 2005) look at the distribution of these two lexical elements in all derived words


聲 Sheng vs. 音 Yin

聲樂 vs. 音樂     vocal music vs. music
發聲 vs. 發音     make a sound vs. articulate
高聲 vs. 高音     loudly vs. high pitch
*噪聲 vs. 噪音    noise
大聲 vs. *大音    loudly


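NN Compound: N + *

聲 Sheng + source: 歌 (song) 掌 (palm) 人 (person) 腳步 (footstep) 風 (wind) 鐘 (bell) 水 (water) …
音 Yin + quality: 嗓 (voice) 鄉 (hometown) 喉 (throat) 裝飾 (ornament) 尾 (tail) 哨 (whistle) …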


The semantic Contrast

聲: Production of sounds
  Often refers to the manner or source of how a sound was made
音: Perception of a sound
  Often refers to the sound quality or how a sound is perceived by an intelligent agent


A Lexicalized Schema for Hearing in Chinese (from Huang and Hong 2005)

Process of Hearing

聲 sheng                               音 yin
source (starting point)                goal (end point, result)
production (actively accomplished)     reception (passively received)
instigator                             experiencer


A Lexicalized Schema for Sense in Chinese

Process of Sensation

word1    word0
Experiencer
Goal/perception: experience of sense
Sensation (perceptual reception)


Lexical Sense Analysis (7): Diagram of the Relations among Vision, Touch, and Hearing

Contrast of cognitive features, by lexical item

Sense      Instigator of action (marked)    Experiencer of sensation (perception; shared and unmarked)
Hearing    聲 (production)                  音 (perception)
Vision     看 (inchoative)                  見 (bounded result)
Touch      觸 (activity)                    摸 (incremental theme)


Radical as ontology

The Chinese writing system has been conventionalized and shared for over three thousand years
And adopted by typologically very different languages
If the radical system is a system of conceptualization, then it is the most robust and most widely used ontology


Example: the horse radical (from Chou 2005)

馬 is a semantic symbol of horse

Examples:
  驩: 馬名, a kind of horse
  驫: 眾馬, horses
  騎: 騎馬, riding a horse
  驍: 良馬, a good horse
  驚: 馬驚, a scared horse


Research Tool and Issue

Formal Description: IEEE SUMO (Suggested Upper Merged Ontology)
  http://www.ontologyportal.org
  http://BOW.sinica.edu.tw
Issue: Why are Chinese radicals usually considered an imperfect and misleading taxonomy?


Knowledge System of the Radical 艸/艹 (Grass, for Plants)

Plants (IS-A): 蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草
Parts (constitutive): 萌莖芽茄苗蓮葉
Description (descriptive/formal): 茲蒼芳落茸茂荒薄芬蒸莊
Usage (telic): 蕃藥蔬菜薪苑藩藉茭


Conclusion I: Corpus as Evidence

Core issue of a scientific explanation of language and cognition
Language as a living organism allows variations and adaptations (the evolutionary view)
The coherence of language is the shared tendency of all users
Distributional data in a corpus lead to the discovery of these shared tendencies
This should be more valuable than incidental examples


Conclusion II: Language as a Knowledge System

The generative lexicalist approach to grammar: language as a knowledge system
All aspects of language are projected from a unified knowledge system
Lexical semantics based on distributional data offers the best window to the underlying knowledge system of language