Text-based Construction and Comparison of Domain Ontology: A study based on classical poetry

60
Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan. Text-based Construction and Text-based Construction and Comparison of Domain Ontology: Comparison of Domain Ontology: A study based on classical A study based on classical poetry poetry Chu-Ren Huang Academia Sinica

description

Text-based Construction and Comparison of Domain Ontology: A study based on classical poetry. Chu-Ren Huang Academia Sinica. Outline. Motivation and Framework: Laying the foundation Basic Resources: The building blocks From General Ontology to Specific Ontology: Study of Shu-Shi Poems - PowerPoint PPT Presentation

Transcript of Text-based Construction and Comparison of Domain Ontology: A study based on classical poetry

Page 1: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Text-based Construction and Text-based Construction and Comparison of Domain Comparison of Domain

Ontology: Ontology: A study based on classical A study based on classical

poetrypoetry

Chu-Ren HuangAcademia Sinica

Page 2: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Outline• Motivation and Framework: Laying the fou

ndation

• Basic Resources: The building blocks

• From General Ontology to Specific Ontology: Study of Shu-Shi Poems

• Epilogue: From Specific Ontology to General Ontology

• Conclusion

Page 3: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Motivation and Framework: Laying the

foundationKnowledge Structure Discovery

Issues and Significance

Page 4: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Knowledge and Knowledge Structure

Variation

Knowledge is Structured Information

• Most salient factors dictating variations in knowledge structures are time, space, and domain

• Language is both the product and conduit of the conceptual structure of its speakers

Page 5: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Knowledge and Structure Mismatch: a

historical example盧 家 少 婦 鬱 金 香 ,海 燕 雙 棲 玳 瑁 梁 。(from Tang 300)

-Tulips ( 鬱 金 香 )in Tang ?

-No, the text refer to the fragrance of a ginger like herb - 鬱 金

‘Young lady Lu, as fresh and fragrant as ginger grass,

Looks on the pair of seagulls resting on the beam inlaid with sea turtle shells.’

Page 6: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Accessing Knowledge Structure

• In order to become sharable and reusable knowledge, all extracted information must first be correctly situated in a knowledge structure

• The situated information must be allowed to transfer from knowledge structure to knowledge structure without losing its meaningful content

Page 7: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Research Goal • Knowledge Structure Discovery

– Knowledge as situated information– Language endows information with structure

• Text-based and Lexicon-driven Knowledge Structure Discovery– General Ontology: the upper ontology shared

by all domains (such as SUMO)– Specific Ontology: a ontology specific to a

domain, historical period, an author etc.

Page 8: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Research Issues • Identification of Conceptual Atoms

• Re-construction and Verification of Conceptual Structure

• Knowledge Processing with Mismatched Knowledge Structures

Page 9: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Page 10: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Knowledge Inferred form the Ontology of Tang Animals

• No marsupials: only found in Australia, and only found much later

• No marine mammals: Tang civilization activities mainly stays on land, as well as the dominance of hoofed animals (fascination with horses?)

• Large number of birds among mammals, and the dominance of insects 昆蟲 among invertebrates 無脊椎動物Tang civilization’s fascination with flying

[Birds fly. And insects are the invertebrates that have wings.]

Page 11: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Research Methodology • The Mental Lexicon Approach

• The Shakespearean-garden Approach

• The Ontology-merging as Ontology-discovery Approach

Page 12: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The Mental Lexicon Approach

• Concepts are stored in the mental lexicon

• The basic unit of mental lexicon organization and access is lexical entry

• A complete list of lexical entries covers the complete list of conceptual atoms

• Lexical semantic relations mirror conceptual relations

Each Word is a Conceptual Atom

Page 13: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The Shakespearean-garden Approach

• A Shakespearean garden collects all the plants referred to in Shakespearean texts.– The garden is used to illustrate the flora of the

Shakespearean England and gives scholars a context in which to interpret his work.

• There is a knowledge structure behind each corpus (i.e. a collection of texts with design criteria)

Lexicon as a Structured Inventory of Conceptual Atoms

For instance, complete set of texts by an author, from a certain period, or in a certain domain

Page 14: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The Ontology-merging as Ontology-discovery

Approach I• Ontology provides a structure for

knowledge to be situated• However, there is a dilemma for the

construction of a new ontology– If no existing ontology is referred to:

reinventing the wheel, difficult to start a structure from scratch without rules

– If existing ontology is referred to: mislead by existing structure, mismatched or erroneous

Page 15: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The Ontology-merging as Ontology-discovery

Approach IIThe Solution• Map conceptual atoms to two (or more) ref

erence ontologies• Merge the two resultant ontologies

– Matched Mapping: Confirmation of knowledge structure

– Mismatched Mapping: Only one or neither is correct. Possibly lead to discovery of new knowledge structure

– Complimentary Mapping: Increases coverage

Page 16: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Further Developments• The Ontology of Chinese Characters: A co

mmon knowledge structure for East Asian Cultures

• Contrary to earlier study of constructing specific ontologies based on general ontology, the Chinese character ontology will be a crucial general ontology based on a specific ontology

Page 17: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Basic Resources: The building blocks

From Text to Lexicon

From Lexicon to Ontology

Page 18: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Resources used • WordNet

• SUMO Ontology

• Academia Sinica Bilingual Ontological Wordnet (Sinica BOW)

• Domain Lexicon Management System:Segmentation,

New Word Detection

Lexical Database

Page 19: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Resources• Sinica BOW: SUMO+WordNet

http://bow.sinica.edu.twhttp://www.ontologyportal.org orhttp://ontology.teknowledge.comhttp://www.cogsci.princeton.edu/~wn/

• Segmentation Program etc.http://LingAnchor.sinica.edu.tw/

Page 20: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

SUMO: Suggested Upper Merged

Ontology

SUMO Atoms• Concepts: around 1000

Note that concepts are not necessarily linguistically realized

• Relations (ISA): See SUMO Graph

• Axioms: for inference• Open resource created under an initiative from

IEEE Standard Upper Ontology Working Group

Page 21: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Methodology • From lexicon to ontology (from items to

structure)

• Ontology discovery through ontology merging

Page 22: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

WHY?• We do not have the knowledge structure

(ontology) of a new domain (historical period, field etc.)

• But typical ontology discovery needs a framework to be mapped to

• To solve the dilemma we map the conceptual atoms to both SUMO and WN (as a linguistic ontology)

Page 23: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

From General Ontology to Specific Ontology: Study of Shu-Sh

i PoemsA Research Collaborated with Feng-ju Luo, Sue-ming Chang,

and Ru-Yng Chang

Page 24: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Opus Shu Shi 蘇軾• Who is Su Shi (A.D.1036-1101)?

• One of the most prominent scholars in Song dynasty who is very knowledgeable and well-traveled.

• 45 volumes (out of 50) of his work has already been digitized and segmented (by Feng-ju Luo)

Page 25: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

How to build a domain ontology

Word segmentationWord segmentation

Match WordNet synset and SUMO concept automatically Match WordNet synset and SUMO concept automatically

WordNet

SUMO

Use WordNet information to check results and extend concept

Use WordNet information to check results and extend concept

Transform into ontology browser format

Transform into ontology browser format

Page 26: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Page 27: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Distribution of Su Shi lexicon

• 98,430 words in NO.1-45 volume

Distribution of concepts in Su Shi'spoems

plant(1.7393%) animal(1.4294%)

artifact(1.4467%) other(95.3845%)

Page 28: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The distribution of animal, plant, and artifact concepts in Su Shi’s

poems Distribution of plant concepts in SuShi's poems

body part(3.2126%) fungus(0.0584%)

uncertain(2.278%) alga(0.1168%)moss(1.1682%) reproductive body(0.6425%)

fern(0.2336%) flowering plant (92.2897%)

Distribution of animal concepts in Su Shi'spoems

arachnid(0.1421%) invertebrate(0.2132%)myriapod(0.0711%) larval(1.2082%)feline(2.7008%) uncertain(0.2843%)canine(2.5586%) carnivore(0.0711%)amphibian(1.6347%) crustacean(1.2793%)insect(8.742%) reptile(5.4726%)hoofed mammal(22.6724%) mammal(1.0661%)worm(0.924%) fish(7.8181%)rodent(2.5586%) bird(37.1002%)aquatic mammal(0.6397%) mollusk(0.7107%)monkey(2.1322%)

Distribution of artifact concepts in SuShi's poems

transportation device(16.6433%) artifact(32.5140%)fabric(6.1096%) uncertain(6.1798%)icon(0.4916%) mineral(0.9831%)musical instrument(0.6320%) device(10.6039%)engineering component(0.2107%) clothing(20.1545%)currency measure(1.7556%) substance(0.4916%)ordering(0.0702%) weapon(2.5983%)art work(0.4916%) law(0.0702%)

Page 29: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Page 30: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Page 31: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Page 32: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Comparing Two Ontologies: 300 Tang Poems and Collection of Su Shi

’s Poems• One conceptual node missing in both ontologies:

– 有袋類( marsupial ) • Concepts found in Su Shi’s but not in Tang 300

– palm 棕櫚科植物 ( plant -> woody plant ->tree-> palm )

椰葉( coconut palm )、檳榔 * ( betel palm )• 無枝林>食檳榔>月照無枝林, • 椰葉>追餞正輔表兄至博羅,賦詩為別>置酒椰葉桄榔間。

Guangdong and Hainan Island

Page 33: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Comparing Two Ontologies: 300 Tang Poems and Collection of Su Shi

’s Poems• Words stand for multiple concepts in the same source.

Source Word SUMO English Example sentence

Su Shi 杜鵑 bird cuckoo 子規>和孔郎中荊林馬上見寄>春山聞子規。啼鴃>次韻劉景文登介亭>朝先啼鴃起,

flowering plant

azalea 杜鵑>菩提寺南漪堂杜鵑花>南漪杜鵑天下無,

葛 flowering plant

kudzu vine

白葛>和文與可洋川園池三十首:湖橋>白葛烏紗曳履行。

fabric kudzu vine

白葛>病中遊祖塔院>烏紗白葛道衣涼。

Tang 300 - - - -

Page 34: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Example of WordNet lexical relation

CuckooCuckoo

cuculiform_birdcuculiform_bird

aniani roadrunnerroadrunner coucalcoucal

birdbird

pheasant_coucalpheasant_coucalCentropus_sinensisCentropus_sinensis

azaleaazalea

rhododendronrhododendron

shrubshrub bushbush

杜鵑DuJuan

Page 35: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

SUMO + WordNet• SUMO • WordNet

CuckooCuckoo

cuculiform_birdcuculiform_bird

aniani roadrunnerroadrunner coucalcoucal

birdbird

pheasant_coucalpheasant_coucalCentropus_sinensisCentropus_sinensis

azaleaazalea

rhododendronrhododendron

shrubshrub bushbush

organismorganism

animalanimal

vertebratevertebrate

birdbird

plantplant

invertebrateinvertebrate

warm blooded vertebratewarm blooded vertebrate

Flowering plantFlowering plant

mammalmammal

杜鵑DuJuan

Page 36: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

What We Learned about Specific Ontology

Constructing ontology from a larger corpus and comparison of two specific ontologies

• Local information can be effectively mapped• Global information offers deeper insights into the knowle

dge structure ☆Human conceptualization of animals and plants has be

en relatively stable. But NOT artifacts.☆Regardless of the criteria for classification, genetically determine

d features (behaviors, appearances etc.) do not vary greatly☆However, human technology is highly fluid. Our conceptualizatio

n of artifacts is highly dependent on the development of engineering and by our varying societal needs.

Page 37: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Towards a Workbench for Specific Ontology: Browser

and EditorUser loginUser login

Function menu(Personal ontologies list)

Function menu(Personal ontologies list)

Browse an ontologyBrowse an ontology Edit an ontologyEdit an ontology Add an ontologyAdd an ontology

1. SUMO2. SUMO + WordNet +concept map with l

exicon

1. SUMO2. SUMO + WordNet +concept map with l

exicon

LogoutLogout

1. Update lexical concepts

2. Update mapping between WordNet synset and lexicon

3. Edit other information in lexicon

1. Update lexical concepts

2. Update mapping between WordNet synset and lexicon

3. Edit other information in lexicon

Import textImport text Import lexiconImport lexicon

Word segmentationWord segmentation

Match concept and synset automatically

Match concept and synset automatically

1. Suggestion list2. Missing list

1. Suggestion list2. Missing list

Page 38: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Constructing a Specific Ontology

• Import text, or domain lexicon – Select style of writing– Select category of word list for word segmentation– Select reference ontologies to match SUMO and lexi

con

• Information of suggestion list– Candidate synset– Candidate synset synonyms– Explanation of candidate synset– Concept of candidate synset

Page 39: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Example of SUMO concept

Page 40: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

http://bow.sinica.edu.tw/ont/SuShi_ont.html

Page 41: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Page 42: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Page 43: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Summary and Future Work• Ontologies represent the knowledge structure of

a domain or historical period• We have provided an online interface to browse

ontologies and lexica• In the future, we will complete the online ontolog

y editor and browser, which will– Map lexicon, WordNet and SUMO.– Integrate ontologies based on different texts.– Facilitate comparative studies of various domain ont

ologies.

Page 44: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

From Specific Ontology to General Ontology漢字知識本體

An introduction to Hanzi ontologyResearch in Collaboration with an

d Conducted by

Ya-Ming Zhou 周亞民

Page 45: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Outline• Introduction• The logographic features of Hanzi• Semantic symbols of Hanzi• The structure of lexicon relation• The structure of Hanzi ontology• Summery

Page 46: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Introduction (1/2)• Ideograph: Each Chinese character (kanji) is a

writing unit which also represents a pre-defined concept.

The represented concept is independent of phonological variations, including language changes and cross-lingual adaptation

• The complete Han writing system is expected to consists of 40,000-70,000 characters each representing one or more concepts.

Page 47: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Logographic Features of Hanzi

• 馬 is a semantic symbol of horse

• Examples:– 驩 : 馬名 a kind of horse– 驫 : 眾馬 horses– 騎 : 騎馬 riding a horse– 驍 : 良馬 a good horse– 驚 : 馬驚 a scared horse馬

Page 48: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Semantic Symbols in Hanzi(1/3)

• The characteristics of Hanzi mainly come from semantic symbols.

• According to Xyu Shen’s ShoWenJieZi (100 A.D.) , there are 540 semantic classes (radicals)

• These radicals represent the knowledge structure of Hanzi.

Page 49: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Semantic Symbols in Hanzi• 540 radicals are used to classify all Chinese cha

racters and represented

• The semantic symbols about animals:– 鳥 (bird), 隹 (bird), 犬 (dog), 馬 (horse), 羊 (sheep), 虫

(insect)…

• The semantic symbols about plant:– 艸 , 木 , 竹 , 禾…

• The semantic symbols about religion:– 示

Page 50: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The Classification of Hanzi with 艸 ( 艹 )

蕃藥蔬菜薪苑藩藉茭

萌莖芽茄苗蓮葉

茲蒼芳落茸茂荒薄芬蒸莊

蕉蘭芒蒙菌蔓苦菊茱范荷茅蕈蔚菲草

Parts

DescriptionUsage

Plants

Name Parts Description Usage

Page 51: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The structure of Hanzi lexical relation(1/2)

Paradigmatic associations

Page 52: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The structure of Hanzi lexical relation(2/2)

Syntagmatic associations

Page 53: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Tennis Problem: Cross-classification and multiple

Inheritance• WordNet suffers from “tennis problem”( Gorge Miller,1993)

• Tennis refers both to the entity, the sport, and the organization running the sport etc.

• Tennis problem is caused by the lack of syntagmatic association in WordNet

• Most computational ontologies only have paradigmatic associations, and have the same problem

Page 54: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The semantic symbol 艸 ( 艹 ) pulls the concepts about plant

s together

Body Parts

Physical

Subjective Assessment Attribute

Internal Attribute

Plants

Process

Abstract

External Attribute

Page 55: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

The structure of Hanzi Ontology

• Each character is connected with semantic symbols ontology.

• The derived and loan meaning of character are mapping to SUMO.

Page 56: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

字形 意義

G1 G2 G3 . . . Gm

M1 - - . . . O

M2 - O - . . . -

M3 - - . . . -

M4 L D - . . . -

M5 - D . . . -

. O - - . . . -

. - L - . . . -

Mn - - D . . . -

鳥 艸 竹

……

SUMO

時間 Ti+1 的字形、意義、 、

水 示

Ti

Semantic symbols ontology

G-Glyph M-Meaning O-Original Meaning D-Derived Meaning L-Loan Meaning

D

Page 57: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

字形 意義

G1 G2 G3 . . . Gm

M1 - - . . . O

M2 - O - . . . -

M3 - - . . . -

M4 L D - . . . -

M5 - D . . . -

. O - - . . . -

. - L - . . . -

Mn - - D . . . -

鳥 艸 竹

……

SUMO

水 示

Ti+1

Semantic symbols ontology

G-Glyph M-Meaning O-Original Meaning D-Derived Meaning L-Loan Meaning

Page 58: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

字形 意義

G1 G2 G3 . . . Gm

M1 - - . . . O

M2 - O - . . . -

M3 - - . . . -

M4 L D - . . . -

M5 - D . . . -

. O - - . . . -

. - L - . . . -

Mn - - D . . . -

鳥 艸 竹

……

SUMO

水 示

Ti+1

Semantic symbols ontology

Ti-1Ti

G-Glyph M-Meaning O-Original Meaning D-Derived Meaning L-Loan Meaning Time

Page 59: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Summery• Hanzi ontology is organized by glyph( 字

形 ) structure of each character.

• We believe it can assist solving the problem in Chinese information processing.

• It would also be a good candidate for cross-lingual general ontology for Easter Asian languages using Chinese characters.

Page 60: Text-based Construction and Comparison of Domain Ontology:  A study based on classical poetry

Chu-Ren Huang. PACLIC 18, 2004. Waseda University, Tokyo, Japan.

Conclusion• KSD can be achieved by mapping lexicon

to general ontology• General ontologies have constraints in thei

r applicability. In particular, the sub-structures dependent on human activities will also vary easily when human societies change.

• Specific ontology can turn out to be a useful basis for general ontology.