Domain specific facet based language resource Biswanath Dutta [email protected] DISI, University of Trento, Trento, Italy


Page 1: Domain specific facet based language resourcedisi.unitn.it/~bernardi/Courses/DL/Slides_10_11/2011_05... ·  · 2011-05-25Domain specific facet based language resource Biswanath Dutta

Domain specific facet based language resource

Biswanath Dutta

[email protected] DISI, University of Trento, Trento, Italy


Topics

• Introduction
• Language Resources (LR)
• Knowledge Organization Systems (KOS)
• DERA
• From DERA to Description Logics (DL)
• Demo: Modeling a Domain using DERA
• UniTn UK
• Conclusion and Future Work


Introduction

• Language resources
  • General purpose language resources
    • WordNet (http://wordnet.princeton.edu/)
    • MultiWordNet (http://multiwordnet.fbk.eu/english/home.php)
    • EuroWordNet (http://www.illc.uva.nl/EuroWordNet/)
    • Roget’s thesaurus
  • Domain specific language resources
    • Dewey Decimal Classification (DDC)
    • Library of Congress Classification (LCC)
    • Universal Decimal Classification (UDC)
    • Bliss Bibliographic Classification (BC)
    • Colon Classification (CC)
    • AGROVOC
    • Art and Architecture Thesaurus
  • UniTn UK


Uses of the Language Resources

• Overall uses
  • Classification of the contents of a library, the Internet or other databases (for systematic organization and easy management)
  • Search of the contents of a library, the Internet or other databases (for retrieval)
• Other uses
  • Natural Language Processing (NLP)
    • Automatic summarization
    • Named entity recognition
    • Word Sense Disambiguation
    • …


Overall uses

• Classification
  • Document classification (e.g., books, journal articles, news articles, scientific papers, web pages)
  • Items in a supermarket (e.g., detergents, food items, vegetables)
  • …
• Search
  • Cataloguing
  • Indexing


Other Uses: Natural Language Processing

• Automatic text summarization: produce a readable summary of a chunk of text. Often used to provide summaries of text of a known type, for example, articles in the sports section of a newspaper
• Named entity recognition: given a stream of text, determine which items in the text map to proper names, such as people or places, and what the type of each such name is (e.g., person, location, organization)
• Word Sense Disambiguation: many words have more than one meaning; we have to select the meaning which makes the most sense in context
• …

Source: http://en.wikipedia.org/wiki/Natural_language_processing


Challenges

• Challenges we are facing in general
  • High recall
  • Low precision
• Natural language processing
  • Disambiguation problems
    • E.g., the word “bass”
      • Sense 1: a kind of saltwater fish
      • Sense 2: tones of low frequency
    • Natural language sentences:
      • I went fishing for some sea bass
      • The bass line of the song is too weak
• There is a lack of large scale, high quality language resources
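The “bass” example can be made concrete with a small sketch of gloss-overlap disambiguation (a simplified Lesk-style heuristic, which is an assumption here and not a method named on the slide). The sense identifiers, the glosses and the stopword list are hypothetical; the glosses only paraphrase the two senses listed above.

```python
# Hypothetical sense inventory for "bass"; the glosses paraphrase the two
# senses on the slide and are not taken from WordNet or any real resource.
SENSES = {
    "bass#fish": "a kind of saltwater fish caught by fishing in the sea",
    "bass#tone": "tones of low frequency in a song or piece of music",
}

# Small illustrative stopword list (an assumption, not a standard list).
STOPWORDS = {"a", "an", "the", "of", "in", "is", "i", "for", "some", "by", "or", "too", "to"}


def tokens(text):
    """Lowercase, split on whitespace, strip punctuation, drop stopwords."""
    words = (w.strip(".,;:!?\"'").lower() for w in text.split())
    return {w for w in words if w and w not in STOPWORDS}


def disambiguate(sentence, senses=SENSES):
    """Return the sense whose gloss shares the most words with the sentence."""
    context = tokens(sentence)
    return max(senses, key=lambda s: len(context & tokens(senses[s])))


if __name__ == "__main__":
    for sentence in ("I went fishing for some sea bass",
                     "The bass line of the song is too weak"):
        print(sentence, "->", disambiguate(sentence))
```

With so little context the heuristic depends entirely on how rich the glosses are, which is one way to see why large, high quality language resources matter.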


Solution

• A large scale, domain specific LR built on facet based KO is a better resource for addressing the low precision and high recall that all applications using LRs such as WordNet are facing


Objective

• At the end of this presentation we will know the following:
  • The fundamental differences between enumerative and facet based KOS
  • How to model domain specific knowledge using DERA, a faceted knowledge organization framework
  • UniTn Universal Knowledge base (UK), a large scale, high quality language resource


General Purpose Language Resources


WordNet

• is a large lexical database for the English language, developed at the Cognitive Science Laboratory at Princeton University
• it groups words of different parts of speech (nouns, verbs, adjectives and adverbs) into sets of cognitive synonyms, called synsets, each expressing a distinct concept (i.e., each synset groups all the words with the same meaning or sense)
• Synsets are interlinked by means of conceptual-semantic and lexical relations
• Typical semantic relations are hypernym (is-a) and part meronym (part-of)
• WordNet 2.1 consists of:
  • 117,597 synsets, 354,057 relations, 147,252 terms and 207,019 senses

MultiWordNet

• is a multilingual lexical database
• it includes languages such as English, Italian, Spanish, Portuguese, Hebrew, Romanian and Latin
• Italian WordNet is strictly aligned with WordNet 1.6
• It consists of:
  • 102100+ synsets, 131300+ terms, 176600+ senses [English]
  • 38500+ synsets, 46200+ terms, 67400+ senses [Italian]
  • 57400+ synsets, 67200+ terms, 106500+ senses [Spanish]
  • 17200+ synsets, 16200+ terms, 21500+ senses [Portuguese]
  • 5900+ synsets, 0 terms, 0 senses [Hebrew]
  • 20100+ synsets, 20000+ terms, 34200+ senses [Romanian]
  • 8973+ synsets, 9100+ terms, 25900+ senses [Latin]


Domain Specific KOS as Language Resources


What is Knowledge Organization (KO)?

• It is about the organization of knowledge, where organization refers to the arrangement and the organizing structure used to arrange and classify
• KO = KO Processes (KOP) + KO Systems (KOS)
• KOP: activities such as document description, indexing and classification performed in libraries, databases, archives, etc.
• KOS: the tools that present the organized interpretation of knowledge structures. These include classification (and categorization) schemes that organize materials at a general level, subject headings that provide more detailed access, and authority files that control variant versions of key information such as geographic names and personal names. They also include highly structured vocabularies, such as thesauri, and less traditional schemes, such as semantic networks and ontologies (Hodge, 2000)

Hodge, G. (2000). Systems of Knowledge Organization for Digital Libraries: Beyond Traditional Authority Files. Washington, DC: Council on Library and Information Resources. http://www.clir.org/pubs/reports/pub91/contents.html


Library Classification Systems

• Enumerative classification system
  • DDC
  • LCC
• Faceted classification system
  • CC
  • BC
  • UDC


Enumerative vs. Facet


Enumerative

• Lists all the classes of objects in a nested tree structure
• The classes are organized in successive, subordinate relationships
• It is analogous to a simplified language where all possible statements have been pre-defined
  • E.g., French poetry, Italian drama, Chicken ribs
• Important: it is impossible to list the universe of all possible subjects (a subject is a composition of one or more concepts)
  • E.g., Literature, French poetry, Italian drama, Chicken ribs, Island


Enumerative

Literature
  English literature
    Poetry
    Drama
    Fiction
  Italian literature
    Poetry
    Drama
    Fiction
  Tamil literature
    Poetry
    Drama
    Fiction


Facet [first introduced by Ranganathan (1930s) in Library and Information Science]

• “A generic term used to denote any component – be it a basic subject or an isolate – of a compound subject, …” (Ranganathan)
• It is a category that expresses some aspect of the knowledge being described
• A facet is a hierarchy of homogeneous terms, where each term in the hierarchy denotes a primitive atomic concept
• E.g., organ facet, geographical facet, language facet, property facet, author facet, religion facet, commodity facet, etc.


Facet Example

Language
  by Indo-European: Teutonic, Gothic, English, American English, German, Latin, Italian, French, Greek
  by Dravidian: Tamil, Tulu
  by Geographic location: Asian language (collective treatment), Japanese language, Indian language, African language


Enumerative vs. Facet

Enumerative

Literature
  English literature
    Poetry
    Drama
    Fiction
  Italian literature
    Poetry
    Drama
    Fiction
  Tamil literature
    Poetry
    Drama
    Fiction

Faceted

Literature
  by language
  by form: Poetry, Drama, Fiction

Language
  by Indo-European: Teutonic, Gothic, English, American English, German, Latin, Italian, French, Greek
  by Dravidian: Tamil, Tulu
  by Geographic location: Asian language (collective treatment), Japanese language, Indian language, African language

In analogy to natural language, a faceted scheme is like a vocabulary of nouns, verbs, adjectives, etc., where the words and the types of words have been defined, but the speakers combine them as needed to create statements
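The contrast can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the two facets are reduced to flat lists taken from the Literature example, and the function names `synthesize` and `all_subjects` are hypothetical.

```python
from itertools import product

# Enumerative: every compound class has to be listed in advance.
ENUMERATIVE = [
    "English literature: Poetry", "English literature: Drama", "English literature: Fiction",
    "Italian literature: Poetry", "Italian literature: Drama", "Italian literature: Fiction",
    "Tamil literature: Poetry", "Tamil literature: Drama", "Tamil literature: Fiction",
]

# Faceted: only the atomic terms of each facet are listed.
LANGUAGE_FACET = ["English", "Italian", "Tamil"]
FORM_FACET = ["Poetry", "Drama", "Fiction"]


def synthesize(language, form):
    """Combine one term from each facet into a compound subject."""
    return f"{language} literature: {form}"


def all_subjects(languages, forms):
    """Every subject that can be synthesized from the two facets."""
    return [synthesize(lang, form) for lang, form in product(languages, forms)]


if __name__ == "__main__":
    # The 6 facet terms generate the same 9 subjects the enumerative list spells out.
    print(sorted(all_subjects(LANGUAGE_FACET, FORM_FACET)) == sorted(ENUMERATIVE))
    # Adding a single term to one facet makes three new subjects expressible.
    print(len(all_subjects(LANGUAGE_FACET + ["French"], FORM_FACET)))
```

The faceted version stores terms additively while the expressible subjects grow multiplicatively, which is the compactness and expandability argument made for facet based systems.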

Characteristics of Facet based Systems

• Flexibility
• Expandability
• Reusability
• Compactness
• Robustness
• Consistency


Facet based KO Frameworks


PMEST [Ranganathan, 1933]

• Consists of:
  • Personality [P]
  • Matter [M] (matter material, matter method, matter property)
  • Energy [E] (internal and external actions, processes)
  • Space [S]
  • Time [T]
• [Basic subject] ,[P] ;[M] :[E] .[S] ‘[T]
• Drawbacks:
  • Sometimes it is difficult to interpret, particularly the “P”
  • No explicit relations (is-a/part-of) between the classes
  • More a syntactic approach than a semantic one
  • Does not allow for direct translation of the elements of its facets P, M, E, S, T into DL axioms
  • No clear separation between entities and entity classes
  • …
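As a rough illustration of why PMEST reads as a syntactic scheme, the sketch below renders a subject string from the connecting symbols shown in the pattern above. The function name `facet_formula` is hypothetical, a plain apostrophe stands in for the quote mark before [T], and the sample terms are purely illustrative and not taken from any classification schedule.

```python
# Connecting symbols as they appear in the pattern on the slide:
# ,[P] ;[M] :[E] .[S] '[T]  (a plain apostrophe is assumed for the time facet)
CONNECTORS = {"P": ",", "M": ";", "E": ":", "S": ".", "T": "'"}
ORDER = ["P", "M", "E", "S", "T"]


def facet_formula(basic_subject, **facets):
    """Render a subject string in P-M-E-S-T order, skipping absent facets."""
    unknown = set(facets) - set(ORDER)
    if unknown:
        raise ValueError(f"unknown facets: {sorted(unknown)}")
    parts = [basic_subject]
    for key in ORDER:
        if key in facets:
            parts.append(CONNECTORS[key] + facets[key])
    return " ".join(parts)


if __name__ == "__main__":
    # Hypothetical facet assignment, for illustration only.
    print(facet_formula("Medicine", P="Lungs", M="Tuberculosis",
                        E="Treatment", S="India", T="1950"))
```

The output is a flat string: nothing in it states how the terms relate to each other, which matches the drawback listed above about missing explicit is-a/part-of relations.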


DEPA [G. Bhattacharyya, 1979]

• It is a simplified version of Ranganathan’s PMEST
• Consists of:
  • Domain [D]
  • Entity [E]
  • Property [P]
  • Action [A]
• Plus one additional facet: Modifier [m]
  • Modifiers: space, time, form, language, environment, etc.
• Drawbacks:
  • No explicit relations (is-a/part-of) between the classes
  • More a syntactic approach than a semantic one
  • Does not allow for direct translation of the elements of its facets E, P, A into DL axioms
  • No clear separation between entities and entity classes
  • …


DERA [F. Giunchiglia and B. Dutta, 2011]

• Consists of:
  • Domain [D]
  • Entity [E]
  • Relation [R]
  • Attribute [A]
• It is a further refined and simplified form of Bhattacharyya’s DEPA
• Has a direct mapping to DL
• Emphasis is on the named entities


DERA


DERA

• A facet based KO framework
• A framework independent of any particular domain
• A framework for developing domain specific knowledge resources
• Has a mapping to Description Logics (DL)
• Is logically sound
• Was designed in the UniTn KnowDive group


Domain

• An area of knowledge that we are interested in or that we are communicating about
• An organized field of knowledge that deals with specific kinds of subjects
• E.g., Space, Time, Music, Movie, Sport, Mathematics, etc.


DERA (2)

• D = <E, R, A>
• E = Entity - an elementary component consisting of classes and their instances, having either perceptual correlates or only conceptual existence within a domain in context.
  • E.g., in the Space domain, natural elevations such as mountain, hill, seamount, etc. are entity classes, while the Himalaya, Monte Bondone, Loihi seamount, etc. are entities
• R = Relation - an elementary component consisting of classes representing the relation between entities.
  • E.g., in the Space domain, north, south, near, adjacent, in front, etc. are spatial relations between entities
• A = Attribute - an elementary component consisting of classes denoting the qualitative/quantitative or descriptive properties of entities.
  • E.g., in the Space domain, altitude (of a hill), length (of a river), surface area (of a lake), etc. are qualitative/quantitative properties, while the kinds of rocks (of a mountain), architectural style (of a monument) are descriptive attributes
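One way to picture D = <E, R, A> is as a small data structure. The sketch below is an assumption-laden illustration, not the DERA implementation: the class and field names are hypothetical, and pairing “Himalaya” with “mountain” and “Loihi seamount” with “seamount” is an assumed reading of the Space example.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """E: entity classes plus the named entities that instantiate them."""
    entity_classes: set = field(default_factory=set)
    entities: dict = field(default_factory=dict)  # entity name -> entity class


@dataclass
class Attribute:
    """A: qualitative/quantitative and descriptive properties of entities."""
    datatype: set = field(default_factory=set)
    descriptive: set = field(default_factory=set)


@dataclass
class Domain:
    """D = <E, R, A> for one domain (hypothetical container)."""
    name: str
    entity: Entity = field(default_factory=Entity)
    relations: set = field(default_factory=set)
    attribute: Attribute = field(default_factory=Attribute)

    def add_entity(self, name, entity_class):
        """Register a named entity; its class must already be in the domain."""
        if entity_class not in self.entity.entity_classes:
            raise ValueError(f"unknown entity class: {entity_class}")
        self.entity.entities[name] = entity_class


# Terms taken from the Space example above.
space = Domain("Space")
space.entity.entity_classes.update({"mountain", "hill", "seamount"})
space.relations.update({"north", "south", "near", "adjacent", "in front"})
space.attribute.datatype.update({"altitude", "length", "surface area"})
space.attribute.descriptive.update({"kinds of rocks", "architectural style"})
space.add_entity("Himalaya", "mountain")        # assumed pairing
space.add_entity("Loihi seamount", "seamount")  # assumed pairing
```

Keeping entity classes and named entities in separate fields mirrors the separation that PMEST and DEPA are said to lack.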


Entity

• An elementary component that consists of classes (categories) and their instances, having either perceptual correlates or only conceptual existence in a domain in context
• E = <{e}, {E}>
  • e = Entity class - consists of the core classes within a domain
  • E = Entity - consists of the real world (named) entities which are instances of the entity classes “e”


Relation

• An elementary component that consists of relations inside a domain
• R = <{r}>
  • r = Relation - consists of the classes representing the relations between entities
• Relations play an important role in effective knowledge discovery, for example:
  • Retrieve all the secondary schools within 500 meters of the Dante railway station in Trento
  • Find all the highways of the Trentino province adjacent to marine areas


Attribute

• An elementary component that consists of classes belonging to, or characteristic of, entities.
• A = <{A}, {e}>
  • A = Datatype attribute - consists of classes which qualify or quantify the entities
  • e = Descriptive attribute - consists of classes describing entities


From DERA to Description Logics (DL)


Mapping to DL

• From DERA to DL
  • Entity class (e) -> Concepts
  • Relation (R) -> Roles
  • Datatype attribute (A) -> Roles
  • Descriptive attribute (e) -> Roles
  • Entity (E) -> Individuals


Mapping

• An interpretation of a DERA domain consists of:
  • an interpretation function $I$ and a non-empty set $D^{I}$ (the domain of interpretation)
• $D^{I}$ consists of the set of:
  • Concepts ($e^{I}$)
  • Entities ($E^{I}$)
  • Relations ($R^{I}$)
  • Datatype attributes ($A^{I}$)
  • Descriptive attributes ($e^{I}$)

$$e^{I} \subseteq D^{I}, \quad E^{I} \in D^{I}, \quad R^{I} \subseteq D^{I} \times D^{I}, \quad A^{I} \subseteq D^{I} \times D^{I}, \quad e^{I} \subseteq D^{I} \times D^{I}$$


TBox and ABox

TBox
  Flowing body of water ⊑ Body of water
  Natural flowing body of water ⊑ Flowing body of water
  Stream ⊑ Natural flowing body of water
  Internal spatial relation ⊑ Spatial relation
  Central ⊑ Internal spatial relation
  Midplane ⊑ Central
  Volume ⊑ Dimension
  Roman architecture ⊑ Architectural style

ABox
  instance_of(Lake Garda, Lake)
  near(Lake Garda, Lake Caldonazzo)
  near(Lake Garda, Venice)
  central(Duomo, Trento)
  architecturalStyle(Colosseum, Roman architecture)
  primaryInflow(Gordan, Great Salt Lake)

Inferred facts, e.g.,
  (1) “Midplane” is an “Internal spatial relation”
  (2) “Lake Caldonazzo” is near “Venice”
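Inferred fact (1) follows from chaining the subsumption axioms. The sketch below is a minimal illustration of that chaining over the TBox listed on the slide; the function names are hypothetical, and it is a plain graph traversal, not a Description Logic reasoner. Fact (2) would additionally need “near” to be declared symmetric and transitive, which the slide does not state, so the sketch leaves it out.

```python
# Subsumption axioms (child ⊑ parent) copied from the TBox on the slide.
TBOX = [
    ("Flowing body of water", "Body of water"),
    ("Natural flowing body of water", "Flowing body of water"),
    ("Stream", "Natural flowing body of water"),
    ("Internal spatial relation", "Spatial relation"),
    ("Central", "Internal spatial relation"),
    ("Midplane", "Central"),
    ("Volume", "Dimension"),
    ("Roman architecture", "Architectural style"),
]


def ancestors(term, tbox=TBOX):
    """All terms that subsume `term`, found by following ⊑ transitively."""
    parents = {}
    for child, parent in tbox:
        parents.setdefault(child, set()).add(parent)
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_a(sub, sup, tbox=TBOX):
    """True if `sub` ⊑ `sup` follows from the axioms (reflexive, transitive)."""
    return sub == sup or sup in ancestors(sub, tbox)


if __name__ == "__main__":
    print(is_a("Midplane", "Internal spatial relation"))
    print(sorted(ancestors("Stream")))
```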


Demo: Modeling Domain Specific Knowledge


Analytico-Synthetic Approach [Ranganathan (1933)]

• A well established technique for building classificatory structures from atomic concepts which are analysed into facets and sequenced according to the syntax of a system
• In this approach, the domain under examination is decomposed into its basic terms, and those terms are organized into facets to display the relationships between them
• A faceted representation scheme codifies the basic building blocks that can be used at indexing, classification and searching time, and also to construct complex labels (subjects), e.g.,
  “Stormy weather [topic] during the rainy season [time] in the Java Island [space]”
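A small sketch of how the same facet terms could serve both indexing and searching. Everything here is hypothetical: the document ids, the second and third index entries, and the function names are invented for illustration; only the first entry reproduces the label quoted above.

```python
# Hypothetical index: each document is described by one term per facet.
INDEX = {
    "doc1": {"topic": "Stormy weather", "time": "Rainy season", "space": "Java Island"},
    "doc2": {"topic": "Stormy weather", "time": "Dry season", "space": "Java Island"},
    "doc3": {"topic": "Volcanic activity", "time": "Rainy season", "space": "Java Island"},
}


def search(index, **criteria):
    """Return the ids of documents whose facets match every given term."""
    return sorted(doc for doc, facets in index.items()
                  if all(facets.get(f) == t for f, t in criteria.items()))


def label(facets):
    """Synthesize a complex label (subject) from the facet terms."""
    return (f'{facets["topic"]} [topic] during the {facets["time"].lower()} [time] '
            f'in the {facets["space"]} [space]')


if __name__ == "__main__":
    print(search(INDEX, topic="Stormy weather"))
    print(search(INDEX, time="Rainy season", topic="Stormy weather"))
    print(label(INDEX["doc1"]))
```

Because each facet is queried independently, a search can constrain any subset of facets without the scheme having listed that combination in advance.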


Methodology

• Step 1: Identification of the atomic concepts
• Step 2: Analysis (per genus et differentiam)
• Step 3: Synthesis
• Step 4: Standardization
• Step 5: Ordering
• Following the above steps leads to the creation of a set of facets. They constitute a faceted representation scheme for a domain


Principles

1. Relevance (e.g., breed is a more realistic basis than grade for classifying the universe of cows)
2. Ascertainability (e.g., flowing body of water)
3. Permanence (e.g., spring, a natural flow of ground water)
4. Exhaustiveness (e.g., to classify the universe of people, we need both male and female)
5. Exclusiveness (e.g., age and date of birth both produce the same divisions)
6. Context (e.g., bank: the bank of a river, or the building of a financial institution)
   • Important: helps in reducing homographs
7. Currency (e.g., metro station vs. subway station)
8. Reticence (e.g., minority author, black man)
9. Ordering
   • Important: ordering carries semantics, as it provides implicit relations between the coordinate terms


Domain: Painting

• Step 1: Identification of the atomic concepts (from various sources):
  • Human figure, Bust, Earth, Mountain, Water, Lake, Sky, Plant, Animal, Painter, Location, Year, Red, Orange, Blue, Height, Width, Oil paint, Water-based paint, Distemper, History, …


Domain: Painting

• Step 2: Analysis (categorize the concepts based upon their similar and dissimilar characteristics)
  • {Human figure, Bust}
  • {Earth, Mountain, Water, Lake, Sky, Plant, Animal}
  • Painter
  • Location
  • Year
  • {Red, Orange, Blue}
  • {Height, Width}
  • {Oil paint, Water-based paint, Distemper}
  • History

Domain: Painting

• Step 3: Synthesis (build the facets and name/label the categories)
  • Figure {Human figure (Bust)}
  • Nature {(Earth, Mountain, Water, Lake, Sky), Plant, Animal}
  • Painter
  • Location
  • Year
  • Color {Red, Orange, Blue}
  • Dimension {Height, Width}
  • Painting type {Oil paint, Water-based paint, Distemper}
  • History

Domain: Painting

• Step 4: standardize the terms
• Step 5: model the domain according to the DERA structure

Background Knowledge (BK) for the “Painting” domain

• Entity class (e):
  • Figure: Human figure, Bust
  • Nature: Earth, Mountain, Water, Lake, Sky, Plant, Animal
• Relation (R):
  • Painter
  • Location
  • Year
• Datatype attribute (A):
  • Color: Red, Orange, Blue
  • Dimension: Height, Width
• Descriptive attribute (e):
  • Painting type: Oil paint, Water-based paint, Distemper
  • History
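The Painting model can be written down as data and pushed through the DERA-to-DL mapping given earlier (entity classes become concepts; relations and both kinds of attributes become roles). This is a sketch under assumptions: the dictionary layout and the function names are hypothetical, and every term is placed directly under its facet, so any deeper nesting the original slide layout may have shown is deliberately not modeled.

```python
# Painting domain, grouped by DERA component -> facet -> terms (flat by assumption).
PAINTING = {
    "entity_class": {
        "Figure": ["Human figure", "Bust"],
        "Nature": ["Earth", "Mountain", "Water", "Lake", "Sky", "Plant", "Animal"],
    },
    "relation": {"Painter": [], "Location": [], "Year": []},
    "datatype_attribute": {
        "Color": ["Red", "Orange", "Blue"],
        "Dimension": ["Height", "Width"],
    },
    "descriptive_attribute": {
        "Painting type": ["Oil paint", "Water-based paint", "Distemper"],
        "History": [],
    },
}

# DERA component -> DL construct, as listed on the "Mapping to DL" slide.
DL_KIND = {
    "entity_class": "concept",
    "relation": "role",
    "datatype_attribute": "role",
    "descriptive_attribute": "role",
}


def axioms(domain=PAINTING):
    """One subsumption axiom 'term ⊑ facet' per term listed under a facet."""
    return [f"{term} ⊑ {facet}"
            for facets in domain.values()
            for facet, terms in facets.items()
            for term in terms]


def dl_kind(term, domain=PAINTING):
    """Which DL construct a facet or term maps to under the DERA mapping."""
    for component, facets in domain.items():
        for facet, terms in facets.items():
            if term == facet or term in terms:
                return DL_KIND[component]
    raise KeyError(term)


if __name__ == "__main__":
    for axiom in axioms():
        print(axiom)
    print(dl_kind("Lake"), dl_kind("Height"), dl_kind("Painter"))
```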


Advantages of BK

• Projection of the domain specific knowledge
• Categorization is not limited to the entity types, but also includes the relations between entities (i.e., interactions with other entities), the behavior of the entities, the processes in or on them, and so on
• All the related knowledge necessary to express a domain is in one place
• In a domain, a word can have only one meaning (concept)


UniTn UK


UniTn UK

• A knowledge base developed by the KnowDive group, University of Trento
• Allows encoding knowledge in multiple languages
• Started with WordNet 2.1 + MultiWordNet
• Also imported knowledge from:
  • GeoNames (http://www.geonames.org/), PAT (Autonomous Province of Trento), YAGO (http://www.mpi-inf.mpg.de/yago-naga/yago/), etc.
• 300+ domains
• At present it consists of:
  • 150,000+ terms
  • 10,000,000 entities
  • 93,000,000+ axioms


UniTn UK Structure

[Figure: structure of UK. Blocks: ENTITIES; CONCEPTS AND RELATIONS; LINGUISTIC KNOWLEDGE (words and senses in different languages); DOMAIN KNOWLEDGE; BACKGROUND KNOWLEDGE (personalization). Sources: linguistic resources (e.g., WordNet) + KOS (e.g., thesauri, classification schemes). Functions: Representation, Reasoning, Maintenance, Evolution, Integration.]


UK and DERA

[Figure: structure of UK with the blocks ENTITIES, CONCEPTS AND RELATIONS, LINGUISTIC KNOWLEDGE (words and senses in different languages) and DOMAIN KNOWLEDGE]

• DOMAIN KNOWLEDGE: domains are modeled in this block according to the DERA framework. In building domain specific knowledge, we refer to the facets/sub-facets defined in the “CONCEPTS AND RELATIONS” block
• CONCEPTS AND RELATIONS: in this block, domain independent knowledge is structured in facets*

*Note that at the beginning we populated this block by uploading the concepts and relations from WordNet 2.1, and later on we applied the analytico-synthetic approach to build the facets


Conclusion and Future Work

• General purpose and domain specific language resources
• State-of-the-art problems
• Enumerative and faceted approaches in KOS
• KO frameworks
• How to model a domain using DERA
• UniTn UK
• Future work:
  • Our aim is to add more knowledge to UK in terms of terms, axioms and entities
  • Our aim is to implement more domains
  • Our aim is to develop UK as a multilingual resource. At present the UK knowledge base is in English, with only a very limited set of knowledge in Italian


Further Readings

• Ranganathan, S. R. (1967). Prolegomena to library classification. London: Asia Publishing House.
• Bhattacharyya, G. (1979). POPSI: its fundamentals and procedure based on a general theory of subject indexing languages. Library Science with a Slant to Documentation, 16(1), March, pp. 1-34.
• Giunchiglia, F. and Dutta, B. (2011). DERA: a faceted knowledge organization framework. DISI Technical Report.
• Giunchiglia, F., Dutta, B. and Maltese, V. (2009). Faceted lightweight ontologies. In Conceptual Modeling: Foundations and Applications, A. Borgida, V. Chaudhri, P. Giorgini and E. Yu (Eds.), LNCS 5600, Springer.
• Broughton, V. (2006). The need for a faceted classification as the basis of all methods of information retrieval. Aslib Proceedings, 58(1/2), pp. 49-72.
• Hjørland, B. (2003). Fundamentals of knowledge organization. Knowledge Organization, 30(2), pp. 87-111.
• Spiteri, L. (1998). A Simplified Model for Facet Analysis. Journal of Information and Library Science, Vol. 23, pp. 1-30.
• Dutta, B., Giunchiglia, F. and Maltese, V. (2011). A facet-based methodology for geo-spatial modelling. In Proceedings of GeoS'11, C. Claramunt, S. Levashkin and M. Bertolotto (Eds.), LNCS, Springer, vol. 6631, pp. 133-150.
• Giunchiglia, F., Maltese, V., Farazi, F. and Dutta, B. (2010). GeoWordNet: a resource for geo-spatial applications. In Proceedings of ESWC 2010.
• Universal catalogue. http://universal.elra.info/index.php
