Mapping FAO’s AGROVOC Thesaurus

Mapping FAO’s AGROVOC Thesaurus and the Chinese Agricultural Thesaurus (CAT). Anita Liang, Food and Agriculture Organization of the UN, Library and Documentation Systems Division. Sixth AOS Workshop, Vila Real (Portugal), 26-27 July 2005.

Transcript of Mapping FAO’s AGROVOC Thesaurus

Page 1: Mapping FAO’s AGROVOC Thesaurus

Food and Agriculture

Organization of the UN

Library and Documentation

Systems Division

Slide 1

July 2005

Mapping CAT to AGROVOC

6th AOS Workshop

Vila Real(Portugal)

26-27 July 2005

Mapping FAO’s AGROVOC Thesaurus and the

Chinese Agricultural Thesaurus (CAT)

Anita Liang

July 26, 2005

Sixth AOS Workshop

Vila Real, Portugal

Page 2: Mapping FAO’s AGROVOC Thesaurus

Outline

• The goals

• Benefits

• Project outputs

• Characteristics of the terminologies

• Definitions

• Guidelines

• Mapping

• Outputs

• Issues

Page 3: Mapping FAO’s AGROVOC Thesaurus

The goals

• To draw equivalences between corresponding concepts within the two agricultural terminologies

• To enrich and structurally improve both sources

[Figure: CAT (Chinese world view) and AGROVOC (English world view), linked by MAPPING]

Page 4: Mapping FAO’s AGROVOC Thesaurus

Benefits

• Multilinguality: improved language coverage

• Domain coverage: improved domain coverage

• Interoperability: information systems (IS), applications

Page 5: Mapping FAO’s AGROVOC Thesaurus

Project outputs

(1) a mapping file that links corresponding concepts from CAT to AGROVOC

(2) a list of modifications to be applied to AGROVOC that serve both to improve its content and to provide a valid mapping

Page 6: Mapping FAO’s AGROVOC Thesaurus

Comparison

• AGROVOC:

– 27736 English terms: 16769 descriptors, 10967 non-descriptors

– 25060 Chinese terms: 16628 descriptors, 8432 non-descriptors

– It is hierarchically structured with BT/NT relations. It has associative relations RT and UF/USE, as well as UF+.

• CAT:

– Chinese terms: 64638 (51614 descriptors, 13024 non-descriptors)

– English terms: 90% descriptors (10% have both English and Latin translations or no translations), no non-descriptors

– It is hierarchically structured with BT/NT relations and contains associative relations RT, UF/USE.

Page 7: Mapping FAO’s AGROVOC Thesaurus

Definitions: Mapping v. integration

• Integration of different sources into a single unified thesaurus. This may involve complete restructuring of both sources; recoverability and integrity of the original sources are less of a priority than the overall logical consistency of the integrated product.

• Mapping of one source to the other. The sources are revised, but each retains its original structure; mutual consistency is desirable but less of a priority than establishing approximate equivalences.

Page 8: Mapping FAO’s AGROVOC Thesaurus

Definitions (cont’d)

• The source vocabulary is CAT; the target vocabulary is AGROVOC.

• Mapping means linking an entry in the source vocabulary to an entry in the target vocabulary.

• A term is a lexical representation of a concept.

• An entry in CAT consists of the Chinese term and any English translation(s) along with its relations to other entries. An entry in AGROVOC consists of at least one English or Chinese term along with their translations as well as its relations to other entries.

==> entry = concept
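The "entry = concept" definition can be made concrete with a small data model. This is only a sketch: the class name `Entry` and its fields are assumptions for illustration, not the actual AGROVOC or CAT database schema, and the IDs 123 and 345 are the illustrative values used elsewhere in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One thesaurus entry, treated as one concept (hypothetical model)."""
    entry_id: int
    # language code -> list of terms (lexical representations of the concept)
    terms: dict = field(default_factory=dict)
    # relation type ("BT", "NT", "RT", "UF") -> list of related entry IDs
    relations: dict = field(default_factory=dict)


# A CAT entry: the Chinese term plus any English translation(s).
cat_entry = Entry(123, {"zh": ["中国"], "en": ["China"]})

# An AGROVOC entry: at least one English or Chinese term plus translations.
agrovoc_entry = Entry(345, {"en": ["China"], "zh": ["中国"]})

# A mapping links two entries (concepts), not two individual terms.
mapping = (cat_entry.entry_id, "exactMatch", agrovoc_entry.entry_id)
```

Keeping the terms inside the entry is what lets a mapping stay valid when a translation is corrected or added later.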

Page 9: Mapping FAO’s AGROVOC Thesaurus

Mapping between entries/concepts

[Figure: a CAT entry (zh term, en term; CAT_ID = 123, the CAT termcode) is linked by a mapping to an AGROVOC entry (zh term, en term, fr term, es term; AGROVOC_ID = 345, the AGROVOC termcode)]

Page 10: Mapping FAO’s AGROVOC Thesaurus

Working formats

What we have: RDBMS

• AGROVOC scheme (MySQL)

• CAT scheme (MySQL)

What we need: RDF(S)-based

• SKOS?

• OWL Lite
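One way to picture the step from relational rows to an RDF(S)-based format is to serialize each row as SKOS statements in N-Triples. The sketch below is an assumption-laden illustration: the row layouts, the function name `rows_to_ntriples`, and the base URI are hypothetical, and treating non-descriptors as `skos:altLabel` is one possible choice rather than the project's decision.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/agrovoc/"  # hypothetical base URI


def escape(label):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return label.replace("\\", "\\\\").replace('"', '\\"')


def rows_to_ntriples(term_rows, bt_rows):
    """term_rows: (entry_id, lang, label, is_descriptor) tuples.
    bt_rows: (entry_id, broader_entry_id) tuples."""
    lines = []
    # One skos:Concept per entry ID.
    for entry_id in sorted({row[0] for row in term_rows}):
        lines.append(f"<{BASE}{entry_id}> <{RDF_TYPE}> <{SKOS}Concept> .")
    # Descriptors become preferred labels, non-descriptors alternative labels.
    for entry_id, lang, label, is_descriptor in term_rows:
        prop = "prefLabel" if is_descriptor else "altLabel"
        lines.append(
            f'<{BASE}{entry_id}> <{SKOS}{prop}> "{escape(label)}"@{lang} .'
        )
    # BT rows become skos:broader links between concepts.
    for entry_id, broader_id in bt_rows:
        lines.append(f"<{BASE}{entry_id}> <{SKOS}broader> <{BASE}{broader_id}> .")
    return lines


triples = rows_to_ntriples(
    [(345, "en", "China", True), (345, "zh", "中国", True)],
    [(345, 300)],
)
```

In practice the rows would come from queries against the MySQL schemes; they are passed in as plain tuples here so that the example stays self-contained.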

Page 11: Mapping FAO’s AGROVOC Thesaurus

Guidelines: General (1/4)

• Entries should be mapped irrespective of their status as descriptors or non-descriptors.

• Mappings should be between entry IDs, not term IDs.

• Many to one: multiple CAT entries can be mapped to the same entry in the target vocabulary.

• One to many: an entry in CAT can be mapped to one or more entries in the target vocabulary.

• Mapping relations are based on SKOS Mapping relations and should include only the following:

– Exact

– Broader/Narrower (subsumption)

– AND, OR, NOT
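The general guidelines can be sketched as a small record type that accepts only the listed relations and refers to entry IDs. The class `Mapping`, the helper `group_by_target`, and all IDs below are hypothetical; treating AND/OR/NOT as a combinator over several target entries is one possible encoding, not necessarily the one used in the project.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional, Tuple

RELATIONS = {"exactMatch", "broaderMatch", "narrowMatch"}
COMBINATORS = {None, "AND", "OR", "NOT"}


@dataclass(frozen=True)
class Mapping:
    cat_entry_id: int                      # entry ID, not a term ID
    relation: str
    agrovoc_entry_ids: Tuple[int, ...]     # one or more target entries
    combinator: Optional[str] = None       # how several targets combine

    def __post_init__(self):
        if self.relation not in RELATIONS:
            raise ValueError(f"unsupported relation: {self.relation}")
        if self.combinator not in COMBINATORS:
            raise ValueError(f"unsupported combinator: {self.combinator}")


def group_by_target(mappings):
    """Index CAT entry IDs by the AGROVOC entry they are mapped to."""
    index = defaultdict(list)
    for m in mappings:
        for target in m.agrovoc_entry_ids:
            index[target].append(m.cat_entry_id)
    return dict(index)


mappings = [
    Mapping(101, "exactMatch", (345,)),             # many to one:
    Mapping(102, "broaderMatch", (345,)),           # 101 and 102 share 345
    Mapping(103, "exactMatch", (346, 347), "AND"),  # one to many
]
by_target = group_by_target(mappings)
```

The record carries no descriptor flag, which reflects the first guideline: descriptor status plays no role in the mapping.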

Page 12: Mapping FAO’s AGROVOC Thesaurus

Guidelines: Source/Target Modifications (2/4)

• When a gap occurs in either vocabulary because the corresponding term is missing, the term should be added to the appropriate vocabulary.

• When a gap occurs in the target vocabulary because the concept does not exist:

– If there is no parent in the target vocabulary to which it could be matched, then add the concept to the target vocabulary. Add the Chinese term even if the English term does not exist. Add relations where possible. Then do an exact mapping.

Page 13: Mapping FAO’s AGROVOC Thesaurus

Guidelines: Source/Target Modifications (3/4)

• Wrong translations should be fixed in both sources.

• Inconsistencies should be fixed within the terminologies.

cat_zh1 BT cat_zh2
agr_zh1 UF agr_zh2

• Conflicting semantics should be fixed within the terminologies.

cat_zh1 BT cat_zh2    cat_en1 BT cat_en2
agr_zh1 NT agr_zh2    agr_en1 NT agr_en2
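Both kinds of problem (BT in CAT against UF in AGROVOC, or BT against NT) could be found mechanically once some exact matches are in place. The following is a minimal sketch under assumed data structures; the function name `find_relation_conflicts` and the dictionary layouts are hypothetical.

```python
def find_relation_conflicts(cat_rel, agr_rel, exact):
    """Report entry pairs whose relation differs between the vocabularies.

    cat_rel / agr_rel: {(entry1, entry2): "BT" | "NT" | "RT" | "UF"}
    exact: {cat_entry: agrovoc_entry} for pairs already mapped as exactMatch
    """
    conflicts = []
    for (c1, c2), rel_c in cat_rel.items():
        if c1 in exact and c2 in exact:
            # Look up the relation between the mapped counterparts.
            rel_a = agr_rel.get((exact[c1], exact[c2]))
            if rel_a is not None and rel_a != rel_c:
                conflicts.append(((c1, c2), rel_c, rel_a))
    return conflicts


exact = {"cat_zh1": "agr_zh1", "cat_zh2": "agr_zh2"}
cat_rel = {("cat_zh1", "cat_zh2"): "BT"}

# BT in CAT but NT in AGROVOC: conflicting semantics.
conflicts = find_relation_conflicts(
    cat_rel, {("agr_zh1", "agr_zh2"): "NT"}, exact
)
# BT in CAT but UF in AGROVOC: an inconsistency.
inconsistencies = find_relation_conflicts(
    cat_rel, {("agr_zh1", "agr_zh2"): "UF"}, exact
)
```

The check only flags candidates; deciding which vocabulary to correct remains an editorial decision.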

Page 14: Mapping FAO’s AGROVOC Thesaurus

Guidelines: Source/Target Modifications (4/4)

• If two source entries need to be added to the target vocabulary but have the same English translation, add a scope note or a definition to explain the difference.

Page 15: Mapping FAO’s AGROVOC Thesaurus

Mapping: exactMatch

• If CAT entry A and AGROVOC entry B mean the same thing, i.e., are synonymous, they should be exact matches.

e.g., zh1 and zh2 are synonyms

[Figure: cat_zh1 / cat_en1 linked by exactMatch to agr_zh2 / agr_en1]

Page 16: Mapping FAO’s AGROVOC Thesaurus

Mapping: broaderMatch

B is a concept that exists in CAT but not in AGROVOC

• solution 1) broaderMatch

• solution 2) add the concept in the target (only in the original language) and do an exactMatch

[Figure: entries a_A, c_A and c_B, with links labelled broaderMatch and exactMatch]
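The choice between the two solutions can be sketched as a walk up the CAT hierarchy: if some ancestor of the unmatched entry already has an exact match, map to that target with broaderMatch; otherwise propose adding the concept to the target. The function name, the returned action labels, and the assumption that the CAT hierarchy is a tree with one parent per entry are all hypothetical simplifications.

```python
def map_missing_concept(cat_id, cat_parent, exact):
    """Suggest a mapping for a CAT entry that has no AGROVOC equivalent.

    cat_parent: {child_entry: parent_entry} (assumed to be a tree)
    exact: {cat_entry: agrovoc_entry} for existing exact matches
    """
    node = cat_parent.get(cat_id)
    while node is not None:
        if node in exact:
            # Solution 1: the target of the nearest mapped ancestor is broader.
            return ("broaderMatch", cat_id, exact[node])
        node = cat_parent.get(node)
    # Solution 2: no usable parent, so add the concept and map it exactly.
    return ("add_to_target_then_exactMatch", cat_id, None)


with_parent = map_missing_concept("c_B", {"c_B": "c_A"}, {"c_A": "a_A"})
without_parent = map_missing_concept("c_B", {"c_B": "c_A"}, {})
```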

Page 17: Mapping FAO’s AGROVOC Thesaurus

Mapping: narrowMatch

A is a concept that exists in CAT but not in AGROVOC

• solution 1) narrowMatch

• solution 2) add the concept a_B in the target (only in the original language) and do an exactMatch

[Figure: entries a_A, c_A and a_B, with links labelled narrowMatch and exactMatch]

Page 18: Mapping FAO’s AGROVOC Thesaurus

Problem?

• CAT has concept { Mathematics } containing nearly 200 narrower terms

• AGROVOC has concept { Mathematics } with no narrower terms

• ==> Map all 200 CAT terms as broaderMatch to ag_Mathematics?

Page 19: Mapping FAO’s AGROVOC Thesaurus

Mapping: inheritance (1/3)

Map each source entry at the most general level in the target vocabulary.

1. Map c_A to a_A.

2. Descendants of c_A are by inheritance mapped as descendants of a_A.

3. If there are corresponding descendants of c_A and a_A, they should be mapped.

[Figure: two example hierarchies with CAT entries c_A, c_B, c_C, c_D and AGROVOC entries a_A, a_B, a_C, a_C2, a_D; the links are numbered 1 to 3 after the steps above]
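Inheritance means that only a few explicit links need to be stored; everything below a mapped entry is covered implicitly. A minimal sketch of that idea, assuming a tree-shaped CAT hierarchy and hypothetical names (`inherited_coverage`, the `children` and `explicit` dictionaries):

```python
def inherited_coverage(children, explicit):
    """Resolve which AGROVOC entry each CAT entry falls under.

    children: {cat_entry: [child cat entries]}
    explicit: {cat_entry: agrovoc_entry} for links stored in the mapping file
    Returns {cat_entry: (agrovoc_entry, "explicit" | "inherited")}.
    """
    result = {}

    def visit(node, inherited_target):
        if node in explicit:
            # Steps 1 and 3: an explicit link takes precedence.
            target, how = explicit[node], "explicit"
        else:
            # Step 2: otherwise fall back to the ancestor's target.
            target, how = inherited_target, "inherited"
        if target is not None:
            result[node] = (target, how)
        for child in children.get(node, []):
            visit(child, target)

    all_children = {c for kids in children.values() for c in kids}
    for root in sorted((set(children) | set(explicit)) - all_children):
        visit(root, None)
    return result


coverage = inherited_coverage(
    {"c_A": ["c_B", "c_C"], "c_B": ["c_D"]},
    {"c_A": "a_A", "c_B": "a_B"},
)
```

Under this scheme a large CAT subtree whose root maps to a childless AGROVOC entry needs a single stored link rather than one per narrower term.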

Page 20: Mapping FAO’s AGROVOC Thesaurus

Mapping process: inheritance (2/3)

Another type of inheritance:

1. Map c_A to a_A1 with exactMatch.

2. If there are corresponding descendants of c_A and a_A, they should be mapped (c_B with a_B2).

3. Descendants of c_B are by inheritance mapped as descendants of a_B2.

[Figure: CAT entries c_A, c_B, c_C, c_D and AGROVOC entries a_A1, a_A2, a_B1, a_B2; link 1 is labelled exactMatch, links 2 and 3 follow the steps above]

Page 21: Mapping FAO’s AGROVOC Thesaurus

Mapping: inheritance (3/3)

In case of partial inheritance, do not map single children (fig. 1), but map the parent and exclude using NOT the entries that should not be mapped (fig. 2).

[Fig. 1 and fig. 2: both panels show c_fo, a_fo and the entries a_B, a_C and a_D; in one panel a_C is marked NOT]
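The effect of a parent mapping with NOT exclusions can be sketched as a tree walk that skips the excluded entries. The function name `covered_targets` and the dictionary layout are assumptions; the sketch also assumes that excluding an entry excludes its whole subtree, which the slide does not state.

```python
def covered_targets(target_children, parent, excluded):
    """List the target entries covered by mapping to `parent` NOT `excluded`.

    target_children: {agrovoc_entry: [child agrovoc entries]}
    excluded: set of entries marked with NOT
    """
    covered = []
    stack = [parent]
    while stack:
        node = stack.pop()
        if node in excluded:
            continue  # assumption: the excluded entry's subtree is skipped too
        covered.append(node)
        stack.extend(target_children.get(node, []))
    return sorted(covered)


# c_fo is mapped to a_fo, with a_C excluded using NOT.
covered = covered_targets({"a_fo": ["a_B", "a_C", "a_D"]}, "a_fo", {"a_C"})
```

One parent link plus a short exclusion list replaces a separate link for every child that should be covered.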

Page 22: Mapping FAO’s AGROVOC Thesaurus

The output (1/2)

<c:Concept ID="uri">
  <prefLabel lang="zh">中国</prefLabel>
  <map:exactMatch>
    <a:Concept ID="uri">
      <prefLabel lang="en">China</prefLabel>
    </a:Concept>
  </map:exactMatch>
</c:Concept>
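An output record of this shape could be produced with Python's standard `xml.etree.ElementTree`. This is a sketch only: the slide does not give the namespace URIs behind the `c`, `a` and `map` prefixes or the real concept URIs, so every URI below is a hypothetical placeholder, as is the function name `mapping_element`.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URIs; the slide only shows the prefixes.
NS = {
    "c": "http://example.org/cat#",
    "a": "http://example.org/agrovoc#",
    "map": "http://example.org/mapping#",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, local):
    """Build a namespace-qualified tag name in ElementTree's {uri}local form."""
    return f"{{{NS[prefix]}}}{local}"


def mapping_element(cat_uri, zh_label, relation, agrovoc_uri, en_label):
    """Nest the target concept inside the mapping relation of the source."""
    source = ET.Element(q("c", "Concept"), {"ID": cat_uri})
    ET.SubElement(source, "prefLabel", {"lang": "zh"}).text = zh_label
    link = ET.SubElement(source, q("map", relation))
    target = ET.SubElement(link, q("a", "Concept"), {"ID": agrovoc_uri})
    ET.SubElement(target, "prefLabel", {"lang": "en"}).text = en_label
    return source


element = mapping_element(
    "http://example.org/cat/123", "中国", "exactMatch",
    "http://example.org/agrovoc/345", "China",
)
xml_text = ET.tostring(element, encoding="unicode")
```

Building the tree with a library rather than by string concatenation keeps the output well-formed and handles the escaping of labels.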

Page 23: Mapping FAO’s AGROVOC Thesaurus

Application

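[Figure: application architecture. A JSP page takes search terms (example: a search for "cow") and returns results. Its components are AGROVOC RDF, the mapping, and CAT RDF, with "search records" arrows to the record stores FAOBIB, AGRIS (Chinese), and the CAAS bibliographic DB]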
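The application amounts to query expansion across the mapping: a search term is resolved to an AGROVOC entry, and the mapping supplies the CAT terms to use against the Chinese-language records. The sketch below uses in-memory dictionaries in place of the RDF stores; the function name `expand_query`, the IDs, and the sample labels are all made up for illustration.

```python
def expand_query(term, agrovoc_terms, mappings, cat_terms):
    """Find AGROVOC entries matching `term` and the mapped CAT Chinese terms.

    agrovoc_terms / cat_terms: {entry_id: {lang: [terms]}}
    mappings: (cat_entry_id, relation, agrovoc_entry_id) tuples
    """
    wanted = term.casefold()
    # Match the search term against every label of every AGROVOC entry.
    hits = [
        entry_id
        for entry_id, labels in agrovoc_terms.items()
        if any(wanted == t.casefold() for ts in labels.values() for t in ts)
    ]
    # Follow the mapping from the matched AGROVOC entries back to CAT.
    cat_ids = [cat_id for cat_id, _relation, agr_id in mappings if agr_id in hits]
    zh_terms = sorted({t for cid in cat_ids for t in cat_terms[cid].get("zh", [])})
    return {"agrovoc": hits, "cat": zh_terms}


# Made-up sample data.
agrovoc_terms = {345: {"en": ["cows"]}}
cat_terms = {123: {"zh": ["母牛"], "en": ["cows"]}}
mappings = [(123, "exactMatch", 345)]

expanded = expand_query("Cows", agrovoc_terms, mappings, cat_terms)
```

The returned AGROVOC IDs would drive the search on the FAO side and the CAT terms the search on the CAAS side, with the two result lists merged for display.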

Page 24: Mapping FAO’s AGROVOC Thesaurus

Issues

• SKOS mapping

– “interlingua”: language independence - mapping is oriented towards source terminology

– Set theory metaphor: difficult to put into practice

• Both terminologies are multilingual in overlapping languages - what is being mapped?

Page 25: Mapping FAO’s AGROVOC Thesaurus

Thank you.