Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology...

25
Data Categories/Term Entry Design 2009-09-19 Chicata Fall Conference 1 Data Modelling and Data Categories for Terminology Management Sue Ellen Wright Sue Ellen Wright Kent State University TRST 80098 Terminology Management Seminar Fall 2009 Terminology Management There is unfortunately no cure There is unfortunately no cure for terminology; you can only hope to manage it. ª(Kelly Washbourne) 2 of 50 ª(Kelly Washbourne)

Transcript of Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology...

Page 1: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 1

Data Modelling and gData Categories

for Terminology Management

Sue Ellen WrightSue Ellen WrightKent State University

TRST 80098 Terminology Management Seminar Fall 2009

Terminology Management

There is unfortunately no cureThere is unfortunately no cure for terminology; you can only hope to manage it.

(Kelly Washbourne)

2 of 50

(Kelly Washbourne)

Page 2: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 2

OverviewData modelling principles1) Concept orientation2) D t t i (d t t)2) Data categories (datcat)3) Term autonomy & datcat repeatability4) Elementarity5) Dependencies (combinability)6) Data modelling variance

3 of 50

6) Data modelling variance7) Granularity8) Shared resources

1) Concept Orientation

All terminological information b l i i l dibelonging to one concept, including all terms in all languages and all term-related and administrative data are be stored (or displayed) in oneterminological entry

4 of 50

terminological entry1 concept = 1 (real or virtual) terminological entry

1) Concept Orientation

Page 3: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 3

Modern Terminology Management Systems (TMS)

Concept Orientation

y ( )Follow the concept-oriented approach Support features for consistent concept entries.Discourage doublettes (“double entries”) for the same concept

5 of 50

Allow for homographs (different concepts represented by the same term)

1) Concept Orientation

2) Data CategoriesPrior to computerization, data categories were not discussed in detail indiscussed in detail in terminology theory.

Initial approaches to data categories involved describing “fields” in paper

6 of 50

describing fields in paper forms for recordingterminological data offline.

2) Data Categories

Page 4: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 4

Paper FicheSubject field Language

Subset Project code Classification codeNotation

2) Data Categories

Source

Term

Definition

Contexts

Short forms, abbreviations, orthographic variants Grammar

Source

Source

Source

Comments (note)

Creator/Date Editor/Date Entry Class

SourceSynonyms

Packaging Data

Model for visualizing informationA h fl f diff i d “ ff”An amorphous flow of undifferentiated “stuff”An aggregate of individual elements (little packages, components) that can be identified, delimited, organized (modeled), stored, retrieved, manipulated, and reusedStuff people can figure out if they think about it

8 of 50

Stuff a computer program can automatically recognize and process

2) Data Categories

Page 5: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 5

The Shopping Basket Model

A shopping basket is a container to put stuff in.The stuff in the basket can be tossed in loose without packaging.Stuff gets lost. St ff t i d

9 of 50

Stuff gets mixed up.

2) Data Categories

Bulk Product LoadingUnpackagedrat poison flour

Unwrappedfoodstuffs

2) Data Categories

Page 6: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 6

Packaging Stuff

Keeps products clean and t i t duncontaminated

Makes them easy to identifyMakes them easy to storeMakes them easy to reuse

11 of 50

03.02.1997 - 18:16:50 Walther 25.01.2000 - 18:56:54 Wright 34 Network management alias 97/02/10 -10:29:28 Walther 97/02/10 - 10:29:28 Walther noun Main Entry Term shorter form of a long name, such as an email address, a directory, or a command Fahey:7 Actually, all text-type addresses, long or short, are li f th IP dd hi h i i l laliases of the IP address which is numerical only. Fahey:7 Aliases in real-time chat are usually referred to as nicknames and handles nicknamenoun Synonym When you start an IRC session, you specify a nickname, up to nine characters, which you can also change at any time. Osborne94: 413 handle noun The nickname you assign yourself when conversing in discussion groups. Falcón: 94 Use only in g p ycolloquial situations. alias noun m un nombre corto o apodo que se utiliza para enviar mensajes a aquellas direcciones de uso más frecuente. Carballar94:122 Un alias representa a un grupo de personas, y cada vez que se envía un correo a un alias, se está enviando a todo el grupo de personas que lo forman. Carballar94: 112

Unpackaged Data

2) Data Categories

Page 7: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 7

Organized Data

2) Data Categories

ConceptRepresented by ID-No. and/or classification / notation

Term Autonomy

Language 1 Language 2 Language 3 ...

Term 1+ DatCatSet

Term 2

Term 1+ DatCatSet

Term 2

Term 1+ DatCatSet

Term 2+ DatCatSet

Term 2+ DatCatSet

Term 3+ DatCatSet

3) Autonomy

Page 8: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 8

Kinds of Data Categoriescomplex data category (MultiTerm: index, text, picklist)

data category that has content of some sort

open data category (MultiTerm: index, text)p g y ( , )complex data category whose conceptual domain is not restricted to a set of values (non-enumerated CD)

closed data category (MultiTerm: picklist)complex data category whose conceptual domain is restricted to a set of identified simple data categories making up its value domain (enumerated conceptual domain)

15 of 50

domain)

simple data category (MultiTerm: picklist value)data category with no conceptual domain; a member of a value domain

2) Data Categories

Data Categories

open data categorycomplex data category whose contentcomplex data category whose content (conceptual domain) is not restricted to a set of values (non-enumerated conceptual domain)Example: /term/, /definition/, /note/Open data categories are still constrained by their specifications and definitions.

16 of 50

pOnly a term should go in a term field.Only a definition should go in a definition fieldBut no one can say precisely what terms or definitions there are

2) Data Categories

Page 9: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 9

Sample Open Data CategoryData categories can be combined:

en:term = a term in EnglishAn instance of a data category is a data element.

en:term=external collateral ligament is an individual data element.en:term=fibular collateral ligament is an

17 of 50

en:term=fibular collateral ligament is an another individual data element.

2) Data Categories

Picklists: Closed Data Categories/grammatical gender/ is a data category/grammatical gender/ has defined values:/grammatical gender/ has defined values:

masculinefeminineneuterother/f/ / l ll d i kli t l

18 of 50

m/f/n/o are also called picklist values, permissible values or permissible instances.

2) Data Categories

Page 10: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 10

Permissible ValuesTogether, permissible values comprise a value domain.N h l ll d lNo other values are allowed unless a change is made in the system.Terminologists sometimes call these values simple data categories. They arevalues, but they do not themselves have d d l

19 of 50

dependent values.Each has its own value meaning.

2) Data Categories

Concept Entry Level

Concept levelSubject field(s)Administrative information Graphic (option)Definition & related info (option)Note (repeatable anywhere)

20 of 50

Note (repeatable anywhere)

9) Metamodel

Page 11: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 11

Language Level

Object language infoGraphic (option) & related infoDefinition (option) & related infoAdministrative infoNote (repeatable anywhere)

21 of 509) Metamodel

Term Information Group Level

TermTerm-related infoDescriptive info

Definitions, other concept descriptionGraphic

22 of 50

Administrative info

9) Metamodel

Page 12: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 12

Term-Related DatCatsTerm

Part of speechPart of speechGrammatical genderGrammatical number (use when necessary)

Plural formTerm type (Type) (see next slide)StatusRegional label

23 of 50

Regional label PronunciationRegister (usage register)

9) Metamodel + “Vocabulary”

Term TypeAbbreviationF ll fFull formVariantPhraseCollocation

24 of 509) Metamodel + “Vocabulary”

Page 13: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 13

Data-Element Related FeaturesData Element Autonomy (repeatability)

Elementarity of data elementsDependency (combinability)Dependency (combinability)

Data Modeling VarianceElementarityGranularity

Shared Resources

25 of 50

Metamodel issues Standards

All terms should be managed as autonomous (repeatable) blocks of data categories without any preference for a

Term Autonomy & Repeatability

categories without any preference for a specific term

Each term documented with the relevant term-related data categoriesTerm autonomy applicable for preferred term, synonyms, variants, & short forms

26 of 50

te , sy o y s, a a ts, & s o t o sTerm autonomy not explicitly discussed in other theoretical literature

3) Autonomy

Page 14: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 14

Here each term has its own DatCatSet.

27 of 50

3) Autonomy,

Here terms are hidden in /variant/ and /synonym/ fields.

28 of 50

3) Autonomy

Page 15: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 15

Embedded or Hidden Termsen:term=external collateral ligament definition=ligament that bla bla bladefinition=ligament that bla bla bla …note=bla,bla, bla, also sometimes called the “fibular collateral ligament”.Problem:

You can’t look up the buried term.

29 of 50

It doesn’t become part of any concept system.It ends up in a /note/ element instead of a /term/ element.

3) Autonomy, SEW

Here the synonyms and variants are strung together in one /term/ field & we can’t document info about individual terms..

30 of 50

Klaus-Dirk Schmitz

Page 16: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 16

Data Element Elementarity

Only one of a thing can occupy a data element

e.g., only one term in a term fieldOnly one kind of thing can occupy a data element

e g no terms or synonyms listed as such

31 of 50

e.g., no terms or synonyms listed as such in definition fields

4) Elementarity

Common Elementarity Errors (1)Error:

en:term = United Nations (UN)en:term United Nations (UN)Correct:

en:term = United Nationsterm type = full formen:term = UNterm type = acronym

C bi bili bl id if i di id l

32 of 50

Combinability enables us to identify individual units of content: /term/ combines here with /term type/.

4) Elementarity

Page 17: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 17

Common Elementarity Errors (2)Error:

definition = international organization that … (Merriam Webster, 10th. Edition 2004, p. 256)

Correct:definition = international organization that …source = Webster2005, p. 256Webster2005 points to a shared resource (bibliographical entry)

33 of 50

( g p y)Combinability: Source can be used with a term or any text or graphics field.

4) Elementarity

Dependencies between data categoriesISO 12620:1999 provided a “simple

Modelling Dependencies (Combinability)

ISO 12620:1999 provided a simple hierarchy” of data categories

grammar = term-related: grammar is dependent upon term

Other dependenciessource is dependent upon definition

34 of 50

source is dependent upon definitionfor additional definitions, additional sources are neededIndividual source indicated for different items (term, definition, context, etc.)

Page 18: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 18

The same data category can be modeled in multiple ways (some of th t t b i ht!)

Data Modelling Variance

them not too bright!)gendervalue = m, f, nvalue = masculine, feminine, neutergender

35 of 50

gmasculine = yes/nofeminine = yes/noneuter = yes/no

6) Variance

Complex exampleterm: ink jet printer

Modelling variances

term: ink jet printersuperordinate concept: non-impact printersubordinate concept: bubble jet printercoordinate concept: laser printer

term: ink jet printerrelated concept: non-impact printer

type of relation: superordinaterelated concept: bubble jet printer

36 of 50

related concept: bubble jet printertype of relation: subordinate

related concept: laser printertype of relation: coordinate

6) Variance

Page 19: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 19

GranularityThe degree of detail that can be achieved by using the available data fields (data

t i ) t d t t i l i lcategories) to document terminological informationLow granularity:

Ex: /grammar/ m,n,s (masculine noun singular)High granularity

/part of speech/ = noun

37 of 50

/part of speech/ noun/grammatical gender/ = masculine/number/ = singular

7) Granularity

GranularityAdvantage of granularity: retrievabilityDisadvantage: more workDisadvantage: more workBest practices:

General agreements on what constitutes an acceptable level of effort in order to achieve a desired level of granularity makes it easier to share data.

38 of 50

How can you automatically convert low granularity to high granularity?

7) Granularity

Page 20: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 20

Bibliographical

39 of 50

Entries

39 of 458) Shared

9) Metamodel + “Vocabulary”9) Metamodel + “Vocabulary”

40 of 50

Page 21: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 21

9) Metamodel + “Vocabulary”9) Metamodel + Vocabulary

41 of 50

9) Metamodel + “Vocabulary”9) Metamodel + “Vocabulary”

42 of 50

Page 22: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 22

Definition-Related InfoDefinition

Source ID(Points to shared resources)

Definition typeTranslation?

Choice of levelsConcept (one def for all languages)L ( d f f h l )

43 of 50

Language (one def for each language)Term (one def for each term in descriptive work)

9) Metamodel + “Vocabulary”

Context-Related Info

ContextContextSource ID

(Points to shared resources)Context type

Translation?B th t d t l t d

44 of 50

Both term and concept-relatedAlways associated at the term-level, never at lang or concept level

9) Metamodel + “Vocabulary”

Page 23: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 23

Administrative InformationAdministrative info

ResponsibilityDateAuthorizationRoleVarious sorting subsets

– Business unit– Customer

45 of 50

– Computing environment– Product class, etc.

Applicable at multiple levels

9) Metamodel + “Vocabulary”

Other Data Categories

Concept relationsNotesOther administrative informationBibliographical informationSpecial categories, e.g.:

46 of 50

For standardizationFor inventory control

9) Metamodel + “Vocabulary”

Page 24: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 24

Data Category TypesGraphics (link to multimedia file)TermsTerms

By languageSynonymsTerm-related information

Text fieldsDefinitions Contexts Notes etc

47 of 50

Definitions, Contexts, Notes, etc.Automatically inserted administrative dataLinks

10) MultiTerm Meta-Categories

MultiTerm Levels & TypesIndex

TermsTerm like elementsTerm-like elements

(collocations, boilerplate)Text (definitions, contexts, notes, etc.)

Free-form data entryPicklistMultimedia fileN b

48 of 50

NumberDate

10) MultiTerm Meta-Categories

Page 25: Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology Management ... apodo que se utiliza para enviar mensajes a aquellas ... Shared Resources

Data Categories/Term Entry Design 2009-09-19

Chicata Fall Conference 25

Graphics can be used as resources for the whole entry, as a reference for a single l t ill t t i l tlanguage, or even to illustrate a single term.On my Christmas list: I want this kind of graphic to be an image map where I can click to the various terms cited on the image.Note the inclusion of the source of the graphic.

10) MultiTerm Meta-Categories

10) MultiTerm Meta-Categories