Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology...
Transcript of Data Modelling and Data Categories for Terminology Management · Data Categories for Terminology...
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 1
Data Modelling and gData Categories
for Terminology Management
Sue Ellen WrightSue Ellen WrightKent State University
TRST 80098 Terminology Management Seminar Fall 2009
Terminology Management
There is unfortunately no cureThere is unfortunately no cure for terminology; you can only hope to manage it.
(Kelly Washbourne)
2 of 50
(Kelly Washbourne)
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 2
OverviewData modelling principles1) Concept orientation2) D t t i (d t t)2) Data categories (datcat)3) Term autonomy & datcat repeatability4) Elementarity5) Dependencies (combinability)6) Data modelling variance
3 of 50
6) Data modelling variance7) Granularity8) Shared resources
1) Concept Orientation
All terminological information b l i i l dibelonging to one concept, including all terms in all languages and all term-related and administrative data are be stored (or displayed) in oneterminological entry
4 of 50
terminological entry1 concept = 1 (real or virtual) terminological entry
1) Concept Orientation
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 3
Modern Terminology Management Systems (TMS)
Concept Orientation
y ( )Follow the concept-oriented approach Support features for consistent concept entries.Discourage doublettes (“double entries”) for the same concept
5 of 50
Allow for homographs (different concepts represented by the same term)
1) Concept Orientation
2) Data CategoriesPrior to computerization, data categories were not discussed in detail indiscussed in detail in terminology theory.
Initial approaches to data categories involved describing “fields” in paper
6 of 50
describing fields in paper forms for recordingterminological data offline.
2) Data Categories
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 4
Paper FicheSubject field Language
Subset Project code Classification codeNotation
2) Data Categories
Source
Term
Definition
Contexts
Short forms, abbreviations, orthographic variants Grammar
Source
Source
Source
Comments (note)
Creator/Date Editor/Date Entry Class
SourceSynonyms
Packaging Data
Model for visualizing informationA h fl f diff i d “ ff”An amorphous flow of undifferentiated “stuff”An aggregate of individual elements (little packages, components) that can be identified, delimited, organized (modeled), stored, retrieved, manipulated, and reusedStuff people can figure out if they think about it
8 of 50
Stuff a computer program can automatically recognize and process
2) Data Categories
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 5
The Shopping Basket Model
A shopping basket is a container to put stuff in.The stuff in the basket can be tossed in loose without packaging.Stuff gets lost. St ff t i d
9 of 50
Stuff gets mixed up.
2) Data Categories
Bulk Product LoadingUnpackagedrat poison flour
Unwrappedfoodstuffs
2) Data Categories
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 6
Packaging Stuff
Keeps products clean and t i t duncontaminated
Makes them easy to identifyMakes them easy to storeMakes them easy to reuse
11 of 50
03.02.1997 - 18:16:50 Walther 25.01.2000 - 18:56:54 Wright 34 Network management alias 97/02/10 -10:29:28 Walther 97/02/10 - 10:29:28 Walther noun Main Entry Term shorter form of a long name, such as an email address, a directory, or a command Fahey:7 Actually, all text-type addresses, long or short, are li f th IP dd hi h i i l laliases of the IP address which is numerical only. Fahey:7 Aliases in real-time chat are usually referred to as nicknames and handles nicknamenoun Synonym When you start an IRC session, you specify a nickname, up to nine characters, which you can also change at any time. Osborne94: 413 handle noun The nickname you assign yourself when conversing in discussion groups. Falcón: 94 Use only in g p ycolloquial situations. alias noun m un nombre corto o apodo que se utiliza para enviar mensajes a aquellas direcciones de uso más frecuente. Carballar94:122 Un alias representa a un grupo de personas, y cada vez que se envía un correo a un alias, se está enviando a todo el grupo de personas que lo forman. Carballar94: 112
Unpackaged Data
2) Data Categories
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 7
Organized Data
2) Data Categories
ConceptRepresented by ID-No. and/or classification / notation
Term Autonomy
Language 1 Language 2 Language 3 ...
Term 1+ DatCatSet
Term 2
Term 1+ DatCatSet
Term 2
Term 1+ DatCatSet
Term 2+ DatCatSet
Term 2+ DatCatSet
Term 3+ DatCatSet
3) Autonomy
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 8
Kinds of Data Categoriescomplex data category (MultiTerm: index, text, picklist)
data category that has content of some sort
open data category (MultiTerm: index, text)p g y ( , )complex data category whose conceptual domain is not restricted to a set of values (non-enumerated CD)
closed data category (MultiTerm: picklist)complex data category whose conceptual domain is restricted to a set of identified simple data categories making up its value domain (enumerated conceptual domain)
15 of 50
domain)
simple data category (MultiTerm: picklist value)data category with no conceptual domain; a member of a value domain
2) Data Categories
Data Categories
open data categorycomplex data category whose contentcomplex data category whose content (conceptual domain) is not restricted to a set of values (non-enumerated conceptual domain)Example: /term/, /definition/, /note/Open data categories are still constrained by their specifications and definitions.
16 of 50
pOnly a term should go in a term field.Only a definition should go in a definition fieldBut no one can say precisely what terms or definitions there are
2) Data Categories
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 9
Sample Open Data CategoryData categories can be combined:
en:term = a term in EnglishAn instance of a data category is a data element.
en:term=external collateral ligament is an individual data element.en:term=fibular collateral ligament is an
17 of 50
en:term=fibular collateral ligament is an another individual data element.
2) Data Categories
Picklists: Closed Data Categories/grammatical gender/ is a data category/grammatical gender/ has defined values:/grammatical gender/ has defined values:
masculinefeminineneuterother/f/ / l ll d i kli t l
18 of 50
m/f/n/o are also called picklist values, permissible values or permissible instances.
2) Data Categories
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 10
Permissible ValuesTogether, permissible values comprise a value domain.N h l ll d lNo other values are allowed unless a change is made in the system.Terminologists sometimes call these values simple data categories. They arevalues, but they do not themselves have d d l
19 of 50
dependent values.Each has its own value meaning.
2) Data Categories
Concept Entry Level
Concept levelSubject field(s)Administrative information Graphic (option)Definition & related info (option)Note (repeatable anywhere)
20 of 50
Note (repeatable anywhere)
9) Metamodel
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 11
Language Level
Object language infoGraphic (option) & related infoDefinition (option) & related infoAdministrative infoNote (repeatable anywhere)
21 of 509) Metamodel
Term Information Group Level
TermTerm-related infoDescriptive info
Definitions, other concept descriptionGraphic
22 of 50
Administrative info
9) Metamodel
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 12
Term-Related DatCatsTerm
Part of speechPart of speechGrammatical genderGrammatical number (use when necessary)
Plural formTerm type (Type) (see next slide)StatusRegional label
23 of 50
Regional label PronunciationRegister (usage register)
9) Metamodel + “Vocabulary”
Term TypeAbbreviationF ll fFull formVariantPhraseCollocation
24 of 509) Metamodel + “Vocabulary”
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 13
Data-Element Related FeaturesData Element Autonomy (repeatability)
Elementarity of data elementsDependency (combinability)Dependency (combinability)
Data Modeling VarianceElementarityGranularity
Shared Resources
25 of 50
Metamodel issues Standards
All terms should be managed as autonomous (repeatable) blocks of data categories without any preference for a
Term Autonomy & Repeatability
categories without any preference for a specific term
Each term documented with the relevant term-related data categoriesTerm autonomy applicable for preferred term, synonyms, variants, & short forms
26 of 50
te , sy o y s, a a ts, & s o t o sTerm autonomy not explicitly discussed in other theoretical literature
3) Autonomy
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 14
Here each term has its own DatCatSet.
27 of 50
3) Autonomy,
Here terms are hidden in /variant/ and /synonym/ fields.
28 of 50
3) Autonomy
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 15
Embedded or Hidden Termsen:term=external collateral ligament definition=ligament that bla bla bladefinition=ligament that bla bla bla …note=bla,bla, bla, also sometimes called the “fibular collateral ligament”.Problem:
You can’t look up the buried term.
29 of 50
It doesn’t become part of any concept system.It ends up in a /note/ element instead of a /term/ element.
3) Autonomy, SEW
Here the synonyms and variants are strung together in one /term/ field & we can’t document info about individual terms..
30 of 50
Klaus-Dirk Schmitz
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 16
Data Element Elementarity
Only one of a thing can occupy a data element
e.g., only one term in a term fieldOnly one kind of thing can occupy a data element
e g no terms or synonyms listed as such
31 of 50
e.g., no terms or synonyms listed as such in definition fields
4) Elementarity
Common Elementarity Errors (1)Error:
en:term = United Nations (UN)en:term United Nations (UN)Correct:
en:term = United Nationsterm type = full formen:term = UNterm type = acronym
C bi bili bl id if i di id l
32 of 50
Combinability enables us to identify individual units of content: /term/ combines here with /term type/.
4) Elementarity
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 17
Common Elementarity Errors (2)Error:
definition = international organization that … (Merriam Webster, 10th. Edition 2004, p. 256)
Correct:definition = international organization that …source = Webster2005, p. 256Webster2005 points to a shared resource (bibliographical entry)
33 of 50
( g p y)Combinability: Source can be used with a term or any text or graphics field.
4) Elementarity
Dependencies between data categoriesISO 12620:1999 provided a “simple
Modelling Dependencies (Combinability)
ISO 12620:1999 provided a simple hierarchy” of data categories
grammar = term-related: grammar is dependent upon term
Other dependenciessource is dependent upon definition
34 of 50
source is dependent upon definitionfor additional definitions, additional sources are neededIndividual source indicated for different items (term, definition, context, etc.)
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 18
The same data category can be modeled in multiple ways (some of th t t b i ht!)
Data Modelling Variance
them not too bright!)gendervalue = m, f, nvalue = masculine, feminine, neutergender
35 of 50
gmasculine = yes/nofeminine = yes/noneuter = yes/no
6) Variance
Complex exampleterm: ink jet printer
Modelling variances
term: ink jet printersuperordinate concept: non-impact printersubordinate concept: bubble jet printercoordinate concept: laser printer
term: ink jet printerrelated concept: non-impact printer
type of relation: superordinaterelated concept: bubble jet printer
36 of 50
related concept: bubble jet printertype of relation: subordinate
related concept: laser printertype of relation: coordinate
6) Variance
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 19
GranularityThe degree of detail that can be achieved by using the available data fields (data
t i ) t d t t i l i lcategories) to document terminological informationLow granularity:
Ex: /grammar/ m,n,s (masculine noun singular)High granularity
/part of speech/ = noun
37 of 50
/part of speech/ noun/grammatical gender/ = masculine/number/ = singular
7) Granularity
GranularityAdvantage of granularity: retrievabilityDisadvantage: more workDisadvantage: more workBest practices:
General agreements on what constitutes an acceptable level of effort in order to achieve a desired level of granularity makes it easier to share data.
38 of 50
How can you automatically convert low granularity to high granularity?
7) Granularity
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 20
Bibliographical
39 of 50
Entries
39 of 458) Shared
9) Metamodel + “Vocabulary”9) Metamodel + “Vocabulary”
40 of 50
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 21
9) Metamodel + “Vocabulary”9) Metamodel + Vocabulary
41 of 50
9) Metamodel + “Vocabulary”9) Metamodel + “Vocabulary”
42 of 50
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 22
Definition-Related InfoDefinition
Source ID(Points to shared resources)
Definition typeTranslation?
Choice of levelsConcept (one def for all languages)L ( d f f h l )
43 of 50
Language (one def for each language)Term (one def for each term in descriptive work)
9) Metamodel + “Vocabulary”
Context-Related Info
ContextContextSource ID
(Points to shared resources)Context type
Translation?B th t d t l t d
44 of 50
Both term and concept-relatedAlways associated at the term-level, never at lang or concept level
9) Metamodel + “Vocabulary”
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 23
Administrative InformationAdministrative info
ResponsibilityDateAuthorizationRoleVarious sorting subsets
– Business unit– Customer
45 of 50
– Computing environment– Product class, etc.
Applicable at multiple levels
9) Metamodel + “Vocabulary”
Other Data Categories
Concept relationsNotesOther administrative informationBibliographical informationSpecial categories, e.g.:
46 of 50
For standardizationFor inventory control
9) Metamodel + “Vocabulary”
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 24
Data Category TypesGraphics (link to multimedia file)TermsTerms
By languageSynonymsTerm-related information
Text fieldsDefinitions Contexts Notes etc
47 of 50
Definitions, Contexts, Notes, etc.Automatically inserted administrative dataLinks
10) MultiTerm Meta-Categories
MultiTerm Levels & TypesIndex
TermsTerm like elementsTerm-like elements
(collocations, boilerplate)Text (definitions, contexts, notes, etc.)
Free-form data entryPicklistMultimedia fileN b
48 of 50
NumberDate
10) MultiTerm Meta-Categories
Data Categories/Term Entry Design 2009-09-19
Chicata Fall Conference 25
Graphics can be used as resources for the whole entry, as a reference for a single l t ill t t i l tlanguage, or even to illustrate a single term.On my Christmas list: I want this kind of graphic to be an image map where I can click to the various terms cited on the image.Note the inclusion of the source of the graphic.
10) MultiTerm Meta-Categories
10) MultiTerm Meta-Categories