ISOCAT Proposed solutions for Problems encountered in DUELME-LMF. Jan Odijk, Nijmegen, 21 Sep 2010.
1
ISOCAT Proposed solutions for
Problems encountered in DUELME-LMF
Jan Odijk
Nijmegen 21 Sep 2010
2
Overview
• General
• Standardized DCs?
• Multiple relevant DCs in ISOCAT
• Overlap with other projects
• Container Data Categories
• Almost Identical DCs
• Language Sections
• Existing Tagsets
3
General
• Always try to map to an existing ISOCAT DC
  – where possible
  – irrespective of whether the ISOCAT DC is part of an official standard
• If not possible, or if there is uncertainty:
  – create a new DC, but
  – also specify the relation with existing, closely related ISOCAT DCs. Provide:
    • the type of the relation
      – from a drop-down list to be provided by the RELCAT developers
        » e.g. equals, almost-equals, is hyponym of, is hyperonym of, etc.
    • a textual clarification of the deviation
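A minimal sketch of the mapping policy on this slide, in Python and purely for illustration. It assumes the linguist has already judged whether an identical ISOCAT DC exists and whether that judgement is certain; `map_local_tag`, `MappingResult`, `Relation` and any placeholder PIDs are hypothetical names, not part of ISOCAT or RELCAT.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class Relation:
    relation_type: str  # e.g. "equals", "almost-equals", "is hyponym of"
    source_pid: str     # the newly created DC
    target_pid: str     # a closely related existing ISOCAT DC
    clarification: str  # textual clarification of the deviation

@dataclass
class MappingResult:
    pid: str            # PID the local category is mapped to
    created_new: bool   # True if a new DC had to be created
    relations: List[Relation] = field(default_factory=list)

def map_local_tag(
    identical_pid: Optional[str],
    related: List[Tuple[str, str, str]],
    new_pid: str,
    uncertain: bool = False,
) -> MappingResult:
    """Hypothetical encoding of the policy: reuse an existing DC when possible.

    identical_pid: PID of an existing ISOCAT DC judged identical, or None.
                   Whether that DC is part of an official standard is ignored.
    related:       (relation_type, pid, clarification) triples for closely
                   related existing ISOCAT DCs.
    new_pid:       PID for the new DC, used only if one has to be created.
    uncertain:     True if the match with identical_pid is doubtful.
    """
    if identical_pid is not None and not uncertain:
        # Mapping to the existing DC is possible and certain: reuse it.
        return MappingResult(pid=identical_pid, created_new=False)
    # Otherwise create a new DC and record how it relates to existing ones.
    relations = [Relation(rtype, new_pid, pid, why) for rtype, pid, why in related]
    return MappingResult(pid=new_pid, created_new=True, relations=relations)
```

Keeping the identity judgement outside the function reflects that deciding whether two DCs are identical is a linguistic decision, not something the code can settle.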
4
General
• Relations are to be entered into the Relation Registry (RR) as soon as it is available
• Proposed temporary notation:
  – a recordset in CSV format, with records consisting of 4 fields:
    • Relation type (from the drop-down list; the relation types should themselves be ISOCAT DCs)
    • Data category 1 (ISOCAT PID)
    • Data category 2 (ISOCAT PID)
    • Clarification (rich text)
    • plus some administrative info: user id, creation date, etc.
  – to be imported into the RR as soon as it is available
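The temporary CSV notation could look roughly as follows. This Python sketch uses the standard `csv` module; the column names, the fixed set of relation types (taken from the examples on the previous slide, pending the RELCAT drop-down list, and given as plain names rather than as ISOCAT DC PIDs) and storing the rich-text clarification as a plain string are all assumptions.

```python
import csv
import io
from dataclasses import astuple, dataclass, fields

# Hypothetical drop-down values, from the examples on the previous slide;
# the real list is to be provided by the RELCAT developers.
RELATION_TYPES = {"equals", "almost-equals", "is hyponym of", "is hyperonym of"}

@dataclass
class RelationRecord:
    relation_type: str  # from the drop-down list
    dc1_pid: str        # ISOCAT PID of data category 1
    dc2_pid: str        # ISOCAT PID of data category 2
    clarification: str  # rich text, kept as a plain string in this sketch
    user_id: str        # administrative info
    created: str        # administrative info: creation date (ISO 8601)

def to_csv(records):
    """Serialise records to CSV text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([f.name for f in fields(RelationRecord)])
    for rec in records:
        if rec.relation_type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {rec.relation_type!r}")
        writer.writerow(astuple(rec))
    return buf.getvalue()

def from_csv(text):
    """Parse CSV text produced by to_csv back into records."""
    return [RelationRecord(**row) for row in csv.DictReader(io.StringIO(text))]
```

Because `csv.writer` quotes fields that contain commas, a free-text clarification such as "narrower, Polish-specific" survives the round trip through `to_csv` and `from_csv`.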
5
Standardized DCs?
• Ignore the +/- standard status of DCs in ISOCAT
• If needed, use relations in the Relation Registry
6
Multiple ISOCAT DCs
• Map to an existing DC that is identical (wherever possible)
• Use relations to relate it to almost identical DCs in ISOCAT
7
Overlap with other projects
• Consult with other projects
• Registry of topics that people/projects are working on
  – Dieter has taken some initiative
  – http://spreadsheets.google.com/ccc?key=0Al5Lw-npZ6ZTdDZlT2VjeGhwZm5iRW5IM3BTZFI5WEE&hl=en&authkey=CL_Wl4ID
• This workshop (and others if needed)
8
Container data categories
• ISOCAT might be extended to support them
• Probably not really a problem in the short term (?)
9
Almost identical DCs
• For ill-defined DCs in ISOCAT:
  – suggest better definitions and submit them to the Thematic Domain Group
  – use relations to relate your DC to existing, slightly different DCs (see later)
10
Almost identical DCs
• Example: Noun
  • Noun is a part of speech assigned to words which share specific morphosyntactic (inflectional), morphological, syntactic (and semantic) properties:
    – morphosyntactic (inflectional) properties: person, number, gender/class, declension class, case, …
    – a specific morphological combinatorial potential (derivation, compounding), in particular diminutives and augmentatives
    – a specific syntactic combinatorial potential
  • where each language selects a specific subset of these properties (as illustrated in the language sections)
11
Language Sections?
• The highly language-specific (Polish) DC
  – http://www.isocat.org/datcat/DC-2704 (noun)
• Noun [subst] contains lexemes inflecting for number and case, with a lexically determined grammatical gender, which do not have the category of person, e.g., woda 'water', profesor 'professor', pięciokrotność 'fivefoldness'; this class also contains defective plurale tantum and singulare tantum lexemes, but not depreciative lexemes. Grammatical categories of noun [subst]: number (http://www.isocat.org/datcat/DC-2709), case (http://www.isocat.org/datcat/DC-2720), gender (http://www.isocat.org/datcat/DC-2728).
• This can now become part of the Polish language section of the DC Noun, with the definition given on the previous slide
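One way to picture a general definition combined with language sections is sketched below in Python. The class names, the language code `pl`, and the closed property inventory are assumptions for illustration, not the ISOCAT data model (the slide's property list ends in "…", so the real inventory is open). The Polish section selects number, case and gender, following the definition of DC-2704.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet

# Hypothetical, deliberately closed inventory based on the Noun slide.
NOUN_PROPERTIES: FrozenSet[str] = frozenset({
    "person", "number", "gender/class", "declension class", "case",
    "derivation", "compounding", "syntactic combinatorial potential",
})

@dataclass
class LanguageSection:
    language: str               # language code, e.g. "pl"
    definition: str             # language-specific definition text
    properties: FrozenSet[str]  # the subset of properties this language selects

@dataclass
class DataCategory:
    name: str
    definition: str             # general, language-independent definition
    inventory: FrozenSet[str]   # properties a language may select from
    sections: Dict[str, LanguageSection] = field(default_factory=dict)

    def add_section(self, section: LanguageSection) -> None:
        """Attach a language section after checking it selects a subset of the inventory."""
        extra = section.properties - self.inventory
        if extra:
            raise ValueError(f"{section.language}: not in inventory: {sorted(extra)}")
        self.sections[section.language] = section

noun = DataCategory(
    name="Noun",
    definition="Part of speech assigned to words sharing specific morphosyntactic, "
               "morphological, syntactic (and semantic) properties",
    inventory=NOUN_PROPERTIES,
)
noun.add_section(LanguageSection(
    language="pl",
    definition="Noun [subst]: see http://www.isocat.org/datcat/DC-2704",
    properties=frozenset({"number", "case", "gender/class"}),
))
```

The subset check mirrors the idea that each language selects from, rather than adds to, the properties named in the general definition.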
12
Existing Tagsets
• Make sure all DCs of an existing de facto standard tag set are in ISOCAT:
  – either existing DCs
  – or newly added DCs
• Assign all DCs from such a tag set to a new closed complex category
  – e.g. the DC d-coiTagset, ipipanTagset, etc.
  – (and/or to a data category set?)
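The two steps on this slide (check coverage, then bundle) can be sketched in Python as below. `ClosedComplexDC` and `build_tagset_dc` are hypothetical names; in practice the tag-to-PID mapping would come out of the mapping work itself, and the only real identifier one would plug in here from this presentation is a PID such as DC-2704 for the Polish tag subst.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Mapping

@dataclass(frozen=True)
class ClosedComplexDC:
    """A closed complex DC whose value domain is a fixed set of simple DCs."""
    name: str
    values: FrozenSet[str]  # PIDs of the simple DCs in the value domain

def build_tagset_dc(name: str, tagset: Iterable[str],
                    tag_to_pid: Mapping[str, str]) -> ClosedComplexDC:
    """Check that every tag has an ISOCAT DC, then bundle them into one closed DC.

    tagset:     the tags of an existing de facto standard tag set
    tag_to_pid: tag -> ISOCAT PID, for existing or newly added DCs
    """
    tags = list(tagset)
    missing = sorted(t for t in tags if t not in tag_to_pid)
    if missing:
        # Step 1 failed: these tags still need an existing or a new DC.
        raise ValueError(f"tags without an ISOCAT DC: {missing}")
    # Step 2: the closed value domain is exactly the DCs of the tag set.
    return ClosedComplexDC(name, frozenset(tag_to_pid[t] for t in tags))
```

Raising on any unmapped tag, rather than silently skipping it, reflects the requirement that all DCs of the tag set be in ISOCAT before the closed complex category is defined.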
13
More…
• Problems and proposed solutions:
  – Odijk, J. (2009), "Data Categories and ISOCAT: some remarks from a simple linguist", presentation held at the FLaReNet/CLARIN Standards Workshop, Helsinki, September 27, 2009
  – Odijk, J. (2010), "Relations between Data Categories", presentation held at the CLARIN Relation Registry Workshop, MPI, Nijmegen, January 8, 2010
• Both can be found (inter alia) at http://www.clarin.nl/node/80
14
CLARIN-NL
Thanks for your attention!
http://www.clarin.nl/