CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012) Ineke Schuurman Menzo Windhouwer.
CLARIN-NL ISOcat workshop 2012 part 2 (10-10-2012)
Ineke Schuurman
Menzo Windhouwer
• Issues brought up by participants
– Which elements are to be included in ISOcat
– (CLARIN) standards, TEI etc
– Type of DC
– When to create a new DC/adapt an existing one
– When to create several DCSs
– Name of DC, several DCs with same name
– How to deal with larger amounts of data
What to include?
• ALL concepts dealing with linguistics/metadata
– Van Dale EN-NE
include (overgankelijk werkwoord)
1) omvatten
2) (mede) opnemen
==> 'overgankelijk werkwoord' / 'transitive verb' is to be included, same for 'overg.ww', 'trns.v.'
• One and the same DC!
What to include?
‘transitive verb’
• Several entries in ISOcat
– DC-1405: A verb which takes a direct object; that is, a verb that expresses an action which directly affects another person or thing.
– DC-3532: A transitive verb is a verb that takes a direct object, and describes a relation between two participants [Crystal 1997: 397; Payne 1997: 171]
– And several more, so... which one to select?
• When (not) to adopt an existing DC
– It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
– It should come with the same profile and type
• That being said
– Reuse a CLARIN NL/VL DC when possible (contact Ineke when such a definition is incorrect)
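The selection rule above (same notion, same profile, same type) can be sketched as a simple filter. This is only an illustration: the `DataCategory` record, the `EX-…` identifiers and the profile and type values are hypothetical placeholders, not the ISOcat data model or real registry entries. The filter only narrows the list; whether a definition really matches your use of the notion still has to be judged by reading it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCategory:
    # Hypothetical, simplified record; not the actual ISOcat data model.
    identifier: str
    name: str
    profile: str
    dc_type: str
    definition: str


def reuse_candidates(existing, name, profile, dc_type):
    """Return existing DCs that share the name, profile and type we need."""
    wanted = name.strip().lower()
    return [
        dc for dc in existing
        if dc.name.strip().lower() == wanted
        and dc.profile == profile
        and dc.dc_type == dc_type
    ]


# Illustrative entries only; identifiers, profiles and types are made up.
registry = [
    DataCategory("EX-1", "transitive verb", "Morphosyntax", "simple",
                 "A verb which takes a direct object."),
    DataCategory("EX-2", "transitive verb", "Syntax", "simple",
                 "A verb that takes a direct object and relates two participants."),
    DataCategory("EX-3", "intransitive verb", "Morphosyntax", "simple",
                 "A verb which takes no direct object."),
]

for dc in reuse_candidates(registry, "Transitive verb", "Morphosyntax", "simple"):
    print(dc.identifier, "-", dc.definition)
```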
Same name
• Not really a problem when they are good DCs, not even when they come with the same profile
• PositivePolarity
– In general, positive polarity refers to an assertion that contains no marker of negation [Crystal 1980: 299]. (DC-3405)
– The property of a word or concept to express positive sentiment (myDC-xx)
• Whether you can reuse DC-3405 depends on your use of the concept!
Same name
• Do not avoid reuse of a name when it is the name commonly used!
• Another type of duplicate names where one concept entails the other one:
– meewerkend voorwerp (indirect object)
– meewerkend en belanghebbend voorwerp (indirect and benefactive object)
– event (also called 'eventuality', and including 'state')
– event (sister of 'state')
What defines a good DC?
Reusable definition
NOT
conversation (DC-2661): Communication event with more than two participants
mother tongue (DC-2955): […] a speaker’s mother tongue
What defines a good DC?
Correct definition
NOT (?)
Actor (DC-4146): a participant in an action or process
Question: is an addressee to be considered an actor? (used in DC-4158, no proper definition yet)
What defines a good DC?
Meaningful definition
NOT
annotation format (DC-2562): Specifies the annotation format that is used …
source language (DC-2494): Indicates if a language is a source language
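A rough way to spot definitions of this kind is to check how many content words of the DC name simply recur in its definition. The sketch below is a heuristic only, not an ISOcat feature: the function names, the stopword list and the threshold are assumptions made for illustration. A high score merely flags a definition for review, since a good definition may legitimately reuse a word from the name.

```python
import re

# Small, hand-picked stopword list; an assumption for this sketch.
STOPWORDS = {"a", "an", "the", "is", "if", "of", "that", "in", "to",
             "or", "and", "which", "whether"}


def content_words(text):
    """Lower-cased alphabetic words of the text, minus stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS}


def name_overlap(name, definition):
    """Fraction of the name's content words that recur in the definition."""
    name_words = content_words(name)
    if not name_words:
        return 0.0
    return len(name_words & content_words(definition)) / len(name_words)


def looks_tautological(name, definition, threshold=1.0):
    """Flag a definition that repeats (nearly) all content words of the name."""
    return name_overlap(name, definition) >= threshold


print(looks_tautological("source language",
                         "Indicates if a language is a source language"))
print(looks_tautological("conversation",
                         "Communication event with more than two participants"))
```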
Not-so-good examples
Mother tongue (DC-2955): Specifies whether the language is a speaker’s mother tongue
Mother’s language (DC-4516): […] NOT necessarily the mother tongue […]
- There is no definition of the concept ‘mother tongue’ (Relation with /home language/, /primary language/, /heritage language/?)
- And why ‘speaker’?
Rule
Make your definition
• as general as possible
• as specific as necessary
Standards
• Within ISOcat there are currently few or no standards.
Therefore
• CLARIN NL and VL will set up their own set of ‘standardized DCs’. Ineke will be in charge, selecting the new flag “recommended by CLARIN NL/VL”.
Standards
Another issue regarding standards 'included' in ISOcat
- Athens Core DCs (recommended by metadata/CMDI): we are currently adapting them in order to avoid tautologies and/or correct smaller ‘errors’
Target language: indicates if the language is the target language
Conversation: […] three or more participants
The same may be necessary for TEI headers etc.
DC/DCS and profile
• Profiles are not added automatically, a DCS may contain elements with various profiles (although you may decide to create several DCSs) (do select proper names!)
• In case the profile you need is not yet available, contact Menzo and Ineke
Part B: do’s & don’ts
Do’s:
• Create a DCS for your scheme (name project, ann.scheme, …)
• Provide a clear definition (short, to the point) for your scheme, application, …
• Take care not to leave concepts used in your definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly till standardization!
Do’s (continued)
When creating a DC, fill out
• Justification: used in XYZ, part of tagset N
• Language section
– Always an English language section
– Strong recommendation: sections for the object language(s) and for the working language of the manual
– Sections in the various languages should match (+/- be translations of each other)
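The checklist above could be applied to a draft DC before submitting it. The following is a minimal sketch under stated assumptions: the dictionary layout, the key names (`justification`, `language_sections`, `recommended_languages`) and the function `missing_parts` are hypothetical and do not mirror the ISOcat interface. Sections for object or working languages are only a strong recommendation, but the sketch reports them too so they are not forgotten.

```python
def missing_parts(dc):
    """List the checklist items a draft DC (a plain dict) still lacks."""
    missing = []

    # Justification, e.g. "used in XYZ, part of tagset N".
    if not dc.get("justification", "").strip():
        missing.append("justification")

    # The English language section is always required.
    sections = dc.get("language_sections", {})
    if not sections.get("en", {}).get("definition", "").strip():
        missing.append("English language section")

    # Strongly recommended: object language(s) and working language.
    for lang in dc.get("recommended_languages", []):
        if lang not in sections:
            missing.append(f"language section: {lang}")

    return missing


# Hypothetical draft: English section present, Dutch section still missing.
draft = {
    "justification": "used in XYZ, part of tagset N",
    "language_sections": {
        "en": {"definition": "A verb which takes a direct object."},
    },
    "recommended_languages": ["nl"],
}

print(missing_parts(draft))
```

Checking that the sections in the various languages actually match is left to a human reader; the sketch only checks that they exist.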
Do’s (continued)
When creating a DC, fill out
• Example section
– Note that *negative* examples may be very helpful! (jongens 'boys', mannen 'men', not: gelovigen 'believers' (is a form of ADJ))
Example sections
Suppose you want to illustrate a German phenomenon:
• Ex.sec. in EN language section
– German ex with transl in English
• Ex.sec. in NL language section
– German ex with transl in Dutch
• Ex.sec. in EN linguistic section
– EN example
• Ex.sec. in NL linguistic section
– NL example with translation in English
Don’ts
• Confuse Language and Linguistic section
– Latter contains language-specific values for closed domains
• Be (too) language specific in definition
• Mention scheme in definition
• Use several definitions in one DC
• Circular definitions
• Rely on authority
• Rely on standardized status
– Definition should fit YOUR scheme, etc
Procedure - 1
Procedure - 2