CLARIN-NL ISOcat workshop 2011 part 2 Ineke Schuurman Menzo Windhouwer.
CLARIN-NL ISOcat workshop 2011, part 2
Ineke Schuurman
Menzo Windhouwer
Part A
• Issues brought up by participants
– When (not) to adopt an existing data category (DC)
– What about (CLARIN) standards
– What to do with ‘flagged’ DCs
– Relation between data category selection (DCS) and profile
– What should be included in ISOcat (level of detail, abbreviations, …)
– What about TEI, metadata, webservice?
– How to deal with larger amounts of data
Part B
• ISOcat and CLARIN: Do’s and don’ts (version 0.1)
– Introduction and discussion
• When (not) to adopt an existing DC
– It should ‘match’ the way you use a specific notion in your annotation scheme, application, …
– It should come with the same profile
– It should handle the same phenomenon: SpeakerID ≠ SingerID
Speaker vs Singer
String → Name → Person → Singer → Opera → Opera singer → Tenor → Tenor in La Bohème
First: too generic; last: too specific. The others are candidates.
Note that SingerID and SpeakerID are siblings, whereas SingerID is a subclass of both Singer and ID (RELcat!)
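The relations on this slide could be sketched in Python as follows. This is an illustration only: the names `CHAIN`, `BROADER`, `candidates` and `are_siblings` are hypothetical and do not come from ISOcat or RELcat, and the Speaker parent of SpeakerID is assumed by analogy with SingerID.

```python
# Hypothetical sketch: none of these names come from ISOcat or RELcat.

# Generalisation chain from the slide, most generic first.
CHAIN = ["String", "Name", "Person", "Singer", "Opera",
         "Opera singer", "Tenor", "Tenor in La Bohème"]


def candidates(chain):
    """Drop the first (too generic) and the last (too specific) entry."""
    return chain[1:-1]


# Subclass relations as stated on the slide. The Speaker parent of
# SpeakerID is an assumption made by analogy with SingerID.
BROADER = {
    "SingerID": {"Singer", "ID"},
    "SpeakerID": {"Speaker", "ID"},
}


def are_siblings(a, b):
    """Two distinct categories are siblings if they share a direct parent."""
    return a != b and bool(BROADER.get(a, set()) & BROADER.get(b, set()))


print(candidates(CHAIN))
print(are_siblings("SingerID", "SpeakerID"))
```

SingerID and SpeakerID share the parent ID, which makes them siblings; that does not make one a suitable replacement for the other.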
Standards
• Within ISOcat there are currently few or no standards.
Therefore:
• CLARIN NL and VL will set up their own set of ‘standardized DCs’. Ineke will be in charge (she will consult with others).
Flagged DCs
• Never link to ‘deprecated’ DCs!
(in case of doubt: consult Ineke or Menzo)
• In other cases the flags show whether the DC specification is correct from a technical point of view.
• Note that only DCs with a green marking qualify for standardization
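As a rough sketch, these rules could be applied to a list of DCs before linking. The `DataCategory` record and its fields are hypothetical, not the ISOcat data model, and the marking value "yellow" is assumed: the slide only names the green marking.

```python
from dataclasses import dataclass


@dataclass
class DataCategory:
    """Hypothetical record, not the ISOcat data model."""
    name: str
    deprecated: bool = False
    marking: str = "green"  # only "green" is named on the slide


def may_link(dc):
    """Rule 1: never link to a deprecated DC."""
    return not dc.deprecated


def may_be_standardized(dc):
    """Rule 3: a green marking is required for standardization."""
    return dc.marking == "green"


dcs = [
    DataCategory("pastTense"),
    DataCategory("oldPastTense", deprecated=True),
    DataCategory("draftTense", marking="yellow"),  # assumed marking value
]

linkable = [dc.name for dc in dcs if may_link(dc)]
print(linkable)
```

A DC that is not green may still be linked to; it just cannot be put forward for standardization yet.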
DC/DCS and profile
• Profiles are not added automatically; a DCS may contain elements with various profiles
• In case the profile you need is not yet available, contact Menzo and Ineke
What to include?
• Cf. the slide on SingerID/SpeakerID
• In general: all linguistically meaningful notions mentioned in your schema, manual, definition (cf. Part B)
• Abbreviations (e.g. PST for /past tense/) are to be entered as Data Element Name
TEI, metadata, webservice
• TEI: likely to be taken care of at a ‘higher level’; if not, YOU are to insert the TEI definitions you use.
• Metadata: new in CMDI? In that case a definition is to be provided in ISOcat as well
• Webservice: to be taken care of in CMDI
– How to deal with larger amounts of data
Part B: do’s & don’ts
Do’s:
• Create a DCS for your scheme (name project, annotation scheme, …)
• Provide a clear definition (short, to the point) for your scheme, application, …
• Take care not to leave concepts used in your definition undefined or vague
• Use appropriate vocabulary (per profile)
• Check ‘adopted’ DCs regularly until standardization!
Do’s (continued)
When creating a DC, fill out
• Justification: used in XYZ, part of tagset N
• Language section
– Always an English language section
– Strong recommendation: sections for the object language(s) and for the working language of the manual
– Sections in the various languages should match (more or less be translations of each other)
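The language-section rules above could be checked along these lines. This is a minimal sketch: the function, its arguments and the sample definition are hypothetical and not part of ISOcat. Whether the sections really are translations of each other still needs a human reader.

```python
# Hypothetical sketch: the function and its arguments are illustrative,
# not part of ISOcat.

def check_language_sections(sections, object_languages=(), manual_language=None):
    """Check which language sections of a DC are filled in.

    `sections` maps a language code to the definition text in that language.
    Returns (errors, warnings): a missing English section is an error,
    a missing recommended section is a warning.
    """
    errors, warnings = [], []
    if not sections.get("en", "").strip():
        errors.append("missing English language section")

    recommended = list(object_languages)
    if manual_language and manual_language not in recommended:
        recommended.append(manual_language)

    for lang in recommended:
        if lang != "en" and not sections.get(lang, "").strip():
            warnings.append(f"recommended language section missing: {lang}")
    return errors, warnings


# Invented sample DC: English only, for a Dutch corpus with a Dutch manual.
errors, warnings = check_language_sections(
    {"en": "Tense locating an event before the moment of speaking."},
    object_languages=["nl"],
    manual_language="nl",
)
print(errors)
print(warnings)
```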
Do’s (continued)
When creating a DC, fill out
• Example section
– Note that *negative* examples may be very helpful! (jongens ‘boys’, mannen ‘men’, but not: gelovigen ‘believers’ (is a form of an ADJ))
Example sections
Suppose you want to illustrate a German phenomenon:
• Example section in the EN language section
– German example with translation in English
• Example section in the NL language section
– German example with translation in Dutch
• Example section in the EN linguistic section
– EN example
• Example section in the NL linguistic section
– NL example with translation in English
Don’ts
• Confuse the Language and Linguistic sections
– The latter contains language-specific values for closed domains
• Be (too) language specific in the definition
• Mention the scheme in the definition
• Use several definitions in one DC
• Use circular definitions
• Rely on authority
• Rely on standardized status
– The definition should fit YOUR scheme, etc.
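Two of these don’ts lend themselves to a simple automatic check. The sketch below is a naive, hypothetical helper, not an ISOcat feature: it only spots a definition that repeats the term it defines or that names a scheme, and the sample definitions are invented.

```python
import re


def lint_definition(name, definition, scheme_names=()):
    """Naive, hypothetical check of a DC definition.

    Flags a definition that uses the term it defines (circular) and a
    definition that mentions one of the given scheme names.
    """
    problems = []
    text = definition.lower()
    if re.search(r"\b" + re.escape(name.lower()) + r"\b", text):
        problems.append("circular: definition uses the term it defines")
    for scheme in scheme_names:
        if scheme.lower() in text:
            problems.append(f"mentions scheme: {scheme}")
    return problems


# Invented sample definitions.
print(lint_definition("past tense", "The past tense of a verb."))
print(lint_definition("plural", "Tag used in XYZ for more than one entity.", ["XYZ"]))
```

The other don’ts (several definitions in one DC, relying on authority or on standardized status) call for human judgement.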