Tags in the cloud : Crowdsourcing semantic annotation with CATMA
![Page 1: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/1.jpg)
Jan Christoph Meister, University of Hamburg
www.catma.de
![Page 2: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/2.jpg)
CATMA - an integrated textual markup and analysis tool
29.10.2012 CLARIN's Turn Towards The Literary Text
![Page 3: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/3.jpg)
Text vs. sentence, or: What's so different about processing texts?
• structural complexity: min TEXT > 2 (SENTENCE)
• structural activity: TEXT processing actualizes paradigmatic cross-reference across sentences
• structural dynamic: TEXT processing represents & simulates cognitive and empirical processes
TEXT yields more INTERPRETATIONS than SENTENCE
+CONTINGENCY: The more complex & dynamic structure, when activated during processing, results in a higher degree of contingency in functional "outcome"
![Page 4: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/4.jpg)
The what and why of MarkUp procedural, descriptive & discursive
function
• discursive markup: enables human readers to interpret a text and to explore its hermeneutic potential in collaboration: "What might this text mean to us?"
• declarative markup: informs a human reader how to process a text as a communicative device: "How is this text put together and how does it function in its communicative universe?"
• procedural markup: instructs a (natural or artificial) text processor how to handle a text as a structured character string: "What is the correct operation to perform on this input?"
performative function
discursive function
![Page 5: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/5.jpg)
Hermeneutic "must haves" of discursive markup
facilitate collaboration & non-deterministic annotation
• allow for multiple markup
• allow for overlap
• allow for concurrent tagging
conceptualize markup as dynamic & recursive
• allow for extensibility
• allow for multiple (and even contradictory) markup
• seamlessly integrate markup and analysis & support the hermeneutic loop
![Page 6: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/6.jpg)
MarkUp types & data models
• opaque (implicit)
There is no such thing as “no-mark up”. (Coombs, Renear, DeRose 1987)
• linear (inline, deterministic)
<SentenceStart>There</SentenceStart> is no such thing as “no-mark up”.
• nested (inline, deterministic, sequential)
<SentenceStart><Adverb>There</Adverb></SentenceStart> is no such thing as “no-mark up”.
• relational (stand off, descriptive)
There is no such thing as “no-mark up”.
<1,5, word class = “Adverb”><1,5, segment = “SentenceStart”><1,5, POS = “verb phrase element”>
• network (stand off, discursive)
There is no such thing as “no-mark up”.
<1,5, word class = “Adverb”><1,38, speech act = “declaration”><1,11, POS = “verb phrase”>
<1,5, word class = “Preposition”><1,5, segment = “SentenceStart”><1,8, POS = “noun phrase”>
![Page 7: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/7.jpg)
Implementation in CATMA
www.catma.de
![Page 8: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/8.jpg)
The CATMA/CLÉA approach to markup
text range based model
• a tag references a text range with a start and an end offset
external standoff markup
• markup is stored in external files or databases to facilitate tagging and exchange of markup by multiple users
• markup is stored in a standoff manner to allow overlapping
• markup tolerates non-deterministic tagging & supports analytical operations that exploit semantic ambiguity
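A minimal sketch of the text range based, standoff idea, not CATMA's actual implementation. The class `TagReference`, the helper names and the 0-based, end-exclusive offsets are assumptions made for illustration (the slides use 1-based offsets). It shows that ranges stored outside the text can overlap and can even contradict each other without conflict.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TagReference:
    """Hypothetical stand-off annotation: a tag name applied to a character range."""
    tag: str
    start: int   # inclusive character offset (0-based, an assumption)
    end: int     # exclusive character offset
    author: str


def tags_at(refs, offset):
    """Return all tag references whose range covers the given offset."""
    return [r for r in refs if r.start <= offset < r.end]


def overlapping(a, b):
    """True if two ranges share at least one character."""
    return a.start < b.end and b.start < a.end


text = 'There is no such thing as "no-mark up".'
refs = [
    TagReference("Adverb", 0, 5, "annotator_a"),
    TagReference("Preposition", 0, 5, "annotator_b"),  # contradicts "Adverb", still valid
    TagReference("SentenceStart", 0, 8, "annotator_a"),
    TagReference("declaration", 0, len(text), "annotator_b"),
]

# The source text is never modified; every tag is resolved through its offsets.
for ref in tags_at(refs, 2):
    print(ref.tag, ref.author, repr(text[ref.start:ref.end]))
```

Because the text itself carries no tags, two annotators can mark the same range differently and both readings remain available for later analysis.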
![Page 9: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/9.jpg)
Example for overlapping markup in CATMA
(NB: In CATMA tag sets can be imported/exported; tags can be created / manipulated ad hoc during mark up)
![Page 10: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/10.jpg)
TEI feature structure tag declaration & overlapping markup
<fs xml:id="CATMA_d7251f99-14e9-4c36-8ff7-24058ae81ce5" n="1_7985fdf0-77a5-4060-9a3d-2d977e0ab954" type="catma_tag">
<f xml:id="CATMA_aa9b3727-187e-4fb8-9990-e7880912a409" name="catma_tagname">
<string>Keynote_speaker&amp;affiliation</string>
</f>
<f xml:id="CATMA_564825ba-28b2-4dab-b136-b87c8a3d9e28" name="catma_displaycolor">
<numeric value="-13421569"/>
</f>
</fs>
<seg ana="#CATMA_0a252cc2-96d2-4ed4-8fb8-52380550ec0b #CATMA_d7251f99-14e9-4c36-8ff7-24058ae81ce5 #CATMA_8513fe2d-2e35-4d0a-a3a2-07528bcfa012">
<ptr target="Abstracts.doc#range( /.21736, /.21888)" type="inclusion"/>
</seg>
![Page 11: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/11.jpg)
Question 1: How can we model a collaborative mark up practice?
![Page 12: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/12.jpg)
Answer 1: CATMA's "n meta-data sets to 1 object-data instance" model
meta-data
• procedural
• declarative
• hermeneutic
object-data
![Page 13: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/13.jpg)
Question 2: But how, on top of that, can we also model the recursive routines that characterize the humanistic workflow?
![Page 14: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/14.jpg)
Example for recursion: a simple query across the object data/meta data divide
Step 1: object data query
Step 2: refinement by adding an additional meta-data constraint
![Page 15: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/15.jpg)
... which is why (reg="\b\S*\Qez\E(?=\W)") where (tag="Keynote_speaker&affiliation") generates this:
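A minimal sketch of how a combined query of this kind could be evaluated outside CATMA. It does not reproduce CATMA's query engine. The function `query`, the sample sentence and the tagged range are hypothetical. Python's `re` module does not support the `\Q...\E` quoting used in the slide's pattern, so `re.escape` stands in for it.

```python
import re


def query(text, tagged_ranges, pattern, tag=None):
    """Return regex matches; if `tag` is given, keep only matches inside a range with that tag."""
    hits = []
    for m in re.finditer(pattern, text):
        if tag is None:
            hits.append(m.group())
            continue
        for name, start, end in tagged_ranges:
            if name == tag and start <= m.start() and m.end() <= end:
                hits.append(m.group())
                break
    return hits


# Hypothetical object data and meta-data (tag name taken from the slide).
text = "Keynote: Maria Sanchez (Madrid). Later, Juan Perez spoke briefly."
start = text.index("Maria")
end = text.index(")") + 1
tagged_ranges = [("Keynote_speaker&affiliation", start, end)]

# Equivalent of \b\S*\Qez\E(?=\W): a token ending in the literal string "ez".
pattern = r"\b\S*" + re.escape("ez") + r"(?=\W)"

step1 = query(text, tagged_ranges, pattern)
step2 = query(text, tagged_ranges, pattern, tag="Keynote_speaker&affiliation")
print(step1)  # ['Sanchez', 'Perez']
print(step2)  # ['Sanchez']
```

Step 1 searches the object data only; step 2 narrows the same hits by a meta-data constraint, which mirrors the two-step refinement described on the previous slide.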
![Page 16: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/16.jpg)
Answer 2: CATMA's dynamic data model, e.g. (n meta-data sets to 1 object instance) > n+1
meta-data
• procedural
• declarative
• hermeneutic
object-data
object-data
![Page 17: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/17.jpg)
Question 3: How can we implement this practice in a system?
![Page 18: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/18.jpg)
Answer 3: Call the big sister – CLÉA!
CLÉA Data Base Model
![Page 19: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/19.jpg)
CATMA/CLÉA: User and resource administration
![Page 20: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/20.jpg)
Manage corpora & source documents, markup collections and tag libraries
![Page 21: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/21.jpg)
Annotate texts or corpora using pre-defined or ready-made tags
![Page 22: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/22.jpg)
Build and execute queries on source text & tags, or any combination thereof
![Page 23: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/23.jpg)
Visualize results
![Page 24: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/24.jpg)
What’s in it for CLARIN?
• Import any text or corpus into CATMA/CLÉA
• Run standard analytical procedures automatically or interactively on upload (indexing, POS tagging etc.)
• Annotate and analyse texts or corpora collaboratively
• Share and export markup from the CATMA/CLÉA database in multiple formats
CLÉA = Collaborative Literature Éxploration and Annotation
![Page 25: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/25.jpg)
Mille grazie to my CATMA/CLÉA development team
• Evelyn Gius
• Malte Meister
• Marco Petris
• Lena Schüch
and to our funders
• University of Hamburg (2009)
• Google DH Awards (2010-2013)
• BMBF (2013-2016)
![Page 26: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/26.jpg)
Tag definition
<fsDecl xml:id="CATMA_TAG_ID_1"
type="test"
baseTypes="catma_tag">
<fsDescr>test - Test Tag</fsDescr>
<fDecl xml:id="CATMA_TAG_DEF_1_PROP_1"
name="catma_displaycolor"
optional="false">
<vRange><numeric value="-13408513"/></vRange>
</fDecl>
<fDecl xml:id="CATMA_TAG_DEF_1_PROP_2" name="user_defined_test_property"
optional="false">
<vRange><string/></vRange>
</fDecl>
</fsDecl>
each Tag can have additional user defined properties
each Tag has a type
each Tag has a color
![Page 27: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/27.jpg)
Tag instance
<fs xml:id="CATMA_TAG_INSTANCE_1" type="test">
<f xml:id="CATMA_PROPERTY_1_1" name="catma_displaycolor">
<numeric value="-13408513"/>
</f>
<f xml:id="CATMA_PROPERTY_1_2" name="user_defined_test_property">
<string>instance specific test value</string>
</f>
</fs>
a Tag instance can have individual values for the user defined properties
each Tag instance is of a type
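A minimal sketch of reading such a tag instance with Python's standard `xml.etree.ElementTree`. It assumes the snippet is used exactly as shown on the slide, without a TEI namespace declaration (a complete TEI file would normally declare one). The function name `read_tag_instance` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The predefined "xml" prefix expands to this namespace in ElementTree attribute keys.
XML_NS = "{http://www.w3.org/XML/1998/namespace}"

snippet = """
<fs xml:id="CATMA_TAG_INSTANCE_1" type="test">
  <f xml:id="CATMA_PROPERTY_1_1" name="catma_displaycolor">
    <numeric value="-13408513"/>
  </f>
  <f xml:id="CATMA_PROPERTY_1_2" name="user_defined_test_property">
    <string>instance specific test value</string>
  </f>
</fs>
"""


def read_tag_instance(xml_text):
    """Return the id, type and property values of one <fs> tag instance."""
    fs = ET.fromstring(xml_text.strip())
    props = {}
    for f in fs.findall("f"):
        value = f[0]  # first child holds the value: <numeric/> or <string>
        if value.tag == "numeric":
            props[f.get("name")] = int(value.get("value"))
        else:
            props[f.get("name")] = value.text
    return {"id": fs.get(XML_NS + "id"), "type": fs.get("type"), "properties": props}


instance = read_tag_instance(snippet)
print(instance["type"], instance["properties"])
```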
![Page 28: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/28.jpg)
Tag referencing
<seg ana="#CATMA_TAG_INSTANCE_1">
<ptr target="mytext_utf8.txt#char=36168,36185" type="inclusion"/>
</seg>
The content of a range is referenced by a pointer to an external entity.
The URI fragment follows RFC 5147, which defines fragment identifiers for pointing into plain text.
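A minimal sketch of resolving such a pointer, assuming only the `char=START,END` range form of the fragment and treating the two numbers as 0-based positions between characters, so that they map directly onto a Python slice. The function names are hypothetical, and the integrity-check parts of RFC 5147 are not handled.

```python
import re
from urllib.parse import urldefrag

_CHAR_RANGE = re.compile(r"^char=(\d+),(\d+)")


def parse_char_range(uri):
    """Split 'file.txt#char=START,END' into (file, start, end)."""
    base, fragment = urldefrag(uri)
    m = _CHAR_RANGE.match(fragment)
    if m is None:
        raise ValueError(f"unsupported fragment: {fragment!r}")
    start, end = int(m.group(1)), int(m.group(2))
    if start > end:
        raise ValueError("start offset exceeds end offset")
    return base, start, end


def resolve(text, start, end):
    """Return the characters between the two positions."""
    return text[start:end]


source, start, end = parse_char_range("mytext_utf8.txt#char=36168,36185")
print(source, start, end)

# Small stand-in text, since the original source document is not available here.
demo = "Tags in the cloud"
print(resolve(demo, 12, 17))
```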
![Page 29: Tags in the cloud : Crowdsourcing semantic annotation with CATMA](https://reader035.fdocuments.in/reader035/viewer/2022062322/56814cfc550346895dba18bb/html5/thumbnails/29.jpg)
Potential problems and possible solutions
range references based on character offsets are vulnerable to modifications of the content
• possible solution: automated adjustments with checksums and context information, and
• track versioning and revision history in the source document header
the encoding of the tags is machine readable but not interoperable out of the box
• possible solution: defining the feature structure encoding of tags in terms of the Open Annotation framework
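A minimal sketch of what an automated adjustment with checksums and context information might look like. This is an illustrative assumption, not a description of how CATMA handles the problem: `make_anchor` and `reanchor` are hypothetical helpers, the checksum detects that the offsets no longer point at the original content, and the stored content plus left context are used to find the range again.

```python
import hashlib


def _digest(s):
    return hashlib.sha1(s.encode("utf-8")).hexdigest()


def make_anchor(text, start, end, context=10):
    """Record offsets plus a checksum, the content and some left context."""
    return {
        "start": start,
        "end": end,
        "sha1": _digest(text[start:end]),
        "content": text[start:end],
        "before": text[max(0, start - context):start],
    }


def reanchor(text, anchor):
    """Return (start, end) in a possibly modified text, or None if the range is lost."""
    start, end = anchor["start"], anchor["end"]
    if _digest(text[start:end]) == anchor["sha1"]:
        return start, end  # offsets still point at the same content
    # Offsets drifted: look for the left context followed by the content.
    pos = text.find(anchor["before"] + anchor["content"])
    if pos == -1:
        return None
    new_start = pos + len(anchor["before"])
    return new_start, new_start + len(anchor["content"])


original = "Keynote by Jan Christoph Meister, Hamburg."
begin = original.index("Meister")
anchor = make_anchor(original, begin, begin + len("Meister"))

modified = "Opening " + original  # an edit that shifts every offset by 8
new_range = reanchor(modified, anchor)
print(new_range, modified[new_range[0]:new_range[1]])
```

A range that cannot be found again returns `None`, which is the point where revision history in the source document header would be needed.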