Leif-Jöran Olsson, "Dramawebben, The Swedish Drama Web", KB, 9 October 2015
Overview
Numbers and divisions
Thematic examples
Presence and relations
Training and evaluation
Final Comments
Dramawebben, The Swedish Drama Web
Leif-Jöran Olsson
Språkbanken, University of Gothenburg, CLARIN-ERIC, SWE-CLARIN
2015-10-09
Overview
- Dramawebben, The Swedish Drama Web
Acknowledgements
- For the parts relating to Dramawebben (The Swedish Drama Web) I gratefully acknowledge financial support from the Swedish Research Council (VR Dnr: 2011-6202).
- Thanks to project co-workers, Riksteatern and Musikverket.
Text Encoding Initiative (TEI)
- De facto standard for text encoding in the Humanities
- Modules
- Using XML Schema
Goals of Dramawebben
- The project includes a corpus of plays with baseline TEI drama annotation
- Development of exploration and visualization tools
- Engaging a vibrant community
- Educate students in TEI encoding and let them be ambassadors spreading the word
- Target disciplines within the humanities, such as linguistics, literary and theater history, studies in children’s culture, practical and theoretical research in children’s theater, and arts tertiary institutions.
<http://www.dramawebben.se>
Perspective of Dramawebben
- The perspective is that of the dramatic text as a working text.
The context
- The baseline encoding covers the basic structure of the drama text.
- On top of that, it is possible to add semantic annotation, which goes beyond the text itself, referring to the action below, behind or beyond the actual words.
The context (continued)
- All plays on Dramawebben (<http://www.dramawebben.se>) printed 1880-1900 were selected. The selection ended up including some 70+ plays in all genres (children’s plays, drama and comedy), by female as well as male dramatists.
- To tempt scholars in the humanities with semantic encoding, we have started with one theme: textile handicraft, which was a recurrent feature of the plays by female playwrights of the 1880s.
Visualisations
We produce several visualizations automatically with eXist-db extensions and apps.
- One kind of visualization is the speech division chart. To reveal a skewed relation between female cast and female speeches, one can show the female percentage of speeches and of cast side by side, as in the following chart.
- In most plays in the selection, the female share of speeches matches the percentage of female speaking roles.
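The comparison behind such a chart can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the project itself produces its charts with eXist-db extensions, and the sample fragment, the use of a `sex` attribute on `<person>`, and the function name `female_shares` are hypothetical choices made for this sketch.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Made-up, simplified fragment (not a complete TEI document).
SAMPLE_PLAY = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <listPerson>
    <person xml:id="r1" sex="F"/>
    <person xml:id="r2" sex="M"/>
    <person xml:id="r3" sex="F"/>
    <person xml:id="r4" sex="M"/>
  </listPerson>
  <body>
    <sp who="#r1"><p>...</p></sp>
    <sp who="#r2"><p>...</p></sp>
    <sp who="#r2"><p>...</p></sp>
    <sp who="#r3"><p>...</p></sp>
    <sp who="#r2"><p>...</p></sp>
    <sp who="#r4"><p>...</p></sp>
    <sp who="#r2"><p>...</p></sp>
    <sp who="#r4"><p>...</p></sp>
  </body>
</TEI>"""


def female_shares(tei_xml):
    """Return (female % of speaking roles, female % of speeches)."""
    root = ET.fromstring(tei_xml)
    sex_by_id = {p.get(XML_ID): p.get("sex")
                 for p in root.iter(f"{{{TEI_NS}}}person")}
    # One entry per speech; @who points at the speaker as "#id".
    speakers = [sp.get("who", "").lstrip("#")
                for sp in root.iter(f"{{{TEI_NS}}}sp")]
    if not speakers:
        return (0.0, 0.0)
    speaking_roles = set(speakers)
    female_roles = sum(1 for pid in speaking_roles if sex_by_id.get(pid) == "F")
    female_speeches = sum(1 for pid in speakers if sex_by_id.get(pid) == "F")
    return (100.0 * female_roles / len(speaking_roles),
            100.0 * female_speeches / len(speakers))


cast_pct, speech_pct = female_shares(SAMPLE_PLAY)
print(f"female roles: {cast_pct:.0f}%, female speeches: {speech_pct:.0f}%")
```

In this made-up fragment half of the speaking roles are female but they hold only a quarter of the speeches, which is the kind of skew the side-by-side chart is meant to expose.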
Speaker gender and speaker division
Speaker gender division per play
Female and child speeches division per play
Roles in cast list (and added) per play
Result of searching for plays with a specific number of roles (excerpt)
Handicraft
For thematic coding we used textile handicraft as the example, since we believe it can raise interesting questions and serve as an instructive example for other forms of semantic encoding.
- Using feature structure elements, with key-value pairs
- They can be tied to anchors to make them discontinuous
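As a rough illustration of the key-value idea, the sketch below reads TEI feature structures (`<fs>` with `<f name="...">` children) into Python dictionaries. The feature names, values and the `xml:id` are invented for this example and are not taken from the Dramawebben encoding; how the project ties such structures to anchors is not shown here.

```python
import xml.etree.ElementTree as ET

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical annotation: the type and feature names are invented for this sketch.
SAMPLE_FS = """<div xmlns="http://www.tei-c.org/ns/1.0">
  <fs xml:id="hc1" type="handicraft">
    <f name="activity"><symbol value="sewing"/></f>
    <f name="ongoing"><binary value="true"/></f>
    <f name="object"><string>needle</string></f>
  </fs>
</div>"""


def read_feature_structures(xml_text):
    """Map each fs xml:id to a dict of feature name -> value (as text)."""
    root = ET.fromstring(xml_text)
    result = {}
    for fs in root.iter(f"{{{TEI_NS}}}fs"):
        features = {}
        for f in fs.findall(f"{{{TEI_NS}}}f"):
            value_el = next(iter(f), None)  # the first child holds the value
            if value_el is None:
                value = None
            else:
                # <symbol>/<binary> carry @value, <string> carries text content.
                value = value_el.get("value") or (value_el.text or "").strip()
            features[f.get("name")] = value
        result[fs.get(XML_ID)] = features
    return result


print(read_feature_structures(SAMPLE_FS))
```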
The model (simplified)
Ongoing handicraft in speeches per play
Children’s play and food and drink terms
For children’s play we used the same basic parts of the model as for handicraft, that is:
- activity,
- talk about activity,
- and play objects
As another example of potential thematic coding we extract food and drink terms. For this I created a simple hierarchical lexical resource with cooking and serving utensils, ingredients, dishes, procedures, etc. These concept words were expanded morphologically using other lexical resources to cover all forms and some spelling variation over time.
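The term extraction described above can be sketched as a lookup from expanded word forms back to a category and lemma. Everything in the lexicon below is a hypothetical stand-in: the real resource, its categories and its morphological expansion are not reproduced here, and the helper names `build_index` and `count_terms` are assumptions of this sketch.

```python
import re
from collections import Counter

# Hypothetical mini-lexicon: category -> lemma -> expanded word forms.
LEXICON = {
    "drink": {"kaffe": ["kaffe", "kaffet"], "vin": ["vin", "vinet"]},
    "utensil": {"kopp": ["kopp", "koppen", "koppar", "kopparna"]},
    "dish": {"bröd": ["bröd", "brödet"]},
}


def build_index(lexicon):
    """Flatten the hierarchy into a form -> (category, lemma) lookup."""
    index = {}
    for category, lemmas in lexicon.items():
        for lemma, forms in lemmas.items():
            for form in forms:
                index[form.lower()] = (category, lemma)
    return index


def count_terms(text, index):
    """Count lexicon hits in a text, keyed by (category, lemma)."""
    counts = Counter()
    for token in re.findall(r"\w+", text.lower()):
        if token in index:
            counts[index[token]] += 1
    return counts


index = build_index(LEXICON)
speech = "Vill du ha en kopp kaffe? Kaffet är varmt, och brödet är färskt."
for (category, lemma), n in sorted(count_terms(speech, index).items()):
    print(category, lemma, n)
```

Counting by lemma rather than by surface form is what lets inflected forms and spelling variants end up in the same row of a per-play chart.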
Food and drink terms per play
Occupations
As another example of thematic coding we use HISCO, the historical variant of the International Standard Classification of Occupations (ISCO).
- 10 top-level categories (1–0) and five levels of subcategories.
- Statistics Sweden (SCB) also aligns its Swedish standard for occupational classification (svensk standard för yrkesklassificering, SSYK) with the ISCO standard
- This makes it possible to compare occupations in an international context and to link (as Linked Open Data, LOD) to other datasets and implementations
Occupations in play text (difference HISCO 5-8, ISCO 0)
Occupations of role characters
Most common occupations in one selection for role characters
Count  Code   Label
18     54020  arbetspiga
16     58320  amiralitetslöjtnantdotter
13     14120  adjunkt
10     14140  elisabetsyster
8      15120  f.d. författare
8      99900  arbetare
7      -1     allmosehjon
7      20210  andre legationssekreterare
7      17320  aktris
6      58220  biträdande vaktkonstapel
5      17120  kompositör
5      54010  f.d. dräng
5      55130  auktionsvaktmästare
5      20110  borgarråd
4      17140  dragspelerska
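To compare groups rather than single occupations, the counts listed above can be rolled up by top-level category. The sketch below assumes that the first digit of a five-digit code identifies the top-level category and that the code -1 marks an entry without a classification; both are assumptions of this example, as is the function name `top_level_totals`.

```python
from collections import Counter

# (count, code) pairs copied from the listing above.
OCCUPATION_COUNTS = [
    (18, "54020"), (16, "58320"), (13, "14120"), (10, "14140"),
    (8, "15120"), (8, "99900"), (7, "-1"), (7, "20210"),
    (7, "17320"), (6, "58220"), (5, "17120"), (5, "54010"),
    (5, "55130"), (5, "20110"), (4, "17140"),
]


def top_level_totals(pairs):
    """Sum counts per leading digit; non-numeric codes go to 'unclassified'."""
    totals = Counter()
    for count, code in pairs:
        key = code[0] if code.isdigit() else "unclassified"
        totals[key] += count
    return totals


for group, total in top_level_totals(OCCUPATION_COUNTS).most_common():
    print(group, total)
```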
Link to images for occupation
Link to history of work DB
Presence and relations
- Presence
- Relations
We combine parts of the namesdates module, like <listPerson> and <listOrg>, with relations in <listRelation> elements to create graphs of relations between persons (cast and non-cast) or sociograms of interaction on stage (cast only).
Presence
Sociograms
- Sociograms are created dynamically and can be created based on any criteria of what constitutes interaction.
- These can also be weighted by giving a numeric value to the @sortKey attribute of the <relation> element.
- Of course you can also create other types of graphs based on dynamic data.
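A minimal sketch of how weighted edges could be read from such relations is given below. It assumes that interaction is recorded with the @mutual attribute of <relation> and that the numeric @sortKey is used as the edge weight, as described above; the sample data, the relation name and the function `weighted_edges` are invented for the example and do not come from any play.

```python
import xml.etree.ElementTree as ET
from itertools import combinations

TEI_NS = "http://www.tei-c.org/ns/1.0"

# Made-up relations; p1..p3 are placeholder person ids.
SAMPLE_RELATIONS = """<listRelation xmlns="http://www.tei-c.org/ns/1.0">
  <relation name="interaction" mutual="#p1 #p2" sortKey="13"/>
  <relation name="interaction" mutual="#p2 #p3" sortKey="4"/>
  <relation name="interaction" mutual="#p1 #p2 #p3" sortKey="2"/>
</listRelation>"""


def weighted_edges(xml_text):
    """Return {(id_a, id_b): weight}, summing @sortKey over all relations."""
    root = ET.fromstring(xml_text)
    weights = {}
    for rel in root.iter(f"{{{TEI_NS}}}relation"):
        ids = sorted(ref.lstrip("#") for ref in rel.get("mutual", "").split())
        weight = float(rel.get("sortKey", "1"))  # unweighted relations count as 1
        # A mutual relation among n participants yields every pair of them.
        for pair in combinations(ids, 2):
            weights[pair] = weights.get(pair, 0.0) + weight
    return weights


for (a, b), w in sorted(weighted_edges(SAMPLE_RELATIONS).items()):
    print(a, b, w)
```

Changing what counts as interaction then only means changing which <relation> elements are generated, not the graph-building step.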
Sociogram for “The father”
Sociogram for “The father” alternative view
[Figure: sociogram with numerically weighted edges; nodes: Adolf Svärd, Margret, doktor Östermark, pastor Jonas, Bertha, Nöjd, Laura]
Note: Gephi can be used to work with graphs/networks.
Personal relations
- Personal relations are coded by hand.
- Every <person> shall have at least one <relation> referencing it.
- Organisations can also be part of these relations.
- To differentiate between persons and organisations in the graphs we make the <person> nodes elliptic and the <org> ones rectangular.
- Cast persons have a solid node outline while non-cast persons have a dashed outline.
Personal relations (continued)
- We have followed the default of three <relation> @type values “personal”, “social”, and “other”.
- These are represented by dashed, solid, and dotted edges respectively.
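The drawing conventions for nodes and edges can be written down as two small lookup tables. The sketch below emits Graphviz DOT text purely as an illustration; DOT is a choice made for this example and not necessarily what the project's tools use, and the node ids and the function `to_dot` are hypothetical.

```python
# Conventions from the slides: node shape by element, edge style by @type.
NODE_SHAPE = {"person": "ellipse", "org": "box"}
EDGE_STYLE = {"personal": "dashed", "social": "solid", "other": "dotted"}


def to_dot(nodes, edges):
    """nodes: {id: (kind, is_cast)}; edges: [(source, target, relation_type)]."""
    lines = ["graph relations {"]
    for node_id, (kind, is_cast) in sorted(nodes.items()):
        outline = "solid" if is_cast else "dashed"  # cast vs non-cast
        lines.append(f"  {node_id} [shape={NODE_SHAPE[kind]}, style={outline}];")
    for source, target, rel_type in edges:
        lines.append(f"  {source} -- {target} [style={EDGE_STYLE[rel_type]}];")
    lines.append("}")
    return "\n".join(lines)


# Placeholder ids: two persons (one in the cast) and one organisation.
nodes = {"p1": ("person", True), "p2": ("person", False), "o1": ("org", False)}
edges = [("p1", "p2", "personal"), ("p1", "o1", "social"), ("p2", "o1", "other")]
dot_text = to_dot(nodes, edges)
print(dot_text)
```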
Person relations in ”Fröken Julie”
Person relations in ”The Father”
Training and evaluation
The freely available (open license) resources can be used for training and evaluation:
- The hand-coded and proof-read referential strings (names, places)
- The hand-coded and proof-read relations
- The occupations resource
- The time-specific complementary lexical resources in addition to already existing ones
Named entity recognition (NER) based on dw-delkorpus1/dw-delkorpus2
Named entity recognition (NER) resource based on dw-delkorpus1
<http://www.dramawebben.se/sites/default/files/dw-delkorpus1/swe-dw1-3class-model.ser.gz>
To be used with the eXist-db stanford-ner app.
NB: Fully automatically generated proof of concept, but can still be useful for your purposes.
Female talk not about men
Final Comments
- Resources available with open licenses: <http://www.dramawebben.se>
- Tools used or mentioned (not Dramawebben specific) with open licenses: eXist-db apps <https://github.com/ljo/exist-tei-graphing>, <https://github.com/ljo/exist-sparql> and <https://github.com/eXist-db/jfreechart>, more under <https://github.com/ljo/> and <https://github.com/eXist-db/>
- Graphs can be used in svg, graphml and gexf output