Leif-Jöran Olsson, "Dramawebben, The Swedish Drama Web", KB, 9 October 2015

Dramawebben, The Swedish Drama Web
Leif-Jöran Olsson
Språkbanken, University of Gothenburg, CLARIN-ERIC, SWE-CLARIN
2015-10-09

Overview
Numbers and divisions
Thematic examples
Presence and relations
Training and evaluation
Final Comments

Overview

- Dramawebben, The Swedish Drama Web

Acknowledgements

- For the parts relating to Dramawebben (The Swedish Drama Web) I gratefully acknowledge financial support from the Swedish Research Council (VR Dnr: 2011-6202).

- Thanks to project co-workers, Riksteatern and Musikverket.

Text Encoding Initiative (TEI)

- De facto standard for text encoding in the humanities

- Modules
- Using XML Schema

Goals of Dramawebben

- A corpus of plays with baseline TEI drama encoding

- Development of exploration and visualization tools

- Engaging a vibrant community
  - Educate students in TEI encoding and let them be ambassadors spreading the word
  - Target disciplines within the humanities, such as linguistics, literary and theater history, studies in children’s culture, and practical and theoretical research in children’s theater, as well as tertiary arts institutions.

<http://www.dramawebben.se>

Perspective of Dramawebben

- The perspective is that of the dramatic text as a working text.

The context

- The baseline encoding covers the basic structure of the drama text.

- On top of that, semantic annotation can be added. This goes beyond the text itself and refers to the action below, behind or beyond the actual words.
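The baseline layer is ordinary TEI drama markup. The minimal Python sketch below uses only the standard library to read speeches from such a fragment and count them per speaker. The fragment, its `xml:id` values and the `speeches_per_speaker` helper are hypothetical illustrations, not Dramawebben's files or code. Only the element names (`<div>`, `<stage>`, `<sp who>`, `<speaker>`, `<p>`) are standard TEI.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical, heavily reduced TEI drama fragment with only the baseline
# structure: act and scene divisions, stage directions and speeches.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <div type="act" n="1">
      <div type="scene" n="1">
        <stage>Ett rum.</stage>
        <sp who="#laura"><speaker>Laura</speaker><p>God morgon.</p></sp>
        <sp who="#adolf"><speaker>Adolf</speaker><p>God morgon, Laura.</p></sp>
        <sp who="#laura"><speaker>Laura</speaker><p>Har du sovit?</p></sp>
      </div>
    </div>
  </body></text>
</TEI>"""


def speeches_per_speaker(tei_xml: str) -> Counter:
    """Count <sp> elements per @who reference (the leading '#' is stripped)."""
    root = ET.fromstring(tei_xml)
    counts = Counter()
    for sp in root.iterfind(".//tei:sp", TEI_NS):
        # @who may hold several space-separated references for joint speeches.
        for ref in sp.get("who", "").split():
            counts[ref.lstrip("#")] += 1
    return counts


print(speeches_per_speaker(SAMPLE))  # Counter({'laura': 2, 'adolf': 1})
```

Semantic annotation would be layered on top of this structure without changing it.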

The context (continued)

- All plays on Dramawebben (<http://www.dramawebben.se>) printed 1880–1900 were selected. The selection came to include some 70+ plays in all genres (children’s plays, drama and comedy), by female as well as male dramatists.

- To tempt humanities scholars with semantic encoding, we started with one theme, textile handicraft, which was a recurrent feature of plays by female playwrights of the 1880s.

Visualisations

We produce several visualizations automatically with eXist-db extensions and apps.

- One kind of visualization is the speech division chart. To get an indication of a skewed relation between female cast and female speeches, one can show the female percentage of speeches and of the cast side by side, as in the following chart.

- In most plays in the selection, the female share of speeches matches the percentage of female speaking roles.
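The two percentages in the speech division chart can be computed directly from the cast list and the `@who` references. The Python sketch below assumes that each `<role>` carries an `xml:id` and a `sex` attribute, with "F" marking female roles. Both the attribute and its values are assumptions for illustration, and the real Dramawebben encoding may differ. The play data is hypothetical.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical play; the @sex attribute and its "F"/"M" values on <role>
# are assumptions, not necessarily how Dramawebben records gender.
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>
  <front><castList>
    <castItem><role xml:id="laura" sex="F">Laura</role></castItem>
    <castItem><role xml:id="bertha" sex="F">Bertha</role></castItem>
    <castItem><role xml:id="adolf" sex="M">Adolf</role></castItem>
    <castItem><role xml:id="nojd" sex="M">Nöjd</role></castItem>
  </castList></front>
  <body>
    <sp who="#laura"/><sp who="#adolf"/><sp who="#adolf"/>
    <sp who="#bertha"/><sp who="#adolf"/>
  </body>
</text></TEI>"""


def female_shares(tei_xml: str, female: str = "F") -> tuple[float, float]:
    """Return (female % of cast list roles, female % of speeches)."""
    root = ET.fromstring(tei_xml)
    sex_by_id = {
        role.get(XML_ID): role.get("sex")
        for role in root.iterfind(".//tei:role", TEI_NS)
    }
    speakers = [
        ref.lstrip("#")
        for sp in root.iterfind(".//tei:sp", TEI_NS)
        for ref in sp.get("who", "").split()
    ]
    if not sex_by_id or not speakers:
        raise ValueError("play has no cast list roles or no speeches")
    cast_female = sum(1 for sex in sex_by_id.values() if sex == female)
    # Speeches by speakers missing from the cast list count as non-female here.
    speech_female = sum(1 for who in speakers if sex_by_id.get(who) == female)
    return (100 * cast_female / len(sex_by_id),
            100 * speech_female / len(speakers))


print(female_shares(SAMPLE))  # (50.0, 40.0)
```

A gap between the two numbers, here 50% of the cast but 40% of the speeches, is the kind of skew the chart is meant to reveal.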

Speaker gender and speaker division

Speaker gender division per play

Female and child speeches division per play

Roles in cast list (and added) per play

Result of searching for plays with a specific number of roles (excerpt)
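A search like this can be sketched as a filter on the number of `<role>` elements in each play's cast list. The Python code below is an assumption-laden illustration: the corpus dictionary, the play titles and `make_play` are hypothetical, and it does not show how the search is actually implemented in Dramawebben.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}


def role_count(tei_xml: str) -> int:
    """Number of <role> elements in a play's cast list (0 if it has none)."""
    root = ET.fromstring(tei_xml)
    cast_list = root.find(".//tei:castList", TEI_NS)
    return 0 if cast_list is None else len(cast_list.findall(".//tei:role", TEI_NS))


def plays_with_role_count(corpus: dict[str, str], low: int, high: int) -> list[str]:
    """Titles of plays whose cast list has between low and high roles (inclusive)."""
    return sorted(t for t, xml in corpus.items() if low <= role_count(xml) <= high)


def make_play(n_roles: int) -> str:
    """Build a hypothetical minimal play with n_roles cast list entries."""
    items = "".join(f"<castItem><role>R{i}</role></castItem>" for i in range(n_roles))
    return (f'<TEI xmlns="{TEI_NS["tei"]}"><text><front>'
            f"<castList>{items}</castList></front></text></TEI>")


CORPUS = {"Play A": make_play(3), "Play B": make_play(8), "Play C": make_play(12)}
print(plays_with_role_count(CORPUS, 5, 10))  # ['Play B']
```

This count covers only cast-list roles. Roles added outside the cast list would need a separate rule.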

Handicraft

As thematic coding we used the example of textile handicraft, since we believe it can generate exciting issues and serve as an instructive example for other forms of semantic encoding.

- Using feature structure elements, with key-value pairs

- They can be tied to anchors to make them discontinuous
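The Python sketch below reads key-value features from a TEI `<fs>` and collects a discontinuous passage attached to it through pairs of `<anchor>` elements. The linking mechanism here, `<span from to ana>` pointing at the `<fs>`, is one possible TEI choice and is an assumption, not necessarily Dramawebben's model. The text, ids and feature names are hypothetical, and the markup is simplified and not validated against the TEI schema.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
ANCHOR = "{http://www.tei-c.org/ns/1.0}anchor"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical encoding: two anchor pairs delimit one discontinuous passage;
# both <span> elements point to the same feature structure "fs1".
SAMPLE = """<TEI xmlns="http://www.tei-c.org/ns/1.0"><text>
  <body><sp who="#margret"><speaker>Margret</speaker>
    <p><anchor xml:id="a1"/>Jag stickar strumpor<anchor xml:id="a2"/> åt
       kaptenen, och <anchor xml:id="a3"/>garnet är snart slut<anchor xml:id="a4"/>.</p>
  </sp></body>
  <back>
    <spanGrp>
      <span from="#a1" to="#a2" ana="#fs1"/>
      <span from="#a3" to="#a4" ana="#fs1"/>
    </spanGrp>
    <fs xml:id="fs1" type="handicraft">
      <f name="activity"><string>knitting</string></f>
      <f name="object"><string>stockings</string></f>
    </fs>
  </back>
</text></TEI>"""


def walk(el):
    """Yield ('start', element) and ('text', str) events in document order."""
    yield ("start", el)
    if el.text:
        yield ("text", el.text)
    for child in el:
        yield from walk(child)
    if el.tail:
        yield ("text", el.tail)


def text_between(root, start_id, end_id):
    """Concatenate all text between two <anchor> elements."""
    parts, inside = [], False
    for kind, item in walk(root):
        if kind == "start" and item.tag == ANCHOR:
            if item.get(XML_ID) == start_id:
                inside = True
            elif item.get(XML_ID) == end_id:
                break
        elif kind == "text" and inside:
            parts.append(item)
    return "".join(parts).strip()


def annotated_passages(tei_xml):
    """Map each <fs> id to (feature dict, list of passages it annotates)."""
    root = ET.fromstring(tei_xml)
    result = {}
    for fs in root.iterfind(".//tei:fs", NS):
        feats = {f.get("name"): f.findtext("tei:string", namespaces=NS)
                 for f in fs.iterfind("tei:f", NS)}
        result[fs.get(XML_ID)] = (feats, [])
    for span in root.iterfind(".//tei:span", NS):
        _feats, passages = result[span.get("ana").lstrip("#")]
        passages.append(text_between(root, span.get("from").lstrip("#"),
                                     span.get("to").lstrip("#")))
    return result


print(annotated_passages(SAMPLE))
```

Both passages end up under the same feature structure, which is what makes the annotation discontinuous.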

The model (simplified)

Ongoing handicraft in speeches per play

Children’s play and food and drink terms

For children’s play we used the same basic parts of the model as for handicraft, that is:

- activity
- talk about activity
- play objects

As another example of potential thematic coding, we extract food and drink terms. For this I created a simple hierarchical lexical resource with cooking and serving utensils, ingredients, dishes, procedures, etc. These concept words were expanded morphologically using other lexical resources to cover all inflected forms and some spelling variation over time.
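The lexicon-based extraction can be sketched in Python as an inverted index from surface forms to (category, concept), followed by token lookup. The miniature lexicon below is hypothetical. Its forms are illustrative, and "thé" is included as an assumed older spelling variant. The real resource is larger and was expanded with other lexical resources rather than by hand.

```python
import re
from collections import Counter

# Hypothetical miniature of a hierarchical food-and-drink lexicon:
# category -> concept -> surface forms (inflections, older spellings).
LEXICON = {
    "utensils": {"kaffepanna": ["kaffepanna", "kaffepannan", "kaffepannor"]},
    "dishes": {"gröt": ["gröt", "gröten"]},
    "drinks": {
        "kaffe": ["kaffe", "kaffet"],
        "te": ["te", "teet", "thé"],  # "thé": assumed older spelling variant
    },
}


def build_index(lexicon):
    """Invert the hierarchy into a surface form -> (category, concept) lookup."""
    return {
        form.lower(): (category, concept)
        for category, concepts in lexicon.items()
        for concept, forms in concepts.items()
        for form in forms
    }


def count_terms(text, index):
    """Count lexicon hits per (category, concept); tokens are Unicode word runs."""
    hits = Counter()
    for token in re.findall(r"\w+", text.lower()):
        if token in index:
            hits[index[token]] += 1
    return hits


INDEX = build_index(LEXICON)
speech = "Ska vi dricka thé eller kaffe? Gröten står på spisen och kaffepannan är tom."
hits = count_terms(speech, INDEX)
per_category = Counter()
for (category, _concept), n in hits.items():
    per_category[category] += n
print(per_category)
```

Because the lookup works on forms rather than lemmas, the morphological expansion does the heavy lifting, and no stemmer is needed at query time.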

Food and drink terms per play

Occupations

As another example of thematic coding we use HISCO, the historical variant of the International Standard Classification of Occupations (ISCO).

- 10 top-level categories (1–0) and five levels of subcategories.

- Statistics Sweden (SCB) also adapts and aligns its Swedish Standard Classification of Occupations (svensk standard för yrkesklassificering, SSYK) with the ISCO standard.

- This makes it possible to compare occupations in an international context and to link them, as Linked Open Data (LOD), to other datasets and implementations.
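Because HISCO codes are hierarchical, counts can be rolled up to broader groups by code prefix. The Python sketch below assumes that the leading digits of a five-digit code identify its broader groups. It also treats the code -1 as "unclassified"; that code appears in the "most common occupations" listing later in this presentation. The count/code pairs are taken from that listing.

```python
from collections import Counter

# (count, HISCO code) pairs from the "most common occupations" listing;
# -1 is treated here as "not classified".
OCCUPATION_COUNTS = [
    (18, 54020), (16, 58320), (13, 14120), (10, 14140), (8, 15120),
    (8, 99900), (7, -1), (7, 20210), (7, 17320), (6, 58220),
    (5, 17120), (5, 54010), (5, 55130), (5, 20110), (4, 17140),
]


def roll_up(pairs, digits=1):
    """Sum counts per HISCO code prefix of the given length (1 = top level)."""
    groups = Counter()
    for count, code in pairs:
        # Zero-pad so that codes with a leading 0 keep their prefix.
        key = "unclassified" if code < 0 else f"{code:05d}"[:digits]
        groups[key] += count
    return groups


print(roll_up(OCCUPATION_COUNTS))
print(roll_up(OCCUPATION_COUNTS, digits=2))
```

The same prefix keys could serve as join points when linking to other ISCO/HISCO-coded datasets.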

Occupations in play text (difference: HISCO 5–8, ISCO 0)

Occupations of role characters

Most common occupations in one selection for role characters

Count  HISCO  Occupation
18     54020  arbetspiga
16     58320  amiralitetslöjtnantdotter
13     14120  adjunkt
10     14140  elisabetsyster
8      15120  f.d. författare
8      99900  arbetare
7      -1     allmosehjon
7      20210  andre legationssekreterare
7      17320  aktris
6      58220  biträdande vaktkonstapel
5      17120  kompositör
5      54010  f.d. dräng
5      55130  auktionsvaktmästare
5      20110  borgarråd
4      17140  dragspelerska

Link to images for occupation

Link to history of work DB

Presence and relations

- Presence
- Relations

We combine parts of the namesdates module, such as <listPerson> and <listOrg>, with relations in <listRelation> elements to create graphs of relations between persons (cast and non-cast) or sociograms of interaction on stage (cast only).
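The step from `<listPerson>`/`<listOrg>` plus `<listRelation>` to graph edges can be sketched in Python as follows. The code reads the standard TEI `<relation>` attributes `@name`, `@type`, `@active`, `@passive` and `@mutual`. The excerpt for "Fröken Julie" is hypothetical, with Greven (the count) as a non-cast person. The `relation_edges` helper is illustrative and is not the exist-tei-graphing implementation.

```python
import xml.etree.ElementTree as ET
from itertools import combinations, product

NS = {"tei": "http://www.tei-c.org/ns/1.0"}
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Hypothetical excerpt for "Fröken Julie"; Greven (the count) is non-cast.
SAMPLE = """<listPerson xmlns="http://www.tei-c.org/ns/1.0">
  <person xml:id="julie"><persName>Julie</persName></person>
  <person xml:id="jean"><persName>Jean</persName></person>
  <person xml:id="kristin"><persName>Kristin</persName></person>
  <person xml:id="greven"><persName>Greven</persName></person>
  <listRelation>
    <relation type="personal" name="parent" active="#greven" passive="#julie"/>
    <relation type="social" name="employer" active="#greven" passive="#jean #kristin"/>
    <relation type="personal" name="betrothed" mutual="#jean #kristin"/>
  </listRelation>
</listPerson>"""


def refs(value):
    """Split a TEI pointer list like '#a #b' into ['a', 'b']."""
    return [r.lstrip("#") for r in (value or "").split()]


def relation_edges(tei_xml):
    """Return ({id: (kind, name)}, [(source, target, name, type), ...])."""
    root = ET.fromstring(tei_xml)
    nodes = {}
    for kind, name_tag in (("person", "persName"), ("org", "orgName")):
        for el in root.iterfind(f".//tei:{kind}", NS):
            nodes[el.get(XML_ID)] = (kind, el.findtext(f"tei:{name_tag}", namespaces=NS))
    edges = []
    for rel in root.iterfind(".//tei:relation", NS):
        if rel.get("mutual"):
            # Mutual relations hold between every pair of listed participants.
            pairs = combinations(refs(rel.get("mutual")), 2)
        else:
            # Directed relations link each active to each passive participant.
            pairs = product(refs(rel.get("active")), refs(rel.get("passive")))
        edges.extend((a, b, rel.get("name"), rel.get("type")) for a, b in pairs)
    return nodes, edges


nodes, edges = relation_edges(SAMPLE)
for edge in edges:
    print(edge)
```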

Presence

Sociograms

- Sociograms are created dynamically and can be based on any criterion of what constitutes interaction.

- They can also be weighted by giving a numeric value to the @sortKey attribute of the <relation> element.

- Other types of graphs can of course also be created from dynamic data.
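To illustrate one possible interaction criterion, the Python sketch below counts two characters as interacting when both speak in the same scene. The weight of a pair is the number of such scenes. The result is written back as TEI `<relation>` elements, with the weight in `@sortKey`. The criterion, the `name="interaction"` and `type="sociogram"` values, and the sample scenes are assumptions chosen for the example.

```python
import xml.etree.ElementTree as ET
from collections import Counter
from itertools import combinations

TEI = "http://www.tei-c.org/ns/1.0"
NS = {"tei": TEI}

# Hypothetical play body with two scenes.
SAMPLE = f"""<body xmlns="{TEI}">
  <div type="act" n="1">
    <div type="scene" n="1">
      <sp who="#laura"/><sp who="#adolf"/><sp who="#bertha"/>
    </div>
    <div type="scene" n="2">
      <sp who="#adolf"/><sp who="#laura"/><sp who="#adolf"/>
    </div>
  </div>
</body>"""


def sociogram(tei_xml):
    """Weight each pair of speakers by the number of scenes in which both speak."""
    root = ET.fromstring(tei_xml)
    weights = Counter()
    for scene in root.iterfind(".//tei:div[@type='scene']", NS):
        speakers = {ref.lstrip("#")
                    for sp in scene.iterfind(".//tei:sp", NS)
                    for ref in sp.get("who", "").split()}
        for pair in combinations(sorted(speakers), 2):
            weights[pair] += 1
    # Express the result as TEI relations, with the weight in @sortKey.
    list_rel = ET.Element(f"{{{TEI}}}listRelation", {"type": "sociogram"})
    for (a, b), weight in sorted(weights.items()):
        ET.SubElement(list_rel, f"{{{TEI}}}relation",
                      {"name": "interaction", "mutual": f"#{a} #{b}",
                       "sortKey": str(weight)})
    return weights, list_rel


weights, list_rel = sociogram(SAMPLE)
print(ET.tostring(list_rel, encoding="unicode"))
```

Swapping in another criterion only means changing how `speakers` is gathered per unit. For example, speeches that directly follow each other could count as interaction instead of shared scenes.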

Sociogram for “The Father”

Sociogram for “The Father”, alternative view

[Figure: alternative sociogram view for “The Father”, with numeric edge labels between the nodes Adolf, Svärd, Margret, doktor Östermark, pastor Jonas, Bertha, Nöjd and Laura.]

Mention Gephi for working with graphs/networks.

Personal relations

- Personal relations are coded by hand.
- Every <person> shall have at least one <relation> referencing it.
- Organisations can also be part of these relations.
- To differentiate between persons and organisations in the graphs, we make the <person> nodes elliptic and the <org> nodes rectangular.

- Cast persons have a solid node outline, while non-cast persons have a dashed outline.

Personal relations (continued)

- We have followed the default of three <relation> @type values: “personal”, “social”, and “other”.

- These are represented by dashed, solid, and dotted edges respectively.
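The drawing conventions above map naturally onto node and edge attributes of a graph description language. The Python sketch below emits Graphviz DOT and also enforces the rule that every person has at least one relation. DOT is used here only as an example target; the source names SVG, GraphML and GEXF as outputs. Drawing organisations with a solid outline is an assumption, and the node and edge data are hypothetical.

```python
# Edge styles for the three <relation> @type values described on the slide.
EDGE_STYLE = {"personal": "dashed", "social": "solid", "other": "dotted"}


def to_dot(nodes, edges):
    """Render relations as Graphviz DOT.

    nodes: {id: (label, kind, in_cast)} with kind "person" or "org";
    edges: [(source_id, target_id, relation_type)].
    """
    related = {n for a, b, _ in edges for n in (a, b)}
    unrelated = sorted(n for n, (_, kind, _) in nodes.items()
                       if kind == "person" and n not in related)
    if unrelated:
        # Every <person> must be referenced by at least one <relation>.
        raise ValueError(f"persons without relations: {unrelated}")
    lines = ["graph relations {"]
    for node_id, (label, kind, in_cast) in sorted(nodes.items()):
        shape = "ellipse" if kind == "person" else "box"
        # Solid outline for cast persons, dashed for non-cast persons;
        # organisations are drawn solid here (an assumption).
        style = "dashed" if kind == "person" and not in_cast else "solid"
        lines.append(f'  "{node_id}" [label="{label}", shape={shape}, style={style}];')
    for a, b, rel_type in edges:
        lines.append(f'  "{a}" -- "{b}" [style={EDGE_STYLE[rel_type]}];')
    lines.append("}")
    return "\n".join(lines)


NODES = {
    "julie": ("Julie", "person", True),
    "jean": ("Jean", "person", True),
    "kristin": ("Kristin", "person", True),
    "greven": ("Greven", "person", False),
}
EDGES = [("greven", "julie", "personal"), ("greven", "jean", "social"),
         ("jean", "kristin", "personal")]
print(to_dot(NODES, EDGES))
```

The output can be rendered with Graphviz, for example to SVG.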

Person relations in “Fröken Julie”

Person relations in “The Father”

Training and evaluation

The freely available (openly licensed) resources can be used for training and evaluation:

- The hand-coded and proofread referential strings (names, places)

- The hand-coded and proofread relations

- The occupations resource
- The time-specific complementary lexical resources, in addition to already existing ones
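When the hand-coded, proofread names serve as a gold standard, a tagger's output can be scored with exact-match precision, recall and F1. The Python sketch below assumes that entities are represented as (start, end, label) character spans. The spans are hypothetical, and the labels follow a generic three-class NER scheme.

```python
def precision_recall_f1(gold, predicted):
    """Exact-match scores over (start, end, label) entity spans."""
    gold, predicted = set(gold), set(predicted)
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Hypothetical character-offset spans; labels follow a 3-class NER scheme.
GOLD = [(0, 5, "PERSON"), (10, 16, "LOCATION"), (20, 25, "PERSON")]
PREDICTED = [(0, 5, "PERSON"), (10, 16, "PERSON"), (20, 25, "PERSON"),
             (30, 34, "LOCATION")]
p, r, f = precision_recall_f1(GOLD, PREDICTED)
print(f"P={p:.3f} R={r:.3f} F1={f:.3f}")  # P=0.500 R=0.667 F1=0.571
```

Exact matching is strict: a correct span with the wrong label counts as both a false positive and a false negative.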

Named entity recognition (NER) based on dw-delkorpus1/dw-delkorpus2

Named entity recognition (NER) resource based on dw-delkorpus1

<http://www.dramawebben.se/sites/default/files/dw-delkorpus1/swe-dw1-3class-model.ser.gz>, to be used with the eXist-db stanford-ner app.
NB: This is a fully automatically generated proof of concept, but it can still be useful for your purposes.

Female talk not about men

Final Comments

- Resources available with open licenses: <http://www.dramawebben.se>

- Tools used or mentioned (not Dramawebben-specific) with open licenses: the eXist-db apps <https://github.com/ljo/exist-tei-graphing>, <https://github.com/ljo/exist-sparql> and <https://github.com/eXist-db/jfreechart>; more under <https://github.com/ljo/> and <https://github.com/eXist-db/>

- Graphs can be output as SVG, GraphML and GEXF
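As an illustration of one of these output formats, the Python sketch below serializes a small weighted, undirected graph as GraphML, which Gephi can import. It is not the code of the exist-tei-graphing app. The node ids, labels and weights are hypothetical, and declaring the default namespace as a plain `xmlns` attribute is a shortcut chosen to keep the element names short.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"


def to_graphml(nodes, edges):
    """Serialize a weighted, undirected graph as GraphML.

    nodes: {id: label}; edges: [(source_id, target_id, weight)].
    """
    # Declare the default namespace as a plain attribute so child tags stay short.
    root = ET.Element("graphml", {"xmlns": GRAPHML_NS})
    ET.SubElement(root, "key", {"id": "label", "for": "node",
                                "attr.name": "label", "attr.type": "string"})
    ET.SubElement(root, "key", {"id": "weight", "for": "edge",
                                "attr.name": "weight", "attr.type": "double"})
    graph = ET.SubElement(root, "graph", {"id": "G", "edgedefault": "undirected"})
    for node_id, label in nodes.items():
        node = ET.SubElement(graph, "node", {"id": node_id})
        ET.SubElement(node, "data", {"key": "label"}).text = label
    for i, (source, target, weight) in enumerate(edges):
        edge = ET.SubElement(graph, "edge",
                             {"id": f"e{i}", "source": source, "target": target})
        ET.SubElement(edge, "data", {"key": "weight"}).text = str(weight)
    return ET.tostring(root, encoding="unicode")


graphml_text = to_graphml({"adolf": "Adolf", "laura": "Laura", "bertha": "Bertha"},
                          [("adolf", "laura", 2), ("adolf", "bertha", 1)])
print(graphml_text)
```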