From crowd-sourced collection to digital scholarly edition

Post on 14-Apr-2017



From crowd-sourced collection to digital scholarly edition

The example of the Letters of 1916 project

Funding Bodies

Team

Susan Schreibman - Project Director and Editor in Chief

Karolina Badzmierowska - Researcher

Roman Bleier - Researcher

Emma Clarke - Researcher

Vinayak Das Gupta - Researcher

Richard Hadden - Researcher

Hannah Healy - Researcher

Shane McGarry - Software Engineer

Neale Rooney - Researcher

Linda Spinazzè - Researcher

An Foras Feasa Institute, Maynooth

Why 1916?

The Easter Rising

24-29 April 1916

“Allowing letters from personal collections to be read alongside official letters and letters contributed by institutions will add new perspectives to the events of the period and allow us to understand what it was like to live an ordinary life through what were extraordinary times”

Susan Schreibman

1 November 1915 - 31 October 1916

Letters of 1916 - a year in the life

Letters of 1916 - some numbers (from 13 October)

Launched: 27 September 2013

Correspondence documents uploaded: 2209

Uploaded items from 42 private collections and 23 collaborating institutions

Registered users: 1159

Transcribed characters: 2308911

Diversity of Letters of 1916 correspondence data

Diversity of documents:

Single/Multi-page Letters

Postcards

Greeting cards

Telegrams

Envelopes

...

Variety of topics:

Love letters

Family life

Business

Crime

World War One

...

Crowdsourcing workflow - upload

Crowdsourcing workflow - transcription desk

Facsimile image

Bentham toolbar

Text Editor

Toolbar

About the Training

Training of transcribers - Essential part of public outreach

Leads to better quality of transcriptions

Workshop

Seminars

Secondary school history teachers, students, and general public

Goal: Accuracy

Facing three main areas with quality issues

1. Incorrect or incomplete metadata

2. Non-TEI markup (e.g. HTML tagging…)

3. TEI tag abuse - misunderstandings

Community engagement vs standards of excellence?

Incorrect, incomplete or incoherent metadata

The field corresponds to the tag <note type="summary"> inside the TEI header

non-TEI markup (HTML cases)

non-TEI mark-up (non XML)

Indication of location of a section of text:

NOTE IN LEFT MARGIN Give my regards to Dick when next you meet him

(front of post card) To Lady Clonbrock, Ahascragh, Co.Galway

[Handwritten notes at bottom : I Note annex II Await any application from Prof Collingwood; III Resubmit on 1st March]

Uncertainty and missing text:

James McCarthy & Family, Wm Perron. 1.50 Nick Welch, xxxxxxx Jxxx & Mrs Shields. 1.00 Alex xxx, Fred xxxx, M. Barry, L. x.

has told you ?Neeson? is in Sussex. Th? ????? ???? ?????letters? from him, but no

(samples from reliable transcriptions)

TEI tag abuse - misinterpretation of TEI

The transcriber uses the tags in an attempt to recreate the layout

The transcriber applies the tags without comprehending the functionality

Quest for Crowdsourcing Accuracy

Quality check:

● pre-selecting the contributors

● a self-regulating community

● professional staff hired to ensure the quality of the crowdsourced content

The Letters of 1916 project takes a different route and applies a hybrid, semi-automated approach to proofing

Borrowing a Unix Philosophy

“If you can get 90 percent of the desired effect for 10 percent of the work, use the simpler solution.”

Bob Scheifler and Jim Gettys, Early Principles of X-Window

Difficult Letters

Modularity in crowdsourced transcribing and editing

Crowdsourcing needs discrete tasks to be carried out; otherwise, chaos!

Post-Omeka Workflow

letters: {
  302: {
    title: "Letter from Patrick Pearse to his mother",
    pages: {
      27: {
        facs: "img27.jpg",
        transcription: "<p>Dear Mother</p> [...] <salute>Your loving son</salute> Padraic."
      },
      28: {...}
    },
    other-metadata: {...}
  },
  303: {...}
}

Basic typos with tags

Some examples

Slashes in the wrong place: </pb> → <pb/>, <address/> → </address>

Accidental angle brackets: <<p> → <p>

Missing angle brackets: <salute → <salute>

Number of ‘tag-typos’ per letter (grouped by number of errors)

Nearly half the letters have at least one tag-typo we can fix like this
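
The three families of tag-typo listed above lend themselves to simple pattern-based repair. The following is a minimal sketch, not the project's actual code: the function name `fix_tag_typos`, the helper `_close_self_closed`, and the two element lists are hypothetical and chosen only to cover the examples on the slide.

```python
import re

# Element names are taken from the slide examples; both lists are assumptions
# and would need extending for a real corpus.
EMPTY_ELEMENTS = ("pb", "lb")
CONTAINER_ELEMENTS = ("address", "salute", "date", "note", "p")


def _close_self_closed(text, name):
    """Turn <name/> into </name> when a <name> is still open."""
    depth = 0

    def repl(match):
        nonlocal depth
        tag = match.group(0)
        if tag.startswith("</"):
            depth = max(depth - 1, 0)
        elif tag.endswith("/>"):
            if depth > 0:
                depth -= 1
                return f"</{name}>"
        else:
            depth += 1
        return tag

    return re.sub(rf"</?{name}\b[^<>]*>", repl, text)


def fix_tag_typos(text):
    # Accidental angle brackets: <<p> -> <p>
    text = re.sub(r"<{2,}", "<", text)
    # Missing angle brackets: a tag name that is followed by another "<"
    # (or the end of the text) before any ">" gets closed after the name.
    text = re.sub(r"<(/?\w+)(?=[^<>]*(?:<|$))", r"<\1>", text)
    # Slashes in the wrong place on empty elements: </pb> -> <pb/>
    empty = "|".join(EMPTY_ELEMENTS)
    text = re.sub(rf"</({empty})\s*>", r"<\1/>", text)
    # Slashes in the wrong place on containers: <address/> -> </address>
    for name in CONTAINER_ELEMENTS:
        text = _close_self_closed(text, name)
    return text
```

The missing-bracket rule is deliberately naive: it would truncate a tag whose attributes are followed by a missing bracket, so in practice such cases are better flagged for a human proofreader than repaired automatically.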

Finding types of correspondence

“Letter from Patrick Langford Beazley to Piaras Béaslaí, 14 Feb 1916”

“Postcard from Herbert Pim to John Sweetman, 1 October 1916”

“Deportation Order from the Secretary of State to James Gough, 17 June 1916” ??
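
Since the titles follow a "<type> from <sender> to <recipient>" pattern, the type can be read off the words before "from". This is a hedged sketch under that assumption; `correspondence_type` and `TITLE_PATTERN` are illustrative names, not part of the project's code.

```python
import re
from collections import Counter

# Everything before the first " from " is treated as the correspondence type.
TITLE_PATTERN = re.compile(r"^\s*(?P<kind>.+?)\s+from\s+", re.IGNORECASE)


def correspondence_type(title):
    """Return the lower-cased type prefix of a title, or None if absent."""
    match = TITLE_PATTERN.match(title)
    return match.group("kind").lower() if match else None


titles = [
    "Letter from Patrick Langford Beazley to Piaras Béaslaí, 14 Feb 1916",
    "Postcard from Herbert Pim to John Sweetman, 1 October 1916",
    "Deportation Order from the Secretary of State to James Gough, 17 June 1916",
]

# Counting the prefixes surfaces the unexpected types for manual review.
type_counts = Counter(correspondence_type(t) for t in titles)
```

Rare prefixes such as "deportation order" then stand out in `type_counts` and can be checked by an editor instead of being forced into a fixed list of types.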

Envelopes

Page 4

DUBLIN 16 APRIL<address>Diarmid Coffey <sic>Esqu</sic>,<lb/>

Mount Trenchard,<lb/> Foynes,<lb/>

Co. Limerick, Ireland</address>

Page 1

<note>you addressed <lb/><sic>yr</sic> letter to<lb/> Harcourt Terrace<sic>wh</sic> delayed it late <lb/>it came this <lb/>afternoon! <lb/>toolate to<lb/> <hi rend="underline">write</hi></note>
<address>Langridge,<lb/>Bath</address>
<date>16.10.16</date>
<salute>Dearest D.</salute>
<p> Phyllis &amp; Basil have <lb/>written that they come <lb/> out for weekend so [...]

Envelope

<address>3 Coast Hill <lb/> Queenstown </address> <date>June.19.1916 </date> <salute>My Own Dearest Jim </salute>

Wish of your loving <lb/> <salute>Mother A. Fitzgerald </salute> xxxxxxx</p>

Adding structural elements to letters

<opener>
  <address>
    <addrLine>3 Coast Hill </addrLine>
    <addrLine>Queenstown </addrLine>
  </address>
  <dateline>
    <date>June.19.1916 </date>
  </dateline>
  <salute> My Own Dearest Jim </salute>
</opener>

<closer>
  <salute> Wish of your loving <lb/> Mother </salute>
  <signed> A. Fitzgerald </signed>
</closer>
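
The opener transformation can be sketched as a pattern match on the start of a page. This is an illustration under stated assumptions, not the project's implementation: `wrap_opener` and `OPENER_RE` are hypothetical names, and the sketch assumes the page begins with an address, a date and a salute in exactly that order.

```python
import re

# Assumed shape of a letter opening: <address>, <date>, <salute> in sequence.
OPENER_RE = re.compile(
    r"\s*<address>(?P<address>.*?)</address>\s*"
    r"<date>(?P<date>.*?)</date>\s*"
    r"<salute>(?P<salute>.*?)</salute>",
    re.DOTALL,
)


def wrap_opener(page):
    """Wrap a leading address/date/salute sequence in <opener>."""
    match = OPENER_RE.match(page)
    if match is None:
        # Pages that do not start with the expected sequence are left alone.
        return page
    # Each <lb/>-separated chunk of the address becomes one <addrLine>.
    chunks = re.split(r"<lb\s*/>", match.group("address"))
    addr_lines = "".join(
        f"<addrLine>{chunk.strip()}</addrLine>" for chunk in chunks if chunk.strip()
    )
    opener = (
        "<opener>"
        f"<address>{addr_lines}</address>"
        f"<dateline><date>{match.group('date').strip()}</date></dateline>"
        f"<salute>{match.group('salute').strip()}</salute>"
        "</opener>"
    )
    return opener + page[match.end():]
```

Note that the sketch trims whitespace inside the wrapped elements, and that a page opening with something else (a marginal note, for example) passes through unchanged so that it can be handled manually.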

Adding the @when

<date>Tues oct 22 1916</date>

>>> import dateparser
>>> a = dateparser.parse('Tues oct 22 1916')
>>> a
datetime.datetime(1916, 10, 22, 0, 0)
>>> a.date().isoformat()
'1916-10-22'

<date when="1916-10-22">Tues oct 22 1916</date>
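
Applied across a whole transcription, the same idea might look like the sketch below. To keep it self-contained it swaps in the standard library's `datetime.strptime` with a short, assumed list of formats in place of `dateparser`; `parse_date`, `add_when` and `FORMATS` are hypothetical names.

```python
import re
from datetime import datetime

# Assumed formats, based only on the date strings shown in the slides.
# Two-digit years (e.g. 16.10.16) are deliberately not handled here, because
# strptime's %y would read "16" as 2016.
FORMATS = ("%B.%d.%Y", "%b %d %Y", "%B %d %Y", "%d %b %Y", "%d %B %Y")

DATE_RE = re.compile(r"<date>(.*?)</date>", re.DOTALL)


def parse_date(text):
    """Return an ISO date string for the text, or None if no format fits."""
    cleaned = " ".join(text.split())
    # Drop a leading weekday word such as "Tues" when a month name follows.
    cleaned = re.sub(r"^[A-Za-z]+\s+(?=[A-Za-z])", "", cleaned)
    for fmt in FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).date().isoformat()
        except ValueError:
            continue
    return None


def add_when(xml):
    """Add a @when attribute to every <date> whose content can be parsed."""
    def repl(match):
        iso = parse_date(match.group(1))
        if iso is None:
            return match.group(0)  # leave unparseable dates for an editor
        return f'<date when="{iso}">{match.group(1)}</date>'

    return DATE_RE.sub(repl, xml)
```

The transcribed text inside the element is kept exactly as written; only the attribute is added, and anything the parser cannot resolve is left untouched for manual review.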

Postcards

Type 2

Type 1

Templating

LetEd.

Questions to conclude

Is it worth it?

Why the trouble of TEI encoding instead of plain text?

Roman Bleier | bleierr@tcd.ie

Richard Hadden | richard.hadden@nuim.ie | @oculardexterity

Linda Spinazzè | linda.spinazze@nuim.ie

We welcome suggestions, comments, questions.