Crowdsourcing Transcription
with Open Source Software
Ben Brumfield
MAC Fall Symposium 2013
Why Transcribe?
Crowdsourcing can be
Tagging
Georectification
Identification
But if you've got scanned documents, you've got a problem
Serendipity: One Volunteer's Story
Nat Wooding
Semi-retired data analyst
200 pages of Julia Brumfield's 1923 diary in nine months
No relation to diarist
Great-uncle was the diarist's letter carrier, also named Nat Wooding
Why Crowdsource?
Free Labor!
Free as in beer
Free as in speech
Free as in...
Free as in puppy!
http://www.flickr.com/photos/magnusbrath/7614518858/
Why Crowdsource?
"At its best, crowdsourcing is not about getting someone to do work for you, it is about offering your users the opportunity to participate in public memory."
Trevor Owens, "Crowdsourcing Cultural Heritage: The Objectives are Upside-down"
Why Crowdsource?
By engaging the public in digitising our collections, we are
Increasing the scientific literacy of the public
Providing increased access to our collections
Building an advocacy network for our collections and our institutions.
Paul Flemons, Australian Museum
Why Crowdsource?
Convert website visitors into volunteers
Convert volunteers into advocates
What's next?
Questions?
Choosing a Transcription Platform
The good news: More than 30 tools to choose from!
The bad news: More than 30 tools to choose from!
Selection Factors
Source Material
Transcript Purpose
Organizational/Project Management Fit
Financial and Technical Resources
Source Material
Is it of interest to anyone else?
Is it under copyright?
Does it need restricted access?
Is it composed of text or records?
How complex is the layout?
How important is that layout?
Purpose
How will you be using the transcribed data?
Traditional print editions
Searchable online editions
Do you want to use the system to analyze the text?
Do you need to import the transcripts into other systems?
Is public engagement the only goal?
Organizational Fit
How important is traditional editorial workflow?
Will you rely on volunteers? How will you find and motivate them?
What is the duration of the project?
Is there a "final version"?
Is TEI a mandate?
Financial and Technical Resources
System administrators to install non-hosted software?
Money to pay hosting costs?
Programming skills to customize a tool?
Money to pay programmers for customization?
Support for ongoing costs to keep the site running, however small?
The Tools
Recent (oldest started in 2005)
Influenced by origin
Still pretty raw
Most require tech expertise for set-up and customization
All require making trade-offs
http://tinyurl.com/TranscriptionToolGDoc
Open-source, On-site Tools
Scripto
Bentham Transcription Desk
NARA Transcribr Drupal Module
Zooniverse Scribe
Quick Definitions
MediaWiki: Popular software framework for running wiki projects
Wikipedia, Wikisource, Wiktionary, Wikitravel: Projects running on MediaWiki
Wikimedia: Organization running many (but not all) MediaWiki-based wiki projects
Hosted Tools
Virtual Transcription Laboratory
Wikisource.org
FromThePage.com
Virtual Transcription Laboratory
Wikisource
Live demo of State Library of Queensland on Wikisource showing project page, edit screen, and editorial workflow.
Recommend Lori and the GLAMWiki group as resources to help organizations navigate the Wikisource community.
FromThePage
Live demo of FromThePage showing edit screen, wiki-linking a single term, read pages for a subject, full-text search on name variants, and auto-link.
Thanks!
Ben [email protected]
@benwbrum
http://manuscripttranscription.blogspot.com
My transcription tools: FromThePage.com
OpenSourceIndexing.org
http://tinyurl.com/TranscriptionToolGDoc