Let the Public and the Computer do the Metadata Work!

Post on 22-Jan-2018


Karen Cariani

AAPB Project Director, WGBH

Senior Director, WGBH Media Library & Archives

Let the Computer, and the Public, do the Metadata Work!

The Library of Congress

Packard Campus for Audio Visual Conservation

American Archive of Public Broadcasting

WGBH Educational Foundation

American Archive of Public Broadcasting

the situation

72,000 digitized television and radio programs

incomplete, inaccurate metadata records

limited staff resources

we need to know what we have in the collection

we have a responsibility to users to provide access to the collection

continued growth of the collection (content and sparse metadata)

the potential: transforming content into data

• Computational Tools

• Speech-to-text

• Audio analysis

• Image Analysis

• Visualization of Data

How can we use them?

a crowdsourcing game

fixit.americanarchive.org

Casey Davis Kaufman

Associate Director, WGBH Media Library and Archives

Project Manager, AAPB

AV crowdsourcing precedents

TiltFactor @ Dartmouth: “Metadata Games”

New York Public Library’s Together We Listen project & Transcript Editing Tool

Netherlands Institute for Sound and Vision

user population

• General public

• Public media fans

• K-12 students

• Senior citizens

• People seeking to develop editing skills

• People seeking volunteer opportunities

game pipeline

1. Identify errors

2. Suggest corrections

3. Validate corrections
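
The three stages above can be read as a small state machine for each transcript phrase. The following is a minimal sketch only, not the game's actual implementation: the `Phrase` class, its field names, and the `VOTES_NEEDED` agreement threshold are all assumptions made for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical threshold: how many players must agree before a phrase advances.
VOTES_NEEDED = 2


@dataclass
class Phrase:
    """One short piece of a machine-generated transcript (hypothetical model)."""
    text: str
    error_flags: int = 0
    suggestions: Counter = field(default_factory=Counter)
    approvals: Counter = field(default_factory=Counter)
    final_text: Optional[str] = None

    def flag_error(self) -> None:
        """Stage 1: a player marks the phrase as containing an error."""
        self.error_flags += 1

    def suggest(self, correction: str) -> None:
        """Stage 2: a player proposes a correction, once enough players flagged it."""
        if self.error_flags >= VOTES_NEEDED:
            self.suggestions[correction] += 1

    def validate(self, correction: str) -> None:
        """Stage 3: a player approves a suggestion; enough approvals finalize it."""
        if correction in self.suggestions:
            self.approvals[correction] += 1
            if self.approvals[correction] >= VOTES_NEEDED:
                self.final_text = correction
```

Requiring agreement from more than one player at each stage is one possible way to trade speed for accuracy; a lower threshold would move transcripts through the game faster.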

game improvement targets

Change the algorithm and game pipeline to get transcripts through the game more quickly

Update Rules page to allow more leniency in corrections. Communicate that we’re looking for acceptable corrections, not perfection.

Add ability for AAPB staff to prioritize transcripts in the game

Remove the preferences feature

Update API to help AAPB staff determine more easily which transcripts are ready to come out of the game.
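
One way staff might decide which transcripts are ready to come out of the game is a completion threshold set below 100%, in keeping with the goal of acceptable corrections rather than perfection. This is a hedged sketch: the function name, the `total_phrases` and `resolved_phrases` fields, and the 95% default are hypothetical and do not describe the actual FIX IT API.

```python
def ready_transcripts(status, min_done=0.95):
    """Return sorted IDs of transcripts whose resolved share meets min_done.

    `status` maps a transcript ID to a dict with the hypothetical keys
    "total_phrases" and "resolved_phrases".
    """
    ready = []
    for transcript_id, counts in status.items():
        total = counts["total_phrases"]
        # Skip empty transcripts to avoid dividing by zero.
        if total and counts["resolved_phrases"] / total >= min_done:
            ready.append(transcript_id)
    return sorted(ready)
```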

lessons learned

• Ensure that all team members understand the overall goals of the project from the beginning

• Ensure that all relevant team members are involved in developing the game flow concepts and API

• Stay involved in all decision-making – don’t trust that the developers/contractors will make all the right decisions

• Test, test, test!!

once corrected…

JSON transcripts will be stored on AAPB’s Amazon S3 account

Transcripts will be indexed for keyword searching on the AAPB website

Transcripts will be made available alongside the media on the record page

Transcripts can play as captions within the player

Transcripts can be harvested via an API and used as a dataset for research such as a digital humanities project
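
To illustrate how a corrected JSON transcript could be played as captions, the sketch below converts one into WebVTT, a caption format supported by HTML5 video players. The JSON layout is an assumption for illustration (a `parts` list whose items carry `start_time` and `end_time` in seconds plus `text`); the actual AAPB transcript schema may differ.

```python
import json


def to_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    millis = round(seconds * 1000)
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcript_to_webvtt(transcript_json):
    """Build WebVTT text from a JSON transcript in the assumed layout."""
    data = json.loads(transcript_json)
    lines = ["WEBVTT", ""]  # required header, then a blank line
    for part in data["parts"]:  # "parts" is a hypothetical field name
        start = to_timestamp(part["start_time"])
        end = to_timestamp(part["end_time"])
        lines.append(f"{start} --> {end}")
        lines.append(part["text"])
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)
```

The same parsed `parts` could also feed a keyword index or a research dataset, since each piece of text keeps its time offsets.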

usability & ux research questions

Do users understand the workflow of the game?

Do users understand the iconography?

How do users feel about interacting with random transcripts rather than choosing a specific transcript to work on?

How do users feel about interacting with small bits of transcripts rather than a full transcript at once?

What is the overall user experience when playing the game?

What is the overall satisfaction level in playing the game?

future plans

facebook.com/amarchivepub

@amarchivepub

americanarchive.org

http://fixit.americanarchive.org

#FixItAAPB

Come to our editathon!

Friday, 5:45 – 6:45 pm

Room: Arcadian I

Treats and prizes!