Let the Public and the Computer do the Metadata Work!

Page 2: Let the Public and the Computer do the Metadata Work!

Karen Cariani

AAPB Project Director, WGBH

Senior Director, WGBH Media Library & Archives

Let the Computer, and the Public, do the Metadata Work!

Page 4: Let the Public and the Computer do the Metadata Work!

The Library of Congress

Packard Campus for Audio Visual Conservation

American Archive of Public Broadcasting

Page 5: Let the Public and the Computer do the Metadata Work!

WGBH Educational Foundation

American Archive of Public Broadcasting

Page 8: Let the Public and the Computer do the Metadata Work!

the situation

72,000 digitized television and radio programs

incomplete, inaccurate metadata records

limited staff resources

we need to know what we have in the collection

we have a responsibility to users to provide access to the collection

continued growth of the collection (content and sparse metadata)

Page 9: Let the Public and the Computer do the Metadata Work!

the potential: transforming content into data

• Computational Tools

• Speech-to-text

• Audio analysis

• Image Analysis

• Visualization of Data

How can we use them?

Page 10: Let the Public and the Computer do the Metadata Work!

a crowdsourcing game

fixit.americanarchive.org

Casey Davis Kaufman

Associate Director, WGBH Media Library and Archives

Project Manager, AAPB

Page 11: Let the Public and the Computer do the Metadata Work!

AV crowdsourcing precedents

TiltFactor @ Dartmouth: “Metadata Games”

New York Public Library’s Together We Listen project & Transcript Editing Tool

Netherlands Institute for Sound and Vision

Page 13: Let the Public and the Computer do the Metadata Work!

user population

General public

Public media fans

K-12 students

Senior citizens

People seeking to develop editing skills

People seeking volunteer opportunities

Page 14: Let the Public and the Computer do the Metadata Work!

game pipeline

1. Identify errors

2. Suggest corrections

3. Validate corrections
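
The three steps above can be pictured as a small state machine that each transcript phrase moves through. The following is only a minimal sketch, not the game's actual code: the `Phrase` class, its method names, and the `FLAGS_NEEDED` and `VOTES_NEEDED` thresholds are hypothetical, since the slides do not say how many players must agree at each step.

```python
from collections import Counter
from dataclasses import dataclass, field

# Hypothetical thresholds; the slides do not state the real values.
FLAGS_NEEDED = 2   # players who must mark a phrase as containing an error
VOTES_NEEDED = 2   # players who must agree on one correction


@dataclass
class Phrase:
    """One short piece of a machine-generated transcript."""
    text: str
    flags: int = 0
    suggestions: Counter = field(default_factory=Counter)
    votes: Counter = field(default_factory=Counter)

    def stage(self) -> str:
        """Return the pipeline step this phrase is currently waiting on."""
        if self.flags < FLAGS_NEEDED:
            return "identify"
        if not self.suggestions:
            return "suggest"
        if max(self.votes.values(), default=0) < VOTES_NEEDED:
            return "validate"
        return "done"

    def flag_error(self) -> None:
        """Step 1: a player marks the phrase as containing an error."""
        self.flags += 1

    def suggest(self, correction: str) -> None:
        """Step 2: a player proposes corrected text."""
        self.suggestions[correction] += 1

    def validate(self, correction: str) -> None:
        """Step 3: a player votes for a previously suggested correction."""
        if correction not in self.suggestions:
            raise ValueError("can only validate a suggested correction")
        self.votes[correction] += 1

    def accepted_text(self) -> str:
        """Return the winning correction, or the original text if unfinished."""
        if self.stage() != "done":
            return self.text
        return self.votes.most_common(1)[0][0]
```

In this sketch a phrase that nobody flags simply stays in the "identify" stage; a real system would also need a way to retire phrases that players judge to be correct.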

Page 26: Let the Public and the Computer do the Metadata Work!

game improvement targets

Change algorithm and game pipeline to get transcripts through the game quicker

Update Rules page to allow more leniency in corrections. Communicate that we’re looking for acceptable corrections, not perfection.

Add ability for AAPB staff to prioritize transcripts in the game

Remove the preferences feature

Update API to help AAPB staff determine more easily which transcripts are ready to come out of the game.

Page 27: Let the Public and the Computer do the Metadata Work!

lessons learned

• Ensure that all team members understand the overall goals of the project from the beginning

• Ensure that all relevant team members are involved in developing the game flow concepts and API

• Stay involved in all decision-making – don’t trust that the developers/contractors will make all the right decisions

• Test, test, test!!

Page 28: Let the Public and the Computer do the Metadata Work!

once corrected…

JSON transcripts will be stored on AAPB’s Amazon S3 account

Transcripts will be indexed for keyword searching on the AAPB website

Transcripts will be made available alongside the media on the record page

Transcripts can play as captions within the player

Transcripts can be harvested via an API and used as a dataset for research such as a digital humanities project
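
As an illustration of how a corrected JSON transcript could be turned into captions for the player, here is a minimal sketch that converts timed transcript parts into WebVTT. The JSON layout (a `parts` list whose items have `start`, `end`, and `text` keys, with times in seconds) is an assumption made for this example; the slides do not describe the real AAPB transcript schema, and the function names are hypothetical.

```python
import json


def to_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (hh:mm:ss.ttt)."""
    ms = round(seconds * 1000)
    hours, ms = divmod(ms, 3_600_000)
    minutes, ms = divmod(ms, 60_000)
    secs, ms = divmod(ms, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def transcript_to_webvtt(transcript_json: str) -> str:
    """Convert a JSON transcript (assumed schema, see above) to WebVTT text."""
    data = json.loads(transcript_json)
    lines = ["WEBVTT", ""]  # required header followed by a blank line
    for part in data["parts"]:
        lines.append(f"{to_timestamp(part['start'])} --> {to_timestamp(part['end'])}")
        lines.append(part["text"].strip())
        lines.append("")  # a blank line ends each cue
    return "\n".join(lines)


# Example with made-up transcript content.
sample = json.dumps({
    "parts": [
        {"start": 0.0, "end": 2.5, "text": "Good evening."},
        {"start": 2.5, "end": 6.0, "text": "This is public radio."},
    ]
})
print(transcript_to_webvtt(sample))
```

The same list of timed parts could also feed a keyword index, since each part already pairs a piece of text with the moment in the recording where it is spoken.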

Page 29: Let the Public and the Computer do the Metadata Work!

usability & ux research questions

Do users understand the workflow of the game?

Do users understand the iconography?

How do users feel about interacting with random transcripts rather than choosing a specific transcript to work on?

How do users feel about interacting with small bits of transcripts rather than a full transcript at once?

What is the overall user experience when playing the game?

What is the overall satisfaction level in playing the game?

Page 30: Let the Public and the Computer do the Metadata Work!

future plans

Page 33: Let the Public and the Computer do the Metadata Work!

facebook.com/amarchivepub

@amarchivepub

americanarchive.org

http://fixit.americanarchive.org

#FixItAAPB

Come to our editathon!

Friday, 5:45 – 6:45 pm

Room: Arcadian I

Treats and prizes!