Archives, algorithms and people
Tristan Ferne / @tristanf
Executive Producer
BBC Research & Development
Archives, algorithms and people
or
How we put the BBC World Service radio archive online using machines and crowdsourcing
The BBC World Service archive
1947-2012
Spelling mistake
Missing data
Sometimes incorrect data
No semantic data
The missing metadata
How it works
Listening machines
Noisy transcripts
Algorithms
Algorithms and people
The prototype
worldservice.prototyping.bbc.co.uk
Show Synopsis editing version
Machine learning
Results
How much data?
70,000 programmes
36,000 listenable programmes
1m machine tags
3,000 users
71,000 edits
70,000 tag edits
1,000 synopsis edits
36% of programmes listened to
21% of programmes tagged
And four lost programmes
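A quick arithmetic check on the headline figures above. This is only a sketch: the slide numbers are presumably rounded, so the derived averages are rough, and the variable names are made up for illustration.

```python
# Headline figures copied from the slides (presumably rounded).
tag_edits = 70_000
synopsis_edits = 1_000
total_edits = 71_000
users = 3_000
machine_tags = 1_000_000
programmes = 70_000

# The two kinds of edit should add up to the stated total.
edits_add_up = (tag_edits + synopsis_edits == total_edits)

# Rough averages derived from the rounded figures.
edits_per_user = total_edits / users
machine_tags_per_programme = machine_tags / programmes

print(f"Edit counts add up: {edits_add_up}")
print(f"About {edits_per_user:.1f} edits per user on average")
print(f"About {machine_tags_per_programme:.1f} machine tags per programme on average")
```

Averages like these hide how unevenly the work is spread across users.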
Tags are a large and sparse space
When is a tag correct?
When is a programme tagged completely?
How do you measure crowd-sourced data?
How good is the data?
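One possible way to make "When is a tag correct?" concrete is to count agreeing and disagreeing edits per tag and accept a tag once enough people agree. The sketch below is an assumption for illustration, not the rule used by the prototype: the function names, the +1/-1 vote encoding, and the thresholds are all hypothetical.

```python
from collections import defaultdict


def summarise_votes(votes):
    """Count up and down votes per (programme_id, tag).

    `votes` is an iterable of (programme_id, tag, vote) tuples where
    vote is +1 (agree) or -1 (disagree). This encoding is hypothetical.
    """
    totals = defaultdict(lambda: [0, 0])  # key -> [up, down]
    for programme_id, tag, vote in votes:
        if vote > 0:
            totals[(programme_id, tag)][0] += 1
        else:
            totals[(programme_id, tag)][1] += 1
    return {key: tuple(counts) for key, counts in totals.items()}


def is_accepted(up, down, min_votes=2, min_agreement=0.75):
    """Accept a tag when enough people voted and most of them agreed.

    Both thresholds are illustrative guesses, not measured values.
    """
    total = up + down
    return total >= min_votes and up / total >= min_agreement


# Invented example data.
example_votes = [
    ("p1", "Denmark", +1),
    ("p1", "Denmark", +1),
    ("p1", "Christmas", +1),
    ("p1", "Christmas", -1),
    ("p2", "Radio", +1),
]
summary = summarise_votes(example_votes)
accepted = {key for key, (up, down) in summary.items() if is_accepted(up, down)}
print(accepted)
```

A rule like this answers "is this tag correct?" but not "is this programme tagged completely?", since a missing tag never receives any votes.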
Who does the work?
1 person = 30% of edits
10 people = 70% of edits
10% of people = 98% of edits
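Figures like these can be computed from a list of per-user edit counts by sorting it and taking the share held by the top contributors. A minimal sketch follows; the function name and the example counts are invented for illustration and are not the project's data.

```python
def top_share(edit_counts, top_n):
    """Return the fraction of all edits made by the top_n most active users."""
    ordered = sorted(edit_counts, reverse=True)
    total = sum(ordered)
    if total == 0:
        return 0.0
    return sum(ordered[:top_n]) / total


# Invented example: ten heavy contributors plus a long tail of
# 300 users who each made a single edit (1,000 edits in total).
counts = [300, 120, 80, 60, 50, 40, 20, 15, 10, 5] + [1] * 300

print(f"Top 1 user:   {top_share(counts, 1):.0%} of edits")
print(f"Top 10 users: {top_share(counts, 10):.0%} of edits")
```

With a long-tailed distribution like this, a handful of users account for most of the edits even though most users contribute something.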
The shape of the archive
Places mentioned
Linking from the News
The Last Danish Christmas Broadcast
“Entirely in Danish”
We can significantly improve the data
It’s cost-effective with re-usable technology
A crowdsourcing approach
What we’ve learnt
How good are the machine tags?
How much crowdsourcing do you need?
When is your data good enough?
Open questions
worldservice.prototyping.bbc.co.uk
www.bbc.co.uk/rd
github.com/bbrd
[email protected]
@tristanf