Archives, algorithms and people

Tristan Ferne / @tristanf

Executive Producer

BBC Research & Development

Archives, algorithms and people

or

How we put the BBC World Service radio archive online using machines and crowdsourcing

The BBC World Service archive

1947-2012

Spelling mistake

Missing data

Sometimes incorrect data

No semantic data

The missing metadata

How it works

Listening machines

Noisy transcripts

Algorithms

Algorithms and people

The prototype

worldservice.prototyping.bbc.co.uk



Show Synopsis editing version




Machine learning

Results

70,000 tag edits

How much data?

1,000 synopsis edits

71,000 edits

36,000 listenable programmes

1m machine tags

70,000 programmes

3,000 users

36% of programmes listened to

21% of programmes tagged

And four lost programmes

Tags are a large and sparse space

When is a tag correct?

When is a programme tagged completely?

How do you measure crowd-sourced data?

How good is the data?

Who does the work?

1 person = 30% of edits

10 people = 70% of edits

10% of people = 98% of edits
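A minimal sketch of how concentration figures like these could be computed from per-user edit counts. The function names and the sample counts are hypothetical and chosen for illustration; they are not the real archive data, and the slides do not say how the original figures were derived.

```python
import math


def top_k_share(edit_counts, k):
    """Fraction of all edits made by the k most active contributors."""
    total = sum(edit_counts)
    if total == 0:
        return 0.0
    ranked = sorted(edit_counts, reverse=True)
    return sum(ranked[:k]) / total


def top_fraction_share(edit_counts, fraction):
    """Fraction of all edits made by the most active `fraction` of users."""
    # Round up so that at least one user is always counted.
    k = max(1, math.ceil(len(edit_counts) * fraction))
    return top_k_share(edit_counts, k)


# Hypothetical per-user edit counts (assumption, not BBC data).
counts = [30, 10, 10, 10, 10, 10, 5, 5, 5, 5]

share_top_1 = top_k_share(counts, 1)
share_top_3 = top_k_share(counts, 3)
share_top_10_percent = top_fraction_share(counts, 0.10)
```

Sorting once and summing the head of the list is enough at this scale; with a few thousand users there is no need for anything more elaborate.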

The shape of the archive

Places mentioned

Linking from the News

The Last Danish Christmas Broadcast

“Entirely in Danish”

We can significantly improve the data

It’s cost-effective with re-usable technology

A crowdsourcing approach

What we’ve learnt

How good are the machine tags?

How much crowdsourcing do you need?

When is your data good enough?

Open questions

worldservice.prototyping.bbc.co.uk

www.bbc.co.uk/rd

github.com/bbcrd

[email protected]

@tristanf
