Bbc semantic

www.sti-innsbruck.at © Copyright 2008 STI INNSBRUCK. Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections. Georgi Kobilarov et al., ESWC 2009


Page 1: Bbc semantic

www.sti-innsbruck.at © Copyright 2008 STI INNSBRUCK

Media Meets Semantic Web – How the BBC Uses DBpedia and Linked Data to Make Connections

Georgi Kobilarov et al., ESWC 2009

Page 2: Bbc semantic


• The BBC is working to integrate data and link documents across BBC domains

• Collaboration with Freie Universität Berlin, Rattle Research (and Ontotext)

• Semantic Web context: usage of Linked Data from MusicBrainz and DBpedia


Page 3: Bbc semantic


Problem

• The BBC publishes large amounts of online content (text, video, audio)

• Mostly data for broadcast brands and domain specific microsites

• Division of its services by domain, e.g. food, music, news etc.

No interlinking between these domain specific sites – not using the full potential of available data


Page 4: Bbc semantic


Objectives

• DBpedia to provide a common “controlled” vocabulary and equivalency service, which in turn is used to add “topic badges” to existing, legacy web pages

• Soft transition of the old to the new system

– Developing a new service that supports the branding of our Radio stations, TV channels and programmes (bbc.co.uk/programmes)

– Developing a new music offering (bbc.co.uk/music/beta) that builds on existing open web standards and is fully integrated with the programme support service

– Simple navigational elements (i.e. Topic Badges and term extraction) to support contextual, semantic navigation

– Common set of web scale identifiers to help classify all BBC online content (and external URLs) and to help create equivalency between multiple vocabularies


Page 5: Bbc semantic


Cross-Linking Legacy Content with Legacy Systems

• Desire to link to further BBC domains (apart from programmes and music)

– Through an about-relationship between programmes, people, places and subjects

• Data was created with a legacy auto-categorization system called CIS.

• CIS holds a hierarchy of terms in five main top-level classes:

– Proper names

– Subjects

– Brands

– Time periods

– Places

Objects identified in /programmes and /music also appear in other domains, so a mechanism is needed to map between equivalent terms

Linking CIS Concepts to DBpedia


Page 6: Bbc semantic


Linking BBC Domains


Page 7: Bbc semantic


Linking BBC Domains

• Weighted DBpedia label lookup, using Wikipedia inter-article links as the weight indicator

– links(redirect)*log2(weight(article))

• Context-Based Disambiguation

– Disambiguate candidate concept matches by clustering CIS terms into similarity contexts and finding the corresponding contexts in DBpedia
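The two steps on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BBC or DBpedia implementation: the slide does not define `links(redirect)` or `weight(article)`, so the sketch assumes the first is the number of inter-article links that use the matched label or redirect and the second is the total number of links pointing at the article. The `Candidate` type, the `context` field and all numbers are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    uri: str             # DBpedia resource URI
    label_links: int     # assumed reading of links(redirect)
    article_links: int   # assumed reading of weight(article)
    context: frozenset   # hypothetical context terms, e.g. categories


def label_score(c: Candidate) -> float:
    """Score shaped like the slide's links(redirect) * log2(weight(article))."""
    if c.article_links < 1:
        return 0.0  # log2 is undefined for 0; treat unlinked articles as score 0
    return c.label_links * math.log2(c.article_links)


def rank_by_label(candidates):
    """Order the candidates for one label, highest score first."""
    return sorted(candidates, key=label_score, reverse=True)


def disambiguate(candidates, cis_context):
    """Prefer the candidate sharing the most terms with the CIS cluster context;
    the label score only breaks ties."""
    return max(
        candidates,
        key=lambda c: (len(c.context & cis_context), label_score(c)),
    )


# Made-up numbers for the ambiguous label "Jaguar".
candidates = [
    Candidate("http://dbpedia.org/resource/Jaguar", 40, 1024,
              frozenset({"Felines"})),
    Candidate("http://dbpedia.org/resource/Jaguar_Cars", 30, 512,
              frozenset({"Car manufacturers"})),
]
top_by_label = rank_by_label(candidates)[0]
best_in_context = disambiguate(
    candidates, frozenset({"Car manufacturers", "Companies"})
)
```

With these invented numbers the label lookup alone favours the animal, while a CIS cluster about car makers selects the car brand, which is the kind of correction the context step is meant to provide.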


Page 8: Bbc semantic


Linking Documents to Concepts

• Named entity extraction system Muddy Boots

– Chosen over solutions from OpenCalais, Twine and Zemanta because it reuses existing web identifiers, i.e. Wikipedia/DBpedia URIs

• Recognize entities in BBC News articles

• Use DBpedia identifiers for those entities

• Content Link Tool to add or remove DBpedia identifiers from any given BBC URL
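The role of the Content Link Tool, attaching DBpedia identifiers to a BBC URL and removing them again, can be pictured as a small lookup structure. The sketch below is a hypothetical in-memory stand-in, not the actual tool: the class name, its methods and the article URL are all assumptions made for illustration.

```python
from collections import defaultdict


class ContentLinkStore:
    """Hypothetical in-memory map from BBC URLs to DBpedia identifiers."""

    def __init__(self):
        self._concepts_by_url = defaultdict(set)

    def add(self, url, dbpedia_uri):
        self._concepts_by_url[url].add(dbpedia_uri)

    def remove(self, url, dbpedia_uri):
        # discard() silently ignores identifiers that were never attached
        self._concepts_by_url[url].discard(dbpedia_uri)

    def concepts_for(self, url):
        """DBpedia identifiers attached to one URL, sorted for stable output."""
        return sorted(self._concepts_by_url.get(url, ()))

    def urls_about(self, dbpedia_uri):
        """All URLs currently linked to the given concept."""
        return sorted(
            url for url, uris in self._concepts_by_url.items()
            if dbpedia_uri in uris
        )


store = ContentLinkStore()
article = "http://news.bbc.co.uk/example-article"  # placeholder URL
store.add(article, "http://dbpedia.org/resource/London")
store.add(article, "http://dbpedia.org/resource/Paris")     # e.g. a wrong match
store.remove(article, "http://dbpedia.org/resource/Paris")  # editor corrects it
```

Keeping the links keyed by URL makes manual correction cheap, and the reverse lookup in `urls_about` is what later aggregation by concept would rely on.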


Page 9: Bbc semantic


Create User Journeys: Topic Pages and Navigation Badges

• Topic pages

– Creation of aggregation pages of unstructured and structured content

– Pull together the modeled world of BBC programmes (CIS identifiers mapped to DBpedia) and the unstructured world of BBC News articles

• Navigational Badges

– Once a user has entered an area of BBC content there are few links through to other related content

– Providing this link is the role of the navigation badge
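A topic page, as described above, joins two identifier spaces on a shared DBpedia URI: programmes reached through CIS identifiers, and news articles tagged directly with DBpedia identifiers. The following sketch shows one way such an aggregation could look; the function, the `cis:` identifiers and every URL are hypothetical placeholders rather than real BBC data.

```python
def build_topic_page(dbpedia_uri, cis_to_dbpedia, programmes_by_cis,
                     articles_by_concept):
    """Collect programmes (via CIS -> DBpedia mappings) and news articles
    for one concept."""
    programmes = sorted(
        programme
        for cis_id, uri in cis_to_dbpedia.items()
        if uri == dbpedia_uri
        for programme in programmes_by_cis.get(cis_id, [])
    )
    articles = sorted(articles_by_concept.get(dbpedia_uri, []))
    return {"topic": dbpedia_uri, "programmes": programmes, "articles": articles}


# Placeholder data: two CIS terms map to the same DBpedia concept.
cis_to_dbpedia = {
    "cis:0001": "http://dbpedia.org/resource/London",
    "cis:0002": "http://dbpedia.org/resource/London",
    "cis:0003": "http://dbpedia.org/resource/Paris",
}
programmes_by_cis = {
    "cis:0001": ["http://www.bbc.co.uk/programmes/example-a"],
    "cis:0002": ["http://www.bbc.co.uk/programmes/example-b"],
    "cis:0003": ["http://www.bbc.co.uk/programmes/example-c"],
}
articles_by_concept = {
    "http://dbpedia.org/resource/London": ["http://news.bbc.co.uk/example-article"],
}

page = build_topic_page(
    "http://dbpedia.org/resource/London",
    cis_to_dbpedia, programmes_by_cis, articles_by_concept,
)
```

A navigation badge could then be as simple as a link from any page tagged with the concept to the topic page built for it.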


Page 10: Bbc semantic


Conclusions

• User experience at the center of BBC efforts

• Semantics as enabler

• What we can learn from the BBC

– Users should be at the center of efforts

– Pages are not strictly structured according to the domain model

– Semantics primarily enable smart interlinking to additional content

– Well-hidden magic

– Simplicity of domain models is beauty

• For more information, refer to the presentation “Beyond the polar bear”

– http://www.slideshare.net/reduxd/beyond-the-polar-bear
