Schema.org and One Hundred Years of Search

'Schema.org and One Hundred Years of Search' Libraries, Media and The Semantic Web BBC Academy, March 28th 2012, London Dan Brickley <[email protected] > Friday, March 30, 2012

description

A talk from the London SemWeb meetup hosted by the BBC Academy in London, Mar 30 2012.

Video of the talk: http://www.youtube.com/watch?v=_-6mhdjE1XE

See also:

http://www.meetup.com/LondonSWGroup/events/56987682/

http://openglam.org/2012/03/29/libraries-media-and-the-semantic-web-event-at-the-bbc/

https://twitter.com/#!/kansandhaus/status/185064835694862337

Transcript of Schema.org and One Hundred Years of Search

Page 1: Schema.org and One Hundred Years of Search

'Schema.org and One Hundred Years of Search'

Libraries, Media and The Semantic Web

BBC Academy, March 28th 2012, London

Dan Brickley <[email protected]>

Friday, March 30, 2012

Page 2: Schema.org and One Hundred Years of Search

In 20 minutes

• Introduce you to the schema.org initiative

• Revisit 'the Web before the Web' of 1912

• Use this to describe what's new with schema.org, ... and the practical choices we face when scaling to billions of users and pages


Page 3: Schema.org and One Hundred Years of Search

Intro: Dan Brickley

• Ex-W3C, helped start Semantic Web project

• Worked on RDF/S, FOAF, SKOS & other standards around W3C

• Currently [email protected] working on <http://schema.org/> project

• See also <http://danbri.org/>, @danbri


Page 4: Schema.org and One Hundred Years of Search

Back to 1912


Page 6: Schema.org and One Hundred Years of Search

■ The Republic of China is proclaimed.

■ Albert Berry makes the first parachute jump from a moving airplane.

■ Prague Party Conference: Vladimir Lenin and the Bolshevik Party break away from the rest of the Russian Social Democratic Labour Party.

■ France establishes a protectorate over Morocco.

■ RMS Titanic strikes an iceberg in the northern Atlantic Ocean.

■ Paramount Pictures, the oldest American motion picture studio still in operation, is founded.

■ Albania declares independence from the Ottoman Empire.

■ First Balkan War

■ Alan Turing, British mathematician, is born.

■ Semantic search over structured data goes mainstream, in Belgium.

source: http://en.wikipedia.org/wiki/1912


Page 7: Schema.org and One Hundred Years of Search

Credit and thanks: W. Boyd Rayward

Page 8: Schema.org and One Hundred Years of Search

Diesel engine. Philosophy of mathematics. Fisheries in Morocco and on the coast of Spain. Bulgarian finances. Gyroscope. Fire worship. Motor cultivation (garden). Evolution of the human tooth. Italian emigration. Civil register. Baghdad railway (railroad...). The planet Mars. Universal suffrage. Traumatic neurosis. Eugenics. Salmon; salmon missed and recaught. Boomerang. Manufacture of cyanamide. Emigration of the Jews. Tobacco poisoning. Quantity of olive oil imported into Belgium. Case law on insurance companies in England, Holland and Denmark...

Sample queries from 1912


Page 10: Schema.org and One Hundred Years of Search

Search before search

• Paul Otlet, "the man who dreamed the Internet", http://www.youtube.com/watch?v=fmsOI5SdLkE

• "The International Centre organises collections of world-wide importance. These collections are the International Museum, the International Library, the International Bibliographic Catalogue and the Universal Documentary Archives. These collections are conceived as parts of one universal body of documentation, as an encyclopedic survey of human knowledge, as an enormous intellectual warehouse of books, documents, catalogues and scientific objects."

• Start at http://en.wikipedia.org/wiki/Mundaneum for the whole story


Page 11: Schema.org and One Hundred Years of Search

Libraries, media & ...?

• Universal Decimal Classification (UDC) used in many 1000s of libraries today

• Used in the BBC archive for 40 years, as 'Lonclass'

• Shows the challenge and promise of structured description

• So what's in Lonclass? What's not in Lonclass!


Page 21: Schema.org and One Hundred Years of Search

Lonclass by example

• R672:32.007(47)YELTSIN:342.518.1THATCHER “TWO SHOTS OF MARGARET THATCHER AND BORIS YELTSIN”

• [BRITISH AEROSPACE].007.11PEARCE: 656.881:342.518.1THATCHER “LETTER TO MRS THATCHER FROM SIR AUSTIN PEARCE”

• 656.881:301.162.721:32.007THATCHER: 654.192.731TV-AM “MARGARET THATCHER'S LETTER OF APOLOGY TO TV AM”


Page 22: Schema.org and One Hundred Years of Search

Compositional Semantics

• 656.881:301.162.721 “LETTERS OF APOLOGY”

• 656.881 “LETTERS (POSTAL SERVICES)”

• 656.881:06.022.6 “RESIGNATION LETTERS”

• 654.192.731TV-AM “TV AM (TELEVISION AM)”

(this work pre-dated modern linguistics, never mind computing...)
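The composition shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the label table holds just the four entries quoted above, the colon is treated as the sign that joins components (as it appears to in these examples), and the function names `split_facets` and `describe` are hypothetical, not part of any Lonclass or UDC tooling.

```python
# Labels copied from the slide; a real Lonclass/UDC table would be far larger.
LABELS = {
    "656.881": "LETTERS (POSTAL SERVICES)",
    "656.881:301.162.721": "LETTERS OF APOLOGY",
    "656.881:06.022.6": "RESIGNATION LETTERS",
    "654.192.731TV-AM": "TV AM (TELEVISION AM)",
}


def split_facets(code):
    """Split a composite code on ':' into its components, dropping stray spaces."""
    return [part.strip() for part in code.split(":") if part.strip()]


def describe(code):
    """Label a composite code, preferring the longest known run of components.

    Returns a list of (code_fragment, label_or_None) pairs.
    """
    facets = split_facets(code)
    result = []
    i = 0
    while i < len(facets):
        # Try the longest candidate starting at position i first.
        for j in range(len(facets), i, -1):
            key = ":".join(facets[i:j])
            if key in LABELS:
                result.append((key, LABELS[key]))
                i = j
                break
        else:
            # No known label starts here: keep the raw component.
            result.append((facets[i], None))
            i += 1
    return result


if __name__ == "__main__":
    for fragment, label in describe(
        "656.881:301.162.721:32.007THATCHER: 654.192.731TV-AM"
    ):
        print(fragment, "->", label)
```

The longest-match lookup is a design choice for the sketch: it lets "LETTERS OF APOLOGY" win over the more general "LETTERS (POSTAL SERVICES)" when both components are present, which mirrors how the composite codes read like short sentences.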


Page 23: Schema.org and One Hundred Years of Search

Archives and classification

• Lonclass tells a story of the world; of this country at least; and a lot about the rest

• It is huge - 1000s of terms, composite sentence-like codes, and rather sparse

• It began with UDC in the 1890s, and remains key to the BBC's media archives even today


Page 25: Schema.org and One Hundred Years of Search

And now for something new.


Page 26: Schema.org and One Hundred Years of Search

Schema.org

• Search engine collaboration:

• Google, Bing, Yahoo! & Yandex

• Simple factual data for better search

• Launched June 2011, schema.org schema

• 300 classes, 261 properties & growing

• discussions: W3C WebSchemas group


Page 27: Schema.org and One Hundred Years of Search

Example: Google Rich Snippets

See also Yandex's http://webmaster.yandex.ru/microtest.xml

From: http://www.google.com/webmasters/tools/richsnippets


Page 28: Schema.org and One Hundred Years of Search

On IMDB:

<div id="content-2-wide" itemscope itemtype="http://schema.org/CreativeWork">

<div class="star-box" itemprop="aggregateRating" itemscope itemtype="http://schema.org/AggregateRating">

<div class="txt-block">
  <h4 class="inline">Stars:</h4>
  <a onclick="(new Image()).src='/rg/title-overview/star-1/images/b.gif?link=%2Fname%2Fnm0010930%2F';" href="/name/nm0010930/" itemprop="actors">Douglas Adams</a>,
  <a onclick="(new Image()).src='/rg/title-overview/star-2/images/b.gif?link=%2Fname%2Fnm0048982%2F';" href="/name/nm0048982/" itemprop="actors">Tom Baker</a> and
  <a onclick="(new Image()).src='/rg/title-overview/star-3/images/b.gif?link=%2Fname%2Fnm3035100%2F';" href="/name/nm3035100/" itemprop="actors">Hans Peter Brondmo</a>
</div>

Linked Data: see http://www.imdb.com/name/nm0010930/ for schema.org markup describing Douglas Adams as a http://schema.org/Person (jobTitle, birthDate, description, performerIn, ...).


Page 29: Schema.org and One Hundred Years of Search

What’s in the schema?

• Classes (types) e.g. LocalBusiness, Person, Organization, VideoObject, TVSeries...

• Properties (attributes) e.g. openingHours, transcript, productionCompany, streetAddress

• That’s all - a dictionary of terms, used for annotating data within normal Web pages


Page 30: Schema.org and One Hundred Years of Search

event

place

intangible

LocalBusiness

Organization

CivicStructure

CreativeWork

Landform

UserInteraction


Page 31: Schema.org and One Hundred Years of Search

Another example:


Page 32: Schema.org and One Hundred Years of Search

<div itemscope itemtype="http://schema.org/Restaurant">

  <span itemprop="name">GreatFood</span>

  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">1901 Lemur Ave</span>
    <span itemprop="addressLocality">Sunnyvale</span>,
    <span itemprop="addressRegion">CA</span>
    <span itemprop="postalCode">94086</span>
  </div>
  <span itemprop="telephone">(408) 714-1489</span>
  <a itemprop="url" href="http://www.dishdash.com">www.greatfood.com</a>

  Hours:
  <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am-2:30pm
  <meta itemprop="openingHours" content="Mo-Th 17:00-21:30">Mon-Thu 5pm-9:30pm
  <meta itemprop="openingHours" content="Fr-Sa 17:00-22:00">Fri-Sat 5pm-10:00pm

  Categories:
  <span itemprop="servesCuisine">Middle Eastern</span>,
  <span itemprop="servesCuisine">Mediterranean</span>

</div>
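To give a feel for what a consumer can read out of markup like this, here is a minimal sketch using Python's standard `html.parser`. It is an assumption-laden illustration, not how any search engine processes pages: the class name `MicrodataSketch`, the helper `extract`, and the shortened `EXAMPLE` string are all hypothetical, and the code ignores most of the microdata rules (`itemref`, `itemid`, `time` elements, and so on).

```python
from html.parser import HTMLParser

VOID = {"meta", "link", "img", "br", "hr", "input"}  # elements with no end tag


class MicrodataSketch(HTMLParser):
    """Collects itemscope/itemprop data into nested dicts (simplified)."""

    def __init__(self):
        super().__init__()
        self.items = []   # top-level items found in the page
        self.stack = []   # one frame per open (non-void) element

    def _owner(self):
        # The nearest enclosing element that started an item, if any.
        for frame in reversed(self.stack):
            if frame["item"] is not None:
                return frame["item"]
        return None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        owner = self._owner()
        frame = {"tag": tag, "item": None, "prop": None, "owner": owner, "text": []}
        name = a.get("itemprop")
        if "itemscope" in a:
            item = {"type": a.get("itemtype"), "props": {}}
            frame["item"] = item
            if name and owner is not None:
                owner["props"].setdefault(name, []).append(item)  # nested item
            else:
                self.items.append(item)
        elif name and owner is not None:
            if tag == "meta":
                owner["props"].setdefault(name, []).append(a.get("content", ""))
            elif tag in ("a", "link"):
                owner["props"].setdefault(name, []).append(a.get("href", ""))
            else:
                frame["prop"] = name  # value is the element's text content
        if tag not in VOID:
            self.stack.append(frame)

    def handle_data(self, data):
        for frame in self.stack:
            if frame["prop"] is not None:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if tag in VOID or not any(f["tag"] == tag for f in self.stack):
            return
        while self.stack:
            frame = self.stack.pop()
            if frame["prop"] is not None:
                value = "".join(frame["text"]).strip()
                frame["owner"]["props"].setdefault(frame["prop"], []).append(value)
            if frame["tag"] == tag:
                break


def extract(html):
    parser = MicrodataSketch()
    parser.feed(html)
    parser.close()
    return parser.items


# Shortened copy of the restaurant markup, for illustration only.
EXAMPLE = """
<div itemscope itemtype="http://schema.org/Restaurant">
  <span itemprop="name">GreatFood</span>
  <div itemprop="address" itemscope itemtype="http://schema.org/PostalAddress">
    <span itemprop="streetAddress">1901 Lemur Ave</span>
    <span itemprop="addressLocality">Sunnyvale</span>
  </div>
  <meta itemprop="openingHours" content="Mo-Sa 11:00-14:30">Mon-Sat 11am-2:30pm
  <span itemprop="servesCuisine">Middle Eastern</span>,
  <span itemprop="servesCuisine">Mediterranean</span>
</div>
"""

if __name__ == "__main__":
    print(extract(EXAMPLE))
```

Every property is stored as a list because a property such as servesCuisine or openingHours can appear more than once on the same item.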


Page 33: Schema.org and One Hundred Years of Search

Schema.org scope

• In-page structured data for search

• Not asking an unconstrained “so, how do we describe cars?”, but “how can we improve markup on existing pages that describe cars?” (or Comics, SoftwareApps, Sports, ...)

• Simplify publisher/webmaster experience

• Record agreements between search engines

• Central use case: augmented search results


Page 35: Schema.org and One Hundred Years of Search

Schema.org and UDC

• In many ways the opposite of UDC

• Small (by contrast), pragmatic, Web-based

• Yet by Semantic Web standards and culture, it is a big 'centralised' schema

• The art is finding ways to decentralise without creating chaos

• We don't want to re-invent UDC, or Wikipedia; but integrate such things into simple descriptive templates for search


Page 36: Schema.org and One Hundred Years of Search

Lots missing! e.g. sports

• Current vocabulary emphasizes 'points of interest' on a map and sporting activities rather than sports content 'as entertainment'

• We also have terms to describe videos, TV shows etc., ...but no sports-specifics yet

• How deep to go? How to integrate with existing vocabulary? How to identify players, teams, kinds of 'football'? Video clips for that 'hand of God' goal?


Page 37: Schema.org and One Hundred Years of Search

Job postings (done), rNews (done), Comics, Learning, ScholarlyArticle, Software, Events, Genealogy, Real Estate, eCommerce, Health, Sports, Transport, Vehicles, Comments, Datasets, Bio, ... (+bugfixes, integration, ...)

http://www.w3.org/wiki/WebSchemas/SchemaDotOrgProposals


Page 38: Schema.org and One Hundred Years of Search

Everything overlaps

• We added JobPosting; what if the job was sports-related?

• We're adding educational markup; does it help describe sports education, training?

• Is there a sports perspective on the health/medical vocabulary we're working on?

• Can't coordinate everything! Pragmatism...

* 'intertwingularity'



Page 39: Schema.org and One Hundred Years of Search

Practicalities

• Delegation to external sources for enumerations and detail

• e.g. country codes from UN FAO or Wikipedia/DBpedia/Wikidata

• We don’t want to create big enumerations

• all the countries? sports? things that go on maps?

• Decentralised subclassing & property values


Page 40: Schema.org and One Hundred Years of Search

Process

• Search partners retain ultimate oversight

• W3C hosts community group, discussion, wiki and proposal tracking

• Web Schemas group - planning monthly telecons at W3C, based around proposals

• Evolving, pragmatic, collaborative


Page 41: Schema.org and One Hundred Years of Search

Compositional Semantics revisited

• If we have SportsCentre and Karate, can we describe a Karate Club?

• If we have recipes vocab, and medical vocab, and restaurants, can we describe allergy-free food?

• If the UN has country codes, Wikipedia lists religions, ... then we just re-use those
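One way to picture this kind of composition is as a small description that names a general type and points at identifiers maintained elsewhere. The sketch below is purely illustrative: the `compose` helper, the `sport` and `country` property names, and the choice of DBpedia URIs are assumptions made for the example, not schema.org vocabulary, and `SportsCentre` is used only because it is the name on the slide.

```python
SCHEMA = "http://schema.org/"


def compose(type_name, name, **external_terms):
    """Build a plain description: a general type plus links to external terms.

    Each keyword argument maps a (hypothetical) property name to a URI that
    some other community maintains, so no enumeration is copied locally.
    """
    description = {"type": SCHEMA + type_name, "name": name}
    for prop, uri in external_terms.items():
        if not uri.startswith(("http://", "https://")):
            raise ValueError(f"{prop} should point at an external URI, got {uri!r}")
        description[prop] = {"id": uri}
    return description


# A "Karate Club" expressed as a general sports venue plus an external term.
karate_club = compose(
    "SportsCentre",                                        # type name from the slide
    "Example Karate Club",                                 # made-up name
    sport="http://dbpedia.org/resource/Karate",            # hypothetical property
    country="http://dbpedia.org/resource/United_Kingdom",  # hypothetical property
)

if __name__ == "__main__":
    print(karate_club)
```

The point of the design is that the list of sports or countries never has to live inside the schema itself; the description only carries a reference.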


Page 42: Schema.org and One Hundred Years of Search

And libraries

• If the library world share their controlled vocabularies as open SKOS linked data

• ...can we plug them directly into schema.org descriptions?

• of videos? news? scholarly articles? (yes)

• Why re-invent when you can collaborate?


Page 43: Schema.org and One Hundred Years of Search

WebSchemas public-vocabs list

• Schema.org process

• Looking for rough consensus and incremental improvements

• Realistic examples, simplicity for publishers, and re-use of existing vocabulary are important

• <http://www.w3.org/wiki/WebSchemas/>
