Linked Data, Free Pictures, and Markets For Semantic Data

Post on 18-May-2015

Ookaboo is a collection of about 1,000,000 Creative Commons images gathered from social media and linked to 500,000 Linked Data concepts from Freebase and DBpedia. Ookaboo’s semantic API and RDF dump let applications connect topics such as people, places, species and things to free pictures with almost perfect precision. To create Ookaboo’s photo collection and user interface, I had to extensively clean Linked Data and construct a knowledge base about “commonsense” topics such as grammar, the relative importance of things, offensiveness, and the categorization and naming of things. Had this knowledge been commercially available, I could have spent more time acquiring images and building a community. Although free Linked Data defines a shared vocabulary that enables interoperation, next-generation text analysis, data integration, and content generation systems will depend on reusable knowledge bases that take resources and specialized skills to create. A market in semantic data will fill this need.

Transcript of Linked Data, Free Pictures, and Markets For Semantic Data

Linked Data, Free Pictures and Markets for Semantic Data

Paul Houle

paul@ontology2.com

Overview

the new taxonomy

Freebase and DBpedia

collecting pictures

the social-semantic ecosystem

commonsense knowledge in practice

the economics of semantic data

proof and trust

virtuous circle

People Use Images

Links

Traffic

Revenue

Get Content

animalphotos.info

Scientific Classification of Animals

Vernacular Taxonomy for Animals

Mammals

Primates Rodents Others

Birds

<http://dbpedia.org/resource/Gear>

automating the process

dbpedia flickr

Identify topics search for candidates filter correct images

describe images

amazon mechanical turk

carpictures.cc

2012 2011 2010 2009 2008 2007 … 1990 1989 1988 1987 1986 1985

Acura, Alfa Romeo, Aston Martin, Audi, Bentley, BMW … Scion, Subaru, Suzuki, Toyota, Volkswagen, Volvo

CC, CC 4Motion, Eos, GTI, Jetta, Jetta SportWagen, New Beetle, New Beetle Convertible, Passat, Passat Wagon, Routan FWD, Tiguan 4Motion, Touareg

6 speed automatic

5 speed manual

Chevrolet Honda Volkswagen

Civic Element Accord S360 FCX

Constructed Taxonomy

Good Category…

…Bad Category

“data wiki” -> better data quality

ny-pictures.com

geospatial selection + Wikipedia graph

The only way is no way…

The only limits are no limits…

The only taxonomy is no taxonomy…

network “taxonomy”

people

places

inventions

creative works

life forms

What’s out there?

Type Count

Person 1,035,529

Location 707,679

Organism Classification 192,632

Organization 177,999

Music Album 118,568

Film 76,681

Structure 74,061

Event 73,992

Written Work 51,937

TV Program 30,094

Fictional Character 29,461

Celestial Object 24,174

Ship 23,006

ookaboo.com

User contributed content

ookaboo semantic API <http://dbpedia.org/resource/Thailand>

API

Thanks: Andyindia, Echiner1, Rene Eherhardt

social-semantic ecosystem

linked data

human contributions

other online communities

knowledge engineering

Text Analysis

Car Image CC-BY from http://www.flickr.com/photos/aharden/2618801756/

commonsense logic?

Number of Facts

Cyc: 3 million Freebase: 600 million

Number of Concepts

SUMO: 1000, WordNet: 118,000, DBpedia: 3.9 million, Freebase: 23 million

Number of Facts

Cyc: 3 million Freebase: 600 million

Number of Concepts

SUMO: 1000, WordNet: 118,000, Wikipedia: 3.9 million, Freebase: 23 million

critical mass?

“Any brain, machine or other thing that has a mind must be composed of smaller things that cannot think at all”

Marvin Minsky

Saturn1

Rome

Deity

Mythology

Saturn2

Planet

Rings

Astronomy

autocompletion

ad-hoc SPARQL query

a database of names…

… plus subjective importance
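The autocompletion idea on these slides (a database of names plus a subjective importance score) can be sketched roughly as follows. This is a minimal illustration, not Ookaboo's implementation: the `Topic` fields, the `ex:` identifiers, and every score below are made up for the example.

```python
from typing import NamedTuple


class Topic(NamedTuple):
    name: str          # display name, as found in a database of names
    uri: str           # hypothetical identifier for the concept
    importance: float  # hypothetical subjective importance score


# Made-up sample data; the scores are illustrative only.
TOPICS = [
    Topic("Saturn", "ex:Saturn_(planet)", 0.9),
    Topic("Saturn", "ex:Saturn_(mythology)", 0.4),
    Topic("Saturn Corporation", "ex:Saturn_Corporation", 0.3),
    Topic("Saturday Night Live", "ex:Saturday_Night_Live", 0.8),
]


def autocomplete(prefix, topics=TOPICS, limit=3):
    """Return up to `limit` topics whose name starts with `prefix`
    (case-insensitive), most important first."""
    p = prefix.casefold()
    matches = [t for t in topics if t.name.casefold().startswith(p)]
    return sorted(matches, key=lambda t: t.importance, reverse=True)[:limit]


for topic in autocomplete("saturn"):
    print(topic.name, topic.uri, topic.importance)
```

With a name table alone, both senses of “Saturn” would tie; the importance score is what decides which sense is offered first.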

yankees vs. red sox

carbon vs. silicon

aerosmith vs. the ramones

Jeopardy vs. family feud

the airports query

Airports in English

空港 の 日本語 (airports in Japanese)

A cautionary tale

time

advertising revenue

“I know it when I see it” - Supreme Court Justice Potter Stewart

50 offensive categories

1000 offensive topics

1800 offensive images

950,000 good images

99.81% accuracy isn’t good enough!

Hyperprecision!
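The 99.81% figure is consistent with the counts on the preceding slides. The slides do not say how it was computed, so the formula below (good images divided by all images) is an assumption; it is only a quick arithmetic check.

```python
# Counts taken from the slides.
offensive_images = 1_800
good_images = 950_000

# Assumed definition: share of images that are not offensive.
accuracy = good_images / (good_images + offensive_images)

# Offensive images a visitor would meet per 1,000 images viewed.
per_thousand = 1000 * (1 - accuracy)

print(f"accuracy: {accuracy:.2%}")
print(f"offensive images per 1,000 shown: {per_thousand:.1f}")
```

Even at that accuracy, roughly two images in every thousand are offensive, which is why the slide calls for something stricter.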

Publishing Knowledge

SPARQL Endpoint Dereferencing

API RDF Dump

Thanks: andrefontana, Isakkk, laynaaa

Clip art licensed from the Clip Art Gallery on DiscoverySchool.com

Dereferencing

<http://rdf.freebase.com/ns/en.graphene>

http GET

fbase:en.graphene a fbase:common.topic , fbase:award.award_winning_work , fbase:law.invention ;
    fbase:award.award_winning_work.awards_won fbase:m.0dg75z8 ;
    fbase:common.topic.article fbase:m.03p5rz ;
    fbase:common.topic.image fbase:m.089q2k3 , fbase:m.02f5b7f , fbase:m.041wl9z ;
    fbase:law.invention.inventor fbase:en.andre_geim ...
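Dereferencing amounts to an ordinary HTTP GET on the concept's URI. A minimal sketch using Python's standard library is below. The `Accept: text/turtle` header is an assumption about content negotiation (the slide only says “http GET”), `build_dereference_request` is a name invented for this example, and the Freebase server named on the slide may no longer respond, so the network call is left commented out.

```python
import urllib.request


def build_dereference_request(uri, accept="text/turtle"):
    """Build (but do not send) a GET request asking for an RDF
    serialization of the resource identified by `uri`."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


req = build_dereference_request("http://rdf.freebase.com/ns/en.graphene")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# Sending the request needs network access and a live server:
# with urllib.request.urlopen(req, timeout=10) as resp:
#     print(resp.read().decode("utf-8"))
```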

Thanks: Thomas Shahan

Publishing Knowledge

API RDF Dump

Ookaboo RDF Dump

Metadata for 950,000 Pictures

500,000+ topics

630 MB

50 million facts

Two Challenges

Ookaboo needs better tools to build navigation

Customers need tools to find concepts

:BaseKB is free under CC-BY

Not so “big” …

:BaseKB is 2.8 GB

:BaseKB takes 1 hour to load on a workstation PC

… but very complex

:BaseKB has 11,361 types and 102,949 properties

“A isPartOf B” can be expressed in 139 different ways!
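One way to cope with many equivalent ways of saying “A isPartOf B” is to map them onto a single canonical predicate before querying. The sketch below shows the idea on tuples. Every predicate name in it is hypothetical (the `ex:` prefix marks made-up identifiers), and it is not a description of how :BaseKB itself handles the problem.

```python
# Hypothetical predicates that already point from part to whole.
PART_OF_FORWARD = {"ex:partOf", "ex:containedBy", "ex:memberOf"}
# Hypothetical predicates that point from whole to part and must be flipped.
PART_OF_INVERSE = {"ex:contains", "ex:hasPart"}


def normalize_part_of(triples):
    """Rewrite every known part-of variant as (part, 'ex:isPartOf', whole);
    pass all other triples through unchanged."""
    for s, p, o in triples:
        if p in PART_OF_FORWARD:
            yield (s, "ex:isPartOf", o)
        elif p in PART_OF_INVERSE:
            yield (o, "ex:isPartOf", s)
        else:
            yield (s, p, o)


sample = [
    ("ex:Manhattan", "ex:containedBy", "ex:New_York_City"),
    ("ex:New_York_City", "ex:hasPart", "ex:Brooklyn"),
    ("ex:Brooklyn", "ex:name", "Brooklyn"),
]
for triple in normalize_part_of(sample):
    print(triple)
```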

Photo credit: http://commons.wikimedia.org/wiki/User:Evan-Amos

N-Triples is compatible with…

RDF Database

awk, sed, grep, …

Hadoop

Lucene (SIREn)
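N-Triples works with line-oriented tools because each line holds exactly one triple. The sketch below filters triples by predicate in the spirit of grep. It handles only the simple case (no blank nodes with unusual spacing, no validation of escapes), and the `example.org` URIs and function names are invented for the illustration.

```python
def parse_simple_ntriple(line):
    """Split one N-Triples line into (subject, predicate, object).
    Returns None for blank lines and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    if not line.endswith("."):
        raise ValueError(f"not an N-Triples statement: {line!r}")
    # Subject and predicate contain no whitespace; the object may (literals).
    subject, predicate, obj = line[:-1].rstrip().split(None, 2)
    return subject, predicate, obj


def grep_predicate(lines, predicate):
    """Yield the triples whose predicate equals `predicate`."""
    for line in lines:
        triple = parse_simple_ntriple(line)
        if triple is not None and triple[1] == predicate:
            yield triple


SAMPLE = [
    '<http://example.org/a> <http://example.org/name> "Saturn V"@en .',
    '<http://example.org/a> <http://example.org/type> <http://example.org/Rocket> .',
    '# a comment line',
    '',
]
for triple in grep_predicate(SAMPLE, "<http://example.org/name>"):
    print(triple)
```

Because the function takes any iterable of lines, the same code can read an open file object, so a large dump is processed one line at a time without loading it into memory.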

Data Quality

Quality Perimeter

Repairing Folksonomic Trees

Operations ETL Data Warehouse Analytics

Enterprise Data Warehousing

Knowledge-Based System

Linked Data ETL Data Warehouse Operations

“Businesses often spend five to 10 times more money to correct their data after it is entered into the system than they would have if they had headed the problems off at the source.”

- Larry P. English, Information Impact International

Data Quality Economics

Assume 25 Consumers

Consumers Clean: 25 x $N = $25N

Publisher Cleans: 1 x $N = $N
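The comparison can be written as a small function. This is only a restatement of the slide's arithmetic, under its assumption that one cleanup costs the same $N whoever performs it. The function name and the example value of N are made up.

```python
def total_cleaning_cost(cost_per_cleanup, consumers, publisher_cleans):
    """Total spent across the ecosystem on cleaning one data set.
    If the publisher cleans, the work is done once; otherwise every
    consumer repeats it."""
    times_cleaned = 1 if publisher_cleans else consumers
    return cost_per_cleanup * times_cleaned


N = 100  # hypothetical cost of one cleanup, in dollars
consumers_clean = total_cleaning_cost(N, consumers=25, publisher_cleans=False)
publisher_cleans = total_cleaning_cost(N, consumers=25, publisher_cleans=True)
print(consumers_clean, publisher_cleans, consumers_clean // publisher_cleans)
```

The ratio equals the number of consumers regardless of N, so the saving grows with every additional consumer of the data.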

Reusable Knowledge Base: effect on schedule

decision point

build

adopt

develop knowledge base

develop knowledge base

time

Build Knowledge Base

Develop Profitable Applications

Get Feedback And Revenue

Linked Data Business Models

Free Shared Vocabulary Enables Interconnection…

…but the profit motive spurs investment to create quality data.

Trust

Proof

Publishers

Publishers

Consumers

A Market in Common Sense

Ookaboo: Free Pictures of Everything On Earth

5000 Topics That Aren’t Safe For Kids

How to conjugate verbs and use the correct article

What is similar to this?

What is this document about?

How well known is this?

… big and ambitious systems

Paul Houle

paul@ontology2.com