Linked Data, Free Pictures, and Markets For Semantic Data
Paul Houle
Overview
the new taxonomy
Freebase and DBpedia
collecting pictures
the social-semantic ecosystem
commonsense knowledge in practice
the economics of semantic data
proof and trust
virtuous circle
People Use Images
Links
Traffic
Revenue
Get Content
animalphotos.info
Scientific Classification of Animals
Vernacular Taxonomy for Animals
Mammals
Primates Rodents Others
Birds
<http://dbpedia.org/resource/Gear>
automating the process
DBpedia, Flickr
identify topics
search for candidates
filter correct images
describe images
Amazon Mechanical Turk
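The four steps above can be sketched as a small pipeline. This is only an illustration under stated assumptions: `Candidate`, `collect_images`, and the three callables are hypothetical names, not part of any tool named in the slides. In practice `search` might wrap a photo-site query and `is_correct` might be backed by human judgments (for example from Amazon Mechanical Turk), but the slides do not say how the steps were implemented.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Candidate:
    topic_uri: str   # e.g. a DBpedia resource URI
    image_url: str   # hypothetical location of a candidate photo
    caption: str = ""


def collect_images(
    topics: Iterable[str],
    search: Callable[[str], List[str]],
    is_correct: Callable[[str, str], bool],
    describe: Callable[[str, str], str],
) -> List[Candidate]:
    """Run the four steps for every topic and return the accepted images."""
    accepted = []
    for topic in topics:                         # 1. identify topics
        for url in search(topic):                # 2. search for candidates
            if is_correct(topic, url):           # 3. filter correct images
                caption = describe(topic, url)   # 4. describe images
                accepted.append(Candidate(topic, url, caption))
    return accepted


# Usage with stand-in functions (made-up data, for illustration only).
fake_index = {"http://dbpedia.org/resource/Gear": ["a.jpg", "b.jpg"]}
pictures = collect_images(
    fake_index.keys(),
    search=lambda topic: fake_index[topic],
    is_correct=lambda topic, url: url == "a.jpg",
    describe=lambda topic, url: "gear photo",
)
```

Passing the steps in as functions keeps the automated and human-judged parts swappable.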
carpictures.cc
2012 2011 2010 2009 2008 2007 … 1990 1989 1988 1987 1986 1985
Acura, Alfa Romeo, Aston Martin, Audi, Bentley, BMW … Scion, Subaru, Suzuki, Toyota, Volkswagen, Volvo
CC, CC 4Motion, Eos, GTI, Jetta, Jetta SportWagen, New Beetle, New Beetle Convertible, Passat, Passat Wagon, Routan FWD, Tiguan 4Motion, Touareg
6 speed automatic
5 speed manual
Chevrolet, Honda, Volkswagen
Civic, Element, Accord, S360, FCX
Constructed Taxonomy
Good Category…
…Bad Category
Wikipedia Categories
“data wiki” -> better data quality
ny-pictures.com
geospatial selection + Wikipedia graph
The only way is no way…
The only limits are no limits…
The only taxonomy is no taxonomy…
network “taxonomy”
people
places
inventions
creative works
life forms
What’s out there?
Type                       Count
Person                     1,035,529
Location                   707,679
Organism Classification    192,632
Organization               177,999
Music Album                118,568
Film                       76,681
Structure                  74,061
Event                      73,992
Written Work               51,937
TV Program                 30,094
Fictional Character        29,461
Celestial Object           24,174
Ship                       23,006
ookaboo.com
User contributed content
ookaboo semantic API <http://dbpedia.org/resource/Thailand>
API
Thanks: Andyindia, Echiner1, Rene Eherhardt
social-semantic ecosystem
linked data
human contributions
other online communities
knowledge engineering
Text Analysis
Car Image CC-BY from http://www.flickr.com/photos/aharden/2618801756/
commonsense logic?
Number of Facts
Cyc: 3 million
Freebase: 600 million
Number of Concepts
SUMO: 1,000
WordNet: 118,000
DBpedia (Wikipedia): 3.9 million
Freebase: 23 million
critical mass?
“Any brain, machine or other thing that has a mind must be composed of smaller things that cannot think at all”
Marvin Minsky
Saturn1: Rome, Deity, Mythology
Saturn2: Planet, Rings, Astronomy
autocompletion
ad-hoc SPARQL query
a database of names…
… plus subjective importance
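A minimal sketch of how a database of names plus an importance score could drive autocompletion. The `Entry` type, the `autocomplete` function, the example.org URIs, and the scores are all made up for illustration. The slides do not describe how importance is actually computed.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str
    uri: str
    importance: float  # hypothetical score; higher means better known


def autocomplete(prefix: str, entries: List[Entry], limit: int = 5) -> List[Entry]:
    """Return entries whose name starts with prefix, most important first."""
    p = prefix.casefold()
    matches = [e for e in entries if e.name.casefold().startswith(p)]
    return sorted(matches, key=lambda e: e.importance, reverse=True)[:limit]


# Made-up entries: two senses of the same name, plus an unrelated one.
names = [
    Entry("Saturn", "http://example.org/saturn_god", 0.4),
    Entry("Saturn", "http://example.org/saturn_planet", 0.9),
    Entry("Rome", "http://example.org/rome", 0.8),
]
suggestions = autocomplete("sat", names)
```

Ranking by importance lets two concepts with the same name both appear, with the better-known one listed first.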
Yankees vs. Red Sox
carbon vs. silicon
Aerosmith vs. the Ramones
Jeopardy vs. Family Feud
the airports query
Airports in English
空港 の 日本語 (Airports in Japanese)
A cautionary tale
[Chart: advertising revenue over time]
“I know it when I see it” - Supreme Court Justice Potter Stewart
50 offensive categories
1000 offensive topics
1800 offensive images
950,000 good images
99.81% accuracy isn’t good enough!
Hyperprecision!
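One way to read the 99.81% figure, assuming accuracy here means the share of good images among all images (the slides do not state the formula):

```python
offensive_images = 1_800
good_images = 950_000

total_images = offensive_images + good_images
accuracy = good_images / total_images   # assumed definition of "accuracy"

print(f"{accuracy:.2%}")                 # share of good images
print(f"{offensive_images:,} offensive images remain")
```

Under that reading, even a collection that is more than 99.8% clean still contains 1,800 images that should not be shown.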
Publishing Knowledge
SPARQL Endpoint
Dereferencing
API
RDF Dump
Thanks: andrefontana, Isakkk, laynaaa
Clip art licensed from the Clip Art Gallery on DiscoverySchool.com
Dereferencing
<http://rdf.freebase.com/ns/en.graphene>
HTTP GET
fbase:en.graphene
    a fbase:common.topic ,
      fbase:award.award_winning_work ,
      fbase:law.invention ;
    fbase:award.award_winning_work.awards_won fbase:m.0dg75z8 ;
    fbase:common.topic.article fbase:m.03p5rz ;
    fbase:common.topic.image fbase:m.089q2k3 , fbase:m.02f5b7f , fbase:m.041wl9z ;
    fbase:law.invention.inventor fbase:en.andre_geim ...
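Dereferencing amounts to an HTTP GET on the URI that names the concept. A minimal sketch using only the Python standard library follows. The function names are hypothetical, and asking for `text/turtle` in the `Accept` header is an assumption: the slides do not say which serializations a given server offers, or whether the endpoint shown is still reachable.

```python
import urllib.request


def build_dereference_request(uri: str, accept: str = "text/turtle") -> urllib.request.Request:
    """Build an HTTP GET for a linked data URI, asking for an RDF serialization."""
    return urllib.request.Request(uri, headers={"Accept": accept}, method="GET")


def dereference(uri: str, timeout: float = 10.0) -> str:
    """Send the request and return the response body as text."""
    request = build_dereference_request(uri)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset)


# Building the request does not touch the network; calling dereference() does.
request = build_dereference_request("http://rdf.freebase.com/ns/en.graphene")
```

Keeping request construction separate from sending makes the content negotiation easy to inspect without a live server.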
Thanks: Thomas Shahan
Publishing Knowledge
API RDF Dump
Ookaboo RDF Dump
Metadata for 950,000 Pictures
500,000+ topics
630 MB
50 million facts
Two Challenges
Ookaboo needs better tools to build navigation
Customers need tools to find concepts
:BaseKB is free under CC-BY
Not so “big” …
:BaseKB is 2.8 GB
:BaseKB takes 1 hour to load on a workstation PC
… but very complex
:BaseKB has 11,361 types and 102,949 properties
“A isPartOf B” can be expressed in 139 different ways!
Photo credit: http://commons.wikimedia.org/wiki/User:Evan-Amos
N-Triples is compatible with…
RDF Database
awk, sed, grep, …
Hadoop
Lucene (SIREn)
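N-Triples works with line-oriented tools because each line holds one complete triple. The sketch below filters triples by predicate in the same spirit as a grep over the dump. The function name and the example.org URIs are hypothetical, and it splits on whitespace rather than using a full N-Triples parser, which is enough to isolate the subject and predicate since neither can contain spaces.

```python
from typing import Iterable, Iterator


def grep_predicate(lines: Iterable[str], predicate_uri: str) -> Iterator[str]:
    """Yield N-Triples lines whose predicate is predicate_uri."""
    wanted = f"<{predicate_uri}>"
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        parts = line.split(None, 2)  # subject, predicate, rest of the line
        if len(parts) == 3 and parts[1] == wanted:
            yield line


# Made-up triples for illustration.
sample = [
    '<http://example.org/a> <http://example.org/isPartOf> <http://example.org/b> .',
    '<http://example.org/a> <http://example.org/label> "A thing" .',
    '# a comment',
]
matches = list(grep_predicate(sample, "http://example.org/isPartOf"))
```

Because the function takes any iterable of lines, the same code can stream over an open file without loading the whole dump into memory.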
Data Quality
Quality Perimeter
Repairing Folksonomic Trees
Enterprise Data Warehousing
Operations -> ETL -> Data Warehouse -> Analytics
Knowledge-Based System
Linked Data -> ETL -> Data Warehouse -> Operations
“Businesses often spend five to 10 times more money to correct their data after it is entered into the system than they would have if they had headed the problems off at the source.”
- Larry P. English, Information Impact International
Data Quality Economics
Assume 25 Consumers
Consumers Clean: 25 x $N = $25N
Publisher Cleans: 1 x $N = $N
Reusable Knowledge Base: effect on schedule
decision point
build
adopt
develop knowledge base
develop knowledge base
time
Build Knowledge Base
Develop Profitable Applications
Get Feedback And Revenue
Linked Data Business Models
Free Shared Vocabulary Enables Interconnection…
…but the profit motive spurs investment to create quality data.
Trust
Proof
Publishers
Consumers
A Market in Common Sense
Ookaboo: Free Pictures of Everything On Earth
5000 Topics That Aren’t Safe For Kids
How to conjugate verbs and use the correct article
What is similar to this?
What is this document about?
How well known is this?
… big and ambitious systems
Paul Houle