Semantic Searchmonkey
Monkey with the Semantic Web
SearchMonkey
Presentation by:
Paul Tarjan, Chief Technical Monkey
Online at:
http://www.slideshare.net/ptarjan/semantic-searchmonkey
The web was / is fragmented
University event page
Friend’s website
Cool bookmarks
Super secret military site
Funny pictures
So we added search to find stuff
University event page
Friend’s website
Cool bookmarks
Super secret military site
Funny pictures
Google
Yahoo
But there are many similar sites
Facebook Events Evite Events Upcoming Events
YouTube Metacafe Vimeo
Digg Reddit Technorati
Let’s treat these as “views” onto “objects”
Wouldn’t it be cool if you could do:
• object:video creator:"Paul Tarjan" length<=60s
Wouldn’t it be cool if you could do:
• object:video creator:http://paulisageek.com/ length<=60s
Wouldn’t it be cool if you could do:
• object:game name:"Desktop Tower Defense" version:1.5 publishdate:"May 2 2005"
Wouldn’t it be cool if you could do:
• object:video author:"The Escapist" game:"Left 4 Dead"
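A minimal Python sketch of the kind of fielded query shown above, assuming each "object" is a flat dict of field values gathered from some site. The helper names (`parse_query`, `matches`, `coerce`), the rule that a leading `$` or trailing `s` marks a plain number, and the sample records are all hypothetical; this is not the query syntax of any real search engine.

```python
import operator
import re

# field, comparison operator, then either a "quoted phrase" or a bare token
TOKEN = re.compile(r'(\w+)(<=|>=|<|>|:)("([^"]*)"|\S+)')
OPS = {":": operator.eq, "<": operator.lt, "<=": operator.le,
       ">": operator.gt, ">=": operator.ge}
# Assumption: "60s", "$5" and "1.5" are all compared as plain numbers.
NUMBER = re.compile(r"^\$?(\d+(?:\.\d+)?)s?$")


def coerce(text):
    """Numbers become floats; everything else is compared case-insensitively."""
    m = NUMBER.match(text)
    return float(m.group(1)) if m else text.lower()


def parse_query(query):
    """'object:video length<=60s' -> [('object', ':', 'video'), ('length', '<=', 60.0)]"""
    constraints = []
    for field, op, raw, quoted in TOKEN.findall(query):
        value = quoted if raw.startswith('"') else raw
        constraints.append((field, op, coerce(value)))
    return constraints


def matches(record, constraints):
    """True if the record has every field and satisfies every constraint."""
    for field, op, value in constraints:
        if field not in record:
            return False
        actual = coerce(str(record[field]))
        if type(actual) is not type(value) or not OPS[op](actual, value):
            return False
    return True


# Hypothetical objects pulled from three different sites.
records = [
    {"object": "video", "creator": "Paul Tarjan", "length": 45, "site": "YouTube"},
    {"object": "video", "creator": "Paul Tarjan", "length": 300, "site": "Vimeo"},
    {"object": "event", "creator": "Paul Tarjan", "site": "Upcoming"},
]
query = parse_query('object:video creator:"Paul Tarjan" length<=60s')
print([r["site"] for r in records if matches(r, query)])  # ['YouTube']
```

The point of the sketch is that once sites expose the same fields, one query runs over all of them; the string matching itself is deliberately naive.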
It gets even cooler
Aggregation:
• object:review type:camera make:canon model:D40
Aggregation:
• object:event date:"May 16, 2008" type:party price<$5
Aggregation:
• object:photo person:"Paul Tarjan"
Aggregation:
• object:photo person:http://paulisageek.com
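The two photo queries above differ only in how the person is identified: a name string versus a URL. A small Python sketch of aggregating objects from several sites under both keys, assuming each site exposes its photos as dicts; the site lists, field names (`person_name`, `person`) and the second homepage URL are hypothetical. It illustrates why a URL works better as an identifier: two people can share a name, but not a homepage.

```python
def aggregate(sources, key, value):
    """Merge the object lists published by several sites, keep objects whose
    `key` field equals `value`, and record which site each one came from."""
    return [dict(obj, site=site)
            for site, objects in sources.items()
            for obj in objects
            if obj.get(key) == value]


# Hypothetical photo objects from two photo sites. Two different people share
# a name, but each is identified by a distinct homepage URL.
sources = {
    "Flickr": [
        {"object": "photo", "person_name": "Paul Tarjan", "person": "http://paulisageek.com"},
    ],
    "Picasa": [
        {"object": "photo", "person_name": "Paul Tarjan", "person": "http://paulisageek.com"},
        {"object": "photo", "person_name": "Paul Tarjan", "person": "http://example.org/other-paul"},
    ],
}

by_name = aggregate(sources, "person_name", "Paul Tarjan")
by_url = aggregate(sources, "person", "http://paulisageek.com")
print(len(by_name), len(by_url))          # 3 2
print(sorted(p["site"] for p in by_url))  # ['Flickr', 'Picasa']
```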
The Semantic What?
• Web pages are views of data for people to read
• Search engines are a hack
• They treat pages as a bucket of words
• Let’s turn the web into a database
• APIs are good, but there is no “web” of APIs
• If you figure out a good way of doing that, let me know
Ok, I want to do it. Now what?
Recommendation: µF
• If there is a microformat for your data, use it
  – hCard
  – hReview
  – hResume
  – hCalendar
  – rel-tag
  – rel-license
  – XFN
  – hAtom
  – geo
µF in a nutshell
• Change your @class values to names that parsers already know
```html
<div>
  <span class="name">Paul Tarjan</span>
  <span class="email">[email protected]</span>
</div>
```
• BECOMES
```html
<div class="vcard">
  <span class="fn">Paul Tarjan</span>
  <span class="email">[email protected]</span>
</div>
```
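To show why known class names matter, here is a minimal Python sketch (standard-library `html.parser` only) that pulls `fn` and `email` out of anything marked `class="vcard"`. It handles only those two hCard properties, assumes every element is explicitly closed, and uses a placeholder address in place of the one on the slide; a real microformats parser covers far more.

```python
from html.parser import HTMLParser


class HCardParser(HTMLParser):
    """Collects a tiny subset of hCard (fn, email) from elements nested
    inside class="vcard". Real hCard has many more properties."""

    PROPERTIES = {"fn", "email"}

    def __init__(self):
        super().__init__()
        self.cards = []
        self._open = []  # one (is_vcard, property_or_None) entry per open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        is_vcard = "vcard" in classes
        if is_vcard:
            self.cards.append({})
        prop = next((c for c in classes if c in self.PROPERTIES), None)
        self._open.append((is_vcard, prop))

    def handle_endtag(self, tag):
        if self._open:
            self._open.pop()

    def handle_data(self, data):
        inside_card = any(is_vcard for is_vcard, _ in self._open)
        props = [p for _, p in self._open if p]
        text = data.strip()
        if inside_card and props and text:
            card = self.cards[-1]
            # The innermost property wins; repeated text chunks are space-joined.
            card[props[-1]] = (card.get(props[-1], "") + " " + text).strip()


def extract_hcards(html):
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.cards


before = ('<div><span class="name">Paul Tarjan</span> '
          '<span class="email">paul@example.com</span></div>')
after = ('<div class="vcard"><span class="fn">Paul Tarjan</span> '
         '<span class="email">paul@example.com</span></div>')
print(extract_hcards(before))  # []
print(extract_hcards(after))   # [{'fn': 'Paul Tarjan', 'email': 'paul@example.com'}]
```

The ad-hoc `name` class means nothing to the parser; only the agreed-upon `vcard`/`fn` names produce data.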
Recommendation: RDFa
• If you have data that doesn’t really fit in a µF
• Examples:
  – API docs (YUI, Javadoc, etc.)
  – Media (audio, video, games, presentations)
  – Job postings
RDFa in a nutshell
• Make a namespace
• Use @property, @rel and @resource
• For DATA: @property makes the node contents into the value
• For URLs: @rel makes the @resource into the value
Normal HTML
```html
<html> …
<div class="private">
  private static String <strong>_createCookieHash</strong> (hash) …
```
RDFa: example
```html
<html xmlns:yui="http://yuilibrary.com/rdf/1.0/yui.rdf#"> …
<div class="private" rel="yui:method" resource="#method__createCookieHash">
  private static String <strong property="yui:name">_createCookieHash</strong> (hash) …
```
That’s it!
• Automatically picked up by semantic parsers / crawlers
• Can build a SearchMonkey app on it
• Makes mashups much easier than screen scraping
• Can get the data from Yahoo! BOSS
What is SearchMonkey?
An open platform for using structured data to build more useful and relevant search results
[Screenshots: search results Before / After]
Enhanced Result: Zagat
[Screenshot callouts: Image, Key/Value Pairs or Abstract, Links]
Infobar: Wikipedia Preview
Summary Blob
Part of the puzzle
SearchMonkey
Semantic markup on web pages
Semantic vocabularies
Vocabularies
• Need to speak the same language
• I like to see girls of that... caliber.
• English, French, Spanish, Esperanto?
• URLs to the rescue
  – Dublin Core (http://purl.org/dc/elements/1.1/)
  – Friend of a Friend (http://xmlns.com/foaf/0.1/)
  – XHTML Friends Network (http://gmpg.org/xfn/11/)
  – … (many more)
Syntax
• Nouns, Verbs, and Adjectives, oh my!
• All phrases become lots of triples
• (Subject, Verb / Adj. / Prep. / etc., Object)
• Key / Value pairs ++
  – Everything is a URL or String
  – Subject doesn’t have to be the document
Syntax 2
• Key / Value pairs
  – Title = Awesome SearchMonkey Presentation
  – Homepage = http://search.yahoo.com/searchmonkey
• Triples
  – (self, http://purl.org/dc#title, “Awesome SearchMonkey Presentation”)
  – (self, http://vcard#url, http://search.yahoo.com/searchmonkey)
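The step from key/value pairs to triples is mostly bookkeeping: add an explicit subject and swap each key for a URL. A Python sketch using the predicate URLs exactly as written on the slide; the `Triple` type, the `VOCABULARY` table and the `to_triples` helper are illustrative names, not part of any RDF library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Predicate URLs copied from the slide; real pages would use a published vocabulary.
VOCABULARY = {
    "Title": "http://purl.org/dc#title",
    "Homepage": "http://vcard#url",
}


def to_triples(subject, pairs, vocabulary=VOCABULARY):
    """Each key/value pair becomes (subject, URL for the key, value)."""
    return [Triple(subject, vocabulary[key], value) for key, value in pairs.items()]


page = {
    "Title": "Awesome SearchMonkey Presentation",
    "Homepage": "http://search.yahoo.com/searchmonkey",
}
for triple in to_triples("self", page):
    print(triple)
```

Because the subject is now explicit, the same structure can also describe things other than the current document.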
Decompose to triples
• My friend “Bob” is an idiot.
  – (self, http://xmlns.com/foaf/0.1/knows, genid:Ui__152310312_366)
  – (genid:Ui__152310312_366, http://www.w3.org/2001/vcard-rdf/3.0#fn, “Bob”)
  – (genid:Ui__152310312_366, http://example.org/ptarjan/isInstanceOf, http://example.org/ptarjan/idiot)
• Unnamed nodes are O.K.
Writing URLs takes a lot of work!
• xmlns:foaf=http://xmlns.com/foaf/0.1/
• xmlns:vcard=http://www.w3.org/2001/vcard-rdf/3.0#
• xmlns:junk=http://example.org/ptarjan/
• My friend “Bob” is an idiot.
  – (self, foaf:knows, genid:Ui__152310312_366)
  – (genid:Ui__152310312_366, vcard:fn, “Bob”)
  – (genid:Ui__152310312_366, junk:isInstanceOf, junk:idiot)
• Unnamed nodes are O.K.
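Prefix declarations are just a lookup table applied before the triples are stored. A minimal Python sketch of expanding the short forms above back into full URLs, assuming the three prefixes listed; `PREFIXES` and `expand` are hypothetical names. Unlike a real RDF toolkit it treats every term as a plain string, so a literal that happened to look like `foaf:x` would be expanded too.

```python
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "vcard": "http://www.w3.org/2001/vcard-rdf/3.0#",
    "junk": "http://example.org/ptarjan/",
}


def expand(term, prefixes=PREFIXES):
    """Expand 'prefix:name' to a full URL when the prefix is declared;
    anything else (blank-node ids, plain strings, full URLs) is unchanged."""
    prefix, sep, name = term.partition(":")
    if sep and prefix in prefixes:
        return prefixes[prefix] + name
    return term


short_form = [
    ("self", "foaf:knows", "genid:Ui__152310312_366"),
    ("genid:Ui__152310312_366", "vcard:fn", "Bob"),
    ("genid:Ui__152310312_366", "junk:isInstanceOf", "junk:idiot"),
]
for triple in short_form:
    print(tuple(expand(term) for term in triple))
```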
RDFa
```html
<html xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
      xmlns:junk="http://example.org/ptarjan/">
  <div rel="foaf:knows">
    <span property="vcard:fn">Bob</span>
    <span rel="junk:isInstanceOf" resource="junk:idiot" />
  </div>
</html>
```
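A toy Python extractor for exactly this slice of RDFa, built on the standard-library `html.parser`, so you can watch the "For DATA" and "For URLs" rules produce the three “Bob” triples. It is a sketch, not a conforming RDFa processor: it supports only `xmlns:` prefixes, `@about`, `@rel` with `@resource`/`@href`, and `@property` (with `@content`); it mints the blank node for a hanging `@rel` eagerly; it expands CURIEs in `@resource` (RDFa 1.1 permits this, while RDFa 1.0 expected a safe CURIE such as `[junk:idiot]`); it does not resolve relative URLs; and it assumes XHTML-style markup where every element is closed. `MiniRDFa` and `extract` are hypothetical names.

```python
from html.parser import HTMLParser
from itertools import count


class MiniRDFa(HTMLParser):
    """Toy extractor for a small RDFa subset: xmlns prefixes, @about,
    @rel with @resource/@href or a hanging @rel, and @property (+ @content)."""

    def __init__(self, base="self"):
        super().__init__()
        self.triples = []
        self._bnode_ids = count()
        # The root frame makes the document itself the first subject.
        self._stack = [{"prefixes": {}, "obj": base, "lit": None, "text": []}]

    @staticmethod
    def _expand(value, prefixes):
        # "foaf:knows" -> full URL when the prefix is declared; else unchanged.
        prefix, sep, name = value.partition(":")
        return prefixes[prefix] + name if sep and prefix in prefixes else value

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        parent = self._stack[-1]
        prefixes = dict(parent["prefixes"])
        for name, value in attrs:
            if name.startswith("xmlns:") and value:
                prefixes[name[len("xmlns:"):]] = value
        # Subject: explicit @about, otherwise whatever the parent points at.
        subject = self._expand(a["about"], prefixes) if a.get("about") else parent["obj"]
        child_obj = subject
        if a.get("rel"):
            target = a.get("resource") or a.get("href")
            # URLs rule: @rel links the subject to @resource (or a fresh blank node).
            obj = self._expand(target, prefixes) if target else f"_:b{next(self._bnode_ids)}"
            for pred in a["rel"].split():
                self.triples.append((subject, self._expand(pred, prefixes), obj))
            child_obj = obj  # children now describe the linked resource
        lit = None
        if a.get("property"):
            preds = [self._expand(p, prefixes) for p in a["property"].split()]
            lit = (subject, preds, a.get("content"))
        self._stack.append({"prefixes": prefixes, "obj": child_obj, "lit": lit, "text": []})

    def handle_data(self, data):
        # DATA rule: text counts toward every enclosing @property still open.
        for frame in self._stack:
            if frame["lit"]:
                frame["text"].append(data)

    def handle_endtag(self, tag):
        if len(self._stack) == 1:
            return
        frame = self._stack.pop()
        if frame["lit"]:
            subject, preds, content = frame["lit"]
            value = content if content is not None else "".join(frame["text"]).strip()
            for pred in preds:
                self.triples.append((subject, pred, value))


def extract(html):
    parser = MiniRDFa()
    parser.feed(html)
    parser.close()
    return parser.triples


doc = """<html xmlns:foaf="http://xmlns.com/foaf/0.1/"
      xmlns:vcard="http://www.w3.org/2001/vcard-rdf/3.0#"
      xmlns:junk="http://example.org/ptarjan/">
  <div rel="foaf:knows">
    <span property="vcard:fn">Bob</span>
    <span rel="junk:isInstanceOf" resource="junk:idiot" />
  </div>
</html>"""
for triple in extract(doc):
    print(triple)
```

Run on the earlier YUI snippet, the same extractor yields a `yui:method` link from the document to `#method__createCookieHash` and a `yui:name` of `_createCookieHash` on that node, because the `@resource` becomes the subject for the nested `<strong>`.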
• </SemanticWeb>
• Questions?
Innards of SearchMonkey
• You build a web service inside our framework
• When a search page renders:
  – We check which SearchMonkey apps are enabled
  – We call them
    • 50 ms budget for in-page results
    • A much longer budget for AJAX
  – They return data in our template
  – We render the results (and cache them)
Prototyping with XSLT
• What if I don’t have structured data?
  – I don’t own the site
  – I do own the site, but I want to prototype first
• Build an XSLT custom data service first
  – Write some XSLT to extract the data and transform it into DataRSS
  – Mostly about finding the right XPath (use Firebug or XPather)
  – Quick to implement, but brittle
  – Can’t do a good Enhanced Result
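The slide's prototype path is XSLT. As a stand-in in Python, here is the same idea with `xml.etree.ElementTree`, whose `find` understands a limited XPath subset: map each field to an XPath, pull the text, and return key/value pairs. It is not XSLT and it stops short of DataRSS output; the page, class names and XPaths are hypothetical. The last line demonstrates the "brittle" point: rename one class and the field silently disappears.

```python
import xml.etree.ElementTree as ET

# Hypothetical XPath expressions for a hypothetical restaurant page. ElementTree
# supports only a subset of XPath, but attribute tests like [@class='x'] work.
FIELDS = {
    "name": ".//h1[@class='name']",
    "rating": ".//span[@class='rating']",
    "phone": ".//span[@class='tel']",
}


def scrape(xhtml, fields=FIELDS):
    """Return the key/value pairs a custom data service would emit.
    Fields whose XPath no longer matches are silently missing."""
    root = ET.fromstring(xhtml)  # requires well-formed markup
    data = {}
    for key, path in fields.items():
        node = root.find(path)
        if node is not None and node.text:
            data[key] = node.text.strip()
    return data


page = ("<html><body><h1 class='name'>Luna Cafe</h1>"
        "<span class='rating'>4.5</span><span class='tel'>555-0100</span></body></html>")
print(scrape(page))  # {'name': 'Luna Cafe', 'rating': '4.5', 'phone': '555-0100'}

# Brittleness: the site renames one class and the field quietly disappears.
print(scrape(page.replace("class='name'", "class='title'")))
```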
Do it for real
• Demo
Examples
• Rubik’s Cube
• VTA Bus
• API Monkey
• BugMeNot
• RetailMeNot
• Amazon
Questions?