George James :: Querying The Web

Posted on 18-Nov-2014


Querying the Web

Out of the Slipstream :: September 27, 2007


“Information wants to be free”
Stewart Brand, Whole Earth Catalogue, May 1985

“If the new computer set up allowed folks inside to be more creative and independent, why not open it up to outsiders, too?”
Jeff Bezos, Amazon, March 2002

“Data is the Next Intel Inside”
Tim O’Reilly, September 2005

Open Source has commoditized software
Creative Commons will commoditize information
Which leaves servers, services and service…

General Medical Council


Freebase


Metaweb Query Language

Request:
{ "type": "/medicine/physician",
  "name": "Michael Maher" }

Response:
{ "code": "/api/status/ok",
  "result": { "type": "/medicine/physician",
              "name": "Michael Maher",
              "gender": "Male",
              "education": "Leeds University" } }

JSON
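The request/response pair above needs nothing beyond a JSON library on the client side. Below is a minimal Python sketch, assuming the envelope shape shown on the slide (a "code" status plus a "result" object). It builds the query and reads fields out of the reply. Freebase has since been shut down, so the sketch works on the slide's sample response and does not call a live endpoint. The function names are hypothetical.

```python
import json

def build_query(type_id, name):
    """Build the query object from the slide: the given properties identify the topic to match."""
    return {"type": type_id, "name": name}

def parse_response(text):
    """Return the 'result' object from a response envelope; raise if the status is not ok."""
    envelope = json.loads(text)
    if envelope.get("code") != "/api/status/ok":
        raise ValueError(f"query failed: {envelope.get('code')!r}")
    return envelope["result"]

# Sample response copied from the slide; no live service is contacted.
SAMPLE_RESPONSE = """
{ "code": "/api/status/ok",
  "result": { "type": "/medicine/physician", "name": "Michael Maher",
              "gender": "Male", "education": "Leeds University" } }
"""

if __name__ == "__main__":
    print(json.dumps(build_query("/medicine/physician", "Michael Maher")))
    print(parse_response(SAMPLE_RESPONSE)["education"])
```

Checking the status code before touching "result" keeps a failed query from being mistaken for an empty answer.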

GMC vs Freebase

Freebase:
User-sourced content
API
Extensible, dynamic
Creative Commons / PD (public domain)
Automatic right to use
Stepwise refinement

GMC:
Authoritative
Website-based search
Static
Restrictive license: even if you pay for the data, you still cannot legally use it
Periodic updates

REST

REpresentational State Transfer
A less rigorous alternative to SOAP
Data are considered to be resources
Every resource has a unique address
Layered over HTTP:
Client/server separation
Stateless
Cacheable

Request: GET http://rest.georgejames.com/product/Serenji/

Response:
Name=Serenji
Price=195.00
OrderCode=H1001
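The Serenji example above can be mimicked locally to show what the REST properties mean in practice. This minimal Python sketch uses the standard WSGI calling convention and assumes a plain-text `Key=Value` body as on the slide. `PRODUCTS`, `app`, `call` and `parse_key_values` are hypothetical names, and the one-hour `Cache-Control` value is an arbitrary choice. Each resource is addressed by its own path, the handler keeps no per-client state between requests, and the response declares itself cacheable. The real georgejames.com address is not contacted.

```python
# Hypothetical in-memory store: each resource is identified by its own path.
PRODUCTS = {
    "/product/Serenji/": {"Name": "Serenji", "Price": "195.00", "OrderCode": "H1001"},
}

def app(environ, start_response):
    """Minimal WSGI app: GET a resource by path; nothing is remembered between requests."""
    if environ["REQUEST_METHOD"] != "GET":
        start_response("405 Method Not Allowed", [("Allow", "GET")])
        return [b""]
    resource = PRODUCTS.get(environ["PATH_INFO"])
    if resource is None:
        start_response("404 Not Found", [("Content-Type", "text/plain")])
        return [b"not found"]
    body = "\n".join(f"{k}={v}" for k, v in resource.items()).encode("utf-8")
    start_response("200 OK", [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Cache-Control", "max-age=3600"),  # clients and proxies may reuse this response
    ])
    return [body]

def call(path, method="GET"):
    """Invoke the app directly (no network) and return (status, headers, body text)."""
    captured = {}
    def start_response(status, headers):
        captured["status"], captured["headers"] = status, dict(headers)
    body = b"".join(app({"REQUEST_METHOD": method, "PATH_INFO": path}, start_response))
    return captured["status"], captured["headers"], body.decode("utf-8")

def parse_key_values(text):
    """Client side: turn 'Key=Value' lines back into a dict."""
    result = {}
    for line in text.splitlines():
        if "=" in line:
            key, value = line.split("=", 1)
            result[key.strip()] = value.strip()
    return result
```

Calling `call("/product/Serenji/")` returns a 200 status with the three lines from the slide as its body. `parse_key_values` turns that body back into a dict.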

Amazon S3

S3 :: Simple Storage Service
Online storage space
$0.15 per Gbyte per month for storage
~$0.20 per Gbyte of data transfer
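With the rates on the slide, a monthly bill is simple arithmetic: storage × $0.15 plus transfer × ~$0.20. For example, 100 GB stored and 50 GB transferred gives 100 × 0.15 + 50 × 0.20 = 15 + 10 = $25. The sketch below uses the 2007 figures quoted on the slide, not current pricing. It also ignores any per-request charges, volume tiers, or separate inbound/outbound rates. `monthly_cost` is a hypothetical name.

```python
STORAGE_PER_GB_MONTH = 0.15   # USD, storage rate quoted on the 2007 slide
TRANSFER_PER_GB = 0.20        # USD, approximate transfer rate quoted on the slide

def monthly_cost(gb_stored, gb_transferred):
    """Estimate one month's bill from the two slide rates only."""
    return gb_stored * STORAGE_PER_GB_MONTH + gb_transferred * TRANSFER_PER_GB

print(f"${monthly_cost(100, 50):.2f}")   # $25.00
```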

Storage request: PUT http://s3.amazonaws.com/[bucket-name]/[key-name]

Retrieval request: GET http://s3.amazonaws.com/[bucket-name]/[key-name]
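Storage and retrieval use the same address and differ only in the HTTP method. This sketch builds that request line in the path-style form shown on the slide. The bucket and key names are made up. A real request would also need an authentication signature, which is left out here, and nothing is sent over the network. `s3_url` and `s3_request` are hypothetical helpers.

```python
from urllib.parse import quote

S3_ENDPOINT = "http://s3.amazonaws.com"  # endpoint as written on the slide

def s3_url(bucket, key):
    """Build the path-style address shared by PUT (store) and GET (retrieve)."""
    # '/' is kept so keys can mimic folders; other unsafe characters are percent-encoded.
    return f"{S3_ENDPOINT}/{bucket}/{quote(key, safe='/')}"

def s3_request(method, bucket, key):
    """Return the request line a client would send; no network call is made."""
    if method not in ("PUT", "GET"):
        raise ValueError("only PUT and GET are sketched here")
    return f"{method} {s3_url(bucket, key)}"

print(s3_request("PUT", "my-bucket", "reports/2007 q3.txt"))
# PUT http://s3.amazonaws.com/my-bucket/reports/2007%20q3.txt
```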

EC2 :: Elastic Compute Cloud

Microformats


Without Microformats:
<div class='opaque'>
Out of the Slipstream is a one-day conference on Thursday 27 September 2007 at Brooklands Museum, Surrey, UK.
</div>

With Microformats:
<div class='opaque vevent'>
<span class='summary'>Out of the Slipstream</span> is a one-day conference on <abbr class="dtstart" title="20070927">Thursday 27 September 2007</abbr> at Brooklands Museum, Surrey, UK.
</div>
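The extra class names let a program pull the event out of ordinary HTML. This minimal sketch uses Python's standard `html.parser` and relies only on the `vevent`, `summary` and `dtstart` classes from the example. It is not a full hCalendar parser: it ignores tags nested inside the summary and every other hCalendar property. `VEventParser` is a hypothetical name.

```python
from html.parser import HTMLParser

class VEventParser(HTMLParser):
    """Collect the summary text and dtstart title of each vevent element."""
    def __init__(self):
        super().__init__()
        self.events = []          # one dict per vevent found
        self._in_summary = False  # True while inside a class="summary" element

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "vevent" in classes:
            self.events.append({"summary": "", "dtstart": None})
        elif self.events and "summary" in classes:
            self._in_summary = True
        elif self.events and "dtstart" in classes:
            # The machine-readable date sits in the title attribute of <abbr>.
            self.events[-1]["dtstart"] = attrs.get("title")

    def handle_endtag(self, tag):
        self._in_summary = False  # simplification: any closing tag ends the summary

    def handle_data(self, data):
        if self._in_summary:
            self.events[-1]["summary"] += data

SNIPPET = """<div class='opaque vevent'>
<span class='summary'>Out of the Slipstream</span>
is a one-day conference on <abbr class="dtstart" title="20070927">Thursday 27 September 2007</abbr>
at Brooklands Museum, Surrey, UK.</div>"""

parser = VEventParser()
parser.feed(SNIPPET)
print(parser.events)
```

The human-readable date stays in the page text. The parser reads only the `title` attribute, which is why the markup duplicates the date there.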


Astoria

Astoria in action

Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Categories
Request: http://astoria.sandbox.live.com/northwind/northwind.rse/Customers
Request: /Customers[FRANK]
Request: /Customers[FRANK]/Orders

A variety of response formats:
POX (Plain Old XML)
Web3S (Web, Structured, Schema’d and Searchable)
Atom
JSON

JSON request: /Customers[FRANK]?$format=json
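The Astoria requests above compose in a regular way: an entity set, an optional key in brackets, an optional relationship to follow, and an optional `$format` option. This sketch only builds such strings in the bracket-key syntax from the slides. It does not contact the service, whose sandbox address is reproduced as given, and the syntax was still changing at the time. `astoria_url` is a hypothetical helper.

```python
from urllib.parse import quote

BASE = "http://astoria.sandbox.live.com/northwind/northwind.rse"  # sandbox URL from the slides

def astoria_url(entity_set, key=None, navigation=None, fmt=None):
    """Compose a request in the bracket-key style shown on the slides."""
    path = "/" + quote(entity_set)
    if key is not None:
        path += f"[{quote(str(key))}]"      # select one entity by key
    if navigation is not None:
        path += "/" + quote(navigation)     # follow a relationship, e.g. Orders
    url = BASE + path
    if fmt is not None:
        url += "?$format=" + quote(fmt)     # ask for an alternative representation
    return url

print(astoria_url("Customers", "FRANK", "Orders"))
```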

Astoria is still evolving

Ongoing discussion about the format of requests:
/Customers!'FRANK'
/Customers!'FRANK'/Orders!10267
/Customers!CustomerID='FRANK'
/Customers('FRANK')
/Customers('FRANK')/Orders(10267)

Qualifiers control the response format:
/Customers('FRANK')/CustomerName
/Customers('FRANK')/CustomerName/$value
/Customers('FRANK')/$format=json
/Customers/$skip=30&$take=10
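Assume `$skip` and `$take` mean what their names suggest: skip n × k entities, then return the next k. Under that reading, fetching page n (counting from zero) of size k is arithmetic, and /Customers/$skip=30&$take=10 is the fourth page of ten customers. The sketch renders the qualifiers in the form proposed on the slide, which was itself still under discussion. The function names are hypothetical.

```python
def page_qualifiers(page, page_size):
    """Return the ($skip, $take) pair for a zero-based page number."""
    if page < 0 or page_size <= 0:
        raise ValueError("page must be >= 0 and page_size > 0")
    return page * page_size, page_size

def paged_path(entity_set, page, page_size):
    """Render the paging qualifiers in the form proposed on the slide."""
    skip, take = page_qualifiers(page, page_size)
    return f"/{entity_set}/$skip={skip}&$take={take}"

print(paged_path("Customers", 3, 10))   # /Customers/$skip=30&$take=10
```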

Currently being Microsoftened…

Where is all this information going to come from?

Crowdsourcing

Jeff Howe, Wired Magazine, June 2006
Delegating an activity to a large number of unidentified individuals
Small, finite tasks
Quantity more important than quality
The sum is greater than the parts

Examples: Wikipedia, Galaxy Zoo, Amazon Mechanical Turk, Google route planner

Google Maps

Consequences:
Drives down the cost of data
Owners may not be the traditional incumbents
The client / user needs to discriminate

The Power of Information Review
Commissioned by the Cabinet Office and published in June 2007 to review and advise on the use of public sector information.

Recommendation 9: By Budget 2008, government should commission and publish an independent review of the costs and benefits of the current trading fund charging model for the re-use of public sector information, including the role of the five largest trading funds, the balance of direct versus downstream economic revenue, and the impact on the quality of public sector information.

US: Public Domain
UK: Crown Copyright

AND - Automotive Navigation Data

Press release: July 4, 2007, Rotterdam
AND Automotive Navigation Data has agreed ... to donate digital maps of the Netherlands, China and India to the community.

More ways of querying the web

Google Search
Google Events
Google Base
Yahoo! Pipes
RSS – Really Simple Syndication
KML
BBC Backstage
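Of these, RSS is the simplest to query programmatically. A feed is an XML document whose channel contains item elements, each with its own title and link. This minimal sketch uses Python's standard ElementTree on a made-up feed, where the titles and example.com links are placeholders. It handles only plain RSS 2.0 without namespaces. `feed_items` is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# A tiny, made-up RSS 2.0 document used for illustration.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item><title>First post</title><link>http://example.com/1</link></item>
    <item><title>Second post</title><link>http://example.com/2</link></item>
  </channel>
</rss>"""

def feed_items(xml_text):
    """Return (title, link) pairs for every <item> in an RSS 2.0 channel."""
    root = ET.fromstring(xml_text)
    return [(item.findtext("title"), item.findtext("link"))
            for item in root.iterfind("./channel/item")]

for title, link in feed_items(SAMPLE_FEED):
    print(title, link)
```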

The Internet is the Database

Thank you

Questions?