Architecture of the World Wide Web Web Information Systems · 2008-01-30 · rfc2616.txt • Basis of...
Architecture of the World Wide Web
Web Information Systems
CS/INFO 431
January 30, 2008 Carl Lagoze – Spring 2008
Acknowledgments
• Erik Wilde – UC Berkeley – http://dret.net/lectures/infosys-ws06/http
[Figure: page components index.htm, cjl.jpg, base.css]
Looking beyond the Browser
Think…
• What is the information unit here?
  – What is the binding principle?
• What is the difference between:
  – Human perception
  – Machine interpretation
  – Architectural support
http://www.w3.org/History/1989/proposal.html
Web as a graph
Web as a graph
http://www.w3.org/TR/webarch/
Note: three URLs for the “same” intellectual object
Naïve view of web graph
[Figure: graph whose nodes are labeled URL]
Think for a second…
• When I access google.com on my cell phone it looks different than on my desktop
• When I access google.com from Paris it looks different than when I access it from Ithaca
Architectural Components of the Web
• Participants
  – Servers
  – Web Agents
    • General agents such as robots (e.g., the Google crawler)
    • Specialized as User Agents
• Identification
  – URIs to identify Resources
  – URIs have schemes (e.g., HTTP, FTP) that define the “mechanism” for resolution to a resource
    • Some URIs resolve – URLs
    • Some do not resolve – INFO URIs
• Interaction
  – Standardized protocols with exchange of messages
    • One example: HTTP
  – Requests result in return of Representations
• Formats
  – Representation is in the form of a sequence of bytes with a media type that provides hints to the agent about processing
  – One example of a format is a MIME type
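As a small illustration of how the scheme is the part of a URI that tells an agent which resolution mechanism applies, the sketch below splits a few URIs with Python's standard `urllib.parse.urlsplit`. The `describe` helper and the sample URIs are illustrative assumptions, not part of the lecture.

```python
from urllib.parse import urlsplit

def describe(uri: str) -> dict:
    """Split a URI into the parts an agent inspects before trying to resolve it."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,     # e.g. "http", "ftp", "info"
        "authority": parts.netloc,  # host (and optional port); empty for info URIs
        "path": parts.path,
    }

# Sample URIs chosen for illustration only.
for uri in ("http://www.w3.org/TR/webarch/",
            "ftp://ftp.isi.edu/in-notes/rfc2616.txt",
            "info:lccn/2002022641"):
    print(describe(uri))
```

The `info:` URI has no authority component, which matches the point that it identifies a resource without naming a server to resolve it against.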
A resource is…
• An entity that has an identity (a URI)
  – Some resources are digital – URI <-> URL
  – Some are non-digital – people, institutions, etc.
• Abstract – you can’t examine/touch/see a resource
  – Information hiding
• A service point for initiating protocol (HTTP) actions
• Target of links (you make hyperlinks to a URI)
  – <a href="http://google.com">
• Each URI identifies only ONE resource
A representation is…
• The result of applying a service request upon a resource
• What the server determines to be the state of the resource
  – Parameterized
    • Time, space
    • Request parameters
  – Many-to-one mapping from representations to a resource (one resource, many representations)
• A package:
  – Metadata – about request, server actions, agent
  – Data – the “content”
• The entity that is processed by a web agent
  – In the case of a browser (user agent), rendered and displayed
  – Note that many agents such as crawlers make extensive use of metadata (last-modified)
• The entity that is the source of links
  – <a href="http://google.com">
Use cases of representations in increasing complexity
• Static transcription of a file
  – foo.html is disseminated from http://my.org/foo.html
• Dynamic dissemination of data based on static file
  – Translation from base jpeg to thumbnail
  – Result of PHP program
• Multiple representations from resource based on user agent or other parameters
  – google for cell phones and desktops
  – Content negotiation
• Totally time-dependent representations
  – http://cnn.com
• Questions:
  – What does a URI identify?
  – What is a resource as an information entity?
Real nature of web graph
[Figure: graph of URL nodes, each annotated with its representations (R1; R1, R2; R1, R2, R3)]
Think about it…
The web graph is completely ephemeral: it is based on representations
that depend on time and agent context
Closer look at Resource/Representation Relationship
Content Negotiation
[Figure: a URI identifies a Resource; Representation 1 and Representation 2 each represent the Resource; labeled “Content Negotiation”]
HTTP (Hypertext Transfer Protocol)
• HTTP 1.1 – RFC 2616 ftp://ftp.isi.edu/in-notes/rfc2616.txt
• Basis of interaction between web agents and servers
• Layered on top of TCP
• Uses DNS
• Text-based
  – All request and response headers are human readable
• Stateless
  – No client/server session state kept between requests (the TCP connection itself may be reused in HTTP 1.1)
  – All state carried in protocol request/reply (cookies)
Simple HTTP interaction: http://google.com
[Figure: sequence of steps between the agent and the Web Server: DNS, TCP open circuit, HTTP GET index.html, TCP close circuit]
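The four steps on this slide (DNS, TCP open, HTTP GET, TCP close) can be sketched with Python's standard `socket` module. This is a minimal sketch under stated assumptions: the function name `simple_get`, the 10-second timeout, and the `Connection: close` header are choices made for the example, and a real agent would also handle redirects, chunked bodies, and errors.

```python
import socket

def simple_get(host: str, port: int = 80, path: str = "/") -> bytes:
    """Perform one HTTP GET and return the raw response bytes."""
    # 1. DNS: resolve the host name to a socket address.
    family, socktype, proto, _, addr = socket.getaddrinfo(
        host, port, type=socket.SOCK_STREAM)[0]
    # 2. TCP: open the connection.
    with socket.socket(family, socktype, proto) as sock:
        sock.settimeout(10)
        sock.connect(addr)
        # 3. HTTP: send the request. "Connection: close" asks the server
        #    to close the connection after this single response.
        request = (f"GET {path} HTTP/1.1\r\n"
                   f"Host: {host}\r\n"
                   "Connection: close\r\n\r\n")
        sock.sendall(request.encode("ascii"))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the server has closed its side
                break
            chunks.append(data)
    # 4. TCP: leaving the "with" block has closed the socket.
    return b"".join(chunks)
```

Calling `simple_get("google.com")` on a networked machine would return the status line, headers, and body as one byte string.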
HTTP Request Types
• GET – initiate a retrieval action based on the URI
• POST – transmit information package to URI at server
• HEAD – same as GET but return only header (metadata) information
• PUT – store the request content at the URI
• DELETE – remove resource at URI
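To make the differences between the request types concrete, here is a toy in-memory model of a server. `ToyServer` is a hypothetical class written for this illustration only; it does no networking, and the status codes it returns follow common HTTP/1.1 conventions rather than anything stated on the slide.

```python
class ToyServer:
    """In-memory stand-in for a web server (hypothetical, illustration only)."""

    METHODS = {"GET", "POST", "HEAD", "PUT", "DELETE"}

    def __init__(self):
        self.store = {}  # maps a URI path to the bytes stored there

    def handle(self, method, path, body=b""):
        """Return (status, headers, body) for a single request."""
        if method not in self.METHODS:
            return 501, {}, b""                   # method not implemented
        if method == "PUT":
            created = path not in self.store
            self.store[path] = body               # store the request content at the URI
            return (201 if created else 200), {}, b""
        if method == "POST":
            # The receiving resource decides what to do with the package;
            # this toy only acknowledges how much it received.
            return 200, {}, f"received {len(body)} bytes".encode("ascii")
        if path not in self.store:
            return 404, {}, b""
        if method == "DELETE":
            del self.store[path]                  # remove the resource at the URI
            return 204, {}, b""
        content = self.store[path]
        headers = {"Content-Length": str(len(content))}
        if method == "HEAD":
            return 200, headers, b""              # same metadata as GET, no data
        return 200, headers, content              # GET: metadata plus data

server = ToyServer()
server.handle("PUT", "/foo.html", b"<html></html>")
print(server.handle("HEAD", "/foo.html"))
print(server.handle("GET", "/foo.html"))
```

The HEAD and GET results carry identical headers; only the body differs, which is why crawlers can use HEAD to check metadata cheaply.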
HTTP Example
[Figure: annotated HTTP trace showing TCP Connection, HTTP Request, HTTP Response Headers (metadata), HTTP Response Data (chunked)]
HTTP Request
• Start line
  – Consists of method, path, version
      GET /index.html HTTP/1.1
  – Valid methods include:
    • GET, POST, HEAD, PUT, DELETE
• Headers
  – HTTP/1.1 requires a Host: header
      Host: cs.cornell.edu
  – Many other headers
• Optional body content
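The request layout described above (start line, headers, blank line, optional body) can be sketched as a small serializer. `build_request` is a hypothetical helper for illustration; it assumes ASCII header values and adds a `Content-Length` header only when a body is present.

```python
from typing import Dict, Optional

def build_request(method: str, path: str, host: str,
                  headers: Optional[Dict[str, str]] = None,
                  body: bytes = b"") -> bytes:
    """Serialize an HTTP/1.1 request: start line, headers, blank line, body."""
    lines = [f"{method} {path} HTTP/1.1",  # start line: method, path, version
             f"Host: {host}"]              # the header HTTP/1.1 requires
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    if body:
        lines.append(f"Content-Length: {len(body)}")
    # Lines end with CRLF; an empty line separates headers from the body.
    head = "\r\n".join(lines) + "\r\n\r\n"
    return head.encode("ascii") + body

print(build_request("GET", "/index.html", "cs.cornell.edu").decode("ascii"))
```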
HTTP Response
• Start line
  – Consists of HTTP version, status code, and description
      HTTP/1.1 200 OK
      HTTP/1.1 404 Not Found
• Headers (many header definitions)
      Content-Type: text/html
• Content
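Going the other way, the response layout (start line, headers, content) can be pulled apart with a few string operations. `parse_response` below is a hypothetical helper, a sketch that assumes the whole response is already in memory and ignores chunked encoding and repeated headers.

```python
def parse_response(raw: bytes):
    """Split a raw HTTP response into (version, status, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line ends the headers
    lines = head.decode("iso-8859-1").split("\r\n")
    parts = lines[0].split(" ", 2)              # start line: version, code, description
    version, status = parts[0], int(parts[1])
    reason = parts[2] if len(parts) > 2 else ""
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, status, reason, headers, body

raw = (b"HTTP/1.1 200 OK\r\n"
       b"Content-Type: text/html\r\n"
       b"Content-Length: 13\r\n"
       b"\r\n"
       b"<html></html>")
print(parse_response(raw))
```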
HTTP Response Codes
• Response coded by first digit
  – 1xx: informational, request received
  – 2xx: success, request accepted
  – 3xx: redirection
  – 4xx: client error
  – 5xx: server error
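Because the class of a response is carried by the first digit alone, an agent can classify any status code with integer division. The helper name `status_class` is an assumption made for this sketch.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}

def status_class(code: int) -> str:
    """Classify an HTTP status code by its first digit."""
    return STATUS_CLASSES.get(code // 100, "unknown")

for code in (200, 302, 404, 503):
    print(code, status_class(code))
```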
Simple HTTP GET - Response
GET /path/file.html HTTP/1.1
Host: cs.cornell.edu
User-Agent: Mozilla/3.0
[Blank line]

HTTP/1.1 200 OK
Content-Type: text/html
Date: Wed, 31 Jan 2007 14:58:57 GMT
Content-Length: 1354
[Blank line]
<html> <head> …
Simple HTTP POST
POST /path/script.php HTTP/1.1
Host: cs.cornell.edu
User-Agent: Mozilla/3.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 32
[Blank line]
home=Cosby&favorite+flavor=flies
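The form body and its `Content-Length` in the POST example can be reproduced with Python's standard `urllib.parse.urlencode`, which applies the `application/x-www-form-urlencoded` rules (spaces become `+`). The field names are taken from the slide; the variable names are assumptions for the sketch.

```python
from urllib.parse import urlencode

# Form fields from the slide; the space in the second name is encoded as "+".
form = {"home": "Cosby", "favorite flavor": "flies"}

body = urlencode(form)
content_length = len(body.encode("ascii"))  # Content-Length counts bytes, not characters

print(body)            # home=Cosby&favorite+flavor=flies
print(content_length)  # 32
```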
HTTP Content Negotiation (server-side)
• Resources may have multiple dimensions
• General idea
  – Web agent makes HTTP requests stating constraints
  – Using constraints, server decides on “best” representation
• HTTP-defined constraints are language, encoding, format, character encoding
  – Accept, Accept-Charset, Accept-Encoding, Accept-Language
• Server may also use:
  – User-Agent: device specificity
  – Host: localization
  – Cookies
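The server-side decision can be sketched as: parse the agent's `Accept`-style header into preferences weighted by `q` values, then return the first available representation that matches the highest-weighted preference. This is a simplified sketch, not the full HTTP/1.1 algorithm: it ignores the rule that more specific media ranges take precedence, and the names `parse_accept`, `matches`, and `choose` are assumptions made for the example.

```python
def parse_accept(header: str):
    """Parse an Accept-style header into (value, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((parts[0], q))
    # sorted() is stable, so ties keep the order the agent listed them in.
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def matches(wanted: str, offered: str) -> bool:
    """True if an offered value satisfies a wanted value or wildcard range."""
    if wanted in ("*", "*/*"):
        return True
    if wanted.endswith("/*"):
        return offered.startswith(wanted[:-1])  # "text/*" matches "text/html"
    return wanted == offered

def choose(available, accept_header):
    """Return the "best" available representation, or None if none is acceptable."""
    for wanted, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for offered in available:
            if matches(wanted, offered):
                return offered
    return None

available = ["text/html", "application/xml", "image/jpeg"]
print(choose(available, "application/xml;q=0.9, text/html"))
print(choose(["en", "fr"], "fr;q=0.8, de, *;q=0.1"))
```

When `choose` returns `None`, a real server would typically answer with a 406 (Not Acceptable) status or fall back to a default representation.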
Header request examples for content negotiation
Other types of content negotiation
• Client-side
  – Server responds with list of different representations
  – Client (or user) makes a choice
• Transparent
  – Cache plays a role in client-side negotiation