Lecture 4 Basic Web Concepts

CS 502 Computing Methods for Digital Libraries
Cornell University, Computer Science
Herbert Van de Sompel

HyperText Transfer Protocol (HTTP)

Diagram: the web browser (HTTP client, at IP address 1) sends an HTTP request over the TCP/IP network to the web server (HTTP server, at IP address 2); the server returns an HTTP response, which the browser renders.

Transmission Control Protocol/Internet Protocol (TCP/IP)

• is the protocol suite that drives the Internet

• handles network communications between network nodes (computers, printers, webcams, … connected to the Internet)

• protocol suite:

• TCP: reliable, connection-oriented communication of data between applications

• IP: communication of data between nodes

• UDP: connectionless communication between applications, without delivery guarantees

• ICMP: error reporting and network statistics
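The difference between TCP (a connection is opened before data flows) and UDP (a datagram is simply sent) is visible in Python's standard `socket` module. A minimal sketch, assuming both applications run on the local loopback address 127.0.0.1 and that the operating system picks free ports; the function names `tcp_exchange` and `udp_exchange` are hypothetical.

```python
import socket

LOOPBACK = "127.0.0.1"  # assumption: both applications run on this machine


def tcp_exchange(message: bytes) -> bytes:
    """Send message over a TCP connection and return what the receiver gets."""
    with socket.create_server((LOOPBACK, 0)) as listener:  # port 0: OS picks one
        # create_connection opens the connection (the TCP handshake).
        with socket.create_connection(listener.getsockname()) as client:
            conn, _addr = listener.accept()
            with conn:
                client.sendall(message)
                client.shutdown(socket.SHUT_WR)  # no more data from the client
                received = b""
                while chunk := conn.recv(1024):  # TCP delivers a byte stream
                    received += chunk
                return received


def udp_exchange(message: bytes) -> bytes:
    """Send message as one UDP datagram; no connection is opened first."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as receiver, \
            socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sender:
        receiver.bind((LOOPBACK, 0))
        receiver.settimeout(2.0)  # UDP does not guarantee delivery
        sender.sendto(message, receiver.getsockname())
        data, _addr = receiver.recvfrom(1024)
        return data


print(tcp_exchange(b"hello over TCP"))
print(udp_exchange(b"hello over UDP"))
```

On loopback the UDP datagram will practically always arrive; across a real network it may be lost, which is why the receiver uses a timeout.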

TCP/IP protocol architecture

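Diagram: an HTTP request sent by the client at the application layer passes down through the layers below, crosses the network, and passes back up through the same layers to the server, which receives the HTTP request.

Application layer: HTTP request

Transport layer: TCP

Internet layer: IP

Network Access layer: Ethernet, …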

Transmission Control Protocol (TCP)

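• breaks the message up into chunks

• chunks get a sequence number and the IP address of the addressee

• opens a connection with the addressee (three-way handshake)

• hands chunks over to the IP layer

• guarantees error-free, in-order delivery of chunks at the addressee (through the connection): received chunks are acknowledged, lost or damaged chunks are sent again, and sequence numbers restore the original order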
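The chunking and reassembly steps can be simulated in a few lines. This is a toy model, not real TCP: it assumes a hypothetical fixed chunk size of 8 bytes and numbers chunks 0, 1, 2, … (real TCP numbers bytes, starting from an initial sequence number chosen when the connection opens), it shuffles the chunks to mimic out-of-order arrival, and it leaves out acknowledgements and retransmission.

```python
import random

CHUNK_SIZE = 8  # hypothetical; real segment sizes depend on the network


def split_into_chunks(message: bytes) -> list[tuple[int, bytes]]:
    """Break a message into (sequence_number, data) chunks."""
    return [
        (seq, message[offset:offset + CHUNK_SIZE])
        for seq, offset in enumerate(range(0, len(message), CHUNK_SIZE))
    ]


def reassemble(chunks: list[tuple[int, bytes]]) -> bytes:
    """Put chunks back in the original order using their sequence numbers."""
    return b"".join(data for _seq, data in sorted(chunks))


message = b"GET / HTTP/1.1\r\nHost: no.good.com\r\n\r\n"
chunks = split_into_chunks(message)
random.shuffle(chunks)  # chunks may arrive in a different order
assert reassemble(chunks) == message
```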

Internet Protocol (IP)

• handles the routing of chunks towards the addressee (through routers)

• IP addressing:

• each node has an IP address, e.g. 157.193.101.6

• each node can have a readable name, e.g. erlserv.rug.ac.be

• DNS (Domain Name System) maps readable names to IP addresses

• IP data transmission:

• the sender delivers a chunk to a router (via a lower-level protocol)

• a router delivers the chunk to the next router or to the destination host

• individual chunks can be delivered via different paths

• routers decide which path each chunk takes (the path of least resistance)

• at the addressee, IP delivers the chunk to the TCP layer
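An IP (version 4) address such as 157.193.101.6 is a 32-bit number written as four bytes. Python's standard `ipaddress` module makes this visible. The lookup table below is a hypothetical stand-in for DNS (a real program would query DNS, e.g. with `socket.gethostbyname`), and pairing erlserv.rug.ac.be with 157.193.101.6 is an assumption made only for illustration, since the slide lists them as separate examples.

```python
import ipaddress

# Hypothetical name table standing in for DNS; the pairing is assumed.
NAME_TABLE = {"erlserv.rug.ac.be": "157.193.101.6"}


def resolve(name: str) -> ipaddress.IPv4Address:
    """Map a readable name to an IP address using the local table."""
    return ipaddress.IPv4Address(NAME_TABLE[name])


addr = resolve("erlserv.rug.ac.be")
print(addr)         # 157.193.101.6
print(int(addr))    # the same address as one 32-bit integer
print(addr.packed)  # the four bytes that are carried in the IP header
```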

TCP/IP protocol architecture

Application layer: HTTP, FTP, telnet

Transport layer: TCP, UDP

Internet layer: IP, ICMP

Network Access layer: Ethernet, …
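Each layer wraps the data handed down from the layer above in its own header, and the receiving node removes those headers in reverse order. The sketch below models this encapsulation with readable placeholder strings; the bracketed header texts are hypothetical (real headers are binary structures), port 80 is the default HTTP port, and "IP-address-1" and "IP-address-2" stand for the two nodes.

```python
def encapsulate(http_request: str) -> str:
    """Wrap an application-layer message in the headers of the lower layers."""
    segment = f"[TCP dst-port=80]{http_request}"                 # transport layer
    packet = f"[IP src=IP-address-1 dst=IP-address-2]{segment}"  # internet layer
    return f"[Ethernet]{packet}[/Ethernet]"                      # network access layer


def decapsulate(frame: str) -> str:
    """At the receiving node, strip the headers again in reverse order."""
    packet = frame.removeprefix("[Ethernet]").removesuffix("[/Ethernet]")
    segment = packet[packet.index("]") + 1:]  # drop the IP header
    return segment[segment.index("]") + 1:]   # drop the TCP header


request = "GET / HTTP/1.1\r\nHost: no.good.com\r\n\r\n"
frame = encapsulate(request)
print(frame)
assert decapsulate(frame) == request
```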

HTTP request

Diagram: the web browser (HTTP client) sends the following HTTP request to the web server (HTTP server) no.good.com. It consists of a method line, a header, and an entity-body (empty here):

GET / HTTP/1.1
Date: Wednesday, 02-Feb-99 23:04:12 GMT
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: no.good.com
Connection: Keep-Alive
* a blank line *

HTTP request

method line: method URI HTTP-version, e.g. GET / HTTP/1.1 (the method is one of GET, POST, HEAD, PUT, …)

header:

• general-header: optional, general information
Date: Wednesday, 02-Feb-99 23:04:12 GMT
Connection: Keep-Alive

• request-header: about the client
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)

• entity-header: about the entity-body

entity-body: what is sent to the server
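An HTTP request is plain text: the method line, one line per header field, a blank line, and then the entity-body if there is one; every line ends in CR LF. A minimal sketch that assembles such a message; the function name `build_request` is hypothetical and the header values are copied from the example request.

```python
def build_request(method: str, uri: str, headers: dict[str, str], body: str = "") -> bytes:
    """Assemble an HTTP/1.1 request message as raw bytes."""
    lines = [f"{method} {uri} HTTP/1.1"]                              # method line
    lines += [f"{name}: {value}" for name, value in headers.items()]  # header
    # A blank line separates the header from the (possibly empty) entity-body.
    return ("\r\n".join(lines) + "\r\n\r\n" + body).encode("ascii")


request = build_request("GET", "/", {
    "Date": "Wednesday, 02-Feb-99 23:04:12 GMT",
    "Accept-Language": "en-us",
    "User-Agent": "Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)",
    "Host": "no.good.com",
    "Connection": "Keep-Alive",
})
print(request.decode("ascii"))
```

These bytes are what the client hands to TCP to send to the server.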

HTTP response

Diagram: the web server (HTTP server) no.good.com returns the following HTTP response to the web browser (HTTP client). It consists of a status line, a header, and an entity-body:

HTTP/1.1 200 OK
Date: Wednesday, 02-Feb-99 23:04:25 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Sun, 01 Feb 1999 13:54:26 GMT
ETag: "2f5cd-964-38js8"
Content-length: 327
Connection: close
Content-Type: text/html
* a blank line *
<title>Welcome to nogood</title><img src="/images/nogood-logo.gif">

HTTP response

status line: HTTP-version Status-code Reason-phrase, e.g. HTTP/1.1 200 OK

header:

• general-header: optional, general information
Date: Wednesday, 02-Feb-99 23:04:25 GMT

• response-header: about the server
Server: Apache/1.3.6 (Unix)

• entity-header: about the entity-body
Content-Type: text/html
ETag: "2f5cd-964-38js8"
Content-length: 327

entity-body: what is sent to the client
<title>Welcome to nogood</title><img src="/images/nogood-logo.gif">
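The client splits a response back into its three parts: everything before the first blank line is the status line plus header, everything after it is the entity-body. A minimal parser, sketched under the assumption that the whole response is already in memory and uses CR LF line endings; the name `parse_response` is hypothetical, and header names are lower-cased because HTTP header names are case-insensitive.

```python
def parse_response(raw: bytes) -> tuple[tuple[str, int, str], dict[str, str], bytes]:
    """Split a raw HTTP response into status line, header fields and entity-body."""
    head, _, body = raw.partition(b"\r\n\r\n")  # the blank line ends the header
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")    # split at the first colon only
        headers[name.strip().lower()] = value.strip()
    return (version, int(code), reason), headers, body


raw = (b"HTTP/1.1 200 OK\r\n"
       b"Date: Wednesday, 02-Feb-99 23:04:25 GMT\r\n"
       b"Server: Apache/1.3.6 (Unix)\r\n"
       b'ETag: "2f5cd-964-38js8"\r\n'
       b"Content-Type: text/html\r\n"
       b"\r\n"
       b"<title>Welcome to nogood</title>")
status, headers, body = parse_response(raw)
print(status)                   # ('HTTP/1.1', 200, 'OK')
print(headers["content-type"])  # text/html
```

The Content-Type value is what tells the browser how to present the entity-body.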

HTTP request

Diagram: to display the image referenced in the page (<img src="/images/nogood-logo.gif">), the web browser (HTTP client) sends a second HTTP request to the web server (HTTP server) no.good.com:

GET /images/nogood-logo.gif HTTP/1.1
Date: Wednesday, 02-Feb-99 23:04:27 GMT
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: no.good.com
Connection: Keep-Alive
* a blank line *

HTTP response

Diagram: the web server (HTTP server) no.good.com returns the GIF file to the web browser (HTTP client):

HTTP/1.1 200 OK
Date: Wednesday, 02-Feb-99 23:04:29 GMT
Server: Apache/1.3.6 (Unix)
Last-Modified: Sun, 01 Feb 1999 08:20:00 GMT
ETag: "2f5cd-964-445e"
Content-length: 220
Connection: close
Content-Type: image/gif
* a blank line *
the GIF file
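The request for /images/nogood-logo.gif happens because the HTML in the first response contains an `img` element pointing to it: the browser scans the page for embedded resources and fetches each one with its own GET request. A sketch of that scanning step with Python's standard `html.parser`; looking only at `img` elements is a simplifying assumption, and the class name `ImageFinder` is hypothetical.

```python
from html.parser import HTMLParser


class ImageFinder(HTMLParser):
    """Collect the src attribute of every <img> element in a page."""

    def __init__(self) -> None:
        super().__init__()
        self.sources: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            for name, value in attrs:
                if name == "src" and value:
                    self.sources.append(value)


page = '<title>Welcome to nogood</title><img src="/images/nogood-logo.gif">'
finder = ImageFinder()
finder.feed(page)
for src in finder.sources:
    # Each embedded resource triggers its own request to the same host.
    print(f"GET {src} HTTP/1.1")
```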

HyperText Transfer Protocol (HTTP)

Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server); the HTTP response delivers a MIME type plus a file, which the browser renders.

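Browser

Diagram: the browser uses the MIME type of a received file to choose the presentation software that displays it. The presentation software can be:

• built into the browser

• a plug-in

• a helper application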
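The choice of presentation software can be pictured as a lookup keyed on the MIME type from the Content-Type header. The table below is hypothetical: which types a given browser handles itself, through a plug-in, or through a helper application differs from browser to browser, and the "save to disk" fallback is an assumption.

```python
# Hypothetical mapping from MIME type to the kind of presentation software.
HANDLERS = {
    "text/html": "built into browser",
    "image/gif": "built into browser",
    "application/pdf": "plug-in",
    "application/postscript": "helper application",
}


def choose_handler(content_type: str) -> str:
    """Pick presentation software for a Content-Type header value."""
    # Drop parameters such as "; charset=iso-8859-1"; MIME types are case-insensitive.
    mime_type = content_type.split(";", 1)[0].strip().lower()
    return HANDLERS.get(mime_type, "save to disk")  # assumed fallback


print(choose_handler("image/gif"))
print(choose_handler("text/html; charset=iso-8859-1"))
```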

HTTP Proxies

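Diagram: an HTTP proxy sits between the web browser (HTTP client) and the web server (HTTP server) no.good.com. It acts as a server toward the client and as a client toward the server, and keeps a cache.

• Reduce network traffic: caching (ETag, Last-Modified)

• IP-based authentication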
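A caching proxy can avoid transferring an unchanged file again by remembering the ETag of each response and sending it back in an If-None-Match header; the server answers 304 Not Modified when the entity has not changed. The sketch below models this with a hypothetical in-memory `origin` function instead of a real network connection; the class name `CachingProxy` is also hypothetical, and the ETag and body are taken from the example response.

```python
from typing import Callable

# Hypothetical origin server: (uri, request headers) -> (status, headers, body).
Origin = Callable[[str, dict[str, str]], tuple[int, dict[str, str], bytes]]


class CachingProxy:
    def __init__(self, origin: Origin) -> None:
        self.origin = origin
        self.cache: dict[str, tuple[str, bytes]] = {}  # uri -> (etag, body)
        self.bytes_from_origin = 0

    def get(self, uri: str) -> bytes:
        headers = {}
        if uri in self.cache:
            headers["If-None-Match"] = self.cache[uri][0]  # conditional GET
        status, resp_headers, body = self.origin(uri, headers)
        self.bytes_from_origin += len(body)
        if status == 304:  # not modified: serve the cached copy
            return self.cache[uri][1]
        if "ETag" in resp_headers:
            self.cache[uri] = (resp_headers["ETag"], body)
        return body


def origin(uri, headers):
    etag = '"2f5cd-964-38js8"'
    if headers.get("If-None-Match") == etag:
        return 304, {"ETag": etag}, b""
    return 200, {"ETag": etag}, b"<title>Welcome to nogood</title>"


proxy = CachingProxy(origin)
proxy.get("/")  # first request: full body comes from the server
proxy.get("/")  # second request: 304, body comes from the cache
print(proxy.bytes_from_origin)
```

Only the first request transfers the entity-body; the second costs just a short 304 exchange.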

HTTP cookies

• The HTTP protocol is stateless: once a server has given a response to a client, it forgets about it. There is no session information.

• Fake state with cookies:

• the server sends a token to the client (Set-Cookie response header)

• the client sends the token back to the server (Cookie request header)

• the server understands the meaning of the token

• for instance: the server avoids requiring a username/password with every request by reading the authorization from the cookie
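The token exchange can be sketched with Python's standard `http.cookies` module. Assumptions: the cookie name `session`, the random token, and the server-side dictionary `sessions` that gives the token its meaning are all hypothetical choices for illustration; the function names are hypothetical too.

```python
import secrets
from http.cookies import SimpleCookie
from typing import Optional

sessions: dict[str, str] = {}  # hypothetical server-side table: token -> user


def issue_cookie(username: str) -> str:
    """Server: after one password check, hand the client a token."""
    token = secrets.token_hex(16)
    sessions[token] = username
    cookie = SimpleCookie()
    cookie["session"] = token
    return cookie.output(header="Set-Cookie:")  # one response header line


def user_from_cookie(cookie_header: str) -> Optional[str]:
    """Server: interpret the token the client sent back in its Cookie header."""
    morsel = SimpleCookie(cookie_header).get("session")
    return sessions.get(morsel.value) if morsel is not None else None


set_cookie = issue_cookie("herbert")
print(set_cookie)                            # Set-Cookie: session=<random token>
token = set_cookie.split("=", 1)[1]          # the client stores the token ...
print(user_from_cookie(f"session={token}"))  # ... and sends it back: herbert
```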

Dynamic content: Common Gateway Interface (CGI)

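Diagram: the web browser (HTTP client) sends an HTTP request to the web server (HTTP server) no.good.com; the server runs a program through CGI and returns the program's output to the browser as the HTTP response.

• Client interaction with non-web servers, through the program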

CGI -- HTTP POST request

Diagram: the web browser (HTTP client) sends an HTTP POST request to the web server (HTTP server) no.good.com, which passes it via CGI to the program find. The form data travels in the entity-body:

POST /cgi-bin/find HTTP/1.1
Date: Wednesday, 02-Feb-99 23:04:27 GMT
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: no.good.com
Connection: Keep-Alive
Content-length: 26
Content-type: application/x-www-form-urlencoded
* a blank line *
search=herbert&type=author
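The entity-body of the POST request is the form data in application/x-www-form-urlencoded format, and Content-length counts its bytes. Python's standard `urllib.parse.urlencode` produces this encoding, which lets the length in the example be checked; the form fields are taken from the example request.

```python
from urllib.parse import urlencode

form = {"search": "herbert", "type": "author"}  # form fields from the example
body = urlencode(form)
print(body)                       # search=herbert&type=author
print(len(body.encode("ascii")))  # 26, matching Content-length: 26

# Special characters are escaped; a space becomes "+".
print(urlencode({"search": "van de sompel"}))  # search=van+de+sompel
```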

CGI -- HTTP GET request

Diagram: the same search sent as an HTTP GET request. The form data travels in the URI, after the "?", and the entity-body is empty. The web server (HTTP server) no.good.com passes the request via CGI to the program find:

GET /cgi-bin/find?search=herbert&type=author HTTP/1.1
Date: Wednesday, 02-Feb-99 23:04:27 GMT
Accept-Language: en-us
User-Agent: Mozilla/4.0 (compatible; MSIE 5.01; Windows NT)
Host: no.good.com
Connection: Keep-Alive
* a blank line *
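With GET, the part of the URI after the "?" is the query string, in the same encoding a POST entity-body would use. Python's standard `urllib.parse` can split the request URI from the example and decode the fields:

```python
from urllib.parse import parse_qs, urlsplit

uri = "/cgi-bin/find?search=herbert&type=author"
parts = urlsplit(uri)
print(parts.path)             # /cgi-bin/find
print(parts.query)            # search=herbert&type=author
print(parse_qs(parts.query))  # {'search': ['herbert'], 'type': ['author']}
```

`parse_qs` returns a list per field because a form may send the same field name more than once.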

CGI - the interface

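Diagram: the web server (HTTP server) no.good.com starts the program find through CGI.

find receives input from:

• STDIN: for a POST request, the entity-body, e.g. search=herbert&type=author (for a GET request, the query string is passed in the QUERY_STRING environment variable instead)

• environment variables (about client, server, request, …), e.g.
SERVER_NAME server.good.com
REMOTE_HOST 157.193.101.6
…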
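A CGI program therefore reads its form data from one of two places, depending on the request method. The sketch follows the CGI conventions for REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH. Passing the environment and standard input as parameters (instead of always reading `os.environ` and `sys.stdin`) is a choice made here so the function can be tried without a web server; the name `read_form` is hypothetical, and reading STDIN as text is a simplification that is fine for ASCII form data.

```python
import io
import os
import sys
from urllib.parse import parse_qs


def read_form(environ=os.environ, stdin=sys.stdin) -> dict[str, list[str]]:
    """Return the decoded form fields a CGI program receives."""
    if environ.get("REQUEST_METHOD") == "POST":
        # POST: the entity-body arrives on STDIN, CONTENT_LENGTH characters long.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        data = stdin.read(length)
    else:
        # GET: the query string arrives in an environment variable.
        data = environ.get("QUERY_STRING", "")
    return parse_qs(data)


post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": "26",
            "SERVER_NAME": "server.good.com", "REMOTE_HOST": "157.193.101.6"}
print(read_form(post_env, io.StringIO("search=herbert&type=author")))

get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "search=herbert&type=author"}
print(read_form(get_env))
```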

CGI - the interface

Diagram: the program find writes its output to STDOUT; the web server (HTTP server) no.good.com adds header information (such as the status line) and sends the response to the client.

Output of find:
Content-type: text/html
* a blank line *
<title>Search results</title>…
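On the output side, the program only writes a CGI header (at least Content-type), a blank line, and the document; the web server turns this into a complete HTTP response. A minimal sketch of such output for find; the function name `write_results` and the result data are hypothetical, and user-supplied text is HTML-escaped before being placed in the page.

```python
import html
import io
import sys


def write_results(results: list[str], stdout=sys.stdout) -> None:
    """Write CGI output: header, blank line, then the HTML document."""
    stdout.write("Content-type: text/html\n")
    stdout.write("\n")  # the blank line ends the CGI header
    stdout.write("<title>Search results</title>\n<ul>\n")
    for item in results:
        stdout.write(f"<li>{html.escape(item)}</li>\n")  # escape data before embedding
    stdout.write("</ul>\n")


buffer = io.StringIO()
write_results(["Lecture 4 Basic Web Concepts"], buffer)  # hypothetical result
print(buffer.getvalue())
```

Writing to a passed-in stream keeps the function testable; run as a real CGI program it would write to `sys.stdout`.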

Dynamic content: Mobile code - JavaScript

Diagram: the HTTP response from the web server (HTTP server) no.good.com carries HTML with embedded JavaScript to the web browser (HTTP client).

• Executed by the browser

• User interface, client-side validation, …

Dynamic content: Mobile code – Java applets

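Diagram: the HTTP response from the web server (HTTP server) no.good.com delivers a Java applet to the web browser (HTTP client); the applet then communicates with the program find directly.

• Executed by a (Java) virtual machine

• Interaction with find not via HTTP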

Want to read a bit more?

• on Web Characterization http://www.w3.org/1999/05/WCA-terms/01

• on CGI http://www.ukans.edu/~acs/docs/other/forms-intro.shtml

• on Web, TCP/IP, CGI http://www.wdvl.com/Authoring/Tools/Tutorial/index4.html

• on HTTP http://www.ietf.org/rfc/rfc1945.txt?number=1945 ; http://www.jmarshall.com/easy/http/