Hypertext Transfer Protocol
CIS 373 Web Standards
Todd Will
Slide 2
CIS 373---Web Standards-HTTP 2 of 31
Topics
- Intro to HTTP
- Following links
- What actually happens during a request
- Content
- Tips and Tricks
- For Next Week
Slide 3
Intro
- HTTP is the Hypertext Transfer Protocol
- When you browse the web, you transfer data between the server and your client machine using HTTP
- Major steps performed (a simplified version of how HTTP works):
  1. You start up a browser that can understand and display HTML text
  2. You either click on a link or type a link into the address bar
  3. You make a request of a web server (it listens for and responds to requests for data from clients)
     - The request can be for any digital resource
  4. The web server executes the request and delivers the returned document to the user
  5. The web server identifies the type of the document to the browser
  6. The browser displays the document
  7. Images, JavaScript, and style sheets are downloaded if referenced
     - Each additional item that is retrieved generates an additional request to the server
- HTTP only defines how the browser and the web server communicate with each other
- The actual data is moved using the TCP/IP protocols
Slide 4
HTTP Versions
- HTTP/0.9
  - The earliest version; a very primitive standard
  - Very rarely used anymore
- HTTP/1.0
  - The first widely deployed version; still understood by servers
- HTTP/1.1
  - Extends and improves HTTP/1.0
  - Supported by all major browsers and servers
  - The client can keep the connection open after downloading a file, so that a new connection does not have to be set up for the next request
  - Decreases server load
  - Reduces bandwidth
Slide 5
What happens in HTTP?
- Step 1: Parse the URL
  - The browser must identify the URL of the request
  - Most URLs have the form: protocol://server/request-URI
  - The protocol part tells the browser how to retrieve the document
  - The server part tells the browser which server to query to find the document
  - The request-URI part tells the server the specific document to retrieve
- Step 2: Send the request
  - Most usually, the protocol will be http
  - Sometimes it can be https, to request the data over a secure connection
  - Assume you wanted the document http://web.njit.edu/~txw5999/index.html
  - The browser sends:
      GET /~txw5999/index.html HTTP/1.0
  - Note that the request is all the server sees, independent of where the request originated, whether from a robot, a link validator, or a browser
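The two steps above can be sketched in a few lines of Python. This is only an illustration, not how any particular browser is implemented; the helper name `request_line` is made up for this example, and the standard library function `urlsplit` does the URL parsing.

```python
from urllib.parse import urlsplit


def request_line(url, version="HTTP/1.0"):
    """Return (server, request line) for a GET of the given URL.

    Hypothetical helper for illustration only.
    """
    parts = urlsplit(url)
    # An empty path means the root document of the server.
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    return parts.netloc, f"GET {path} {version}"


server, line = request_line("http://web.njit.edu/~txw5999/index.html")
print(server)  # web.njit.edu
print(line)    # GET /~txw5999/index.html HTTP/1.0
```

The server name is used to open the connection; only the request line (plus any header fields) is actually sent to the web server.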
Slide 6
Server Response
- Step 3: The server response
  - Upon receiving the request, the web server must identify the document and return it to the user
  - Sample header content returned to the browser:
      HTTP/1.0 200 OK
      Server: Netscape-Communications/1.1
      Date: Tuesday, 25-Nov-97 01:22:04 GMT
      Last-modified: Thursday, 20-Nov-97 10:44:53 GMT
      Content-length: 6372
      Content-type: text/html
      ...
    followed by the HTML page
- HTTP/1.0 tells the browser the version of HTTP used
- 200 OK is the most common response; this is the code returned by the server to say all is well (more on this later)
- Server: Netscape-Communications/1.1 is the web server software that returns the document
- Date: Tuesday, 25-Nov-97 01:22:04 GMT is the date and time the response was generated
- Last-modified: Thursday, 20-Nov-97 10:44:53 GMT tells the last time the document was modified (useful in caching)
- Content-length: 6372 is the size of the document in bytes
- Content-type: text/html tells the browser the type of the returned document; it could be image/gif or something else
- [...] is the version of HTML to be used
- The browser does not care how the page was produced; it could be by scripts or straight HTML
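As a rough sketch of what the browser does with the sample header above, the following Python splits the header into the status line and the individual fields. The function name `parse_response_head` is an assumption made for this example; real browsers and libraries handle many more edge cases.

```python
RAW_RESPONSE_HEAD = (
    "HTTP/1.0 200 OK\r\n"
    "Server: Netscape-Communications/1.1\r\n"
    "Date: Tuesday, 25-Nov-97 01:22:04 GMT\r\n"
    "Last-modified: Thursday, 20-Nov-97 10:44:53 GMT\r\n"
    "Content-length: 6372\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
)


def parse_response_head(raw):
    """Split a response head into (version, status code, reason, headers)."""
    # The head ends at the first blank line; the document follows it.
    head = raw.split("\r\n\r\n", 1)[0]
    status_line, *header_lines = head.split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for header_line in header_lines:
        name, _, value = header_line.partition(":")
        # Field names are case-insensitive, so store them in lower case.
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers


version, code, reason, headers = parse_response_head(RAW_RESPONSE_HEAD)
print(version, code, reason)    # HTTP/1.0 200 OK
print(headers["content-type"])  # text/html
```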
Slide 7
The Client Request
- All requests follow the same basic pattern:
    [METH] [REQUEST-URI] HTTP/[VER]
    [fieldname1]: [field-value1]
    [fieldname2]: [field-value2]
    [request body, if any]
  - METH (for request method) is the action to perform, such as GET or HEAD
  - REQUEST-URI is the URL to be retrieved
  - VER is the HTTP version used
  - Field names and values are on the next slide
- Getting a document
  - A GET request means "send me a document"
  - Assume you wanted the document http://web.njit.edu/~txw5999/index.html
      GET /~txw5999/index.html HTTP/1.0
  - Longer version of a request:
      GET / HTTP/1.0
      User-Agent: Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)
      Accept: */*
      Host: web.njit.edu:81
- HEAD works just like GET, except that only the header will be retrieved
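The request pattern can be turned into a small builder function. This is a minimal sketch: `build_request` is a hypothetical name, and the only detail added beyond the slide is that lines end in CRLF and a blank line separates the header fields from the optional body, as the protocol requires.

```python
def build_request(method, request_uri, version="1.0", headers=None, body=""):
    """Assemble the text of an HTTP request from its parts."""
    lines = [f"{method} {request_uri} HTTP/{version}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    # A blank line separates the header fields from the optional body.
    return "\r\n".join(lines) + "\r\n\r\n" + body


request = build_request(
    "GET",
    "/",
    headers={
        "User-Agent": "Mozilla/3.0 (compatible; Opera/3.0; Windows 95/NT4)",
        "Accept": "*/*",
        "Host": "web.njit.edu:81",
    },
)
print(request)
```

Swapping "GET" for "HEAD" produces the header-only variant described above.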
Slide 8
GET Header Fields
Some of the header fields that can be used with GET are:
- User-Agent
  - Identifies the user agent (the browser or other client)
  - Example: "Mozilla/4.03 [en] (WinNT; I ;Nav)"
- Referer
  - The referer field (yes, the standard spells it this way)
  - Logs where the page request came from
  - Useful to find out which pages your audience is coming from
- If-Modified-Since
  - If the browser has the document in its cache, this field can be set to the time this version was last received
  - If the cached copy is out of date, the document is reloaded from the web server; otherwise the server answers 304 Not Modified
  - Checks to make sure that the cache is current
- From
  - Contains the email address of the person who is using the agent
  - A spammer's dream
  - Web robots sometimes use it so that webmasters can contact the sender of the robot
- Authorization
  - Holds the username and password of the user if authorization is required to access the page
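The If-Modified-Since check can be sketched from the server's point of view. This is an illustration under assumptions: the function name `conditional_status` is made up, and the dates are written in the RFC 1123 style ("Thu, 20 Nov 1997 10:44:53 GMT") rather than the older style shown in the sample headers.

```python
from email.utils import parsedate_to_datetime


def conditional_status(last_modified, if_modified_since=None):
    """Return 304 if the client's cached copy is still current, else 200."""
    if if_modified_since is None:
        # No cached copy on the client: send the full document.
        return 200
    modified = parsedate_to_datetime(last_modified)
    cached = parsedate_to_datetime(if_modified_since)
    # Unchanged since the client last received it: no need to resend.
    return 304 if modified <= cached else 200


doc_changed = "Thu, 20 Nov 1997 10:44:53 GMT"
print(conditional_status(doc_changed, "Tue, 25 Nov 1997 01:22:04 GMT"))  # 304
print(conditional_status(doc_changed, "Wed, 19 Nov 1997 08:00:00 GMT"))  # 200
```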
Slide 9
HTTP Status Codes
- No need to memorize, just know they exist
- The numeric codes are the same everywhere, but the accompanying text can be different
- 1xx Informational: request received, continuing process.
  - 100: Continue
  - 101: Switching Protocols
- 2xx Success: the action was successfully received, understood, and accepted.
  - 200: OK
  - 201: Created
  - 202: Accepted
  - 203: Non-Authoritative Information
  - 204: No Content
  - 205: Reset Content
  - 206: Partial Content
  - 207: Multi-Status
Slide 10
3xx Status Codes
- 3xx Redirection: the client must take additional action to complete the request.
  - 300: Multiple Choices
  - 301: Moved Permanently
  - 302: Found
  - 303: See Other (since HTTP/1.1)
  - 304: Not Modified
  - 305: Use Proxy (since HTTP/1.1)
  - 306: no longer used, but reserved (was used for 'Switch Proxy')
  - 307: Temporary Redirect (since HTTP/1.1)
Slide 11
4xx Status Codes
- 4xx Client Error: the request contains bad syntax or cannot be fulfilled.
  - 400: Bad Request
  - 401: Unauthorized
  - 402: Payment Required
  - 403: Forbidden
  - 404: Not Found
  - 405: Method Not Allowed
  - 406: Not Acceptable
  - 407: Proxy Authentication Required
  - 408: Request Timeout
  - 409: Conflict
  - 410: Gone
  - 411: Length Required
  - 412: Precondition Failed
  - 413: Request Entity Too Large
  - 414: Request-URI Too Long
  - 415: Unsupported Media Type
  - 416: Requested Range Not Satisfiable
  - 417: Expectation Failed
  - 449: Retry With (a Microsoft extension)
Slide 12
5xx Status Codes
- 5xx Server Error: the server failed to fulfill an apparently valid request.
  - 500: Internal Server Error
  - 501: Not Implemented
  - 502: Bad Gateway
  - 503: Service Unavailable
  - 504: Gateway Timeout
  - 505: HTTP Version Not Supported
  - 509: Bandwidth Limit Exceeded (non-standard)
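Since the first digit of a status code gives its class, a client does not need to memorize every code. The sketch below shows one way to express that in Python; `describe` and `STATUS_CLASSES` are names invented for this example, while `HTTPStatus` is the standard library's table of registered codes and their usual text.

```python
from http import HTTPStatus

STATUS_CLASSES = {
    1: "Informational",
    2: "Success",
    3: "Redirection",
    4: "Client Error",
    5: "Server Error",
}


def describe(code):
    """Return a one-line description of a status code."""
    # The first digit of the code selects the class.
    status_class = STATUS_CLASSES.get(code // 100, "Unknown")
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        # Vendor-specific codes are not in the standard table.
        phrase = "(no standard phrase)"
    return f"{code} {phrase} [{status_class}]"


print(describe(200))  # 200 OK [Success]
print(describe(404))  # 404 Not Found [Client Error]
print(describe(449))  # 449 (no standard phrase) [Client Error]
```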
Slide 13
Browser Cache
- If a page has already been retrieved by your browser, it is usually stored in your cache
- If you return to that page, your browser first checks whether the data on that page has already been downloaded to your local drive
- If it finds the page or its images, the browser loads them from your cache and only asks the web server for the changed information
- You can usually set a maximum size or a time limit for keeping stored pages in your cache
- Most browsers have a refresh button that can be selected to force a reload of the page
- Caching reduces the number of requests and the server load, and can reduce bandwidth costs substantially
Slide 14
Proxy Cache
- Browser caches are stored on the local machine, whereas a proxy cache is stored on a proxy server
- The proxy is essentially a cache shared by many different users
- The user's browser now checks the proxy to see whether a page is already in its cache
  - If the page is found, it is loaded into the user's browser cache
  - If the page is not found, the request is made of the web server
  - After getting the new page from the server, the proxy stores it in its cache for anyone else who may request that page
  - The proxy then returns the page or item to the user's local cache
- A proxy cache reduces network traffic dramatically and substantially reduces the load on the web server
- It also skews log statistics dramatically, because requests that can be filled by the proxy cache are never seen by the web server
Slide 15
Proxy Cache Hierarchy
- You can also have a hierarchy of proxy caches; for example, each department in a company could have its own smaller proxy cache
- The page is loaded from the local proxy cache if it can be found there; if not, the request goes to the company cache
- If the company cache cannot fulfill the request, the request is sent to the web server to be filled
- The returned page or item is then sent to the local proxy cache and then to the local cache
- This method gives an even larger reduction in network traffic
- However, the pages may not be the most current version that the web server would return
- Cached pages should be cleaned out as they go out of date
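The lookup order in a cache hierarchy can be modelled with a small class in which each cache asks its parent on a miss. This is a toy sketch: the class `Cache`, the function `web_server`, and the cache names are all invented for illustration, and real proxies also handle expiry and validation.

```python
class Cache:
    """A cache that falls back to a parent cache, or to the web server."""

    def __init__(self, name, parent=None, origin=None):
        self.name = name
        self.parent = parent  # next cache up the hierarchy, if any
        self.origin = origin  # function that fetches from the web server
        self.store = {}

    def get(self, url):
        """Return (page, name of the place that supplied it)."""
        if url in self.store:
            return self.store[url], self.name
        if self.parent is not None:
            page, source = self.parent.get(url)
        else:
            page, source = self.origin(url), "web server"
        # Keep a copy for anyone else who asks through this cache.
        self.store[url] = page
        return page, source


origin_hits = []


def web_server(url):
    origin_hits.append(url)
    return f"<html>page for {url}</html>"


company = Cache("company proxy", origin=web_server)
department = Cache("department proxy", parent=company)
alice = Cache("alice's browser", parent=department)
bob = Cache("bob's browser", parent=department)

print(alice.get("/index.html")[1])  # web server
print(bob.get("/index.html")[1])    # department proxy
print(alice.get("/index.html")[1])  # alice's browser
print(len(origin_hits))             # 1
```

Three page views produced a single request at the web server, which is also why server logs undercount views when caches are involved.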
Slide 16
[Figure: Caching Diagram]
Slide 17
Cache Replacement Algorithms
- LRU: replaces the least recently used document
- FBR (Frequency Based): takes into account both the recency and the frequency of access to a page
- LRU/2: replaces the page whose penultimate (second-to-last) access is least recent among all penultimate accesses
- SLRU: combines both the recency and the frequency of access when making a replacement decision
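Of the algorithms listed, LRU is the simplest to write down. The following is a minimal sketch using Python's `OrderedDict`; the class name `LRUCache` and the capacity measured in number of pages (rather than bytes) are simplifying assumptions for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Keep at most `capacity` pages, evicting the least recently used."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.pages = OrderedDict()  # oldest access first, newest last

    def get(self, url):
        if url not in self.pages:
            return None
        # A hit makes this page the most recently used.
        self.pages.move_to_end(url)
        return self.pages[url]

    def put(self, url, page):
        self.pages[url] = page
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            # Evict the least recently used page (the first entry).
            self.pages.popitem(last=False)


cache = LRUCache(capacity=2)
cache.put("/a.html", "page A")
cache.put("/b.html", "page B")
cache.get("/a.html")            # /a.html is now the most recently used
cache.put("/c.html", "page C")  # evicts /b.html
print(list(cache.pages))        # ['/a.html', '/c.html']
```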
Slide 18
Replacement Algorithms
- The hit rate of all algorithms increases with cache size
- For caches larger than 1 Gbyte, all algorithms perform very close to the best
- For very small caches (< 1 Gbyte), all algorithms have similar performance
Slide 19
Server Side Programming
- Server side scripts run on the web server to respond to requests from the client
- There is no way for the client to know whether the page was generated by a script or was a straight HTML file
- Used to dynamically change the output of a page based on some type of input
- Can accept input from cookies, for example to identify the user and check for authorization to download a file
- Can also accept parameters passed in the address bar
Slide 20
Server Side Programming
- When to use server side instead of client side
  - Client side will be much faster to run, since it does not need to generate a new request to the web server every time something changes
  - Use server side when the data that needs to be accessed is on the web server and not on the client machine
  - Use server side to interact with a database on the web server
  - Best used when infrequent interactions are required with the server
  - Need to use server side when gathering information over time and the data is stored on the web server
- Take Google, for example
  - It would not be good to download Google's entire catalog of pages to the client
  - Better to send the search query to the web server at Google and only return those documents that match the user query
  - Checking that the user has entered a search query before sending the request to the web server would be best served by a client side script
Slide 21
CGI
- Stands for Common Gateway Interface
- A method that allows web servers and client side pages to interact with each other, by having the web server run a program to produce the response
- Used in the same way by almost all web servers in existence
- The web server needs to differentiate between scripts and ordinary HTML files
  - CGI scripts are placed in separate CGI directories on the server
  - The web server is configured to treat all files in a particular folder as CGI scripts
  - The default directory is cgi-bin
Slide 22
More About CGI
- CGI programs are ordinary executable programs, written in some language and either compiled or run by an interpreter
- The CGI script receives its input through a number of environment variables
  - Think of the ?variable=value seen at the end of web addresses
  - Example: the developer could require that the IP address be a variable, to ensure that a hit counter only counts unique visitors
- The CGI script returns a text string that can be used to identify the image to be displayed, as above
- New image source would be: [...]
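The hit counter idea can be sketched as follows. `QUERY_STRING` and `REMOTE_ADDR` are standard CGI environment variables; everything else is an assumption made for illustration, including the function name `handle_hit`, the `page` parameter, and the in-memory dictionary (a real CGI program starts fresh for every request, so it would have to keep its counts in a file or database).

```python
from urllib.parse import parse_qs


def handle_hit(environ, seen_addresses):
    """Count unique visitors per page and return a CGI-style response."""
    # QUERY_STRING holds the ?variable=value part of the URL.
    params = parse_qs(environ.get("QUERY_STRING", ""))
    page = params.get("page", ["unknown"])[0]
    # REMOTE_ADDR holds the IP address of the client.
    address = environ.get("REMOTE_ADDR", "")
    seen_addresses.setdefault(page, set()).add(address)
    count = len(seen_addresses[page])
    # A CGI response is header fields, a blank line, then the body.
    return f"Content-Type: text/plain\r\n\r\n{count}"


seen = {}
handle_hit({"QUERY_STRING": "page=home", "REMOTE_ADDR": "10.0.0.1"}, seen)
handle_hit({"QUERY_STRING": "page=home", "REMOTE_ADDR": "10.0.0.1"}, seen)
response = handle_hit({"QUERY_STRING": "page=home", "REMOTE_ADDR": "10.0.0.2"}, seen)
print(response.split("\r\n\r\n")[1])  # 2
```

In an actual CGI script the `environ` argument would be `os.environ`, filled in by the web server before the script starts.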
Slide 23
Server Side Programming
- CGI is one way to develop server side scripts
  - Slow and inefficient to use
- A better way is to use a server Application Programming Interface (API)
  - The program essentially is a part of the server process
  - The programming language is server dependent
  - Much faster, since the program is already in memory and the required data can be easily passed in and the results obtained
  - Examples include ASP, Java Server Pages, Python
Slide 24
Server Logs
- Most servers keep a log of all requests and responses generated by the server
- A sample log is as follows:
    rip.axis.se - - [04/Jan/1998:21:24:46 +0100] "HEAD /ftp/pub/software/ HTTP/1.0" 200 6312 - "Mozilla/4.04 [en] (WinNT; I)"
    tide14.microsoft.com - - [04/Jan/1998:21:30:32 +0100] "GET /robots.txt HTTP/1.0" 304 158 - "Mozilla/4.0 (compatible; MSIE 4.0; MSIECrawler; Windows 95)"
    microsnot.HIP.Berkeley.EDU - - [04/Jan/1998:22:28:21 +0100] "GET /cgi-bin/wwwbrowser.pl HTTP/1.0" 200 1445 "http://www.ifi.uio.no/~larsga/download/stats/" "Mozilla/4.03 [en] (Win95; U)"
    isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /download/RFCsearch.html HTTP/1.0" 200 2399 "http://www.kvarteret.uib.no/~pas/" "Mozilla/4.04 [en] (Win95; I)"
    isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] "GET /standard.css HTTP/1.0" 200 1064 - "Mozilla/4.04 [en] (Win95; I)"
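Log entries in this layout can be picked apart with a regular expression. The sketch below assumes the field order seen in the sample (host, two unused fields, timestamp, request, status, size, referer, user agent) and that a missing referer is written as a bare `-`; the names `LOG_LINE` and `parse_log_line` are made up for the example.

```python
import re

LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<version>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'(?:"(?P<referer>[^"]*)"|-) '
    r'"(?P<agent>[^"]*)"'
)


def parse_log_line(line):
    """Return the fields of one log entry as a dict, or None if no match."""
    match = LOG_LINE.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    entry["size"] = None if entry["size"] == "-" else int(entry["size"])
    return entry


sample = (
    'isdn69.ppp.uib.no - - [05/Jan/1998:00:13:53 +0100] '
    '"GET /download/RFCsearch.html HTTP/1.0" 200 2399 '
    '"http://www.kvarteret.uib.no/~pas/" "Mozilla/4.04 [en] (Win95; I)"'
)
entry = parse_log_line(sample)
print(entry["path"], entry["status"], entry["referer"])
```

Filtering parsed entries for status 404 is one simple way to find dead links, and grouping by path gives page view counts.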
Slide 25
Server Logs (cont)
- This log can be useful in troubleshooting or finding dead links
- You can also track page views to determine the popularity of a page
- Good practice to review the logs to see if your web server is having any problems
- Caching of web pages can cause problems, as cached pages will be viewed but not counted in the server log
Slide 26
Cookies
- In HTTP, each request is treated as an individual request, independent of the ones before it
- The only way to carry data from one request to the next is by passing parameters or by using cookies
- Say you want to allow a user to log in to your website and maintain that login across several different pages
  - In straight HTTP, you cannot pass the user information between different pages
  - You would need to generate a cookie to store the userid of the currently logged in user
- Cookies store data
- Cookies have an expiry date (can be days, weeks, or the end of the session)
- You need a script to generate and read data from cookies (HTML cannot do this)
Slide 27
Cookies (cont)
- Useful to keep track of user data, but privacy issues are involved that need to be resolved
- Keep in mind that the user can turn off cookies in his or her browser, so you may want to design the system so that it will not fail if a cookie cannot be read
- You can check whether the browser supports cookies by trying to write a cookie and then reading it back
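A server side script might generate and read a login cookie roughly as follows. This is a sketch using Python's standard `http.cookies` module; the function names, the cookie name `userid`, and the sample value are assumptions for illustration, and a real login system would store an unguessable session identifier rather than the plain userid.

```python
from http.cookies import SimpleCookie


def make_login_cookie(userid, max_age_seconds=None):
    """Return a Set-Cookie header line that stores the userid."""
    cookie = SimpleCookie()
    cookie["userid"] = userid
    cookie["userid"]["path"] = "/"
    if max_age_seconds is not None:
        # Without an expiry the cookie lasts only for the browser session.
        cookie["userid"]["max-age"] = max_age_seconds
    return cookie.output(header="Set-Cookie:")


def read_userid(cookie_header):
    """Return the userid from a request's Cookie header, or None."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("userid")
    # The cookie may be missing if the user has turned cookies off.
    return morsel.value if morsel is not None else None


print(make_login_cookie("txw5999", max_age_seconds=86400))
print(read_userid("userid=txw5999; theme=dark"))  # txw5999
print(read_userid(""))                            # None
```

Returning `None` for a missing cookie lets the calling page fall back to asking the user to log in instead of failing.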
Slide 28
Tips and Hints about HTTP
- Hiding the source
  - You cannot, but you can try to obscure it by putting blank lines at the top
  - You can make the source look messy so the user has a hard time finding a particular part
  - Web crawlers can save the HTML page without even loading it in a browser
- Downloading images
  - You cannot stop the user from downloading images from your site
  - You can watermark images to show where they came from
  - An HTTP request does not care what kind of document it asks for; the server will return it anyway
- Passing parameters between web pages
  - You can't do this if the pages are plain HTML
  - You need to design a dynamic web page, using ASP or Java for example, in order to use the parameters
  - Plain HTML pages will just drop the parameters and do nothing with them
Slide 29
Tips (cont)
- Preventing browsers from caching pages
  - Set the expiration date of the content to a past date
  - Remember that the advantage of caching is that the browser can fetch the page from the cache without generating a new request to the web server
- Using a slash at the end of a URL
  - If the URL points to a directory, then yes, include it
  - If it is not included, the web server must first check for a file of that name and, only if the file does not exist, try to find the directory
  - Some web servers can automatically direct you to an index or default file, but only if you include the slash
  - Good practice to do so
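For the first tip, setting the expiration date to a past date, a server side script could produce the header fields as sketched below. The function name `no_cache_headers` and the choice of one day in the past are assumptions for illustration; `format_datetime` from the standard library writes the date in the form HTTP expects.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime


def no_cache_headers(now=None):
    """Return header fields that mark a response as already expired."""
    now = now or datetime.now(timezone.utc)
    expired = now - timedelta(days=1)  # any date in the past will do
    return {
        "Date": format_datetime(now, usegmt=True),
        "Expires": format_datetime(expired, usegmt=True),
    }


fixed_now = datetime(1997, 11, 25, 1, 22, 4, tzinfo=timezone.utc)
for name, value in no_cache_headers(fixed_now).items():
    print(f"{name}: {value}")
# Date: Tue, 25 Nov 1997 01:22:04 GMT
# Expires: Mon, 24 Nov 1997 01:22:04 GMT
```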
Slide 30
Remember
- HTTP verbs spell CRUD
  - Create: PUT
  - Read: GET
  - Update: POST
  - Delete: DELETE
- Of these, GET and POST are the most important
Slide 31
For Next Week
- Read Zeldman Chapter 14
- HTTP web log reading
- Next week: Web Accessibility (making the web accessible to everyone, including those with disabilities)