Web Engineering Basic Technologies: Protocols and Web Servers

Husni

Husni@trunojoyo.ac.id

Basic Web Technologies

• HTTP and HTML

• Web Servers

• Proxy Servers

• Content Delivery Networks

Where we will be later in the course ...

Where we will be later in the course .......

• Supporting a range of client devices

World-Wide Web

• Series of Protocols
  • URL/URI: unique identification of resources

• URI examples
  • http://www.inf.ethz.ch/education
  • mailto: [email protected]
  • ftp://ftp.inf.ethz.ch/ed/report.txt
  • tel:+41-44-6321234
  • ....

• URL is a URI that provides information about how to locate a resource

• HTTP Hypertext Transfer Protocol
• HTML Hypertext Mark-Up Language

• Web Browsers
  • Internet Explorer, Mozilla Firefox …..
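
To make the structure of the URI examples above concrete, the following minimal sketch splits three of them into scheme, host and path with Python's standard `urllib.parse`. The helper name `describe` is an assumption made for illustration only.

```python
from urllib.parse import urlsplit

# URI examples taken from the slide above
uris = [
    "http://www.inf.ethz.ch/education",
    "ftp://ftp.inf.ethz.ch/ed/report.txt",
    "tel:+41-44-6321234",
]


def describe(uri):
    """Split a URI into the parts that identify and locate a resource."""
    parts = urlsplit(uri)
    # hostname is None for URIs such as tel: that name a resource
    # without pointing at a network location
    return {"scheme": parts.scheme, "host": parts.hostname, "path": parts.path}


for uri in uris:
    print(describe(uri))
```

The `http` and `ftp` examples carry a host and a path and so say how to locate the resource, whereas the `tel` example only identifies it.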

HTML

<html>

<head>

<title>Michael's Personal Home Page</title>

</head>

<body bgcolor="#FFFFFF" text="#000000">

<h1> Michael </h1>

<img src="michael.bmp" align="right"/>

<h2>Work</h2>

Michael works at <a href="http://www.ethz.ch"> ETH Zurich </a>

<h2>Personal</h2>

<address> CNB E106 <br/> Zurich <br/> Switzerland </address>

</body>

</html>

HTML …

• Based on Hypertext Style of Navigation

• Simple and Easy to Publish on Web

• Structure, Content or Presentation?
  • wide use of table elements for formatting layout
  • address elements describe the content

• Problems of
  • link maintenance
  • document interpretation

• Flexible
  • unknown tags ignored by browsers => easy to extend with customised tags

HTML ......

• Document meta data can be included in header

<head>

<title>Michael's Personal Home Page</title>

<meta name="keywords" content="web, databases, java">

<meta name="authors" content="michael">

<meta http-equiv="expires" content="25 Mar 2006">

<meta http-equiv="Refresh" content="15">

</head>

• Keywords used by search engines
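
As a rough illustration of how a program such as a search engine crawler could read this header meta data, the sketch below collects `meta` elements with Python's standard `html.parser`. The class name `MetaCollector` and the function `collect_meta` are hypothetical names chosen for this example; real crawlers are considerably more involved.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> name/http-equiv attributes and their content."""

    def __init__(self):
        super().__init__()
        self.meta = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        # A meta element is keyed either by name or by http-equiv
        key = attributes.get("name") or attributes.get("http-equiv")
        if key:
            self.meta[key.lower()] = attributes.get("content", "")


def collect_meta(html):
    parser = MetaCollector()
    parser.feed(html)
    return parser.meta


head = (
    "<head><title>Michael's Personal Home Page</title>"
    '<meta name="keywords" content="web, databases, java">'
    '<meta http-equiv="Refresh" content="15"></head>'
)
print(collect_meta(head))
```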

HTML5 : The Next Generation of HTML

• New standard for HTML, XHTML and HTML DOM

• Work in progress, most browsers now have some support

• Cooperation between W3C and Web Hypertext Application Technology Working Group (WHATWG)

• One goal was to have a clearer separation of content and presentation
  • HTML5 - content
  • CSS3 - layout as well as look and feel

• Second goal to make it easier to process documents and their content

• Third goal to reduce the need for plug-ins

Key Features of HTML5

• Tags to support a stronger document model to make it easier to identify logical elements of documents
  • section, article, aside, details, header, footer …

• Support for other media types
  • video, audio …

• Take over some of the things normally handled by JavaScript such as form field validation
  • form field input types such as email, url, dates, numbers ….
  • far richer set of event attributes

• Support for client-side storage
  • replacing cookies

HTTP 1.0

• hypertext transfer protocol

• one object transferred per connection

HTTP request

GET /www/globis.html HTTP/1.0

Accept: www/source

Accept: text/html

Accept: image/gif

User-Agent: Lynx/2.4 libwww/2.14

HTTP result

HTTP/1.0 200 OK
Date: Thursday, 23-April-98 09:00:05 GMT
Server: NCSA/1.4.2
MIME-version: 1.0
Content-type: text/html
Content-length: 3500

<html>……</html>

Note the blank line between the header and the body of the message.
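
The request and result formats above can be sketched in a few lines of Python. This is a minimal illustration, not a full HTTP implementation: the function names `build_request` and `parse_response` are assumptions, lines are assumed to end in CRLF, and the sample response is shortened from the slide.

```python
def build_request(path, headers):
    """Build an HTTP/1.0 GET request: request line, headers, blank line."""
    lines = [f"GET {path} HTTP/1.0"]
    lines += [f"{name}: {value}" for name, value in headers]
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_response(raw):
    """Split a response at the blank line into status, headers and body."""
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    _version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers, body


request = build_request(
    "/www/globis.html",
    [("Accept", "text/html"), ("User-Agent", "Lynx/2.4 libwww/2.14")],
)
raw_response = (
    "HTTP/1.0 200 OK\r\n"
    "Server: NCSA/1.4.2\r\n"
    "Content-type: text/html\r\n"
    "\r\n"
    "<html></html>"
)
print(parse_response(raw_response))
```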

HTML Form

• GET /cgi-bin/globis.pl?user=moira&pass=fred

HTML Form

<html>

...

<form action="/cgi-bin/globis.pl" method="GET">

Name: <input type="text" name="user" size=10>

<br/>

Password: <input type="password" name="pass" size=6>

<br/>

<input type="submit" value="ENTER">

<input type="reset" value="CLEAR">

</form>

</html>
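
When the form above is submitted with method GET, the browser encodes the field names and values and appends them to the action URL. The sketch below reproduces that step with `urllib.parse.urlencode`; the function name `form_get_request_line` is a hypothetical helper for illustration.

```python
from urllib.parse import urlencode


def form_get_request_line(action, fields):
    """Append URL-encoded form fields to the action, as a GET submission does."""
    query = urlencode(fields)
    return f"GET {action}?{query} HTTP/1.0"


line = form_get_request_line("/cgi-bin/globis.pl", [("user", "moira"), ("pass", "fred")])
print(line)
```

Characters with a special meaning in a query string, such as `&` or a space, are escaped by the encoding, so they cannot be confused with the separators between fields.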

Introducing Dynamic Content

• need to introduce some mechanism to execute programs on the server side and dynamically generate HTML documents

CGI Programming

• Common Gateway Interface

• Executes Programs on Server Side

CGI Result

CGI Programs

• Can be written in any language

• Desirable Features
  • ease of text manipulation
  • ability to access environment variables
  • ability to interface with other services

• Commonly Used Languages
  • Perl, C/C++, Tcl, Java

Accessing Form Data

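#!/usr/local/bin/perl

# Form data sent with GET arrives in the QUERY_STRING environment variable
$query_string = $ENV{'QUERY_STRING'};

…..

# Split a name=value pair into field name and value
($field_name, $param) = split (/=/, $query_string);

…..

if ($user eq "moira") {
    # Redirect: Location is a header, so nothing may be printed before it
    print "Location: /globis.html", "\n\n";
} else {
    print "Content-type: text/html", "\n\n";
    ……
}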
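
The splitting step in the Perl fragment above handles a single name=value pair. As a hedged sketch of the complete decoding, the Python standard library function `urllib.parse.parse_qs` splits on `&` and `=` and undoes the URL encoding; the helper name `form_fields` is an assumption, and it keeps only the first value of each field.

```python
from urllib.parse import parse_qs


def form_fields(query_string):
    """Decode a query string into a dict holding the first value of each field."""
    return {name: values[0] for name, values in parse_qs(query_string).items()}


print(form_fields("user=moira&pass=fred"))
```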

Unix Environment Variables

SERVER_NAME

REMOTE_HOST

REMOTE_ADDR

REMOTE_USER

REQUEST_METHOD

QUERY_STRING

.......

GET and POST

• Two methods for sending Form Data

• GET
  • appends form data to URL

GET /cgi-bin/globis.pl?name=globis HTTP/1.0

• POST
  • form data read from standard input

POST /cgi-bin/globis.pl HTTP/1.0
....
user=moira&pass=fred
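
The difference between the two methods, seen from the CGI program, can be sketched as follows. The function name `read_form_data` is hypothetical, and the environment and standard input are passed in as arguments so the example can run outside a web server. `CONTENT_LENGTH` is the standard CGI variable giving the size of the POST body; it is not among the variables listed earlier in these slides.

```python
import io
from urllib.parse import parse_qs


def read_form_data(environ, stdin):
    """Return form fields from QUERY_STRING (GET) or standard input (POST)."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # The server announces the body size; read exactly that much
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        raw = environ.get("QUERY_STRING", "")
    return {name: values[0] for name, values in parse_qs(raw).items()}


get_env = {"REQUEST_METHOD": "GET", "QUERY_STRING": "name=globis"}
print(read_form_data(get_env, io.StringIO("")))

body = "user=moira&pass=fred"
post_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
print(read_form_data(post_env, io.StringIO(body)))
```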

Server Side Includes

• Directives included in HTML Documents
  • execute programs
  • output data such as environment variables

Server Side Includes ...

<html>

<head><title>Globis</title></head>

<body>

<h1>Welcome to <!--#echo var="SERVER_NAME"--></h1>

......

<address>Moira(<!--#echo var="DATE_LOCAL"-->)</address>

</body>

</html>
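
What the server does with the echo directives above can be approximated by a simple substitution. The sketch below is a toy model and not how any particular web server implements server side includes: the function name `expand_echo`, the example host name and the choice to print `(none)` for an unknown variable are all assumptions of this example.

```python
import re

# Matches directives of the form <!--#echo var="NAME"-->
ECHO = re.compile(r'<!--#echo\s+var="([^"]+)"\s*-->')


def expand_echo(html, variables):
    """Replace each echo directive with the value of the named variable."""
    return ECHO.sub(lambda m: variables.get(m.group(1), "(none)"), html)


page = '<h1>Welcome to <!--#echo var="SERVER_NAME"--></h1>'
print(expand_echo(page, {"SERVER_NAME": "www.example.org"}))
```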

Server Side Includes ......

Configure Server to say

• documents which should be parsed
  • AddType text/x-server-parsed-html .shtml
  • AddType text/x-server-parsed-html .html

• directives supported
  • Includes - display environment variables etc.
  • Exec - execute External Programs

Where to Cache?

• Caching can occur at many different levels and locations in web architectures

• Four fundamental ways for implementing a caching mechanism
  • browser caching
  • proxy caching
  • reverse proxy caching/server accelerators
  • content delivery networks (CDN)

• We will go on to look at each of these in turn

Browser Caching

• Every browser contains cache of HTML docs & multimedia files

• Browser cache is a directory on the user’s hard disk

• Advantages
  • simple
  • universal

• Disadvantages
  • applies only to static resources
  • can be by-passed by content provider who can add suitable HTTP headers to response or directives to HTML page forcing browser not to cache

Proxy Caching

• A proxy cache lies between a community of users (e.g. D-INFK, ETHZ) and the public internet

• Works on same basic principles as browser cache, but on much larger scale (may be hundreds or thousands of users)

• Proxy caches sometimes implemented together with firewalls which control flow of requests/responses between intranet and internet

• Client requests have to somehow be routed to proxy server
  • can be done through browser’s proxy setting
  • interception proxies have requests redirected to them by underlying network

proxy server: cache miss

• http://some.host/path/doc.html

• http_proxy=http://www_proxy.my.domain

proxy server: cache hit
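
The miss and hit cases named on the two slides above can be sketched as a small in-memory cache placed in front of an origin server. This is an illustration under simplifying assumptions: the class name `ProxyCache` and the stand-in function `fake_origin` are hypothetical, and expiry and validation are ignored.

```python
class ProxyCache:
    """Serve from the local store on a hit, fetch from the origin on a miss."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._store = {}
        self.hits = 0
        self.misses = 0

    def get(self, url):
        if url in self._store:
            # Cache hit: no connection to the origin server is needed
            self.hits += 1
            return self._store[url]
        # Cache miss: forward the request and keep a copy of the response
        self.misses += 1
        body = self._fetch(url)
        self._store[url] = body
        return body


origin_calls = []


def fake_origin(url):
    """Stand-in for the origin server; records every request it receives."""
    origin_calls.append(url)
    return f"<html>content of {url}</html>"


proxy = ProxyCache(fake_origin)
proxy.get("http://some.host/path/doc.html")  # miss, goes to the origin
proxy.get("http://some.host/path/doc.html")  # hit, served by the proxy
print(proxy.hits, proxy.misses, len(origin_calls))
```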

HTTP/1.0

• GET URL

• HEAD URL

HEAD is similar to GET but only asks for the response header rather than the content

HTTP/1.1 200 OK
Date: Wed, 10 May 2000 09:33:08 GMT
Server: Apache/1.3.12 (Win32)
Last-Modified: Mon, 01 May 2000 13:37:40 GMT
Content-type: text/html
Content-length: 907

• GET URL
If-Modified-Since: Sunday, 05 Mar 2000 13:00:00 GMT

Browser and proxy caches

• All caches have a set of rules used to determine what can be cached and when to use cached resources
  • some rules set in protocols
  • some set by cache software (e.g. browser)
  • some set by cache administrator

• Many of these rules based on information in the HTTP request/response header
  • added by server/browser
  • explicitly generated by content provider
  • may be based on type of request or type of content
  • example: HTTP header containing Cache-Control: no-cache

General caching rules

• If response’s header says not to keep it, it won’t be cached

• If no validator (e.g. a Last-Modified header) is present on a response, it will be considered uncacheable

• If request is authenticated or secure, it won’t be cached

• A cached object is considered fresh (i.e. able to be sent to client without checking with origin server) if
  • it has an expiry time or other age-controlling header set and is still within the fresh period
  • object already seen and browser cache set to check once a session
  • proxy cache has seen object recently and it was last modified a long time ago

• If a representation is stale, the origin server will be asked to validate it

HTTP header information for caching

• Example

HTTP validators and validation

• Validation used by servers and caches to communicate when an object has changed

• Most common validator is Last-Modified time
  • if cache has object with last-modified time t, generate If-Modified-Since t request to server to check if object still current

• HTTP 1.1 introduced ETags as another kind of validator
  • every time object changes, server generates a unique identifier ETag which is included in HTTP response header of object request
  • server controls how ETags generated

• Most modern web servers generate both ETags and Last-Modified validators automatically for static content
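
The validation exchange described above can be sketched from the server's point of view. This is a simplified illustration: the function name `conditional_get` is hypothetical, the ETag comparison is plain string equality, and the request header `If-None-Match` and status code 304 (Not Modified) come from HTTP/1.1 rather than from these slides.

```python
from email.utils import parsedate_to_datetime


def conditional_get(request_headers, last_modified, etag):
    """Return 304 if the cached copy is still current, otherwise 200."""
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # ETag validator: unchanged object means identical identifier
        return 304 if if_none_match == etag else 200
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        # Last-Modified validator: compare the two HTTP dates
        since = parsedate_to_datetime(if_modified_since)
        changed = parsedate_to_datetime(last_modified) > since
        return 200 if changed else 304
    # No validator sent: always return the full object
    return 200


last_modified = "Mon, 01 May 2000 13:37:40 GMT"
status = conditional_get(
    {"If-Modified-Since": "Sun, 05 Mar 2000 13:00:00 GMT"}, last_modified, '"v2"'
)
print(status)
```

With these dates the object was modified after the cached copy was fetched, so the server answers 200 with the full content; a 304 answer would carry headers only and let the cache reuse its copy.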

HTTP cache-control

• max-age=[seconds]

max amount of time page considered fresh; relative to time of request

• s-maxage=[seconds]

similar to max-age, except only applies to shared caches (e.g. proxies)

• public

marks authenticated responses as cacheable; normally if HTTP authentication required, responses uncacheable

• no-cache

forces cache to submit request to original server every time for validation before releasing cached copy

• no-store

instructs cache not to keep a copy under any circumstances

• must-revalidate

instructs cache to obey any freshness information given about an object; counteracts some conditions in which cache may serve stale representations

• proxy-revalidate

similar to must-revalidate, but only applies to proxy caches
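
How a cache might act on these directives can be sketched as below. The function names `parse_cache_control` and `is_fresh` are assumptions, and the logic is deliberately simplified: `no-store` and `no-cache` are both treated as "never serve without contacting the origin server", and a response without `max-age` or `s-maxage` is treated as not fresh, whereas real caches may apply heuristics.

```python
def parse_cache_control(value):
    """Turn 'public, max-age=60' into {'public': None, 'max-age': '60'}."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives


def is_fresh(cache_control, age_seconds, shared_cache=False):
    """Decide whether a cached copy may be served without validation."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    lifetime = directives.get("max-age")
    # s-maxage overrides max-age, but only for shared caches such as proxies
    if shared_cache and directives.get("s-maxage") is not None:
        lifetime = directives["s-maxage"]
    if lifetime is None:
        return False
    return age_seconds < int(lifetime)


print(is_fresh("max-age=3600, s-maxage=60", age_seconds=120))
print(is_fresh("max-age=3600, s-maxage=60", age_seconds=120, shared_cache=True))
```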

What doesn’t work

• HTML metatags example

<meta http-equiv="expires" content="Thu, 26 May 2005 10:50:00 GMT">
<meta http-equiv="pragma" content="no-cache">

  • easy to use, but are not very effective
  • HTML not usually read by proxy servers
  • few browsers honour such specifications

• Pragma HTTP headers
  • can include in HTTP header: Pragma: no-cache
  • HTTP specification does not specify how Pragma response headers should be handled and many browsers ignore them

Problems of proxy servers

• Connections to servers still required

• Still a high server load

• Servers lose control over their documents
  • authorisation
  • billing
  • access statistics

Prefetching caches

• Request objects from the server without an explicit request

• Based on
  • access patterns
  • object analysis (HTML documents, ...)
  • explicit subscriptions

• Reduces latency

• If level of prefetching too high then may pay severe penalties in terms of
  • increased network traffic
  • server load

Proxy servers

• advantages
  • reduce latency, network bandwidth and server load
  • opportunity to analyse an organisation's usage patterns
  • transparent to clients and servers

• disadvantages
  • additional resources
  • single point of failure
  • chance that users get stale data from the cache

Reverse proxy caching

• Reverse proxy caches are also intermediaries, but instead of being deployed by network administrators are deployed by the webmasters themselves (i.e. server side)

• Improve web site’s
  • performance
  • reliability
  • scalability

• Typically some form of load balancer used to make one or more gateway caches look like the origin server to clients

• Sometimes known as “gateway caches”, “surrogate caches”, or “server accelerators”

Content Delivery Networks (CDNs)

• A content delivery network distributes gateway caches throughout the Internet (or part of it) and sells caching to interested web sites
  • Akamai (http://www.akamai.com)

• Original idea:
  • when a client requests a page from the origin server, the server returns a page with rewritten links that point to a node of the CDN so that further client requests are handled by the CDN

• CDN serves requests using multiple cache nodes, selecting the optimal copy of the page given the geographical location of the user and the real-time network traffic conditions
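
The original link-rewriting idea can be illustrated with a short sketch. Everything specific here is an assumption made for the example: the host names `www.example.org` and `cdn.example.net`, the function name `rewrite_links`, and the choice to redirect only a few static file types. It is not a description of how any particular CDN rewrites pages.

```python
import re

ORIGIN_HOST = "www.example.org"  # hypothetical origin server
CDN_HOST = "cdn.example.net"     # hypothetical CDN node

# src/href attributes pointing at static files on the origin server
LINK = re.compile(
    r'(src|href)="http://'
    + re.escape(ORIGIN_HOST)
    + r'(/[^"]*\.(?:gif|jpg|png|css|js))"'
)


def rewrite_links(html):
    """Point links to static resources at the CDN; leave other links alone."""
    return LINK.sub(lambda m: f'{m.group(1)}="http://{CDN_HOST}{m.group(2)}"', html)


page = (
    '<img src="http://www.example.org/img/logo.gif">'
    '<a href="http://www.example.org/index.html">home</a>'
)
print(rewrite_links(page))
```

After rewriting, the browser fetches the image from the CDN node while the page itself is still requested from the origin server.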

Content Delivery Networks (CDNs) ...

• CDNs now perform dynamic request routing using the Internet's Domain Name System (DNS)

• DNS is a distributed directory mapping fully qualified domain names (FQDN) to IP addresses

• To determine an FQDN's address, a DNS client sends a request to its local DNS server which then queries a set of authoritative servers

• When local DNS receives a response, it sends address to DNS client and saves it in cache

• Each DNS record has a time-to-live (TTL) field that tells DNS server how long it may cache result

• Normally the association of FQDN to IP address is static

Content Delivery Networks (CDNs) ......

• CDNs use modified DNS servers for CDN server selection

• Results of a DNS query to one of these servers may vary depending on source of request and network condition

• To enable fast reaction to dynamic resource changes, the answer returned by the CDN's DNS server has a small TTL

• This approach is largely transparent to client and works for any web content

• Issues
  • usually assumed client close to their local DNS servers
  • single request from a local DNS server can represent differing number of web clients (hidden load factor)
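
The interplay between a CDN's DNS server and a caching local DNS server can be sketched as follows. This is a toy model under stated assumptions: the node table `CDN_NODES`, its addresses, the 20 second TTL and the names `cdn_dns_answer` and `LocalDnsCache` are all hypothetical, and server selection is reduced to "same region first, skip overloaded nodes".

```python
import time

# Hypothetical CDN nodes per region (addresses from documentation ranges)
CDN_NODES = {"eu": "192.0.2.10", "us": "198.51.100.10", "asia": "203.0.113.10"}
SMALL_TTL = 20  # seconds; assumed value, kept small so answers can change quickly


def cdn_dns_answer(resolver_region, overloaded=()):
    """Pick a node for the requesting resolver and return (address, ttl)."""
    candidates = [resolver_region] + [r for r in CDN_NODES if r != resolver_region]
    for region in candidates:
        if region in CDN_NODES and region not in overloaded:
            return CDN_NODES[region], SMALL_TTL
    raise LookupError("no CDN node available")


class LocalDnsCache:
    """Local DNS server that caches answers until their TTL expires."""

    def __init__(self, authoritative, clock=time.monotonic):
        self._authoritative = authoritative
        self._clock = clock
        self._entries = {}

    def resolve(self, name, region):
        now = self._clock()
        entry = self._entries.get(name)
        if entry and entry[1] > now:
            return entry[0]  # still within TTL: answer from cache
        address, ttl = self._authoritative(region)
        self._entries[name] = (address, now + ttl)
        return address


local_dns = LocalDnsCache(cdn_dns_answer)
print(local_dns.resolve("www.example.org", "eu"))
```

Note that the CDN only sees the region of the local DNS server, not of the web client, and cannot tell how many clients sit behind one cached answer; these are the two issues listed above.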