Database-Powered Web Servers

Alexandros Labrinidis, Univ. of Pittsburgh

Alexandros Labrinidis, Univ. of Pittsburgh 2 CMU . CS 415 . 07 November 2002

Examples of db-powered web sites

- Google.com (search engine)
- Amazon.com (shopping)
- eBay.com (auctions)
- WellsFargo.com (online banking)
- weather.com (forecasts)
- expedia.com (travel)
- NSF.gov (proposal submission)
- my.yahoo.com (personalized newspaper)
- NYTimes.com (electronic newspaper)

Motivating Example: nytimes.com

Why is WebDB so popular?

- Arguments from web “producers”:
  - easy to “publish” databases over the Web
  - wealth of information available
  - enable personalization
  - allow targeted advertising

- Arguments from web “consumers”:
  - no need to install special software
  - easy to learn - uniform user interface
  - personalized content

- Today, even seemingly static sites have WebDB

Typical WebDB architecture – 3 tiers

- Web server: handle HTTP requests
- Application server: web workflow
- DB server: data storage & queries

[Diagram: User 1 ... User n -> internet -> web server -> app server -> db server]

Typical WebDB architecture – 2 tiers

2-tiers: incorporate application server within web server

[Diagram: User 1 ... User n -> internet -> web & app server -> db server]

HTTP: HyperText Transfer Protocol

Client request:

nitrogen{4} telnet www.google.com 80
Trying 216.239.37.101...
Connected to www.google.com.
Escape character is '^]'.
GET /index.html HTTP/1.0
[two carriage returns]
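
As a rough illustration of the request above, and of how the status line and headers a server sends in reply can be taken apart, here is a minimal Python sketch. It does not open a network connection: `build_request`, `parse_response`, and the shortened `SAMPLE_RESPONSE` are hypothetical names and data made up for this example, not part of the original slides.

```python
def build_request(path: str) -> bytes:
    """Build the same minimal HTTP/1.0 GET request typed into telnet above."""
    # The request line is followed by an empty line, which ends the header section.
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def parse_response(raw: bytes):
    """Split a raw response into (status code, reason, headers dict, body bytes)."""
    # Headers and body are separated by the first empty line.
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("iso-8859-1").split("\r\n")
    # Status line looks like: HTTP/1.0 200 OK
    _version, status, reason = lines[0].split(" ", 2)
    headers = {}
    for line in lines[1:]:
        # Split on the first colon only, since values (e.g. dates) contain colons.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(status), reason, headers, body


# A shortened, made-up response in the shape a server would send back.
SAMPLE_RESPONSE = (
    b"HTTP/1.0 200 OK\r\n"
    b"Content-Type: text/html\r\n"
    b"Connection: Close\r\n"
    b"\r\n"
    b"<html></html>"
)

if __name__ == "__main__":
    print(build_request("/index.html"))
    print(parse_response(SAMPLE_RESPONSE))
```

To talk to a real server, the bytes from `build_request` would be written to a TCP socket on port 80 and the reply read until the server closes the connection, which is what the telnet session does by hand.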

HTTP: HyperText Transfer Protocol

Server response:

HTTP/1.0 200 OK
Content-Length: 2532
Connection: Close
Server: GWS/2.0
Date: Thu, 07 Nov 2002 16:57:59 GMT
Content-Type: text/html
Cache-control: private
Set-Cookie: PREF=ID=24bce47555c1db8b:TM=1036688279:LM=1036688279:S=t4XqRr3VPTPwKMEp; expires=Sun, 17-Jan-2038 19:14:07 GMT; path=/; domain=.google.com

<html>….</html>
Connection closed by foreign host.

HTML: HyperText Markup Language

- page wrapped in <html></html>

- many formatting commands:
  - <font color="red">SOMETHING IN RED</font>
  - CSS: Cascading Style Sheets

- Form example:

<form action="/activate.cgi" method=POST>
  Please give your name:
  <input type=text name="username" size=15 maxlength=30>
  <input type=submit value="Submit Name">
</form>

CGI: Common Gateway Interface

- Introduced in the early 90s
- Protocol for exchanging form data between client and server

Flow shown in the slide diagram:
1. User requests form
2. Web server sends form to user
3. User submits form
4. CGI forwards form data to application
5. Application processes data & generates output
6. Web server sends output to user
7. User receives output

[Diagram labels: network, db server]

CGI encoding

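- method=GET
  - Form data are encoded as part of the URL (e.g. http://www.google.com/search?q=cmu)
  - The CGI program receives them in the QUERY_STRING environment variable

- method=POST
  - Form data are sent in the body of the request
  - The CGI program reads them from standard input (CONTENT_LENGTH gives the size)

- encoding of query string:
  - (name,value) pairs from FORM
  - name1=value1&name2=value2&…
  - space → +
  - other special characters → %XX (hexadecimal)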
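
The encoding rules above can be tried out with Python's standard `urllib.parse` module. This is only a sketch: the field names `username` and `q` come from the earlier form and URL examples, while the values are invented for illustration.

```python
from urllib.parse import parse_qs, urlencode

# (name, value) pairs as they might come from a submitted form (values are made up).
form_data = {"username": "Jane Doe", "q": "cmu & pitt"}

# Encode: pairs joined with '&', spaces become '+', '&' inside a value becomes %26.
query_string = urlencode(form_data)

# Decode: what a server-side program does with the received query string.
# parse_qs returns a list per name because a name may appear more than once.
decoded = parse_qs(query_string)

if __name__ == "__main__":
    print(query_string)
    print(decoded)
```

With method=GET this string would follow the `?` in the URL; with method=POST the same string would travel in the request body.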

Cookies

- HTTP is connection-less
  - a new connection is established with every request
  - HTTP/1.1 supports persistent connections (but not popular)

- Q: How to maintain state over a session?
- A: Cookies

- Cookies are text-only strings that are stored in the browser's memory (and on disk)

- Example (browser cookie file entry): .google.com TRUE / FALSE 2147368447 PREF ID=0a2612f3162aa05f:TM=1035999053:LM=1035999053:S=ko0-i9cOsU6nMaW6
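
To make the mechanism concrete, here is a small sketch using Python's standard `http.cookies` module: the server issues a cookie in a `Set-Cookie` header, and on a later request reads the value the browser sends back. The cookie name `client_id`, the domain `.example.com`, and both helper functions are assumptions made for this example.

```python
from http.cookies import SimpleCookie


def make_set_cookie(client_id: str) -> str:
    """Server side: build the Set-Cookie header line for a new client."""
    cookie = SimpleCookie()
    cookie["client_id"] = client_id
    cookie["client_id"]["path"] = "/"
    cookie["client_id"]["domain"] = ".example.com"  # hypothetical domain
    return cookie.output()  # a line starting with "Set-Cookie: "


def read_client_id(cookie_header: str):
    """Server side: extract client_id from the Cookie header of a later request."""
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get("client_id")
    return morsel.value if morsel is not None else None


if __name__ == "__main__":
    print(make_set_cookie("abc123"))
    print(read_client_id("client_id=abc123; theme=dark"))
```

Because the browser returns the stored string with every request to the matching domain, the server can recognize the same client across otherwise unrelated connections.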

Cookies and Databases

- Use cookie to store client-ID

- Storing client-ID enables personalization
  - Example: NYTimes.com

- Storing client-ID enables access to restricted areas
  - Example: NYTimes.com (subscription-based access to articles)

- Use cookie to prohibit duplicate submissions on polls
  - Randomly generated ID
  - Matched against database of those already “voted”
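
The poll case can be sketched as follows, assuming an in-memory SQLite database stands in for the site's database and that the random ID would be handed to the browser in a cookie. The table name `votes` and all function names are hypothetical.

```python
import secrets
import sqlite3


def open_poll_db() -> sqlite3.Connection:
    """Create an in-memory database with one row per ID that has voted."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE votes (voter_id TEXT PRIMARY KEY, choice TEXT NOT NULL)"
    )
    return db


def new_voter_id() -> str:
    """Randomly generated ID, to be stored in the client's cookie."""
    return secrets.token_hex(16)


def record_vote(db: sqlite3.Connection, voter_id: str, choice: str) -> bool:
    """Store the vote; return False if this ID is already in the database."""
    try:
        with db:  # commits on success, rolls back on error
            db.execute(
                "INSERT INTO votes (voter_id, choice) VALUES (?, ?)",
                (voter_id, choice),
            )
        return True
    except sqlite3.IntegrityError:
        # The PRIMARY KEY constraint rejects a second vote from the same ID.
        return False


if __name__ == "__main__":
    db = open_poll_db()
    voter = new_voter_id()
    print(record_vote(db, voter, "yes"))  # first submission
    print(record_vote(db, voter, "no"))   # duplicate submission
```

Letting the database constraint detect the duplicate avoids a separate lookup before each insert. A client that discards its cookie gets a fresh ID, so this approach only discourages repeat voting.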

Cookies and Privacy

- Cookies only store text strings given by servers
  - Include domain, expiration date, etc.

- Only a server from the same domain can access a previously stored cookie

- The case of DoubleClick:
  - Same cookie used by an advertising agency
  - Match individuals with browsing profiles that span multiple sites
  - Huge data mining opportunity
  - Huge controversy

Client-side programs

- Extend web user interface by running programs at clients
  - Allow for sophisticated UIs
  - Must be careful of malicious code (from untrusted servers)

- Java
  - Full-fledged programming language
  - Protection capabilities

- Other client-side scripting languages:
  - JavaScript (Netscape)
  - VBScript (MSFT)
  - Flash/Shockwave (Macromedia)

Server-side programs

- Implement server applications and workflow
  - Programs that generate HTML
  - Embedded HTML

- Java:
  - Servlets
  - Java Server Pages (JSP – SUN)

- Server-side scripting:
  - Active Server Pages (ASP – MSFT)
  - PHP
  - mod_perl

Embedded HTML: PHP

- PHP stands for “PHP: Hypertext Preprocessor”
  - http://www.php.net

- Example:

<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    <?php echo "Hi, I'm a PHP script!"; ?>
  </body>
</html>

Generating HTML: mod_perl

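- mod_perl: efficiently use perl to generate HTML
  - http://perl.apache.org

- Example:

#!/bin/perl
# Send the HTTP header first, followed by an empty line
print "Content-type: text/html\n\n";
# Print the page with a here-document (no space between << and EOF)
print <<EOF;
<html>
  <head>
    <title>Example</title>
  </head>
  <body>
    Hi, I was generated by mod_perl!
  </body>
</html>
EOF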

DB interface: mod_perl

- Two modules – allow for portability:
  - DBI: database independent library
  - DBD: database dependent driver

- Example:

use DBI;
# The data source has the form "dbi:<driver>:<database>"; "dbname" is a placeholder
my $dbh = DBI->connect("dbi:Oracle:dbname", "user", "pass");
my $stmt = $dbh->prepare("SELECT * FROM foo");
$stmt->execute();
# Fetch and print one row at a time
while (my @row = $stmt->fetchrow_array()) {
    print "Row: @row\n";
}
$dbh->disconnect();

Improving Performance: CGI

- CGI forks a new process on every call
  - context switch
  - re-connect to db server

- mod_perl: maintain pool of processes
  - no context switch: assign call to existing process
  - scales better, 10 times faster than CGI
    [Labrinidis & Roussopoulos, SIGMOD Record, Mar 2000]

- mod_perl: re-use db connections
  - no need to re-connect to db server
  - twice as fast as mod_perl without connection re-use
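
The difference between re-connecting on every call and re-using a connection can be sketched in Python, with an in-memory SQLite database standing in for the db server. This illustrates the idea only and is not how mod_perl itself is implemented. `CountingConnector`, `handle_cgi_style`, and `PersistentWorker` are hypothetical names; the table `foo` is borrowed from the DBI example.

```python
import sqlite3


class CountingConnector:
    """Opens database connections and counts how many were opened."""

    def __init__(self):
        self.opened = 0

    def connect(self) -> sqlite3.Connection:
        self.opened += 1
        db = sqlite3.connect(":memory:")
        db.execute("CREATE TABLE foo (x INTEGER)")
        db.execute("INSERT INTO foo VALUES (1)")
        return db


def handle_cgi_style(connector: CountingConnector, sql: str):
    """CGI style: every call connects, queries, and disconnects."""
    db = connector.connect()
    try:
        return db.execute(sql).fetchall()
    finally:
        db.close()


class PersistentWorker:
    """Long-lived worker: connects on the first call, then re-uses the connection."""

    def __init__(self, connector: CountingConnector):
        self._connector = connector
        self._db = None

    def handle(self, sql: str):
        if self._db is None:
            self._db = self._connector.connect()
        return self._db.execute(sql).fetchall()


if __name__ == "__main__":
    connector = CountingConnector()
    for _ in range(3):
        handle_cgi_style(connector, "SELECT x FROM foo")
    print("CGI style connections:", connector.opened)

    worker = PersistentWorker(connector)
    for _ in range(3):
        worker.handle("SELECT x FROM foo")
    print("Total after persistent worker:", connector.opened)
```

With a real database server each connect involves a network round trip and authentication, which is the cost that connection re-use removes.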

Improving Performance: Caching

- Web Caching
  - store static content close to the users
  - avoid re-transmitting over the network
  - consistency: Time-To-Live (TTL)

- Dynamic Web Caching
  - store dynamic content (from db) and re-use if possible
  - avoid re-computing from database
  - what to cache:
    - entire web pages
    - query results
    - HTML fragments
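
A minimal sketch of dynamic caching with TTL-based consistency might look like the following. The `TTLCache` class and its `get_or_compute` method are hypothetical names, and the cached value could equally be a whole page, a query result, or an HTML fragment.

```python
import time


class TTLCache:
    """Cache computed values and re-use them until their time-to-live expires."""

    def __init__(self, ttl_seconds: float, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock      # injectable so expiry can be tested without sleeping
        self._entries = {}       # key -> (expires_at, value)

    def get_or_compute(self, key, compute):
        """Return the cached value for key, calling compute() only on a miss."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]      # still fresh: avoid re-computing from the database
        value = compute()        # miss or expired: go back to the source
        self._entries[key] = (now + self._ttl, value)
        return value


if __name__ == "__main__":
    cache = TTLCache(ttl_seconds=30)

    def render_fragment():
        # Stand-in for a database query plus HTML generation.
        return "<ul><li>headline</li></ul>"

    print(cache.get_or_compute("front-page-headlines", render_fragment))
    print(cache.get_or_compute("front-page-headlines", render_fragment))  # served from cache
```

A TTL trades freshness for load: a cached entry may be stale for up to `ttl_seconds` after the underlying data changes.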