Evaluation of Delivery Techniques for Dynamic Web Content
Mor Naaman, Hector Garcia-Molina, Andreas Paepcke
Department of Computer Science
Stanford University
{mor, hector, paepcke}@cs.stanford.edu
http://www-db.stanford.edu/
Dynamic Web is Ubiquitous
Problems with Dynamic Pages
• Generation of pages is resource-intensive
• Pages are too dynamic, or too personalized, to be cached
• Higher load on servers (page generation and delivery)
• More network traffic
We Evaluate Two Competing Solutions (both address at least the network load)
ESI (Oracle, Akamai)
• Enables assembly of pages from small fragments
• Fragments can be cached on specialized network caches (edge servers)
• Fragments are assembled on the edge server
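The edge-side assembly step can be pictured with a small Python sketch. It is a toy under simplifying assumptions, not an ESI implementation: it understands only the `<esi:include src="..."/>` tag, keeps every fragment for one fixed time-to-live, and uses a hypothetical `fetch_from_origin` callable in place of a real HTTP request to the main site. The class name `EdgeCache` is likewise illustrative.

```python
import re
import time
from typing import Callable, Dict, Optional, Tuple

# Matches only the simplest ESI include form, <esi:include src="..."/>.
INCLUDE_RE = re.compile(r'<esi:include\s+src="([^"]+)"\s*/>')


class EdgeCache:
    """Toy edge server (hypothetical): caches fragments for a fixed TTL."""

    def __init__(self, fetch_from_origin: Callable[[str], str], ttl: float) -> None:
        self.fetch_from_origin = fetch_from_origin  # stands in for an HTTP fetch
        self.ttl = ttl
        self.store: Dict[str, Tuple[str, float]] = {}  # src -> (body, expiry time)
        self.hits = 0
        self.misses = 0

    def get_fragment(self, src: str, now: float) -> str:
        entry = self.store.get(src)
        if entry is not None and entry[1] > now:
            self.hits += 1  # fresh copy on the edge: no trip to the origin
            return entry[0]
        self.misses += 1  # missing or expired: refetch and restart the TTL
        body = self.fetch_from_origin(src)
        self.store[src] = (body, now + self.ttl)
        return body

    def assemble(self, template: str, now: Optional[float] = None) -> str:
        """Replace every include tag with the corresponding fragment body."""
        t = time.time() if now is None else now
        return INCLUDE_RE.sub(lambda m: self.get_fragment(m.group(1), t), template)
```

For example, assembling the same template twice within the TTL fetches each fragment from the origin once and serves the second page entirely from the edge cache; only fragments whose TTL has expired travel from the main site again.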
Class-Based Delta Encoding
• Computes the delta of a generated page from a chosen base file
• Base files can be cached on network caches
• Client receives the delta from the server and the base file from a cache, then applies the delta to the base file to get the final page
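The compute-and-apply cycle can be sketched in Python. The poster does not name a delta encoder, so this sketch swaps in the standard library's `difflib.SequenceMatcher` as a stand-in; the delta format (`("copy", i1, i2)` ranges of the base file plus `("insert", text)` literals) and the function names are hypothetical, and a production system would use a compact binary delta format instead.

```python
from difflib import SequenceMatcher
from typing import List, Tuple, Union

# Hypothetical delta format: ("copy", i1, i2) copies base[i1:i2];
# ("insert", text) emits literal text that does not come from the base file.
Op = Union[Tuple[str, int, int], Tuple[str, str]]


def compute_delta(base: str, page: str) -> List[Op]:
    """Server side: encode the generated page relative to the base file."""
    delta: List[Op] = []
    matcher = SequenceMatcher(None, base, page, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            delta.append(("copy", i1, i2))
        elif tag in ("replace", "insert"):
            delta.append(("insert", page[j1:j2]))
        # "delete": base text absent from the page, so nothing is emitted.
    return delta


def apply_delta(base: str, delta: List[Op]) -> str:
    """Client side: rebuild the final page from the base file and the delta."""
    parts: List[str] = []
    for op in delta:
        if op[0] == "copy":
            _, i1, i2 = op
            parts.append(base[i1:i2])
        else:
            parts.append(op[1])
    return "".join(parts)
```

Only the delta has to come from the main site; the long shared stretches of markup travel as `copy` references into the base file, which the client obtains from a network cache.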
A Page Content Model
• A page is composed of groups; groups include items
• Page construction is modeled as a two-phase selection (groups, then items)
[Figure: a page divided into groups, each group containing items]
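The two-phase selection can be sketched as follows. The poster states only that groups are chosen first and items second; the per-group and per-item selection weights, the numbers of groups and items per page, and the names `Group`, `weighted_sample`, and `build_page` are illustrative assumptions.

```python
import random
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class Group:
    name: str
    items: List[str]
    weights: List[float]  # hypothetical selection weight per item (assumed > 0)


def weighted_sample(rng: random.Random, pool: Sequence, weights: Sequence[float], k: int) -> list:
    """Draw up to k distinct elements, each draw proportional to remaining weight."""
    pool, weights = list(pool), list(weights)
    chosen = []
    for _ in range(min(k, len(pool))):
        idx = rng.choices(range(len(pool)), weights=weights, k=1)[0]
        chosen.append(pool.pop(idx))
        weights.pop(idx)
    return chosen


def build_page(rng: random.Random, groups: Sequence[Group], group_weights: Sequence[float],
               n_groups: int, n_items: int) -> Dict[str, List[str]]:
    # Phase 1: choose which groups appear on the page.
    picked = weighted_sample(rng, groups, group_weights, n_groups)
    # Phase 2: within each chosen group, choose the items to display.
    return {g.name: weighted_sample(rng, g.items, g.weights, n_items) for g in picked}
```

Sampling without replacement keeps a group or item from appearing twice on one page; feeding Zipf-like weights into `group_weights` and `Group.weights` would make popular groups and items recur across many generated pages.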
Our Simulation
Test-case web pages:
• Book pages in an Amazon-style website
• MyYahoo-type personalized pages
• Personalized stock portfolio pages
• A simple personalized weather page
Simulation of ESI
• Assuming a Zipf-like distribution for groups and items: $\text{popularity}_i = k / i^{\alpha}$
• Performance highly dependent on $\alpha$ (ranging from 0.7 to 1.5 in our simulations)
• Hit rate estimates for items are functions of the arrival rate, the item time-to-live (TTL), and a constant
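To show how $\alpha$ and TTL drive the hit rate, here is a Python sketch. The Zipf-like popularity follows the bullet above, with $k$ chosen so the popularities sum to 1. For the per-item hit rate it assumes a standard estimate that may differ from the one used in this study: Poisson request arrivals, and a cache entry fetched on a miss and kept for exactly one TTL, giving $h = \lambda T / (1 + \lambda T)$ for arrival rate $\lambda$ and TTL $T$. Function names and parameter values are hypothetical.

```python
from typing import List


def zipf_popularity(n: int, alpha: float) -> List[float]:
    """popularity_i = k / i**alpha, with k chosen so the values sum to 1."""
    raw = [1.0 / (i ** alpha) for i in range(1, n + 1)]
    k = 1.0 / sum(raw)
    return [k * r for r in raw]


def ttl_hit_rate(arrival_rate: float, ttl: float) -> float:
    """Assumed estimate: Poisson arrivals; entry fetched on a miss, kept for ttl.

    Each cache cycle has one miss followed by arrival_rate * ttl expected hits.
    """
    x = arrival_rate * ttl
    return x / (1.0 + x)


def aggregate_hit_rate(n_items: int, alpha: float, total_rate: float, ttl: float) -> float:
    """Request-weighted hit rate over all items (rates in requests per second)."""
    pops = zipf_popularity(n_items, alpha)
    return sum(p * ttl_hit_rate(total_rate * p, ttl) for p in pops)
```

Raising $\alpha$ concentrates requests on a few items whose entries almost never expire unused, so the request-weighted hit rate rises; lengthening the TTL raises it as well, at the cost of serving staler fragments.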
Sample simulation results (bookstore-type resource, with “backend” servers)
[Figure: Traffic vs. TTL, alpha = 0.8. x-axis: time-to-live (seconds, 0 to 8000); y-axis: traffic (Gb per day, 0 to 300); series: Client-Edge, Edge-backend, Backend-Main site]
[Figure: Hit rate vs. value of Zipfian parameter. x-axis: alpha (0.7 to 1.5); y-axis: hit rate (0% to 100%); series: edge hit rate, system hit rate]
Class-Based Delta Encoding Simulation
• For some pages, the client is likely to be able to reuse base files
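Whether base-file reuse pays off can be seen with a per-request traffic tally. This Python sketch assumes that, with delta encoding, the main site sends only the delta, the base file comes from a network cache, and the base file also crosses the client's link the first time that client uses it; cache-to-main-site refresh traffic is ignored. The class and method names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class LinkTraffic:
    client_bytes: int = 0  # bytes arriving on the client's link
    server_bytes: int = 0  # bytes leaving the main site
    held_bases: Set[str] = field(default_factory=set)  # base files the client holds

    def request_without_de(self, page_size: int) -> None:
        # Baseline: the whole generated page crosses both links.
        self.client_bytes += page_size
        self.server_bytes += page_size

    def request_with_de(self, base_id: str, base_size: int, delta_size: int) -> None:
        # The main site sends only the delta; the base file comes from a network cache.
        self.server_bytes += delta_size
        self.client_bytes += delta_size
        if base_id not in self.held_bases:
            # First use of this base file: it also crosses the client's link.
            self.client_bytes += base_size
            self.held_bases.add(base_id)
```

A user who keeps reloading one personalized page pays for the base file once and then receives only deltas; a user whose every request needs a different base file can receive more bytes than without delta encoding, even though the main site's traffic still drops.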
[Figure: Traffic vs. number of base files. x-axis: number of base files (0 to 1000); y-axis: traffic (Gb per day, 0 to 300); series: traffic without DE, aggregate client traffic, main site traffic]
[Figure: Traffic vs. same-base threshold. x-axis: threshold (bytes, 0 to 15000); y-axis: traffic (Gb per day, 0 to 300); series: aggregate client traffic, main site traffic]
• For other pages, traffic on the client-cache link is higher than without delta encoding. To minimize client traffic, use the same base file the client already owns if the delta is larger than a threshold
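The same-base rule can be written as a small decision function. The bullet does not say which delta is compared with the threshold; this sketch assumes it is the delta against the class base file chosen for the page, and that the server knows which base file the client already owns. The function and parameter names are hypothetical.

```python
from typing import Optional


def choose_base(class_base: str, owned_base: Optional[str],
                delta_size: int, threshold: int) -> str:
    """Return the id of the base file to encode the page against.

    Assumed reading of the rule: delta_size is the size of the delta against
    class_base; if it exceeds threshold and the client already owns a base
    file, reuse that one so no new base file crosses the client's link.
    """
    if owned_base is not None and delta_size > threshold:
        return owned_base
    return class_base
```

Sweeping `threshold` changes how often the client's own base file is reused, which is the trade-off between client and main-site traffic that the same-base threshold experiment explores.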
Sample Comparison Numbers

MyYahoo-type pages
     Client-link savings   Server-link savings   Edge cache usage
ESI  0%                    62%                   1.5Mb
DE   66%                   87%                   3.2Mb

Amazon-style book pages
     Client-link savings   Server-link savings   Edge cache usage
ESI  0%                    30%                   1.2Mb
DE   -8%                   82%                   2.2Mb
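Reading the percentages as relative traffic reduction on each link (an assumed definition, with $T$ the traffic on that link), a negative entry means the technique adds traffic, which matches the observation that the client-cache link can carry more traffic under delta encoding:

```latex
\text{savings} = 1 - \frac{T_{\text{with}}}{T_{\text{without}}}
\qquad\Longrightarrow\qquad
\text{savings} = -8\% \iff T_{\text{with}} = 1.08\,T_{\text{without}}
```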
Conclusions
Legend: * = excellent, + = good, - = bad, ~ = sometimes
All the details: http://dbpubs.stanford.edu/pub/2003-7

ESI   DE    Property
+     *     Reduces server traffic
-     ~     Reduces client traffic
*     -     Reduces computational load on web server
Yes   Yes   Performance dependent on web page structure
Yes   No    Performance dependent on characteristics of data
Yes   Less  Benefits greater when popularity rises
No    Yes   Requires main site hardware/software installation
Yes   No    Requires web-page code changes
Yes   No    Requires network infrastructure (CDN services)
Yes   No    Can exploit information available from CDN for page construction