Caching Solutions to increase availability of Web Content

Krithi Ramamritham, IIT Bombay


Page 1: Caching Solutions   to increase availability of  Web Content

Caching Solutions to increase availability of Web Content

Krithi Ramamritham, IIT Bombay

Page 2

Web Content

• Web sites have traditionally served static content
• But dynamic content generation has come into vogue
  – generated on the fly by running dynamic scripts, e.g., Active Server Pages (ASP), Java Server Pages (JSP), Servlets
  – allows generation of different content for the same request

Page 3

Dynamic Web Pages…

[Figure: a web page from a news content site, made up of an Ad Component, a Navigation Component, four Headline Components, and a Personalized Component]

Page 4

IIT Bombay’s aAQUA Community Forum

Farmers get information and get their questions answered
-- In the local context
-- In their local language

www.aAQUA.org

Capitalizes on existing human and infrastructural resources:
-- Agri-extension center: KVK, Baramati
-- NGO: Vigyan Ashram, Pabal
-- Corporate infrastructure: ITC e-chaupal
-- Government: MCIT

Page 5

Typical End-to-end Web Site Architecture

[Figure: users, web server cluster, application server cluster, and data tier]

Page 6

WS vs. AS

• Web servers
  – Do well-defined and quantifiable local work
    • e.g., processing HTTP headers, serving static content
• Application servers
  – Run multi-layer programs
    • e.g., scripts involving calls to backends

Page 7

Application Layer Details

[Figure: application-layer diagram. Labels: Web browser client; Java client; HTTP (Internet); RMI/IIOP; Web server with plug-in; Application server core functionality (Presentation Logic, Business Logic, Connectors); ASP, JSP, Servlets; COM+, CORBA, EJB; ADO, JDBC, ODBC; Database; Legacy applications; Directory services (LDAP); Value-added services (Commerce, Content Management, Personalization); Dynamic Content Accelerator]

Page 8

The Problem: Page Generation Delays

• Causes of page generation delays include (in addition to pure processing overhead):
  – Remote database accesses: heavy I/O loads, network delays
  – XML-HTML transformations: extensive processing delays
  – Personalization logic: e.g., Broadvision, Vignette, etc.
  – Interaction bottlenecks: e.g., database connection pools

=> serious performance and scalability problems for web sites due to increased load on server-side infrastructure

Page 9

Reducing delays

• Approaches fall into 3 broad categories:
  – Database caching
  – Page level caching
  – Fragment level caching

Page 10

Alternative: CDNs

[Figure: a content distribution network linking sources, repositories, and clients]

Page 11

Push Based Core Infrastructure

• Resilient and efficient content distribution network (CDN) for dynamic data
• Existing CDNs: Akamai, Dynamai

[Figure: sources, cooperating repositories, and clients]

Page 12

Generic Architecture

[Figure: data sources (servers, sensors) and end-hosts (wired hosts, mobile hosts), connected through the network]

Page 13

Generic Architecture

[Figure: data sources (servers, sensors), proxies/caches, and end-hosts (wired host, mobile host), connected through the network]

Page 14

The Push Approach

• Proxy registers the data item of interest and the coherency requirement with the server
• Server pushes interesting changes

+ Achieves strong consistency
+ Keeps network overhead to a minimum
-- Poor scalability (has to maintain state and has to keep connections open)
-- Low resiliency

[Figure: Server, Proxy, User; arrows labeled Push and Push]
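
The registration-and-push exchange on this slide can be sketched roughly as follows. This is an illustration only, not the infrastructure described in the talk: the names `PushServer` and `Proxy` are hypothetical, and reading the "coherency requirement" as a numeric tolerance on the item's value is an assumption.

```python
class Proxy:
    """Caches pushed values; in this sketch it simply stores what it receives."""

    def __init__(self, name):
        self.name = name
        self.cache = {}

    def receive(self, item, value):
        self.cache[item] = value


class PushServer:
    """Keeps per-proxy state: registered item, tolerance, and last pushed value."""

    def __init__(self):
        self.values = {}
        self.registrations = []  # each entry: [proxy, item, tolerance, last_pushed]

    def register(self, proxy, item, tolerance):
        current = self.values.get(item)
        self.registrations.append([proxy, item, tolerance, current])
        if current is not None:
            proxy.receive(item, current)

    def update(self, item, value):
        """Apply a source update and push it only where the change is 'interesting'."""
        self.values[item] = value
        pushes = 0
        for reg in self.registrations:
            proxy, reg_item, tolerance, last = reg
            if reg_item != item:
                continue
            if last is None or abs(value - last) >= tolerance:
                proxy.receive(item, value)
                reg[3] = value  # remember what this proxy now holds
                pushes += 1
        return pushes


server = PushServer()
server.update("stock:XYZ", 100.0)
tight, loose = Proxy("tight"), Proxy("loose")
server.register(tight, "stock:XYZ", tolerance=0.5)
server.register(loose, "stock:XYZ", tolerance=5.0)
pushes_small = server.update("stock:XYZ", 101.0)  # exceeds only the tight tolerance
pushes_large = server.update("stock:XYZ", 107.0)  # exceeds both tolerances
```

The `registrations` list is the server-side state the slide refers to: it grows with the number of proxies and items, which is where the scalability cost of push comes from.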

Page 15

The Pull Approach

Proxy pulls after Time to Live (TTL) / Time To next Refresh (TTR / TNR)

+ Can be implemented using the HTTP protocol
+ Stateless and hence generally scalable with respect to state space and computation
– Weak cache consistency
– Heavy polling for stringent coherence requirements or highly dynamic data
– Network overheads higher than for Push

[Figure: Server, Proxy, User; arrows labeled Pull and Push]
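
A TTL-driven pull loop can be sketched as below. The names `Origin` and `PullProxy` are hypothetical, and the clock is passed in explicitly (the `now` argument) only to keep the example deterministic; a real proxy would read the system time and typically use HTTP for the fetch.

```python
class Origin:
    """Stand-in for the server; counts how often it is polled."""

    def __init__(self):
        self.value = "v1"
        self.requests = 0

    def fetch(self):
        self.requests += 1
        return self.value


class PullProxy:
    """Serves from cache until the TTL expires, then polls the origin again."""

    def __init__(self, origin, ttl):
        self.origin = origin
        self.ttl = ttl
        self.cached = None
        self.fetched_at = None

    def get(self, now):
        expired = self.fetched_at is None or now - self.fetched_at >= self.ttl
        if expired:
            self.cached = self.origin.fetch()
            self.fetched_at = now
        return self.cached


origin = Origin()
proxy = PullProxy(origin, ttl=10)
first = proxy.get(now=0)    # cold cache: one pull
origin.value = "v2"         # the source changes right after the pull
stale = proxy.get(now=5)    # still within the TTL: the old copy is served
fresh = proxy.get(now=10)   # TTL expired: the proxy pulls again
```

Shrinking `ttl` narrows the window in which a stale copy can be served but raises the number of polls, which is the trade-off listed on the slide.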

Page 16

Database Caching

Two broad types:

• Query result caching
• Middle tier database caching
  – caching database tables in main memory

Page 17

Query result caching

• Many application server products offer this feature
• [Luo et al., 2000] proposed query result caching at Web proxy caches

-- mitigates only local database access latency
-- only a subset of query results may be reused in page generation
-- page fragments may not all be from databases
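
Query result caching can be illustrated with a small sketch. It assumes an in-memory SQLite database and a hypothetical `QueryResultCache` class keyed by the SQL text and its bound parameters; it is not the scheme of any product or paper cited above, and it omits invalidation entirely.

```python
import sqlite3


class QueryResultCache:
    """Caches result rows keyed by the SQL text and its bound parameters."""

    def __init__(self, connection):
        self.connection = connection
        self.results = {}
        self.db_calls = 0

    def query(self, sql, params=()):
        key = (sql, tuple(params))
        if key not in self.results:
            self.db_calls += 1  # only misses reach the database
            self.results[key] = self.connection.execute(sql, params).fetchall()
        return self.results[key]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, category TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO products (category, name) VALUES (?, ?)",
    [("books", "Atlas"), ("books", "Novel"), ("toys", "Kite")],
)

cache = QueryResultCache(conn)
sql = "SELECT name FROM products WHERE category = ? ORDER BY name"
books_first = cache.query(sql, ("books",))
books_again = cache.query(sql, ("books",))  # served from the cache
toys = cache.query(sql, ("toys",))          # different parameters: a miss
```

Even on a hit, the rows still have to be formatted into HTML on every request, which is one reason this approach removes only the database access latency.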

Page 18

Middle tier database caching

• Caching database tables in main memory
  – Oracle 9i Cache
  – Main-memory databases, e.g., TimesTen

-- mitigates only database access latency
-- caching at table granularity results in poor cache utilization
-- main-memory databases are difficult to integrate and maintain and can be expensive

Page 19

Page Level Caching

• Dynamically generated HTML pages are cached [Iyengar & Challenger, 1997; Zhu & Yang, 2000]
• Several commercially available products follow this approach, e.g., SpiderCache, Xcache, Dynamai

+ Can completely offload work from web/app server
– Low reusability for highly personalized web pages
– URL may not uniquely identify a page, increasing the risk of delivering incorrect pages
– Often introduces excessive invalidations, e.g., the whole page is invalidated even if only a single element on it changes
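
The "URL may not uniquely identify a page" problem can be made concrete with a short sketch. `render_page` and `PageCache` are hypothetical names, and the personalized greeting stands in for any content that depends on something other than the URL.

```python
def render_page(url, user):
    """Stand-in for a dynamic script: output depends on the user, not only the URL."""
    return f"<h1>{url}</h1><p>Welcome back, {user}</p>"


class PageCache:
    """Whole-page cache; key_fn decides what identifies a page."""

    def __init__(self, key_fn):
        self.key_fn = key_fn
        self.pages = {}

    def get(self, url, user):
        key = self.key_fn(url, user)
        if key not in self.pages:
            self.pages[key] = render_page(url, user)
        return self.pages[key]


# Keyed by URL alone: the second user gets the first user's page.
by_url = PageCache(key_fn=lambda url, user: url)
by_url.get("/home", "alice")
page_for_bob = by_url.get("/home", "bob")

# Keyed by URL and user: correct, but every user needs a separate entry.
by_url_and_user = PageCache(key_fn=lambda url, user: (url, user))
by_url_and_user.get("/home", "alice")
correct_for_bob = by_url_and_user.get("/home", "bob")
```

Adding the user to the key restores correctness, but the cache then holds one copy per user and almost never gets a hit, which is the low-reusability drawback listed above.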

Page 20

Reducing page generation delays

• Approaches fall into 3 broad categories:
  – Database caching
  – Page level caching
  – Fragment level caching

Page 21

How Dynamic Scripting Works

[Figure: a page generation script is a sequence of code blocks; each code block writes its output ("Write to Out") to an HTML buffer, and the buffered HTML is sent to the user]

Page 22

Code Blocks Perform Work

[Figure: each code block in the page generation script performs work such as application logic, database calls, and HTML formatting before writing to out]
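
The script structure in these two figures can be sketched in a few lines. The block names and the request dictionary are made up for illustration; a JSP or ASP page would express the same thing in its own syntax.

```python
import io


def ad_block(out, request):
    out.write("<div class='ad'>Ad for section " + request["section"] + "</div>")


def headline_block(out, request):
    # A real block would run application logic and database calls here.
    headlines = ["Headline A", "Headline B"]
    out.write("<ul>" + "".join(f"<li>{h}</li>" for h in headlines) + "</ul>")


def personalized_block(out, request):
    out.write("<p>Hello, " + request["user"] + "</p>")


def generate_page(request, blocks):
    """Run each code block in order; each one writes its HTML to the buffer."""
    out = io.StringIO()
    for block in blocks:
        block(out, request)
    return out.getvalue()  # the buffered HTML is what gets sent to the user


html = generate_page(
    {"section": "sports", "user": "alice"},
    [ad_block, headline_block, personalized_block],
)
```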

Page 23

Code Blocks <-> Components

(Example: News content site)

Certain components can be cached

[Figure: the code blocks of the page generation script correspond to the components of the web page: Ad Component, Navigation Component, four Headline Components, and Personalized Component]

Page 24

DCA: Our Solution

[Figure: a code block in the page generation script (application logic, database calls, HTML formatting) is enclosed by a start tag and an end tag. On a request, the Dynamic Content Accelerator supplies the cached code block output, so the work inside the block is bypassed]
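
The tagging idea can be sketched as follows. The talk does not show the DCA's API, so `FragmentCache.fragment` is a hypothetical stand-in: the call plays the role of the start and end tags around a code block, and on a hit the block is never executed.

```python
class FragmentCache:
    """Hypothetical stand-in for the accelerator: fragment key -> cached HTML."""

    def __init__(self):
        self.fragments = {}

    def fragment(self, key, code_block):
        # "Start tag": look up the fragment; on a hit the code block never runs.
        if key in self.fragments:
            return self.fragments[key]
        # Miss: run the block, then the "end tag" stores its output.
        html = code_block()
        self.fragments[key] = html
        return html


calls = {"db": 0}


def headlines_block():
    calls["db"] += 1  # stands in for application logic and a database call
    return "<ul><li>Top story</li></ul>"


def greeting_block(user):
    return f"<p>Hello, {user}</p>"  # personalized, so it is left uncached


def generate_page(cache, user):
    parts = [
        cache.fragment("headlines", headlines_block),
        greeting_block(user),
    ]
    return "".join(parts)


cache = FragmentCache()
page_alice = generate_page(cache, "alice")
page_bob = generate_page(cache, "bob")
```

Both pages share the cached headlines while each keeps its own greeting, so the shared work runs once without the wrong-page risk of whole-page caching.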

Page 25

DCA in a Typical End-to-end Web Site Architecture

• A single instance of the DCA serves a rack of application servers
• Application servers communicate with DCA through a lightweight API

[Figure: users, Dynamic Content Accelerator, web server cluster, application server cluster, and data tier]

Page 26

Cache Management

• A critical aspect of any caching solution
• DCA supports novel cache management strategies:
  – Prediction-based cache replacement
  – Observation-based cache invalidation

Page 27

Cache Replacement

• Prediction-based replacement
  – fragments having lowest probability of access replaced
  – Least-Likely-to-be-Used (LLU)
  – Access probabilities based on:
    • Current user navigational patterns over site graph (in the form of clickstreams)
    • Historical user navigational patterns over site graph (in the form of association rules)

[Figure: Site Graph with nodes News, Sports, Hockey, and the pages Schedules, Scores, Players, Teams]

(News, Sports, Hockey) -> Schedules = 20%
(News, Sports, Hockey) -> Players = 15%
(News, Sports, Hockey) -> Teams = 10%
(News, Sports, Hockey) -> Scores = 55%
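
A minimal sketch of Least-Likely-to-be-Used replacement, using the probabilities from the slide. How the DCA derives those probabilities from clickstreams and association rules is not described here, so they are simply supplied as input; the class name `LLUCache` is hypothetical, and the sketch always admits the incoming fragment.

```python
class LLUCache:
    """Evicts the cached fragment with the lowest predicted access probability."""

    def __init__(self, capacity, access_probability):
        self.capacity = capacity
        self.access_probability = access_probability  # fragment key -> probability
        self.fragments = {}

    def put(self, key, html):
        evicted = None
        if key not in self.fragments and len(self.fragments) >= self.capacity:
            evicted = min(
                self.fragments,
                key=lambda k: self.access_probability.get(k, 0.0),
            )
            del self.fragments[evicted]
        self.fragments[key] = html
        return evicted


# Probabilities for a user currently at (News, Sports, Hockey), from the slide.
probabilities = {"Schedules": 0.20, "Players": 0.15, "Teams": 0.10, "Scores": 0.55}

cache = LLUCache(capacity=3, access_probability=probabilities)
for page in ("Schedules", "Players", "Teams"):
    cache.put(page, f"<div>{page}</div>")
evicted = cache.put("Scores", "<div>Scores</div>")
```

With room for three fragments, inserting Scores evicts Teams, the fragment with the lowest access probability.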

Page 28

Cache Invalidation

• DCA supports common cache invalidation techniques:
  – Time-based: Each cache element assigned a TTL
  – Event-based: Updates to the database send an invalidation message to the cache
  – On demand: Manual invalidation of selected elements
• DCA supports additional invalidation techniques…
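
The three common techniques can be sketched together. Everything here is hypothetical: `InvalidatingCache` is not the DCA's interface, the table-to-fragment mapping in `on_database_update` is invented, and the clock is passed in as `now` to keep the example deterministic.

```python
class InvalidatingCache:
    def __init__(self):
        self.entries = {}  # key -> (html, expires_at or None)

    def put(self, key, html, now, ttl=None):
        expires_at = None if ttl is None else now + ttl
        self.entries[key] = (html, expires_at)

    def get(self, key, now):
        entry = self.entries.get(key)
        if entry is None:
            return None
        html, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self.entries[key]  # time-based invalidation
            return None
        return html

    def invalidate(self, key):
        """Serves both event-based messages and manual, on-demand invalidation."""
        return self.entries.pop(key, None) is not None


def on_database_update(cache, table):
    # Event-based: a (hypothetical) database hook names the affected fragment.
    affected = {"prices": "price-list"}
    cache.invalidate(affected.get(table))


cache = InvalidatingCache()
cache.put("headlines", "<ul><li>News</li></ul>", now=0, ttl=60)
cache.put("price-list", "<table></table>", now=0)
within_ttl = cache.get("headlines", now=30)
after_ttl = cache.get("headlines", now=60)
on_database_update(cache, "prices")
after_event = cache.get("price-list", now=61)
```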

Page 29

Cache Invalidation…

• Other invalidation techniques supported:
  – Observation-based:
    • User-initiated updates are observed in scripts; each such update sends an invalidation message to the cache
    • Most appropriate for auction sites, online trading sites
    • Invalidation does not require communication with the databases
  – Keyword-based:
    • Elements can be associated with keywords; e.g., a retailer may wish to invalidate all “seasonal” items
  – Regular expression-based:
    • Elements can be invalidated based on regular expression matching
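
These techniques can be sketched on top of a simple keyed cache. The class `TaggedCache`, the key layout `item/<id>/...`, and the `place_bid` script are all assumptions made for the example; the slide does not specify how the DCA names elements or matches patterns.

```python
import re


class TaggedCache:
    def __init__(self):
        self.html = {}      # key -> cached fragment
        self.keywords = {}  # key -> set of keywords

    def put(self, key, html, keywords=()):
        self.html[key] = html
        self.keywords[key] = set(keywords)

    def _drop(self, keys):
        for key in keys:
            del self.html[key]
            del self.keywords[key]
        return sorted(keys)

    def invalidate_keyword(self, keyword):
        return self._drop([k for k, words in self.keywords.items() if keyword in words])

    def invalidate_regex(self, pattern):
        compiled = re.compile(pattern)
        return self._drop([k for k in self.html if compiled.search(k)])


def place_bid(cache, item_id):
    """Observation-based: the script sees the user's update and invalidates directly."""
    # The bid itself would be written to the database at this point.
    return cache.invalidate_regex("^item/" + re.escape(str(item_id)) + "/")


cache = TaggedCache()
cache.put("item/7/summary", "<p>Sled</p>", keywords={"seasonal"})
cache.put("item/7/bids", "<p>3 bids</p>")
cache.put("item/8/summary", "<p>Skis</p>", keywords={"seasonal"})
cache.put("item/9/summary", "<p>Desk</p>")
dropped_by_bid = place_bid(cache, 7)
dropped_by_keyword = cache.invalidate_keyword("seasonal")
```

Because `place_bid` invalidates from inside the script, the cache learns of the change without waiting for a message from the database.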

Page 30

Other Fragment Level Caching…

• App servers (e.g., BEA’s WebLogic, IBM’s WebSphere) cache fragments produced by JSP scripts

+ can offload presentation layer tasks
– runs in the application server process space => competes for server resources
– application server cluster => multiple cache instances, duplication of content, additional synchronization overhead

[Figure: application server cluster]

Page 31

Other Fragment Level Caching…

• Weave system [VLDB 2000] caches XML fragments, as well as query results and HTML pages
  – Requires use of declarative web site specification language

Page 32

Performance Study

Metric:
– Average page generation time: time required to construct HTML page

Page 33

Performance Study…

Test Site

– Fictitious online retail site, allows browsing of product catalog
– Pages generated using JSP scripts
– Site content stored in Oracle database
– Database schema based on Dublin Core Metadata Open Standard
– Contains 200,000 products and 44,000 categories
– Each page consists of 3 components, each involving a database call

Page 34

Performance Study…

Test Setup

– Content Database Server: Oracle 8.1.6
– Web/Application Server: WebLogic 6.0 running on cluster of 2 machines
– Server machines:
  • have 1 GB RAM, dual P III-933 MHz processors
  • run Windows 2K Advanced Server

Page 35

Testing Methodology

• DCA compared to 2 middle tier caching solutions:
  – Main Memory Database: TimesTen used to cache the content database (entire database is cached, runs on database server machine)
  – Application Server Cache: WebLogic Server JSP caching (WLS Cache)
• For both WLS and DCA, 2 (of 3) page components are cached
• Usually, DCA runs on a separate machine (512 MB RAM, P III-600 MHz processor, running Windows 2K Advanced Server)

Page 36

Testing Methodology...

• Baseline Parameters:
  – Cache Size, i.e., percentage of fragments that fit into cache: 75%
  – Cache replacement policy: LLU for DCA
• User load is varied by sending requests from client machines running Radview’s WebLoad
• Simulated users navigate site according to Zipf 80-20 distribution (i.e., 80% of users follow 20% of navigation links)
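
An 80-20 navigation workload of this kind can be approximated with a short generator. This is a two-tier simplification (a "hot" 20% of links chosen 80% of the time), not a true Zipf sampler, and it is not how WebLoad is configured; the function name and parameters are assumptions.

```python
import random


def simulate_navigation(num_links, num_requests, seed=0):
    """Pick links so that about 80% of requests go to the 'hot' 20% of links."""
    rng = random.Random(seed)
    hot_count = max(1, num_links // 5)
    hot_links = range(hot_count)
    cold_links = range(hot_count, num_links)
    choices = []
    for _ in range(num_requests):
        pool = hot_links if rng.random() < 0.8 else cold_links
        choices.append(rng.choice(pool))
    return choices, hot_count


requests, hot_count = simulate_navigation(num_links=100, num_requests=10_000)
hot_fraction = sum(1 for link in requests if link < hot_count) / len(requests)
```

A skewed workload like this is what makes a cache holding only part of the fragments effective: most requests land on the small hot set.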

Page 37

Page Gen. Times vs. Number of Users

TimesTen vs. DCA -- 3x to 9x improvement

TimesTen only mitigates local database access latency -- still requires query processing, formatting operations

[Figure: page generation time (milliseconds, 0 to 3000) vs. load (number of users, 0 to 1000) for No Cache, TimesTen, WLS Cache, and DCA]

Page 38

Page Generation Times...

WLS vs. DCA -- 2x to 5x improvement

• WLS runs in application server process space, competes for server resources
• WLS utilizes multiple caches, causing redundant caching
• DCA runs as single, standalone logical cache

[Figure: page generation time (milliseconds, 0 to 3000) vs. load (number of users, 0 to 1000) for No Cache, TimesTen, WLS Cache, and DCA]

Page 39

Sensitivity to Cache Size

[Figure: page generation time (milliseconds, 0 to 250) vs. load (number of users, 0 to 1000) for cache sizes of 60%, 75%, and 90%]

As expected, performance improves as cache size increases

Since cached elements are typically quite small (e.g., a few hundred bytes), larger cache sizes are feasible in practice

Page 40

Conclusion

• Increased use of dynamic page generation technologies
  => increases load on application servers
  => serious performance and scalability problems for e-business sites
• DCA (Dynamic Content Acceleration)
  => significantly reduces the load on the server side infrastructure, allows e-business sites to scale
  => significantly outperforms existing middle tier caching solutions