Dynamic Web Content
IBM Research
© 2006 IBM Corporation
Caching Dynamic Web Content: Designing and Analyzing an Aspect-Oriented Solution
Sara Bouchenak – INRIA, France
Alan Cox – Rice University, Houston
Steven Dropsho – EPFL, Lausanne
Sumit Mittal – IBM Research, India
Willy Zwaenepoel – EPFL, Lausanne
Dynamic Web Caching
Dynamic Web Content
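[Figure: three-tier web architecture. Client, Internet, web server (web tier), application server (business tier), database server (database tier), and a cache. HTTP requests and responses flow between the client and the web server; SQL requests and responses flow between the application server and the database server.]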
Motivation for Caching
– Dynamic content represents a large portion of web requests
• Stock quotes, bid and purchase status on auction sites, best-seller lists in bookstores
– Generating it places a heavy burden on application servers
Caching Dynamic Web Content
Dynamic content is not easy to cache
– Must ensure consistency: invalidate cached entries when updates occur
• Write requests can modify entries used by read requests
– Caching logic inserted at different points in the application
• Entry and exit of requests, access to underlying database
• Correlation between requests and their database accesses
Most solutions rely on “manually” understanding complex application logic
Our Contributions
Design a cache, “AutoWebCache”, that
• Ensures consistency of cached documents
• Inserts caching logic transparently to the application
– Makes use of aspect-oriented programming
Analysis of the cache
• Transparency of injecting caching logic
• Improvement in response time for test-bed applications
Dynamic Web Caching – Solution Approach
[Figure: the three-tier architecture (client, Internet, web server, application server, database server) extended with AutoWebCache. Its caching logic takes request info and database access info as input, performs cache checks, and issues cache inserts and invalidations on a web page cache.]
Transparency: capture information flow
Consistency: correlation between read and write requests
Outline
Design of AutoWebCache
– Maintaining cache consistency
• Determine relationship between reads and updates
– Cache Structure
Aspectizing Web Caching
– Insertion of caching logic transparently
Evaluation
– Analysis of effectiveness, transparency
Conclusion
Maintaining Cache Consistency – Read Requests
Responses to read-only requests are cached
The read SQL queries are recorded with the cache entry
Index: URI (readHandlerName + readHandlerArgs) | Cached web page | Associated read queries
URI1 | WebPage1 | { Read Query 11, Read Query 12, … }
URI2 | WebPage2 | { Read Query 21, Read Query 22, … }
… | … | …
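The cache entry layout on this slide can be sketched in Java roughly as follows. This is only an illustration under stated assumptions: the class and method names (`PageCacheSketch`, `uri`, `readQueriesOf`) and the "?" separator in the key are hypothetical and are not taken from AutoWebCache itself.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; this is not the AutoWebCache API.
final class PageCacheSketch {

    /** One cached page plus the read queries that produced it. */
    static final class Entry {
        final String webPage;
        final List<String> readQueries;

        Entry(String webPage, List<String> readQueries) {
            this.webPage = webPage;
            this.readQueries = new ArrayList<>(readQueries);
        }
    }

    // Index: URI -> cached page and its associated read queries
    private final Map<String, Entry> entriesByUri = new LinkedHashMap<>();

    /** Builds the index key from handler name and arguments; the "?" separator is an assumption. */
    static String uri(String readHandlerName, String readHandlerArgs) {
        return readHandlerName + "?" + readHandlerArgs;
    }

    void add(String uri, String webPage, List<String> readQueries) {
        entriesByUri.put(uri, new Entry(webPage, readQueries));
    }

    /** Returns the cached page, or null on a cache miss. */
    String get(String uri) {
        Entry e = entriesByUri.get(uri);
        return e == null ? null : e.webPage;
    }

    /** Returns the read queries recorded for a cached page (empty on a miss). */
    List<String> readQueriesOf(String uri) {
        Entry e = entriesByUri.get(uri);
        return e == null ? Collections.<String>emptyList() : e.readQueries;
    }
}
```

Keeping the read queries next to the page is what later lets a write request find the entries it may have made stale.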
Maintaining Cache Consistency – Write Requests
The result is not cached
The write SQL queries are recorded
Write SQL queries are intersected with the read queries of cached pages
A cached page is invalidated if the intersection is non-empty
[Figure: write set (WS) and read set (RS). Disjoint sets: no invalidation. Overlapping sets: invalidation.]
Invalidating Cache Entries
Index: URI (readHandlerName + readHandlerArgs) | Cached web page | Associated read queries
URI1 | WebPage1 | { Read Query 11, Read Query 12, … }
URI2 | WebPage2 | { Read Query 21, Read Query 22, … }
URI3 | WebPage3 | { Read Query 31, Read Query 32, … }
A write query issued by request URIn intersects the read queries of URI3: remove the URI3 entry
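A minimal sketch of this invalidation step, assuming the intersection test is supplied from outside as a predicate. The names `InvalidationSketch`, `recordPage`, and `invalidate` are hypothetical and the page bodies are omitted for brevity.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.BiPredicate;

// Hypothetical sketch; the intersection test is pluggable (column, value, or extra-query based).
final class InvalidationSketch {
    // URI -> read queries recorded for the cached page
    private final Map<String, List<String>> readQueriesByUri = new LinkedHashMap<>();
    // (writeQuery, readQuery) -> true when the write may change the read's result
    private final BiPredicate<String, String> intersects;

    InvalidationSketch(BiPredicate<String, String> intersects) {
        this.intersects = intersects;
    }

    void recordPage(String uri, List<String> readQueries) {
        readQueriesByUri.put(uri, new ArrayList<>(readQueries));
    }

    /** Removes every entry with a read query that intersects the write; returns the removed URIs. */
    List<String> invalidate(String writeQuery) {
        List<String> removed = new ArrayList<>();
        Iterator<Map.Entry<String, List<String>>> it = readQueriesByUri.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, List<String>> entry = it.next();
            boolean stale = false;
            for (String readQuery : entry.getValue()) {
                if (intersects.test(writeQuery, readQuery)) {
                    stale = true;
                    break;
                }
            }
            if (stale) {
                removed.add(entry.getKey());
                it.remove(); // safe removal while iterating
            }
        }
        return removed;
    }

    boolean contains(String uri) {
        return readQueriesByUri.containsKey(uri);
    }
}
```

This version scans every cached entry on each write, which is simple but linear in the cache size; an index from queries to URIs avoids the scan.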
Query Analysis Engine
Determines intersection between SQL queries
Three levels of granularity for intersection
– Column based
– Value based
– Extra query based
Balance precision with complexity
Column Based Intersection
Table T:
a | b | c
5 | 8 | 7
1 | 10 | 9
Read query: SELECT T.a FROM T WHERE T.b = 8
UPDATE T SET T.c = 7 WHERE T.b = 10: Ok
UPDATE T SET T.a = 12 WHERE T.b = 10: Invalidate
Invalidate if Column_Read = Column_Updated
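The column-based rule can be sketched as a set overlap test. The sketch assumes the queries have already been parsed into column sets (no SQL parsing is shown). Counting the read query's predicate columns as "read" is an assumption of this sketch, and `ColumnBasedSketch` and `cols` are hypothetical names.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch of the column-based check on pre-parsed column sets.
final class ColumnBasedSketch {

    /** True when the update writes at least one column that the read query uses. */
    static boolean intersects(Set<String> columnsRead, Set<String> columnsUpdated) {
        for (String column : columnsUpdated) {
            if (columnsRead.contains(column)) {
                return true; // Column_Read = Column_Updated: invalidate
            }
        }
        return false; // no shared column: the cached page stays valid
    }

    /** Convenience helper for building column sets. */
    static Set<String> cols(String... names) {
        return new HashSet<>(Arrays.asList(names));
    }
}
```

With the slide's read query, the read set is {T.a, T.b}: an update of T.c leaves the page cached, while an update of T.a invalidates it even though it touches a different row.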
Value Based Intersection
Table T:
a | b | c
5 | 8 | 7
1 | 10 | 9
Read query: SELECT T.a FROM T WHERE T.b = 8
UPDATE T SET T.a = 7 WHERE T.b = 10: Ok (would invalidate with column-based)
UPDATE T SET T.a = 12 WHERE T.b = 8: Invalidate
Invalidate if Rows_Read = Rows_Updated
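One way to sketch the value-based rule is to compare equality predicates on the same column. The sketch below handles only UPDATE statements with equality predicates and falls back to invalidation whenever disjointness cannot be shown; both restrictions, and the name `ValueBasedSketch`, are assumptions made for illustration.

```java
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: queries are pre-parsed into column sets and equality predicates.
final class ValueBasedSketch {

    static boolean intersects(Set<String> selected, Map<String, Object> readEq,
                              Set<String> updated, Map<String, Object> updateEq) {
        // An update to a column the read filters on can move rows into or out of its result.
        for (String column : updated) {
            if (readEq.containsKey(column)) {
                return true;
            }
        }
        // If no selected column is written, the cached result cannot change.
        boolean writesSelected = false;
        for (String column : updated) {
            if (selected.contains(column)) {
                writesSelected = true;
                break;
            }
        }
        if (!writesSelected) {
            return false;
        }
        // Both predicates pin the same column to different values: the row sets are disjoint.
        for (Map.Entry<String, Object> e : readEq.entrySet()) {
            Object updateValue = updateEq.get(e.getKey());
            if (updateValue != null && !updateValue.equals(e.getValue())) {
                return false;
            }
        }
        return true; // disjointness not provable, so invalidate
    }
}
```

When the update's predicate is on a column the read query does not constrain, this check cannot decide and invalidates conservatively.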
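Extra Query Based Intersection
Table T:
a | b | c
5 | 8 | 7
1 | 10 | 9
Read query: SELECT T.a FROM T WHERE T.b = 8
UPDATE T SET T.a = 3 WHERE T.c = 9: the value of T.b for the updated rows is unknown (??)
– Invalidate with value-based
– Ok with extra query
Extra query: SELECT T.b FROM T WHERE T.c = 9
Generate extra query to find missing values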
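The extra-query idea can be sketched with an in-memory stand-in for the database. A real system would issue the extra SELECT through JDBC; here a list of rows plays that role, and the row values (5, 8, 7) and (1, 10, 9) are assumed for illustration. `ExtraQuerySketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: a list of rows stands in for table T and for the extra SQL query.
final class ExtraQuerySketch {

    static Map<String, Integer> row(int a, int b, int c) {
        Map<String, Integer> r = new HashMap<>();
        r.put("a", a);
        r.put("b", b);
        r.put("c", c);
        return r;
    }

    /** In-memory stand-in for: SELECT wanted FROM T WHERE col = val. */
    static List<Integer> select(List<Map<String, Integer>> table, String wanted, String col, int val) {
        List<Integer> out = new ArrayList<>();
        for (Map<String, Integer> r : table) {
            Integer v = r.get(col);
            if (v != null && v.intValue() == val) {
                out.add(r.get(wanted));
            }
        }
        return out;
    }

    /**
     * The read filters on readCol = readVal, the update on updateCol = updateVal.
     * The extra query fetches readCol for the rows the update touches.
     */
    static boolean intersects(List<Map<String, Integer>> table,
                              String readCol, int readVal, String updateCol, int updateVal) {
        List<Integer> missingValues = select(table, readCol, updateCol, updateVal);
        return missingValues.contains(Integer.valueOf(readVal));
    }
}
```

For the slide's example, the extra query returns T.b = 10 for the row with T.c = 9, which differs from 8, so the cached page is kept. The price of this precision is one additional database query per check.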
Outline
Design of AutoWebCache
– Maintaining cache consistency
• Determine relationship between reads and updates
– Cache Structure
Aspectizing Web Caching
– Insertion of caching logic transparently
Evaluation
– Analysis of effectiveness, transparency
Conclusion
Dynamic Web Caching – Solution Approach
[Figure: the three-tier architecture (client, Internet, web server, application server, database server) extended with AutoWebCache. Its caching logic takes request info and database access info as input, performs cache checks, and issues cache inserts and invalidations on a web page cache.]
Transparency: capture information flow
Aspect-Oriented Programming (AOP)
Modularize cross-cutting concerns - Aspects
– Logging, billing, exception handling
Works on three principles
– Capture the execution points of interest – Pointcuts (1)
• Method calls, exception points, read/write accesses
– Determine what to do at these pointcuts – Advice (2)
• Encode cross-cutting logic (before/ after/ around)
– Bind Pointcuts and Advice together – Weaving (3)
• AspectJ compiler for Java
Insertion of Caching Logic
[Figure: aspect weaving (AspectJ) combines the original web application with the weaving rules and the caching library to produce the cache-enabled version of the web application.]
Aspectizing Read Requests
Original code of a read-only request handler, annotated with the woven caching logic (main method captured):
    // Cache check (capturing request entry)
    String cachedDoc = Cache.get(uri, inputInfo);
    if (cachedDoc != null) return cachedDoc; // Cache hit
    // Execute SQL queries (capturing SQL queries: collect SQL query info as dependency info)
    …
    SQL query 1
    SQL query 2
    …
    // Generate a web document
    webDoc = …
    // Cache insert (capturing request exit)
    Cache.add(webDoc, uri, inputInfo, dependencyInfo); // Cache miss
    // Return the web document
    …
Aspectizing Write Requests
Original code of a write request handler, annotated with the woven caching logic (main method captured):
    // Execute SQL queries (capturing SQL queries: collect SQL query info as invalidation info)
    …
    SQL query 1
    SQL query 2
    …
    // Cache consistency: cache invalidation (capturing request exit)
    Cache.remove(invalidationInfo);
    // Return
Capturing Servlet’s main Method
// Pointcut for Servlets’ main method
pointcut servletMainMethodExecution(...) :
    execution(void HttpServlet+.doGet(HttpServletRequest, HttpServletResponse)) ||
    execution(void HttpServlet+.doPost(HttpServletRequest, HttpServletResponse));
Pointcut captures entry and exit points of web request handlers
– Cache checks and inserts for read requests
– Invalidations for update requests
Weaving Rules for Cache Checks and Inserts
// Advice for read-only requests
around(...) : servletMainMethodExecution(...) {
    // Pre-processing: Cache check
    String cachedDoc;
    cachedDoc = ... call Cache.get of AutoWebCache
    if (cachedDoc != null) { ... return cachedDoc }
    // Normal execution of the request
    proceed(...);
    // Post-processing: Cache insert
    ... call Cache.add of AutoWebCache
}
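The check, proceed, insert flow of this around advice can be mimicked in plain Java without a weaver. In the sketch below a `Supplier<String>` stands in for `proceed(...)`, and a plain map stands in for the cache. This is not AspectJ and not the AutoWebCache code; `AroundAdviceSketch` and `handlerRuns` are hypothetical names.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical plain-Java imitation of the around advice; no weaving is involved.
final class AroundAdviceSketch {
    private final Map<String, String> cache = new HashMap<>();
    int handlerRuns = 0; // counts how often the wrapped handler actually executed

    /** Checks the cache; on a miss runs the handler ("proceed") and inserts its result. */
    String around(String uri, Supplier<String> proceed) {
        // Pre-processing: cache check
        String cachedDoc = cache.get(uri);
        if (cachedDoc != null) {
            return cachedDoc; // cache hit: the handler is skipped entirely
        }
        // Normal execution of the request
        String webDoc = proceed.get();
        handlerRuns++;
        // Post-processing: cache insert
        cache.put(uri, webDoc);
        return webDoc;
    }
}
```

The point of using around advice rather than separate before and after advice is that the handler can be skipped on a hit, which a before advice alone cannot do.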
Weaving Rules for Cache Invalidations
// Advice for write requests
after(...) : servletMainMethodExecution(...) {
    // Cache invalidation
    ... call Cache.remove of AutoWebCache
}
Weaving Rules for Collecting Consistency Information
// Pointcut for SQL query calls
pointcut sqlQueryCall() :
    call(ResultSet PreparedStatement.executeQuery()) ||
    call(int PreparedStatement.executeUpdate());
// Advice for SQL query calls
after() : sqlQueryCall() { ... collect consistency info ... }
After each SQL query, note
– Query template
– Query instance values
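What "collect consistency info" might record can be sketched as a small per-request log of query templates and their instance values. How AutoWebCache obtains these from a `PreparedStatement` is not shown on the slide, so the sketch takes them as plain arguments; `QueryLogSketch`, `afterQuery`, and `drain` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical per-request log of executed queries.
final class QueryLogSketch {

    /** A query template such as "... WHERE T.b = ?" plus the values bound to it. */
    static final class Recorded {
        final String template;
        final List<Object> values;

        Recorded(String template, List<Object> values) {
            this.template = template;
            this.values = values;
        }
    }

    private final List<Recorded> recorded = new ArrayList<>();

    /** Called after each SQL query: notes the template and the instance values. */
    void afterQuery(String template, Object... values) {
        recorded.add(new Recorded(template, new ArrayList<>(Arrays.asList(values))));
    }

    /** Hands the collected info to the caller at request exit and resets the log. */
    List<Recorded> drain() {
        List<Recorded> out = new ArrayList<>(recorded);
        recorded.clear();
        return out;
    }
}
```

Separating the template from the values lets many query instances share one template entry, which the cache structure can use as an index key.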
Transparency of AutoWebCache
Ability to Capture Information Flow
– Entry and exit points of request handlers
• e.g. doGet(), doPost() APIs for Java Servlets
– Modification to underlying data sets
• e.g. JDBC calls for SQL requests
– Multiple sources of dynamic behavior
• Currently handle dynamic behavior from SQL queries
• Need standard interfaces for all sources
Hidden State Problem
Request execution:
    …
    Number number = getRandom();
    Image img = getImage(number);
    displayImage(img);
    …
The request does not contain all the information needed to create the response
Occurs when the application uses random numbers, timers, etc.
Identical subsequent requests produce different responses
The developer must declare such requests non-cacheable
Use of Application Semantics
Aspect-oriented weaving relies on code syntax
– Cannot capture semantic concepts
In the TPC-W application
– Best Seller requests allow dirty reads for 30 sec
– Conforms to specification clauses 3.1.4.1 and 6.3.3.1
Application semantics can be used to improve performance
– Best seller cache entry time-out set to 30 sec
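A time-out on a cache entry, as used here for the best-seller page, can be sketched as follows. The clock is injected so the behavior can be checked without waiting; `TimedCacheSketch` and its methods are hypothetical names, and the sketch ignores invalidation by write queries.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.LongSupplier;

// Hypothetical sketch of cache entries that expire after a fixed time-to-live.
final class TimedCacheSketch {

    private static final class Timed {
        final String page;
        final long expiresAtMillis;

        Timed(String page, long expiresAtMillis) {
            this.page = page;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Timed> entries = new HashMap<>();
    private final LongSupplier clockMillis; // e.g. System::currentTimeMillis

    TimedCacheSketch(LongSupplier clockMillis) {
        this.clockMillis = clockMillis;
    }

    void add(String uri, String page, long ttlMillis) {
        entries.put(uri, new Timed(page, clockMillis.getAsLong() + ttlMillis));
    }

    /** Returns the page while it is fresh; drops it and reports a miss once it has expired. */
    String get(String uri) {
        Timed t = entries.get(uri);
        if (t == null) {
            return null;
        }
        if (clockMillis.getAsLong() >= t.expiresAtMillis) {
            entries.remove(uri);
            return null;
        }
        return t.page;
    }
}
```

For the best-seller page the entry would be added with a time-to-live of 30000 ms, so writes within that window need not invalidate it.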
Outline
Design of AutoWebCache
– Maintaining cache consistency
• Determine relationship between reads and updates
– Cache Structure
Aspectizing Web Caching
– Insertion of caching logic transparently
Evaluation
– Analysis of effectiveness
Conclusion
Evaluation Environment
RUBiS
– Auction site based on eBay
– Browsing items, bidding, leaving comments etc.
– Large number of requests that can be satisfied quickly
TPC-W
– Models an on-line bookstore
– Listing new products, best-sellers, shopping cart etc.
– Small number of requests that are database intensive
Client Emulator
– Client browser emulator generates requests
– Average think time and session time conform to the TPC-W v1.8 specification
– Cache warmed for 15 min, statistics gathered over 30 min
Response Time for RUBiS – Bidding Mix
[Figure: response time (ms, axis 0 to 140) versus number of clients (0 to 1000). Series: No cache, AutoWebCache.]
Relative Benefits for different Requests in RUBiS
[Figure: percent of requests (axis 0 to 25) by request type, split into hits and misses. Request types: About Me, Browse Cat, Browse Rgn, Buy Now, Put Bid, Put Cmt, Search Cat, Search Rgn, View Bids, View Item, View User.]
Response Time for TPC-W – Shopping Mix
[Figure: response time (ms, axis ticks 1, 10, 100, 1000, 10000) versus number of clients (50 to 400). Series: No cache, AutoWebCache, Optimization for Semantics.]
Relative Benefits for different Requests in TPC-W
[Figure: percent of requests (axis 0 to 25) by request type, split into hits, misses, and hits based on application semantics. Request types: admin request, best sellers, execute search, home interaction, new products, order display, order inquiry, product detail, search request.]
Implementation of AutoWebCache
Web application:
Application | # Java classes | Java code size
TPC-W | 46 | 12K lines
RUBiS | 25 | 5.8K lines
Caching library: 13 Java classes, 4.6K lines of Java code
AOP-based caching: 1 AspectJ file (weaving rules), 150 lines of AspectJ code
Conclusion
AutoWebCache - a cache that
• Ensures consistency of cached documents
– Query Analysis
• Inserts caching logic transparently to the application
– Makes use of aspect-oriented programming
Transparency of AutoWebCache
• Well-defined, standard interfaces for information flow
• Presence of hidden states
• Use of application semantics
Questions / Comments / Suggestions !
Thank You!!
SQL Query Structure
SELECT T.a FROM T WHERE T.b=10
– Column(s) selected: T.a
– Table concerned: T
– Predicate condition: T.b=10
UPDATE T SET T.c WHERE 20 < T.d < 35
– Column(s) updated: T.c
– Table concerned: T
– Predicate condition: 20 < T.d < 35
Response Time for RUBiS – Bidding Mix
[Figure: response time (ms, axis 0 to 140) versus number of clients (0 to 1000). Series: No cache, Hand-coded, AC extra query, AC value based, AC column based.]
Response Time for TPC-W – Shopping Mix
[Figure: response time (ms, axis ticks 1, 10, 100, 1000, 10000) versus number of clients (0 to 450). Series: No cache, Hand-coded, AC extra query, AC value based, AC column based.]
Cache Structure in AutoWebCache
Page cache, index: URI (readHandlerName + readHandlerArgs)
URI | Cached web page
URI1 | WebPage1
URI2 | WebPage2
… | …
Query index, index: SQL string of the read query template; each entry is a <value vector, URI> pair
ReadQueryTemplate1: <instance values1a, URI1>, <instance values1b, URI41>, <instance values1c, URI57>
ReadQueryTemplate2: <instance values2a, URI7>
ReadQueryTemplate3: <instance values3a, URI12>
…
If a write query invalidates ReadQueryTemplate1 with instance values1a: remove the cached page of URI1
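The two-level structure on this slide can be sketched with two maps: one from URI to page, and one from read query template to <value vector, URI> pairs. Deciding which template and values a write query invalidates is the job of the query analysis and is taken as given here. `TemplateIndexSketch` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a page cache with a query-template index.
final class TemplateIndexSketch {
    // Index 1: URI -> cached web page
    private final Map<String, String> pagesByUri = new HashMap<>();
    // Index 2: read query template -> (instance value vector -> URIs of pages that ran it)
    private final Map<String, Map<List<Object>, Set<String>>> index = new HashMap<>();

    void add(String uri, String page, String template, Object... values) {
        pagesByUri.put(uri, page);
        index.computeIfAbsent(template, t -> new HashMap<>())
             .computeIfAbsent(new ArrayList<>(Arrays.asList(values)), v -> new HashSet<>())
             .add(uri);
    }

    /** Removes the pages registered under this template and value vector; returns their URIs. */
    Set<String> invalidate(String template, Object... values) {
        Map<List<Object>, Set<String>> byValues = index.get(template);
        if (byValues == null) {
            return new HashSet<>();
        }
        Set<String> uris = byValues.remove(Arrays.asList(values));
        if (uris == null) {
            return new HashSet<>();
        }
        for (String uri : uris) {
            // Entries for the same URI under other templates go stale; removing them is omitted here.
            pagesByUri.remove(uri);
        }
        return uris;
    }

    String get(String uri) {
        return pagesByUri.get(uri);
    }
}
```

Compared with scanning every cached page on each write, the template index narrows the search to the pages whose read queries share the affected template.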
Evaluation
Analysis of AutoWebCache
– Effect on performance of applications
– Relation of application semantics to cache efficiency
– Relative benefit of caching on different read-only requests
– Usefulness of AOP techniques in implementing the caching system
Breakdown of Response Times for Requests in RUBiS
[Figure: response time (ms, axis 0 to 350) by request type, showing the overall average response time and the extra time for a miss (on top of the overall response time). Request types: About Me, Browse Cat, Browse Rgn, Buy Now, Put Bid, Put Cmt, Search Cat, Search Rgn, View Bids, View Item, View User.]
Breakdown of Response Times for Requests in TPC-W
[Figure: response time (ms, axis 0 to 350) by request type, showing the overall average response time and the extra time for a miss (on top of the overall response time). Request types: admin request, best sellers, execute search, home interaction, new products, order display, order inquiry, product detail, search request.]
Key Aspect-Oriented Programming Concepts
“Join points” are identifiable execution points in the system
– Method calls, read and write accesses, invocations
“Pointcuts” select the join points of interest
“Advice” specifies actions to be performed at the join points a pointcut selects
– Before, after, or around the execution of those join points
– Encodes the cross-cutting logic
Conclusion
Dynamic content is not easy to cache
– Ensure consistency, invalidate cached entries as a result of updates
AutoWebCache – Query Analysis
– Caching logic inserted at different points in the application
• Entry and exit of requests, access to underlying database
– Most solutions rely on understanding complex application logic
AutoWebCache – Transparent insertion of caching logic using AOP
Transparency affected by
• Well-defined, standard interfaces for information flow
• Presence of hidden states
• Use of application semantics
Web Caching versus Query Caching
The two are complementary
Web caching is useful when the application server is the bottleneck
Cached documents can be placed nearer to the client and distributed
Web page caching can make use of application semantics (best sellers for TPC-W)