Cache Tables: Paving the Way for an Adaptive Database Cache
Mehmet Altınel, Christof Bornhövd, C. Mohan, Hamid Pirahesh, Berthold Reinwald (IBM Almaden Research Center)
Sailesh Krishnamurthy (Computer Science Division, UC Berkeley)
Presented by: Umar Farooq Minhas
October 04, 2006

Motivation
- Issues
  - Response time
  - Scalability
- Wide-spread use of Transactional Web Applications (TWAs) in enterprise applications
  - Broad range of components, e.g. network load balancers, HTTP servers, application servers, …, databases
- Solutions
  - Caching of static HTML pages
  - Multiple levels of caches

Motivation (contd.)
- Static caching: drawbacks
  - TWAs tend to be more and more dynamic
  - High volumes of data
  - Highly personalized content
- Run business logic in remote application servers close to end users
  - Reduced response time
  - Reduced load on in-house systems
  - Benefits are limited by how often the remote server needs to access the backend DB
- Proposed solution: DBCache
  - Allows DB caching at mid-tier nodes, remote data centers, and edge servers

DBCache: Overview
- Built using a full-fledged DBMS, DB2
  - Reduced development effort
  - Allows caching of related DB objects: triggers, constraints, indices, stored procedures, …
  - Makes use of existing distributed query execution
- Provides cache transparency
- Supports both full-table and partial-table caching
- On-demand caching
  - Adapts to dynamically changing loads
  - Exploits typical characteristics of TWA queries

DBCache: Contributions
- Database cache model
  - Introduces a new DB object, the 'Cache Table'
  - Dynamic/static caching support
- Novel query re-write scheme
- Cache load and maintenance mechanisms

Outline
- Motivation
- DBCache: Overview
- Cache Tables
- Dynamic Cache Model
- Query Compilation
- Cache Table Population and Maintenance
- Performance Evaluation
- Conclusions & Future Work
- Discussion

Cache Tables
- A Cache Table is a database object by which an end user can specify that a table (cache table) in a database (cache database) is a cache of a table (backend table) in another database (backend database)
- [Figure: a Cache Table in the Cache DB and its Backend Table in the Backend DB]
- Two types of cache tables are supported:
  - Declarative/static cache tables
  - Dynamic cache tables

Declarative/Static Cache Tables
- When table contents are static and known up front, use declarative cache tables
- Similar to materialized views
  - The entire table is cached in the absence of a predicate definition
  - Exploits existing materialized view support in DB2

Dynamic Cache Tables
- Populated on demand
- Provides adaptability
- Can choose to cache only "hot" items

DBCache Schema Setup
- The cache schema is an exact mirror of the backend DB schema
- Each backend DB table is represented by either
  - a Cache Table, or
  - a Nickname (caching disabled)
- Requires no change to existing queries
- Allows caching of other relevant logical and physical objects

Dynamic Cache Model: Key Concepts
- Cache Keys
  - Defined on a cache table column
  - Can be non-unique
  - Must be 'domain-complete'
    - Unique/primary key columns are complete by definition
    - Guarantees correctness of equality predicates
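
The slides use 'domain-complete' without defining it. The sketch below assumes the usual reading: a cache column is domain-complete when, for every value that appears in that column of the cache table, the cache holds all backend rows with that value. The function `is_domain_complete`, the row layout, and the sample data are hypothetical and only illustrate why an equality predicate on such a column can be answered from the cache alone.

```python
from collections import Counter

def is_domain_complete(cache_rows, backend_rows, column):
    """Check the assumed definition: every value of `column` present in the
    cache has as many rows in the cache as in the backend table.
    Assumes cache_rows is a subset of backend_rows."""
    cached = Counter(row[column] for row in cache_rows)
    backend = Counter(row[column] for row in backend_rows)
    return all(cached[value] == backend[value] for value in cached)

# Hypothetical backend table with a non-unique column `customer`.
backend_orders = [
    {"order_id": 1, "customer": "alice"},
    {"order_id": 2, "customer": "alice"},
    {"order_id": 3, "customer": "bob"},
]
complete_cache = backend_orders[:2]  # both of alice's orders are cached
partial_cache = backend_orders[:1]   # only one of alice's orders is cached

print(is_domain_complete(complete_cache, backend_orders, "customer"))
print(is_domain_complete(partial_cache, backend_orders, "customer"))
print(is_domain_complete(partial_cache, backend_orders, "order_id"))
```

With `partial_cache`, a query `customer = 'alice'` answered locally would miss a row, which is why the non-unique column fails the check, while the unique `order_id` column passes trivially.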

Dynamic Cache Model: Key Concepts (contd.)
- Referential Cache Constraints (RCCs)
  - Defined between any columns of two cache tables
  - Creates a cache-parent/cache-child relationship
  - Guarantees the correctness of equi-join predicates
  - Somewhat similar to referential integrity constraints
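
A minimal sketch of what an RCC is assumed to guarantee: for every value in the cache-parent column, all backend rows of the child table with a matching value in the cache-child column are present in the cache. This reading is an assumption (the slides only state the equi-join guarantee), and `rcc_holds`, the table contents, and the column names are hypothetical.

```python
def rcc_holds(parent_cache, parent_col, child_cache, child_backend, child_col):
    """Return True if every parent value in the cache has all of its matching
    child rows cached (assumed RCC semantics, not taken from the slides)."""
    parent_values = {row[parent_col] for row in parent_cache}
    for value in parent_values:
        needed = [row for row in child_backend if row[child_col] == value]
        if any(row not in child_cache for row in needed):
            return False
    return True

# Hypothetical data: customer 1 is cached; the backend holds two orders for it.
customers_cache = [{"cid": 1, "name": "alice"}]
orders_backend = [
    {"oid": 10, "cid": 1},
    {"oid": 11, "cid": 1},
    {"oid": 12, "cid": 2},
]
orders_cache_ok = orders_backend[:2]   # both matching orders are cached
orders_cache_bad = orders_backend[:1]  # one matching order is missing

print(rcc_holds(customers_cache, "cid", orders_cache_ok, orders_backend, "cid"))
print(rcc_holds(customers_cache, "cid", orders_cache_bad, orders_backend, "cid"))
```

When the check passes, joining the cached customers with the cached orders on `cid` returns the same rows as the backend join restricted to the cached customers.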

Dynamic Cache Model: Key Concepts (contd.)
- Cache Groups
  - A set of related cache tables whose content is (directly or transitively) populated by the values of one or more cache keys of a single cache table, called the root table
  - Tables reachable from the root table via RCCs are called member tables
- Advantages
  - Application context is recognized more easily
  - Helps avoid conflicting cache constraints

Dynamic Cache Model: Key Concepts (contd.)
- Cache Groups (contd.)
  - Represented by a directed graph called the cache group graph; nodes denote cache tables and edges denote RCCs
  - The direction of an RCC edge is from cache-parent to cache-child
  - Bi-directional edges are possible
  - Two or more groups can overlap
    - Captured in connectivity graphs
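
Since member tables are defined as those reachable from the root table along RCC edges, they can be found with an ordinary graph traversal. The sketch below uses breadth-first search; the function `member_tables`, the edge-list representation, and the table names are hypothetical.

```python
from collections import deque

def member_tables(root, rcc_edges):
    """Return the tables reachable from `root` by following RCC edges
    (cache-parent -> cache-child), in breadth-first order."""
    children = {}
    for parent, child in rcc_edges:
        children.setdefault(parent, []).append(child)

    seen = {root}
    queue = deque([root])
    members = []
    while queue:
        table = queue.popleft()
        for child in children.get(table, []):
            if child not in seen:
                seen.add(child)
                members.append(child)
                queue.append(child)
    return members

# Hypothetical cache group graph; customer <-> orders is a bi-directional edge.
rcc_edges = [
    ("customer", "orders"),
    ("orders", "lineitem"),
    ("orders", "customer"),
    ("part", "partsupp"),
]
print(member_tables("customer", rcc_edges))
```

Tables such as `part` and `partsupp` that cannot be reached from the root belong to a different cache group.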

Dynamic Cache Model: Issues with Cache Constraints
- Cache constraints can cause unexpected cache loads, a phenomenon called the recursive cache load problem
- A cache group is called safe if it avoids this problem
- How can group safety be ensured?

Dynamic Cache Model: Rules for Cache Group Safety
- Rule 1: A cache group graph must not include any heterogeneous cycles.
- Rule 2: A cache table must not have more than one non-unique domain-complete column.
- A new cache constraint is created only if it violates neither Rule 1 nor Rule 2.
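
The two rules can be checked mechanically before a constraint is admitted. The sketch below is illustrative only. The slides do not define a heterogeneous cycle, so the code assumes that a cycle is homogeneous when every table on it is entered and left through the same column, and heterogeneous otherwise. The tuple layout for RCCs and all function names are hypothetical.

```python
from collections import defaultdict

# An RCC is modelled as (parent_table, parent_col, child_table, child_col).

def is_heterogeneous(cycle):
    """Assumed definition: some table on the cycle is entered through one
    column (child_col of the incoming edge) and left through another
    (parent_col of the outgoing edge)."""
    for i, edge in enumerate(cycle):
        next_edge = cycle[(i + 1) % len(cycle)]
        if edge[3] != next_edge[1]:
            return True
    return False

def has_heterogeneous_cycle(rccs):
    """Rule 1: enumerate simple cycles of the table-level graph by DFS and
    test each one. Intended for small cache group graphs only."""
    out_edges = defaultdict(list)
    for edge in rccs:
        out_edges[edge[0]].append(edge)

    def dfs(start, table, path, visited):
        for edge in out_edges[table]:
            child = edge[2]
            if child == start:
                if is_heterogeneous(path + [edge]):
                    return True
            elif child not in visited:
                if dfs(start, child, path + [edge], visited | {child}):
                    return True
        return False

    return any(dfs(t, t, [], {t}) for t in list(out_edges))

def tables_violating_rule_2(non_unique_complete_cols):
    """Rule 2: tables with more than one non-unique domain-complete column."""
    return [t for t, cols in non_unique_complete_cols.items() if len(cols) > 1]

def is_safe(rccs, non_unique_complete_cols):
    return (not has_heterogeneous_cycle(rccs)
            and not tables_violating_rule_2(non_unique_complete_cols))

homogeneous = [("A", "x", "B", "x"), ("B", "x", "A", "x")]
heterogeneous = [("A", "x", "B", "y"), ("B", "z", "A", "x")]
print(is_safe(homogeneous, {"A": {"x"}, "B": {"x"}}))
print(is_safe(heterogeneous, {"A": {"x"}, "B": {"y"}}))
print(is_safe([], {"A": {"x", "w"}}))
```

A constraint-creation routine could run `is_safe` on the graph with the candidate constraint added and reject the constraint when the check fails.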

Query Compilation
- Declarative Cache Tables
  - The existing materialized view matching mechanism in DB2 is exploited
  - Name switching
- Dynamic Cache Tables
  - Generate two plans: a local plan and a remote plan
  - Choose at run time through a switch operator, which uses the probe query to decide which leg to execute
  - Janus (two-headed) plan: named after Janus in Roman mythology
    - God of gates, doors, doorways, beginnings and endings
    - Month of January?
    - http://en.wikipedia.org/wiki/Janus_%28mythology%29
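
The run-time behaviour of the switch operator can be sketched as follows: run the probe, then execute exactly one of the two legs. This is a toy model with dictionaries standing in for the cache and backend tables; `run_janus_plan` and the other names are hypothetical and do not reflect DB2 internals.

```python
def run_janus_plan(probe, local_plan, remote_plan, key_value):
    """Switch operator sketch: the probe decides which leg is executed."""
    if probe(key_value):
        return "local", local_plan(key_value)
    return "remote", remote_plan(key_value)

# Hypothetical tables keyed by the cache key value.
backend_table = {"alice": ["order-1", "order-2"], "bob": ["order-3"]}
cache_table = {"alice": ["order-1", "order-2"]}  # only alice is cached

def probe(key):
    # Probe query: is this cache key value present in the cache table?
    return key in cache_table

def local_plan(key):
    return cache_table[key]

def remote_plan(key):
    return backend_table.get(key, [])

print(run_janus_plan(probe, local_plan, remote_plan, "alice"))
print(run_janus_plan(probe, local_plan, remote_plan, "bob"))
```

Because both legs are compiled ahead of time, the decision at run time costs only the probe and the switch.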

Query Compilation: Constructing a Janus Plan
1. Initial query plan -> remote query plan: replace Cache Table names with Nicknames
2. Generate a probe query by checking all equality predicates that can potentially participate in the probe query condition; if none is found, ABORT (the remote query plan gets executed)
3. Cloned input query graph -> local query plan: replace Nicknames with the eligible Cache Table names from step 2
4. Insert a switch operator on top of the remote, local, and probe query plans
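
The four construction steps can be mirrored in a small sketch. It is a simplification under stated assumptions: a query is reduced to a list of table names plus equality predicates of the form (table, column, value), a table counts as eligible only if one of its cache key columns appears in such a predicate, and nicknames are formed with a made-up `NN_` prefix. None of these names or structures come from DB2.

```python
def default_nickname(table):
    # Hypothetical naming scheme for the nickname of a backend table.
    return "NN_" + table

def build_janus_plan(tables, equality_predicates, cache_keys,
                     nickname=default_nickname):
    # Step 1: remote plan, every table is referenced through its nickname.
    remote_plan = {"tables": [nickname(t) for t in tables],
                   "predicates": equality_predicates}

    # Step 2: probe query from equality predicates on cache key columns.
    probe = [(t, c, v) for (t, c, v) in equality_predicates
             if c in cache_keys.get(t, set())]
    if not probe:
        return {"remote": remote_plan}  # abort: only the remote plan runs

    # Step 3: local plan, eligible tables use their cache table names.
    eligible = {t for (t, _, _) in probe}
    local_plan = {"tables": [t if t in eligible else nickname(t) for t in tables],
                  "predicates": equality_predicates}

    # Step 4: switch operator on top of probe, local, and remote plans.
    return {"switch": {"probe": probe, "local": local_plan, "remote": remote_plan}}

plan = build_janus_plan(
    tables=["orders", "items"],
    equality_predicates=[("orders", "customer", "alice")],
    cache_keys={"orders": {"customer"}},
)
print(plan["switch"]["local"]["tables"])
print(plan["switch"]["remote"]["tables"])
```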

Cache Table Population & Maintenance
- Declarative Cache Tables
  - Relies on the DPropR utility, IBM's asynchronous data replication tool
- Dynamic Cache Tables
  - On-demand loading
    - Cache key values that fail the probe query are used to extract data
    - Extracted data is populated asynchronously by a cache daemon
  - Cache invalidation
    - Generate invalidation messages and send them to the cache daemon
    - The cache daemon generates and executes deletes against the cache DB
    - Updated rows get loaded with new requests
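
The load and invalidation flow for dynamic cache tables can be sketched with a message queue. This is a toy model, not the DBCache implementation: the real daemon works asynchronously, whereas here an explicit `drain()` call stands in for it so the example is deterministic. The class `CacheDaemon` and its methods are hypothetical.

```python
from collections import deque

class CacheDaemon:
    """Toy cache daemon: processes load and invalidation messages in order."""

    def __init__(self, backend, cache):
        self.backend = backend
        self.cache = cache
        self.messages = deque()

    def request_load(self, key):
        # Sent when a cache key value fails the probe query.
        self.messages.append(("load", key))

    def invalidate(self, key):
        # Sent when the backend rows for this key have changed.
        self.messages.append(("invalidate", key))

    def drain(self):
        # Stand-in for the asynchronous daemon loop.
        while self.messages:
            kind, key = self.messages.popleft()
            if kind == "load" and key in self.backend:
                self.cache[key] = list(self.backend[key])
            elif kind == "invalidate":
                self.cache.pop(key, None)  # delete against the cache DB

backend = {"alice": ["order-1"]}
cache = {}
daemon = CacheDaemon(backend, cache)

daemon.request_load("alice")          # probe miss triggers a load
daemon.drain()
after_load = dict(cache)

backend["alice"].append("order-2")    # backend update
daemon.invalidate("alice")
daemon.drain()
after_invalidate = dict(cache)

daemon.request_load("alice")          # next request reloads the updated rows
daemon.drain()
print(after_load, after_invalidate, cache)
```

Invalidation only deletes; the updated rows come back into the cache when a later request misses and triggers a fresh load.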

Performance Evaluation
- Focus: evaluate the overhead of Janus plans for dynamic tables
  - Overhead of the probe query and the switch operator
  - Overhead of on-demand loading
- Experimental settings

Performance Evaluation: Cache Hit Case
- Janus plan vs. pure local queries
- The difference gives the overhead of the probe query and the switch operator
- The cache table is loaded with all the data from the backend table

Performance Evaluation: Cache Miss Case
- Janus plan vs. pure remote queries
- The difference gives the overhead
- The cache table is initially empty

Conclusions & Future Work
- Significant contributions
  - Provides a new framework to implement DB caching for TWAs, which:
    - Integrates seamlessly with current applications
    - Supports static/dynamic cache tables
    - Adapts to the changing workloads in TWAs
    - Reuses the functionality of a full-fledged DBMS, i.e. DB2
- What next?
  - Provide an efficient, scalable, zero-admin DBCache
  - Develop new tools to ease deployment
  - Improve adaptability and maintenance

Comparison
- vs. amco05:
  - Relies on an asynchronous data propagation utility
  - Not completely transparent
  - May not work for heterogeneous DBMSs
  - Allows stale data
- vs. gula04:
  - Cache constraints against C&C constraints
  - Doesn't provide any guarantees of freshness/consistency
  - Relatively more transparent
  - Maintenance-centric vs. query-centric
  - Both deployed as mid-tier level caches
  - Both use a full-fledged DBMS
  - Both use materialized views
  - Both use two-headed query plans

Discussion: Is it really that good?
- Using a full-fledged DBMS at each middle-tier node: drawbacks?
- How is data freshness specified/guaranteed?
- Is it adaptable? Weakly? Strongly?
- When can cache constraints become a bottleneck?
- Size of dynamic cache tables?
  - Cache replacement policies/cleansing mechanisms?
- Caching of other physical & logical DB objects?
  - Updates to those objects in the backend DB?
- Message traffic between the cache daemon and the backend DB?
  - Very frequent updates in the backend DB
- Local updates?
- Flaws in the performance evaluation?