Cache Tables: Paving the way for an Adaptive Database Cache
Mehmet Altınel, Christof Bornhövd, C. Mohan, Hamid Pirahesh, Berthold Reinwald (IBM Almaden Research Center)
Sailesh Krishnamurthy (Computer Science Division, UC Berkeley)
Presented by: Umar Farooq Minhas
October 04, 2006


Motivation

- Issues
  - Response time
  - Scalability
- Wide-spread use of Transactional Web Applications (TWAs) in enterprise applications
  - Broad range of components, e.g. network load balancers, HTTP servers, application servers, …, databases
- Solutions
  - Caching of static HTML pages
  - Multiple level caches

Motivation contd.

- Static caching: drawbacks
  - TWAs tend to be more and more dynamic
  - High volumes of data
  - Highly personalized content
- Run business logic in remote application servers close to end users
  - Reduced response time
  - Reduced load on in-house systems
  - Benefits are limited by how frequently the remote server needs to access the backend DB
- Proposed solution: DBCache
  - Allows DB caching at mid-tier nodes, remote data centers and edge servers

DBCache: Overview

- Built using a full-fledged DBMS, DB2
  - Reduced development effort
  - Allows caching of related DB objects: triggers, constraints, indices, stored procedures, …
  - Makes use of existing distributed query execution
- Provides cache transparency
- Supports both full-table and partial-table caching
- On-demand caching
  - Adapts to dynamically changing loads
  - Exploits typical characteristics of TWA queries

DBCache: Contributions

- Database cache model
  - Introduces a new DB object, the ‘Cache Table’
  - Dynamic/static caching support
- Novel query re-write scheme
- Cache load and maintenance mechanisms

Outline

- Motivation
- DBCache: Overview
- Cache Tables
- Dynamic Cache Model
- Query Compilation
- Cache Table Population and Maintenance
- Performance Evaluation
- Conclusions & Future Work
- Discussion

Cache Tables

A cache table is a database object by which an end user can specify that a table (the cache table) in one database (the cache database) is a cache of a table (the backend table) in another database (the backend database).

[Figure: a cache table in the cache DB mapped to a backend table in the backend DB]

Two types of cache tables are supported:
- Declarative/static cache tables
- Dynamic cache tables

Declarative/Static Cache Tables

- Use declarative cache tables when the table contents to cache are static and known up front
- Similar to materialized views
  - The entire table is cached in the absence of a predicate definition
- Exploits existing materialized view support in DB2
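The predicate behaviour described on this slide can be illustrated with a minimal in-memory sketch. It is not DB2 syntax and not the authors' implementation: `build_declarative_cache`, the row layout and the `status` column are all hypothetical names chosen for the example.

```python
def build_declarative_cache(backend_rows, predicate=None):
    """Return the rows a declarative cache table would hold (illustrative only).

    With no predicate the whole backend table is copied; with a predicate
    only the matching rows are copied, much like a materialized view.
    """
    if predicate is None:
        return [dict(row) for row in backend_rows]
    return [dict(row) for row in backend_rows if predicate(row)]


# Hypothetical backend table.
backend_orders = [
    {"id": 1, "status": "open"},
    {"id": 2, "status": "closed"},
    {"id": 3, "status": "open"},
]

full_copy = build_declarative_cache(backend_orders)
open_only = build_declarative_cache(
    backend_orders, lambda row: row["status"] == "open"
)
```

In this sketch the contents are fixed at definition time, which matches the slide's point that declarative tables suit data that is static and known up front.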

Dynamic Cache Tables

- Populated on demand
- Provide adaptability
- Can choose to cache only “hot” items

DBCache Schema Setup

- The cache schema is an exact mirror of the backend DB schema
- Each backend DB table is represented by either
  - a cache table, or
  - a nickname (caching disabled)
- Requires no change in existing queries
- Allows caching of other relevant logical and physical objects


Dynamic Cache Model

Key concepts: cache keys
- Defined on a cache table column
- Can be non-unique
- Must be ‘domain-complete’
  - Unique/primary key columns are complete by definition
- Guarantees correctness of equality predicates
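A small sketch may help make domain completeness concrete. It assumes the usual reading of the term: if a value appears in the column of the cache table, then every backend row with that value must also be in the cache. The function name and the in-memory row representation are hypothetical.

```python
def _freeze(row):
    # Turn a dict row into a hashable form so rows can be stored in a set.
    return tuple(sorted(row.items()))


def is_domain_complete(cache_rows, backend_rows, column):
    """Check the assumed domain-completeness property for one column."""
    cached_values = {row[column] for row in cache_rows}
    cached = {_freeze(row) for row in cache_rows}
    for row in backend_rows:
        # A backend row whose value is "known" to the cache must be cached too.
        if row[column] in cached_values and _freeze(row) not in cached:
            return False
    return True


# Hypothetical backend table with a non-unique column "city".
backend = [
    {"id": 1, "city": "San Jose"},
    {"id": 2, "city": "San Jose"},
    {"id": 3, "city": "Berkeley"},
]
complete_cache = backend[:2]   # both "San Jose" rows are cached
partial_cache = backend[:1]    # only one of the two "San Jose" rows
```

Under this reading, an equality predicate such as `city = 'San Jose'` can be answered from `complete_cache` alone, but not from `partial_cache`. A unique column such as `id` passes the check for any subset of rows, which is why unique and primary key columns are complete by definition.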

Dynamic Cache Model

Key concepts contd.: Referential Cache Constraints (RCCs)
- Defined between any columns of two cache tables
- Creates a cache-parent/cache-child relationship
- Guarantees the correctness of equi-join predicates
- Somewhat similar to referential integrity constraints
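The following sketch shows one way to read an RCC, under the assumption that it requires the following: for every value present in the cache-parent column, all backend child rows with that value in the child column are also cached. The slide does not spell out the definition, so this interpretation, the function name and the sample tables are illustrative only.

```python
def _freeze(row):
    # Hashable form of a dict row.
    return tuple(sorted(row.items()))


def rcc_holds(parent_cache, child_cache, child_backend, parent_col, child_col):
    """Check the assumed RCC property between a cache-parent and a cache-child."""
    parent_values = {row[parent_col] for row in parent_cache}
    cached_children = {_freeze(row) for row in child_cache}
    return all(
        _freeze(row) in cached_children
        for row in child_backend
        if row[child_col] in parent_values
    )


# Hypothetical tables: customers is the cache-parent, orders the cache-child.
customers_cache = [{"cust_id": 1, "name": "A"}]
orders_backend = [
    {"order_id": 10, "cust_id": 1},
    {"order_id": 11, "cust_id": 1},
    {"order_id": 12, "cust_id": 2},
]
```

When the check succeeds, an equi-join of cached customers with cached orders returns the same rows as the same join restricted to those customers on the backend, which is the guarantee the slide attributes to RCCs.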

Dynamic Cache Model

Key concepts contd.: cache groups
- A cache group is a set of related cache tables whose content is (directly or transitively) populated by the values of one or more cache keys of a single cache table, called the root table
- Tables reachable via RCCs from the root table are called member tables
- Advantages
  - Application context is recognized more easily
  - Helps avoid conflicting cache constraints

Dynamic Cache Model

Key concepts contd.: cache groups contd.
- Represented by a directed graph called the cache group graph, in which nodes denote cache tables and edges denote RCCs
- The direction of an RCC edge is from the cache-parent to the cache-child
- Bi-directional edges are possible
- Two or more groups can overlap
  - Captured in connectivity graphs
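As an illustration of the cache group graph, the sketch below stores RCCs as (cache-parent, cache-child) edges and collects the member tables reachable from a root table with a breadth-first search. The table names and the `member_tables` helper are hypothetical.

```python
from collections import deque


def member_tables(rccs, root):
    """Return the tables reachable from `root` by following RCC edges."""
    graph = {}
    for parent, child in rccs:
        graph.setdefault(parent, set()).add(child)

    seen = {root}
    queue = deque([root])
    while queue:
        table = queue.popleft()
        for child in graph.get(table, ()):
            if child not in seen:
                seen.add(child)
                queue.append(child)

    seen.discard(root)  # the root itself is not a member table
    return seen


# Hypothetical RCCs; orders <-> lineitem forms a bi-directional edge.
rccs = [
    ("customer", "orders"),
    ("orders", "lineitem"),
    ("lineitem", "orders"),
    ("part", "supplier"),
]
```

Here `member_tables(rccs, "customer")` yields `orders` and `lineitem`, while `part` and `supplier` belong to a separate group rooted at `part`.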

Dynamic Cache Model

Issues with cache constraints
- They can cause unexpected cache loads, resulting in a phenomenon called the recursive cache load problem
- A cache group is called safe if it avoids this problem
- How can group safety be ensured?

Dynamic Cache Model

Rules for cache group safety
- Rule 1: A cache group graph must not include any heterogeneous cycles.
- Rule 2: A cache table must not have more than one non-unique domain-complete column.
- A new cache constraint is created only if it does not violate Rule 1 or Rule 2.
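Rule 2 is simple enough to check mechanically. The sketch below covers Rule 2 only; Rule 1 is left out because the slides do not define what makes a cycle heterogeneous. The column metadata format, a mapping from column name to `(is_unique, is_domain_complete)`, is an assumption made for the example.

```python
def non_unique_complete_columns(columns):
    """List columns that are domain-complete but not unique."""
    return [
        name
        for name, (is_unique, is_complete) in columns.items()
        if is_complete and not is_unique
    ]


def satisfies_rule_2(columns):
    """Rule 2: at most one non-unique domain-complete column per cache table."""
    return len(non_unique_complete_columns(columns)) <= 1


# Hypothetical column metadata for one cache table.
customer_columns = {
    "cust_id": (True, True),    # primary key: unique, complete by definition
    "city": (False, True),      # non-unique cache key
    "name": (False, False),     # ordinary column
}

# Declaring a second non-unique domain-complete column would break Rule 2.
with_second_key = dict(customer_columns, zip_code=(False, True))
```

A constraint-creation routine could run such a check before accepting a new cache key or RCC and reject the request when the result is false.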


Query Compilation

- Declarative cache tables
  - The existing materialized view matching mechanism in DB2 is exploited
  - Name switching
- Dynamic cache tables
  - Generate two plans: a local plan and a remote plan
  - Choose at run time through a switch operator, which uses the probe query to decide which leg to execute
  - Janus (two-headed) plan: named after Janus, the Roman god of gates, doors, doorways, beginnings and endings (month of January?)
    http://en.wikipedia.org/wiki/Janus_%28mythology%29

Query Compilation

Constructing a Janus plan:

1. Replace cache table names with nicknames (initial query plan -> remote query plan).
2. Generate a probe query by checking all equality predicates that can potentially participate in the probe query condition; if none is found, then ABORT (the remote query plan gets executed).
3. Replace nicknames with the eligible cache table names from step 2 (cloned input query graph -> local query plan).
4. Insert a switch operator on top of the remote, local and probe query plans.
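The run-time behaviour of the resulting plan can be sketched as follows. This is a toy model, not DB2's executor: the three plan legs are plain Python callables, and `cached_keys`, `cache_table` and `backend_table` are hypothetical stand-ins for the cache key values, the cache table and the table behind the nickname.

```python
def run_janus_plan(probe, local_plan, remote_plan):
    """Switch operator: run the probe, then execute exactly one leg."""
    if probe():
        return local_plan()
    return remote_plan()


# Hypothetical state: only "San Jose" has been loaded into the cache.
cached_keys = {"San Jose"}
cache_table = [{"id": 1, "city": "San Jose"}, {"id": 2, "city": "San Jose"}]
backend_table = cache_table + [{"id": 3, "city": "New York"}]


def query_by_city(city):
    # The probe checks whether the equality predicate's value is a cached key.
    def probe():
        return city in cached_keys

    def local_plan():
        return ("local", [r for r in cache_table if r["city"] == city])

    def remote_plan():
        return ("remote", [r for r in backend_table if r["city"] == city])

    return run_janus_plan(probe, local_plan, remote_plan)
```

Because both legs are compiled ahead of time, the only per-query cost added on a cache hit is the probe and the switch, which is the overhead the performance evaluation later measures.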


Cache Table Population & Maintenance

- Declarative cache tables
  - Rely on the DPropR utility, IBM’s asynchronous data replication tool
- Dynamic cache tables
  - On-demand loading
    - Cache key values failing the probe query are used to extract data
    - Extracted data is populated asynchronously by a cache daemon
  - Cache invalidation
    - Invalidation messages are generated and sent to the cache daemon
    - The cache daemon generates and executes deletes against the cache DB
    - Updated rows get loaded with new requests
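The load and invalidation flow for dynamic cache tables can be sketched with a small in-memory model. Everything here is an assumption made for illustration: the `CacheDaemon` class, its method names, and the use of a queue drained by `run_pending` to stand in for asynchronous processing. Deleting cached rows before a load is a choice of this sketch to avoid duplicates, not something the slides state.

```python
from collections import deque


class CacheDaemon:
    """Toy cache daemon: queues work and applies it later."""

    def __init__(self, backend_rows, key_col):
        self.backend = backend_rows   # shared reference to the backend table
        self.key_col = key_col        # cache key column
        self.cache = []               # rows of the dynamic cache table
        self.pending = deque()        # queued load/invalidate messages

    def request_load(self, key_value):
        # Called when a probe query fails for this cache key value.
        self.pending.append(("load", key_value))

    def invalidate(self, key_value):
        # Called when the backend reports a change for this key value.
        self.pending.append(("invalidate", key_value))

    def run_pending(self):
        # Stands in for the daemon's asynchronous processing loop.
        while self.pending:
            action, value = self.pending.popleft()
            # Both actions first delete the cached rows for the key value.
            self.cache = [r for r in self.cache if r[self.key_col] != value]
            if action == "load":
                self.cache.extend(
                    dict(r) for r in self.backend if r[self.key_col] == value
                )


# Hypothetical backend table.
accounts = [
    {"id": 1, "city": "SJ", "balance": 10},
    {"id": 2, "city": "SJ", "balance": 20},
    {"id": 3, "city": "NY", "balance": 30},
]
daemon = CacheDaemon(accounts, "city")
```

In this model an invalidation only deletes; the fresh rows come back the next time a query misses on that key value and triggers `request_load`, matching the last bullet on the slide.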


Performance Evaluation

- Focus: evaluate the overhead of Janus plans for dynamic cache tables
  - Overhead of the probe query and the switch operator
  - Overhead of on-demand loading
- Experimental settings

Performance Evaluation

Cache hit case
- Janus plan vs. pure local queries
- The difference gives the overhead of the probe query and the switch operator
- The cache table is loaded with all the data from the backend table

Performance Evaluation

Cache miss case
- Janus plan vs. pure remote queries
- The difference gives the overhead
- The cache table is initially empty


Conclusions & Future Work

- Significant contributions: a new framework for implementing DB caching for TWAs, which
  - Integrates seamlessly with current applications
  - Supports static and dynamic cache tables
  - Adapts to the changing workloads in TWAs
  - Re-uses the functionality of a full-fledged DBMS, i.e. DB2
- What next?
  - Provide an efficient, scalable, zero-admin DBCache
  - Develop new tools to ease deployment
  - Improve adaptability and maintenance

Comparison

- vs. amco05:
  - Relies on an asynchronous data propagation utility
  - Not completely transparent
  - May not work for heterogeneous DBMSs
  - Allows stale data
- vs. gula04:
  - Cache constraints against C&C constraints
  - Doesn’t provide any guarantees of freshness/consistency
  - Relatively more transparent
  - Maintenance-centric vs. query-centric
  - Both deployed as mid-tier level caches
  - Both use a full-fledged DBMS
  - Both use materialized views
  - Both use two-headed query plans

Discussion

- Is it really that good?
- Using a full-fledged DBMS at each middle-tier node: drawbacks?
- How is data freshness specified/guaranteed?
- Is it adaptable? Weakly? Strongly?
- When can cache constraints become a bottleneck?
- Size of dynamic cache tables?
  - Cache replacement policies/cleansing mechanisms?
- Caching of other physical & logical DB objects?
  - Updates to those objects in the backend DB?
- Message traffic between the cache daemon & backend DB?
  - Very frequent updates in the backend DB
- Local updates?
- Flaws in the performance evaluation?