Query Caching and View Selection for XML Databases

25
Query Caching and View Selection for XML Databases Bhushan Mandhani Dan Suciu University of Washington Seattle, USA

description

Query Caching and View Selection for XML Databases. Bhushan Mandhani Dan Suciu University of Washington Seattle, USA. Caching in Query Processing. Buffer Cache: In-memory pool of disk pages. Semantic Cache: Stores query results instead. Can be in-memory or on disk. - PowerPoint PPT Presentation

Transcript of Query Caching and View Selection for XML Databases

Page 1: Query Caching and View Selection for XML Databases

Query Caching and View Selection for XML Databases

Bhushan MandhaniDan SuciuUniversity of WashingtonSeattle, USA

Page 2: Query Caching and View Selection for XML Databases

Caching in Query Processing

Buffer Cache: In-memory pool of disk pages.

Semantic Cache: Stores query results instead. Can be in-memory or on disk. Proposed in [Franklin et al., VLDB 96] for use in

client-server systems.

Page 3: Query Caching and View Selection for XML Databases

Semantic XPath Cache

A cached query result is a materialized view.

View V answers query Q if there exists C s.t C º V = Q

C, V & Q are all XPath here. Cache contains some views {V1,…,Vn}. Query Q is a hit if some Vi answers it.

Page 4: Query Caching and View Selection for XML Databases

Motivation for the Semantic Cache

C º V = Q : a cache hit results in a simpler query, on a much smaller XML fragment.

Cache hits were processed two orders of magnitude faster than misses.

Can also be maintained outside the database Ex: application tier.

Application tier caching for database-driven websites has become increasingly popular [Mohan et al., SIGMOD 02].

Page 5: Query Caching and View Selection for XML Databases

Example Illustrating C º V = Q

V = /a[u[@v]/w]/b[x//y] Q = /a[u[w]/@v]/b[x/y//z]/c C = /b[x/y//z]/c

Page 6: Query Caching and View Selection for XML Databases

The Query/View Answerability Problem

Given view V and query Q, does V answer Q, and if yes, then what should C be?

Page 7: Query Caching and View Selection for XML Databases

XPath Tree Patterns

XPath queries have natural representation as tree patterns.

E.g. If Q = a[v]/b[@w=''val1''][x[.//y]]//c[z>val2]

The Query Axis is the path from the root node to the result node.

The Query Depth is the number of axis nodes.

Prefix(Q, k) is the query obtained by truncating Q at its k-th axis node. E.g. Prefix(Q, 2) = a[v]/b

Preds(Q, k) is the set of predicates of the k-th axis node of Q. E.g. Preds(Q, 2) ={[@w=''val1''], [x[.//y]]}

Page 8: Query Caching and View Selection for XML Databases

XPath Containment

A С B if the result of A will always be a subset of the result of B.

Checking XPath containment is coNP-complete [Miklau & Suciu, PODS 02].

A containment mapping maps nodes in B’s tree pattern to those in A’s, and establishes A С B. It is a sound but incomplete condition. It can be determined in polynomial time.

Containment is different from answerability. E.g. let A = /a[x]/b, B = /a/b

Page 9: Query Caching and View Selection for XML Databases

Criteria for Query/View Answerability

“Rewriting XPath Queries using Materialized Views”, Xu and Ozsoyoglu, VLDB 2005

Page 10: Query Caching and View Selection for XML Databases

An Example of Answerability Checking

Let V = /a[u[@v]/w]/b[x//y]

Let Q = /a[u[w]/@v]/b[x/y//z]/c

Prefix(V, 2) = /a[u[@v]/w]/b and Prefix(Q, 2) = /a[u[w]/@v]/b. The first condition is satisfied.

Preds(V, 2) = {[x//y]} and Preds(Q, 2) = {[x/y//z]}. The second condition is also satisfied.

Page 11: Query Caching and View Selection for XML Databases

Two Theoretical Results

Theorem: If two tree patterns are minimal, and containment mappings exist both ways, then they are isomorphic.

Theorem: If some view V answers query Q, then C can be set to the subtree of Q rooted at its k-th axis node, where k is the query depth of V.

Page 12: Query Caching and View Selection for XML Databases

The Cost of Answerability Checking Answerability checking involves tree

operations: Checking isomorphism between trees. Looking for containment mappings between trees.

Cache lookup by checking each view could give a high lookup overhead.

Our Solution: Check answerability by string matching. This will be: Cheaper Amenable to indexing in a standard RDBMS.

Page 13: Query Caching and View Selection for XML Databases

Checking Tree Isomorphism

We represent each tree pattern with its “normalized” query string.

To obtain the normalized query: At each axis node, do a DFS into each predicate

subtree. Before returning from a node, append its children

node labels to its label, in lexicographic order. Concatenate all axis node labels to get the

normalized query.

Page 14: Query Caching and View Selection for XML Databases

Checking Predicate Containment

For the tree of each predicate in Preds(Q, k), generate all trees that “containment map” to it.

ConPreds(Q, k) is the set of normalized predicates obtained from these generated trees.

Page 15: Query Caching and View Selection for XML Databases

String-Based Criteria for Answerability

The second condition is tweaked to correctly support comparison predicates.

Page 16: Query Caching and View Selection for XML Databases

Cache Organization

• Let V = /a[u]/b[v][@x=50][y/z=“str”].

• Insert into XmlData (‘<result></result>’). Record viewId.

• Insert into Prefix (‘/a[u]/b’). Record prefixId.

• Insert into View (viewId, prefixId, ‘@x=50’, ’y/z=“str”|v’).

Page 17: Query Caching and View Selection for XML Databases

Cache Lookup

• Prefix(V, k) = P.prefix = Prefix(Q, k).

• V.pred Є Preds(V, k) and V.pred Є ConPreds(Q, k).

Page 18: Query Caching and View Selection for XML Databases

Cache Warm-up as View Selection

Given the warm-up workload W, we generate another workload S.

S is likely to have overlap with the test workload.

Problem: Choose some m views from S which together answer a maximal subset of S.

This is a variant of the set cover problem, which is NP-complete.

We use a variant of the greedy approximation algorithm for set cover.

Page 19: Query Caching and View Selection for XML Databases

Experimental Setup

The data instance was a 300 MB XML document created using the XMark generator.

Expts run on a Pentium 4, 512 MB RAM machine.

SQL Server 2005 beta used for storing the XML data as well as the cache.

We compare our cache with a naïve cache which stores (query string, result XML) pairs.

Page 20: Query Caching and View Selection for XML Databases

Query Workloads Used

We implemented our own XPath query generator for creating workloads for our expts.

Query structure is generated randomly. Value for each predicate is obtained by

sampling from its set of values taken, using Zipf distribution.

Queries with depths 3, 4 and 5 were given 2, 2 and 3 predicates respectively.

Page 21: Query Caching and View Selection for XML Databases
Page 22: Query Caching and View Selection for XML Databases
Page 23: Query Caching and View Selection for XML Databases
Page 24: Query Caching and View Selection for XML Databases
Page 25: Query Caching and View Selection for XML Databases

Conclusions

We defined a notion of query/view answerability for XPath.

We showed how an efficient semantic cache based on these ideas can be employed.

We demonstrated the scalability of cache lookup, and performance gains in query processing.