Join Synopses for Approximate Query Answering

24
Join Synopses for Approximate Query Answering Swarup Acharya, Philip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy By Vladimir Gamaley

description

Join Synopses for Approximate Query Answering. Swarup Acharya, Philip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy. By Vladimir Gamaley. Abstract. In large data environments its difficult to provide fast and reliable answers. - PowerPoint PPT Presentation

Transcript of Join Synopses for Approximate Query Answering

Page 1: Join Synopses for Approximate Query Answering

Join Synopses for Approximate Query Answering

Swarup Acharya, Philip B. Gibbons, Viswanath Poosala, Sridhar Ramaswamy

By Vladimir Gamaley

Page 2: Join Synopses for Approximate Query Answering

Abstract

• In large data environments its difficult to provide fast and reliable answers.

• In this paper we demonstrate the difficulties in the traditional approach and propose the new technics for evaluation and maintenance of Join Synopses.

Page 3: Join Synopses for Approximate Query Answering

Introduction

• Tradition query processing approach (exact answer though minimal time use)

• Not always an exact answer is needed

• Sometimes appropriate answer is enough

Page 4: Join Synopses for Approximate Query Answering

Introduction (continued)

• Schemes for providing approximate answers that rely on basic relations alone suffer from serious disadvantages.

• Use of precomputed small sets of distinguished joins.

Page 5: Join Synopses for Approximate Query Answering

Introduction (continued)

• Careful allocation of place

• Allocation heuristics

• Providing approximate bounds

• Join synopses maintenance

• Experimental study results

Page 6: Join Synopses for Approximate Query Answering

AQUA System

• The goal of AQUA system is to improve response times by avoiding accesses to the original data.

• Maintenance of small synopses of various samples and histograms.

Page 7: Join Synopses for Approximate Query Answering

AQUA System (Components)

•Statistics Collection

•Query Rewriting

•Maintenance

Page 8: Join Synopses for Approximate Query Answering

AQUA System (Architecture)

AquaSynopses

Data Warehouse

AQUAAnswersBounds

Queries

NewData

Page 9: Join Synopses for Approximate Query Answering

Problems with joins

Uniform random samples provide:

• Non uniform result samples

• Small join results sizes

Page 10: Join Synopses for Approximate Query Answering

Problems with joins (example)R.X

a

a

bb

a

b

a1

a2

b1

S.X

Base probability for tuple to be selected 1/ra1 and a2 - 1/r3

a1 and b1 - 1/r4

for k way foreign join - 1/rk

Page 11: Join Synopses for Approximate Query Answering

Join Synopses

Foreign Key Join:

A two way join r1 r2 is a foreign key join if the join attribute is a foreign key in r1 (a key in r2).

For k > 2, a k-way foreign join if there is an ordering r1,r2..rk and for j = 1,2,.. K, si-1 ri is a 2-way foreign join where

si-1 is a relation obtained joining r1, r2, … ri-1

Page 12: Join Synopses for Approximate Query Answering

Join synopsesTPC-D scheme

L

PS

S

N

R

C

O

P

Page 13: Join Synopses for Approximate Query Answering

Join synopses (continued)

Lemma 1:

The subgraph of G on the k nodes in any k - way foreign key join must be a connected graph with a single root node

Lemma 2:

There is a 1-1 correspondence between tuples in a relation r1 and tuple in any k-way foreign key join.

Page 14: Join Synopses for Approximate Query Answering

Join Synopses (continued)

Join Synopses:

For each node u in G, corresponding to a relation r1, define J(u) to be the output of the maximum foreign key join r1,r2..rk with source k1. Let Su be a uniform random sample of r1. Define a join synopses J(Su) to be the output of join Su, r2, ..rk. The join synopses for scheme consists of join synopses for all u’s.

Page 15: Join Synopses for Approximate Query Answering

Join Synopses (continued)Theorem:

Let r1,r2…rk, k>=3 be an arbitrary k-way foreign join, with source relation r1. Let u be the node in G corresponding to r1 and let Su be a uniform random sample of r1. Let A be the set of attributes in r1,r1…rk

Then:

1. J(Su) is a uniform random sample of J(u) with |Su| tuples

2. Join r1, r2…rk is a projection of J(u) on the attributes in r1, r2…rk

Page 16: Join Synopses for Approximate Query Answering

Join Synopses (continued)

Lemma:

From a single join synopses for a node whose maximum foreign key has k relations we can extract uniform random sample of between k-1 to 2k-1 -1 distinct foreign key joins

Lemma:

For any node u whose maximum foreign key join is a k-way join, number of tuples in its renormalized join synopsis J(Su) is at most k|Su|

Page 17: Join Synopses for Approximate Query Answering

Space allocation strategies

ni - numbr of tuples allocated to the joinfi- fraction of queries for which the join is a relation or the source of foreign key join

Theorem:

ni = N’ N’ = N/

si - size of join tuple

i ni

fi

3/2

si

fi

j

sjfj 3/13/2

Page 18: Join Synopses for Approximate Query Answering

Space allocation strategies

Heuristics

EqJoin: Equally between relations

CubeJoin: In proportion to the cube root of their join synopses tuple size

PropJoin: In proportion to their join synopses size.

Page 19: Join Synopses for Approximate Query Answering

Maintenance of Join Synopses

Adding a tuple

Deleting a tuple

Page 20: Join Synopses for Approximate Query Answering

Experiments Results

TestBed:

TPC-D decision benchmark. Scale factor 0.3 (database of about 300 megabytes).

296MHz UltraSparc-II. I/O - 5 MB/sec

Page 21: Join Synopses for Approximate Query Answering

Experiment 1accuracy - summary size

EquiBase, PropBase - produce answers only when the summary size exceeds 1.5% of the database.

EquiJoin, PropJoin - good results even for 0.1% of the database.

Page 22: Join Synopses for Approximate Query Answering

Experiment 2execution timing

Actual execution time - 122 seconds.

The response time increases with the summary size.

Query using Join Synopses needs in two orders less time!

Page 23: Join Synopses for Approximate Query Answering

Related Work

Approximate query answering

Statistical techniques

Page 24: Join Synopses for Approximate Query Answering

Conclusions

Schemes based on join synopses provide better answer than those, based only on the basic relations samples.

Approximate answering is becoming extremely important in new application of data warehouses.

However, there are still more problems: group-bys, ranks etc...