ICICLES: SELF-TUNING SAMPLES FOR APPROXIMATE QUERY ANSWERING

BY VENKATESH GANTI, MONG LI LEE, AND RAGHU RAMAKRISHNAN

CSE6339 – DATA EXPLORATION

Raghavendra Madala

In this presentation…
• Introduction
• Icicles
• Icicle Maintenance
• Icicle-Based Estimators
• Quality Guarantee
• Performance Evaluation
• Conclusion

Introduction
• Analysis of data in data warehouses is useful in decision support
• OLAP systems provide interactive response times to aggregate queries
• AQUA: approximate query answering systems provide very fast alternatives to OLAP systems

Approaches
• Sampling-based
• Histogram-based
• Probabilistic-based
• Wavelet-based
• Clustering-based

Join synopsis
Is a uniform random sample
• All tuples are assumed to be equally important
• OLAP queries follow a predictable, repetitive pattern
• Uniform sampling wastes precious main memory
• A join of random samples of the base relations may not be a random sample of the join of the base relations. This is the basis for the Join Synopsis by Gibbons
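
The last point can be illustrated with a tiny hand-built case. This is only a sketch: the `customers` and `orders` data, the `join` helper, and the hand-picked "samples" are all hypothetical, chosen to show the worst case rather than a real random draw. It also shows the idea usually attributed to join synopses, namely sampling the referencing relation and joining it with the full referenced relation.

```python
# Hypothetical toy data: customers keyed by id, orders referencing a customer.
customers = {1: "EUROPE", 2: "ASIA", 3: "ASIA", 4: "AFRICA"}   # key -> region
orders = [(101, 1), (102, 2), (103, 3), (104, 4)]              # (order key, customer key)


def join(order_rows, customer_rows):
    """Foreign-key join: keep an order only if its customer is present."""
    return [(o, c, customer_rows[c]) for o, c in order_rows if c in customer_rows]


# Hand-picked "samples" of each base relation (assumption: an unlucky draw).
sampled_orders = orders[:2]                                   # orders of customers 1 and 2
sampled_customers = {k: customers[k] for k in (3, 4)}         # customers 3 and 4

# Joining the two samples loses every tuple, although the full join has 4 rows.
join_of_samples = join(sampled_orders, sampled_customers)

# Sampling only the referencing relation and joining with the full referenced
# relation keeps one join tuple per sampled order.
synopsis = join(sampled_orders, customers)

print(join_of_samples)
print(synopsis)
```

Because every order has exactly one matching customer, the second approach yields a sample of the join whose size equals the size of the order sample.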

Why Icicles?
• To capture the data locality of aggregate queries on foreign key joins
• An icicle is expected to consist of more tuples from regions that are accessed more frequently
• The space of the sample relation is better utilized if more samples from the actual result sets are present
• A dynamic algorithm changes the sample to suit the queries being executed in the workload

Icicles
An icicle is a uniform random sample of a multiset of tuples L (an extension of R), which is the union of a relation R and all sets of tuples that were required to answer queries in the workload.

Icicle Maintenance
The intuition is to incrementally maintain a sample, called an icicle.

We maintain the icicle such that the probability of a tuple being selected is proportional to the frequency with which it is required to answer queries (exactly).
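
One way to write this proportionality down, under notation assumed here rather than taken from the slides: f(t) is the number of copies of tuple t in L (1 for its membership in R plus 1 per query that needed it), n is the icicle size, and |L| is the size of the multiset.

```latex
\[
\Pr[\text{a given icicle slot holds } t] = \frac{f(t)}{|L|},
\qquad
|L| = \sum_{r \in R} f(r),
\qquad
\mathbb{E}[\text{copies of } t \text{ in the icicle}] = \frac{n\, f(t)}{|L|}.
\]
```

As a worked example with made-up numbers: if R has 100 tuples and the same 10 tuples are needed by each of 50 queries, those tuples have f(t) = 51, the other 90 have f(t) = 1, and |L| = 10 · 51 + 90 · 1 = 600. A given slot then holds a particular frequently used tuple with probability 51/600 = 0.085, against 1/600 ≈ 0.0017 for a tuple no query has touched.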

Icicle Maintenance Algorithm

Efficient incremental maintenance is possible for the following reasons:
• A uniform random sample of L (the extension of relation R) ensures that a tuple's selection in the icicle is proportional to its frequency
• Incremental maintenance of the icicle requires only the segment of R that satisfies the new query each time
• The Reservoir Sampling Algorithm is applied to the stream of tuples being appended to L
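
The three points above can be sketched as a single reservoir-sampling step per appended tuple. This is a minimal sketch, not the paper's pseudocode: the function name `update_icicle`, its parameters, and the toy workload are assumptions made for illustration, and the icicle is assumed to start as a uniform sample of R with |L| = |R|.

```python
import random


def update_icicle(icicle, stream_len, query_tuples, rng=random):
    """Fold the tuples that answered one query into the icicle.

    icicle       -- list of sampled tuples; its length n stays fixed
    stream_len   -- number of tuples of L seen so far (|R| before any query)
    query_tuples -- the segment of R that satisfies the new query
    Returns the new size of L.
    """
    n = len(icicle)
    for t in query_tuples:
        stream_len += 1
        # Reservoir step: the newly appended tuple enters the sample with
        # probability n / |L| and evicts a uniformly chosen resident.
        if rng.random() < n / stream_len:
            icicle[rng.randrange(n)] = t
    return stream_len


# Toy workload (hypothetical): 100 tuples, 50 queries that all touch tuples 0..9.
rng = random.Random(0)
R = list(range(100))
icicle = rng.sample(R, 10)      # start from a uniform sample of R
L_len = len(R)
hot = [t for t in R if t < 10]
for _ in range(50):
    L_len = update_icicle(icicle, L_len, hot, rng)

print(L_len, sorted(icicle))
```

Only the tuples satisfying each query are streamed through the reservoir, so the rest of R is never rescanned. Over a workload like this one, tuples 0 to 9 should come to dominate the icicle.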

Icicle Maintenance Example

Icicle-Based Estimators
• An icicle is a non-uniform sample of the original data
• Frequency must be maintained over all tuples
• Different estimation mechanisms are used for Average, Count and Sum

Estimators for Aggregate queries

• Average is the average of the distinct tuples in the sample satisfying the query
• Count is the sum of the expected contributions of all tuples in the icicle that satisfy the query
• Sum is the product of average and count
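
The three estimators above might be sketched as follows. The slides do not spell out "expected contribution", so the weighting used here is an assumption: a tuple with frequency f is expected to appear n · f / |L| times in an icicle of size n, so each occurrence is taken to stand for |L| / (n · f) tuples of R. The function name, its parameters, and the toy data are hypothetical.

```python
def icicle_estimates(icicle, freq, total_freq, predicate, value):
    """Estimate (COUNT, AVG, SUM) for one query from an icicle.

    icicle     -- list of tuple ids sampled from L (duplicates allowed)
    freq       -- dict: tuple id -> frequency attribute of that tuple
    total_freq -- sum of the frequencies over all of R, i.e. |L|
    predicate  -- function(tuple id) -> bool, the query's selection condition
    value      -- function(tuple id) -> number, the aggregated attribute
    """
    n = len(icicle)
    hits = [t for t in icicle if predicate(t)]   # occurrences, duplicates kept
    distinct = set(hits)
    if not distinct:
        return 0.0, None, 0.0
    # Average: taken over the distinct qualifying tuples only.
    avg = sum(value(t) for t in distinct) / len(distinct)
    # Count: assumed weighting, each occurrence stands for |L| / (n * f) tuples of R.
    count = sum(total_freq / (n * freq[t]) for t in hits)
    # Sum: product of average and count.
    return count, avg, avg * count


# Toy data (hypothetical): six tuples, tuples 0 and 1 were used by one query each.
values = {0: 10.0, 1: 20.0, 2: 30.0, 3: 40.0, 4: 50.0, 5: 60.0}
freq = {0: 2, 1: 2, 2: 1, 3: 1, 4: 1, 5: 1}
total_freq = sum(freq.values())
icicle = [0, 1, 1, 3]

count, avg, total = icicle_estimates(
    icicle, freq, total_freq, lambda t: t <= 2, values.get
)
print(count, avg, total)
```

Dividing by the frequency is what corrects for the non-uniformity of the icicle. Without it, frequently accessed tuples would be over-counted.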

Maintaining Frequency Relation

• Add a Frequency attribute to the relation R
• The frequency of each tuple is initially set to 1
• The frequency is incremented each time a tuple is used to answer a query
• Frequencies of the relevant tuples are updated only when the icicle is updated with a new query
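
A minimal sketch of this bookkeeping, assuming the frequency attribute is kept in a dictionary keyed by tuple id. The names `new_frequency_relation` and `record_query` are hypothetical, and `record_query` is assumed to be called at the same moment the icicle is updated with a new query.

```python
def new_frequency_relation(tuple_ids):
    """Every tuple of R starts with frequency 1 (its own membership in L)."""
    return {t: 1 for t in tuple_ids}


def record_query(freq, query_tuple_ids):
    """Increment the frequency of each tuple used to answer the query.

    Returns the sum of all frequencies, which equals the size of L.
    """
    for t in query_tuple_ids:
        freq[t] += 1
    return sum(freq.values())


freq = new_frequency_relation(range(5))
size_after_q1 = record_query(freq, [0, 1])   # first query used tuples 0 and 1
size_after_q2 = record_query(freq, [1, 2])   # second query used tuples 1 and 2
print(freq, size_after_q1, size_after_q2)
```

The running sum is returned because, under this reading, the total frequency is the size of L.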

Quality Guarantees
• When the queries in the workload exhibit data locality, the icicle consists of more tuples from frequently accessed subsets of the relation
• The accuracy of an estimate improves as the number of tuples used to compute it increases
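
The second point matches the textbook behaviour of a sample mean. The formula below is the standard result for a simple random sample drawn with replacement, offered as background rather than as the paper's own guarantee. The symbols are assumed here: k is the number of sample tuples that satisfy the query and σ is the standard deviation of the aggregated attribute among the qualifying tuples.

```latex
\[
\operatorname{SE}(\bar{x}) = \frac{\sigma}{\sqrt{k}}
\]
```

For example, if locality in the workload raises the number of qualifying sample tuples from k = 25 to k = 100, the standard error drops from σ/5 to σ/10, which is half its previous value.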

Performance Evaluation
Plots definition:
• Static sample: a uniform random sample on the relation
• Icicle: the icicle evolves with the workload
• Icicle-complete: the tuned icicle, used again on the same workload

Performance Evaluation

Qworkload: template for generating workloads

    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LI, C, O, S, N, R
    WHERE C_Custkey = O_Custkey
      AND O_Orderkey = LI_Orderkey
      AND LI_Suppkey = S_Suppkey
      AND C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998

Template for obtaining approximate answers

    SELECT COUNT(*), AVG(LI_Extendedprice), SUM(LI_Extendedprice)
    FROM LICOS-icicle, N, R
    WHERE C_Nationkey = N_Nationkey
      AND N_Regionkey = R_Regionkey
      AND R_Name = [region]
      AND O_Orderdate >= Date[startdate]
      AND O_Orderdate <= 12-31-1998

Conclusion
• Icicles are a class of samples that are sensitive to workload characteristics
• They adapt quickly to a changing workload
• Icicles are useful when the workload focuses on relatively small subsets of the relation
• An icicle is a trade-off between accuracy and cost

References
• V. Ganti, M. Lee, and R. Ramakrishnan. ICICLES: Self-tuning Samples for Approximate Query Answering. VLDB Conference 2000.

Thank you!