
SURAJIT CHAUDHURI, RAJEEV MOTWANI, VIVEK NARASAYYA

On random sampling over Joins

Presented by : Srikantha Nema

Outline

Semantics of Sample
Difficulty of join Sampling
Algorithms for Sampling
Sampling strategies
New strategies for join Sampling
Experimental evaluation
Conclusions

Terminologies

SAMPLE(R, f) is an SQL operation that returns a random sample of relation R

R is the relation obtained when a query Q is evaluated

f is the sampling fraction: the sample is an f-fraction of relation R

Semantics of Sample

Sampling with Replacement (WR)

Sampling without Replacement (WoR)

Independent Coin Flips (CF)
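
The three semantics differ in how the sample size and duplicates are handled. The sketch below illustrates them on an in-memory list, assuming the usual definitions: WR draws f·n tuples independently, WoR draws f·n distinct tuples, and CF keeps each tuple with probability f. The function names and the use of `round` for the sample size are assumptions for illustration.

```python
import random

def sample_wr(rel, f, rng):
    """WR: draw round(f*n) tuples independently and uniformly; repeats are allowed."""
    r = round(f * len(rel))
    return [rng.choice(rel) for _ in range(r)]

def sample_wor(rel, f, rng):
    """WoR: draw round(f*n) tuples uniformly; no tuple position is picked twice."""
    r = round(f * len(rel))
    return rng.sample(rel, r)

def sample_cf(rel, f, rng):
    """CF: keep each tuple independently with probability f; the size is random."""
    return [t for t in rel if rng.random() < f]

rng = random.Random(0)
R = list(range(100))          # toy relation with 100 distinct tuples
wr = sample_wr(R, 0.1, rng)   # exactly 10 tuples, possibly with repeats
wor = sample_wor(R, 0.1, rng) # exactly 10 distinct tuples
cf = sample_cf(R, 0.1, rng)   # about 10 tuples on average
```

Only CF produces a sample whose size varies from run to run; WR and WoR fix the size in advance.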

Difficulty of Join Sampling

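$R_1(A, B) = \{(a_1, b_0), (a_2, b_1), (a_2, b_2), \ldots, (a_2, b_k)\}$

$R_2(A, C) = \{(a_2, c_0), (a_1, c_1), (a_1, c_2), \ldots, (a_1, c_k)\}$

$\mathrm{SAMPLE}(R_1 \bowtie R_2, f)$

$\mathrm{SAMPLE}(R_1, f_1) \bowtie \mathrm{SAMPLE}(R_2, f_2)$ ?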
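
The example relations show why joining two samples is not a sample of the join. The sketch below builds both relations for a hypothetical k = 200 and joins two "typical" small samples, chosen deterministically here so that each misses the single rare tuple of its relation, which is what a uniform sample would most likely do. The helper `join` and the variable names are assumptions for illustration.

```python
def join(r1, r2):
    """Natural join on attribute A: (a, b) joined with (a, c) gives (a, b, c)."""
    return [(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2]

k = 200
R1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, k + 1)]
R2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, k + 1)]

# The full join has k tuples through a1 and k tuples through a2.
J = join(R1, R2)

# Typical small samples: all a2 tuples from R1, all a1 tuples from R2.
S1 = R1[1:11]
S2 = R2[1:11]
J_of_samples = join(S1, S2)   # no join value in common, so this is empty
```

The join of the samples is non-empty only if (a1, b0) or (a2, c0) happens to be drawn, so it cannot serve as a uniform sample of the 2k join tuples.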

Classification of Join Sampling problem

Case A: No information is available for either $R_1$ or $R_2$

Case B: No information is available for $R_1$, but indexes and/or statistics are available for $R_2$

Case C: Indexes/statistics are available for $R_1$ and $R_2$

Algorithms for Sampling

Unweighted Sequential WR Sampling: Black-Box U1, Black-Box U2

Weighted Sequential WR Sampling: Black-Box WR1, Black-Box WR2

Unweighted Sequential WR Sampling

Black-Box U2
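
A minimal sketch of Black-Box U2, assuming it is the reservoir-style scheme for producing a WR sample of size r in one pass when the number of tuples is not known in advance. The function name and signature are assumptions for illustration.

```python
import random

def black_box_u2(stream, r, rng):
    """One-pass WR sample of size r from a stream of unknown length.

    Returns a list of r tuples, or r copies of None if the stream is empty.
    """
    reservoir = [None] * r
    seen = 0
    for t in stream:
        seen += 1
        for j in range(r):
            # Each slot independently switches to t with probability 1/seen,
            # so after the scan every slot holds a uniform draw from the stream.
            if rng.random() < 1.0 / seen:
                reservoir[j] = t
    return reservoir

sample_u2 = black_box_u2(iter(range(50)), 5, random.Random(1))
```

The first tuple fills every slot (probability 1/1), so the reservoir is always a valid sample of the prefix scanned so far. The cost is that no sample can be emitted until the scan ends.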

Black-Box U1
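
A minimal sketch of Black-Box U1, assuming it is the sequential scheme that needs the relation size n up front: for each tuple, the number of copies to emit is drawn from a binomial distribution over the samples still outstanding. The names `black_box_u1` and `binomial` are assumptions, and the binomial draw is done by summing coin flips to stay within the standard library.

```python
import random

def binomial(trials, p, rng):
    """Number of successes in `trials` independent flips with success probability p."""
    return sum(rng.random() < p for _ in range(trials))

def black_box_u1(stream, n, r, rng):
    """One-pass WR sample of size r from a stream of exactly n tuples."""
    out = []
    x = r  # samples still to be produced
    for i, t in enumerate(stream):
        if x == 0:
            break
        # Tuple i gets each outstanding sample with probability 1/(n - i).
        copies = binomial(x, 1.0 / (n - i), rng)
        out.extend([t] * copies)
        x -= copies
    return out

sample_u1 = black_box_u1(range(20), 20, 7, random.Random(2))
```

Unlike a reservoir, this version can emit sampled tuples as they stream by, in scan order. On the last tuple the probability is 1, so all outstanding samples are emitted and the output has exactly r tuples.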

Weighted Sequential Sampling

Black-Box WR1
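
A minimal sketch of Black-Box WR1, assuming it is the weighted counterpart of the sequential binomial scheme: the total weight W is known in advance, and each tuple receives each outstanding sample with probability w(t) / (W minus the weight already scanned). All names are assumptions for illustration, and integer weights are assumed so that the final probability is exactly 1.

```python
import random

def binomial(trials, p, rng):
    """Number of successes in `trials` independent flips with success probability p."""
    return sum(rng.random() < p for _ in range(trials))

def black_box_wr1(stream, weight, total_weight, r, rng):
    """One-pass weighted WR sample of size r; total_weight must equal the sum of weights."""
    out = []
    x = r        # samples still to be produced
    scanned = 0  # total weight of the tuples already scanned
    for t in stream:
        if x == 0:
            break
        w = weight(t)
        copies = binomial(x, w / (total_weight - scanned), rng)
        out.extend([t] * copies)
        x -= copies
        scanned += w
    return out

items = [("a", 1), ("b", 0), ("c", 3)]
sample_wr1 = black_box_wr1(items, lambda t: t[1], 4, 6, random.Random(3))
```

A tuple with weight 0 is never emitted, and the last tuple with positive weight absorbs every sample still outstanding.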

Black-Box WR2
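
A minimal sketch of Black-Box WR2, assuming it is the weighted reservoir variant for the case where the total weight is not known in advance: each slot switches to the current tuple with probability w(t) divided by the weight seen so far. The function name and signature are assumptions for illustration.

```python
import random

def black_box_wr2(stream, weight, r, rng):
    """One-pass weighted WR sample of size r when the total weight is unknown.

    Returns r copies of None if no tuple has positive weight.
    """
    reservoir = [None] * r
    total = 0  # weight seen so far, including the current tuple
    for t in stream:
        w = weight(t)
        if w <= 0:
            continue  # zero-weight tuples can never be sampled
        total += w
        for j in range(r):
            # Slot j switches to t with probability w / total.
            if rng.random() < w / total:
                reservoir[j] = t
    return reservoir

items = [("a", 1), ("b", 0), ("c", 3)]
sample_wr2 = black_box_wr2(items, lambda t: t[1], 6, random.Random(4))
```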

Sampling Strategies (old)

Strategy Naïve-Sample
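
A minimal sketch of the naive strategy, assuming it means computing the full join first and then drawing a WR sample of size r from the result. The function name is an assumption, and relations are modeled as lists of (A, other) pairs joined on A.

```python
import random

def naive_sample(r1, r2, r, rng):
    """Materialize the whole join of r1 and r2 on A, then draw r tuples WR."""
    joined = [(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2]
    if not joined:
        return []
    return [rng.choice(joined) for _ in range(r)]

R1 = [(1, "x"), (2, "y")]
R2 = [(1, "p"), (1, "q"), (3, "z")]
sample_naive = naive_sample(R1, R2, 4, random.Random(5))
```

This needs no indexes or statistics, but it pays the full cost of the join even when r is tiny.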

Strategy Olken-Sample
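
A minimal sketch of the Olken-style strategy, assuming the usual rejection scheme: pick a random tuple of R1, pick a random matching tuple of R2 through an index, and accept the pair with probability m2(v)/M, where m2(v) is the number of R2 tuples with join value v and M is an upper bound on m2. The dictionary index and all names are assumptions for illustration.

```python
import random
from collections import defaultdict

def olken_sample(r1, r2, r, rng):
    """WR sample of size r from the join of r1 and r2 on A, by rejection."""
    index = defaultdict(list)  # stands in for an index on R2.A
    for (a, c) in r2:
        index[a].append(c)
    if not any(a in index for (a, _) in r1):
        return []  # empty join: the rejection loop would never accept
    M = max(len(cs) for cs in index.values())  # upper bound on m2(v)
    out = []
    while len(out) < r:
        a, b = rng.choice(r1)       # random access into R1
        matches = index.get(a, [])
        # Accept with probability m2(a)/M so every join tuple is equally likely.
        if matches and rng.random() < len(matches) / M:
            out.append((a, b, rng.choice(matches)))
    return out

R1 = [(1, "x"), (2, "y"), (3, "w")]
R2 = [(1, "p"), (1, "q"), (2, "s")]
sample_olken = olken_sample(R1, R2, 5, random.Random(6))
```

Each join tuple is produced with probability 1/(|R1|·M) per iteration, which is uniform. The price is rejected iterations, which grow when M is much larger than the typical m2(v).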

New strategies for join Sampling

Strategy Stream-Sample
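
A minimal sketch of Stream-Sample, assuming it works in two steps: draw a WR sample of R1 in which each tuple is weighted by m2(t.A), its number of matches in R2, then pair each sampled tuple with one random matching R2 tuple. Here `rng.choices` stands in for a weighted WR black box, and the dictionary stands in for an index on R2; both are assumptions for illustration.

```python
import random
from collections import defaultdict

def stream_sample(r1, r2, r, rng):
    """WR sample of size r from the join of r1 and r2 on A, without rejection."""
    index = defaultdict(list)  # stands in for an index on R2.A
    for (a, c) in r2:
        index[a].append(c)
    # Weight of an R1 tuple = number of R2 tuples it joins with.
    weights = [len(index.get(a, [])) for (a, _) in r1]
    if sum(weights) == 0:
        return []  # empty join
    s1 = rng.choices(r1, weights=weights, k=r)  # weighted WR sample of R1
    # Every tuple in s1 has at least one match, so rng.choice is safe.
    return [(a, b, rng.choice(index[a])) for (a, b) in s1]

R1 = [(1, "x"), (2, "y"), (3, "w")]
R2 = [(1, "p"), (1, "q"), (2, "s")]
sample_stream = stream_sample(R1, R2, 5, random.Random(7))
```

Weighting by m2 cancels the 1/m2 chance of picking a particular match, so every join tuple is equally likely and no draw is wasted.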

Strategy Group-Sample
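
A minimal sketch of Group-Sample, assuming it covers the case where frequency statistics on R2 are available but an index is not: R1 is sampled with weights m2(t.A) as before, the sample is joined with R2 in a single scan, and one random tuple is kept from each sampled tuple's group of matches. The per-group pick uses a size-1 reservoir, and `rng.choices` stands in for a weighted WR black box; all names are assumptions.

```python
import random
from collections import Counter

def group_sample(r1, r2, r, rng):
    """WR sample of size r from the join on A, using only frequency statistics on r2."""
    freq = Counter(a for (a, _) in r2)        # stands in for statistics on R2.A
    weights = [freq[a] for (a, _) in r1]      # Counter gives 0 for absent values
    if sum(weights) == 0:
        return []  # empty join
    s1 = rng.choices(r1, weights=weights, k=r)  # weighted WR sample of R1
    picks = [None] * r   # one chosen join tuple per sampled R1 tuple
    seen = [0] * r       # size of each group scanned so far
    for (a2, c) in r2:   # single scan of R2, no index needed
        for i, (a, b) in enumerate(s1):
            if a == a2:
                seen[i] += 1
                # Size-1 reservoir: keep the new match with probability 1/seen.
                if rng.random() < 1.0 / seen[i]:
                    picks[i] = (a, b, c)
    return picks

R1 = [(1, "x"), (2, "y"), (3, "w")]
R2 = [(1, "p"), (1, "q"), (2, "s")]
sample_group = group_sample(R1, R2, 5, random.Random(8))
```

Because only tuples with positive weight enter the R1 sample, every group is non-empty and each slot in `picks` is filled by the end of the scan.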

Strategy Frequency-Partition-Sample

Strategy Frequency-Partition-Sample

Experimental Evaluation 1

Experimental Evaluation 2

Experimental Evaluation 3

Conclusions

Difficulty of join sampling
Classification of the problem into 3 cases
Strategies for join sampling
New schemes for sequential random sampling for uniform and weighted sampling
More efficient strategies can be developed for the case of single join
More work needed to understand the problem of sampling the result of join trees

Thank You