Post on 23-Feb-2016
SURAJIT CHAUDHURI, RAJEEV MOTWANI, VIVEK NARASAYYA
On Random Sampling over Joins
Presented by: Srikantha Nema
Outline
Semantics of Sample
Difficulty of join Sampling
Algorithms for Sampling
Sampling strategies
New strategies for join Sampling
Experimental evaluation
Conclusions
Terminology
SAMPLE(R, f) is an SQL operation
R is the relation obtained when a query Q is evaluated
f is the fraction of R to be sampled
Semantics of Sample
Sampling with Replacement (WR)
Sampling without Replacement (WoR)
Independent Coin Flips (CF)
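The three semantics can be sketched in a few lines of Python. This is a minimal illustration, assuming the relation fits in memory as a list and that the target sample size is round(f * n); the function names are hypothetical and not from the slides.

```python
import random

def sample_wr(rel, f, rng):
    """WR: draw round(f*n) tuples independently; duplicates may occur."""
    r = round(f * len(rel))
    return [rng.choice(rel) for _ in range(r)]

def sample_wor(rel, f, rng):
    """WoR: draw round(f*n) distinct positions; no tuple is picked twice."""
    r = round(f * len(rel))
    return rng.sample(rel, r)

def sample_cf(rel, f, rng):
    """CF: keep each tuple independently with probability f.
    The sample size is random, with expected value f*n."""
    return [t for t in rel if rng.random() < f]
```

WR and WoR return exactly round(f * n) tuples, while CF only matches that size in expectation.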
Difficulty of Join Sampling
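$R_1(A, B) = \{(a_1, b_0), (a_2, b_1), (a_2, b_2), \ldots, (a_2, b_k)\}$
$R_2(A, C) = \{(a_2, c_0), (a_1, c_1), (a_1, c_2), \ldots, (a_1, c_k)\}$
$\mathrm{SAMPLE}(R_1 \bowtie R_2, f)$
$\mathrm{SAMPLE}(R_1, f_1) \bowtie \mathrm{SAMPLE}(R_2, f_2)$ ?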
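A small simulation may help show why joining two samples is not a sample of the join. The sketch below assumes the skewed example from the paper: R1 holds one tuple with a1 and k tuples with a2, while R2 holds one tuple with a2 and k tuples with a1, so the join has 2k tuples. The helper names and the use of coin-flip sampling are assumptions made for illustration.

```python
import random

def build_example(k):
    """R1(A, B) and R2(A, C) with opposite skew on the join attribute A."""
    r1 = [("a1", "b0")] + [("a2", f"b{i}") for i in range(1, k + 1)]
    r2 = [("a2", "c0")] + [("a1", f"c{i}") for i in range(1, k + 1)]
    return r1, r2

def join(r1, r2):
    """Equi-join on the first attribute, via a hash index on r2."""
    index = {}
    for a, c in r2:
        index.setdefault(a, []).append(c)
    return [(a, b, c) for a, b in r1 for c in index.get(a, [])]

def coin_flip_sample(rel, f, rng):
    """Keep each tuple independently with probability f."""
    return [t for t in rel if rng.random() < f]

def empty_join_rate(k, f, trials, seed=0):
    """Fraction of trials in which the join of the two samples is empty."""
    rng = random.Random(seed)
    r1, r2 = build_example(k)
    empty = 0
    for _ in range(trials):
        s1 = coin_flip_sample(r1, f, rng)
        s2 = coin_flip_sample(r2, f, rng)
        if not join(s1, s2):
            empty += 1
    return empty / trials
```

A sample of R1 is unlikely to contain the single a1 tuple, and a sample of R2 is unlikely to contain the single a2 tuple, so the join of the samples is usually empty even though the full join has 2k tuples.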
Classification of Join Sampling problem
Case A: No information is available for either $R_1$ or $R_2$
Case B: No information is available for $R_1$, but indexes and/or statistics are available for $R_2$
Case C: Indexes/statistics are available for $R_1$ and $R_2$
Algorithms for Sampling
Unweighted Sequential WR Sampling: Black-Box U1, Black-Box U2
Weighted Sequential WR Sampling: Black-Box WR1, Black-Box WR2
Unweighted Sequential WR Sampling
Black-Box U2
Black-Box U1
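The slide bodies for the two black boxes did not survive, so the following is a sketch of how they are usually described, under the assumption that U1 knows the relation size n in advance and U2 does not. The binomial draw is implemented as a sum of coin flips to stay within the standard library; function and parameter names are illustrative.

```python
import random

def black_box_u1(stream, n, r, rng):
    """One-pass unweighted WR sample of size r when n = |R| is known.
    Tuple i receives Binomial(x, 1/(n - i)) copies, where x is the
    number of sample slots still unfilled."""
    out = []
    x = r
    for i, t in enumerate(stream):
        if x == 0:
            break
        p = 1.0 / (n - i)
        copies = sum(rng.random() < p for _ in range(x))
        out.extend([t] * copies)
        x -= copies
    return out

def black_box_u2(stream, r, rng):
    """One-pass unweighted WR sample of size r when n is unknown.
    Each of the r reservoir slots is overwritten by the N-th tuple
    independently with probability 1/N."""
    reservoir = [None] * r
    seen = 0
    for t in stream:
        seen += 1
        for j in range(r):
            if rng.random() < 1.0 / seen:
                reservoir[j] = t
    return reservoir
```

U1 can emit sample tuples while the input is still streaming, whereas U2 can only return its reservoir once the whole input has been seen.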
Weighted Sequential Sampling
Black-Box WR1
Black-Box WR2
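The weighted variants can be sketched the same way, assuming WR1 is given the total weight W up front and WR2 is not. Integer weights are assumed (in join sampling the weight would be a frequency count); the `weight` callback and other names are hypothetical.

```python
import random

def black_box_wr1(stream, weight, total_weight, r, rng):
    """One-pass weighted WR sample of size r when the total weight is known.
    A tuple with weight w receives Binomial(x, w / (W - D)) copies, where
    D is the weight already streamed past and x the unfilled slots."""
    out = []
    x = r
    done = 0
    for t in stream:
        remaining = total_weight - done
        if x == 0 or remaining <= 0:
            break
        w = weight(t)
        p = w / remaining
        copies = sum(rng.random() < p for _ in range(x))
        out.extend([t] * copies)
        x -= copies
        done += w
    return out

def black_box_wr2(stream, weight, r, rng):
    """One-pass weighted WR sample of size r when the total weight is unknown.
    Each reservoir slot is overwritten with probability w / (weight so far)."""
    reservoir = [None] * r
    total = 0
    for t in stream:
        w = weight(t)
        total += w
        if w == 0:
            continue  # zero-weight tuples can never be selected
        p = w / total
        for j in range(r):
            if rng.random() < p:
                reservoir[j] = t
    return reservoir
```

With all weights equal to 1, these reduce to the unweighted black boxes.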
Sampling Strategies (old)
Strategy Naïve-Sample
Strategy Olken-Sample
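As a rough sketch of the two older strategies, assuming binary relations joined on their first attribute and held in memory: Naive-Sample materializes the whole join and samples from it, while Olken-Sample uses an index on R2 and rejection sampling, accepting a candidate with probability m2(v)/M, where m2(v) is the frequency of v in R2 and M is the maximum such frequency. The names below are illustrative.

```python
import random
from collections import defaultdict

def build_index(r2):
    """Hash index on the join attribute (first column) of r2."""
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    return index

def naive_sample(r1, r2, r, rng):
    """Compute the full join, then draw r tuples with replacement."""
    index = build_index(r2)
    full = [t1 + t2[1:] for t1 in r1 for t2 in index.get(t1[0], [])]
    return [rng.choice(full) for _ in range(r)] if full else []

def olken_sample(r1, r2, r, rng):
    """Rejection sampling: pick t1 uniformly from r1, accept it with
    probability m2(t1.A) / M, then pick a uniform matching t2."""
    index = build_index(r2)
    if not any(t1[0] in index for t1 in r1):
        return []  # empty join: nothing to sample
    m_max = max(len(matches) for matches in index.values())
    out = []
    while len(out) < r:
        t1 = rng.choice(r1)
        matches = index.get(t1[0], [])
        if matches and rng.random() < len(matches) / m_max:
            out.append(t1 + rng.choice(matches)[1:])
    return out
```

Naive-Sample pays for the full join, and Olken-Sample needs random access to both relations and may reject many candidates when the frequencies in R2 are skewed.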
New strategies for join Sampling
Strategy Stream-Sample
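Stream-Sample can be sketched as follows, assuming it targets the case where only R2 has an index and frequency statistics: take a weighted WR sample of R1 in which each tuple's weight is the frequency m2 of its join value in R2, then pair each sampled tuple with one uniformly chosen match from R2. Here `random.choices` stands in for the sequential weighted black box, and all names are illustrative.

```python
import random
from collections import defaultdict

def stream_sample(r1, r2, r, rng):
    """WR sample of size r from the join of r1 and r2 on the first column."""
    # Index on r2 provides both the matches and the frequencies m2(v).
    index = defaultdict(list)
    for t2 in r2:
        index[t2[0]].append(t2)
    weights = [len(index.get(t1[0], ())) for t1 in r1]
    if sum(weights) == 0:
        return []  # empty join: nothing to sample
    # Weighted WR sample of r1, with weight m2(t1.A) for each tuple t1.
    s1 = rng.choices(r1, weights=weights, k=r)
    # Extend each sampled t1 with one uniformly random matching t2.
    return [t1 + rng.choice(index[t1[0]])[1:] for t1 in s1]
```

Each join tuple is produced with probability (m2(v)/|J|) * (1/m2(v)) = 1/|J|, so the result is uniform over the join without any rejections.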
Strategy Group-Sample
Strategy Frequency-Partition-Sample
Experimental Evaluation 1
Experimental Evaluation 2
Experimental Evaluation 3
Conclusions
Difficulty of join sampling
Classification of the problem into 3 cases
Strategies for join sampling
New schemes for sequential random sampling for uniform and weighted sampling
More efficient strategies can be developed for the case of single join
More work needed to understand the problem of sampling the result of join trees
Thank You