RIOS: Runtime Integrated Query Optimizer forSpark · Query Optimization Two steps process...

Post on 04-Aug-2020

9 views 0 download

Transcript of RIOS: Runtime Integrated Query Optimizer forSpark · Query Optimization Two steps process...

RIOS: Runtime Integrated QueryOptimizer for Spark

YoufuLi(UCLA),MingdaLi(UCLA), LingDing(UCLA)andMatteoInterlandi(Microsoft)

CloudComputingPrograms

OpenSourceData-IntensiveScalableComputing(DISC)Platforms:HadoopMapReduceandSpark◦ functionalAPI◦ mapand reduceUser-DefinedFunctions◦ RDDtransformations(filter,flatMap,zipPartitions,etc.)

Severalyearslater,introductionofhigh-levelSQL-likedeclarativequerylanguages(andsystems)◦ Conciseness◦ Pickaphysicalexecutionplanfromanumberofalternatives

QueryOptimizationTwostepsprocess◦ Logical optimizations(e.g.,filterpushdown)◦ Physical optimizations(e.g.,joinordersandimplementation)

PhysicaloptimizerinRDMBS:◦ Cost-based◦ Datastatistics (e.g.,predicateselectivities,costofdataaccess,etc.)

Theroleofthecost-basedoptimizeristo(1) enumeratesomesetofequivalentplans(2) estimatethecostofeach(3) selectasufficientlygoodplan

QueryOptimization:WhyImportant?

0.25 1 4

16 64

256 1024 4096

16384

W R W R W R W R

431

17

343

21

unab

le to

fini

sh in

5+

hour

s

276

1510

2

954

Tim

e (s

)

Scale Factor = 10

Spark

AsterixDB

Hive

Pig

0.25 1 4

16 64

256 1024 4096

16384

W R W R W R W R

431

17

343

21

unab

le to

fini

sh in

5+

hour

s

276

1510

2

954

Tim

e (s

)

Scale Factor = 10

Spark

AsterixDB

Hive

Pig

QueryOptimization:WhyImportant?

BadplansoverBigDatacanbedisastrous!

Challenges for Cost-based OptimizerinDISC

Lack of upfrontstatistics:◦ datasitsinHDFSandunstructured

Evenifinputstatisticsareavailable:◦ Correlations betweenpredicates◦ Exponentialerrorpropagationinjoins◦ ArbitraryUDFs

Cost-based OptimizerinDISC:StateoftheArt

Pre-existing statistics

Bad statistics

No upfront statistics

Cost-based OptimizerinDISC:StateoftheArt

• Collect and store statistics• Noruntimeplanrevision

Pre-existing statistics◦ Spark CBO[1]

Bad statistics

No upfront statistics

Cost-based OptimizerinDISC:StateoftheArt

• Collect and store statistics• Noruntimeplanrevision

Pre-existing statistics◦ Spark CBO[1]

Bad statistics◦ AdaptiveQueryplanning[2]

No upfront statistics

Assumptionisthatsomeinitialstatisticsexist

Cost-based OptimizerinDISC:StateoftheArt

Pre-existing statistics◦ Spark CBO[1]

Bad statistics◦ AdaptiveQueryplanning[2]

No upfront statistics◦ Pilotruns(samples)[3]

Assumptionisthatsomeinitialstatisticsexist

• Samplesareexpensive• Onlyforeign-keyjoins• Noruntimeplanrevision

• Collect and store statistics• Noruntimeplanrevision

Traditional Query Planning VS RIOS

Query Optimizer

Physical Plan

Execution

Logical Plan

Statistics

Traditional Query Planning VS RIOS

Query Optimizer

Physical Plan

Execution

Logical Plan

Statistics

Logical Plan

Planning ExecutionRuntime Stats

Physical plan

RIOS

RuntimeIntegratedOptimizerforSparkKeyidea:Execute-Gather-Aggregate-Planstrategy(EGAP)◦ Queryplansarelazily executed◦ Statisticsaregatheredatruntime◦ Aggregate statistics aftergathering◦ Joinsaregreedily planned for execution◦ Plan canbedynamicallychanged ifabaddecision wasmade

Neitherupfrontstatisticsnorpilotrunsarerequired◦ Rawdatasetsizeisrequiredforinitialguess

Supportfornotforeign-keyjoins

RuntimeOptimizer:anExample

BAA

CAA

AAB

AAC

Assumption: A < C

RuntimeOptimizer:ExecuteStep

BAA

CAA

AAB

AAC

A

B

C

Assumption: A < C

RuntimeOptimizer:Gatherstep

BAA

CAA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < C

U

RuntimeOptimizer:Aggregatestep

BAA

CAA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < C

Driver

RuntimeOptimizer:Planstep

BAA

CAA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < C

U

RuntimeOptimizer:Executestep

BAA

CAA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < C

UAB

RuntimeOptimizer:Gatherstep

BAA

CAA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < C

UAB

S

RuntimeOptimizer:Planstep

BAA

CAA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < C

UAB

S

RuntimeOptimizer:Executestep

BAA

CAA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < C

UAB

S

ABCABCABC

RuntimeOptimizer:WrongGuess

B(A)A

σ(C)AA

AAB

AAC

Assumption: A < Cσσ(A) > σ(C)

RuntimeOptimizer:WrongGuess

B(A)A

σ(C)AA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < Cσσ(A) > σ(C)

RuntimeOptimizer:WrongGuess

B(A)A

σ(C)AA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < Cσ

B

σ(A) > σ(C)Repartition Step

RuntimeOptimizer:WrongGuess

B(A)A

σ(C)AA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < Cσ

B

σ(A) > σ(C)

RuntimeOptimizer:WrongGuess

B(A)A

σ(C)AA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < Cσ

B

BC

S

σ(A) > σ(C)

RuntimeOptimizer:WrongGuess

B(A)A

σ(C)AA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < Cσ

B

BC

S

σ(A) > σ(C)

RuntimeOptimizer:WrongGuess

B(A)A

σ(C)AA

AAB

AAC

A

B

C

S

S

S

S

Assumption: A < Cσ

B

BC

SABCABCABC

σ(A) > σ(C)

RuntimeIntegratedOptimizerforSpark

Spark batch execution model allows late binding of joins

Set of Statistics:◦ Joinestimations(basedonsamplingorsketches)◦ Numberofrecords◦ Averagesizeofeachrecord

Statisticsareaggregated usingaSparkjoboraccumulators

Joinimplementationsarepickedbasedonthresholds

ChallengesandOptimizations

Execute- Blockandreviseexecutionplanswithoutwastingcomputation

Aggregate- Efficientaccumulationofstatistics

Plan- Trytoscheduleasmanybroadcastjoinsaspossible

Experiments

Q1:WhataretheperformanceofRIOScomparedtoregularSpark,pilotruns and Spark-CBO?

Q2:Howexpensivearewrongguesses?

Minibenchmarkwith3FactTables

4 16 64

256 1024 4096

16384

1 10 100 1000

11

25

92

1482

9

24

86

1353

11

25

93

1521

74

1312

9095

unab

le to

fini

sh in

5+

hour

s

69

1242

8951

unab

le to

fini

sh in

5+

hour

s

Tim

e (s

)

Scale Factor

spark good-orderRIOS R2RRIOS W2R

pilot-runspark wrong-order

Minibenchmarkwith3FactTables

Q1:RIOSisalwaysfasterthanSparkandpilotrun

4 16 64

256 1024 4096

16384

1 10 100 1000

11

25

92

1482

9

24

86

1353

11

25

93

1521

74

1312

9095

unab

le to

fini

sh in

5+

hour

s

69

1242

8951

unab

le to

fini

sh in

5+

hour

s

Tim

e (s

)

Scale Factor

spark good-orderRIOS R2RRIOS W2R

pilot-runspark wrong-order

Minibenchmarkwith3FactTables

Q2:Notmuch,around15%intheworstcase

4 16 64

256 1024 4096

16384

1 10 100 1000

11

25

92

1482

9

24

86

1353

11

25

93

1521

74

1312

9095

unab

le to

fini

sh in

5+

hour

s

69

1242

8951

unab

le to

fini

sh in

5+

hour

s

Tim

e (s

)

Scale Factor

spark good-orderRIOS R2RRIOS W2R

pilot-runspark wrong-order

TPCDSandTPCHQueries

4

16

64

256

1024

4096

16384

1 10 100 1000 1 10 100 1000 1 10 100 1000 1 10 100 1000Query 17 Query 50 Query 28 Query 9

19

37

130

2468

15

25

114

1879

13 17

51

1298

30

62

255

6717

17 19

69

2171

10 14

66

1861

9 12

30

1109

16

29

187

6589

11

17

56

1973

10 14

45

1437

9 11

28

1073

12

21

109

5760

15

22

72

2114

13 18

60

1779

13 17

47

1177

19

30

239

7632

23

40

176

3253

21

34

163

2308

15

22

61

1399

44

97

828

unab

le to

fini

sh in

5+

hour

s

Tim

e (s

)

Scale Factor

spark good-orderCBO-plan

RIOSpilot-run

spark bad-order

TPCDSandTPCHQueries

Q1:RIOSisalwaysthefasterapproach

4

16

64

256

1024

4096

16384

1 10 100 1000 1 10 100 1000 1 10 100 1000 1 10 100 1000Query 17 Query 50 Query 28 Query 9

19

37

130

2468

15

25

114

1879

13 17

51

1298

30

62

255

6717

17 19

69

2171

10 14

66

1861

9 12

30

1109

16

29

187

6589

11

17

56

1973

10 14

45

1437

9 11

28

1073

12

21

109

5760

15

22

72

2114

13 18

60

1779

13 17

47

1177

19

30

239

7632

23

40

176

3253

21

34

163

2308

15

22

61

1399

44

97

828

unab

le to

fini

sh in

5+

hour

s

Tim

e (s

)

Scale Factor

spark good-orderCBO-plan

RIOSpilot-run

spark bad-order

ConclusionsRIOS:cost-based queryoptimizerforSpark

Statisticsaregatheredatruntime(noneedforinitialstatisticsorpilotruns)

Latebindofjoins

Upto2xfasterthanthebestplangeneratedbypilotrun,and>100Xthanpreviousapproachesforfacttablejoins.

ExperimentConfiguration

๏ Datasets:• TPCDS• TPCH

๏ Configuration:• 16 machines, 4 cores (2 hyper threads per core)

machines, 32GB of RAM, 1TB disk• Spark 2.2.1• Scale factor from 1 to 1000 (~1TB)

Reference1. https://databricks.com/blog/2017/08/31/cost-based-optimizer-in-apache-

spark-2-2.html.2. S.Agarwal,S.Kandula,N.Bruno,M.-C.Wu,I.Stoica,andJ.Zhou.Re-optimizing

data-parallelcomputing.InNSDI,2012.3. K.Karanasos,A.Balmin,M.Kutsch,F.Ozcan,V.Ercegovac,C.Xia,andJ.

Jackson.Dynamicallyoptimizingqueriesoverlargescaledataplatforms.InSIGMOD,pages943–954.

4. S.Chaudhuri,G.Das,andV.Narasayya.Optimizedstratifiedsamplingforapproximatequeryprocessing.TODS,32(2),jun 2007.

5. O.Papapetrou,W.Siberski,andW.Nejdl.Cardinalityestimationanddynamiclengthadaptationforbloomfilters.DistributedandParallelDatabases,28(2):119– 156,2010.

Thankyou