Materialized View Selection and Maintenance using Multi-Query Optimization

Hoshi Mistry, Prasan Roy, S. Sudarshan, Krithi Ramamritham

Materialized View Selection and Maintenance using Multi-Query Optimization. Hoshi Mistry, Prasan Roy, S. Sudarshan, Krithi Ramamritham. Materialized views: complex results materialized in order to speed up queries that depend on these results.

Transcript of Materialized View Selection and Maintenance using Multi-Query Optimization

Page 1: Materialized View Selection and Maintenance using Multi-Query Optimization

Materialized View Selection and Maintenance using Multi-Query Optimization

Hoshi Mistry, Prasan Roy, S. Sudarshan, Krithi Ramamritham

Page 2: Materialized View Selection and Maintenance using Multi-Query Optimization

Materialized Views

Complex results materialized in order to speed up queries that depend on these results

Increasingly being supported by commercial database systems (e.g. Oracle8i)

Crucial in data warehousing environments

Page 3: Materialized View Selection and Maintenance using Multi-Query Optimization

Materialized View Maintenance

As underlying data changes, the materialized views need to be refreshed

Efficient view maintenance crucial!

– Growing need to provide up-to-date query responses

– Amount of data added to data warehouses increasing

– Maintenance time window shrinking

Page 4: Materialized View Selection and Maintenance using Multi-Query Optimization

Focus

Efficient techniques for maintenance of a set of materialized views (MVs) by

– Transient materialization of common subexpressions (CSEs)

– Selection of additional MVs

– Computation of the best maintenance policy and plan for each MV

Page 5: Materialized View Selection and Maintenance using Multi-Query Optimization

Transient Materialization of Common Subexpressions

CSEs materialized to reduce maintenance cost by sharing computation, disposed after use

Motivated by Blakeley et al. [SIGMOD86], Ross et al. [SIGMOD96]

– Huge search space; considered impractical

Earlier work by Sellis [TODS88]

Efficient heuristic algorithms proposed by Roy et al. [SIGMOD00]

Page 6: Materialized View Selection and Maintenance using Multi-Query Optimization

Selection of Additional MVs

Additional views materialized permanently to reduce the overall maintenance cost

Motivated by Ross et al. [SIGMOD96]

– restricted to incremental maintenance only

– do not consider transient materialization

MV selection in general addressed in Roussopoulos [TODS82], Agrawal et al. [VLDB00]

Page 7: Materialized View Selection and Maintenance using Multi-Query Optimization

Best Maintenance Policy and Plan Computation

For each MV,

– Determine the best maintenance policy (incremental or recomputation)

– Find the corresponding best plan

Earlier work by Vista [EDBT98]

– Does not take into account transient materialization of CSEs or presence of other MVs

Current systems need manual specification of the maintenance policy

Page 8: Materialized View Selection and Maintenance using Multi-Query Optimization

Contribution

A framework that consolidates the choice of

– CSEs to be transiently materialized

– Additional MVs

– Best maintenance plan (incremental/recomputation)

Integrated with a state-of-the-art query optimizer (Volcano [ICDE93])

Page 9: Materialized View Selection and Maintenance using Multi-Query Optimization

Example

[Figure: example maintenance plan. Nodes: dA, B, C, D, dE, BC, DE, ABC, CDE, BCDE. Annotations: merge, incremental refresh, recomputation, permanent, transient, initial set.]

Page 10: Materialized View Selection and Maintenance using Multi-Query Optimization

Approach

– Setting up the search space of maintenance plans

– Best maintenance plan computation

– Transient/Permanent materialized view selection

Page 11: Materialized View Selection and Maintenance using Multi-Query Optimization

Approach

– Setting up the search space of maintenance plans

– Best maintenance plan computation

– Transient/Permanent materialized view selection

Page 12: Materialized View Selection and Maintenance using Multi-Query Optimization

Setting Up the Maintenance Plan Space

– The Query DAG representation for recomputation plans

– Incorporating incremental plans

Page 13: Materialized View Selection and Maintenance using Multi-Query Optimization

Representation of the Recomputation Plan Space

AND/OR Query DAG

[Figure: AND/OR Query DAG over equivalence nodes A, B, C, D, AB, BC, CD, ABC, BCD, with the best plan highlighted. Legend: Equivalence Class (OR node), Operation (AND node).]

Additionally incorporates subsumption derivations

Details in Roy et al. [SIGMOD00]
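The AND/OR structure described on this slide can be illustrated with a minimal Python sketch. It is not the Volcano representation itself: the class names, the `scan_cost` and `local_cost` fields, and all cost numbers are assumptions made for illustration. An equivalence class (OR node) picks the cheapest of its alternative operations, and an operation (AND node) needs all of its inputs.

```python
from dataclasses import dataclass, field

@dataclass
class Operation:
    """AND node: computing it requires every input equivalence class."""
    name: str
    inputs: list        # list of EquivalenceClass
    local_cost: float   # hypothetical cost of the operator itself

@dataclass
class EquivalenceClass:
    """OR node: any one of its operations produces the same result."""
    name: str
    operations: list = field(default_factory=list)
    scan_cost: float = 0.0  # hypothetical; used only when there are no operations

def best_cost(eq, memo=None):
    """Cheapest plan cost: min over alternatives (OR), sum over inputs (AND).

    Costs are summed as if the plan were a tree, so a shared input is
    charged once per use; sharing is what materialization later exploits.
    """
    memo = {} if memo is None else memo
    if eq.name not in memo:
        if not eq.operations:
            memo[eq.name] = eq.scan_cost
        else:
            memo[eq.name] = min(
                op.local_cost + sum(best_cost(i, memo) for i in op.inputs)
                for op in eq.operations
            )
    return memo[eq.name]

# Toy DAG for ABC with two alternative join orders (all numbers invented).
A = EquivalenceClass("A", scan_cost=10)
B = EquivalenceClass("B", scan_cost=10)
C = EquivalenceClass("C", scan_cost=10)
AB = EquivalenceClass("AB", [Operation("A join B", [A, B], 5)])
BC = EquivalenceClass("BC", [Operation("B join C", [B, C], 50)])
ABC = EquivalenceClass("ABC", [
    Operation("AB join C", [AB, C], 7),
    Operation("A join BC", [A, BC], 3),
])
abc_cost = best_cost(ABC)
```

With these invented numbers the plan through AB is cheaper than the plan through BC, so the OR node for ABC selects it.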

Page 14: Materialized View Selection and Maintenance using Multi-Query Optimization

Incremental Plans: Propagation Based Differential Generation

Differentials propagated one at a time

For each differential dR

– Start at dR and compute node differentials bottom-up along the “best plan” in a topological order

– Differential of a node computed as a function of its inputs and their differentials

• e.g. $d(E_1 \bowtie E_2) = (E_1 \bowtie dE_2) \cup (E_2 \bowtie dE_1) \cup (dE_1 \bowtie dE_2)$, where $dE_i$ = differential of $E_i$ wrt dR

– Refresh the relation R and the affected MVs wrt dR by merging with the differentials computed as above

Ross et al. [SIGMOD96]
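The join differential rule above can be checked on a small example. The sketch below assumes insert-only differentials and set semantics, with relations modelled as Python sets of `(key, value)` tuples joined on the key; the helper names `join` and `join_differential` are made up for this illustration.

```python
def join(r, s):
    """Equi-join of two relations of (key, value) tuples on the key."""
    return {(k1, a, b) for (k1, a) in r for (k2, b) in s if k1 == k2}

def join_differential(e1, de1, e2, de2):
    """d(E1 join E2) = (E1 join dE2) U (dE1 join E2) U (dE1 join dE2)."""
    return join(e1, de2) | join(de1, e2) | join(de1, de2)

# Old contents and inserted tuples (toy data).
e1, de1 = {(1, "a"), (2, "b")}, {(3, "c")}
e2, de2 = {(1, "x")}, {(2, "y"), (3, "z")}

old_result = join(e1, e2)
delta = join_differential(e1, de1, e2, de2)

# Merging the differential into the old result gives the same relation
# as recomputing the join from the updated inputs.
refreshed = old_result | delta
recomputed = join(e1 | de1, e2 | de2)
```

The point of the rule is that `delta` is computed without touching `old_result`, so only the changed part of the view has to be produced and merged.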

Page 15: Materialized View Selection and Maintenance using Multi-Query Optimization

Incorporating Incremental Plans: Propagation Based Differential Generation

Propagation of dA

[Figure: query DAG for the propagation of dA. Nodes: dA, B, C, BdA, BC, BCdA, with the best plan highlighted. Legend: Equivalence Class (OR node), Operation (AND node).]

Page 16: Materialized View Selection and Maintenance using Multi-Query Optimization

Incorporating Incremental Plans: Propagation Based Differential Generation

Propagation of dB

[Figure: query DAG for the propagation of dB. Nodes: A, dB, C, D, AdB, CD, CdB, ACdB, CDdB, with the best plan highlighted. Legend: Equivalence Class (OR node), Operation (AND node).]

Page 17: Materialized View Selection and Maintenance using Multi-Query Optimization

Incorporating Incremental Plans: Propagation Based Differential Generation

Propagation of dC

[Figure: query DAG for the propagation of dC. Nodes: A, B, dC, D, AB, DdC, BdC, ABdC, BDdC, with the best plan highlighted. Legend: Equivalence Class (OR node), Operation (AND node).]

Page 18: Materialized View Selection and Maintenance using Multi-Query Optimization

Incorporating Incremental Plans: Propagation Based Differential Generation

Propagation of dD

[Figure: query DAG for the propagation of dD. Nodes: B, C, dD, CdD, BC, BCdD, with the best plan highlighted. Legend: Equivalence Class (OR node), Operation (AND node).]

Page 19: Materialized View Selection and Maintenance using Multi-Query Optimization

Incorporating Incremental Plans

Logical representation

[Figure: DAG for AB with nodes A, B, dA, dB, BdA, AdB. Labels: recomputation plan, incremental plan, Merge operator.]

For each equiv node and each base differential affecting it

– Introduce a new equiv node representing its differential

– Populate with the differential plans

Maintain statistics for the full expression after successive merges

Large space overhead!

Page 20: Materialized View Selection and Maintenance using Multi-Query Optimization

Incorporating Incremental Plans

Actual space-efficient representation

[Figure: DAG for AB with nodes A, B, dA, dB, BdA, AdB.]

Reuse the same structure for successive propagation cycles

– separate best plan pointers for each cycle

– separate statistics for the full expression after successive merges

Also incorporates sort-orders, indices, etc. Roy et al. [SIGMOD00]

Page 21: Materialized View Selection and Maintenance using Multi-Query Optimization

Approach

– Setting up the search space of maintenance plans

– Best maintenance plan computation

– Transient/Permanent materialized view selection

Page 22: Materialized View Selection and Maintenance using Multi-Query Optimization

Maintenance Plan Computation

Given

– Set of nodes Mt materialized transiently (can include full results as well as differentials)

– Set of nodes Mp materialized permanently (includes full results but not differentials)

compute the best consolidated maintenance plan for Mp

Page 23: Materialized View Selection and Maintenance using Multi-Query Optimization

Maintenance Plan Computation

Best plan computed using a query optimizer extended as follows:

– Plan accessing a materialized view (trans/perm) does not include its computation, only its use

– Cost of a maintenance plan:

$$\mathrm{totalcost}(M_p, M_t) = \sum_{e \in M_p} \mathrm{maintcost}(e \mid M_p, M_t) + \sum_{e \in M_t} \mathrm{trmatcost}(e \mid M_p, M_t)$$

where

$\mathrm{maintcost}(e \mid M_p, M_t)$: cost of cheapest maintenance plan for e (recomputation/incremental)

$\mathrm{trmatcost}(e \mid M_p, M_t)$: cost of computing and materializing e
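As a rough illustration of the cost formula on this slide, the sketch below sums maintenance costs over Mp and transient materialization costs over Mt. The real costs come from the extended optimizer; here `toy_maintcost`, `toy_trmatcost`, the `BASE` table, and the fixed saving of 30 are invented stand-ins, passed in as plain callables.

```python
def total_cost(mp, mt, maintcost, trmatcost):
    """totalcost(Mp, Mt): maintenance cost over Mp plus
    transient materialization cost over Mt."""
    return (sum(maintcost(e, mp, mt) for e in mp)
            + sum(trmatcost(e, mp, mt) for e in mt))

# Hypothetical cost model: every node has a base cost, and any other
# node becomes cheaper to maintain once BC is materialized.
BASE = {"ABC": 100, "BCD": 120, "BC": 40}

def toy_maintcost(e, mp, mt):
    uses_shared_bc = e != "BC" and "BC" in (mp | mt)
    return BASE[e] - (30 if uses_shared_bc else 0)

def toy_trmatcost(e, mp, mt):
    return BASE[e]

views = {"ABC", "BCD"}
cost_without_bc = total_cost(views, set(), toy_maintcost, toy_trmatcost)
cost_with_bc = total_cost(views, {"BC"}, toy_maintcost, toy_trmatcost)
```

In this toy model, transiently materializing BC costs 40 but saves 30 on each of the two views, so the total drops from 220 to 200.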

Page 24: Materialized View Selection and Maintenance using Multi-Query Optimization

Approach

– Setting up the search space of maintenance plans

– Best maintenance plan computation

– Transient/Permanent materialized view selection

Page 25: Materialized View Selection and Maintenance using Multi-Query Optimization

Transient/Permanent Materialized View Selection

Given set of MVs M already materialized, determine

– Set of nodes Mt to materialize transiently

– Set of nodes Mp (⊇ M) to materialize permanently

such that totalcost(Mp, Mt) is minimized

Exhaustive approach too expensive. Need heuristics!

Page 26: Materialized View Selection and Maintenance using Multi-Query Optimization

Transient/Permanent Materialized View Selection: A Greedy Heuristic

Input: Initial MVs M
Output: Mp (⊇ M), Mt, corresp. best plan
Begin
    Mp = M; Mt = {}
    S = set of equivalence nodes in the DAG for M
    While ( S ≠ {} )
        Pick z ∈ S which maximizes Benefit(z | Mp, Mt)
        If ( Benefit(z | Mp, Mt) ≤ 0 )
            break
        If ( z is a full result and maintcost(z | Mp, Mt) < trmatcost(z | Mp, Mt) )
            Mp = Mp U {z}
        else
            Mt = Mt U {z}
        S = S - {z}
    Return (Mp, Mt)
End

How to compute Benefit(z | Mp, Mt)?
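The greedy loop on this slide translates fairly directly into Python. The sketch below is one possible reading, not the authors' implementation: the benefit and cost functions are passed in as callables, and the toy tables (`TOY_BENEFIT`, `TOY_MAINT`, `TOY_TRMAT`) and node names are hypothetical values chosen only to exercise each branch.

```python
def greedy_select(initial_mvs, candidates, benefit, maintcost, trmatcost,
                  is_full_result):
    """Greedy choice of permanently (Mp) and transiently (Mt) materialized nodes."""
    mp, mt = set(initial_mvs), set()
    s = set(candidates)
    while s:
        # Pick the candidate with the largest benefit under the current choice.
        z = max(s, key=lambda n: benefit(n, mp, mt))
        if benefit(z, mp, mt) <= 0:
            break
        # Full results go to Mp when maintaining them is cheaper than
        # recomputing them transiently; differentials are always transient.
        if is_full_result(z) and maintcost(z, mp, mt) < trmatcost(z, mp, mt):
            mp.add(z)
        else:
            mt.add(z)
        s.remove(z)
    return mp, mt

# Hypothetical inputs: fixed benefits and costs, independent of Mp and Mt.
TOY_BENEFIT = {"BC": 20, "dBC": 8, "CD": -5}
TOY_MAINT = {"BC": 10, "dBC": 0, "CD": 30}
TOY_TRMAT = {"BC": 40, "dBC": 6, "CD": 35}

mp, mt = greedy_select(
    initial_mvs={"ABC", "BCD"},
    candidates={"BC", "dBC", "CD"},
    benefit=lambda z, mp, mt: TOY_BENEFIT[z],
    maintcost=lambda z, mp, mt: TOY_MAINT[z],
    trmatcost=lambda z, mp, mt: TOY_TRMAT[z],
    is_full_result=lambda z: not z.startswith("d"),
)
```

In this run BC is picked first and made permanent, the differential dBC is made transient, and the loop stops at CD because its benefit is not positive. A real benefit function depends on Mp and Mt and must be re-evaluated in every iteration, which is why the slides later call for efficient benefit computation.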

Page 27: Materialized View Selection and Maintenance using Multi-Query Optimization

Transient/Permanent Materialized View Selection: Benefit Computation

$$\mathrm{Benefit}(z \mid M_p, M_t) = \mathrm{gain}(z \mid M_p, M_t) - \mathrm{investment}(z \mid M_p, M_t)$$

where

$$\mathrm{gain}(z \mid M_p, M_t) = \sum_{e \in M_p} \big(\mathrm{maintcost}(e \mid M_p, M_t) - \mathrm{maintcost}(e \mid M_p, M_t \cup \{z\})\big) + \sum_{e \in M_t} \big(\mathrm{trmatcost}(e \mid M_p, M_t) - \mathrm{trmatcost}(e \mid M_p, M_t \cup \{z\})\big)$$

and

$$\mathrm{investment}(z \mid M_p, M_t) = \begin{cases} \min\big(\mathrm{maintcost}(z \mid M_p, M_t),\ \mathrm{trmatcost}(z \mid M_p, M_t)\big) & \text{if } z \text{ is a full result} \\ \mathrm{trmatcost}(z \mid M_p, M_t) & \text{if } z \text{ is a differential} \end{cases}$$

Benefit computation expensive. Need efficient techniques!
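The benefit definition can be written out as a short, unoptimized Python sketch that recomputes every cost from scratch, which is exactly the expense the slide warns about. The cost callables and the `MAINT` and `TRMAT` tables are assumptions for illustration, not values from the paper.

```python
def gain(z, mp, mt, maintcost, trmatcost):
    """Reduction in the cost of the already chosen nodes if z is added to Mt."""
    with_z = mt | {z}
    return (sum(maintcost(e, mp, mt) - maintcost(e, mp, with_z) for e in mp)
            + sum(trmatcost(e, mp, mt) - trmatcost(e, mp, with_z) for e in mt))

def investment(z, mp, mt, maintcost, trmatcost, is_full_result):
    """Cost of making z available: cheaper of maintaining or transiently
    materializing a full result; a differential can only be transient."""
    if is_full_result(z):
        return min(maintcost(z, mp, mt), trmatcost(z, mp, mt))
    return trmatcost(z, mp, mt)

def benefit(z, mp, mt, maintcost, trmatcost, is_full_result):
    return (gain(z, mp, mt, maintcost, trmatcost)
            - investment(z, mp, mt, maintcost, trmatcost, is_full_result))

# Hypothetical cost model: BC saves 30 on every other node that can use it.
MAINT = {"ABC": 100, "BCD": 120, "BC": 25}
TRMAT = {"ABC": 100, "BCD": 120, "BC": 40}

def toy_maintcost(e, mp, mt):
    return MAINT[e] - (30 if e != "BC" and "BC" in (mp | mt) else 0)

def toy_trmatcost(e, mp, mt):
    return TRMAT[e]

views = {"ABC", "BCD"}
bc_gain = gain("BC", views, set(), toy_maintcost, toy_trmatcost)
bc_benefit = benefit("BC", views, set(), toy_maintcost, toy_trmatcost,
                     is_full_result=lambda z: True)
```

Here BC saves 30 on each of the two views (gain 60) and costs min(25, 40) = 25 to provide, giving a benefit of 35.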

Page 28: Materialized View Selection and Maintenance using Multi-Query Optimization

Transient/Permanent Materialized View Selection: Improving Efficiency of the Greedy Heuristic

Cost-propagation based incremental techniques to efficiently compute Benefit

Monotonicity assumption

– Reduces the number of Benefit computations

Techniques to determine if a node can be shared across a given maintenance plan

– Reduces the number of nodes considered for transient materialization

Adapted from Roy et al. [SIGMOD00]. See paper for details.

Page 29: Materialized View Selection and Maintenance using Multi-Query Optimization

Benchmark

Set of Views

– 10 views (5 with aggregates, 5 without) on 8 distinct relations, refreshed together

Single Views

– Same views as above, refreshed separately

Page 30: Materialized View Selection and Maintenance using Multi-Query Optimization

Effect of Transient and Permanent Materialization

[Figure: two panels, Single Views and Set of Views.]

Page 31: Materialized View Selection and Maintenance using Multi-Query Optimization

Effect of Adaptive Maintenance Policy Selection

[Figure: two panels, Single Views and Set of Views.]

Page 32: Materialized View Selection and Maintenance using Multi-Query Optimization

Scalability Analysis

[Figure: two panels, Optimization Memory Requirements and Optimization Time.]

Negligible one-time costs

Page 33: Materialized View Selection and Maintenance using Multi-Query Optimization

Conclusion

Presented techniques

– Automate sharing of computation

– Automate view selection

– Automate maintenance policy selection and plan computation

– Do the above in an integrated manner, leading to benefits greater than could be achieved by considering each dimension individually

– Are efficient and scalable: the overall benefits greatly outweigh the one-time cost

– Integrate with state-of-the-art optimizers (e.g. MS SQL-Server)

Page 34: Materialized View Selection and Maintenance using Multi-Query Optimization

Future Work

Extend presented techniques

– To handle limited space

– To speed up a workload of queries in addition to maintenance of a set of materialized views

– To work in dynamic query result caching environments

Page 35: Materialized View Selection and Maintenance using Multi-Query Optimization

Questions