Achieving Scalability in OLAP Materialized View Selection
Thomas P. Nadeau, Toby J. Teorey
University of Michigan
DOLAP 2002
2
Topics
• Overview of OLAP
• Exponentiality in View Selection
• Our Polynomial Greedy Algorithm (PGA)
• Test Results
• Conclusions
• Current Work
3
Example Star Schema
Sell (Fact Table): CustID, DateID, BindID, Cost
Calendar: DateID, Month, Quarter, Year
Customer: CustID, Name, City, State/Prov
Bind Style: BindID, Desc
4
Star Schema Viewed with Data
Bind Style
BindID  Desc
PB      Paper Back
HC      Hard Cover
Calendar
DateID    Month  Quarter  Year
1/1/98    Jan    1        1998
1/2/98    Jan    1        1998
12/31/00  Dec    4        2000
Customer
CustID  Name         City       State/Prov
00001   U of M       Ann Arbor  MI
00002   Smith & Co.  Toronto    Ont
Sell (Fact Table, many rows)
CustID  DateID    BindID  Cost
00002   12/31/00  PB      $500
00222   1/1/99    HC      $1100
(The extracted text also shows the values $600 and $1300 beside these two rows; their position in the table is not recoverable.)
5
Eight Dimensions of Book Database
Attribute Hierarchy Levels
Trim Width 4
Trim Length 4
Pages 4
Quantity 4
Stock Width 4
Stock Length 4
Bind Style 4
Press 4
6
Combinatorial Explosion
• Possible views = $\prod_{i=1}^{d} \ell_i$, where d = |dimensions| and $\ell_i$ = |levels| in dimension i
• Book database example
– 2 dimensions, $4^2$ = 16 views
– 4 dimensions, $4^4$ = 256 views
– 6 dimensions, $4^6$ = 4,096 views
– 8 dimensions, $4^8$ = 65,536 views
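The view counts on this slide can be reproduced with a few lines of Python. This is only an illustrative sketch: the function name `count_views` is hypothetical, and the input is simply a list holding the number of hierarchy levels in each dimension.

```python
from math import prod

def count_views(levels_per_dimension):
    """Number of possible views: the product of the level counts of all dimensions."""
    return prod(levels_per_dimension)

# Book database example: every dimension has 4 hierarchy levels.
for d in (2, 4, 6, 8):
    print(d, "dimensions:", count_views([4] * d), "views")
```

Because the count is a product over dimensions, each added dimension multiplies the number of views by its level count.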
7
Recap
• Materialized views quicken query responses
• Disk space limits view materialization
• Update window is a constraint
• Solution: Select strategic views
8
Our OLAP Optimization Approach
[Figure: data-flow diagram. Processes: View Size Estimation, View Selection, View Maintenance, Query Optimization. Labeled data and flows: Fact Table, Initial Data, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Current Views, Incremental Data, Update, Users, Queries, Quick Responses. Legend: Completed Work, Current Work]
9
View Selection: Example of Hypercube Lattice [HRU96]
p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
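One possible way to hold this lattice in code is a dictionary keyed by sets of dimension letters. The sketch below is an assumption about representation, not something taken from [HRU96]; the names `SIZES`, `children`, and `descendants` are hypothetical, and the row counts are the ones shown on the slide.

```python
# Row counts from the slide (M = million rows); keys are sets of dimension letters.
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}

def children(view):
    """Views one level down: drop exactly one dimension from `view`."""
    return [view - {dim} for dim in sorted(view)]

def descendants(view):
    """Every view that can be computed from `view`, including `view` itself."""
    return [w for w in SIZES if w <= view]

print(children(frozenset("cps")))
print(len(descendants(frozenset("ps"))))
```

In this non-hierarchical lattice a view's descendants are exactly the subsets of its dimensions, which is why the subset test `w <= view` is enough.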
10
Example of HRU Algorithm [HRU96]
Iteration 1
Benefits of Possible Materialization Choices
{p, s}: 5.2M x 4 = 20.8M
{c, s}: 0 x 4 = 0
{c, p}: 0 x 4 = 0
{s}: 5.99M x 2 = 11.98M
{p}: 5.8M x 2 = 11.6M
{c}: 5.9M x 2 = 11.8M
{}: 6M - 1
p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
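The Iteration 1 benefits can be recomputed with a short sketch. It assumes the benefit of a view is the number of rows saved, summed over the view and all of its descendants, relative to the cheapest view already materialized (initially only the fact table {c, p, s}). The names `SIZES`, `TOP`, and `benefit` are hypothetical.

```python
# Row counts from the slide (M = million rows).
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}
TOP = frozenset("cps")  # the fact table, always materialized

def benefit(view, materialized):
    """Rows saved by materializing `view`, summed over `view` and its descendants."""
    total = 0
    for w in SIZES:
        if not w <= view:
            continue  # w cannot be answered from `view`
        # Current cost of answering w: cheapest materialized ancestor of w.
        current = min(SIZES[m] for m in materialized if w <= m)
        total += max(current - SIZES[view], 0)
    return total

benefits = {v: benefit(v, {TOP}) for v in SIZES if v != TOP}
best = max(benefits, key=benefits.get)
for view, value in benefits.items():
    print(sorted(view), value)
print("greedy choice:", sorted(best))
```

Under these assumptions the greedy choice in the first iteration is {p, s}, the view with the largest benefit.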
11
Example of HRU
p = Part
s = Supplier
c = Customer
Benefits of Possible Materialization Choices
Iteration 1
{p, s}: 5.2M x 4 = 20.8M
{c, s}: 0 x 4 = 0
{c, p}: 0 x 4 = 0
{s}: 5.99M x 2 = 11.98M
{p}: 5.8M x 2 = 11.6M
{c}: 5.9M x 2 = 11.8M
{}: 6M - 1
Iteration 2
{c, s}: 0 x 4 = 0
{c, p}: 0 x 4 = 0
{s}: 0.79M x 2 = 1.58M
{p}: 0.6M x 2 = 1.2M
{c}: 5.9M x 2 = 11.8M
{}: 0.8M - 1
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
12
Exponentiality in HRU
• $O(kn^2)$ time, where k = |views to select|, n = |possible views|
• $n = 2^d$ in non-hierarchical database, where d = |dimensions|
• HRU algorithm is $O(k\,2^{2d})$ time
• Two sources of exponentiality
– Each possible view is evaluated
– Each view evaluation considers the effect of materialization on every descendant
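The step from the first bound to the third is a direct substitution of the view count. As a short worked derivation using only the symbols defined on this slide, with d = 8 as an illustrative value:

```latex
\[
O(k n^2) \;=\; O\!\left(k\,(2^d)^2\right) \;=\; O\!\left(k\,2^{2d}\right),
\qquad\text{e.g. } d = 8:\; n = 2^8 = 256,\quad n^2 = 2^{16} = 65{,}536 .
\]
```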
13
Polynomial Greedy Algorithm (PGA)
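[Figure: activity diagram of PGA in two phases, Nomination and Selection]
Nomination: Select fact table; Nominate smallest child view; repeat on [continuing path] until [path ended]
Selection: For each candidate, Evaluate benefit while [more candidates]; on [else], Select view greedily; stop on [termination condition met]; on [else], Start new path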
14
Example of PGA [NT02]
p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
15
Example of PGA
p = Part
s = Supplier
c = Customer
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
Nomination
Candidates: {p, s}, {s}, {}
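The first nomination path can be sketched as a walk down the lattice that always steps to the smallest child. This is a minimal sketch under that single assumption; it does not model how later paths treat views that are already selected or nominated, and the names `SIZES` and `nominate_path` are hypothetical.

```python
# Row counts from the slide (M = million rows).
SIZES = {
    frozenset("cps"): 6_000_000,
    frozenset("ps"): 800_000,
    frozenset("cs"): 6_000_000,
    frozenset("cp"): 6_000_000,
    frozenset("s"): 10_000,
    frozenset("p"): 200_000,
    frozenset("c"): 100_000,
    frozenset(): 1,
}

def nominate_path(start, sizes):
    """Starting below `start`, repeatedly nominate the smallest child view."""
    path, view = [], start
    while view:  # the empty view {} has no children, so the path ends there
        kids = [view - {dim} for dim in sorted(view)]
        view = min(kids, key=sizes.get)
        path.append(view)
    return path

candidates = nominate_path(frozenset("cps"), SIZES)
print([sorted(v) for v in candidates])
```

Each step looks at no more than d children and the path has at most d steps, so the walk touches only a small part of the lattice.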
16
Example of PGA
p = Part
s = Supplier
c = Customer
Nomination
Candidates: {p, s}, {s}, {}
Selection (Iteration 1)
{p, s}: 5.2M x 4 = 20.8M
{s}: 5.99M x 2 = 11.98M
{}: 6M - 1
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
17
Example of PGA
p = Part
s = Supplier
c = Customer
Nomination
Candidates: {p, s}, {s}, {}
Selection (Iteration 1)
{p, s}: 5.2M x 4 = 20.8M
{s}: 5.99M x 2 = 11.98M
{}: 6M - 1
Nomination
Candidates: {c, s}, {s}, {c}, {}
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
18
Example of PGA
p = Part
s = Supplier
c = Customer
Nomination
Candidates: {p, s}, {s}, {}
Selection (Iteration 1)
{p, s}: 5.2M x 4 = 20.8M
{s}: 5.99M x 2 = 11.98M
{}: 6M - 1
Nomination
Candidates: {c, s}, {s}, {c}, {}
Selection (Iteration 2)
{c, s}: 0 x 2 = 0
{s}: 0.79M x 2 = 1.58M
{c}: 5.9M x 2 = 11.8M
{}: 6M - 1
{c, p, s} 6M
{p, s} 0.8M {c, s} 6M {c, p} 6M
{s} 0.01M {p} 0.2M {c} 0.1M
{} 1
19
Nomination Complexity
• Maximum swatch width is d.
• Maximum path length is d.
• Finding one path is $O(d^2)$ time
• Our strategy nominates a path each time a view is selected, so nomination takes $O(d^2 k)$ time overall
20
Evaluating Views in PGA
• Polynomial time evaluation requires approximating materialization benefits
• Account for smallest ancestor
• Account for materialized view with largest overlap in descendants
• Complexity of our algorithm is $O(d^2 k^2)$
21
Complexities
d = | dimensions |
g = geometric mean of the number of hierarchical levels per dimension
k = | views selected for materialization |
ℓ = | layers in lattice |
Database Type      HRU                    PGA
Non-Hierarchical   $O(k\,2^{2d})$ time    $O(d^2 k^2)$ time, $O(d^2 k)$ space
Hierarchical       $O(k\,g^{2d})$ time    $O(d k^2 \ell)$ time, $O(d k \ell)$ space
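To get a feel for the gap between the two non-hierarchical bounds, the leading terms can be evaluated directly. The sketch below ignores the constants hidden by the O-notation, so only the growth trend is meaningful; the function names and the choice k = 10 are illustrative assumptions.

```python
def hru_steps(k, d):
    """Leading term k * 2^(2d) of the non-hierarchical HRU time bound."""
    return k * 2 ** (2 * d)

def pga_steps(k, d):
    """Leading term d^2 * k^2 of the non-hierarchical PGA time bound."""
    return d ** 2 * k ** 2

k = 10  # arbitrary number of views to select
for d in (2, 4, 6, 8):
    print(f"d={d}: HRU ~ {hru_steps(k, d):,}  PGA ~ {pga_steps(k, d):,}")
```

The HRU term grows by a factor of 16 for every two added dimensions, while the PGA term grows only quadratically in d.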
22
Near Optimal Selection
[Figure: plot for d = 2, ℓ = 4. Query Costs (rows) versus Materialization Costs (rows); series: Optimal, HRU, Polynomial Greedy]
23
Query Costs at Four Dimensions
[Figure: plot of Query Costs (thousands of rows) versus Materialization Costs (thousands of rows); series: HRU, PGA]
24
Query Costs at Six Dimensions
[Figure: plot of Query Costs (millions of rows) versus Materialization Costs (thousands of rows); series: HRU, PGA]
25
Query Costs at Eight Dimensions
[Figure: plot of Query Costs (millions of rows) versus Materialization Costs (thousands of rows); series: HRU, PGA]
26
Performance at Four Dimensions
[Figure: plot of Processing Time (seconds) versus Materialization Costs (thousands of rows); series: HRU, PGA]
27
Performance at Six Dimensions
[Figure: plot of Processing Time (minutes) versus Materialization Costs (thousands of rows); series: HRU, PGA]
28
Performance at Eight Dimensions
[Figure: plot of Processing Time (minutes) versus Materialization Costs (thousands of rows); series: HRU, PGA]
29
Conclusions
• PGA finds a good set of views for materialization in cases where HRU fails due to its algorithmic complexity
• PGA extends the usefulness of OLAP systems into higher dimensionality
30
Current Work
[Figure: data-flow diagram. Processes: View Size Estimation, View Selection, View Maintenance, Query Optimization. Labeled data and flows: Fact Table, Initial Data, Sample Data, Estimate Request, Estimated View Size, Strategic Views, Current Views, Incremental Data, Update, Users, Queries, Quick Responses. Legend: Completed Work, Current Work]
31
Current Work
• Design alternative data structures for materialized views in OLAP
• Test impact of new data structures on update and query costs
• Integrate our work into an OLAP system
32
References
• [HRU96] V. Harinarayan, A. Rajaraman, J. D. Ullman. Implementing Data Cubes Efficiently. In Proceedings of 1996 ACM-SIGMOD Conf., pp. 205-216, Montreal, Canada.
• [NT01] T. P. Nadeau, T. J. Teorey. A Pareto Model for OLAP View Size Estimation. CASCON 2001, pp. 1-13, Toronto, Canada.
• [NT02] T. P. Nadeau, T. J. Teorey. Achieving Scalability in OLAP Materialized View Selection. Technical Report (extended version). http://www.eecs.umich.edu/~teorey/cv.html