PrefixCube: Prefix-sharing Condensed Data Cube
Jianlin Feng, Qiong Fang, Hulin Ding
Huazhong Univ. of Sci. & Tech.
fengjl@mail.hust.edu.cn
Nov 12, 2004
DOLAP 2004 2 Jianlin Feng
Outline
Introduction
Related Work
ODM: Ordered Datacube Model
BST-Condensed Cube
Prefix-sharing Condensed Cube
Comparisons
Conclusions
Introduction
Data Cube (ICDE'96)
– N-dimensional cube(A1, A2, …, AN)
– 2^N cuboids, i.e. GROUP-BYs
The Huge Size Problem
– When the base relation R is sparse, the size of a cuboid can be close to the size of R.
– Even the I/O cost of storing the cube result tuples becomes dominant.
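As a small illustration of the 2^N count, the sketch below enumerates every cuboid (GROUP-BY) of a three-dimensional cube. The dimension names A, B, C and the helper name `cuboids` are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

def cuboids(dimensions):
    """Return every GROUP-BY (cuboid) over the given dimensions.

    The empty tuple stands for the grand-total cuboid (ALL); the full
    tuple is the base cuboid. Hypothetical helper, for illustration only.
    """
    return [combo
            for k in range(len(dimensions) + 1)
            for combo in combinations(dimensions, k)]

all_cuboids = cuboids(("A", "B", "C"))
print(len(all_cuboids))  # 2**3 cuboids
print(all_cuboids)
```

With N dimensions the list has 2^N entries, which is why the number of cube tuples grows so quickly on sparse data.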
Related Work
Condensed Cube (ICDE'02)
Dwarf (SIGMOD'02)
Quotient Cube (VLDB'02)
QC-Tree (SIGMOD'03)
Basic idea: remove redundancies existing among cube tuples.
– prefix redundancy
– suffix redundancy
Prefix redundancy
Given an example cube(A, B, C)
– Each value of dimension A occurs in 4 cuboids: cuboid(A), (AB), (AC) and (ABC)
– Possibly many times in each cuboid except cuboid(A)
Inter-cuboid and intra-cuboid prefix redundancy
Suffix Redundancy
Occurs when cube tuples belonging to different cuboids are actually aggregated from the same group of base relation tuples.
An extreme case
– Let the source relation R have only one single tuple r(a1, a2, …, an, m);
– the 2^n cube tuples can be condensed into one physical tuple: (a1, a2, …, an, V), where V = aggr(r);
– together with some information indicating that it is a representative tuple.
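The extreme case can be checked with a naive full-cube computation. The sketch below assumes SUM as the aggregate and uses a hypothetical helper `full_cube`; the values a1, a2, a3 and the measure 7 are made up for illustration.

```python
from itertools import product

def full_cube(rows, n_dims):
    """Naive full cube with SUM (an assumed aggregate).

    Maps every cube tuple, with '*' standing for ALL, to its aggregate.
    """
    cube = {}
    for row in rows:
        dims, measure = row[:n_dims], row[n_dims]
        # Each dimension is either kept or rolled up to '*'.
        for mask in product((False, True), repeat=n_dims):
            key = tuple(v if keep else "*" for v, keep in zip(dims, mask))
            cube[key] = cube.get(key, 0) + measure
    return cube

# A source relation with one single tuple r(a1, a2, a3, m = 7).
single = full_cube([("a1", "a2", "a3", 7)], 3)
print(len(single))            # number of cube tuples
print(set(single.values()))   # distinct aggregate values
```

All 2^3 cube tuples carry the same aggregate, so one physical tuple plus a marker is enough to restore them.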
Thinking…
Condensed cube
– It condenses the cube tuples that are aggregated from one single base tuple into one physical tuple in order to reduce the cube's size.
Dwarf
– Besides suffix coalescing, i.e. multi-base-tuple condensing, it also realizes full prefix-sharing so as to achieve a highly effective reduction of cube size.
Motivation
How can we further reduce the condensed cube's size while taking into account the characteristics of the queries we intend to answer, namely range queries?
Augment BST condensing with the removal of intra-cuboid prefix redundancy!
Ordered Datacube Model
Value ALL (or *) is encoded as 0.
A dimension D and its cardinality C
– each dimension value is mapped one-to-one to an integer between 1 and C inclusive.
N dimensions form an N-dimensional space.
The origin O(0, 0, …, 0) represents the grand total.
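A minimal sketch of this encoding follows. The city names, the sorted assignment order and the helper names `build_encoder` and `encode` are assumptions for illustration; the slides only require a one-to-one mapping onto 1..C with ALL encoded as 0.

```python
ALL = 0  # the slides encode ALL (or *) as 0

def build_encoder(values):
    """Map each distinct value of one dimension to an integer in 1..C.

    Assigning codes in sorted order is an assumption; any one-to-one
    mapping would satisfy the model described above.
    """
    return {v: code for code, v in enumerate(sorted(set(values)), start=1)}

def encode(value, codes):
    """Encode one dimension value, treating '*' as ALL."""
    return ALL if value == "*" else codes[value]

city_codes = build_encoder(["Wuhan", "Beijing", "Wuhan", "Shanghai"])
print(city_codes)
print(encode("*", city_codes), encode("Wuhan", city_codes))

# The grand total of a 3-dimensional cube sits at the origin.
origin = (ALL, ALL, ALL)
```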
Ordered Datacube Model
Under ODM, a range query against a data cube can actually be reduced to a sub-query against only one particular cuboid in the cube or a union of such sub-queries.
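One possible reading of this reduction is sketched below; the slides do not give the procedure, so the splitting rule is an assumption. A query is taken to be one inclusive code range per dimension. A range that starts at 0 is split into the ALL point and the remaining value range, and each combination of pieces then touches exactly one cuboid. The function name `split_range_query` is hypothetical.

```python
from itertools import product

def split_range_query(ranges, dims):
    """Split a range query over ODM codes into single-cuboid sub-queries.

    ranges: one inclusive (lo, hi) pair per dimension, where 0 means ALL.
    Assumed rule: a range starting at 0 is split into (0, 0) and (1, hi).
    """
    per_dim = []
    for lo, hi in ranges:
        if lo == 0:
            pieces = [(0, 0)]
            if hi >= 1:
                pieces.append((1, hi))
        else:
            pieces = [(lo, hi)]
        per_dim.append(pieces)

    subqueries = []
    for combo in product(*per_dim):
        # The cuboid is named by the dimensions that are not rolled up.
        cuboid = "".join(d for d, (lo, _) in zip(dims, combo) if lo != 0) or "ALL"
        subqueries.append((cuboid, combo))
    return subqueries

subs = split_range_query([(1, 5), (0, 0), (0, 3)], ("A", "B", "C"))
for cuboid, ranges in subs:
    print(cuboid, ranges)
```

In this example the query becomes the union of one sub-query on cuboid(A) and one on cuboid(AC).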
BST-Condensed Cube
Base Single Tuple (BST)
– t1 is a BST on SD {A} and {B}
– t2 is a BST on SD {B}
A unique minimal BST-Condensed Cube (MinCube) is obtained by taking full advantage of each BST with all of its SDs.

      A   B   C   M
t1    8   1   1   100
t2    1   8   1   50
t3    1   2   3   60
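The two BST statements can be reproduced on the example relation. The sketch assumes that a tuple is a BST on a single dimension when no other base tuple shares its value on that dimension; the helper name `single_dim_bsts` is hypothetical.

```python
DIMS = ("A", "B", "C")
BASE = {
    "t1": (8, 1, 1, 100),
    "t2": (1, 8, 1, 50),
    "t3": (1, 2, 3, 60),
}

def single_dim_bsts(base, dims):
    """For each base tuple, list the single dimensions on which it is a BST.

    Assumed definition: the tuple is alone in its partition when the
    relation is partitioned on that dimension.
    """
    result = {}
    for name, row in base.items():
        result[name] = [
            d for i, d in enumerate(dims)
            if sum(1 for other in base.values() if other[i] == row[i]) == 1
        ]
    return result

bsts = single_dim_bsts(BASE, DIMS)
print(bsts["t1"])  # dimensions on which t1 is a BST
print(bsts["t2"])  # dimensions on which t2 is a BST
```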
BU-BST Condensed Cube
BottomUpBST algorithm (ICDE'02)
Each BST corresponds to only one SD.
Compared with MinCube, it is easier to compute and easier to restore normal cube tuples from the condensed cube.
Note: BST condensing is a special kind of prefix-sharing!

Cube tuples sharing the prefix A = 8:
A   B   C   M
8   *   *   10
8   1   *   10
8   *   1   10
8   1   1   10

are represented by one BST:
       A   B   C   M    SD
ct7    8   1   1   10   {A}

A group of cube tuples sharing a prefix is represented by a BST!
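Restoring the normal cube tuples from the BST in this example can be sketched as follows. The helper `expand_bst` is hypothetical and only covers the case shown on the slide, where the SD is the leading dimension {A}: dimensions in the SD stay fixed and every other dimension is either kept or replaced by '*'.

```python
from itertools import product

DIMS = ("A", "B", "C")

def expand_bst(values, measure, sd, dims=DIMS):
    """Cube tuples represented by one BST (illustrative sketch).

    Dimensions in sd keep their value; each remaining dimension
    contributes two choices, '*' or its base value.
    """
    choices = [(v,) if d in sd else ("*", v) for d, v in zip(dims, values)]
    return [(key, measure) for key in product(*choices)]

restored = expand_bst((8, 1, 1), 10, {"A"})
for key, m in restored:
    print(key, m)
```

The four restored tuples are exactly the group listed on the slide, all with the same measure.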
A BU-BST Condensed Cube Example
Base relation:
      A   B   C   M
t1    8   1   1   100
t2    1   8   1   50
t3    1   2   3   60

BU-BST condensed cube:
        A   B   C   M     SID   CID
ct1     *   *   *   210         ALL
ct2     1   *   *   110         A
ct3     1   2   3   60    AB
ct4     1   8   1   50    AB
ct5     1   *   1   50          AC
ct6     1   *   3   60          AC
ct7     8   1   1   100   A
ct8     *   1   1   100   B
ct9     *   2   3   60    B
ct10    *   8   1   50    B
ct11    *   *   1   150         C
ct12    *   *   3   60          C

Note:
– Intra-cuboid prefix redundancy: ct3 and ct4
– Inter-cuboid prefix redundancy: ct2, ct3 and ct5
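The twelve tuples can be reproduced with a BUC-style recursion. This is a sketch under assumptions, not the authors' BottomUpBST code: SUM is assumed as the aggregate, a partition holding a single base tuple is emitted once as a BST and pruned, and a single-tuple partition on the last dimension is emitted as a normal cube tuple because it stands for only one cube tuple. Tagging each tuple with "SID" or "CID" is one reading of the table above.

```python
from collections import defaultdict

DIMS = ("A", "B", "C")
BASE = [(8, 1, 1, 100), (1, 8, 1, 50), (1, 2, 3, 60)]  # t1, t2, t3

def bu_bst(rows, fixed, start, out):
    """BUC-style bottom-up pass with BST pruning (illustrative sketch).

    fixed maps a dimension index to the value it is grouped on.
    """
    n = len(DIMS)
    # Emit the normal cube tuple for the current group.
    key = tuple(fixed.get(i, "*") for i in range(n))
    cid = "".join(DIMS[i] for i in sorted(fixed)) or "ALL"
    out.append((key, sum(r[n] for r in rows), "CID", cid))

    for d in range(start, n):
        parts = defaultdict(list)
        for r in rows:
            parts[r[d]].append(r)
        for value in sorted(parts):
            group = parts[value]
            new_fixed = {**fixed, d: value}
            if len(group) == 1 and d < n - 1:
                # Single base tuple: store it once as a BST and stop here.
                # Dimensions before d that are not grouped on stay '*'.
                r = group[0]
                bst = tuple(r[i] if (i in new_fixed or i > d) else "*"
                            for i in range(n))
                sid = "".join(DIMS[i] for i in sorted(new_fixed))
                out.append((bst, r[n], "SID", sid))
            else:
                bu_bst(group, new_fixed, d + 1, out)

condensed = []
bu_bst(BASE, {}, 0, condensed)
for row in condensed:
    print(row)
```

On this input the sketch emits the tuples in the same order as ct1 to ct12.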
Prefix-sharing Condensed Cube - PrefixCube
BST Condensing + Intra-cuboid prefix-sharing
Prefix-sharing
PrefixCube
A PrefixCube Example
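[Figure: PrefixCube for the example relation. One prefix-sharing tree per group of condensed tuples, with roots labelled SID = A, SID = AB, SID = B and CID = ALL, CID = A, CID = AC; the roots are divided into V-Roots and N-Roots, and the leaves hold the measures (e.g. 210, 110, 150, 100, 60, 50).]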
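Intra-cuboid prefix-sharing can be sketched by grouping the twelve condensed tuples by their SID or CID and storing each group as a trie. This is an illustrative layout, not the paper's physical structure: nested dicts stand in for tree nodes, '*' values are not stored, and the group labels follow one reading of the BU-BST example table.

```python
CONDENSED = [
    (("*", "*", "*"), 210, "CID:ALL"),
    ((1, "*", "*"), 110, "CID:A"),
    ((1, 2, 3), 60, "SID:AB"),
    ((1, 8, 1), 50, "SID:AB"),
    ((1, "*", 1), 50, "CID:AC"),
    ((1, "*", 3), 60, "CID:AC"),
    ((8, 1, 1), 100, "SID:A"),
    (("*", 1, 1), 100, "SID:B"),
    (("*", 2, 3), 60, "SID:B"),
    (("*", 8, 1), 50, "SID:B"),
    (("*", "*", 1), 150, "CID:C"),
    (("*", "*", 3), 60, "CID:C"),
]

def build_tries(tuples):
    """Build one trie per SID/CID group; a shared prefix is stored once."""
    tries = {}
    for key, measure, group in tuples:
        node = tries.setdefault(group, {})
        for v in key:
            if v != "*":          # ALL values are not stored
                node = node.setdefault(v, {})
        node["M"] = measure       # the leaf carries the aggregate
    return tries

def count_values(node):
    """Count dimension values stored in a trie (the 'M' leaves excluded)."""
    return sum(1 + count_values(child)
               for k, child in node.items() if k != "M")

tries = build_tries(CONDENSED)
flat = sum(1 for key, _, _ in CONDENSED for v in key if v != "*")
shared = sum(count_values(t) for t in tries.values())
print(flat, shared)
```

The saving on this tiny example comes from the groups SID = AB and CID = AC, where the prefix A = 1 is stored once instead of twice.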
Corresponding Dwarf
[Figure: the Dwarf structure for the same example relation, with nodes (node1) to (node4) arranged in levels for the A Dimension, B Dimension and C Dimension; the leaf cells hold the measures.]
PrefixCube vs. Dwarf
                              PrefixCube        Dwarf
Prefix-sharing                Intra-cuboid      Inter- and Intra-cuboid
Suffix Coalescing             BST Condensing    Multi-tuple Condensing
Compression Ratio             Lower             Higher
Saving extra value ALL?       No                Yes
Tuple clustered by cuboid?    Yes               No

PrefixCube does not blindly aim at the highest compression ratio; it is intended to strike a good compromise among cube size reduction ratio, restoring and updating costs, and query characteristics!
Effectiveness of Size Reduction
Datasets
– synthetic datasets with uniform distribution
– # of tuples: 1,000,000
[Figure: Size Ratio (0% to 100%) vs. Number of Dimensions (2 to 9) for BU-BST and PrefixCube; panel (a) Cardinality = 100, panel (b) Cardinality = 1000]
Effectiveness of Size Reduction
PrefixBUC
– Full Cube (computed by BUC)
– Prefix-sharing
[Figure: Size Ratio (0% to 100%) vs. Number of Dimensions (2 to 9) for C = 100 and C = 1000]
Impact of Data Density
Datasets
– Uniform distribution
– # of dimensions: 6
– Cardinality of dimensions: 100
– # of tuples: from 1,000 to 1,000,000
[Figure: Size Ratio (0% to 100%) vs. Number of Tuples (1.E+03 to 1.E+06) for BU-BST, PrefixCube and PrefixBUC]
Impact of Data Skewness
Datasets
– Zipf distribution
– # of tuples: 1,000,000
– Cardinality of dimensions: from 1,000 down to 500 in steps of 100
– Zipf factor: from 0 to 0.8 in steps of 0.2
[Figure: Size Ratio (0% to 100%) vs. Zipf Factors (0 to 0.8) for BU-BST, PrefixCube and PrefixBUC]
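The slides do not say how the skewed data was generated. As a rough sketch under that caveat, one common way to draw Zipf-distributed dimension values gives the value of rank k a weight of 1 / k^theta, so that a Zipf factor of 0 degenerates to the uniform case. The function name `zipf_column`, the seed and the small sizes are assumptions for illustration.

```python
import random

def zipf_column(cardinality, theta, n, rng):
    """Draw n values in 1..cardinality with weight 1 / rank**theta.

    theta = 0 gives equal weights (uniform); larger theta is more skewed.
    This generator is an assumption, not the authors' data generator.
    """
    weights = [1.0 / (rank ** theta) for rank in range(1, cardinality + 1)]
    return rng.choices(range(1, cardinality + 1), weights=weights, k=n)

rng = random.Random(2004)  # fixed seed, chosen arbitrarily
column = zipf_column(cardinality=1000, theta=0.8, n=10_000, rng=rng)
print(min(column), max(column), len(column))
```

A full dataset would repeat this per dimension, using the cardinalities and Zipf factors listed above.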
Real-world Dataset
Datasets
– Weather Datasets
– # of tuples: 1,015,367
[Figure: Time (sec., 0 to 700) vs. Number of Dimensions (2 to 9) for BUC, BU-BST and PrefixCube]
[Figure: Size Ratio (0% to 100%) vs. Number of Dimensions (2 to 9) for BU-BST, PrefixCube and PrefixBUC]
Conclusion
A new cube structure, PrefixCube, was proposed by augmenting BU-BST condensing with intra-cuboid prefix-sharing.
– It can greatly reduce the data cube's size compared with the BU-BST condensed cube.
– It can also reduce the impact of data skew on BU-BST condensing.
– It achieves a fairly stable size reduction on both dense and sparse datasets.
The End
Thank you!
Any questions?