Compressing Relations And Indexes
Jonathan Goldstein, Raghu Ramakrishnan, Uri Shaft
Department of Computer Sciences, University of Wisconsin-Madison
June 18, 1997
Agenda
• Introduction
• Compressing A Relation
• Compression Applied to Rectangle Based Indexes
• Performance Evaluation
• Questions and Remarks
Introduction
• Page level Compression
• Performance Study
• Application to B-trees and R-trees
• Multidimensional bulk loading algorithm
Compressing A Relation
• Frames Of Reference
• Non numeric attributes
• File level compression
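As a rough illustration of the frames-of-reference idea, the sketch below assumes integer attributes and represents a page as a list of tuples. It records the per-attribute minimum and maximum over the page (the frame) and stores each value as an offset from the minimum, so each attribute needs only as many bits as its range on that page requires. The names `compress_page` and `decompress_tuple` and the page representation are hypothetical, chosen for illustration and not taken from the slides.

```python
from typing import List, Sequence, Tuple

def compress_page(page: Sequence[Tuple[int, ...]]):
    """Return (mins, bit_widths, offsets) for one page of integer tuples."""
    if not page:
        raise ValueError("page must contain at least one tuple")
    dims = len(page[0])
    # The frame of reference: smallest and largest value per attribute.
    mins = [min(t[d] for t in page) for d in range(dims)]
    maxs = [max(t[d] for t in page) for d in range(dims)]
    # Bits needed to encode any offset in 0..(max - min) for each attribute.
    widths = [(hi - lo).bit_length() for lo, hi in zip(mins, maxs)]
    # Each tuple is stored as offsets from the per-attribute minimum.
    offsets = [tuple(t[d] - mins[d] for d in range(dims)) for t in page]
    return mins, widths, offsets

def decompress_tuple(mins: List[int], offsets: List[Tuple[int, ...]], i: int):
    """Rebuild tuple i alone, without touching the other tuples on the page."""
    return tuple(off + lo for off, lo in zip(offsets[i], mins))
```

Because every tuple is encoded against the same page-level frame, a single tuple can be decoded on its own, which is the property the tuple-level decompression item in the evaluation refers to.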
Lossy Compression
• Point approximation in lossy compression
Compressing an indexing structure
• Compressing a B-tree
• Compressing a rectangle based indexing structure
• Compression oriented Bulk Loading
Bulk-Loading Algorithm
Input. A set of points in some n-dimensional space.
Output. A partition of the input into subsets.
Requirements. The partition should, as much as possible, place points that are close to each other in the same group.
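The input/output contract above can be made concrete with a small sketch. It is not the algorithm from the talk: it is a generic k-d style median split, used here only to show one simple way of producing a partition that keeps nearby points together. The function name `partition` and the `capacity` parameter (maximum points per group) are assumptions made for this example.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def partition(points: Sequence[Point], capacity: int) -> List[List[Point]]:
    """Split points into groups of at most `capacity` nearby points."""
    if capacity < 1:
        raise ValueError("capacity must be at least 1")
    pts = list(points)
    if len(pts) <= capacity:
        return [pts] if pts else []
    dims = len(pts[0])

    def spread(d: int) -> float:
        # Extent of the data along dimension d.
        return max(p[d] for p in pts) - min(p[d] for p in pts)

    # Split along the dimension where the points are most spread out.
    d = max(range(dims), key=spread)
    pts.sort(key=lambda p: p[d])
    mid = len(pts) // 2
    # Recurse on the two halves; each half is strictly smaller than pts.
    return partition(pts[:mid], capacity) + partition(pts[mid:], capacity)
```

For example, `partition([(0, 0), (1, 0), (10, 10), (11, 10)], 2)` puts the two points near the origin in one group and the two distant points in another.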
GB-Pack: compression oriented bulk loading
Qualities:
• Trading off some tree quality for increased compression.
• Number of entries per page is data-dependent.
• Cutting a dimension at a value boundary in the data.
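One possible reading of the last point is that a cut along a dimension is placed between two distinct data values, so that equal values never end up on both sides of the cut. The sketch below illustrates that reading only and is not claimed to be how GB-Pack itself chooses cuts. The function name `cut_at_value_boundary` and the `target` parameter (the index where a cut would ideally fall) are hypothetical.

```python
from typing import Optional, Sequence

def cut_at_value_boundary(values: Sequence[float], target: int) -> Optional[int]:
    """Return the split index nearest to `target` that separates distinct values.

    `values` must be sorted along the chosen dimension. A returned index i
    means the cut falls between values[i - 1] and values[i]. Returns None
    when all values are equal, since no boundary exists.
    """
    # Every position where the value changes is a legal cut.
    boundaries = [i for i in range(1, len(values)) if values[i - 1] != values[i]]
    if not boundaries:
        return None
    # Prefer the boundary closest to the target; break ties toward the left.
    return min(boundaries, key=lambda i: (abs(i - target), i))
```

With `values = [1, 1, 2, 2, 2, 3]` and `target = 3`, the cut moves from index 3 (inside the run of 2s) to index 2, the boundary between the 1s and the 2s. Since cuts follow the data in this way, the resulting groups vary in size, which fits the observation that the number of entries per page is data-dependent.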
Performance Evaluation
• Relational Compression Experiments.
• CPU vs. I/O Costs.
• Comparison With Techniques in commercial systems.
• Importance of Tuple-Level Decompression.
• R-tree Compression Experiments.
Synthetic Data Sets
• Size: The number of tuples in the relation.
• Dimensionality: The number of attributes of the relation.
• Range: The range of values for the attributes.
• Distribution: uniform (worst case) / exponential.
• Partition Strategy.
• Page size.
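The first four parameters above are enough to sketch a data generator. The version below is a minimal illustration, not the generator used in the experiments: the function name `synthetic_relation`, the integer value domain `0 .. value_range - 1`, and the choice of mean for the exponential case are all assumptions made for this example.

```python
import random
from typing import List, Tuple

def synthetic_relation(size: int, dims: int, value_range: int,
                       distribution: str = "uniform",
                       seed: int = 0) -> List[Tuple[int, ...]]:
    """Generate `size` tuples with `dims` integer attributes in [0, value_range)."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable

    def draw() -> int:
        if distribution == "uniform":
            return rng.randrange(value_range)
        if distribution == "exponential":
            # Assumed mean of value_range / 10; values are clamped to the range.
            value = int(rng.expovariate(10.0 / value_range))
            return min(value, value_range - 1)
        raise ValueError(f"unknown distribution: {distribution}")

    return [tuple(draw() for _ in range(dims)) for _ in range(size)]
```

A call such as `synthetic_relation(1000, 4, 256, "exponential")` yields 1000 four-attribute tuples whose values cluster near zero, whereas the uniform setting spreads values evenly over the range.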
Sales Data Set
[Figure: Sales data set. Compression achieved versus dimensionality.]
R-tree Compression Experiments
Testing the quality of R-trees on the Sales Data Set.