Wisconsin benchmark June 2001 Prof. Sang Ho Lee

Overview (1)
References
  D. Bitton, D. J. DeWitt and C. Turbyfill, Benchmarking Database Systems: A Systematic Approach, Proc. of the Ninth Int. Conference on Very Large Data Bases: 8-19, 1983.
  D. Bitton and C. Turbyfill, A Retrospective on the Wisconsin Benchmark, In: Readings in Database Systems, M. Stonebraker ed., Morgan Kaufmann, 1988.
  D. DeWitt, The Wisconsin Benchmark: Past, Present, and Future, In: The Benchmark Handbook: 269-316, J. Gray ed., Morgan Kaufmann, 1993.
Initially developed in 1983 to measure the DIRECT database machine
The first “real” benchmark for relational databases
Timeliness, simplicity and portability made it widely used!

Overview (2)
Synthetic database and controlled workload
32 queries in total
Metric: elapsed time
Focuses on access methods and query optimization in relational databases
Limitations
  A single-user benchmark
  No test of concurrency control and recovery
  Tests features of the query optimizer only
No longer widely used to evaluate single-processor relational systems, but still commonly used to evaluate database systems on parallel processors (Gamma, Tandem, Volcano, etc.)

Original test databases
Synthetic databases: approx. 5 MB
Three relations: identical attributes but different cardinalities
  Onektup (1,000 tuples)
  Tenktup1 (10,000 tuples)
  Tenktup2 (10,000 tuples)
13 integer attributes + 3 52-byte string attributes
  One tuple = 182 bytes
Strings
  3 distinguishing characters, in positions 1, 27 and 52
  The same character is padded in all other positions
  String4 has only 4 unique values
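
The tuple size and the string layout above can be checked with a small sketch. It assumes 2-byte integers (as the retrospective slide states) and uses "x" as the padding character; the function names are hypothetical and not part of the benchmark.

```python
INT_BYTES = 2        # assumption: the original relations use 2-byte integers
STRING_BYTES = 52    # each string attribute is 52 bytes long
NUM_INTS = 13
NUM_STRINGS = 3


def tuple_size() -> int:
    """Bytes per tuple: 13 integers plus 3 fixed-length strings."""
    return NUM_INTS * INT_BYTES + NUM_STRINGS * STRING_BYTES


def make_string(first: str, middle: str, last: str, pad: str = "x") -> str:
    """Build a 52-character string whose only distinguishing characters
    sit at positions 1, 27 and 52 (1-based); every other position is padding."""
    chars = [pad] * STRING_BYTES
    chars[0] = first      # position 1
    chars[26] = middle    # position 27
    chars[51] = last      # position 52
    return "".join(chars)
```

Here `tuple_size()` evaluates 13 * 2 + 3 * 52 = 182, which matches the tuple size on the slide.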

Original tenktup relation
Name        Type   Range       Order      Comment
unique1     int    0 - 9999    random     candidate key
unique2     int    0 - 9999    random     declared key
two         int    0 - 1       rotating   0,1,0,1,...
four        int    0 - 3       rotating   0,1,2,3,0,1,...
ten         int    0 - 9       rotating   0,1,…,9,0,...
twenty      int    0 - 19      rotating   0,1,…,19,0,...
hundred     int    0 - 99      rotating   0,1,…,99,0,...
thousand    int    0 - 999     random
twothous    int    0 - 1999    random
fivethous   int    0 - 4999    random
tenthous    int    0 - 9999    random     candidate key
odd100      int    50 values   rotating   1,3,5,…,99,1,...
even100     int    50 values   rotating   2,4,6,…,100,2,...
stringu1    char               random     candidate key
stringu2    char               rotating   candidate key
string4     char               rotating

Indexes
Three indexes
  Clustered unique index (unique2)
  Non-clustered unique index (unique1)
  Non-clustered non-unique index (hundred)

Retrospective on test database
Why 2-byte integers only?
Why 52-byte fixed-length strings?
  An ad-hoc survey shows that fixed- or variable-length strings of 20-30 characters are more common.
  Most strings are differentiated by the first few characters in the string.
All values are uniformly distributed, which is unrealistic
Is a 5 MB database too small?
The database is hard to scale
  A 2-byte integer restricts the maximum size of the database to 32,768 tuples

Scaling the benchmark relations
Name             Range                 Order        Comment
unique1          0 - (maxtuples - 1)   random       unique, random order
unique2          0 - (maxtuples - 1)   sequential   unique, sequential
two              0 - 1                 random       (unique1 mod 2)
four             0 - 3                 random       (unique1 mod 4)
ten              0 - 9                 random       (unique1 mod 10)
twenty           0 - 19                random       (unique1 mod 20)
onePercent       0 - 99                random       (unique1 mod 100)
tenPercent       0 - 9                 random       (unique1 mod 10)
twentyPercent    0 - 4                 random       (unique1 mod 5)
fiftyPercent     0 - 1                 random       (unique1 mod 2)
unique3          0 - (maxtuples - 1)   random       unique1
evenOnePercent   0,2,4,…,198           random       (onePercent * 2)
oddOnePercent    1,3,5,…,199           random       (onePercent * 2) + 1
stringu1                               random       candidate key
stringu2                               random       candidate key
string4                                cyclic
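
The integer attributes of the scaled relation follow directly from unique1, so they can be generated for any value of maxtuples. The following is a minimal sketch of that derivation, not the benchmark's official generator: the function name `generate_rows`, the `seed` parameter and the use of a shuffled list for the random order are assumptions, and the three string attributes are left out because the table above does not define their contents.

```python
import random


def generate_rows(maxtuples, seed=0):
    """Generate the integer attributes of a scaled benchmark relation."""
    rng = random.Random(seed)
    unique1_values = list(range(maxtuples))
    rng.shuffle(unique1_values)          # unique1: unique, random order

    rows = []
    # unique2 is simply the sequential position of the tuple.
    for unique2, unique1 in enumerate(unique1_values):
        one_percent = unique1 % 100
        rows.append({
            "unique1": unique1,
            "unique2": unique2,
            "two": unique1 % 2,
            "four": unique1 % 4,
            "ten": unique1 % 10,
            "twenty": unique1 % 20,
            "onePercent": one_percent,
            "tenPercent": unique1 % 10,
            "twentyPercent": unique1 % 5,
            "fiftyPercent": unique1 % 2,
            "unique3": unique1,
            "evenOnePercent": one_percent * 2,
            "oddOnePercent": one_percent * 2 + 1,
        })
    return rows
```

Because every derived attribute is a function of unique1, the relation can be scaled simply by changing `maxtuples`.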

Test queries: Strategies
To avoid confounding factors, default execution parameters are set
  1,000 tuples in the result
  All 16 attributes in the result
  Result output mode: into a relation
  Integer attributes in selection predicates
  One-relation queries: tenktup
Three basic performance factors are varied
  Storage structure of the relation
  Indexing: no index, primary index (unique2), secondary index (unique1)
  Selectivity
In retrospect
  1,000 tuples in the result are too many
  Real queries do not always return all attributes in the result tuples
  Composite indexes should have been included

Test queries: An overview
32 queries in total
Relational instruction set
  Selections with different selectivity factors
  Projections with different percentages of duplicate attributes
  2-way and 3-way joins
  Simple aggregates and aggregate functions
  Updates: insert, delete, update

Experimental environments (1)
Hardware
  CPU: UltraSPARC processor, 233 MHz, 1 EA.
  Main memory: 128 MB
  HDD: 4 GB internal HDD, 1 EA.; 36 GB external HDD, 2 EA.
OS: SunOS 5.7
DBMS: A
Experimental repetition frequency
  Run each query 5 times
  Read garbage data after each query execution, to flush the buffers
Measurement time
  The arithmetic mean of the 5 query elapsed times
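
The measurement procedure above can be sketched as a small timing harness. This is only an illustration: `run_query` and `flush_buffers` are hypothetical callables standing in for the real query execution and the garbage-data read, and the slides do not say how elapsed time was actually captured.

```python
import time
from statistics import mean


def measure(run_query, flush_buffers, repetitions=5):
    """Return the arithmetic mean of the elapsed times of `repetitions` runs."""
    elapsed = []
    for _ in range(repetitions):
        start = time.perf_counter()
        run_query()
        elapsed.append(time.perf_counter() - start)
        # Flush after each execution, outside the timed region, so that the
        # next run does not benefit from data left in the buffers.
        flush_buffers()
    return mean(elapsed)
```

Keeping the flush outside the timed region means the reported mean covers query execution only.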

Experimental environments (2)
Test database scaling: 20 times more records than the original database
  Data tablespace: 2 GB
  Index tablespace: 1 GB
  Rollback space: 500 MB
  Temporary tablespace: 300 MB
Query optimization method
  CHOOSE: cost-based optimization is the default; if no statistics are available, rule-based optimization is used
  ANALYZE TABLE

DBMS parameters
data_block_size: 2048 bytes
db_block_buffers: 20000 blocks (40 MB)
shared_pool_size: 10240000 bytes (10 MB)
log_buffer: 20480000 bytes (20 MB)
log_checkpoint_interval: 40000 OS blocks (20 MB)
  SunOS block size: 512 bytes/block
log_checkpoint_timeout: 0
Other parameters: default values used

Selections (1)
A selection operation depends on a number of different factors
  Hardware speed, architecture and quality of software
  Storage organization of relation and index
  Selectivity factor
  Query output mode
8 queries in total
  6 queries
    Into a temporary table: each combination of selectivity (1%, 10%) and indexing (no index, primary index, secondary index)
  2 queries
    Output to screen: 1% selection and one tuple returned

Selections (2)
Query 1 (no index) – 1% selection
  INSERT INTO TEMP
  SELECT * FROM BASERELATION1
  WHERE unique2D BETWEEN :lower AND :upper
  lower: random value, upper: lower + (# of tuples * selectivity)
Query 3 (clustered index) – 1% selection
  INSERT INTO TMP
  SELECT * FROM BASERELATION1
  WHERE unique2D BETWEEN :lower AND :upper
  lower: random value, upper: lower + (# of tuples * selectivity)
Query 5 (non-clustered index) – 1% selection
  INSERT INTO TMP
  SELECT * FROM BASERELATION1
  WHERE unique1D BETWEEN :lower AND :upper
  lower: random value, upper: lower + (# of tuples * selectivity)
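
The way the range bounds control selectivity can be tried out with a small sketch. It uses Python's built-in sqlite3 module rather than the DBMS measured in these slides, a two-column stand-in for BASERELATION1, and hypothetical function names, so it illustrates the query shape only and says nothing about the measured timings.

```python
import random
import sqlite3


def selection_bounds(num_tuples, selectivity, rng):
    """Pick a random range that covers num_tuples * selectivity key values."""
    count = round(num_tuples * selectivity)
    lower = rng.randrange(0, num_tuples - count + 1)
    upper = lower + count - 1    # BETWEEN includes both end points
    return lower, upper


def run_selection(num_tuples=10_000, selectivity=0.01, seed=0):
    """Run a range selection into a temporary relation; return the row count."""
    rng = random.Random(seed)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baserelation1 (unique1 INTEGER, unique2 INTEGER)")
    unique1 = list(range(num_tuples))
    rng.shuffle(unique1)                      # unique1 random, unique2 sequential
    conn.executemany("INSERT INTO baserelation1 VALUES (?, ?)",
                     zip(unique1, range(num_tuples)))
    conn.execute("CREATE TABLE tmp (unique1 INTEGER, unique2 INTEGER)")

    lower, upper = selection_bounds(num_tuples, selectivity, rng)
    conn.execute(
        "INSERT INTO tmp SELECT * FROM baserelation1 "
        "WHERE unique2 BETWEEN :lower AND :upper",
        {"lower": lower, "upper": upper},
    )
    (selected,) = conn.execute("SELECT COUNT(*) FROM tmp").fetchone()
    conn.close()
    return selected
```

Since BETWEEN is inclusive at both ends, the sketch subtracts one from the upper bound so that exactly (# of tuples * selectivity) tuples qualify; taking upper as lower + (# of tuples * selectivity) would select one tuple more.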

Selections (3)
Query 1 – 1% selection with no index
  Response time: 4.634 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    TABLE ACCESS (FULL) OF 'BASERELATION1'
Query 3 – 1% selection with clustered index
  Response time: 0.640 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    2  1      INDEX (RANGE SCAN) OF 'INDEX_UNIQUE2A' (UNIQUE)
Query 5 – 1% selection with non-clustered index
  Response time: 3.763 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    2  1      INDEX (RANGE SCAN) OF 'INDEX_UNIQUE1A' (UNIQUE)
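
Selection (4)
Index usefulness
  Both indexed versions were faster than the full table scan (4.634 sec)
Clustered index vs. non-clustered index
  0.640 sec with the clustered index vs. 3.763 sec with the non-clustered index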

Joins (1)
To show the effect of three different factors
  Complexity of a query
  Performance of join algorithms
  Effectiveness of query optimizers
Three basic join queries
  JoinABprime: join A with 10% of A (Bprime)
  JoinASelB: join A with 10% of B
  JoinCselAselB: join of C, 10% of A and 10% of B
Three versions of each query, resulting in 9 queries in total
  No index
  A clustered index
  A non-clustered index

Joins (2)
[Figure: JoinCselAselB query tree. Selections on A and B (10,000 tuples each) produce 1,000 tuples each; their join yields 1,000 tuples, which are then joined with a scan of C (1,000 tuples) to give 1,000 result tuples.]

Joins (3)
Query 11 (no index) - JoinCselAselB
  INSERT INTO TMP
  SELECT * FROM BASERELATION1, BASERELATION2, APRIME
  WHERE (Aprime.unique2A = BASERELATION1.unique2D)
    AND (BASERELATION1.unique2D = BASERELATION2.unique2E)
    AND (BASERELATION1.unique2D BETWEEN :lower AND :upper)
  lower: random value, upper: lower + (# of tuples * selectivity)
Query 14 (clustered index) - JoinCselAselB
  INSERT INTO TMP
  SELECT * FROM BASERELATION1, BASERELATION2, APRIME
  WHERE (Aprime.unique2A = BASERELATION1.unique2D)
    AND (BASERELATION1.unique2D = BASERELATION2.unique2E)
    AND (BASERELATION1.unique2D BETWEEN :lower AND :upper)
  lower: random value, upper: lower + (# of tuples * selectivity)

Join (4)
Query 17 (non-clustered index) – JoinCselAselB
  INSERT INTO TMP
  SELECT * FROM BASERELATION1, BASERELATION2, APRIME
  WHERE (Aprime.unique1A = BASERELATION1.unique1D)
    AND (BASERELATION1.unique1D = BASERELATION2.unique1E)
    AND (BASERELATION1.unique1D BETWEEN :lower AND :upper)
  lower: random value, upper: lower + (# of tuples * selectivity)

Join (5)
Query 11 - JoinCselAselB with no index
  Response time: 163.232 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    MERGE JOIN
    2  1      MERGE JOIN
    3  2        SORT (JOIN)
    4  3          TABLE ACCESS (FULL) OF 'APRIME'
    5  2        SORT (JOIN)
    6  5          TABLE ACCESS (FULL) OF 'BASERELATION1'
    7  1      SORT (JOIN)
    8  7        TABLE ACCESS (FULL) OF 'BASERELATION2'
Query 14 - JoinCselAselB with clustered index
  Response time: 31.078 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    NESTED LOOPS
    2  1      NESTED LOOPS
    3  2        TABLE ACCESS (FULL) OF 'APRIME'
    4  2        TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    5  4          INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE2A' (UNIQUE)
    6  1      INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE2B' (UNIQUE)

Join (6)
Query 17 - JoinCselAselB with non-clustered index
  Response time: 260.762 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    NESTED LOOPS
    2  1      NESTED LOOPS
    3  2        TABLE ACCESS (FULL) OF 'APRIME'
    4  2        TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    5  4          INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE1A' (UNIQUE)
    6  1      INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE1B' (UNIQUE)

Join (7)
The join method depends on whether indexes are available
  With no index: sort-merge join
  With indexes: nested loops join
Table access sequence
  The table with the fewest tuples was accessed first
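
The two join methods seen in the plans can be contrasted with a short sketch. It is a textbook-style illustration over lists of dictionaries, not the DBMS's implementation; the function names are hypothetical, and a Python dict stands in for the index that the real system would already have on the join attribute.

```python
def sort_merge_join(left, right, key):
    """Sort both inputs on the join key, then merge them in one pass."""
    left_sorted = sorted(left, key=lambda r: r[key])
    right_sorted = sorted(right, key=lambda r: r[key])
    result = []
    i = j = 0
    while i < len(left_sorted) and j < len(right_sorted):
        lk, rk = left_sorted[i][key], right_sorted[j][key]
        if lk < rk:
            i += 1
        elif lk > rk:
            j += 1
        else:
            # Find the run of right rows sharing this key value.
            j_end = j
            while j_end < len(right_sorted) and right_sorted[j_end][key] == lk:
                j_end += 1
            # Pair every left row in the matching run with that right run.
            while i < len(left_sorted) and left_sorted[i][key] == lk:
                for right_row in right_sorted[j:j_end]:
                    result.append((left_sorted[i], right_row))
                i += 1
            j = j_end
    return result


def index_nested_loops_join(outer, inner, key):
    """Scan the outer input and probe an index on the inner input per row."""
    index = {}                         # stand-in for an existing index
    for row in inner:
        index.setdefault(row[key], []).append(row)
    return [(o, i) for o in outer for i in index.get(o[key], [])]
```

In the nested loops version the cost grows with the number of outer rows, which is consistent with the smallest table being accessed first in the measured plans.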

Projections (1)
Implementation of projection
  A first pass is made to discard unwanted attributes
    • A complete scan of the relation
  A second phase eliminates duplicates
    • By sorting or hashing
Query 18: Projection with 1% projection
  insert into tmp
  select distinct two, four, ten, twenty, onePercent, string4
  from tenktup1;
Query 19: Projection with 100% projection
  insert into tmp
  select distinct two, four, ten, twenty, onePercent, tenPercent, twentyPercent, fiftyPercent, unique3, evenOnePercent, oddOnePercent, stringu1, stringu2, string4
  from tenktup1;
In retrospect, projections should have been tested with a larger relation!
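
The two phases of a projection, and the two ways of eliminating duplicates, can be sketched as follows. The function names are hypothetical and rows are plain dictionaries; this illustrates the technique only and is not how the measured DBMS implements DISTINCT.

```python
from itertools import groupby


def project_hash(rows, attrs):
    """Discard unwanted attributes, then drop duplicates with a hash set."""
    seen = set()
    result = []
    for row in rows:                                   # one complete scan
        projected = tuple(row[a] for a in attrs)
        if projected not in seen:
            seen.add(projected)
            result.append(projected)
    return result


def project_sort(rows, attrs):
    """Discard unwanted attributes, sort, then keep one tuple per run."""
    projected = sorted(tuple(row[a] for a in attrs) for row in rows)
    # After sorting, equal tuples are adjacent, so groupby collapses them.
    return [value for value, _ in groupby(projected)]
```

Both functions return the same set of tuples; the sort-based version returns them in sorted order, while the hash-based version keeps first-seen order.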

Projections (2)
Query 18 – Projection with 1% Projection
  INSERT INTO TMP
  SELECT DISTINCT two, four, ten, twenty, onePercent, string4
  FROM BASERELATION1
Query 19 – Projection with 100% Projection
  INSERT INTO TMP
  SELECT DISTINCT two, four, ten, twenty, onePercent, tenPercent,
    twentyPercent, fiftyPercent, unique3, evenOnePercent, oddOnePercent,
    stringu1, stringu2, string4
  FROM BASERELATION1

Projections (3)
Query 18 - Projection with 1% projection
  Response time: 7.808 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    SORT (UNIQUE)
    2  1      TABLE ACCESS (FULL) OF 'BASERELATION1'
Query 19 - Projection with 100% projection
  Response time: 442.342 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    SORT (UNIQUE)
    2  1      TABLE ACCESS (FULL) OF 'BASERELATION1'

Projection (4)
DISTINCT keyword
  Causes a full table scan and a sort
The difference in response time (7.808 sec vs. 442.342 sec) comes from the number of selected rows

Aggregate queries (1)
Three aggregate queries, each in two versions (no index or with secondary index)
Min scalar aggregate queries
  insert into temp
  select min(tenktup1.unique2) from tenktup1;
  Q20 (no index), Q23 (clustered index)
Min aggregate function queries with 100 partitions
  insert into temp
  select min(tenktup1.unique3) from tenktup1
  group by tenktup1.onePercent
  Q21 (no index), Q24 (clustered index)
Sum aggregate function queries with 100 partitions: similarly

Aggregate queries (2)
Query 20 (no index) - Minimum Aggregate Function
  INSERT INTO TMP
  SELECT MIN (BASERELATION1.unique2D) FROM BASERELATION1
Query 21 (no index) - Minimum Aggregate Function with 100 Partitions
  INSERT INTO TMP
  SELECT MIN (BASERELATION1.unique3D) FROM BASERELATION1
  GROUP BY BASERELATION1.onePercentD
Query 23 (clustered index) - Minimum Aggregate Function
  INSERT INTO TMP
  SELECT MIN (BASERELATION1.unique2D) FROM BASERELATION1
Query 24 (clustered index) - Minimum Aggregate Function with 100 Partitions
  INSERT INTO TMP
  SELECT MIN (BASERELATION1.unique3D) FROM BASERELATION1
  GROUP BY BASERELATION1.onePercentD

Aggregate queries (3)
Query 20 - Min function with no index
  Response time: 4.454 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    SORT (AGGREGATE)
    2  1      TABLE ACCESS (FULL) OF 'BASERELATION1'
Query 23 - Min function with clustered index
  Response time: 0.128 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    SORT (AGGREGATE)
    2  1      INDEX (FULL SCAN) OF 'INDEX_UNIQUE2A' (UNIQUE)
Query 21 - Min function with no index and Group by clause
  Response time: 5.606 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    SORT (GROUP BY)
    2  1      TABLE ACCESS (FULL) OF 'BASERELATION1'
Query 24 - Min function with clustered index and Group by clause
  Response time: 5.478 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
    1  0    SORT (GROUP BY)
    2  1      TABLE ACCESS (FULL) OF 'BASERELATION1'

Aggregate queries (4)
An index is useful for Min/Max aggregate functions
Group by clause
  Regardless of whether an index exists, a full table scan occurs
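
Why an index helps the scalar minimum but not the grouped one can be seen in a small sketch. The function names are hypothetical, a sorted Python list stands in for an ordered index, and reading its first entry is an idealized shortcut: the measured plan for Query 23 shows a full index scan, which still avoids touching the table.

```python
def scalar_min_full_scan(rows, attr):
    """No index: every tuple has to be inspected."""
    return min(row[attr] for row in rows)


def scalar_min_from_index(sorted_keys):
    """Ordered index on the attribute: the minimum is its first entry."""
    return sorted_keys[0]


def grouped_min(rows, group_attr, value_attr):
    """Minimum per partition: all tuples are visited whatever index exists
    on another attribute, because each one may lower its group's minimum."""
    minima = {}
    for row in rows:
        group, value = row[group_attr], row[value_attr]
        if group not in minima or value < minima[group]:
            minima[group] = value
    return minima
```

With onePercent as the grouping attribute, `grouped_min` produces the 100 partitions used by Queries 21 and 24.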

Updates (1)
To measure the cost of updating a relation and its indexes
Four simple update queries
  Insert 1 tuple (Q26 and Q29)
  Update key attribute of 1 tuple (Q28 and Q31)
  Update non-key attribute of 1 tuple (Q32)
  Delete 1 tuple (Q27 and Q30)
Problems
  Not enough updates to cause a significant reorganization of index pages
  No concurrency control and recovery
  No bulk update
  The Halloween problem
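
The Halloween problem arises when an update changes the very key that the scan is using to find rows, so an already-updated row can move ahead of the scan and be updated again. The sketch below is a toy model of that effect, assuming a sorted Python list as the index and hypothetical function names; it is not taken from the benchmark or from any particular DBMS.

```python
import bisect


def naive_update(keys, threshold, raise_by):
    """Update through a live ordered index: an updated key is re-inserted
    further along the index, where the same scan can meet it again."""
    index = sorted(keys)
    updates = 0
    pos = 0
    while pos < len(index):
        key = index[pos]
        if key < threshold:
            del index[pos]
            bisect.insort(index, key + raise_by)   # row moves within the index
            updates += 1
            # pos is not advanced: it now points at the next index entry
        else:
            pos += 1
    return index, updates


def snapshot_update(keys, threshold, raise_by):
    """Collect the qualifying rows first, then update each exactly once."""
    matching = [k for k in sorted(keys) if k < threshold]
    untouched = [k for k in keys if k >= threshold]
    updated = [k + raise_by for k in matching]
    return sorted(untouched + updated), len(matching)
```

For keys 10, 20 and 30 with a threshold of 25 and an increase of 10, the naive version performs three updates (the row that starts at 10 is raised twice), while the snapshot version performs the intended two.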

Updates (2)
Query 26 (no index) – Insert 1 tuple
  INSERT INTO TENKTUP1 VALUES (:upper, :upper, 0, 2, 0, 10, 50, 688, 1950, 4950, 9950, 1, 100,
    'MxxxxxxxxxxxxxxxxxxxxxxxxxxGxxxxxxxxxxxxxxxxxxxxxxxxxC',
    'GxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxA',
    'OxxxxxxxxxxxxxxxxxxxxxxxxxxOxxxxxxxxxxxxxxxxxxxxxxxxxO')
  upper: random number that is larger than the total number of tuples
Query 27 (no index) – Delete 1 tuple
  DELETE FROM TENKTUP1 WHERE unique1 = :upper
  upper: random number
Query 29 (with index) – Insert 1 tuple
  INSERT INTO TENKTUP1 VALUES (:upper, :upper, 0, 2, 0, 10, 50, 688, 1950, 4950, 9950, 1, 100,
    'MxxxxxxxxxxxxxxxxxxxxxxxxxxGxxxxxxxxxxxxxxxxxxxxxxxxxC',
    'GxxxxxxxxxxxxxxxxxxxxxxxxxxCxxxxxxxxxxxxxxxxxxxxxxxxxA',
    'OxxxxxxxxxxxxxxxxxxxxxxxxxxOxxxxxxxxxxxxxxxxxxxxxxxxxO')
  upper: random number that is larger than the total number of tuples
Query 30 (with index) – Delete 1 tuple
  DELETE FROM TENKTUP1 WHERE unique1 = :upper
  upper: random number

Updates (3)
Query 26 - Insert 1 tuple with no index
  Response time: 0.181 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
Query 29 - Insert 1 tuple with index
  Response time: 0.237 sec
  Query execution plan:
    0     INSERT STATEMENT Optimizer=CHOOSE
Query 27 - Delete 1 tuple with no index
  Response time: 4.224 sec
  Query execution plan:
    0     DELETE STATEMENT Optimizer=CHOOSE
    1  0    DELETE OF 'BASERELATION1'
    2  1      TABLE ACCESS (FULL) OF 'BASERELATION1'
Query 30 - Delete 1 tuple with index
  Response time: 0.134 sec
  Query execution plan:
    0     DELETE STATEMENT Optimizer=CHOOSE
    1  0    DELETE OF 'BASERELATION1'
    2  1      TABLE ACCESS (BY INDEX ROWID) OF 'BASERELATION1'
    3  2        INDEX (UNIQUE SCAN) OF 'INDEX_UNIQUE1A' (UNIQUE)

Updates (4)
An index gives no benefit for insertion (0.181 sec without an index vs. 0.237 sec with one)
An index is useful for deletion (4.224 sec without an index vs. 0.134 sec with one)

Revisiting Wisconsin Benchmark
Criticized for a number of deficiencies
  Single-user testing only
  Absence of bulk update, database load and unload tests
  No outer join tests
  Use of uniformly distributed attribute values
  Lack of tests involving host language variables
  No “order by” clause
  Overly simple aggregation tests
  Simple join queries
The weak collection of data types is not so bad!