SSD Aware Scan Operation Optimization in PostgreSQL Database
supun-nakandala -
A Study on SSD Aware Scan Operation Optimization in
PostgreSQL Database
SSDs vs Traditional Spin Type HDDs
SSDs
- Silicon memory chips
- No moving parts
- No rotational delay
- Near-zero seek time
Random and sequential block access times are almost the same!
But ...
The cost models in RDBMSs are based on the characteristics of spin-type HDDs.
They assume random_block_access_time > sequential_block_access_time.
When used with SSDs, this assumption is not valid.
- Are there opportunities for improvement ??
Background information
Scan operation
- SELECT * FROM table WHERE condition
Selectivity
Scan operation alternatives in PostgreSQL
- Heap scan (sequential scan)
- Bitmap index scan + Bitmap heap scan
- Index scan
Our Hypothesis
An index scan based on a secondary index can perform better than the other scan operations in databases that run on SSD storage.
This is based on the fact that on SSDs the random block access cost is almost the same as the sequential block access cost.
Our Hypothesis (Continued)
SELECT * FROM table WHERE column = val
- column is indexed (not primary)
- correlation between primary index and secondary index is zero
Methodology
- Kingston 8GB Data Traveler
- Dedicated PC running Ubuntu 12.04 (i5 2.3 GHz processor and 4GB system memory)
- PostgreSQL 9.3
- Table with 36 columns, 6,000,000 rows of data
- SELECT * FROM table_1 WHERE column_1 > val_1 AND column_1 < val_2
- 1.7 GB of data (with indexes)
Methodology (Continued)
- numeric field "idx_column" indexed using a btree index
- correlation between primary index and secondary index = 0.000000…
- cardinality of the "idx_column" field is 933900
Running times (ms) by selectivity
BHS + BIS = bitmap heap scan + bitmap index scan

Selectivity (log)   seq scan   BHS + BIS   index scan
-4                  10594      0           0
-3                  10269      1           0
-2                  10255      9           4
-1                  10260      94          44
0                   10278      644         457
1                   10407      8794        4915
2                   11600      16528       49395
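The measurements above can be read as a lookup of the fastest scan type at each selectivity. The following is a minimal Python sketch of that reading, using only the running times listed in the table; the names `TIMINGS_MS` and `fastest_scan` are illustrative and not part of the original slides.

```python
# Running times in ms, copied from the table on the slide.
# Keys are the "Selectivity (log)" values.
TIMINGS_MS = {
    -4: {"seq scan": 10594, "BHS + BIS": 0, "index scan": 0},
    -3: {"seq scan": 10269, "BHS + BIS": 1, "index scan": 0},
    -2: {"seq scan": 10255, "BHS + BIS": 9, "index scan": 4},
    -1: {"seq scan": 10260, "BHS + BIS": 94, "index scan": 44},
    0: {"seq scan": 10278, "BHS + BIS": 644, "index scan": 457},
    1: {"seq scan": 10407, "BHS + BIS": 8794, "index scan": 4915},
    2: {"seq scan": 11600, "BHS + BIS": 16528, "index scan": 49395},
}


def fastest_scan(timings):
    """Return the name of the scan type with the lowest running time.

    On a tie (as at selectivity -4), the first scan type with the
    minimum value in dictionary order is returned.
    """
    return min(timings, key=timings.get)


for selectivity, timings in TIMINGS_MS.items():
    best = fastest_scan(timings)
    print(selectivity, best, timings[best])
```

In these measurements the index scan is never slower than the bitmap scan up to selectivity 1, and the sequential scan only wins at selectivity 2.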
In PostgreSQL
random_block_access_time = 4 * seq_block_access_time
This assumes spin-type HDDs.
What is the relation in SSDs ?
random_block_access_time = seq_block_access_time ??
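To see why the ratio matters, here is a deliberately simplified Python sketch of a page-based cost comparison. It is not PostgreSQL's actual cost model: it ignores CPU and index-page costs and assumes, in line with the zero-correlation setup, that every matching row costs one random page fetch. The page and row counts are hypothetical. In PostgreSQL the two costs correspond to the planner settings `random_page_cost` (default 4.0) and `seq_page_cost` (default 1.0).

```python
def seq_scan_cost(table_pages, seq_page_cost=1.0):
    """Cost of reading every page of the table sequentially."""
    return table_pages * seq_page_cost


def index_scan_cost(matching_rows, random_page_cost):
    """Worst-case cost: one random page fetch per matching row
    (assumes zero correlation between index order and heap order)."""
    return matching_rows * random_page_cost


def cheaper_plan(table_pages, matching_rows, random_page_cost, seq_page_cost=1.0):
    """Pick the plan with the lower estimated cost in this toy model."""
    seq = seq_scan_cost(table_pages, seq_page_cost)
    idx = index_scan_cost(matching_rows, random_page_cost)
    return "index scan" if idx < seq else "seq scan"


# Hypothetical sizes, chosen only to illustrate the effect of the ratio.
TABLE_PAGES = 100_000
MATCHING_ROWS = 50_000

# HDD-style assumption: a random access costs 4x a sequential access.
print(cheaper_plan(TABLE_PAGES, MATCHING_ROWS, random_page_cost=4.0))

# SSD-style assumption: random and sequential accesses cost the same.
print(cheaper_plan(TABLE_PAGES, MATCHING_ROWS, random_page_cost=1.0))
```

With the same query, the toy model prefers the sequential scan at a ratio of 4 and the index scan at a ratio of 1, which is the shift in plan choice the hypothesis relies on.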
Selectivity (log)   Before optimization (ms)   Optimum (ms)   After optimization (ms)   Cost reduction (ms)   Cost reduction (%)
-4                  0                          0              0                         0                     -
-3                  1                          0              0                         1                     100
-2                  9                          4              4                         5                     56
-1                  94                         44             44                        50                    53
0                   644                        457            457                       187                   29
1                   8794                       4915           4915                      3879                  44
2                   11600                      11600          11600                     0                     0
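The two cost-reduction columns follow directly from the "before" and "optimum" running times. A minimal Python sketch of that arithmetic, using the values from the table (the names `RESULTS` and `cost_reduction` are illustrative):

```python
# (selectivity_log, before_ms, optimum_ms), copied from the table on the slide.
RESULTS = [
    (-4, 0, 0),
    (-3, 1, 0),
    (-2, 9, 4),
    (-1, 94, 44),
    (0, 644, 457),
    (1, 8794, 4915),
    (2, 11600, 11600),
]


def cost_reduction(before_ms, optimum_ms):
    """Return (saved_ms, saved_percent).

    saved_percent is rounded to the nearest integer and is None when
    the original running time is 0, where a percentage is undefined.
    """
    saved = before_ms - optimum_ms
    percent = round(100 * saved / before_ms) if before_ms else None
    return saved, percent


for selectivity, before, optimum in RESULTS:
    saved, percent = cost_reduction(before, optimum)
    print(selectivity, saved, "-" if percent is None else percent)
```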
Are we done ??
We haven't considered an important factor
- relative size of the table compared to the system memory
Observations
- Sequential scan remains consistent for all the system memory values. Why ?
- Both BIS + BHS and index scan drastically underperform when system memory is reduced.
- BIS + BHS performs slightly better than index scan.
So the optimization will work only in special conditions where at least the majority of the table content can reside in main memory.
- Does this mean the optimization is of no use ??
Potential of this optimization
- Small table size databases
- Embedded devices
- Mobile phones etc.
Questions ??