A Study on SSD Aware Scan Operation Optimization in PostgreSQL Database


Page 2: SSDs vs Traditional Spin-Type HDDs

Page 3: SSDs

- Silicon memory chips
- No moving parts
- No rotational delay
- Near-zero seek time

Random and sequential block access times are almost the same!

Page 4: But ...

- The cost models in RDBMSs are based on the characteristics of spin-type HDDs.
- They assume random_block_access_time > sequential_block_access_time.
- When used with SSDs, this assumption is not valid.
  - Are there opportunities for improvement?

Page 5: Background information

- Scan operation
  - SELECT * FROM table WHERE condition
- Selectivity
- Scan operation alternatives in PostgreSQL
  - Heap scan (sequential scan)
  - Bitmap index scan + bitmap heap scan
  - Index scan

Page 6: Our Hypothesis

- An index scan based on a secondary index can perform better than the other scan operations in databases that run on SSD-type storage media.
- This is based on the fact that in SSDs the random block access cost is almost the same as the sequential block access cost.

Page 7: Our Hypothesis (Continued)

- SELECT * FROM table WHERE column = val
  - column is indexed (not primary)
  - correlation between primary index and secondary index is zero

Page 8: Methodology

- Kingston 8GB Data Traveler
- Dedicated PC running Ubuntu 12.04 (i5 2.3 GHz processor and 4GB system memory)
- PostgreSQL 9.3
- Table with 36 columns, 6,000,000 rows of data
- SELECT * FROM table_1 WHERE column_1 > val_1 AND column_1 < val_2
- 1.7 GB of data (with indexes)

Page 9: Methodology (Continued)

- Numeric field “idx_column” indexed using a btree index
- Correlation between primary index and secondary index is 0.000000…
- Cardinality of the “idx_column” field is 933900

Page 11: Running times (ms) by scan operation

BHS + BIS: bitmap heap scan + bitmap index scan

Selectivity (log)   seq scan   BHS + BIS   index scan
-4                  10594      0           0
-3                  10269      1           0
-2                  10255      9           4
-1                  10260      94          44
0                   10278      644         457
1                   10407      8794        4915
2                   11600      16528       49395
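As a rough aid for reading these measurements, the Python sketch below picks the fastest scan operation for each selectivity value. The numbers are copied from the table; the dictionary layout and the name `fastest_scans` are illustrative assumptions, and the running times are assumed to be in milliseconds (the later slide that reuses these values states ms).

```python
# Running times copied from the table (assumed to be in ms).
# Keys are the log-selectivity values; tuple order is
# (seq scan, BHS + BIS, index scan).
SCAN_NAMES = ("seq scan", "BHS + BIS", "index scan")
RUNNING_TIMES = {
    -4: (10594, 0, 0),
    -3: (10269, 1, 0),
    -2: (10255, 9, 4),
    -1: (10260, 94, 44),
    0: (10278, 644, 457),
    1: (10407, 8794, 4915),
    2: (11600, 16528, 49395),
}


def fastest_scans(times):
    """Return the lowest running time and every scan name that achieves it."""
    best = min(times)
    winners = [name for name, t in zip(SCAN_NAMES, times) if t == best]
    return best, winners


for selectivity in sorted(RUNNING_TIMES):
    best, winners = fastest_scans(RUNNING_TIMES[selectivity])
    print(f"{selectivity:>3}: {best:>6} ms  {', '.join(winners)}")
```

On these numbers the index scan is fastest (or tied for fastest) at every selectivity except 2, where the sequential scan wins.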

Page 12: In PostgreSQL

- random_block_access_time = 4 * seq_block_access_time
- This assumes spin-type HDDs.
- What is the relation in SSDs?
  - random_block_access_time = seq_block_access_time ??
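To illustrate why this ratio matters, here is a deliberately simplified Python sketch. It is not PostgreSQL's actual cost model: it assumes a sequential scan costs one sequential page read per table page, and that an index scan with zero correlation costs one random page read per matching row, ignoring index pages, caching, and CPU cost. The function names and `TABLE_PAGES` are hypothetical. In PostgreSQL itself the two per-page costs are the planner settings `seq_page_cost` and `random_page_cost`, which default to 1.0 and 4.0.

```python
def seq_scan_cost(table_pages, seq_cost=1.0):
    """Toy cost: read every table page sequentially."""
    return table_pages * seq_cost


def index_scan_cost(matching_rows, random_cost):
    """Toy cost: one random page read per matching row (zero correlation)."""
    return matching_rows * random_cost


def break_even_rows(table_pages, random_cost, seq_cost=1.0):
    """Matching-row count at which both toy costs are equal."""
    return table_pages * seq_cost / random_cost


TABLE_PAGES = 200_000  # hypothetical table size in pages, not from the slides

for ratio in (4.0, 1.0):  # HDD-style assumption vs. the SSD question above
    rows = break_even_rows(TABLE_PAGES, random_cost=ratio)
    print(f"random/seq ratio {ratio}: index scan is cheaper below {rows:.0f} rows")
```

Under these assumptions, lowering the ratio from 4 to 1 makes the index scan look cheaper over a four times larger range of matching rows, which is the direction the hypothesis predicts.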

Page 15: Running times before and after optimization

Selectivity (log)   Before optimization (ms)   Optimum (ms)   After optimization (ms)   Cost reduction (ms)   Cost reduction (%)
-4                  0                          0              0                         0                     -
-3                  1                          0              0                         1                     100
-2                  9                          4              4                         5                     56
-1                  94                         44             44                        50                    53
0                   644                        457            457                       187                   29
1                   8794                       4915           4915                      3879                  44
2                   11600                      11600          11600                     0                     0
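The two cost-reduction columns follow directly from the before and after running times. The Python sketch below recomputes them as a cross-check; the name `cost_reduction` and the tuple layout are illustrative assumptions, and the percentage is left undefined when the running time before optimization is 0 ms.

```python
# (log selectivity, running time before optimization in ms,
#  running time after optimization in ms), copied from the table.
ROWS = [
    (-4, 0, 0),
    (-3, 1, 0),
    (-2, 9, 4),
    (-1, 94, 44),
    (0, 644, 457),
    (1, 8794, 4915),
    (2, 11600, 11600),
]


def cost_reduction(before_ms, after_ms):
    """Return (saved ms, saved percent); percent is None when before is 0."""
    saved = before_ms - after_ms
    percent = None if before_ms == 0 else round(100 * saved / before_ms)
    return saved, percent


for selectivity, before_ms, after_ms in ROWS:
    saved, percent = cost_reduction(before_ms, after_ms)
    shown = "-" if percent is None else f"{percent}%"
    print(f"{selectivity:>3}: saved {saved:>5} ms ({shown})")
```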

Page 16: Are we done?

- We haven’t considered an important factor
  - relative size of the table compared to the system memory

Page 19: Observations

- Sequential scan remains consistent for all the system memory values. Why?
- Both BIS + BHS and index scan drastically underperform when system memory is reduced.
- BIS + BHS performs slightly better than index scan.

Page 20

- So the optimization will work only in special conditions where at least the majority of the table content can reside in main memory.
  - Does this mean the optimization is of no use?

Page 21: Potential of this optimization

- Small table size databases
- Embedded devices
- Mobile phones etc.

Page 22: Questions?