Indexes don't mean slow inserts.

www.postgrespro.ru

Indexes don't meanslow inserts

Anastasia Lubennikova

Agenda

1. Why do we need it?2. Write-optimisation techniques3. PostgreSQL specific4. Advanced PostgreSQL indexes5. Future of indexing in PostgreSQL

2/29

PostgreSQL indexes

● Speed up search● Primary key● Constraints

● Secondaryindexes

3/29

Index maintenance overhead

● Index size● INSERT slowdown● Random I/O● Index becomes

fragmented

● More indexes -more overhead

4/29

Do we need write-optimisation?

UPDATE mytable SET a = a + 1;

● Heavy write load● MVCC

update = insert

5/29

Do we need write-optimisation?

● 1Gb table ● Update all values:

Without index ~ 200sWith index ~ 600s

6/29

DBMS trade-offs

7/29

DBMS trade-offs

● CAP theorem● ACID vs BASE● Lower hardware cost vs Increase productivity● Read speed vs Write speed● Productivity vs Fault-tolerance

8/29

Write-optimisation

● Writes are faster● Reads are good● Storage is fault-tolerant

9/29

Insert buffer

● Accumulate data. Sort. Insert at a time.+ Avoids random I/O- Seqscan buffer- Possible data loss- Merge time

● Avoids hidden scans- only non-unique

● MySQL InnoDB Change Buffer 10/29

Cache-oblivious data structures

● Approximately optimal for any hardware● Divide & Conquer

11/29

LSM trees

● Cascade of B-trees● First tree is in memory

● LevelDB● BigTable ● Cassandra ● Hbase● SophiaDB● other NoSQL DBs

12/29

From LSM to COLA

● Search optimisation for LSM● Leaf levels are linked by lookahead pointers

● Introduced in 2007 by founders of Tokutek

13/29

Cache-oblivious lookahead arrays

● Drop internal trees nodes● Bound the scan area with lookahead pointers

14/29

COLA: theory and pracrice

● Prototype shows incredible results!

15/29

COLA: theory and pracrice

● VACUUM?● WAL?● Concurrency?● Index size?

● too hard =(

● Prototype shows incredible results!

16/29

Fractal Tree

● Insert the message instead of the data● Send it down the tree● Apply a message

to leaf page

● TokuDB for MySQL● TokuMX for MongoDB

17/29

PostgreSQL specific

18/29

PostgreSQL specific (1)

● Write-Ahead-Log• WAL is not extendable

● Storage manager• 1 Relation (Heap or Index) = 1 continuous file• Free Space Map

● Block size• 8 Kb

19/29

Advanced indexes

20/29

Advanced PostgreSQL indexes

● Optimize the number of indexes• pg_stat_statements

● REINDEX● CREATE INDEX CONCURRENTLY

• rebuild bloated and fragmented indexes● Partial indexes● BRIN

• tiny min/max index

21/29

Ideas?

22/29

Covering and Unique

● To maintain constraint (Unique, Pimary key..) on A● To use IndexOnlyScan on A,B● Have to maintain2 indexes

23/29

Covering + Unique

CREATE UNIQUE INDEX ON mytable USING btree(a)INCLUDING(b);

24/29

Covering + Unique (1)

# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);

# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul'; QUERY PLAN--------------------------------------------------------------------------- Index Only Scan using covering_idx on people (cost=0.28..1.44 rows=4 width=13) Index Cond: (first_name = 'Paul'::text)

25/29

Covering + Unique (2)

● Searches are faster• You can use IndexOnlyScan now

● Indexes maintenace overhead decreased• Inserts are faster• Size is smaller

26/29

Effective storage of duplicates

● Compress duplicated keys on index pageIN PROGRESS

27/29

Bulk insert

● INSERT INTO mytable SELECT x FROM generate_series(0, 1000000) as x;

● 1.000.000 B-tree searches● 1.000.000 WAL records

28/29

Insert Buffer

● Flexible● Recoverable

29/29

www.postgrespro.ru

Thanks for attention!Any questions?

[email protected]

mailto:[email protected]

Indexes don't mean slow inserts.

Technology

Transcript of Indexes don't mean slow inserts.