Indexes don't mean slow inserts.
www.postgrespro.ru
Anastasia Lubennikova
Agenda
1. Why do we need it?
2. Write-optimisation techniques
3. PostgreSQL specific
4. Advanced PostgreSQL indexes
5. Future of indexing in PostgreSQL
2/29
PostgreSQL indexes
● Speed up search
● Primary key
● Constraints
● Secondary indexes
3/29
Index maintenance overhead
● Index size
● INSERT slowdown
● Random I/O
● Index becomes fragmented
● More indexes, more overhead
4/29
Do we need write-optimisation?
UPDATE mytable SET a = a + 1;
● Heavy write load
● MVCC: update = insert
5/29
Do we need write-optimisation?
● 1 GB table
● Update all values:
  Without index ~ 200 s
  With index ~ 600 s
6/29
DBMS trade-offs
7/29
DBMS trade-offs
● CAP theorem
● ACID vs BASE
● Lower hardware cost vs Increase productivity
● Read speed vs Write speed
● Productivity vs Fault-tolerance
8/29
Write-optimisation
● Writes are faster
● Reads are good
● Storage is fault-tolerant
9/29
Insert buffer
● Accumulate data, sort it, insert it all at once.
  + Avoids random I/O
  - Seqscan buffer
  - Possible data loss
  - Merge time
● Avoids hidden scans
  - only non-unique
● MySQL InnoDB Change Buffer
10/29
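The insert-buffer idea above can be sketched in a few lines of Python. This is a toy illustration only, not how InnoDB or any real engine implements it: the class `BufferedIndex`, its `buffer_limit` parameter, and the use of a plain sorted list in place of an on-disk index are all assumptions made for the example.

```python
import bisect
from heapq import merge


class BufferedIndex:
    """Toy sorted index with an insert buffer (all names are hypothetical)."""

    def __init__(self, buffer_limit=4):
        self.index = []      # sorted keys; stands in for the on-disk index
        self.buffer = []     # unsorted pending keys; stands in for the in-memory buffer
        self.buffer_limit = buffer_limit
        self.merges = 0      # how many times the buffer was merged into the index

    def insert(self, key):
        # Accumulate: an insert only appends to the buffer.
        self.buffer.append(key)
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Sort the buffer, then merge it into the index in one sequential pass.
        if not self.buffer:
            return
        self.buffer.sort()
        self.index = list(merge(self.index, self.buffer))
        self.buffer = []
        self.merges += 1

    def contains(self, key):
        # Binary search in the index, then a sequential scan of the buffer.
        i = bisect.bisect_left(self.index, key)
        if i < len(self.index) and self.index[i] == key:
            return True
        return key in self.buffer
```

The sketch makes the listed trade-offs visible: every lookup has to scan the buffer as well as the index, each flush costs a merge, and anything still in `buffer` is lost on a crash unless it is logged somewhere else.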
Cache-oblivious data structures
● Approximately optimal for any hardware
● Divide & Conquer
11/29
LSM trees
● Cascade of B-trees
● First tree is in memory
● LevelDB
● BigTable
● Cassandra
● HBase
● SophiaDB
● other NoSQL DBs
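A minimal Python sketch of the LSM idea, under loud simplifications: sorted lists stand in for the B-trees of the cascade, there is no compaction between levels, and the names `TinyLSM` and `memtable_limit` are hypothetical. It is not modelled on any of the systems listed above.

```python
import bisect


class TinyLSM:
    """Toy LSM store: an in-memory table plus immutable sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}   # the first "tree": lives in memory
        self.runs = []       # flushed runs, newest first; each is a sorted list of (key, value)
        self.memtable_limit = memtable_limit

    def put(self, key, value):
        # Writes only touch memory until the memtable fills up.
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            # Flush: write the whole memtable out as one sorted run.
            self.runs.insert(0, sorted(self.memtable.items()))
            self.memtable = {}

    def get(self, key):
        # Check the newest data first so later writes shadow earlier ones.
        if key in self.memtable:
            return self.memtable[key]
        for run in self.runs:
            # (key,) sorts just before any (key, value) pair with the same key.
            i = bisect.bisect_left(run, (key,))
            if i < len(run) and run[i][0] == key:
                return run[i][1]
        return None
```

Writes are cheap because they never modify an existing run, while a read may have to probe every run. That read cost is what the search optimisations on the following slides target.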
12/29
From LSM to COLA
● Search optimisation for LSM
● Leaf levels are linked by lookahead pointers
● Introduced in 2007 by founders of Tokutek
13/29
Cache-oblivious lookahead arrays
● Drop internal tree nodes
● Bound the scan area with lookahead pointers
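The first bullet can be illustrated with a basic lookahead-free variant in Python: each level is a plain sorted array with no internal nodes, and full levels are merged downward like carries in a binary counter. The class name `BasicCOLA` is hypothetical, and the lookahead pointers themselves are deliberately left out, so this sketch binary-searches every level in full instead of bounding the scan area.

```python
import bisect
from heapq import merge


class BasicCOLA:
    """Toy level structure: levels[k] is empty or a sorted list of exactly 2**k keys."""

    def __init__(self):
        self.levels = []

    def insert(self, key):
        carry = [key]
        k = 0
        while True:
            if k == len(self.levels):
                self.levels.append([])
            if not self.levels[k]:
                # Empty level: the carried array fits here.
                self.levels[k] = carry
                return
            # Full level: merge it with the carry and push the result one level down.
            carry = list(merge(self.levels[k], carry))
            self.levels[k] = []
            k += 1

    def contains(self, key):
        # Without lookahead pointers each level needs its own full binary search.
        for level in self.levels:
            i = bisect.bisect_left(level, key)
            if i < len(level) and level[i] == key:
                return True
        return False
```

All writes are sequential merges of sorted arrays. Lookahead pointers would let a search at level k+1 start from the position found at level k rather than from scratch.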
14/29
COLA: theory and practice
● Prototype shows incredible results!
15/29
COLA: theory and practice
● VACUUM?
● WAL?
● Concurrency?
● Index size?
● too hard =(
● Prototype shows incredible results!
16/29
Fractal Tree
● Insert the message instead of the data
● Send it down the tree
● Apply a message to leaf page
● TokuDB for MySQL
● TokuMX for MongoDB
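The three steps above (insert a message, send it down, apply it at the leaf) can be sketched with a deliberately tiny Python model: one root node with a message buffer and two leaves split by a pivot. Everything here is an assumption for illustration, including the name `ToyFractalNode`, the message shape `(op, key, value)`, and the single-level layout. A real fractal tree buffers messages at every internal node, and this is not TokuDB's implementation.

```python
class ToyFractalNode:
    """Toy root node: buffers messages and applies them to two leaves on flush."""

    def __init__(self, pivot, buffer_limit=3):
        self.pivot = pivot
        self.buffer = []          # pending messages: (op, key, value)
        self.leaves = ({}, {})    # left leaf: keys < pivot; right leaf: keys >= pivot
        self.buffer_limit = buffer_limit
        self.flushes = 0

    def _leaf_for(self, key):
        return self.leaves[0] if key < self.pivot else self.leaves[1]

    def send(self, op, key, value=None):
        # Step 1: insert the message, not the data.
        self.buffer.append((op, key, value))
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Steps 2 and 3: send each message down and apply it to its leaf, in arrival order.
        for op, key, value in self.buffer:
            leaf = self._leaf_for(key)
            if op == "insert":
                leaf[key] = value
            elif op == "delete":
                leaf.pop(key, None)
        self.buffer = []
        self.flushes += 1

    def get(self, key):
        # A pending message is newer than the leaf contents, so check the buffer first.
        for op, k, value in reversed(self.buffer):
            if k == key:
                return value if op == "insert" else None
        return self._leaf_for(key).get(key)
```

Leaf pages are only touched when a batch of messages is flushed, so many logical writes share one pass over each leaf.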
17/29
PostgreSQL specific
18/29
PostgreSQL specific (1)
● Write-Ahead Log
  • WAL is not extensible
● Storage manager
  • 1 Relation (Heap or Index) = 1 continuous file
  • Free Space Map
● Block size
  • 8 kB
19/29
Advanced indexes
20/29
Advanced PostgreSQL indexes
● Optimize the number of indexes
  • pg_stat_statements
● REINDEX
● CREATE INDEX CONCURRENTLY
  • rebuild bloated and fragmented indexes
● Partial indexes
● BRIN
  • tiny min/max index
21/29
Ideas?
22/29
Covering and Unique
● To maintain constraint (Unique, Primary key, ...) on A
● To use IndexOnlyScan on A, B
● Have to maintain 2 indexes
23/29
Covering + Unique
CREATE UNIQUE INDEX ON mytable USING btree (a) INCLUDING (b);
24/29
Covering + Unique (1)
# CREATE INDEX covering_idx ON people (first_name) INCLUDING (last_name);
# EXPLAIN SELECT first_name, last_name FROM people WHERE first_name = 'Paul';
                                QUERY PLAN
---------------------------------------------------------------------------
 Index Only Scan using covering_idx on people  (cost=0.28..1.44 rows=4 width=13)
   Index Cond: (first_name = 'Paul'::text)
25/29
Covering + Unique (2)
● Searches are faster
  • You can use IndexOnlyScan now
● Index maintenance overhead decreased
  • Inserts are faster
  • Size is smaller
26/29
Effective storage of duplicates
● Compress duplicated keys on index page
IN PROGRESS
27/29
Bulk insert
● INSERT INTO mytable SELECT x FROM generate_series(0, 1000000) as x;
● 1,000,000 B-tree searches
● 1,000,000 WAL records
28/29
Insert Buffer
● Flexible
● Recoverable
29/29