Old and New Tricks with GIN - iki.fi
Old and New Tricks with GIN
Heikki Linnakangas / VMware
March 20, 2014
What is GIN?
Generalized Inverted iNdex
Used to index things like:
- full-text search
- arrays
- key/value pairs (hstore)
- json, xml (with expression indexes)
GIN example: Arrays
create table int_arrays (intarr integer[]);
create index intarr_gin on int_arrays using GIN (intarr);
insert into int_arrays
select array[g, random() * 1000, random() * 1000]
from generate_series(1,100000) g;
GIN example: Arrays
select * from int_arrays where intarr @> array[29, 95];
intarr
---------------
{4399,95,29}
{34355,29,95}
{59742,29,95}
{94927,95,29}
(4 rows)
GIN example: Array operators
At index creation / insertion:
1. Extract elements from array
2. Index the elements
GIN example: Array operators
At search:
1. Extract elements from query
2. Search the index for the elements
3. Return rows that match the operator:
- @> ("contains"): the row must contain all the query elements
- && ("overlap"): the row must contain at least one query element
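To make the two operators concrete, here is a minimal Python sketch of an inverted index over integer arrays. It is an illustration under stated assumptions, not GIN's implementation: rows are identified by small integers standing in for tuple pointers, a dict of sets stands in for the index's B-tree of elements, and the function names (`build_index`, `contains`, `overlaps`) are hypothetical.

```python
from collections import defaultdict

def build_index(rows):
    """Extract the elements of each array and index them: element -> sorted row ids."""
    index = defaultdict(set)
    for row_id, arr in rows.items():
        for elem in set(arr):          # 1. extract elements from the array
            index[elem].add(row_id)    # 2. index the elements
    return {elem: sorted(ids) for elem, ids in index.items()}

def contains(index, query):
    """@> ("contains"): rows having every query element. Assumes a non-empty query."""
    postings = [set(index.get(elem, [])) for elem in set(query)]
    return sorted(set.intersection(*postings))

def overlaps(index, query):
    """&& ("overlap"): rows having at least one query element."""
    return sorted(set().union(*(index.get(elem, []) for elem in set(query))))

rows = {1: [4399, 95, 29], 2: [7, 29, 500], 3: [95, 12, 3]}
idx = build_index(rows)
print(contains(idx, [29, 95]))   # [1]
print(overlaps(idx, [29, 95]))   # [1, 2, 3]
```

Both operators read the same posting lists; they differ only in how the per-element results are combined (intersection versus union).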
Operator classes
PostgreSQL is extensible.
The operations to extract elements, search, and combine results are defined by an operator class.
Built-in operator classes for arrays, full-text search, etc.
Three fundamental GIN operations
1. Extract keys from a value to insert or query
System calls the opclass's extractValue (at insert) or extractQuery (at search) function
2. Index them
System stores the extracted keys in a B-tree, using the opclass's compare function.
3. Combine matches of several keys efficiently
System calls the opclass's consistent function to determine if the item with a combination of keys matches the overall query.
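The division of labour can be sketched in Python as a generic index that only ever calls four opclass methods. This is a simplified, hypothetical model: the method names mirror the slide's extractValue / extractQuery / compare / consistent, but PostgreSQL's real support functions are C functions with more arguments (strategy number, recheck flag, and so on), and a dict replaces the B-tree of keys.

```python
from functools import cmp_to_key

class ArrayContainsOpclass:
    """Hypothetical stand-in for an array opclass handling @> (not PostgreSQL's C API)."""

    def extract_value(self, value):
        return set(value)                  # keys of a value being inserted

    def extract_query(self, query):
        return sorted(set(query))          # keys of the search argument

    def compare(self, a, b):
        return (a > b) - (a < b)           # orders keys in the entry tree

    def consistent(self, check):
        return all(check)                  # @>: every query key must be present


class ToyGinIndex:
    """Generic machinery: knows nothing about arrays, only calls the opclass."""

    def __init__(self, opclass):
        self.opclass = opclass
        self.entries = {}                  # key -> set of item ids (stands in for the B-tree)

    def insert(self, item_id, value):
        for key in self.opclass.extract_value(value):
            self.entries.setdefault(key, set()).add(item_id)

    def sorted_keys(self):
        return sorted(self.entries, key=cmp_to_key(self.opclass.compare))

    def search(self, query):
        keys = self.opclass.extract_query(query)
        postings = [self.entries.get(k, set()) for k in keys]
        candidates = set().union(*postings)            # items containing any key
        return sorted(item for item in candidates
                      if self.opclass.consistent([item in p for p in postings]))


idx = ToyGinIndex(ArrayContainsOpclass())
for item_id, arr in {1: [4399, 95, 29], 2: [7, 29, 500], 3: [95, 12, 3]}.items():
    idx.insert(item_id, arr)
print(idx.search([29, 95]))   # [1]
```

Swapping in a different opclass object (for example one whose `consistent` returns `any(check)`) changes the operator's meaning without touching `ToyGinIndex`, which is the point of the extensibility design.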
GIN examples: Full-text search (1/2)
At insert:
1. Extract words from text:
"PostgreSQL - The world's most advanced open source database"
-> "postgresql", "world", "advanc", "open", "sourc", "databas"
2. Index the words in the B-tree inside the GIN index.
GIN examples: Full-text search (2/2)
At search:
1. Extract words from query
2. Fetch all items containing any of the words
3. Determine which items match the overall query
Full-text search has a mini parser and syntax of its own:
select plainto_tsquery('an advanced open source database');
plainto_tsquery
-----------------------------------------
'advanc' & 'open' & 'sourc' & 'databas'
(1 row)
GIN examples: Trigrams (1/2)
At insert:
1. Extract trigrams from text:
foobar -> '  f', ' fo', 'foo', 'oob', 'oba', 'bar', 'ar '
(pg_trgm pads each word with two spaces in front and one at the end)
2. Index them
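A minimal sketch of trigram extraction in Python, assuming pg_trgm's default behaviour of lowercasing, splitting the text into words of alphanumeric characters, and padding each word with two spaces in front and one behind. The function name `trigrams` is hypothetical, and pg_trgm's exact word-character rules may differ for some inputs.

```python
import re

def trigrams(text):
    """Return the set of padded trigrams of each word in text."""
    result = set()
    # words = runs of letters/digits (underscore excluded), compared case-insensitively
    for word in re.findall(r"[^\W_]+", text.lower()):
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            result.add(padded[i:i + 3])
    return result

print(sorted(trigrams("foobar")))
```

Because the padding marks word boundaries, "  f" and "ar " record that "f" starts a word and "ar" ends one, which a plain substring scan would not capture.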
GIN examples: Trigrams (2/2)
At search:
1. Extract trigrams from query
2. Fetch all items containing any of the trigrams.
3. Determine which items match the overall query
(e.g. an item must have at least N trigrams in common with the query)
- Can speed up LIKE searches!
- Also regular expressions!
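How trigrams can serve a LIKE query can be sketched as a filter-then-recheck step. This is a simplified model, not pg_trgm's algorithm: it indexes every 3-character substring of the whole lowercased string (pg_trgm uses per-word padded trigrams), ignores LIKE escape characters, and the helper names are hypothetical. Any row matching the pattern must contain every trigram of the pattern's literal parts, so the index narrows the candidates and the real pattern is then rechecked on each one.

```python
import re
from collections import defaultdict

def substrings3(s):
    """All 3-character substrings of s (lowercased)."""
    s = s.lower()
    return {s[i:i + 3] for i in range(len(s) - 2)}

def build_trigram_index(rows):
    index = defaultdict(set)
    for row_id, text in rows.items():
        for tri in substrings3(text):
            index[tri].add(row_id)
    return index

def like_to_regex(pattern):
    """Translate a LIKE pattern (no escape characters) into an anchored regex."""
    parts = [".*" if ch == "%" else "." if ch == "_" else re.escape(ch) for ch in pattern]
    return re.compile("".join(parts), re.DOTALL)

def like_search(index, rows, pattern):
    required = set()
    for literal in re.split(r"[%_]", pattern):
        required |= substrings3(literal)
    if required:
        # a matching row must contain every trigram of the literal parts
        candidates = set.intersection(*(index.get(t, set()) for t in required))
    else:
        candidates = set(rows)        # nothing to look up: check every row
    regex = like_to_regex(pattern)    # recheck: LIKE itself is case-sensitive
    return sorted(r for r in candidates if regex.fullmatch(rows[r]))

rows = {1: "foobar", 2: "barfoo", 3: "foo bar", 4: "FOOBAR"}
idx = build_trigram_index(rows)
print(like_search(idx, rows, "%oba%"))   # [1]
```

A pattern with no literal run of three or more characters yields no trigrams, so this sketch falls back to checking every row.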
Three fundamental GIN operations
1. Extract keys from a value to insert or query
2. Index them
System stores the extracted keys in a B-tree, using the opclass's compare function.
3. Determine which rows match, based on the keys present
Refresher: Regular B-tree
advanc: (0,8)
advanc: (0,14)
advanc: (0,22)
advanc: (0,17)
advanc: (0,26)
...
databas: (0,3)
databas: (2,10)
open: (0,11)
postgresql: (0,8)
postgresql: (0,41)
...
GIN on-disk format
Posting list
- A posting list contains pointers to the physical tuples in the table
- Each pointer consists of the page (block) number and the offset within the page
(0,8) (0,14) (0,17) (0,22) (0,26) (0,33) (0,34) (0,35) (0,45) (0,47)(0,48) (1,3) (1,4) (1,6) (1,8)
A posting list can be stored in-line in the entry tree, or as a whole separate B-tree (a posting tree).
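The relationship between the B-tree refresher and GIN's posting lists can be sketched in Python: group the (key, item pointer) pairs by key and sort each group, so each key is stored once followed by its pointers. Only the entries shown on the refresher slide are used; item pointers are modelled as (block, offset) tuples, whose natural tuple ordering matches physical order, and the helper name is hypothetical.

```python
from collections import defaultdict

# (key, item pointer) entries as in the B-tree refresher; item pointers are (block, offset)
BTREE_ENTRIES = [
    ("advanc", (0, 8)), ("advanc", (0, 14)), ("advanc", (0, 22)),
    ("advanc", (0, 17)), ("advanc", (0, 26)),
    ("databas", (0, 3)), ("databas", (2, 10)),
    ("open", (0, 11)),
    ("postgresql", (0, 8)), ("postgresql", (0, 41)),
]

def to_posting_lists(entries):
    """Store each key once, followed by its item pointers in physical order."""
    lists = defaultdict(list)
    for key, tid in entries:
        lists[key].append(tid)
    return {key: sorted(tids) for key, tids in lists.items()}

for key, tids in to_posting_lists(BTREE_ENTRIES).items():
    print(key, tids)
```

Ten B-tree entries collapse into four keys, which is where GIN's advantage on duplicate-heavy data starts.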
Posting tree page format
9.3 format
(0,8) (0,14) (0,17) (0,22) (0,26) (0,33) (0,34) (0,35)(0,45) (0,47) (0,48) (1,3) (1,4) (1,6) (1,8)
Each pointer takes 6 bytes (4 bytes for the block number and 2 for the offset): 90 bytes in total.
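A quick check of the arithmetic, as a Python sketch that packs each pointer the way PostgreSQL's ItemPointerData is laid out (the block number as two 16-bit halves, then a 16-bit offset). Little-endian packing is an arbitrary choice for the sketch.

```python
import struct

TIDS = [(0, 8), (0, 14), (0, 17), (0, 22), (0, 26), (0, 33), (0, 34), (0, 35),
        (0, 45), (0, 47), (0, 48), (1, 3), (1, 4), (1, 6), (1, 8)]

def pack_uncompressed(tids):
    """Pack each item pointer as block-hi, block-lo, offset: three 16-bit fields."""
    return b"".join(struct.pack("<HHH", block >> 16, block & 0xFFFF, offset)
                    for block, offset in tids)

print(len(TIDS), len(pack_uncompressed(TIDS)))   # 15 90
```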
Posting tree page format
9.4 format
(0,8) +6 +3 +5 +4 +7 +1 +1 +10 +2 +1 +2003 +1 +2 +2
Stores the pointers in compressed format, as the difference from the previous item (each pointer is treated as the integer block * 2048 + offset, so (0,48) -> (1,3) is +2003): 21 bytes in total!
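The 9.4 encoding can be sketched in Python. Assumptions, stated plainly: each pointer is mapped to the integer (block << 11) | offset, the first item is kept uncompressed in 6 bytes, and each following difference is written as a varbyte (7 bits per byte, high bit set when more bytes follow). This follows the scheme in PostgreSQL's ginpostinglist.c but leaves out each segment's 2-byte length field. Under this mapping, (0,48) -> (1,3) is 2051 - 48 = 2003, which still fits in two bytes.

```python
TIDS = [(0, 8), (0, 14), (0, 17), (0, 22), (0, 26), (0, 33), (0, 34), (0, 35),
        (0, 45), (0, 47), (0, 48), (1, 3), (1, 4), (1, 6), (1, 8)]

OFFSET_BITS = 11  # assumed: bits reserved for the offset when a pointer becomes an integer

def tid_to_int(block, offset):
    return (block << OFFSET_BITS) | offset

def varbyte(n):
    """7 bits per byte, low-order group first; high bit set means 'more bytes follow'."""
    out = bytearray()
    while n > 0x7F:
        out.append(0x80 | (n & 0x7F))
        n >>= 7
    out.append(n)
    return bytes(out)

def compressed_size(tids):
    ints = [tid_to_int(block, offset) for block, offset in tids]
    deltas = [cur - prev for prev, cur in zip(ints, ints[1:])]
    encoded = b"".join(varbyte(d) for d in deltas)
    return deltas, 6 + len(encoded)   # first item uncompressed: 6 bytes

deltas, size = compressed_size(TIDS)
print(deltas)   # [6, 3, 5, 4, 7, 1, 1, 10, 2, 1, 2003, 1, 2, 2]
print(size)     # 21
```

Thirteen of the fourteen differences fit in one byte each; only the jump to the next page needs two, giving 6 + 13 + 2 = 21 bytes.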
9.4 Posting tree format - btree_gin example
(The btree_gin extension is a "dummy" opclass implementation to emulate a normal B-tree.)
create extension btree_gin;
create table numbers (n int4);
insert into numbers
select g % 10 from generate_series(1, 10000000) g;
create index numbers_btree on numbers (n);
create index numbers_gin on numbers using gin (n);
9.4 Posting tree format - btree_gin example
9.4
postgres=# \di+
List of relations
Schema | Name | ... | Size | ...
--------+---------------+-----+--------+-----
public | numbers_btree | | 214 MB |
public | numbers_gin | | 11 MB |
(2 rows)
9.3
Schema | Name | ... | Size | ...
--------+---------------+-----+--------+-----
public | numbers_btree | | 214 MB |
public | numbers_gin | | 58 MB |
(2 rows)
Wow!
Table 346 MB
B-tree index 214 MB
GIN (9.3) 58 MB
GIN (9.4) 11 MB
New posting list format in 9.4
- Much more compact
- The new code can still read old-format pages
  - pg_upgrade works
  - but you won't get the benefit until you REINDEX
- More expensive to do random updates
  - GIN isn't very fast with random updates anyway...
Recap: Three fundamental GIN operations
1. Extract keys from a value to insert or query
2. Index them
3. Combine matches of several keys efficiently, and determine which items match the overall query
Consistent function
select plainto_tsquery(
'an advanced PostgreSQL open source database');
plainto_tsquery
--------------------------------------------------------
'advanc' & 'postgresql' & 'open' & 'sourc' & 'databas'
(1 row)
select * from foo where col @@ plainto_tsquery(
'an advanced PostgreSQL open source database'
);
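What the consistent function does for this query can be sketched in Python: given the set of query words that a particular item contains, evaluate the query's boolean expression. The tuple-based query representation and the function name are assumptions for illustration; PostgreSQL's tsquery consistent function works on its own internal query format and handles further details omitted here.

```python
from functools import reduce

def consistent(node, present):
    """Decide whether an item matches, given the set of query words it contains.

    node is either a word (str) or a tuple (op, left, right) with op "&" or "|".
    """
    if isinstance(node, str):
        return node in present
    op, left, right = node
    if op == "&":
        return consistent(left, present) and consistent(right, present)
    if op == "|":
        return consistent(left, present) or consistent(right, present)
    raise ValueError(f"unsupported operator: {op!r}")

# plainto_tsquery ANDs all the words together; build that as a left-nested tree
WORDS = ["advanc", "postgresql", "open", "sourc", "databas"]
QUERY = reduce(lambda acc, word: ("&", acc, word), WORDS)

print(consistent(QUERY, set(WORDS)))            # True
print(consistent(QUERY, {"open", "sourc"}))     # False
```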
3. Combine matches efficiently (0/4)
The query returns the following matches from the index:

| advanc | databas | open | postgresql | sourc |
|---|---|---|---|---|
| (0,8) | (0,3) | (0,2) | (0,8) | (0,1) |
| (0,14) | (0,8) | (0,8) | (0,41) | (0,2) |
| (0,17) | (0,43) | (0,30) | | (0,8) |
| (0,22) | (0,47) | (0,33) | | (0,12) |
| (0,26) | (1,32) | (0,36) | | (0,13) |
| (0,33) | | (0,44) | | (0,18) |
| (0,34) | | (0,46) | | (0,19) |
| (0,35) | | (0,56) | | (0,20) |
| (0,45) | | (1,4) | | (0,26) |
| (0,47) | | (1,22) | | (0,34) |
| (0,48) | | (1,24) | | (0,35) |
| (1,3) | | (1,32) | | (0,50) |
| (1,4) | | (1,39) | | (1,1) |
| (1,6) | | | | (1,5) |
| (1,8) | | | | (1,6) |
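The step-by-step walk that follows can be sketched in Python: merge all posting lists in item-pointer order, collect the set of words present for each item, and ask a consistent function whether that item matches. The sketch copies the posting lists from the table; since the query is a pure AND, the stand-in consistent function simply checks that all five words are present. Names are hypothetical.

```python
import heapq
import itertools

POSTINGS = {
    "advanc": [(0, 8), (0, 14), (0, 17), (0, 22), (0, 26), (0, 33), (0, 34), (0, 35),
               (0, 45), (0, 47), (0, 48), (1, 3), (1, 4), (1, 6), (1, 8)],
    "databas": [(0, 3), (0, 8), (0, 43), (0, 47), (1, 32)],
    "open": [(0, 2), (0, 8), (0, 30), (0, 33), (0, 36), (0, 44), (0, 46), (0, 56),
             (1, 4), (1, 22), (1, 24), (1, 32), (1, 39)],
    "postgresql": [(0, 8), (0, 41)],
    "sourc": [(0, 1), (0, 2), (0, 8), (0, 12), (0, 13), (0, 18), (0, 19), (0, 20),
              (0, 26), (0, 34), (0, 35), (0, 50), (1, 1), (1, 5), (1, 6)],
}

def combine(postings, consistent):
    """Walk all posting lists together in item-pointer order, one item at a time."""
    streams = [[(tid, word) for tid in tids] for word, tids in postings.items()]
    matches, trace = [], []
    for tid, group in itertools.groupby(heapq.merge(*streams), key=lambda pair: pair[0]):
        present = {word for _, word in group}   # which query words this item has
        trace.append((tid, present))
        if consistent(present):
            matches.append(tid)
    return matches, trace

all_words = set(POSTINGS)
matches, trace = combine(POSTINGS, lambda present: present == all_words)
print(matches)     # [(0, 8)]
print(trace[:3])   # (0,1), (0,2), (0,3) and the words each contains
```

This approach reads every entry of every list, which is what fast scan later avoids.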
3. Combine matches efficiently (1/4)
(0,1) contains only the word "sourc" -> no match
3. Combine matches efficiently (2/4)
(0,2) contains the words "open" and "sourc" -> no match
3. Combine matches efficiently (3/4)
(0,3) contains only the word "databas" -> no match
3. Combine matches efficiently (4/4)
(0,8) contains all the words -> match
Fast Scan
Instead of scanning through the posting lists of all the keywords, only scan through the list with the fewest items, and skip the other lists ahead to the next possible match.
- Big improvement for "frequent-term AND rare-term" style queries
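Fast scan example
(0,8) contains all the words -> match

| postgresql | databas | open | advanc | sourc |
|---|---|---|---|---|
| (0,8) | (0,3) | (0,2) | (0,8) | (0,1) |
| (0,41) | (0,8) | (0,8) | (0,14) | (0,2) |
| | (0,43) | (0,30) | (0,17) | (0,8) |
| | (0,47) | (0,33) | (0,22) | (0,12) |
| | (1,32) | (0,36) | (0,26) | (0,13) |
| | | (0,44) | (0,33) | (0,18) |
| | | (0,46) | (0,34) | (0,19) |
| | | (0,56) | (0,35) | (0,20) |
| | | (1,4) | (0,45) | (0,26) |
| | | (1,22) | (0,47) | (0,34) |
| | | (1,24) | (0,48) | (0,35) |
| | | (1,32) | (1,3) | (0,50) |
| | | (1,39) | (1,4) | (1,1) |
| | | | (1,6) | (1,5) |
| | | | (1,8) | (1,6) |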
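A simplified Python sketch of the fast-scan idea for a pure-AND query: drive the scan from the shortest posting list and use binary search (`bisect_left`) to skip each other list directly to the next possible match. This illustrates the skipping principle only; PostgreSQL 9.4's actual implementation is more general and is driven by the opclass's consistent logic rather than a hard-coded AND. The function name is hypothetical.

```python
from bisect import bisect_left

POSTINGS = {
    "postgresql": [(0, 8), (0, 41)],
    "databas": [(0, 3), (0, 8), (0, 43), (0, 47), (1, 32)],
    "open": [(0, 2), (0, 8), (0, 30), (0, 33), (0, 36), (0, 44), (0, 46), (0, 56),
             (1, 4), (1, 22), (1, 24), (1, 32), (1, 39)],
    "advanc": [(0, 8), (0, 14), (0, 17), (0, 22), (0, 26), (0, 33), (0, 34), (0, 35),
               (0, 45), (0, 47), (0, 48), (1, 3), (1, 4), (1, 6), (1, 8)],
    "sourc": [(0, 1), (0, 2), (0, 8), (0, 12), (0, 13), (0, 18), (0, 19), (0, 20),
              (0, 26), (0, 34), (0, 35), (0, 50), (1, 1), (1, 5), (1, 6)],
}

def fast_scan_and(postings):
    """Intersect posting lists, driving the scan from the rarest list."""
    lists = sorted(postings.values(), key=len)
    rarest, others = lists[0], lists[1:]
    pos = [0] * len(others)              # current position in each other list
    matches, probes = [], 0
    for candidate in rarest:
        for i, lst in enumerate(others):
            # skip directly to the first item >= candidate instead of reading every item
            pos[i] = bisect_left(lst, candidate, pos[i])
            probes += 1
            if pos[i] == len(lst) or lst[pos[i]] != candidate:
                break                    # candidate missing from this list: no match
        else:
            matches.append(candidate)    # found in every list
    return matches, probes

print(fast_scan_and(POSTINGS))   # ([(0, 8)], 5)
```

On the example data it finds (0,8) with 5 skip-ahead lookups, instead of visiting all 50 posting-list entries one by one.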
Summary: Improvements in 9.4
More compact posting list format
- 2x-10x smaller indexes, yay!
Fast scan
- Big speedup for queries with some frequent and some rare items
Thanks to Alexander Korotkov for these improvements!
Final GIN tip
GIN indexes are efficient at storing duplicates
- Use a GIN index with the btree_gin extension for status fields etc.
postgres=# \di+
List of relations
Schema | Name | ... | Size | ...
--------+---------------+-----+--------+-----
public | numbers_btree | | 214 MB |
public | numbers_gin | | 11 MB |
(2 rows)
Questions?