PostgreSQL, the big the fast and the (NOSQL on) Acid

38
PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid Federico Campoli 21 May 2015 Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 1 / 34

Transcript of PostgreSQL, the big the fast and the (NOSQL on) Acid

Page 1: PostgreSQL, the big the fast and the (NOSQL on) Acid

PostgreSQL, The Big, The Fastand The (NOSQL on ) Acid

Federico Campoli

21 May 2015

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 1 / 34

Page 2: PostgreSQL, the big the fast and the (NOSQL on) Acid

Table of contents

1 The Big

2 The Fast

3 The (NOSQL on) Acid

4 Wrap up

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 2 / 34

Page 3: PostgreSQL, the big the fast and the (NOSQL on) Acid

Table of contents

1 The Big

2 The Fast

3 The (NOSQL on) Acid

4 Wrap up

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 3 / 34

Page 4: PostgreSQL, the big the fast and the (NOSQL on) Acid

The Big

Image by Caitlin - https://www.flickr.com/photos/lizard queen

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 4 / 34

Page 5: PostgreSQL, the big the fast and the (NOSQL on) Acid

PostgreSQL, an history of excellence

Created at Berkeley in 1982 by database’s legend Prof. Stonebraker

In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter

In the 1996 becomes an Open Source project.

The project’s name changes in PostgreSQL

Fully ACID compliant

High performance in read/write with the MVCC

Tablespaces

Runs on almost any unix flavour

From the version 8.0 is native on *cough* MS Windows *cough*

HA with hot standby and streaming replication

Heterogeneous federation

Procedural languages (pl/pgsql, pl/python, pl/perl...)

Support for NOSQL features like HSTORE and JSON

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 5 / 34

Page 6: PostgreSQL, the big the fast and the (NOSQL on) Acid

PostgreSQL, an history of excellence

Created at Berkeley in 1982 by database’s legend Prof. Stonebraker

In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter

In the 1996 becomes an Open Source project.

The project’s name changes in PostgreSQL

Fully ACID compliant

High performance in read/write with the MVCC

Tablespaces

Runs on almost any unix flavour

From the version 8.0 is native on *cough* MS Windows *cough*

HA with hot standby and streaming replication

Heterogeneous federation

Procedural languages (pl/pgsql, pl/python, pl/perl...)

Support for NOSQL features like HSTORE and JSON

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 5 / 34

Page 7: PostgreSQL, the big the fast and the (NOSQL on) Acid

PostgreSQL, an history of excellence

Created at Berkeley in 1982 by database’s legend Prof. Stonebraker

In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter

In the 1996 becomes an Open Source project.

The project’s name changes in PostgreSQL

Fully ACID compliant

High performance in read/write with the MVCC

Tablespaces

Runs on almost any unix flavour

From the version 8.0 is native on *cough* MS Windows *cough*

HA with hot standby and streaming replication

Heterogeneous federation

Procedural languages (pl/pgsql, pl/python, pl/perl...)

Support for NOSQL features like HSTORE and JSON

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 5 / 34

Page 8: PostgreSQL, the big the fast and the (NOSQL on) Acid

PostgreSQL, an history of excellence

Created at Berkeley in 1982 by database’s legend Prof. Stonebraker

In the 1994 Andrew Yu and Jolly Chen added the SQL interpreter

In the 1996 becomes an Open Source project.

The project’s name changes in PostgreSQL

Fully ACID compliant

High performance in read/write with the MVCC

Tablespaces

Runs on almost any unix flavour

From the version 8.0 is native on *cough* MS Windows *cough*

HA with hot standby and streaming replication

Heterogeneous federation

Procedural languages (pl/pgsql, pl/python, pl/perl...)

Support for NOSQL features like HSTORE and JSON

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 5 / 34

Page 9: PostgreSQL, the big the fast and the (NOSQL on) Acid

Development

Old ugly C language

New development cycle starts usually in June

New version released usually by the end of the year

At least 4 LTS versions

Can be extended using shared libraries

Extensions (from the version9.1)

BSD like license

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 6 / 34

Page 10: PostgreSQL, the big the fast and the (NOSQL on) Acid

Limits

Database size. No limits.

Table size, 32 TB

Row size 1.6 TB

Rows in table. No limits.

Fields in table 250 - 1600 depending on data type.

Tables in a database. No limits.

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 7 / 34

Page 11: PostgreSQL, the big the fast and the (NOSQL on) Acid

Data types

Alongside the general purpose data types PostgreSQL have some exotic types.

Range (integers, date)

Geometric (points, lines etc.)

Network addresses

XML

JSON

HSTORE (extension)

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 8 / 34

Page 12: PostgreSQL, the big the fast and the (NOSQL on) Acid

Table of contents

1 The Big

2 The Fast

3 The (NOSQL on) Acid

4 Wrap up

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 9 / 34

Page 13: PostgreSQL, the big the fast and the (NOSQL on) Acid

The Fast

Image by Hein Waschefort -http://commons.wikimedia.org/wiki/User:Hein waschefort

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 10 / 34

Page 14: PostgreSQL, the big the fast and the (NOSQL on) Acid

Page layout

A PostgreSQL’s data file is an array of fixed length blocks called pages. Thedefault size is 8kb.

Figure : Index page

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 11 / 34

Page 15: PostgreSQL, the big the fast and the (NOSQL on) Acid

Page layout

Each page have an header used to enforce the durability, and the optional page’schecksum. There are some pointers used to track the free space inside the page.

Figure : Page headerFederico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 12 / 34

Page 16: PostgreSQL, the big the fast and the (NOSQL on) Acid

Tuple layout

Just after the header there is a list of pointers to the physical tuples stored in thepage’s end. Each tuple is and array of raw data, called datum. The nature of thisdatum is unknown to the postgres process. The datum becomes the data typewhen PostgreSQL loads the page in memory. This requires a system cataloguelook up.

Figure : Tuple structure

The tuple’s header is used in the MVCC.Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 13 / 34

Page 17: PostgreSQL, the big the fast and the (NOSQL on) Acid

The magic of the MVCC

Any operation in PostgreSQL happens through transactions.

By default when a single statement is successfully completed the databasecommits automatically the transaction.

It’s possible to wrap multiple statements in a single transaction using thekeywords [BEGIN;]....... [COMMIT; ROLLBACK]

The minimal possible level the transaction isolation is READ COMMITTED.

PostgreSQL from 9.2 supports the snapshot export to other sessions.

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 14 / 34

Page 18: PostgreSQL, the big the fast and the (NOSQL on) Acid

Consistency

The PostgreSQL’s consistency is achieved using the Multi Version ConcurrencyControl (MVCC).

Its logic is theoretically simple.

A 4 byte unsigned integer called xid is incremented by 1 and assigned to thecurrent transaction.

The committed xid which value is lesser than the current xid are in the pastand then visible to the current transaction.

The xid greater than the current xid are in the future and then invisible tothe current session.

The commit status is managed in the $PGDATA using the directory pg clogand pg serial where small 8k files are used to tracks the transaction statuses.

The the xid match is performed using the fields t xmin,t xmax in the tuple’sheader.

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 15 / 34

Page 19: PostgreSQL, the big the fast and the (NOSQL on) Acid

Tuple’s header

t xmin is set to the xid generated at tuple insertt xmax is set to the xid generated at tuple deletet cid is used to track the command’s sequence inside the same transaction

t cid basically solves the Halloween problem, where an update operation causes achange in the physical location of a row, potentially allowing the row to be visitedmore than once during the operation.

But there’s something missing, isn’t it?

Where is the field to store the UPDATE xid?

Figure : Tuple structure

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 16 / 34

Page 20: PostgreSQL, the big the fast and the (NOSQL on) Acid

Tuple’s header

t xmin is set to the xid generated at tuple insertt xmax is set to the xid generated at tuple deletet cid is used to track the command’s sequence inside the same transaction

t cid basically solves the Halloween problem, where an update operation causes achange in the physical location of a row, potentially allowing the row to be visitedmore than once during the operation.But there’s something missing, isn’t it?

Where is the field to store the UPDATE xid?

Figure : Tuple structure

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 16 / 34

Page 21: PostgreSQL, the big the fast and the (NOSQL on) Acid

There’s no such thing like an update

Well, PostgreSQL actually NEVER performs an update.

When an UPDATE statement is issued the updated rows are inserted with t xminset to the current XID value.

The old rows versions are marked as dead writing the t xmax field with thecurrent transaction id.

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 17 / 34

Page 22: PostgreSQL, the big the fast and the (NOSQL on) Acid

Dead tuples and VACUUM

A dead tuple is left in place for any transaction that should see it. This addsoverhead to any I/O operation.

VACUUM clears the dead tuples

VACUUM is designed to have the minimal impact on the database normalactivity

VACUUM removes only dead tuples no longer visible to the open transactions

Running VACUUM on the entire cluster at least every 2 billions transactionsis compulsory

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 18 / 34

Page 23: PostgreSQL, the big the fast and the (NOSQL on) Acid

Failure is not an option

XID is a 4 byte unsigned integer.

Every 4 billions transactions the value wraps

PostgreSQL uses the modulo − 231 comparison method

For each value 2 billions XID are in the future and 2 billions are in the past

When a xid’s age becomes too close to 2 billions VACUUM freezes the xminvalue to an hardcoded xid forever in the past

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 19 / 34

Page 24: PostgreSQL, the big the fast and the (NOSQL on) Acid

Failure is not an option

If for any reason an xid reaches 10 millions transactions from the wraparoundfailure the database starts emitting scary messages

WARNING: database "mydb" must be vacuumed within 177009986 transactionsHINT: To avoid a database shutdown, execute a database-wide VACUUM in "mydb".

If a xid’s age reaches 1 million transactions from the wraparound failurePostgreSQL shuts down and doesn’t restarts. When this happens the only optionto get back the cluster running is to use the single user mode and perform acluster’s VACUUM.

Anyway, the autovacuum daemon, even if is turned off, take care of theproblematic relations long before this catastrophic scenario happens.

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 20 / 34

Page 25: PostgreSQL, the big the fast and the (NOSQL on) Acid

Table of contents

1 The Big

2 The Fast

3 The (NOSQL on) Acid

4 Wrap up

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 21 / 34

Page 26: PostgreSQL, the big the fast and the (NOSQL on) Acid

The (NOSQL on) Acid

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 22 / 34

Page 27: PostgreSQL, the big the fast and the (NOSQL on) Acid

JSON

JSON - JavaScript Object Notation

The version 9.2 adds JSON as native data type

The version 9.3 adds the support functions for JSON

JSON is stored as text

JSON is parsed and validated on the fly

The 9.4 adds JSONB (binary) data type

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 23 / 34

Page 28: PostgreSQL, the big the fast and the (NOSQL on) Acid

JSON

JSON - ExamplesFrom record to JSON

postgres=# SELECT row_to_json(ROW(1,’foo’));row_to_json

---------------------{"f1":1,"f2":"foo"}

(1 row)

Expanding JSON into key to value elements

postgres=# SELECT * from json_each(’{"a":"foo", "b":"bar"}’);

key | value-----+-------a | "foo"b | "bar"

(2 rows)

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 24 / 34

Page 29: PostgreSQL, the big the fast and the (NOSQL on) Acid

HSTORE

HSTORE is a custom data type used to store key to value items

Is an extension

Data stored as text

A shared library does the magic transforming the datum in HSTORE

Is similar to JSON without nested elements

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 25 / 34

Page 30: PostgreSQL, the big the fast and the (NOSQL on) Acid

HSTORE

HSTORE - ExamplesFrom record to HSTORE

postgres=# SELECT hstore(ROW(1,2));hstore

----------------------"f1"=>"1", "f2"=>"2"

(1 row)

HSTORE expansion to key to value elements

postgres=# SELECT * FROM each(’a=>1,b=>2’);key | value

-----+-------a | 1b | 2

(2 rows)

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 26 / 34

Page 31: PostgreSQL, the big the fast and the (NOSQL on) Acid

JSON and HSTORE

There is a subtile difference between HSTORE and JSON. HSTORE is not anative data type.

The JSON is a native data type and the conversion happens inside the postgresprocess.

The HSTORE requires the access to the shared library.

Because the conversion from the raw datum happens for each tuple loaded in theshared buffer can affect the performance’s overall.

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 27 / 34

Page 32: PostgreSQL, the big the fast and the (NOSQL on) Acid

JSONB

Because JSON is parsed and validated on the fly and and this can be a bottleneck.

The new JSONB introduced with PostgreSQL 9.4 is parsed, validated andtransformed at insert/update’s time. The access is then faster than the plainJSON but the storage cost can be higher.The functions available for JSON are working with JSONB

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 28 / 34

Page 33: PostgreSQL, the big the fast and the (NOSQL on) Acid

Table of contents

1 The Big

2 The Fast

3 The (NOSQL on) Acid

4 Wrap up

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 29 / 34

Page 34: PostgreSQL, the big the fast and the (NOSQL on) Acid

Wrap up

Schema less data are useful. They are flexible and powerful.

Ignoring the MVCC implementation lead to disasters.

The lack of horizontal scalability in PostgreSQL can be a serious problem.

An interesting project for a distributed cluster is PostgreSQL XL -http://www.postgres-xl.org/Unfortunately at the date of writing is still in RC

Never forget PostgreSQL is a RDBMS

Get a DBA on board

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 30 / 34

Page 35: PostgreSQL, the big the fast and the (NOSQL on) Acid

Questions

Questions?

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 31 / 34

Page 36: PostgreSQL, the big the fast and the (NOSQL on) Acid

Contacts

Twitter: 4thdoctor scarf

PostgreSQL’s blog: http://www.pgdba.co.uk

PostgreSQL Book:http://www.slideshare.net/FedericoCampoli/postgresql-dba-01

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 32 / 34

Page 37: PostgreSQL, the big the fast and the (NOSQL on) Acid

License and copyright

This presentation is licensed under the terms of the Creative CommonsAttribution NonCommercial ShareAlike 4.0

http://creativecommons.org/licenses/by-nc-sa/4.0/

The elephant photo is copyright by Caitlin -https://www.flickr.com/photos/lizard queen

The cheetah photo is copyright by Hein Waschefort -http://commons.wikimedia.org/wiki/User:Hein waschefort

The elephant logos are copyright of the PostgreSQL Global DevelopmentGroup - http://www.postgresql.org/

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 33 / 34

Page 38: PostgreSQL, the big the fast and the (NOSQL on) Acid

PostgreSQL, The Big, The Fastand The (NOSQL on ) Acid

Federico Campoli

21 May 2015

Federico Campoli PostgreSQL, The Big, The Fast and The (NOSQL on ) Acid 21 May 2015 34 / 34