Evolution of Fault Tolerance in PostgreSQL

Gulcin Yildirim
156398, MSc @ Tallinn University of Technology
IAF0530
18 April 2016, Tallinn

http://www.postgresql.org/

https://ee.linkedin.com/in/gulcinyildirim

http://pld.ttu.ee/IAF0530/

http://www.ttu.ee/studying/masters/masters_programmes/computer-and-systems-engineering/

Agenda

- Overview of PostgreSQL Database
- PostgreSQL Fault Tolerance: WAL
- What is Replication?
- Replication Methods for PostgreSQL
- Physical Replication
- Streaming Replication and WAL
- Managing Timeline Issues: pg_rewind
- Trigger-based Replication: Londiste, Slony
- Logical Decoding: BDR and pglogical

PostgreSQL in a nutshell (9.5)

- Advanced open source database system
- SQL standards compliance up to SQL:2011
- Supports different data models: relational, document (JSON and XML), and key/value (hstore extension)
- Highly extensible
- Fully ACID-compliant (atomicity, consistency, isolation, durability)
- Allows physical and logical replication
- Built-in physical and logical backup solution
- Synchronous and asynchronous transactions
- PITR (Point-in-time Recovery)
- MVCC (Multiversion concurrency control)

http://www.postgresql.org/about/

PostgreSQL is robust!

All actions on the database are performed within transactions, protected by a transaction log that will perform automatic crash recovery in case of software failure.

Databases may be optionally created with data block checksums to help diagnose hardware faults. Multiple backup mechanisms exist, with full and detailed PITR, in case of the need for detailed recovery. A variety of diagnostic tools are available.

Database replication is supported natively.

Synchronous Replication can provide greater than "5 Nines" (99.999 percent) availability and data protection, if properly configured and managed.

http://www.postgresql.org/about/

WAL (Write-ahead Log)

The WAL consists of a series of binary files written to the pg_xlog subdirectory of the PostgreSQL data directory.

Each change made to the database is recorded first in WAL, hence the name "write-ahead" log, as a synonym of "transaction log". When a transaction commits, the default (and safe) behaviour is to force the WAL records to disk.

Should PostgreSQL crash, the WAL will be replayed, which returns the database to the point of the last committed transaction, and thus ensures the durability of any database changes.
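The write-ahead rule and crash replay described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class `ToyDatabase` and its fields are hypothetical names, the "WAL" is a Python list rather than binary files in pg_xlog, and nothing here reflects PostgreSQL's real record format.

```python
class ToyDatabase:
    """Toy model of write-ahead logging; not PostgreSQL's actual on-disk format."""

    def __init__(self):
        self.wal = []          # durable: survives a crash
        self.data_files = {}   # durable, but only updated lazily
        self.buffers = {}      # volatile: lost on crash

    def commit(self, key, value):
        # Write-ahead rule: the log record is made durable first.
        self.wal.append((key, value))
        # The change itself only reaches memory; data files are written later.
        self.buffers[key] = value

    def crash(self):
        # Everything held in memory is gone.
        self.buffers = {}

    def recover(self):
        # Start from the data files and replay every WAL record on top.
        self.buffers = dict(self.data_files)
        for key, value in self.wal:
            self.buffers[key] = value


db = ToyDatabase()
db.commit("balance:alice", 100)
db.commit("balance:alice", 70)
db.crash()
db.recover()
print(db.buffers)
```

Even though the data files were never written in this run, replaying the log restores the last committed value.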

Transaction? Commit?

Transactions are a fundamental concept of all database systems. The essential point of a transaction is that it bundles multiple steps into a single, all-or-nothing operation.

The intermediate states between the steps are not visible to other concurrent transactions, and if some failure occurs that prevents the transaction from completing, then none of the steps affect the database at all. (PostgreSQL does not allow dirty reads.)

Database changes themselves aren't written to disk at transaction commit; only the WAL records are. The changes are written to disk sometime later by the background writer on a well-tuned server.
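The all-or-nothing behaviour can be tried out in a few lines. The sketch below uses Python's built-in `sqlite3` module with an in-memory SQLite database purely as a stand-in so that it runs without a server; the `accounts` table and the `transfer` helper are hypothetical, and SQLite is not PostgreSQL, although the transactional principle shown is the same.

```python
import sqlite3

# SQLite (in memory) is only a stand-in so the sketch runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount, fail_midway=False):
    # "with conn" commits if the block succeeds and rolls back if it raises.
    with conn:
        conn.execute("UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src))
        if fail_midway:
            raise RuntimeError("simulated failure between the two steps")
        conn.execute("UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst))


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


try:
    transfer(conn, "alice", "bob", 30, fail_midway=True)
except RuntimeError:
    pass
after_failure = balances(conn)   # the half-finished transfer left no trace

transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
print(after_failure, after_success)
```

The failed transfer debits nothing, because the first UPDATE is rolled back together with the rest of the transaction.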

Checkpoint

Crash recovery replays the WAL, but from what point does it start to recover?

Recovery starts from points in the WAL known as checkpoints. The duration of crash recovery depends on the number of changes in the transaction log since the last checkpoint. A checkpoint is a known safe starting point for recovery, since it guarantees that all the previous changes to the database have already been written to disk.

A checkpoint can be either immediate or scheduled. Immediate checkpoints are triggered by some action of a superuser, such as the CHECKPOINT command; scheduled checkpoints are decided automatically by PostgreSQL.
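Why a checkpoint shortens recovery can be seen in a small simulation. This is a sketch under the same toy assumptions as any in-memory model: `CheckpointingDatabase` and its fields are hypothetical names, and the checkpoint is reduced to "flush everything, remember the WAL position".

```python
class CheckpointingDatabase:
    """Toy model: a checkpoint flushes changes and marks a safe WAL position."""

    def __init__(self):
        self.wal = []                 # durable log of (key, value) records
        self.data_files = {}          # durable data, current as of the last checkpoint
        self.buffers = {}             # volatile in-memory state
        self.checkpoint_position = 0  # index of the first WAL record to replay

    def commit(self, key, value):
        self.wal.append((key, value))
        self.buffers[key] = value

    def checkpoint(self):
        # Flush all changes made so far, then remember where recovery may start.
        self.data_files.update(self.buffers)
        self.checkpoint_position = len(self.wal)

    def crash_and_recover(self):
        # Memory is lost; rebuild it from the data files plus the WAL tail.
        self.buffers = dict(self.data_files)
        tail = self.wal[self.checkpoint_position:]
        for key, value in tail:
            self.buffers[key] = value
        return len(tail)              # number of records replayed


db = CheckpointingDatabase()
for i in range(1000):
    db.commit(f"row{i}", i)
db.checkpoint()
db.commit("row0", -1)
db.commit("row1", -2)
replayed = db.crash_and_recover()
print(replayed, db.buffers["row0"], db.buffers["row999"])
```

Only the two records written after the checkpoint are replayed, while the thousand earlier changes are read back from the data files.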

PostgreSQL Replication

Database replication is the term we use to describe the technology used to maintain a copy of a set of data on a remote system.

Postgres Replication History

PostgreSQL 7.x (~2000)
- Replication should not be part of core Postgres
- Londiste, Slony (trigger-based logical replication)

PostgreSQL 8.0 (2005)
- Point-In-Time Recovery (WAL)

PostgreSQL 9.0 (2010)
- Streaming Replication (physical)

PostgreSQL 9.4 (2014)
- Logical Decoding (changeset extraction)

Physical Replication

The existing replication is more properly known as Physical Streaming Replication since we are streaming a series of physical changes from one node to another. That means that when we insert a row into a table we generate change records for the insert plus all of the index entries.

When we VACUUM a table we also generate change records.

Also, Physical Streaming Replication records all changes at the byte/block level, making it very hard to do anything other than just replay everything.

WAL over network from master to standby

- Sending files: scp, rsync, ftp
- Streaming changes: using internal protocol (sender and receiver processes)

Standby Modes

Warm Standby
- Can be activated immediately, but cannot perform useful work until activated

Hot Standby
- Node is already active
- Read-only queries only

Multi-Master
- All nodes can perform read/write work

Warm Standby

[Figure: Warm Standby]

Hot Standby

[Figure: Hot Standby]

WAL and Replication

WAL Level    Suitable For
minimal      crash recovery
replica      physical replication, file-based archiving
logical      logical replication

Failover and Switchover

In single-master replication, if the master dies, one of the standbys must take its place (promotion). Otherwise, we will not be able to accept new write transactions. Thus, the designations master and standby are just roles that any node can take at some point. To move the master role to another node, we perform a procedure named Switchover.

If the master dies and does not recover, then the more severe role change is known as a Failover. In many ways, these can be similar, but it helps to use different terms for each event.

Timelines

[Figure: Failover. Timelines TL1 and TL2; Master (old master), Standby (new master)]

- There are outstanding changes in the old master
- Timeline increase represents new history of changes
- Changes from the old timeline can't be replayed on the servers that switched to the new timeline
- The old master can't follow the new master

Timelines

[Figure: Switchover. Timelines TL1 and TL2; Master (old master)]

- There are no outstanding changes in the old master
- Timeline increase represents new history of changes
- The old master can become standby for the new master

pg_rewind (9.5)

[Figure: Timelines TL1 and TL2; Master (old master)]

- Outstanding changes are removed using data from the new master
- The old master can follow the new master
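The timeline problem and the idea behind rewinding can be modelled with plain lists. This is a conceptual sketch only: histories are lists of hypothetical `(timeline, change)` tuples and the function names are made up for illustration. The real pg_rewind tool works on data blocks and WAL files, not on record lists like these.

```python
def divergence_point(old_history, new_history):
    """Return the length of the common prefix of the two histories."""
    n = 0
    for old_rec, new_rec in zip(old_history, new_history):
        if old_rec != new_rec:
            break
        n += 1
    return n


def can_follow(own_history, master_history):
    # A standby can only follow a master whose history extends its own.
    return master_history[:len(own_history)] == own_history


def rewind(old_history, new_history):
    """Drop the outstanding changes made after the histories diverged."""
    return old_history[:divergence_point(old_history, new_history)]


def catch_up(own_history, master_history):
    """Replay everything the master has beyond our own position."""
    return own_history + master_history[len(own_history):]


# The old master kept writing "c" on TL1 after the standby was promoted to TL2.
old_master = [("TL1", "a"), ("TL1", "b"), ("TL1", "c")]
new_master = [("TL1", "a"), ("TL1", "b"), ("TL2", "d"), ("TL2", "e")]

before = can_follow(old_master, new_master)
rewound = rewind(old_master, new_master)
after = can_follow(rewound, new_master)
resynced = catch_up(rewound, new_master)
print(before, rewound, after, resynced == new_master)
```

Before the rewind the old master cannot follow, because its record "c" is not part of the new master's history; once that outstanding change is removed it can replay TL2 as an ordinary standby.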

Synchronous commit

By default, PostgreSQL implements asynchronous replication, where data is streamed out whenever convenient for the server. As we've seen, this can mean data loss in case of failover. It's possible to ask Postgres to require one (or more) standbys to acknowledge replication of the data prior to commit; this is called synchronous replication (synchronous commit).

With synchronous replication, the replication delay directly affects the elapsed time of transactions on the master. With asynchronous replication, the master may continue at full speed.

Synchronous replication guarantees that data is written to at least two nodes before the user or application is told that a transaction has committed.

The user can select the commit mode of each transaction, so that it is possible to have both synchronous and asynchronous commit transactions running concurrently.

This allows flexible trade-offs between performance and certainty of transaction durability.
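The trade-off between the two commit modes can be illustrated with a toy master and standby. The classes below are hypothetical and model only the ordering of events, not network delay or the real replication protocol. In PostgreSQL itself the per-transaction choice is made through the `synchronous_commit` setting; here it is reduced to a boolean argument.

```python
class Standby:
    def __init__(self):
        self.received = []


class Master:
    def __init__(self, standby):
        self.standby = standby
        self.local = []      # transactions committed locally
        self.unsent = []     # committed, but not yet streamed to the standby

    def commit(self, txn, synchronous):
        self.local.append(txn)
        self.unsent.append(txn)
        if synchronous:
            # Do not return until the standby has everything up to this commit.
            self.stream()
        # Asynchronous commits return at once; streaming happens later.

    def stream(self):
        # The log is a single ordered stream, so earlier changes go out too.
        self.standby.received.extend(self.unsent)
        self.unsent = []


standby = Standby()
master = Master(standby)
master.commit("page_view_1", synchronous=False)
master.commit("payment", synchronous=True)
master.commit("page_view_2", synchronous=False)

# The master dies here, before the next streaming round: failover to the standby.
survived = list(standby.received)
lost = list(master.unsent)
print(survived, lost)
```

The synchronously committed payment is guaranteed to be on the standby, while the last asynchronous commit is lost in the failover; that is the durability that is traded for speed.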

Logical Replication

Unlike physical replication, which captures changes to the raw data on disk, logical replication captures the logical changes to the individual records in the database and replicates those.

This allows for more complex replication topologies than master and standby and also allows for partial replication of the database (selective replication).

The logical records work across major releases, so we can use this to upgrade from one release to another.

There are two basic approaches to logical replication: trigger-based and changeset extraction (called logical decoding in PostgreSQL).

Trigger-based Replication

- Slony (~2004), Londiste (~2007)
- Predates the physical replication (as a result of the "no replication in core" philosophy)
- Uses triggers to capture the changes to individual tables
- Increases the amount of work needed to be done for each write
- Uses table(s) as a queue
- Duplicates all writes
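The mechanism in the list above (triggers that copy each change into a queue table, which is then drained and applied elsewhere) can be sketched with Python's built-in `sqlite3` module. SQLite is used only so the example runs without a server; the table, trigger and function names are hypothetical, and this is not how Slony or Londiste are actually implemented.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    -- Queue table: every captured change becomes one extra row (a second write).
    CREATE TABLE change_queue (
        seq INTEGER PRIMARY KEY AUTOINCREMENT,
        op TEXT, id INTEGER, name TEXT
    );
    CREATE TRIGGER items_ins AFTER INSERT ON items BEGIN
        INSERT INTO change_queue (op, id, name) VALUES ('I', NEW.id, NEW.name);
    END;
    CREATE TRIGGER items_upd AFTER UPDATE ON items BEGIN
        INSERT INTO change_queue (op, id, name) VALUES ('U', NEW.id, NEW.name);
    END;
    CREATE TRIGGER items_del AFTER DELETE ON items BEGIN
        INSERT INTO change_queue (op, id, name) VALUES ('D', OLD.id, OLD.name);
    END;
""")

replica = sqlite3.connect(":memory:")
replica.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")


def apply_queue(source, replica):
    """Drain the queue in order and apply each change to the replica."""
    rows = source.execute(
        "SELECT seq, op, id, name FROM change_queue ORDER BY seq"
    ).fetchall()
    for seq, op, item_id, name in rows:
        if op == "I":
            replica.execute("INSERT INTO items VALUES (?, ?)", (item_id, name))
        elif op == "U":
            replica.execute("UPDATE items SET name = ? WHERE id = ?", (name, item_id))
        else:
            replica.execute("DELETE FROM items WHERE id = ?", (item_id,))
        source.execute("DELETE FROM change_queue WHERE seq = ?", (seq,))
    source.commit()
    replica.commit()
    return len(rows)


source.execute("INSERT INTO items VALUES (1, 'apple')")
source.execute("INSERT INTO items VALUES (2, 'pear')")
source.execute("UPDATE items SET name = 'green apple' WHERE id = 1")
source.execute("DELETE FROM items WHERE id = 2")
source.commit()

applied = apply_queue(source, replica)
replica_rows = replica.execute("SELECT id, name FROM items").fetchall()
print(applied, replica_rows)
```

Four writes to `items` produced four additional writes to `change_queue`, which is the write duplication the slide refers to.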

Logical Decoding a.k.a. Changeset Extraction

- Extracts information from the Write-Ahead Log into logical changes (INSERT/UPDATE/DELETE)
- Per row and commit ordered
- No write amplification
- C API for output plugin
- No DDL
- SQL Interface
- Streaming Interface
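"Per row and commit ordered" means that row changes from concurrent transactions, which are interleaved in the log, are handed out grouped by transaction and in the order the transactions committed. The following sketch shows that regrouping on a made-up record format; the tuple layout and the `decode` function are assumptions for illustration and do not correspond to PostgreSQL's WAL format or output plugin API.

```python
from collections import defaultdict


def decode(wal_records):
    """Turn an interleaved log into per-transaction row changes.

    Each record is (xid, kind, payload), where kind is one of "INSERT",
    "UPDATE", "DELETE", "COMMIT" or "ABORT" (a made-up toy format).
    """
    pending = defaultdict(list)   # xid -> row changes seen so far
    for xid, kind, payload in wal_records:
        if kind == "COMMIT":
            # Emit the whole transaction at its commit point.
            yield xid, pending.pop(xid, [])
        elif kind == "ABORT":
            pending.pop(xid, None)   # rolled-back work is never emitted
        else:
            pending[xid].append((kind, payload))


wal = [
    (101, "INSERT", {"id": 1, "name": "apple"}),
    (102, "INSERT", {"id": 2, "name": "pear"}),
    (103, "DELETE", {"id": 9}),
    (102, "COMMIT", None),
    (101, "UPDATE", {"id": 1, "name": "green apple"}),
    (103, "ABORT", None),
    (101, "COMMIT", None),
]

decoded = list(decode(wal))
for xid, changes in decoded:
    print(xid, changes)
```

Transaction 102 is emitted before 101 because it committed first, even though 101 started earlier, and the aborted transaction 103 never appears in the output.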

Logical Streaming Replication

- Built on top of logical decoding
- Uses the same transport mechanism as streaming replication (sender & apply)
- Allows for synchronous commit
- Currently available as extensions: BDR and pglogical
- Performs better than trigger-based replication

pglogical

- Publish / Subscribe model
- Multiple upstream (publisher) servers into a single subscriber
- Replicates transactions in commit order
- Selective Replication
- Online Upgrade
- Data Transport
  - Data integration
  - Streaming changes to analytical database
  - Master configuration data management
  - ...
- Optionally synchronous (mix)

http://2ndquadrant.com/en/resources/pglogical/

BDR (Bi-directional Replication)

- The project is used for feeding logical replication development in PostgreSQL
- Multi-master
- Asynchronous
- Optimistic conflict detection (after commit)
- Does not prevent concurrent writes
- Conflict resolution:
  - Happens automatically
  - Last update wins by default
  - Custom resolution triggers
- Eventually consistent (cluster)
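Optimistic conflict detection with a "last update wins" default can be sketched as follows. This is a simplified model and not BDR's implementation: the `Version` and `Node` classes are hypothetical, timestamps are passed in by hand, and breaking ties by node name is an assumption made only so the toy resolver is deterministic. The `resolve` parameter stands in for a custom resolution hook.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    value: str
    commit_ts: float   # commit timestamp on the originating node
    node: str          # origin node name, used only as a tie-breaker (assumption)


def last_update_wins(local, incoming):
    """Pick the version with the later commit timestamp."""
    return max(local, incoming, key=lambda v: (v.commit_ts, v.node))


class Node:
    def __init__(self, name):
        self.name = name
        self.rows = {}

    def write(self, key, value, ts):
        # Local writes always succeed; nothing is checked before commit.
        version = Version(value, ts, self.name)
        self.rows[key] = version
        return key, version

    def apply_remote(self, key, version, resolve=last_update_wins):
        # Conflicts are only detected here, after both sides have committed.
        local = self.rows.get(key)
        self.rows[key] = version if local is None else resolve(local, version)


a, b = Node("a"), Node("b")
change_a = a.write("user:1", "alice@old.example", ts=10.0)
change_b = b.write("user:1", "alice@new.example", ts=11.0)

# The changes cross on the wire and are applied on the other node afterwards.
b.apply_remote(*change_a)
a.apply_remote(*change_b)
print(a.rows["user:1"].value, b.rows["user:1"].value)
```

Both nodes accepted their concurrent write, and both end up with the later value once the changes have been exchanged, which is what "eventually consistent" means here.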

BDR (Bi-directional Replication)

[Figure: Bi-directional Replication]

Thank you!

