HA with RelStorage and Postgres

Harder, better, faster, stronger. HA with RelStorage and Postgres, abstract @ PLOG 2014, simone.deponti@abstract.it

description

How to improve the availability of your Plone site using RelStorage and PostgreSQL, with the help of repmgr. A brief introduction to HA is given, before delving deep into the setup of RelStorage and PostgreSQL, the use of repmgr, and how to avoid common pitfalls and unexpected traps.

Transcript of HA with RelStorage and Postgres

Page 1: HA with RelStorage and Postgres

Harder, better, faster, stronger.

HA with RelStorage and Postgres

abstract @ PLOG 2014, simone.deponti@abstract.it

Page 2: HA?

From the not-very-reliable Wikipedia page about it:

$A = U_t / T_t$

where $U_t$ is the uptime and $T_t$ the total time, both measured over the same period.
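As a worked example of the formula, the sketch below computes A from uptime and total time, and rearranges it to get the downtime budget that a target availability leaves over a period. The 365-day year and the 99.9% ("three nines") target are illustrative assumptions, not figures from the talk.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: a 365-day year


def availability(uptime_s: float, total_s: float) -> float:
    """A = U_t / T_t, with both times in the same unit."""
    if total_s <= 0:
        raise ValueError("total time must be positive")
    return uptime_s / total_s


def downtime_budget(target: float, period_s: float = SECONDS_PER_YEAR) -> float:
    """Maximum downtime (seconds) compatible with availability `target` over `period_s`."""
    # With U_t = T_t - downtime, A = U_t / T_t gives downtime = (1 - A) * T_t
    return (1.0 - target) * period_s


if __name__ == "__main__":
    # A node that was down for 8 hours during one year
    print(f"{availability(SECONDS_PER_YEAR - 8 * 3600, SECONDS_PER_YEAR):.5f}")
    # "Three nines" allow roughly 8.76 hours of downtime per year
    print(f"{downtime_budget(0.999) / 3600:.2f} h")
```

Reading the formula backwards like this is often more useful than computing A after the fact: it tells you how many minutes a failover is allowed to take.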

Page 3: Three rules of HA

1. Eliminate single points of failure
2. Have a reliable failover
3. Detect failures as they occur
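A rough way to see why rule 1 matters, reusing the availability formula from Page 2: if each node is up with probability a and failures are assumed independent (a strong assumption, since a shared network or power supply breaks it), n interchangeable nodes behind a working failover (rule 2) are down only when all of them are down, giving 1 - (1 - a)^n. Anything that must always be up multiplies in. The helper names and the 0.99 figure are illustrative only.

```python
def serial_availability(*parts: float) -> float:
    """Availability of a chain where every part must be up (each part is a single point of failure)."""
    result = 1.0
    for a in parts:
        result *= a
    return result


def redundant_availability(a: float, replicas: int) -> float:
    """Availability of `replicas` interchangeable nodes, assuming independent failures
    and a perfect failover: the service is down only when every replica is down."""
    return 1.0 - (1.0 - a) ** replicas


if __name__ == "__main__":
    node = 0.99
    print(f"single node:          {node:.4f}")
    print(f"two replicas:         {redundant_availability(node, 2):.4f}")
    # The same redundant pair behind one non-redundant component (e.g. a single switch)
    print(f"replicas + one SPOF:  {serial_availability(redundant_availability(node, 2), node):.4f}")
```

The last line shows the point of the rule: the single remaining point of failure caps the whole system at its own availability, however many database replicas there are.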

Page 4: It’s a short way to HA

Page 5: Two elephants in a cluster

1. Use PostgreSQL’s native Streaming Replication
2. Means explicit master/slave roles at any given time
3. Manual failover (slave promotion) or use repmgr
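With streaming replication each node is either the master or a slave at any given time, and PostgreSQL exposes this through the built-in function pg_is_in_recovery(), which returns true on a standby. A minimal sketch, assuming a DB-API 2.0 connection (for example one opened with psycopg2, the driver RelStorage uses for PostgreSQL); node_role and the host names are hypothetical.

```python
def node_role(conn) -> str:
    """Return "slave" if the node is a streaming-replication standby, else "master".

    `conn` is any DB-API 2.0 connection to the node (e.g. psycopg2.connect(...)).
    """
    cur = conn.cursor()
    try:
        # pg_is_in_recovery() is true while the server replays WAL as a standby
        cur.execute("SELECT pg_is_in_recovery()")
        (in_recovery,) = cur.fetchone()
    finally:
        cur.close()
    return "slave" if in_recovery else "master"


if __name__ == "__main__":
    import psycopg2  # assumption: psycopg2 is installed and the hosts below exist

    for dsn in ("host=node1 dbname=postgres", "host=node2 dbname=postgres"):
        conn = psycopg2.connect(dsn)
        try:
            print(dsn, "->", node_role(conn))
        finally:
            conn.close()
```

Asking the server itself, rather than trusting a configuration file, is what makes the "explicit roles" checkable after a promotion.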

Page 6: repmgr

1. Developed by phoenicians 2ndQuadrant
2. Works in addition to streaming replication
3. Acts as watchdog and can take automatic actions (run bash scripts)
4. You run it on the slave node

(https://github.com/2ndQuadrant/repmgr)

Page 7: How it works

1. Continuously and compulsively checks twitter that the master is alive
2. If the master is unreachable for more than N seconds, runs a bash script

It also offers convenient command line tools to check the status of the cluster, promote nodes and synchronize them.
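The watchdog logic described above can be sketched in a few lines of Python. This is not repmgr's code: watch_master, the probe, the action and the parameters timeout_s and interval_s are hypothetical stand-ins. In a real setup the probe could be a connection attempt to the master and the action a call to a promote script; the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
from typing import Callable


def watch_master(
    probe: Callable[[], bool],
    on_failure: Callable[[], None],
    timeout_s: float,
    interval_s: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll `probe` every `interval_s` seconds; once the master has been unreachable
    for more than `timeout_s` seconds (the "N seconds" above), call `on_failure` once and return."""
    last_seen = clock()
    while True:
        if probe():
            last_seen = clock()  # master answered: reset the unreachable timer
        elif clock() - last_seen > timeout_s:
            on_failure()  # e.g. run the promote script
            return
        sleep(interval_s)
```

For example, on_failure could be lambda: subprocess.run(["/usr/local/bin/promote.sh"], check=True), where the script path is hypothetical. Note that the loop only knows the master is unreachable from where it runs, not that it is down, which is exactly what goes wrong in the network failure scenarios.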

Page 8: repmgr’s gotchas

1. Create a dedicated database to store replication info (do not follow the bad example of the documentation)
2. The suggested wal_keep_segments setting is too high: it will use up to 78GB of disk, and it can be reduced with the -w option
3. Use custom promote and follow scripts
4. Launch the daemon (repmgrd) with --monitoring-history
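The 78GB figure can be checked with a bit of arithmetic: wal_keep_segments counts retained WAL segment files, and each segment is 16 MB with PostgreSQL's default build settings. The sketch assumes that default; 5000 is simply the segment count that reproduces the figure above (5000 × 16 MB = 80000 MB, about 78 GiB), and the 20 GiB budget is a hypothetical example.

```python
WAL_SEGMENT_MIB = 16  # assumption: default WAL segment size (16 MB)


def wal_keep_usage_gib(segments: int, segment_mib: int = WAL_SEGMENT_MIB) -> float:
    """Disk space (GiB) held on the master by `segments` retained WAL files."""
    return segments * segment_mib / 1024


def segments_for_budget(budget_gib: float, segment_mib: int = WAL_SEGMENT_MIB) -> int:
    """Largest wal_keep_segments value whose retained WAL fits in `budget_gib`."""
    return int(budget_gib * 1024 // segment_mib)


if __name__ == "__main__":
    print(f"5000 segments -> {wal_keep_usage_gib(5000):.1f} GiB")
    print(f"20 GiB budget -> wal_keep_segments = {segments_for_budget(20)}")
```

The trade-off: fewer retained segments save disk, but a slave that falls further behind than the retained WAL can no longer catch up by streaming and has to be re-synchronized.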

Page 9: Fail scenarios

1. One node has a catastrophic failure (easy)
2. There is a total network outage (when the network comes back up, you have a split brain: the slave has promoted itself while the master kept running)
3. There is network partitioning (similar to the above, can be worse)

repmgr and streaming replication do not perform too well in cases 2 and 3.
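Once the network is back, a split brain shows up as more than one node acting as master. A minimal sketch of the check an external monitor could run, assuming it can collect, for each node, the result of pg_is_in_recovery() (false meaning the node is a master); split_brain and the node names are hypothetical.

```python
from typing import Dict, List


def split_brain(roles: Dict[str, bool]) -> List[str]:
    """Given {node: is_in_recovery}, return the nodes acting as master when there is
    more than one of them; an empty list means at most one master."""
    masters = sorted(node for node, in_recovery in roles.items() if not in_recovery)
    return masters if len(masters) > 1 else []


if __name__ == "__main__":
    # The old master kept running while the slave promoted itself during the outage
    after_outage = {"node1": False, "node2": False}
    healthy = {"node1": False, "node2": True}
    print(split_brain(after_outage))
    print(split_brain(healthy))
```

Detecting the split brain does not fix it: both nodes may have accepted writes, so which data survives is a human decision.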

Page 10: Our work is never over

1. Always notify when a failure is detected
2. Investigate ASAP, even if automatic action was taken
3. Have the slave try to exclude the master upon promotion

Example of #3: upon promotion, the slave contacts all the clients and tells them to stop talking to the old master.
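The example of #3 can be sketched as a promotion hook. Everything here is hypothetical: how a client is told to stop using the old master (a configuration push, an API call, a firewall rule) depends on the setup, so it is passed in as the `send` function. The hook returns the clients it could not reach, because those are exactly the ones that need a notification and an investigation ASAP.

```python
from typing import Callable, Iterable, List


def fence_old_master(
    old_master: str,
    new_master: str,
    clients: Iterable[str],
    send: Callable[[str, str, str], None],
) -> List[str]:
    """Tell every client to stop talking to `old_master` and use `new_master` instead.

    `send(client, old_master, new_master)` is a hypothetical transport that raises on
    failure. Returns the clients that could not be reached.
    """
    unreachable = []
    for client in clients:
        try:
            send(client, old_master, new_master)
        except Exception:  # keep going: one dead client must not block the others
            unreachable.append(client)
    return unreachable


if __name__ == "__main__":
    def print_send(client: str, old: str, new: str) -> None:
        print(f"{client}: stop using {old}, switch to {new}")

    missed = fence_old_master("node1", "node2", ["app1", "app2"], print_send)
    if missed:
        print("ALERT: could not fence on", ", ".join(missed))
```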

Page 11: RelStorage and PostgreSQL

RelStorage has a small issue with connections left IDLE IN TRANSACTION.

Check with:

SELECT datname, usename, query_start, state_change, state, query FROM pg_stat_activity;

It’s not fatal, but might result in locks during backups.
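The query above lists every backend. A minimal sketch that narrows it to sessions sitting idle in transaction for longer than a threshold, assuming a DB-API 2.0 connection such as psycopg2's; idle_in_transaction, the 300-second default and the DSN are illustrative assumptions. The state column and its 'idle in transaction' value are those exposed by pg_stat_activity since PostgreSQL 9.2.

```python
IDLE_IN_TX_QUERY = """
SELECT datname, usename, query_start, state_change, state, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
  AND state_change < now() - %s * interval '1 second'
"""


def idle_in_transaction(conn, older_than_s: int = 300):
    """Return pg_stat_activity rows for sessions idle in transaction for more than `older_than_s` seconds."""
    cur = conn.cursor()
    try:
        # %s is the psycopg2 placeholder; the driver quotes the parameter
        cur.execute(IDLE_IN_TX_QUERY, (older_than_s,))
        return cur.fetchall()
    finally:
        cur.close()


if __name__ == "__main__":
    import psycopg2  # assumption: psycopg2 is installed; the DSN is hypothetical

    conn = psycopg2.connect("host=node1 dbname=postgres")
    try:
        for datname, usename, query_start, state_change, state, query in idle_in_transaction(conn):
            print(f"{datname} {usename} idle since {state_change}: {query[:60]}")
    finally:
        conn.close()
```

Running a check like this just before a backup window shows whether the backup is likely to wait on those sessions' locks.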

Page 12: HA with RelStorage and Postgres

Simone Deponti, simone.deponti@abstract.it