WHO WANTS A SERVICE WITH ZERO DOWNTIME?


… EVERYBODY

IS IT THAT GOOD?

NOT JUST TECHNOLOGY. RISKS, PROCEDURES, PEOPLE

FROM 0 TO ~100: BUSINESS CONTINUITY WITH POSTGRESQL

Giulio Calacoci Senior Developer @ 2ndQuadrant

DataOps 2019 Barcelona

2ndquadrant.com

@asdmaster @2ndQuad #PostgreSQL #DataOps #Barcellona #BusinessContinuity

ABOUT MYSELF

▸ Passionate about Open Source since the early 2000s

▸ Member of the Italian and European PostgreSQL community

▸ Lean and DevOps practitioner

▸ Open Source Developer

▸ Member of the Barman team

▸ Continuous Delivery Architect @2ndQuadrant

▸ 24/7 support engineer @2ndQuadrant

BUSINESS CONTINUITY

▸ Disaster Recovery

▸ High Availability

▸ Types of disaster/failures

▸ Availability = Uptime / (Uptime + Downtime)
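
The availability formula above can be checked with a few lines of Python. This is only an illustrative sketch: the function name and the example figures (a year of 8760 hours with 4 hours of total downtime) are assumptions, not numbers from the talk.

```python
def availability(uptime_hours: float, downtime_hours: float) -> float:
    """Availability = Uptime / (Uptime + Downtime), as a fraction in [0, 1]."""
    total = uptime_hours + downtime_hours
    if total <= 0:
        raise ValueError("the observation period must be positive")
    return uptime_hours / total


# Hypothetical example: one year of service with 4 hours of total downtime.
HOURS_PER_YEAR = 365 * 24
example = availability(HOURS_PER_YEAR - 4, 4)
print(f"Availability: {example:.4%}")
```

Any time unit works, as long as uptime and downtime are measured in the same one.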

OBJECTIVES

▸ Recovery Point Objective (RPO)

▸ How much data can I afford to lose?

▸ Recovery Time Objective (RTO)

▸ How long will it take me to recover?

SERVICE RELIABILITY

▸ Cost of downtime

▸ How many €/$/£/AUD/AED/…?

▸ Risk management

▸ SLI, SLO and SLA

SOME NOTES FOR THIS PRESENTATION

▸ PostgreSQL on Linux

▸ Servers can be either physical or virtual

▸ Storage must be redundant

▸ RAID is required

▸ VOLUME: redundant disk mounted on a system

LET’S START

0. ONE POSTGRES SERVER

ARCHITECTURE

Server name: HOPE

RECAP

▸ Why is RPO = ∞?

▸ Why is RTO = n/a?

▸ “Hope is not a strategy” (cit. Google)

▸ More common than you’d expect

10. ONE POSTGRES SERVER + LOGICAL BACKUPS

ARCHITECTURE

Add systematic backups with pg_dump

[Diagram: one logical backup per day, taken at 4 AM (Day 0, Day -1, Day -2, …)]

RECAP

▸ How do you feel now?

▸ Still: RPO = ∞ and RTO = n/a. Why?

▸ A backup is valid only if you have tested it

▸ Unfortunately, this is very common

20. ONE POSTGRES SERVER + LOGICAL BACKUPS + LOGICAL RESTORES

ARCHITECTURE

Test your backups with pg_restore

[Diagram: logical backup, Day 0 at 4 AM]

DEFINING SOME OBJECTIVES

▸ Measure time for pg_restore

▸ RPO = backup frequency

▸ RTO = maximum time of recovery

▸ Provision another server

▸ Configure another server (automated, right?)

▸ Time to restore the last backup (measure it)
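
A minimal sketch of how these objectives could be written down once each step has been measured. The function names and all durations are hypothetical placeholders, not figures from the talk; replace them with your own measurements.

```python
from datetime import timedelta


def worst_case_rpo(backup_interval: timedelta) -> timedelta:
    """With periodic backups only, everything written since the last backup
    can be lost, so the worst case equals the backup interval."""
    return backup_interval


def estimated_rto(step_durations: dict) -> timedelta:
    """Estimate the RTO as the sum of the measured duration of every step."""
    return sum(step_durations.values(), timedelta())


# Hypothetical measurements for the three steps listed above.
steps = {
    "provision another server": timedelta(hours=2),
    "configure another server": timedelta(minutes=30),
    "restore the last backup": timedelta(hours=3),
}

print("RPO (worst case):", worst_case_rpo(timedelta(days=1)))
print("RTO (estimate):", estimated_rto(steps))
```

Keeping the steps as separate entries makes it obvious which one dominates the total and is worth automating first.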

HAVE WE REALLY THOUGHT ABOUT EVERYTHING?

TIME OF REACTION

RECAP

▸ Can this architecture work for you?

▸ We need reliable monitoring

▸ From now on, we assume we have it in place!

▸ We need to reduce both RPO and RTO

HOW? POINT-IN-TIME RECOVERY

Using a time machine

POSTGRESQL’S PITR

▸ Part of core (fully open source)

▸ Rebuild a cluster at a point in time

▸ From crash recovery to synchronous streaming replication (physical/logical)

▸ RPO = 0 (zero data loss)

▸ Hot base backup, continuous WAL archiving, Recovery

▸ API

BASIC CONCEPTS

▸ Continuous copy of WAL data (continuous archiving)

▸ Physical base backups

▸ Recovery:

▸ copy base backup to another location

▸ recovery mode (replay of WALs until target)

BARMAN

▸ Latest version: Barman 2.8

▸ Open Source (GNU GPL 3)

▸ Written in Python

▸ Developed and maintained by 2ndQuadrant

▸ Available at www.pgbarman.org

40. ONE POSTGRES SERVER + ONE BARMAN SERVER

ARCHITECTURE

Continuous backup

BASIC CONCEPTS

▸ Remote backup and recovery

▸ Multiple server management

▸ Backup catalogue and WAL archive

▸ Retention policies

COPY METHOD

▸ PostgreSQL streaming

▸ Practical/Windows/Docker

▸ Rsync/SSH

▸ Incremental backup and recovery (via hard links)

▸ Parallel backup and recovery

▸ Network compression and bandwidth limitation

WAL SHIPPING METHOD

▸ “archiving”, through “archive_command”:

▸ RPO ~ 16MB of WAL data, or

▸ “archive_timeout”

▸ “streaming”, through streaming replication:

▸ “pg_receivewal” (named “pg_receivexlog” before PostgreSQL 10)

▸ continuous stream, RPO ~ 0

▸ PostgreSQL 9.2+ required

EXAMPLE FROM POSTGRESQL.CONF

archive_mode = on

wal_level = logical

max_wal_senders = 10

max_replication_slots = 10

archive_command = 'rsync -a %p barman@HOST:/var/lib/barman/ID/incoming'

EXAMPLE FROM BARMAN.CONF

[stark]
description = "Tony Stark database"
ssh_command = ssh postgres@stark
conninfo = user=barman-avengers dbname=postgres host=stark
retention_policy = RECOVERY WINDOW OF 6 MONTHS
copy_method = rsync
reuse_backup = link
parallel_jobs = 4
archiver = true
streaming_archiver = true
slot_name = barman_streaming

RECAP

▸ How do you feel now?

▸ Still: RPO = ∞ and RTO = n/a. Why?

▸ A backup is valid only if you have tested it

▸ Barman reduces backup risks, does not exclude them

▸ Systematic tests (especially custom scripts)

▸ Business risk is very high

60. ONE POSTGRES SERVER + ONE BARMAN SERVER + ONE RECOVERY SERVER

ARCHITECTURE

Test your backups with barman recover

WHAT A WASTE!

HAVE YOU EVER THOUGHT OF USING IT FOR TESTING OR BI?

HOOK SCRIPTS

▸ Barman has hook scripts:

▸ pre and post backup

▸ pre and post archiving

▸ with retry option (until the script returns SUCCESS)

EXAMPLE OF RECOVERY SCRIPT

▸ Write a bash script that:

▸ connects to a remote server via SSH

▸ stops the PostgreSQL server

▸ issues a “barman recover” with target “immediate”

▸ starts the PostgreSQL server

▸ Set it as post-backup script
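
The steps above can be sketched as follows. The slide suggests bash; Python is used here only for illustration, and the script runs as a dry run that prints the commands instead of executing them. Everything specific is an assumption: the remote host `postgres@recovery`, the data directory, the log file path, the use of `pg_ctl` on the remote PATH to stop and start PostgreSQL, and recovering the `latest` backup of the `stark` server. Check the options of `barman recover` against the manual of your Barman version before relying on them.

```python
import shlex


def recovery_test_commands(server, remote_host, data_dir, backup_id="latest",
                           log_file="/tmp/recovery-test.log"):
    """Return the three commands of the recovery drill, in order."""
    # 1. Stop PostgreSQL on the recovery server (assumes pg_ctl is on its PATH).
    stop = ["ssh", remote_host, "pg_ctl", "stop", "-D", data_dir, "-m", "fast"]
    # 2. Restore the backup remotely, stopping as soon as a consistent
    #    state is reached (target "immediate").
    recover = [
        "barman", "recover",
        "--target-immediate",
        "--remote-ssh-command", f"ssh {remote_host}",
        server, backup_id, data_dir,
    ]
    # 3. Start PostgreSQL again on the recovered data directory.
    start = ["ssh", remote_host, "pg_ctl", "start", "-D", data_dir, "-l", log_file]
    return [stop, recover, start]


# Hypothetical values; "stark" is the server name used in this deck.
commands = recovery_test_commands("stark", "postgres@recovery", "/var/lib/pgsql/data")
for command in commands:
    # Dry run: print only. To execute for real, pass each list to
    # subprocess.run(command, check=True) so a failing step aborts the drill.
    print(shlex.join(command))
```

Returning the commands as lists keeps the procedure easy to review and to time step by step, which is what makes the recovery time measurable.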

SOME FOOD FOR THOUGHT

▸ Outcomes:

▸ Systematically test your backup

▸ Measure your recovery time

▸ Identical server? This is a backup server ready to start

▸ You can use a different data centre

▸ Be creative, PostgreSQL gives you infinite freedom!

RECAP

▸ RPO ~ 0 (your backups work, every time)

▸ RTO = Time of reaction + Recovery time

▸ Example: RPO ~0 and RTO < 1 day

▸ Acceptable or not acceptable?

▸ Entry level architecture for business continuity

▸ Priority now: improve RTO

HOW? REPLICATION

POSTGRESQL’S REPLICATION

▸ Part of core (fully open source)

▸ One master, multiple standby servers

▸ Evolution of PITR

▸ Standby server is in continuous recovery mode

▸ Hot standby (read-only)

▸ Both streaming (9.0+) and file based pulling of WAL

▸ Cascading from a standby

SYNCHRONOUS REPLICATION

▸ Fine control (from global down to transaction level)

▸ 2-safe replication

▸ COMMIT of a write transaction waits until it is written on both the master and a standby (or more than one standby, from 9.6)

▸ Read consistency of a cluster

▸ RPO = 0 (zero data loss)

80. TWO POSTGRES SERVERS + ONE BARMAN SERVER + ONE RECOVERY SERVER

ARCHITECTURE

[Diagram: symmetric cluster with master STARK and standby ROGERS; arrows labelled barman_restore_wal and barman recover]

EXCERPT FROM ROGERS’ POSTGRESQL CONFIGURATION

postgresql.conf:

hot_standby = on

recovery.conf:

standby_mode = 'on'
# Streaming
primary_conninfo = 'host=stark user=replica application_name=ha sslmode=require'
# Fallback via Barman
restore_command = 'barman-wal-restore -U barman avengers stark %f %p'

SWITCHOVER (PLANNED)

▸ Applications are paused (start of downtime)

▸ Shut down the master

▸ Allow the standby to catch up with the master

▸ Promote the standby

▸ Switch virtual IPs

▸ Resume applications (end of downtime)

▸ Reconfigure the former master as standby

FAILOVER (UNPLANNED)

▸ The master is down (start of downtime)

▸ Promote the standby

▸ Change the virtual IP

▸ DEGRADED SYSTEM

MANUAL SWITCHOVER AND FAILOVER

▸ Manual switchover != manual switchover procedure

▸ Manual switchover = manually triggered

▸ Automate the procedure!!!

▸ bash (good)

▸ Ansible (better)

▸ Enhance gradually

RECAP

▸ RPO ~ 0 (your backups work, every time)

▸ RTO = Time of reaction + Time of promotion

▸ Critical issue: manual intervention

▸ Reliable monitoring

▸ Trained people (practice & docs!)

MANUAL FAILOVER VS AUTOMATED FAILOVER

▸ Risk management

▸ Split brain nightmare

▸ Automated is built on manual (test!)

▸ Your choice

▸ Very good solution for business continuity

▸ Uptime > 99.99% in a year
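
To make the 99.99% figure concrete, the sketch below converts an availability target into a yearly downtime budget. The function name and the 365-day year are assumptions made for illustration.

```python
from datetime import timedelta


def downtime_budget(availability: float,
                    period: timedelta = timedelta(days=365)) -> timedelta:
    """Maximum total downtime allowed in `period` for a given availability."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return period * (1.0 - availability)


budget = downtime_budget(0.9999)
print(f"99.99% over a year allows about {budget.total_seconds() / 60:.1f} minutes of downtime")
```

Every switchover, failover and reaction delay in a year has to fit inside that budget, which is why the time of reaction matters as much as the time of promotion.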

90. TWO POSTGRES SYNC SERVERS + ONE BARMAN SERVER + ONE RECOVERY SERVER

ARCHITECTURE

[Diagram: synchronous replication, zero data loss; arrows labelled barman_restore_wal and barman recover]

SYNCHRONOUS REPLICATION

▸ Primary: Barman

▸ Zero data loss backup

▸ Primary: Standby

▸ Zero data loss cluster (reduce RTO)

▸ Just one configuration line in PostgreSQL

▸ synchronous_standby_names = '1 (ha, barman_receive_wal)'

~100. TWO POSTGRES SYNC SERVERS + ONE BARMAN SERVER + ONE RECOVERY SERVER + REPMGR (AUTO-FAILOVER)

ARCHITECTURE

[Diagram: one synchronous and one potential synchronous connection; repmgr on both PostgreSQL servers, plus a repmgr witness]

WHAT’S MORE?

PUSH THE BOUNDARIES

▸ Repeatable architectures in multiple data centres

▸ PgBouncer

▸ Virtual IPs

▸ S3 relay via Barman hook scripts

▸ Multiple standby servers and cascading replication

▸ Docker containers

▸ Logical replication backups

CONCLUSIONS

▸ Baby steps and KISS

▸ New? Explore and learn

▸ Practice is the only way to mastery (drills)

▸ Plan regular healthy downtimes

▸ Use switchovers to perform PostgreSQL updates

▸ Smart downtimes increase long-term uptime

ANY QUESTIONS?

▸ PostgreSQL: www.postgresql.org

▸ Barman: www.pgbarman.org #pgbarman

▸ PgBouncer: pgbouncer.github.io

▸ Repmgr: www.repmgr.org

▸ Our blog: blog.2ndquadrant.com

LICENCE

Attribution 4.0 International (CC BY 4.0)

You are free to:

▸ Share — copy and redistribute the material in any medium or format

▸ Adapt — remix, transform, and build upon the material for any purpose, even commercially.

The licensor cannot revoke these freedoms as long as you follow the license terms.