Copyright(c)2010 NTT, Inc. All Rights Reserved.
Built-in Replication in PostgreSQL
Fujii Masao
NTT OSS Center
09/27/2010
Who am I?
• Database engineer in NTT Open Source Software Center
• PostgreSQL developer since 2008
• Author of the new built-in replication
Abstract
• What’s replication?
• Background
• How does the built-in replication work?
– Features
– Architecture
– Limitations
– Future work
What’s replication?
• Create a replica of the database on multiple servers
– Multiple servers have the same database
[Figure: a client's changes to the original database are propagated to the replicas]
Why replication?
• High Availability
– Reduces system downtime
• Load Balancing
– Improves system performance
[Figure: High Availability vs. Load Balancing; clients send SQL to replicated DBMS servers]
Background
• Historical policy
– Avoid putting replication into core PostgreSQL
– No "one size fits all" replication solution
• Replication war!
– Slony-I, Bucardo, Londiste, Sequoia, PGCluster, PostgresForest, Postgres-R, Mammoth, PyReplica, PL/Proxy, pgpool-II, rubyrep, Postgres-XC, GridSQL, syncreplicator
Road to core
• No default choice
– Too complex to install and use for simple cases
– Low activity; projects easily become inactive
– No Japanese documentation
– Cannot run on platforms other than Linux
– vs. other DBMSs
• v9.0
– Simple, reliable basic replication in core
Built-in replication in PostgreSQL 9.0
• Streaming Replication
– Capability to stream changes on the master to the standby
• Hot Standby
– Capability to run read-only queries on the standby
• 1 + 1 = 3
[Figure: the client sends R/W SQL to the master and R/O SQL to the standby (Hot Standby); Streaming Replication ships changes from the master to the standby]
Master / Standbys
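• One master / Multiple standbys
– Only the master accepts write queries
– Both the master and the standbys accept read queries
• Read-scalable
– Not write-scalable
[Figure: the client sends R/W SQL to the master and R/O SQL to the standbys; changes flow from the master to the standbys]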
Cascading vs. Proxy
• Cascading: not allowed
• Proxy approach: allowed
[Figure: clients, a master and standbys in two layouts: Cascading (Master → Standby → Standby) vs. Proxy approach (Master → Proxy → Standbys)]
Hot Standby
Allow
• Query access
– SELECT
• Logical hot backup
– pg_dump
Not allow
• Data Manipulation Language (DML)
– INSERT, UPDATE, DELETE
– SELECT FOR UPDATE
• Data Definition Language (DDL)
– CREATE, DROP, ALTER
• Maintenance
– VACUUM, ANALYZE
– (Replicated from master)
• Physical hot backup
– pg_start/stop_backup
Log-shipping
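• WAL is shipped from the master to the standby
– WAL, a.k.a. the transaction log
• The standby runs in recovery mode
– Keeps the database current by replaying the received WAL
[Figure: the client sends write queries to the master, which writes WAL; the WAL is shipped to the standby, which replays it into its database (recovery)]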
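A toy in-memory model can make log-shipping concrete. It is only a sketch under simplifying assumptions: the `Master`/`Standby` classes and the `(lsn, key, value)` record format are hypothetical, whereas real WAL records describe physical page changes. The master logs every write before applying it, and the standby, in recovery mode, replays whatever WAL it has not replayed yet.

```python
from dataclasses import dataclass, field

# Toy model only: a WAL record is a hypothetical (lsn, key, value) tuple.
# Real PostgreSQL WAL describes physical page changes, not key/value pairs.


@dataclass
class Master:
    data: dict = field(default_factory=dict)
    wal: list = field(default_factory=list)

    def write(self, key, value):
        """Execute a write query: log it to WAL first, then change the data."""
        lsn = len(self.wal) + 1  # LSNs start at 1, so wal[i] has lsn i + 1
        self.wal.append((lsn, key, value))
        self.data[key] = value
        return lsn


@dataclass
class Standby:
    data: dict = field(default_factory=dict)
    replayed_lsn: int = 0  # last WAL position replayed in recovery mode

    def replay(self, records):
        """Recovery mode: apply shipped WAL records in LSN order."""
        for lsn, key, value in records:
            if lsn <= self.replayed_lsn:
                continue  # already replayed; ignore duplicates
            self.data[key] = value
            self.replayed_lsn = lsn


master, standby = Master(), Standby()
master.write("a", 1)
master.write("b", 2)
standby.replay(master.wal[standby.replayed_lsn:])  # ship what the standby lacks
master.write("a", 3)
standby.replay(master.wal[standby.replayed_lsn:])
print(standby.data == master.data, standby.replayed_lsn)  # True 3
```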
Limitations of log-shipping
• Must be the same on the master and the standby
– H/W architecture
– PostgreSQL major version
[Figure: master with a 64-bit OS and PG v9.0.0; a standby with PG v9.1.0: NG; a standby with a 32-bit OS: NG; a standby with a 64-bit OS and PG v9.0.2: OK]
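The compatibility rule can be written as a small check. This is a hypothetical helper, not a PostgreSQL API, and it reduces "H/W architecture" to word size only, whereas in reality other platform properties such as byte order also matter. The three calls mirror the figure; the 32-bit standby's PostgreSQL version is not shown there, so 9.0.0 is assumed.

```python
from typing import NamedTuple


class Server(NamedTuple):
    word_size: int   # stand-in for "H/W architecture" (32 or 64 bit)
    pg_version: str  # full version string, e.g. "9.0.2"


def major_version(version: str) -> str:
    # For the 9.x series, the major version is the first two components.
    return ".".join(version.split(".")[:2])


def can_replicate(master: Server, standby: Server) -> bool:
    """Log-shipping needs the same architecture and the same major version."""
    return (master.word_size == standby.word_size
            and major_version(master.pg_version) == major_version(standby.pg_version))


master = Server(64, "9.0.0")
print(can_replicate(master, Server(64, "9.1.0")))  # False: major version differs
print(can_replicate(master, Server(32, "9.0.0")))  # False: architecture differs
print(can_replicate(master, Server(64, "9.0.2")))  # True: only the minor version differs
```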
Per database cluster granularity
• All database objects are replicated
– Per-table granularity is not allowed
[Figure: replication per database cluster (master → standby) vs. per table]
Easy migration
• No need to change table definitions
– cf. Slony-I forces tables to have a primary key
• No need to rewrite SQL
– cf. Slony-I doesn’t replicate DDL
– All the SQL that PostgreSQL supports is available on the master
• Easy to use an existing database server as the master
[Figure: an existing stand-alone server serving clients becomes the master, and a standby is added. Easy migration!]
No query distribution
• Postgres doesn’t provide query distribution capability
– Implement query distribution logic in the application, or
– Use a query distributor
[Figure: "Implement logic": the client itself sends write queries to the master and read queries to the standby. "Use distributor": the client sends all queries to a query distributor, which forwards writes to the master and reads to the standby]
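For the "implement logic" option, the application needs some rule for deciding where each statement goes. Below is a hypothetical sketch (the `QueryRouter` class is not a real library) that follows the Hot Standby rules: a plain SELECT goes to a standby chosen round-robin, and everything else, including SELECT FOR UPDATE, goes to the master. The keyword test is deliberately naive.

```python
import itertools
import re

# Naive keyword checks (an assumption, not a SQL parser): a plain SELECT can go
# to a hot standby; SELECT ... FOR UPDATE/SHARE and everything else cannot.
_SELECT = re.compile(r"^\s*SELECT\b", re.IGNORECASE)
_ROW_LOCK = re.compile(r"\bFOR\s+(UPDATE|SHARE)\b", re.IGNORECASE)


class QueryRouter:
    """Toy application-side query distribution: writes to master, reads to standbys."""

    def __init__(self, master, standbys):
        self.master = master
        self._standbys = itertools.cycle(standbys)  # round-robin over standbys

    @staticmethod
    def is_read_only(sql: str) -> bool:
        return bool(_SELECT.match(sql)) and not _ROW_LOCK.search(sql)

    def route(self, sql: str) -> str:
        return next(self._standbys) if self.is_read_only(sql) else self.master


router = QueryRouter("master", ["standby1", "standby2"])
print(router.route("SELECT * FROM t"))             # standby1
print(router.route("UPDATE t SET x = 1"))          # master
print(router.route("SELECT * FROM t FOR UPDATE"))  # master
print(router.route("select count(*) from t"))      # standby2
```

A keyword check like this misroutes, for example, a SELECT that calls a data-modifying function such as `nextval()`, which would then fail on the standby; a real distributor needs a more careful classification.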
Shared nothing
• WAL is shipped via the network
– No special H/W required
– No distance limitation
– No single point of failure
[Figure: shared nothing (master and standby each have their own storage) vs. shared disk]
Asynchronous
• WAL is shipped asynchronously
– Low performance impact on the master
– Data loss window on failover
– Queries on the standby may see slightly outdated data
[Figure: the master returns “success” to the client before the transaction's WAL reaches the standby]
The client thinks this transaction has been committed. But...
the transaction has not been replicated yet.
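The data loss window can be shown with a toy model: commit acknowledges the client at once, shipping happens later, and a failover in between loses whatever was acknowledged but not yet shipped. `AsyncReplication` and its methods are hypothetical and only count transactions.

```python
class AsyncReplication:
    """Toy model: commit returns before the WAL reaches the standby."""

    def __init__(self):
        self.committed = []  # transactions already acknowledged to the client
        self.shipped = 0     # how many of them the standby has received

    def commit(self, txid):
        self.committed.append(txid)
        return "success"     # returned without waiting for the standby

    def ship(self, n):
        """The walsender catches up by at most n transactions."""
        self.shipped = min(len(self.committed), self.shipped + n)

    def failover(self):
        """Promote the standby; return acknowledged transactions it never got."""
        return self.committed[self.shipped:]


repl = AsyncReplication()
for tx in (101, 102, 103):
    repl.commit(tx)
repl.ship(2)
print(repl.failover())  # [103]
```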
Failover
• The standby can be promoted to master at any time
– Automatic failover requires clusterware
• Failover time is relatively short
[Figure: automatic failover with pgpool-II, or with Pacemaker and a virtual IP (VIP)]
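The clusterware's job in automatic failover can be sketched as a loop over health checks that promotes the standby after several consecutive failures. Everything here is hypothetical: the threshold of three, the `monitor` function, and the `promote` callback, which stands in for the real promotion step (in 9.0, creating the file named by `trigger_file` in the standby's recovery.conf) and, in a Pacemaker setup, for moving the VIP.

```python
from typing import Callable, Iterable


def monitor(health_checks: Iterable[bool], promote: Callable[[], None],
            max_failures: int = 3) -> int:
    """Toy clusterware loop: promote after max_failures consecutive failed checks.

    Returns the index of the check that triggered promotion, or -1 if none did.
    """
    failures = 0
    for i, master_ok in enumerate(health_checks):
        failures = 0 if master_ok else failures + 1
        if failures >= max_failures:
            promote()  # hypothetical hook for the real promotion step
            return i
    return -1


events = []
checks = [True, False, True, False, False, False, True]
print(monitor(checks, lambda: events.append("promote standby")))  # 5
print(events)  # ['promote standby']
```

Requiring several consecutive failures trades a slightly longer failover time for fewer false promotions caused by a single missed check.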
Online standby addition and deletion
• A standby can be added or removed without downtime of the master or the other standbys
– Useful for a system that starts small and grows
[Figure: a new standby is added while the master keeps serving clients]
No need to stop the master while adding a new standby
Built-in
• Easy to install and use
– Only Postgres needs to be installed
– Intuitive usage
– Runs on all the major operating systems
• Highly active community
– Volunteers translate the documentation into Japanese
– Bugs get fixed quickly
– Continuous improvement and development
– Many users
Architecture
[Figure: on the master, backends execute write queries and write WAL; the walsender reads the WAL and sends it to the standby's walreceiver, which writes it to the standby's WAL; the startup process reads that WAL and applies it to the standby database, which backends access to serve read queries]
Multiple standbys
[Figure: the master's backends write WAL; one walsender per standby ships it to that standby's walreceiver, and each standby's startup process applies it]
• One-to-one relationship between walsender and standby
– WAL is shipped to each standby in parallel
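The one-to-one relationship can be mimicked with one thread per standby. In this hypothetical sketch each `walsender` thread reads the same list standing in for the master's WAL and appends to its own standby's list, so the standbys are fed in parallel and independently, and a slow standby would not hold up the others.

```python
import threading


def walsender(wal, standby_received):
    """One walsender serves exactly one standby and ships every WAL record to it."""
    for record in wal:
        standby_received.append(record)


wal = [f"rec{i}" for i in range(1, 6)]  # WAL already written on the master
standbys = {"standby1": [], "standby2": [], "standby3": []}

# One-to-one: one walsender thread per standby. The threads run in parallel and
# each appends only to its own standby's list, so no locking is needed here.
threads = [threading.Thread(target=walsender, args=(wal, received))
           for received in standbys.values()]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(all(received == wal for received in standbys.values()))  # True
```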
Walsender and WAL
• Walsender always reads WAL from disk
– Prevents the standby from getting ahead of the master
– Avoids loss of consistency between master and standby
• WAL is usually read from the file cache
– WAL is read just after it is written
– So the I/O load caused by walsender is not high
– But WAL is read from disk if the standby falls far behind the master
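The rule that walsender only sends WAL already written to disk can be modelled as follows. `MasterWAL` and `walsender_read` are hypothetical names; the point is that a record still only in memory (`r3` below) is never sent, so if the master crashed and lost it, the standby would not be ahead of the master. The file cache is not modelled.

```python
class MasterWAL:
    """Toy model: WAL records are inserted in memory, then flushed to disk."""

    def __init__(self):
        self.records = []      # every WAL record inserted so far
        self.flushed_upto = 0  # how many of them are already on disk

    def insert(self, record):
        self.records.append(record)

    def flush(self):
        self.flushed_upto = len(self.records)


def walsender_read(wal: MasterWAL, sent_upto: int) -> list:
    """Return the next batch to send: only WAL that is already on disk."""
    return wal.records[sent_upto:wal.flushed_upto]


wal = MasterWAL()
wal.insert("r1")
wal.insert("r2")
wal.flush()
wal.insert("r3")  # still only in memory on the master
print(walsender_read(wal, 0))  # ['r1', 'r2']
```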
Recovery vs. Read-only query
[Figure: on the standby, the startup process reads WAL and applies it to the database (recovery) while backends access the same database to serve clients' read queries]
Recovery vs. Read-only query
• Until the conflict has been resolved,
– Read queries return outdated results
– Failover is blocked
• A parameter specifies the maximum delay in recovery
– Increase the delay when running time-consuming jobs
– Decrease the delay to keep the failover time short
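The trade-off controlled by the maximum-delay parameter can be sketched as a function of how long the conflicting query still needs. The slides do not name the parameter, so `max_delay` is a hypothetical argument, with `None` standing for "wait forever"; cancelling the query once the limit is reached is how this sketch resolves the conflict.

```python
from typing import Optional, Tuple


def resolve_conflict(query_remaining: float,
                     max_delay: Optional[float]) -> Tuple[float, bool]:
    """Toy model of a recovery conflict on the standby.

    query_remaining: seconds the conflicting read query still needs.
    max_delay: longest time recovery may wait (hypothetical); None = forever.
    Returns (seconds WAL replay is delayed, whether the query is cancelled).
    """
    if max_delay is None or query_remaining <= max_delay:
        return query_remaining, False  # query finishes, then replay resumes
    return max_delay, True             # replay waits max_delay, then cancels it


print(resolve_conflict(120.0, None))  # (120.0, False): job survives, replay waits 120 s
print(resolve_conflict(120.0, 30.0))  # (30.0, True): replay resumes after 30 s
print(resolve_conflict(5.0, 30.0))    # (5.0, False)
```

A large delay lets a long report finish but holds back replay (and failover) while it runs; a small one keeps the standby current at the cost of cancelled queries.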
Recovery vs. Read-only query
[Figure: the master's backends write WAL; the walsender reads it and sends it to the standby's walreceiver, which writes it to the standby's WAL]
Conflicts don’t interfere with log-shipping
Future work - Synchronous
• Synchronous replication is essential to avoid data loss on failover
– Currently under development for 9.1
– Three synchronization levels: recv, fsync, apply
[Figure: the master replies “success” to the client only after the standby's walreceiver has received the WAL and replied]
• recv: the master waits until the standby has received the WAL
Future work - Synchronous
[Figure: the master replies “success” to the client only after the standby's walreceiver has written the WAL to disk and replied]
• fsync: the master waits until the standby has written the WAL to disk
Future work - Synchronous
[Figure: the master replies “success” to the client only after the standby's startup process has applied the WAL to the database and the standby has replied]
• apply: the master waits until the standby has applied the WAL
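The three proposed synchronization levels differ only in which standby position the master compares with the commit's WAL position before replying “success”. The sketch below is a hypothetical model of that proposal (the names `SyncLevel` and `commit_can_return` are illustrative, and the design was still under development); it assumes the standby reports how far it has received, written and applied the WAL.

```python
from enum import Enum


class SyncLevel(Enum):
    RECV = "recv"    # standby has received the WAL
    FSYNC = "fsync"  # standby has written (fsynced) the WAL to disk
    APPLY = "apply"  # standby has replayed the WAL


def commit_can_return(level: SyncLevel, commit_lsn: int,
                      received: int, flushed: int, applied: int) -> bool:
    """Toy check: may the master reply "success" for a commit at commit_lsn?"""
    # Positions reported by the standby; normally applied <= flushed <= received.
    position = {SyncLevel.RECV: received,
                SyncLevel.FSYNC: flushed,
                SyncLevel.APPLY: applied}[level]
    return position >= commit_lsn


# The standby has received up to 120, flushed up to 110 and applied up to 100.
for level in SyncLevel:
    print(level.value, commit_can_return(level, 105, 120, 110, 100))
# recv True, fsync True, apply False (one line each)
```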
Future work - Synchronous
• Per-transaction control
– Some transactions are important, others are not
• Quorum commit
– Master waits for N standbys
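Per-transaction control and quorum commit can be combined in one check. This is a hypothetical sketch: `acked` holds the latest WAL position each standby has acknowledged, a commit may return once at least N of them have reached its position, and a transaction marked as not important skips the wait.

```python
from typing import Dict


def quorum_reached(commit_lsn: int, acked: Dict[str, int], n: int) -> bool:
    """Quorum commit: at least n standbys have confirmed WAL up to commit_lsn."""
    return sum(1 for lsn in acked.values() if lsn >= commit_lsn) >= n


def may_return_success(commit_lsn: int, acked: Dict[str, int], n: int,
                       synchronous: bool = True) -> bool:
    """Per-transaction control: unimportant transactions do not wait at all."""
    return not synchronous or quorum_reached(commit_lsn, acked, n)


# Latest WAL position each (hypothetical) standby has acknowledged.
acked = {"standby1": 200, "standby2": 150, "standby3": 90}
print(may_return_success(140, acked, n=2))                     # True
print(may_return_success(180, acked, n=2))                     # False
print(may_return_success(180, acked, n=2, synchronous=False))  # True
```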
If you find a bug or problem
• Bug report form
– http://www.postgresql.org/support/submitbug
• Mail
– [email protected]
Thank you for listening