...Lag

44
...Lag

Transcript of ...Lag

Page 1: ...Lag

...Lag

Page 2: ...Lag

...Lag

What’s wrong with my slave?

Page 3: ...Lag

Act I

How do I know?

Page 4: ...Lag

Monitoring 101Cacti, Nagios, Zabbix + pagerduty

Page 5: ...Lag

select * from pg_stat_replication;

Page 6: ...Lag

What is Normal?

Page 7: ...Lag

Create Table rep_ts (ts timestamp not null default NOW()

);

Page 8: ...Lag

Time V Bytes

Page 9: ...Lag

Time Measurement

Page 10: ...Lag

SELECT CASE WHEN pg_last_xlog_receive_location() =

pg_last_xlog_replay_location() THEN 0 ELSE EXTRACT (EPOCH FROM now() -

pg_last_xact_replay_timestamp())::int END AS log_delay

Page 11: ...Lag

Normal - 7 Days

Max < 3.5 Sec, 10 spikes Date ->

Tim

e ->

Page 12: ...Lag

Also Normal

Date ->

Tim

e ->

Max < 20k Sec, 7 spikes

Page 13: ...Lag

7 Days, 2 regular spikes

Page 14: ...Lag

Still Normal

Date ->

Tim

e ->

Max < 10k Sec, 14 spikes

Page 15: ...Lag

is replication paused?

Page 16: ...Lag

Byte Measurement

Page 17: ...Lag

SELECT client_hostname, pg_xlog_location_diff

(pg_stat_replication.sent_location, pg_stat_replication.

replay_location),

EXTRACT(EPOCH FROM now())

FROM pg_stat_replication;

Page 18: ...Lag

Normal - 7 Days

Page 19: ...Lag

Act II

What’s Going Wrong?

Page 20: ...Lag

Most issues are in the initial setup phase.

Page 21: ...Lag

1. Configuration

Page 22: ...Lag

1. Configuration2. Hardware

Page 23: ...Lag

1. Configuration2. Hardware3. Human Error

Page 24: ...Lag

Async or Sync?

Page 25: ...Lag

Traffic Isolation:separate host with alternate

configuration

Page 26: ...Lag

Act III

Configuration

Page 27: ...Lag

max_standby_archive_delay&

max_standby_streaming_delay

Page 28: ...Lag

max_standby_archive_delay

Applies as WAL data is read

Page 29: ...Lag

max_standby_streaming_delay

Applies when WAL data is received

Page 30: ...Lag

replication_timeout&

wal_receiver_status_interval

Page 31: ...Lag

hot_standby_feedback

Page 32: ...Lag

hot_standby_feedback+

max_standby_streaming_delay=

LAG

Page 33: ...Lag

Act IV

Hardware

Page 34: ...Lag

25 - 30 Seconds

5 Hours, Irregular Spikes

Page 35: ...Lag

Spindle disk, Non-Partitioned xlogs

Page 36: ...Lag

15 Hours, Mostly Regular Spikes

Page 37: ...Lag

VERY Slight Increase

Page 38: ...Lag

Zoomed Out, 3 Weeks

Clear Pattern from 0 to ~5 s

Page 39: ...Lag

Zoomed Out, 3 Weeks

Clear Pattern from 0 to ~5 s

Page 40: ...Lag

NTP

Page 41: ...Lag

What’s that 500 second Abnormal Spike?

Page 42: ...Lag

Not Worth It

Page 43: ...Lag

What have you experienced?

Page 44: ...Lag