卢钧轶@DP MMM & Memcached - IT168.comtopic.it168.com/factory/adc2013/doc/lujunyi.pdf · MMM &...

HA Architecture in DP MMM & Memcached 卢钧轶@DP


Page 1

HA Architecture in DP: MMM & Memcached

卢钧轶@DP

Page 2

HA in DP

[Diagram: Web tier (Web1, Web2, Web3); Memcached Cluster (three memcache nodes); MMM with a Writer DB and a Reader DB]

Page 3

MMM

Page 4

What is MMM

● Written in Perl
● Messages pass between Monitor & Agent
● Automatic failover for master/slave (M/S)

But MMM is not:
● a SQL router
● a load balancer

Page 5

Products like MMM

● MHA
● LVS + Heartbeat
● Pacemaker + Heartbeat

Page 6

MMM Internals

Monitor
while () {
    process_check_results
    check_host_states
    process_commands
    distribute_role
    send_status_to_agents
}

Agent
while (read socket) {
    handle_command
}
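The two loops above can be mimicked with a small in-memory model. This is only a sketch: the `SET_ROLES` command format, the round-robin role assignment, and the class names are assumptions made for illustration, not MMM's actual Perl implementation.

```python
from collections import deque


class Agent:
    """Toy agent: applies every command it receives to its local state."""

    def __init__(self, name):
        self.name = name
        self.inbox = deque()   # stands in for the agent's socket
        self.roles = set()

    def handle_command(self, command):
        # Hypothetical command format: ("SET_ROLES", frozenset_of_roles)
        kind, payload = command
        if kind == "SET_ROLES":
            self.roles = set(payload)

    def run_once(self):
        # Stand-in for `while (read socket) { handle_command }`
        while self.inbox:
            self.handle_command(self.inbox.popleft())


class Monitor:
    """Toy monitor: one tick() is one pass through the monitor loop."""

    def __init__(self, agents, roles):
        self.agents = agents
        self.roles = roles     # role names, e.g. VIP labels
        self.online = {a.name: True for a in agents}

    def process_check_results(self, results):
        # results maps agent name -> True (healthy) / False (failed)
        self.online.update(results)

    def distribute_role(self):
        # Assumed policy: spread roles round-robin over healthy agents.
        healthy = [a for a in self.agents if self.online[a.name]]
        assignment = {a.name: set() for a in self.agents}
        for i, role in enumerate(self.roles):
            if healthy:
                assignment[healthy[i % len(healthy)].name].add(role)
        return assignment

    def send_status_to_agents(self, assignment):
        for a in self.agents:
            a.inbox.append(("SET_ROLES", frozenset(assignment[a.name])))

    def tick(self, check_results):
        self.process_check_results(check_results)
        self.send_status_to_agents(self.distribute_role())
```

Calling `tick({"db1": False})` and then `run_once()` on each agent moves every role to the remaining healthy agent, which is the failover behaviour in miniature.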

Page 7

MMM architecture

[Diagram: Monitor; two Masters; two Slaves]

Page 8

MMM architecture

[Diagram: Monitor; two Masters; two Slaves]

Page 9

How MMM Does Failover

[Diagram: Monitor; Master (vip1); Master (vip2); Slave (vip3); Slave (vip4)]

Page 10

How MMM Does Failover

[Diagram: Monitor; Master (vip1); Master (vip2); Slave (vip3); Slave (vip4)]

Step: set global read_only=1

Page 11

How MMM Does Failover

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (vip3); Slave (vip4)]

Step: remove VIP

Page 12

How MMM Does Failover

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (vip3); Slave (vip4)]

Step: select MASTER_POS_WAIT()

Page 13

How MMM Does Failover

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (vip3); Slave (vip4)]

Step: show master status

Page 14

How MMM Does Failover

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (vip3); Slave (vip4)]

Step: change master to

Page 15

How MMM Does Failover

[Diagram: Monitor; Master (no VIP); Master (vip1 & vip2); Slave (vip3); Slave (vip4)]

Step: vip1 online
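The failover frames can be read as one ordered sequence. The sketch below lists that sequence as (host, action) pairs. The slides show only the actions and their order, so which host runs each step is an assumption here, and `failover_plan` and the host names are hypothetical.

```python
def failover_plan(old_master, new_master, slaves, writer_vip):
    """Return the slide-order failover steps as (host, action) pairs.

    Host attribution is an assumption; the slides only show the actions.
    """
    plan = [
        (old_master, "SET GLOBAL read_only = 1"),
        (old_master, f"remove {writer_vip}"),
    ]
    # Assumed: each slave waits until it has replayed the old master's binlog.
    plan += [(s, "SELECT MASTER_POS_WAIT(file, pos)") for s in slaves]
    # Read the new master's binlog coordinates for the repointing step.
    plan.append((new_master, "SHOW MASTER STATUS"))
    plan += [(s, f"CHANGE MASTER TO {new_master}") for s in slaves]
    # Only after all of the above does the writer VIP come back.
    plan.append((new_master, f"bring {writer_vip} online"))
    return plan


if __name__ == "__main__":
    for host, action in failover_plan("M1", "M2", ["S1", "S2"], "vip1"):
        print(f"{host}: {action}")
```

Note that the writer VIP is the last step, so anything slow earlier in the list keeps writes unavailable.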

Page 16

MMM: MMM in DP

Page 17

MMM in DP

● Frontend Group: vip1 & vip2
● Backend Group: vip3 & vip4
● Job Group: vip5

[Diagram: Master (vip1); Master (vip2); Slave (vip3 / vip5); Slave (vip4)]

Page 18

MMM: Problems in MMM

Page 19

What's wrong with MMM

MMM is
1) fundamentally broken and unsuitable for use as a HA tool
2) absolutely cannot be fixed.

http://www.xaprb.com/blog/2011/05/04/whats-wrong-with-mmm/

Page 20

MMM Problem 1

Setting read_only is difficult on a busy server: set global read_only=1 is blocked by long-running SQL.

[Diagram: Monitor; Master (vip1); Master (vip2); Slave (vip3); Slave (vip4)]

Step: set global read_only=1

Page 21

MMM Problem 1

[Diagram: Monitor; Master (vip1); Master (vip2); Slave (vip3); Slave (vip4)]

Step: set global read_only=1

Page 22

MMM Problem 1 -- Fix

[Diagram: Monitor; Master (vip1); Master (vip2); Slave (vip3); Slave (vip4)]

Step: remove VIP

Page 23

MMM Problem 1 -- Fix

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (vip3); Slave (vip4)]

Step: kill uncommitted processes

Page 24

MMM Problem 1 -- Fix

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (vip3); Slave (vip4)]

Step: kill uncommitted processes

Page 25

MMM Problem 1 -- Fix

[Diagram: Monitor; Master (no VIP); Master (vip1 & vip2); Slave (vip3); Slave (vip4)]

Steps: show master status, then change master to
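A minimal sketch of the fix sequence above: drop the VIP first, kill the sessions that still hold uncommitted work, then read the position and repoint the slaves. The slides do not say how the uncommitted sessions are found. One possibility, assumed here, is to read `trx_mysql_thread_id` from MySQL's `information_schema.innodb_trx` table and pass the resulting ids in. The function names are hypothetical.

```python
def kill_statements(open_trx_thread_ids, own_thread_id=None):
    """Build KILL statements for sessions that hold open transactions.

    own_thread_id is excluded so the tool never kills its own connection.
    """
    ids = sorted(set(open_trx_thread_ids) - {own_thread_id})
    return [f"KILL {tid}" for tid in ids]


def problem1_fix_plan(open_trx_thread_ids, own_thread_id=None):
    """Order of operations shown on the 'Problem 1 -- Fix' slides."""
    return (
        ["remove writer VIP"]                      # stop new writes first
        + kill_statements(open_trx_thread_ids, own_thread_id)
        + ["SHOW MASTER STATUS", "CHANGE MASTER TO"]
    )


if __name__ == "__main__":
    for step in problem1_fix_plan([42, 7, 42], own_thread_id=7):
        print(step)
```

Removing the VIP before anything else means no new statements arrive while the remaining sessions are being cleared.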

Page 26

MMM Problem 2

The writer VIP cannot be accessed while a slave is far behind the master.

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (30m behind); Slave (vip4)]

Step: select MASTER_POS_WAIT()

Page 27

MMM Problem 2

The writer VIP cannot be accessed while a slave is far behind the master.

[Diagram: Monitor; Master (no VIP); Master (vip1 & vip2); Slave (vip3); Slave (vip4)]

30 minutes later...

Page 28

MMM Problem 2 -- Fix

Record the position on M2 and bring up VIP1 immediately.

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (30m behind); Slave (vip4)]

Step: select MASTER_POS_WAIT

Page 29

MMM Problem 2 -- Fix

Record the position on M2 and bring up VIP1 immediately.

[Diagram: Monitor; Master (no VIP); Master (vip2); Slave (30m behind); Slave (vip4)]

Step: show master status, recording $file and $position

Page 30

MMM Problem 2 -- Fix

Record the position on M2 and bring up VIP1 immediately.

[Diagram: Monitor; Master (no VIP); Master (vip1 & vip2); Slave (30m behind); Slave (vip4)]

Step: bring up VIP1

Page 31

MMM Problem 2 -- Fix

Record the position on M2 and bring up VIP1 immediately.

[Diagram: Monitor; Master (no VIP); Master (vip1 & vip2); Slave (30m behind); Slave (vip4)]

Step: select MASTER_POS_WAIT

Page 32

MMM Problem 2 -- Fix

Record the position on M2 and bring up VIP1 immediately.

[Diagram: Monitor; Master (no VIP); Master (vip1 & vip2); Slave (30m behind); Slave (vip4)]

Step: change master to M2 at the recorded $file and $position
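The effect of the reordering can be shown with a small timeline model. This is a sketch, not MMM code: the step labels, the `writer_downtime` helper, and the 1800-second figure (the 30-minute lag from the slides) are illustrative assumptions. It also assumes M2 has already applied everything from the old master when its position is recorded.

```python
# Original order: the writer VIP waits for the lagging slave to catch up.
ORIGINAL = [
    "MASTER_POS_WAIT on slave",
    "SHOW MASTER STATUS on M2",
    "CHANGE MASTER TO M2",
    "bring up VIP1",
]

# Fixed order: record M2's position before any new write reaches it,
# bring the VIP up at once, and let the slave catch up afterwards.
FIXED = [
    "SHOW MASTER STATUS on M2",
    "bring up VIP1",
    "MASTER_POS_WAIT on slave",
    "CHANGE MASTER TO M2 at recorded position",
]


def writer_downtime(plan, durations):
    """Seconds that pass before the 'bring up VIP1' step can start."""
    elapsed = 0
    for step in plan:
        if step == "bring up VIP1":
            return elapsed
        elapsed += durations.get(step, 0)
    raise ValueError("plan never brings up VIP1")


if __name__ == "__main__":
    durations = {"MASTER_POS_WAIT on slave": 1800}  # slave is 30m behind
    print("original:", writer_downtime(ORIGINAL, durations), "s")
    print("fixed:   ", writer_downtime(FIXED, durations), "s")
```

The position has to be recorded before VIP1 comes up because M2's binlog starts advancing as soon as it accepts writes. The recorded coordinates mark the point where the lagging slave should resume.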

Page 33

Memcached: Memcached in DP

Page 34

Memcached in DP

[Diagram: nodes labeled Node1, Node2, Node3, Node3 arranged into a Main Ring and a Backup Ring]

Page 35

Memcached in DP

[Diagram: nodes labeled Node1, Node2, Node3, Node3 arranged into a Main Ring and a Backup Ring]

Client: set key1, set key1 (two arrows from the client)

Page 36

Memcached in DP

[Diagram: nodes labeled Node1, Node2, Node3, Node3 arranged into a Main Ring and a Backup Ring]

Client: get key1

Page 37

Memcached in DP

[Diagram: nodes labeled Node1, Node2, Node3, Node3 arranged into a Main Ring and a Backup Ring]

Client: get key1, get key1 (two arrows from the client)
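One plausible reading of the four Main Ring / Backup Ring frames is that the client writes every key to both rings and reads from the backup ring only when the main ring misses. The sketch below models that reading. It is an assumption, not the speaker's client: plain dicts stand in for memcached nodes, simple modulo hashing stands in for whatever ring hashing DP uses, and `DualRingClient` is a hypothetical name.

```python
import zlib


class DualRingClient:
    """Write to both rings; read the main ring first, then the backup."""

    def __init__(self, main_nodes, backup_nodes):
        # Each node is a dict standing in for one memcached server.
        self.main = main_nodes
        self.backup = backup_nodes

    @staticmethod
    def _pick(nodes, key):
        # Modulo placement for brevity; a real ring would typically
        # use consistent hashing.
        return nodes[zlib.crc32(key.encode()) % len(nodes)]

    def set(self, key, value):
        # "set key1 set key1": one write per ring.
        self._pick(self.main, key)[key] = value
        self._pick(self.backup, key)[key] = value

    def get(self, key):
        # "get key1": try the main ring first.
        value = self._pick(self.main, key).get(key)
        if value is None:
            # "get key1 get key1": second attempt on the backup ring.
            value = self._pick(self.backup, key).get(key)
        return value
```

If a main-ring node loses its data, reads for its keys are still served from the backup ring instead of falling through to MySQL.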

Page 38

Memcached: Problems We Met

Page 39

MultiGet Hole

MultiGet / Gets: a get command with multiple keys.

Purpose: avoid the multiple network round trips incurred by issuing many single get commands.

Problem: the gets command becomes slower as more nodes are added to the cluster.

Page 40

MultiGet Hole

[Diagram: Client; Node1, Node2, Node3]

Client: get key1,key2 ... key12

Page 41

MultiGet Hole

[Diagram: Client; Node1, Node2, Node3]

<node1> get key1,key4,key7,key10
<node2> get key2,key5,key8,key11
<node3> get key3,key6,key9,key12

Page 42

MultiGet Hole

[Diagram: Client; Node1, Node2, Node3]

<node1> get key1,key4,key7,key10
<node2> get key2,key5,key8,key11
<node3> get key3,key6,key9,key12

Client result so far:
v1,v4,v7,v10

Page 43

MultiGet Hole

[Diagram: Client; Node1, Node2, Node3]

<node2> get key2,key5,key8,key11
<node3> get key3,key6,key9,key12

Client result so far:
v1,v4,v7,v10
v2,v5,v8,v11

Page 44

MultiGet Hole

[Diagram: Client; Node1, Node2, Node3]

<node3> get key3,key6,key9,key12

Client result:
v1,v4,v7,v10
v2,v5,v8,v11
v3,v6,v9,v12

Page 45

MultiGet Hole

[Diagram: Client; Node1, Node2, Node3, Node4]

One more round trip!

Client result:
v1,v5,v9
v2,v6,v10
v3,v7,v11
v4,v8,v12
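The 12-key example can be reproduced with a few lines. This sketch uses the same placement the slides imply (key N lives on node ((N - 1) mod node_count) + 1). The function names are hypothetical and no real memcached client is involved.

```python
from collections import defaultdict


def node_for(key_index, node_count):
    """Node number (1-based) that holds key<key_index>."""
    return (key_index - 1) % node_count + 1


def plan_multiget(key_indexes, node_count):
    """Split one multi-key get into one request per node that holds a key."""
    batches = defaultdict(list)
    for i in key_indexes:
        batches[node_for(i, node_count)].append(f"key{i}")
    return dict(batches)


if __name__ == "__main__":
    for nodes in (3, 4):
        plan = plan_multiget(range(1, 13), nodes)
        print(f"{nodes} nodes -> {len(plan)} requests")
        for node in sorted(plan):
            print(f"  <node{node}> get {','.join(plan[node])}")
```

With three nodes the 12 keys need three requests of four keys each. Adding a fourth node turns the same multiget into four requests of three keys each, so the extra node adds a round trip to each multiget instead of removing one.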

Page 46

Cache Miss Storm

Happens when:
● Memcached fails
● A key expires

Ideal cache miss procedure:
1. get from memcached misses
2. query MySQL
3. set memcached

Page 47

Cache Miss Storm

In fact:
1. get from memcached misses
2. massive concurrent queries hit MySQL (timeout)
3. nothing is set into memcached
4. cache miss forever...

Page 48

Cache Miss Storm -- Our Solution

Hot key:
0. set the local cache after every get
1. get from memcached misses
2. add a lock key
   a. if (success) query MySQL & set memcached
   b. if (failed) return the local cache

* Only one web server can query MySQL for a missed key at a time.
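The hot-key procedure above can be sketched as follows. This is an illustration under assumptions, not the code used at DP: `FakeMemcached` is an in-memory stand-in that copies only memcached's `add` semantics (store only if the key is absent), and the `lock:` prefix, `get_hot_key`, and `query_mysql` callback are hypothetical names.

```python
class FakeMemcached:
    """In-memory stand-in exposing get/set/add/delete like a cache client."""

    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value

    def add(self, key, value):
        # Like memcached `add`: succeeds only when the key does not exist.
        if key in self.data:
            return False
        self.data[key] = value
        return True

    def delete(self, key):
        self.data.pop(key, None)


def get_hot_key(key, cache, local_cache, query_mysql):
    value = cache.get(key)
    if value is not None:
        local_cache[key] = value          # step 0: remember the last good value
        return value
    # Step 1: memcached miss. Step 2: try to take the lock key.
    lock_key = "lock:" + key              # hypothetical naming convention
    if cache.add(lock_key, 1):
        try:
            value = query_mysql(key)      # 2a: the single lock holder queries MySQL
            cache.set(key, value)
            local_cache[key] = value
            return value
        finally:
            cache.delete(lock_key)
    # 2b: someone else holds the lock; serve the (possibly stale) local copy.
    return local_cache.get(key)
```

In a real deployment the lock key would normally carry a short expiry so that a crashed web server cannot hold it forever. That detail is omitted here.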

Page 49

VPL

VPL: virtual packet loss. No actual packet loss occurs, but the VM response time exceeds the retransmission timeout.

Putting two network-bound virtual machines together results in huge get timeouts.

Page 50

VPL

A normal retransmission takes 50 ms, which exceeds our Memcached timeout.

timeout == no result == cache miss

Result: another kind of cache miss storm

Page 51

Avoid VPL

● Split network-bound business onto different physical machines.
● Maybe UDP?
● Maybe fast retransmission?

Page 52

Thanks! Q&A