The State of HBase Replication

Jean-Daniel Cryans, May 5th, 2014

Speaker: Jean-Daniel Cryans (Cloudera)

HBase replication has come a long way since its inception in HBase 0.89, almost four years ago. Today, master-master and cyclic replication setups are supported; many bug fixes and new features, such as log compression, per-family peer configuration, and throttling, have been added; and a major refactoring has been done. This presentation recaps the work done during the past four years, presents a few use cases that are currently in production, and takes a look at the roadmap.

Transcript of The State of HBase Replication

Page 1: The State of HBase Replication

The State of HBase Replication
Jean-Daniel Cryans
May 5th, 2014

©2014 Cloudera, Inc. All rights reserved.

About me

• Software Engineer at Cloudera, Storage team
• Apache HBase committer since 2008, PMC member

Motivation for HBase Replication
• Even though HBase is:
  • distributed;
  • fault-tolerant;
  • highly available; and
  • almost magic.

The Current State
• It’s production-ready.
• It’s used to replicate data between thousands of nodes across continents.
• It’s used for disaster recovery, geo-distributed serving, and more.

Agenda
• Four Years of Replication
• Use Cases in Production
• Roadmap

Design
• Clusters are distinct
• Pull vs. push
• Sync vs. async

Clusters are Distinct
• HBase doesn’t span data centers (DCs) or HDFS instances
• .META. operations aren’t replicated
• Regions can be different
• Security has to be configured for each cluster
Diagram: a master cluster with 20 region servers (RS) and a slave cluster with 15 RS.

Push instead of Pull
Diagram: MySQL replication uses pull. The MySQL slave (cluster B) gets the binlog from the MySQL master (cluster A) and applies it locally.

Diagram: HBase replication uses push. A region server (RS) in cluster A replicates entries to an RS in cluster B, which applies them to its cluster.

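Async instead of Sync
Diagram: synchronous replication. Cluster A and cluster B each have a region server (RS) with an HLog and a MemStore. A Put is written on both clusters, and both send back an Ack, before the write completes (steps 1 to 8).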

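Diagram: asynchronous replication. On cluster A, the RS writes a Put to its HLog and MemStore and acknowledges it right away (steps 1 to 4). Separately, an HLog tailing thread reads the HLog and ships the edit to cluster B, whose RS writes it to its own HLog and MemStore and sends back an Ack (steps 1 to 5).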
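To make the asynchronous, push-based design concrete, here is a minimal in-memory Java sketch. It assumes two hypothetical classes. SourceRegionServer acknowledges a put as soon as the edit is in its own log and memstore. A separate tailing step, shipPending(), later pushes the new log entries to a SinkRegionServer. The method name replicateLogEntries() is borrowed from the original implementation for readability only; nothing here is HBase's actual code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of async, push-based replication; not HBase code.
class SinkRegionServer {
    final Map<String, String> memstore = new TreeMap<>();

    // The sink applies whatever batch the source pushes to it.
    void replicateLogEntries(List<String[]> batch) {
        for (String[] kv : batch) {
            memstore.put(kv[0], kv[1]);
        }
    }
}

class SourceRegionServer {
    final List<String[]> hlog = new ArrayList<>();          // write-ahead log
    final Map<String, String> memstore = new TreeMap<>();
    private int shippedPosition = 0;                         // how far the tailer got
    private final SinkRegionServer peer;

    SourceRegionServer(SinkRegionServer peer) {
        this.peer = peer;
    }

    // Client path: log, apply locally, ack. No round trip to the peer.
    boolean put(String row, String value) {
        hlog.add(new String[] {row, value});
        memstore.put(row, value);
        return true; // ack to the client
    }

    // Tailing path: push every entry appended since the last call.
    int shipPending() {
        List<String[]> batch = new ArrayList<>(hlog.subList(shippedPosition, hlog.size()));
        if (!batch.isEmpty()) {
            peer.replicateLogEntries(batch);
            shippedPosition = hlog.size();
        }
        return batch.size();
    }
}
```

Right after put() returns, the sink does not have the row yet. It only appears once shipPending() runs. That window is the replication lag the asynchronous design accepts in exchange for not paying an inter-cluster round trip on every write.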

First Release - 0.90.0
• Simple master-slave (only one slave)
• Disabled by default
• Uses ZooKeeper (ZK) as a metadata store

Original Implementation
Diagram: on a region server in the master cluster, the ReplicationSource (alongside a ZooKeeperWatcher) calls replicateLogEntries() on a region server in the slave cluster, whose ReplicationSink applies the edits through an HTable as Puts and Deletes.

First Lesson Learned
• HDFS doesn’t support tailing files that are being written to. It requires:
  • open()
  • seek() // go where we stopped last time
  • while (!EOF && !enoughData)
    • read()
  • close()
  • repeat
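The reopen-and-seek loop above can be sketched in Java against a local file. ReopeningTailer and its parameters are hypothetical, and RandomAccessFile stands in for an HDFS reader. Each pass reopens the file, seeks to the saved offset, reads while not at EOF and not yet at the per-pass limit, closes the file, and remembers the new offset for the next pass.

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.util.Arrays;

// Sketch of tailing a file that is still being written by reopening it on
// every pass. It uses a local file via RandomAccessFile, not an HDFS client.
class ReopeningTailer {
    private final String path;
    private final int maxBytesPerPass;  // the "enoughData" limit
    private long position = 0;          // where we stopped last time

    ReopeningTailer(String path, int maxBytesPerPass) {
        this.path = path;
        this.maxBytesPerPass = maxBytesPerPass;
    }

    // One pass: open(), seek(), read while !EOF && !enoughData, close().
    byte[] readPass() throws IOException {
        byte[] buf = new byte[maxBytesPerPass];
        int total = 0;
        try (RandomAccessFile file = new RandomAccessFile(path, "r")) { // open()
            file.seek(position);                                       // seek()
            int n;
            while (total < buf.length
                    && (n = file.read(buf, total, buf.length - total)) != -1) {
                total += n;                                             // read()
            }
        }                                                               // close()
        position += total;  // remember the offset for the next pass ("repeat")
        return Arrays.copyOf(buf, total);
    }
}
```

Data appended after a pass is picked up on the next pass, because the file is reopened and the reader seeks to where it stopped.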

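Second Lesson Learned
• Single-threaded, non-batched ZK access is slow
• ZK didn’t have an atomic move operation
  • Doubles the number of ops needed, and opens the door to race conditions
Diagram: RS2 takes over RS1’s queue for peer 1. The logs under /hbase/replication/RS1/1 (hlog1, hlog2, ...) are moved to /hbase/replication/RS2/1-RS1 one at a time: 1. create the new hlog2 znode; 2. delete the old hlog2 znode.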
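The copy-then-delete sequence in the diagram can be modeled with a toy in-memory znode store. ToyZk and QueueFailover are hypothetical, and real ZooKeeper also requires parent znodes to exist, which this toy ignores. Moving n logs costs 2n writes. If the claiming server dies between step 1 and step 2, the same log ends up referenced by two queues, which is one plausible example of the races mentioned on the slide.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Toy znode store: path -> data. Parents are implicit; there is no atomic move.
class ToyZk {
    final TreeMap<String, String> nodes = new TreeMap<>();
    int ops = 0;

    void create(String path, String data) {
        if (nodes.containsKey(path)) throw new IllegalStateException("exists: " + path);
        nodes.put(path, data);
        ops++;
    }

    void delete(String path) {
        if (nodes.remove(path) == null) throw new IllegalStateException("missing: " + path);
        ops++;
    }

    // Names of the direct children of a path.
    List<String> children(String parent) {
        List<String> out = new ArrayList<>();
        String prefix = parent + "/";
        for (String p : nodes.subMap(prefix, prefix + Character.MAX_VALUE).keySet()) {
            String rest = p.substring(prefix.length());
            if (!rest.contains("/")) out.add(rest);
        }
        return out;
    }
}

class QueueFailover {
    // Move a dead RS's queue for one peer under a live RS, one log at a time.
    static void claimQueue(ToyZk zk, String deadRs, String liveRs, String peerId) {
        String from = "/hbase/replication/" + deadRs + "/" + peerId;
        String to = "/hbase/replication/" + liveRs + "/" + peerId + "-" + deadRs;
        for (String hlog : zk.children(from)) {          // snapshot list, safe to mutate
            zk.create(to + "/" + hlog, zk.nodes.get(from + "/" + hlog)); // 1. create new
            zk.delete(from + "/" + hlog);                                  // 2. delete old
        }
    }
}
```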

Second Release - 0.92.0
• Cyclic replication
• Multi-slave (scope LOCAL or GLOBAL)
• Enable / disable peer
• Special configurations

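Cyclic Replication
Diagram: Cluster1, Cluster2, and Cluster3 replicate to each other in a cycle. A Put of row X on Cluster1 is replicated to Cluster2, then to Cluster3. Cluster3 sees that row X is from Cluster1 and doesn’t replicate it back ("Don’t replicate!").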
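The "Row X is from 1" check can be sketched as a toy ring of clusters in which every edit carries the ID of the cluster it originated on. RingCluster and its fields are hypothetical. The only rule is that a cluster does not ship an edit to the peer it came from.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy ring of clusters; each edit remembers the cluster it originated on.
class RingCluster {
    final String id;
    RingCluster next;                              // the single peer in the ring
    final Map<String, String> rows = new HashMap<>();
    final List<String> log = new ArrayList<>();    // trace of what happened here

    RingCluster(String id) {
        this.id = id;
    }

    void clientPut(String row, String value) {
        apply(row, value, id);                     // a client write originates here
    }

    private void apply(String row, String value, String originId) {
        rows.put(row, value);
        log.add(id + " applied " + row + " from " + originId);
        // Ship to the peer unless the peer is where the edit came from.
        if (next.id.equals(originId)) {
            log.add(id + " skipped shipping " + row + " to " + next.id);
        } else {
            next.apply(row, value, originId);
        }
    }
}
```

A single origin ID is enough for a simple ring like this one. In topologies where an edit can enter a loop that does not pass back through its origin, the edit would need to remember every cluster it has already visited.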

Multi-Slave
Diagram: a Put of row X on Cluster1 is replicated to both Cluster2 and Cluster3.
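Multi-slave fan-out combined with the LOCAL/GLOBAL scope can be sketched as one source log read independently by several peers. Each peer keeps its own position in the log, and only families marked GLOBAL are shipped. The constants mirror HBase's replication scope values (0 for local, 1 for global). FanOutSource and its methods are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy fan-out: one source log, several peers, each with its own log position.
class FanOutSource {
    static final int SCOPE_LOCAL = 0;   // family is not replicated
    static final int SCOPE_GLOBAL = 1;  // family is replicated to every peer

    final Map<String, Integer> familyScopes = new HashMap<>();
    final List<String[]> log = new ArrayList<>();                  // {family, row}
    final Map<String, List<String>> received = new HashMap<>();    // peer -> rows it got
    private final Map<String, Integer> positions = new HashMap<>(); // peer -> log offset

    void addPeer(String peerId) {
        received.put(peerId, new ArrayList<>());
        positions.put(peerId, 0);
    }

    void put(String family, String row) {
        log.add(new String[] {family, row});
    }

    // Each peer advances independently; shipping to one doesn't move the others.
    void ship(String peerId) {
        int pos = positions.get(peerId);
        List<String> sink = received.get(peerId);
        for (; pos < log.size(); pos++) {
            String[] entry = log.get(pos);
            if (familyScopes.getOrDefault(entry[0], SCOPE_LOCAL) == SCOPE_GLOBAL) {
                sink.add(entry[1]);
            }
        }
        positions.put(peerId, pos);
    }
}
```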

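Enable / Disable Peers
> disable_peer '2'
Diagram: on cluster 1, the RS’s HLog tailing thread asks "Is the peer enabled?" before shipping to cluster 2. While peer 2 is disabled, nothing is shipped and HLogs keep piling up in its queue.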
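The behavior in the diagram can be sketched as a tailing loop that checks a per-peer flag before shipping anything. PeerQueue and its members are hypothetical. Disabling the peer pauses shipping but does not drop edits, so the backlog of logs grows until the peer is enabled again.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy replication queue for one peer; logs stay queued until they are shipped.
class PeerQueue {
    private final Deque<String> pendingLogs = new ArrayDeque<>();
    final List<String> shipped = new ArrayList<>();
    volatile boolean peerEnabled = true;   // flipped by something like disable_peer

    void logRolled(String hlogName) {
        pendingLogs.addLast(hlogName);
    }

    // One iteration of the tailing thread.
    void tailOnce() {
        if (!peerEnabled) {
            return; // "Is the peer enabled?" -> no: ship nothing, keep the logs
        }
        while (!pendingLogs.isEmpty()) {
            shipped.add(pendingLogs.pollFirst());
        }
    }

    int backlog() {
        return pendingLogs.size();
    }
}
```

On a real cluster, the logs still referenced by a replication queue can’t be cleaned up. A long-disabled peer therefore also costs storage on the master cluster.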

Special Configurations
• KEEP_DELETED_CELLS
  • Must be used on slaves when replicated data is being deleted.
• MIN_VERSIONS
  • Combined with TTL, makes it easy to configure a slave that contains only the last few days of data.
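The following sketch shows how TTL and MIN_VERSIONS interact for the versions of a single cell. It assumes a simplified rule: a version survives if it is younger than the TTL, or if it is one of the newest minVersions versions. RetentionRule is hypothetical and ignores maximum versions, deletes, and exact boundary handling.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Toy model of which versions of one cell survive a TTL check when a
// minimum number of versions must always be kept. Timestamps are in ms.
class RetentionRule {
    static List<Long> survivingVersions(List<Long> timestamps, long nowMs,
                                        long ttlMs, int minVersions) {
        List<Long> sorted = new ArrayList<>(timestamps);
        sorted.sort(Collections.reverseOrder());         // newest first
        List<Long> kept = new ArrayList<>();
        for (int i = 0; i < sorted.size(); i++) {
            long ts = sorted.get(i);
            boolean withinTtl = nowMs - ts < ttlMs;      // younger than the TTL
            boolean protectedByMin = i < minVersions;    // newest N always stay
            if (withinTtl || protectedByMin) {
                kept.add(ts);
            }
        }
        return kept;
    }
}
```

With a three-day TTL and minVersions set to 2, a cell with versions from 1, 8, and 9 days ago keeps the 1-day and 8-day versions. With minVersions set to 0, only the 1-day version survives.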

Third Lesson Learned
• It’s easy to DDoS yourself.
• Replication was using the normal handlers...
• ... and using them to write back!
Diagram: the handlers are all busy (Handler1: Put, Handler2: Delete, Handler3: Replicate, Handler4: Get, Handler5: Put), and the replicated Put goes into the same queue.
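The self-DDoS can be reproduced with two small thread pools. In this toy model, each "replicate" call holds a handler while it waits for a Put that also needs a handler. With one shared pool, every handler ends up waiting and the Puts never run until the waits time out. The remedy modeled here is a separate pool for replication calls. HandlerStarvation, the pool sizes, and the 200 ms timeout are assumptions for illustration, not HBase's code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Toy model: a "replicate" call holds a handler while it waits for a Put
// that also needs a handler on the same region server.
class HandlerStarvation {
    // Returns how many replicate calls saw their Put applied before timing out.
    static int run(int handlers, int replicateCalls, boolean separateReplicationPool)
            throws Exception {
        ExecutorService normalHandlers = Executors.newFixedThreadPool(handlers);
        ExecutorService replicationHandlers = separateReplicationPool
                ? Executors.newFixedThreadPool(handlers)
                : normalHandlers;  // original behaviour: one shared pool
        try {
            List<Future<Boolean>> calls = new ArrayList<>();
            for (int i = 0; i < replicateCalls; i++) {
                calls.add(replicationHandlers.submit(() -> {
                    // Write back locally through the normal handlers.
                    Future<?> put = normalHandlers.submit(() -> { /* apply the Put */ });
                    try {
                        put.get(200, TimeUnit.MILLISECONDS);
                        return true;
                    } catch (TimeoutException e) {
                        return false;  // no free handler picked the Put up in time
                    }
                }));
            }
            int applied = 0;
            for (Future<Boolean> call : calls) {
                if (call.get()) applied++;
            }
            return applied;
        } finally {
            normalHandlers.shutdownNow();
            replicationHandlers.shutdownNow();
        }
    }
}
```

With two handlers and two incoming replicate calls, the shared pool completes none of them in time, while the separate pool completes both.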

Fourth Lesson Learned
• Instinctively, what would something called stop_replication do?
• Good intentions, bad outcomes: HBASE-8861
Diagram: start/stop_replication, crossed out.

Third Release - 0.96.0 / 0.98.0
• Replication enabled by default!
• Completely refactored for readability/extensibility (Chris Trezzo)
• ReplicationSyncUp tool (HBASE-9047)
• Throttling (HBASE-9501)
• Finer grained replication controls (HBASE-8751)
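Throttling caps how fast a source pushes edits to a peer. The sketch below shows one common way to do that, a fixed-window byte budget in which a batch that would overflow the current window waits for the next one. The class, its parameters, and the window length are assumptions for illustration and are not taken from the HBASE-9501 patch. Time is passed in explicitly so the logic can be checked without sleeping.

```java
// Hypothetical fixed-window throttler: at most bytesPerCycle bytes may be
// pushed in each cycle of cycleMs milliseconds.
class ReplicationThrottleSketch {
    private final long bytesPerCycle;
    private final long cycleMs;
    private long cycleStartMs;
    private long bytesThisCycle = 0;

    ReplicationThrottleSketch(long bytesPerSecond, long cycleMs, long nowMs) {
        this.cycleMs = cycleMs;
        this.bytesPerCycle = bytesPerSecond * cycleMs / 1000;
        this.cycleStartMs = nowMs;
    }

    private void maybeStartNewCycle(long nowMs) {
        if (nowMs - cycleStartMs >= cycleMs) {
            cycleStartMs = nowMs;
            bytesThisCycle = 0;
        }
    }

    // How long to sleep before pushing a batch of this size (0 = push now).
    // An oversized batch is still allowed into an empty cycle so it can't stall.
    long sleepBeforePush(long batchBytes, long nowMs) {
        maybeStartNewCycle(nowMs);
        if (bytesThisCycle > 0 && bytesThisCycle + batchBytes > bytesPerCycle) {
            return cycleStartMs + cycleMs - nowMs;  // wait out the current cycle
        }
        return 0;
    }

    // Record a batch that was actually pushed.
    void recordPush(long batchBytes, long nowMs) {
        maybeStartNewCycle(nowMs);
        bytesThisCycle += batchBytes;
    }
}
```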

ReplicationSyncUp Tool
• Works on an offline cluster
• Can finish replicating the queues in ZK
• Useful to finish draining a master cluster
Diagram: two clusters, each an HBase / HDFS / ZooKeeper stack, with the ReplicationSyncUp tool between them.

Finer Grained Replication Controls
> set_peer_tableCFs '2', "table1; table2:cf1,cf2; table3:cfA,cfB"
• Meaning: enable replication to peer #2 for:
  • All of table1
  • cf1 and cf2 from table2
  • cfA and cfB from table3
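The tableCFs string above can be read as "tables separated by semicolons, each optionally followed by a colon and a comma-separated family list". This Java sketch parses it and answers whether a given table and family would be shipped to the peer. TableCfsSketch is hypothetical, and treating an empty configuration as "replicate everything" is an assumption of the sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Parses a tableCFs string such as "table1; table2:cf1,cf2" into
// table -> families. An empty family list means "all families of the table".
class TableCfsSketch {
    static Map<String, List<String>> parse(String spec) {
        Map<String, List<String>> result = new HashMap<>();
        for (String tableSpec : spec.split(";")) {
            String trimmed = tableSpec.trim();
            if (trimmed.isEmpty()) continue;
            String[] parts = trimmed.split(":", 2);
            List<String> families = new ArrayList<>();
            if (parts.length == 2) {
                for (String cf : parts[1].split(",")) {
                    if (!cf.trim().isEmpty()) families.add(cf.trim());
                }
            }
            result.put(parts[0].trim(), families);
        }
        return result;
    }

    // Would an edit to table:family be shipped to this peer?
    static boolean shouldReplicate(Map<String, List<String>> tableCfs, String table, String family) {
        if (tableCfs.isEmpty()) return true;             // assumed: no restriction configured
        List<String> families = tableCfs.get(table);
        if (families == null) return false;              // table not listed for this peer
        return families.isEmpty() || families.contains(family);
    }
}
```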

Agenda: Use Cases in Production

Flurry
• Two data centers, coast to coast
• Three clusters, in master-master pairs
  • 1200 nodes
  • 800 nodes
  • 30 nodes
• Replication traffic: 2 Gbps
• Latency between DCs: 85 ms

Opower
• Two clusters, same data center
  • Master: tens of nodes
  • Slave: tens of nodes
• Replication traffic: 1 GB/day
• Bulk load replication traffic: 180 GB/day
• Recent use case

Lily HBase Indexer
• Collaboration between NGData & Cloudera
  • NGData are the creators of the Lily data management platform.
• Lily HBase Indexer
  • Service which acts as an HBase replication listener.
  • Custom sink writes to SolrCloud.
  • Integrates the Cloudera Morphlines library for ETL of rows.

Agenda: Roadmap

Stop Relying on Permanent Znodes
• The current rule is to never rely on znodes surviving cluster restarts, upgrades, etc.
• State data should be kept in an HBase table.
• Notification would be done through a new mechanism.
• See: https://issues.apache.org/jira/browse/HBASE-10295

Define a Replication Interface
• Replication is somewhat extensible, but it lacks stable interfaces.
• The HBase Indexer is such an extension, and it required surgery every time a committer sneezed.
• See: https://issues.apache.org/jira/browse/HBASE-10504
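To illustrate what a stable extension point could look like, here is a hypothetical Java interface and an indexer-style consumer that builds an in-memory term-to-rows index instead of writing to SolrCloud. ReplicationTarget, ReplicatedEdit, and InMemoryIndexTarget are invented for this sketch. They are not the interface that came out of HBASE-10504, nor the HBase Indexer's code.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical shape of a stable extension point; not an HBase API.
interface ReplicationTarget {
    // Returns true once the whole batch is handled, so the source may advance
    // its position; false would mean "retry this batch later".
    boolean replicate(List<ReplicatedEdit> batch);
}

// A replicated cell, reduced to what this example needs.
final class ReplicatedEdit {
    final String table, row, family, value;

    ReplicatedEdit(String table, String row, String family, String value) {
        this.table = table;
        this.row = row;
        this.family = family;
        this.value = value;
    }
}

// Indexer-style consumer: maps each lowercase term to the rows containing it.
class InMemoryIndexTarget implements ReplicationTarget {
    final Map<String, TreeSet<String>> index = new TreeMap<>();

    @Override
    public boolean replicate(List<ReplicatedEdit> batch) {
        for (ReplicatedEdit e : batch) {
            for (String term : e.value.toLowerCase().split("\\s+")) {
                if (!term.isEmpty()) {
                    index.computeIfAbsent(term, t -> new TreeSet<>()).add(e.row);
                }
            }
        }
        return true;
    }
}
```

The point of a narrow interface like this is that an extension only depends on the batch type and the return contract. Internal refactorings of the replication source would then no longer ripple into it.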

Distributed Counters
• Incrementing consists of:
  1. Taking a lock;
  2. Get’ing the current value; and
  3. Put’ing the newly incremented value.
• This breaks in master-master because the Puts overwrite each other.
• See: https://issues.apache.org/jira/browse/HBASE-2804
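Why the Puts overwrite each other can be shown with two toy clusters in a master-master pair. Each increment is implemented as lock, Get, and Put of the new absolute value, and replication ships that Put to the other side, where the newest timestamp wins. Both clusters increment once, yet both end at 1. The delta-shipping variant is shown only as one possible alternative, not necessarily what HBASE-2804 proposes. All class names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Toy master-master pair. An increment is lock + Get + Put of the new
// absolute value, and replication ships that Put to the other side.
class CounterCluster {
    long value = 0;
    long timestamp = 0;                             // timestamp of the current value
    final List<long[]> outbox = new ArrayList<>();  // {value, timestamp} Puts to ship

    synchronized void incrementViaPut(long delta, long ts) {  // synchronized = "take a lock"
        long current = value;                       // Get
        apply(current + delta, ts);                 // Put the absolute value
        outbox.add(new long[] {current + delta, ts});
    }

    // Newest timestamp wins, like two Puts to the same cell.
    synchronized void apply(long newValue, long ts) {
        if (ts >= timestamp) {
            value = newValue;
            timestamp = ts;
        }
    }

    void shipTo(CounterCluster peer) {
        for (long[] put : outbox) peer.apply(put[0], put[1]);
        outbox.clear();
    }
}

// One possible alternative: ship the delta instead of the result.
class DeltaCounterCluster {
    long value = 0;
    final List<Long> outbox = new ArrayList<>();

    synchronized void increment(long delta) {
        value += delta;
        outbox.add(delta);
    }

    synchronized void applyDelta(long delta) {
        value += delta;
    }

    void shipTo(DeltaCounterCluster peer) {
        for (long d : outbox) peer.applyDelta(d);
        outbox.clear();
    }
}
```

Shipping deltas converges to 2 here. However, it is only correct if every delta is applied exactly once, so a replay of the same edits after a failover would double-count.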

More Tooling
• Replication management console: one shell to rule all the clusters!
• Replication bootstrapping tool
• Tool that can move queues between region servers
• Tool that can throttle replication on a live cluster

Questions?
• Or ping me async:
  • @jdcryans
  • [email protected]
  • jdcryans on #hbase irc.freenode.net