HBase Backups

HBase Backups: Backups in the Enterprise. Jesse Yates, Demai Ni, Jing Chen He, Richard Ding. HBase Backups - HBaseCon 2014

description

Speakers: Jesse Yates (Salesforce.com), Demai Ni, Richard Ding & Jing Chen He (IBM). This talk provides an overview of enterprise-scale backup strategies for HBase. Jesse Yates will describe how Salesforce.com runs backup and recovery on its multi-tenant, enterprise-scale HBase deployments; Demai Ni, Songqinq Ding, and Jing Chen of the IBM InfoSphere BigInsights development team will then describe IBM's recently open-sourced disaster recovery solution based on HBase snapshots and replication.

Transcript of HBase Backups

Page 1: HBase Backups

HBase Backups: Backups in the Enterprise

Jesse Yates, Demai Ni, Jing Chen He, Richard Ding

1 HBase Backups - HBaseCon 2014

Page 2: HBase Backups

Overview
• Commonalities
• IBM BigInsights
• Backups at Salesforce.com
• Summary

Page 3: HBase Backups

Commonalities
• Per-Table Backups
• Stored On HDFS
• Full Backup + Incrementals
• Fast Restore
• Multiple Clusters
• Timestamp file layout
• Manifest Files for additional info
• Merging Backups

Page 4: HBase Backups

IBM BigInsights


Page 5: HBase Backups

Backup Solution - IBM
• Customer Requirements
• Feature Overview
• Technical Design
• User Interface: CLI and Web UI
• Data Structures
• Open source through HBASE-7912: HBase Backup/Restore Based on HBase Snapshot

Page 6: HBase Backups

Customer Requirements
• Backup and Restore
– Critical requirements from enterprise customers
– General solution
– Easy-to-use user interfaces: CLI and Web UI
– Multiple file systems: HDFS and GPFS*
– Multiple MR frameworks: Hadoop and PSMR*

*GPFS: IBM General Parallel File System
*PSMR: Platform Symphony MapReduce

Page 7: HBase Backups

Feature Overview
• Full Backup based on HBase Snapshot
• Incremental Backup based on HBase transaction logs
• Table-level Incremental Backup
• Point-In-Time Restore
• On-the-fly and Off-line Convert from HLogs to HFiles
• Off-line Merge Backup Images
• Self-contained Backup Image with Manifest File
• Usability features:
– progress, status, and history reports
– purge old Backup Images

Page 8: HBase Backups

Technical Design - Overview
• Object: Backup Image
• Operations:
– Full Backup
– Incremental Backup
– Convert
– Merge
– Restore

Page 9: HBase Backups

Technical Design - Backup Images

Full Backup Table1 (Monday)

Full Backup Table2 (Tuesday)

Incremental Backup [Table1, Table2] (Wednesday)

Incremental Backup [Table1, Table2] (Thursday)

[Diagram: three "depends" arrows link the backup images to the images they build on]
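The "depends" arrows imply that restoring an incremental image means first applying every image it builds on. A minimal Python sketch of that ordering follows. The image names and the exact dependency edges are assumptions for illustration; the extracted slide text does not say which images the three arrows connect.

```python
# Hypothetical dependency edges; the slide shows three "depends" arrows,
# but the transcript does not record which images they join.
DEPENDS_ON = {
    "full_table1_mon": [],
    "full_table2_tue": [],
    "incr_wed": ["full_table1_mon", "full_table2_tue"],
    "incr_thu": ["incr_wed"],
}


def restore_order(image, depends_on=DEPENDS_ON):
    """Return the images in the order they must be applied to restore `image`."""
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        for dep in depends_on[name]:
            visit(dep)
        order.append(name)  # dependencies first, then the image itself

    visit(image)
    return order
```

Under these assumed edges, restoring Thursday's incremental would walk back through Wednesday's incremental to both full backups.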

Page 10: HBase Backups

Technical Design - Full Backup

$ hbase backup create full hdfs://targetCluster.ibm.com:9000/hbasebackups biginsights:hbasecon_table1

Global Distributed WAL Roll

Take Snapshot

Track WAL Timestamp Through ZooKeeper

Export Snapshot

Generate Manifest

Page 11: HBase Backups

Technical Design - Incremental Backup

$ hbase backup create incremental hdfs://targetCluster.ibm.com:9000/hbasebackups

Global Distributed WAL Roll

Track WAL Timestamp Through ZooKeeper

DistCp WAL Logs into Backup Image

Generate Manifest
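The tracked WAL timestamps are what let an incremental backup copy only the logs written since the previous backup. The Python sketch below illustrates that selection. It is not the HBASE-7912 implementation: the tuple layout, the paths, and the function name are hypothetical.

```python
def wals_to_copy(wal_files, last_roll_ts):
    """Pick WAL files written after the last backed-up roll on each server.

    wal_files: iterable of (region_server, wal_timestamp, path) tuples
               (a hypothetical representation, not HBase's WAL naming).
    last_roll_ts: dict mapping region_server -> timestamp recorded at the
                  previous backup; servers not listed are treated as 0.
    """
    selected = []
    for server, ts, path in wal_files:
        if ts > last_roll_ts.get(server, 0):
            selected.append(path)
    return sorted(selected)
```

A server with no recorded timestamp contributes all of its WALs, which is the conservative choice for a region server that joined after the previous backup.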

Page 12: HBase Backups

Technical Design - Restore

$ hbase restore hdfs://targetCluster.ibm.com:9000/hbasebackups biginsights:hbasecon_table1 biginsights:hbasecon_table1_restore

Create Table Pre-Split Using Manifest Info

Bulk Load HFiles (Full and Incremental)

Play WAL of Unconverted HLogs

Verify Lineage and Restore

Page 13: HBase Backups

Technical Design - Convert

$ hbase backup convert /hbasebackups backup_20140502_2100

Full backup: backup_20140501_2100
Incremental backup: backup_20140502_2100

Before:

/hbasebackups/biginsights/hbasecon_table1/
    backup_20140501_2100/Metadata+HFiles
    backup_20140502_2100/Metadata
/hbasebackups/biginsights/hbasecon_table2/
    backup_20140501_2100/Metadata+HFiles
    backup_20140502_2100/Metadata
/hbasebackups/WALs/
    backup_20140502_2100/HLogs of ALL Tables

Page 14: HBase Backups

Technical Design - Convert

$ hbase backup convert /hbasebackups backup_20140502_2100

Full backup: backup_20140501_2100
Incremental backup: backup_20140502_2100

After:

/hbasebackups/biginsights/hbasecon_table1/
    backup_20140501_2100/Metadata+HFiles
    backup_20140502_2100/Metadata+HFiles
/hbasebackups/biginsights/hbasecon_table2/
    backup_20140501_2100/Metadata+HFiles
    backup_20140502_2100/Metadata+HFiles
/hbasebackups/WALs/
    backup_20140502_2100/

Page 15: HBase Backups

Technical Design - Merge

$ hbase backup merge /hbasebackups biginsights:hbasecon_table1 backup_20140501_2100 backup_20140502_2100

Full backup: backup_20140501_2100
Incremental backup: backup_20140502_2100

Before merge:

/hbasebackups/biginsights/hbasecon_table1/
    backup_20140501_2100/
    backup_20140502_2100/

After merge:

/hbasebackups/biginsights/hbasecon_table1/
    backup_20140502_2100/

[Diagram labels: TimeStamp 2, TimeStamp 1, TimeStamp 2]

Page 16: HBase Backups

User Interface - CLI

$ hbase backup help
Usage: hbase backup COMMAND
where COMMAND is one of:
  create     create a new backup
  cancel     cancel an ongoing backup
  delete     delete an existing backup
  describe   show the detailed information of a backup
  history    show history of all successful backups
  status     show the status of the latest backup request
  convert    convert incremental backup WAL files into HFiles
  merge      merge backup images
  stop       remove table(s) from backup table set
  show       show table(s) in backup table set
Enter 'help COMMAND' to see help message for each command

Page 17: HBase Backups

User Interface – Web UI Backup


Page 18: HBase Backups

User Interface – Web UI Restore


Page 19: HBase Backups

Data Structure - Backup Image
• Table Info and Region Info
• Backup Manifest
– Table Name
– Type: Full or Incremental
– Size
– Timestamp Info
– State Info: Converted, Merged, Compacted, etc.
– Dependency Lineage
• Data
– HFiles
– WALs (For Incremental Backup before convert)
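The manifest fields listed on this slide map naturally onto a small record type. The Python sketch below is one possible shape; every field name and type is an assumption made for illustration, not the format used by HBASE-7912.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class BackupManifest:
    """Hypothetical in-memory form of the manifest fields on the slide."""
    table_name: str
    backup_type: str                 # "full" or "incremental"
    size_bytes: int
    start_ts: int                    # timestamp info
    complete_ts: int
    state: List[str] = field(default_factory=list)         # e.g. ["converted"]
    dependencies: List[str] = field(default_factory=list)  # lineage: ids of required images

    def is_restorable_alone(self) -> bool:
        # Only a full backup with no lineage can be restored by itself.
        return self.backup_type == "full" and not self.dependencies
```

Keeping the lineage inside each manifest is what makes a backup image self-contained: a restore tool can decide what else it needs from the image alone.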

Page 20: HBase Backups

Data Structure - ZooKeeper

/backup/hbase
  startcode {backup marker}
  complete/
    backupId_1 {contains backup metadata}
    ……
    backupId_n
  ongoing {contains the progress status of the current operation}
  failed {contains error code and message of the current operation}
  cancel {triggers a cancel operation}
  incr/
    tablelogtimestamp/
      table_1 {list of region servers and associated log timestamp for this table}
      ……
      table_n
    last-roll-log-ts/
      rs_1 {contains the log timestamp from last roll log}
      ……
      rs_n

Page 21: HBase Backups

Sincere gratitude is hereby extended to the following developers who contributed to this effort:

Richard Ding, Jing Chen He, Enoch Hsu, Yu Li, Jihong Ma, Demai Ni, Kan Zhang, Liping Zhang, Xiang Zhou

* ordered by last name

Page 22: HBase Backups

Salesforce.com Backups


Jesse Yates


Page 23: HBase Backups

Safe Harbor

Safe harbor statement under the Private Securities Litigation Reform Act of 1995: This presentation may contain forward-looking statements that involve risks, uncertainties, and assumptions. If any such uncertainties materialize or if any of the assumptions proves incorrect, the results of salesforce.com, inc. could differ materially from the results expressed or implied by the forward-looking statements we make. All statements other than statements of historical fact could be deemed forward-looking, including any projections of subscriber growth, earnings, revenues, or other financial items and any statements regarding strategies or plans of management for future operations, statements of belief, any statements concerning new, planned, or upgraded services or technology developments and customer contracts or use of our services.

The risks and uncertainties referred to above include – but are not limited to – risks associated with developing and delivering new functionality for our service, our new business model, our past operating losses, possible fluctuations in our operating results and rate of growth, interruptions or delays in our Web hosting, breach of our security measures, risks associated with possible mergers and acquisitions, the immature market in which we operate, our relatively limited operating history, our ability to expand, retain, and motivate our employees and manage our growth, new releases of our service and successful customer deployment, our limited history reselling non-salesforce.com products, and utilization and selling to larger enterprise customers. Further information on potential factors that could affect the financial results of salesforce.com, inc. is included in our annual report on Form 10-K for the most recent fiscal year ended January 31, 2011. This document and others are available on the SEC Filings section of the Investor Information section of our Web site.

Any unreleased services or features referenced in this or other press releases or public statements are not currently available and may not be delivered on time or at all. Customers who purchase our services should make the purchase decisions based upon features that are currently available. Salesforce.com, inc. assumes no obligation and does not intend to update these forward-looking statements.

Page 24: HBase Backups

Salesforce Environment
• Many tenants per cluster
• At least 90 days of recovery
• DR failover to remote DC
• All writes through Phoenix
– Timestamp control

Page 25: HBase Backups

Design Goals
• Validate backups regularly
• Minimize time to restore a tenant
• Validate replication is up to date
• Minimize data storage

Page 26: HBase Backups

Backups
• M/R a table at a given point in time
– Point-in-time view of the table
• Chunked by file size + tenant (per server)
• Chunk manifest
– Chunk info (min/max/hash/tenant ids)
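The chunking and chunk-manifest bullets can be sketched in a few lines of Python. This is an illustration under assumptions, not the Salesforce implementation: rows are assumed to arrive sorted by key, keys are assumed to start with a tenant id followed by an underscore, and SHA-256 stands in for whatever hash the real job uses.

```python
import hashlib


def build_chunks(rows, max_bytes):
    """Split sorted (key, value) byte pairs into chunks of at most max_bytes.

    A single row larger than max_bytes still gets a chunk of its own.
    """
    chunks, current, size = [], [], 0
    for key, value in rows:
        row_size = len(key) + len(value)
        if current and size + row_size > max_bytes:
            chunks.append(current)
            current, size = [], 0
        current.append((key, value))
        size += row_size
    if current:
        chunks.append(current)
    return chunks


def chunk_manifest(chunk):
    """Summarize one chunk: min/max key, content hash, tenant ids present."""
    digest = hashlib.sha256()
    for key, value in chunk:
        digest.update(key)
        digest.update(value)
    return {
        "min_key": chunk[0][0],
        "max_key": chunk[-1][0],
        "hash": digest.hexdigest(),
        # Hypothetical key layout: b"<tenant>_<rest>"
        "tenant_ids": sorted({key.split(b"_", 1)[0] for key, _ in chunk}),
    }
```

Recording the tenant ids and key range per chunk is what would let a restore job skip every chunk that cannot contain the tenant being recovered.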

Page 27: HBase Backups

Backups

Key      CF   CQ    TS  Value
user1_a  fam  qual  14  value10
user1_a  fam  qual  12  value5
user1_a  fam  qual  10  value2
user1_a  fam  qual  8   value4
user1_a  fam  qual  3   value13
user1_a  fam  qual  2   value56

1. http://phoenix.incubator.apache.org/
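A point-in-time view of a versioned cell like the one in this table means picking, for each column, the newest version whose timestamp is not later than the backup time. The Python sketch below shows that rule on the slide's rows (value spellings normalized). The function name and tuple layout are assumptions for illustration.

```python
# (key, column family, qualifier, timestamp, value), as in the slide's table.
VERSIONS = [
    ("user1_a", "fam", "qual", 14, "value10"),
    ("user1_a", "fam", "qual", 12, "value5"),
    ("user1_a", "fam", "qual", 10, "value2"),
    ("user1_a", "fam", "qual", 8, "value4"),
    ("user1_a", "fam", "qual", 3, "value13"),
    ("user1_a", "fam", "qual", 2, "value56"),
]


def point_in_time(cells, as_of_ts):
    """Return {(key, cf, cq): (ts, value)} with the newest version at or before as_of_ts."""
    latest = {}
    for key, cf, cq, ts, value in cells:
        if ts > as_of_ts:
            continue  # written after the requested point in time
        column = (key, cf, cq)
        if column not in latest or ts > latest[column][0]:
            latest[column] = (ts, value)
    return latest
```

For example, a view as of timestamp 11 would ignore the versions at 12 and 14 and return the version written at 10.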

Page 28: HBase Backups

Backups

[Diagram: mappers (M) between "Some HBase Table" and the Hadoop Distributed File System]

Page 29: HBase Backups

Backups
• Each backup is an incremental
– Lineage by convention
• Never write too far back in time
• Data retained by custom coprocessor
– Retained up to last successful backup

Page 30: HBase Backups

“Backup isn’t a backup until you’ve restored it and tested it”

-- Some Ops Guy


Page 31: HBase Backups

Restore + Validation
• Restore each backup to a new table
• Validate that the backup has the same data as the existing table
– Within backup time range
• Move ‘retained timestamps’ forward

Page 32: HBase Backups

Restore

HDFS

/hbase
…
/salesforce
  /backup
    /somehbasetable
      /03/14/14
        backup.properties
        chunk1
        chunk1.manifest
        ….
        chunk1000
        chunk1000.manifest

[Diagram: mappers (M) between the HDFS backup directory and SomeHBaseTable_Restore]

Page 33: HBase Backups

Restore
• Configurable validation percent
– Start high, move lower
• Backup only valid if restore is successful
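A configurable validation percent can be read as sampling: compare only a fraction of the rows in the restored table against the live table. The Python sketch below shows one way to do that. The dict-based tables, the function name, and the seeded sampling are all assumptions; the talk does not describe how rows are selected.

```python
import random


def validate_restore(source_rows, restored_rows, percent, seed=0):
    """Check a sample of source rows against the restored table.

    source_rows / restored_rows: dicts mapping row key -> value, already
    limited to the backup's time range. percent: 0-100, share of rows to check.
    """
    keys = sorted(source_rows)
    sample_size = max(1, len(keys) * percent // 100) if keys else 0
    sample = random.Random(seed).sample(keys, sample_size)
    return all(restored_rows.get(k) == source_rows[k] for k in sample)
```

Starting at 100 percent checks every row; lowering the percentage later trades coverage for a shorter validation run, which matches the "start high, move lower" advice.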

Page 34: HBase Backups


90 Days of Backup is LOTS of Data

Even without any duplicates!

Page 35: HBase Backups

Granularity Reduction
• Combine backups every ‘period’
– Week, month, 3 months
– Specified in table metadata
• Keep latest version of the row
• Helpful with lots of updates
– Not useful for unique data (e.g. time series)
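Combining a period's backups while keeping only the latest version of each row can be sketched as a simple merge. The Python below assumes each backup is a dict from row key to a (timestamp, value) pair; that representation is hypothetical and stands in for the chunk files.

```python
def merge_backups(backups):
    """Merge backups, keeping only the newest (ts, value) per row key.

    backups: list of dicts {row_key: (ts, value)}, in any order.
    """
    merged = {}
    for backup in backups:
        for key, (ts, value) in backup.items():
            if key not in merged or ts > merged[key][0]:
                merged[key] = (ts, value)
    return merged
```

The merged result shrinks only when rows are overwritten across backups. With unique keys such as time series, every row survives and the merge saves nothing, which is the caveat in the last bullet.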

Page 36: HBase Backups

Granularity Reduction

HDFS
/salesforce
  /backup
    /somehbasetable
      /03-14-14
      /03-13-14
      …
      /03-07-14
      /03-01_07-14
      /02-23_28-14
      /02-16_24-14
      /02-09_15-14
      /01-14
      /12-13
      /11-13
      /base

[Diagram: mappers (M) sit between the two HDFS layouts]

HDFS
/salesforce
  /03-07_14-14
  /03-01_07-14
  /02-14
  /01-14
  /12-13
  /base

Page 37: HBase Backups

Granularity Reduction

HDFS
/salesforce
  /backup
    /somehbasetable
      /03-14-14
      /03-13-14
      …
      /03-07-14
      /03-01_07-14
      /02-23_28-14
      /02-16_24-14
      /02-09_15-14
      /01-14
      /12-13
      /11-13
      /base

[Diagram: mappers (M) sit between the two HDFS layouts; labels: Weekly Merge, Monthly Merge, Rebuilt Base]

HDFS
/salesforce
  /03-07_14-14
  /03-01_07-14
  /02-14
  /01-14
  /12-13
  /base

Page 38: HBase Backups


Meanwhile…

Remember that DR site?

Page 39: HBase Backups

Disaster Recovery

[Diagram: Primary Data Center and Buddy (DR) Data Center]

Page 40: HBase Backups

Validation By Backup
• Validate replication is working
• Validate backup process consistent
• Validate granularity reduction consistent

Page 41: HBase Backups

Validation By Backup
• Build up hash of hashes
– Two level Merkle Tree
• Check that both DCs have the same hash
– Can easily identify differences per-manifest
• Requires time-delay for backups
– <= replication delay
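The hash-of-hashes check can be sketched as a two-level tree: one hash per chunk, and one combined hash over the chunk hashes. The Python below is a minimal illustration. SHA-256 and the list-of-hashes representation are assumptions; the talk does not name the hash function or the manifest encoding.

```python
import hashlib


def chunk_hash(data: bytes) -> str:
    """Leaf level: hash of one chunk's contents."""
    return hashlib.sha256(data).hexdigest()


def combined_hash(chunk_hashes):
    """Root level: hash over the ordered chunk hashes."""
    digest = hashlib.sha256()
    for h in chunk_hashes:
        digest.update(h.encode("ascii"))
    return digest.hexdigest()


def differing_chunks(primary, buddy):
    """Return indexes of chunks whose hashes differ between the two data centers.

    primary / buddy: lists of chunk hashes in the same chunk order.
    """
    if combined_hash(primary) == combined_hash(buddy):
        return []  # one comparison is enough when the roots match
    shorter, longer = sorted((len(primary), len(buddy)))
    mismatched = [i for i in range(shorter) if primary[i] != buddy[i]]
    # Chunks present in only one data center also count as differences.
    mismatched.extend(range(shorter, longer))
    return mismatched
```

Comparing the combined hash first keeps the common case cheap; the per-chunk hashes are only consulted when the roots disagree, which is how a mismatch can be narrowed to individual manifests.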

Page 42: HBase Backups

Hash Validation

Primary Data Center

Backup Manifest
• chunk size
• start time
• end time
• combined hash
• version

Chunk Manifest (×2)
• key prefix
• stats
• hash

Buddy Data Center

Backup Manifest
• chunk size
• start time
• end time
• combined hash
• version

Chunk Manifest (×2)
• key prefix
• stats
• hash

Mismatch!

Page 43: HBase Backups

Tracking Status
• Daily emails
• Progress stored in Phoenix Table
• Easy access for auditing
• Easy display for UI (coming soon)

Page 44: HBase Backups

Future Work
• Extensive tooling around per-tenant restore
• M/R from snapshot

Page 45: HBase Backups

Lessons Learned
• Track Properties
– Version, table, lineage, etc.
• Fast Restore is Important
– Consider your business case
• Validation!

Page 46: HBase Backups

Special Thanks

All the members of the Salesforce HBase team, particularly: Vasu Mariyala, Sukumar Maddineni, Alex Araujo, Lars Hofhansl, Ian Varley, Santosh Rau

Page 47: HBase Backups

Summary
• Per-Table Backups
• IBM
– WAL based
– Extra tooling for fast restores
– Extensive lineage tracking
• Salesforce
– M/R over HTable
– Multi-tenant
– Multiple Validation vectors

Page 48: HBase Backups

Thanks! Questions?

Jesse Yates, Demai Ni, Jing Chen He, Richard Ding