HBase Operations at Facebook
Paul Tuckfield
January 2012
HBase Operations
The HBase cells
▪ Many HBase cells
▪ 3 versions, several minor branches/revs
▪ Mostly uniform host types
▪ Varying network topologies/rack topologies
▪ Varying sizes
▪ We (Ryan, Alex, and I) are the “DBAs” or “SREs” of HBase at Facebook
▪ Moving towards slightly more differentiation of roles for teams at Facebook as the HBase effort matures
The use cases: some live, some not
▪ Titan (user-facing messaging)
▪ Facebook-specific time series
  ▪ Puma (user-facing stats)
  ▪ ODS (system metrics)
▪ Hashout
▪ Eris: “multi-tenant” “dormitory” for incubation of new projects
▪ CDB: a few use cases replacing what would have been smallish sharded MySQL setups
▪ ODS-HBase: Facebook instrumentation and alerting system, currently on MySQL
▪ Prototype/testing of general user data on HBase
We have some important use cases running on HBase, but they are small compared to what runs on MySQL and Hadoop. That said, some of these use cases are critical, and even a small fraction of the very large Facebook environment is still pretty large.
SMC / HSH: basic Facebook “cloud” tools used for HBase
• SMC:
• User defined sets of host:port “services”
• Arbitrary metadata
• Machine states (enabled, disabled)
• HSH
• Better version of dsh
• Integration with SMC
Other examples besides deploy:
- Cluster start/stop
- Autostart
- Scan ports
- Scan logs
Deploy: push slave info to SMC, then use SMC/HSH to push code to the hosts that make up the cell (see the sketch after the diagram below)
[Diagram: a deploy tool/utility pulls code from SVN/Git, reads host membership from SMC, and pushes to the HBase cell]
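SMC and HSH are internal tools, so the sketch below is only illustrative: the smcc and hsh invocations are invented stand-ins for the real interfaces, and the target path is made up. It shows the shape of the flow: read the cell’s host list from the service registry, rsync the build, then fan out a restart.

```python
import subprocess

def smc_hosts(cell):
    """Hypothetical: ask SMC for the hosts in a cell ('smcc' is an invented CLI)."""
    out = subprocess.check_output(["smcc", "list-hosts", cell])
    return [line.strip() for line in out.decode().splitlines() if line.strip()]

def deploy(cell, build_path):
    """Push a build to every host in the cell, then restart regionservers."""
    for host in smc_hosts(cell):
        # rsync the HBase build to each host (install path is an assumption)
        subprocess.check_call(["rsync", "-a", build_path, f"{host}:/usr/local/hbase/"])
    # HSH-style fanout: run one command across the whole cell (flags invented)
    subprocess.check_call(
        ["hsh", "--cell", cell, "--", "hbase-daemon.sh", "restart", "regionserver"]
    )
```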
HBase Maintenance: “It’s self-healing”
▪ Backups
  ▪ Stage 1, 2, 3
▪ Repairs
  ▪ FBAR
▪ Upgrades
  ▪ Rolling, cold
▪ Rack concerns
Attempt to standardize bandwidth/rack-dispersion tradeoffs
Running on several different generations of network core/rack-switch combos, some slow, some fast
Rack-oriented placement would have better intra-cell performance in worst-case situations (which are not uncommon)
“Horizontally” organized cells can hopefully survive single-rack issues
I’m not so sure it’s a good thing: the network is pretty reliable, so why emphasize uplink-failure tolerance? Maybe we should have sharded HBase setups
Two cells of 40 hosts each, spread across 5 racks rather than “vertically”
[Diagram: Cell 1, Cell 2, and spares, each spread horizontally across the racks]
Things we monitor/alert on
Monitor hundreds of variables in ODS, the Facebook time-series database
Alert /SMS on:
• Hbck failures
• DFS fsck failures
• Probe / scan a table from client
• Throughput rates in some cases
• Most application alarms are left to other teams, in an attempt to remain a relatively generic service to the rest of Facebook
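The “probe/scan a table from client” check can be illustrated with a minimal canary, here using the open-source happybase Thrift client rather than anything Facebook-internal; the table name, row key, and Thrift endpoint are assumptions:

```python
import time
import happybase  # open-source HBase Thrift client; stands in for the internal prober

def probe(thrift_host, table=b"canary", row=b"probe", timeout_s=5.0):
    """Read one row end to end; return latency in seconds, raise on failure."""
    start = time.time()
    conn = happybase.Connection(thrift_host, timeout=int(timeout_s * 1000))
    try:
        data = conn.table(table).row(row)  # single-row get through the Thrift gateway
        if not data:
            raise RuntimeError("canary row missing")
    finally:
        conn.close()
    return time.time() - start

if __name__ == "__main__":
    print(f"probe ok in {probe('hbase-thrift.example.com'):.3f}s")  # alert/SMS hook here
```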
Troubleshooting: typical problems
▪ Regionserver/slave apocalypse
▪ fsck inconsistencies
▪ hbck inconsistencies
▪ Long recoveries/timeouts after failures
▪ Wedged regions/meta info
▪ Log splitting during recovery
▪ Memory/thread exhaustion -> regionserver deaths
▪ GC pauses, tuning-related deaths
▪ Rack-switch bandwidth issues
Setting up HBase Clusters: doing all the things
▪ HBase versions, 0.89 vs 0.92
▪ Rack and host selection
▪ Imaging and partitioning
▪ Populating SMC tiers
▪ Building from templates
▪ Pushing
▪ Starting up everything!
Tools use the $CELLNAME env var
Typical session:
Run “setcell” to set the environment; all subsequent commands are “pointed at” the given HBase cell
- hbscan to see the status of hosts in that cell
- hblog to look at logs
- hbprocess (like showprocess)
- Etc.
Typical operations: setcell/hbhost
Typically start with “setcell”
hbhost just shows what is in SMC for this cell
“hbhost nn” or “hbhost master” to ssh to the given host without caring about hostnames.
hbscan: a Python “nmap”-like scan
Run hbscan to get a quick impression of the state of the cell
Queries SMC for topology
Scans all hosts for all known ports (TCP connect)
Takes a few seconds
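hbscan itself is internal; a minimal sketch of the same idea follows, doing a TCP connect scan of every host for the well-known daemon ports of that era (the port map and host list are assumptions):

```python
import socket
from concurrent.futures import ThreadPoolExecutor

# Well-known daemon ports for 0.89/0.92-era HBase/HDFS; adjust per deployment
PORTS = {"master": 60000, "regionserver": 60020, "namenode": 8020, "datanode": 50010}

def listening(host, port, timeout=1.0):
    """TCP connect probe: True if something accepts on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def scan(hosts):
    """One status line per host; the pool keeps the whole cell to a few seconds."""
    with ThreadPoolExecutor(max_workers=64) as pool:
        for host in hosts:
            futures = {n: pool.submit(listening, host, p) for n, p in PORTS.items()}
            print(host, " ".join(f"{n}={'up' if f.result() else 'DOWN'}"
                                 for n, f in futures.items()))
```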
hblog: “normalize” and summarize log lines
Attempt to remove entropy to get to the “core” message
Fingerprint with md5
Summarize by md5/host
Columns -> cluster-wide errors
Rows -> this particular node is jacked
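A minimal sketch of the hblog idea: strip high-entropy fields (timestamps, numbers, hex IDs, paths) so lines collapse to their core message, fingerprint with md5, then count by (host, fingerprint). The normalization patterns here are assumptions, not hblog’s actual rules:

```python
import hashlib
import re
from collections import Counter

# Illustrative entropy-removal rules; order matters (timestamps before bare numbers)
PATTERNS = [
    (re.compile(r"\d{4}-\d{2}-\d{2}[ T]\d{2}:\d{2}:\d{2}[.,]?\d*"), "<TS>"),
    (re.compile(r"0x[0-9a-fA-F]+"), "<HEX>"),
    (re.compile(r"/[\w./-]+"), "<PATH>"),
    (re.compile(r"\d+"), "<N>"),
]

def fingerprint(line):
    """Normalize a log line to its 'core' message and return a short md5."""
    for pattern, token in PATTERNS:
        line = pattern.sub(token, line)
    return hashlib.md5(line.encode()).hexdigest()[:8]

def summarize(lines_by_host):
    """Counts keyed by (host, fingerprint): a fingerprint hot on every host is a
    cluster-wide error; one host hot on many fingerprints means that node is jacked."""
    counts = Counter()
    for host, lines in lines_by_host.items():
        for line in lines:
            counts[(host, fingerprint(line))] += 1
    return counts
```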
Observation: the cluster is as slow as the slowest regionserver
A common pattern is to ingest data and multiput to HBase from many frontends
The larger the multiput, the more likely clients will serialize/collide on a hot regionserver
Don’t look at just the average. Look at the average *and* the outliers
But which metric?
[Diagram: lines drawn from every frontend to every regionserver]
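One concrete way to look at “the average *and* the outliers”: whatever metric is chosen (per-regionserver put latency here, purely as an assumption), compare the mean against the worst server, because the slowest regionserver gates every multiput that touches it:

```python
import statistics

def latency_summary(latency_by_rs):
    """latency_by_rs: hypothetical map of regionserver -> put latency in ms."""
    values = sorted(latency_by_rs.values())
    worst = max(latency_by_rs, key=latency_by_rs.get)
    return {
        "mean": statistics.mean(values),
        "p99": values[int(0.99 * (len(values) - 1))],  # crude p99 over the fleet
        # the gap between mean and worst is the signal: one hot regionserver
        # serializes every client whose multiput includes its regions
        "worst": (worst, latency_by_rs[worst]),
    }
```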
Observation: evolution/selection of balance
In a few cases, performance issues or load-related bugs cause hosts to crash
When a crash happens, regions move around
A new “hand is drawn” with different combinations of regions
When the combination of regions is such that there are no more deaths… balanced!
Observation: balancing could be much better
• In cases where skew seems to dominate, we’ve experimented with manual region placement/splitting
• Developed basic JRuby/Groovy scripts using HBaseAdmin (see the sketch below)
• Maybe support ‘user space’ balancers
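Our actual scripts were JRuby/Groovy against HBaseAdmin; as an illustration of the same manual-placement idea, here is a sketch that emits `move` commands for the hbase shell, shedding regions from the hottest servers onto the coolest one (the load numbers and encoded region names are made-up inputs):

```python
def plan_moves(load_by_server, regions_by_server, target_load):
    """Greedy sketch: one hbase-shell 'move' per overloaded server.
    Server names must be the full 'host,port,startcode' form the shell expects."""
    coolest = min(load_by_server, key=load_by_server.get)
    commands = []
    for server, load in sorted(load_by_server.items(), key=lambda kv: -kv[1]):
        if server == coolest or load <= target_load:
            continue
        region = regions_by_server[server][0]  # shed one encoded region name
        commands.append(f"move '{region}', '{coolest}'")
    return commands

# Usage: pipe the emitted commands into the shell
#   $ python plan_moves.py | hbase shell
```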