Apache HBase Internals you hoped you Never Needed to Understand

Apache HBase Internals you Hoped you Never Needed to Understand Josh Elser Future of Data, NYC 2016/10/11

Transcript of Apache HBase Internals you hoped you Never Needed to Understand

Page 1: Apache HBase Internals you hoped you Never Needed to Understand


Page 2: Apache HBase Internals you hoped you Never Needed to Understand

2 © Hortonworks Inc. 2011 – 2016. All Rights Reserved

Engineer at Hortonworks, Member of the Apache Software Foundation

Top-Level Projects
• Apache Accumulo®
• Apache Calcite™
• Apache Commons™
• Apache HBase®
• Apache Phoenix™

ASF Incubator
• Apache Fluo™
• Apache Gossip™
• Apache Pirk™
• Apache Rya™
• Apache Slider™

These Apache project names are trademarks or registered trademarks of the Apache Software Foundation.

Page 3: Apache HBase Internals you hoped you Never Needed to Understand


Apache HBase for storing your data!

CC BY 3.0 US: http://hbase.apache.org/

Page 4: Apache HBase Internals you hoped you Never Needed to Understand


What happens when things go wrong?

CC BY-ND 2.0: https://www.flickr.com/photos/widnr/6588151679

Page 5: Apache HBase Internals you hoped you Never Needed to Understand


The BigTable Architecture

BigTable’s architecture is simple

Debugging a distributed system is not simple

How can we break down a complex system?

How do we write resilient software?

• Log-Structured Merge Tree
• Write-Ahead Logs
• Distributed Coordination
• Row-based, Auto-Sharding
• Strong Consistency
• Read Isolation
• Coprocessors
• Security (AuthN/AuthZ)
• Backups

Page 6: Apache HBase Internals you hoped you Never Needed to Understand


Naming Conventions

• Servers
– Hostname, Port, and Timestamp
– RegionServer: r01n01.domain.com,16201,1475691463147
– Master: r02n01.domain.com,16000,1475691462616

• Regions
– Table, Start RowKey, Region ID (timestamp), Replica ID, Encoded name
– T1,\x04\x00\x00,1470324608597.c04d94cd4ee9797da2fb906b4dcd2e3c.
– Or simply c04d94cd4ee9797da2fb906b4dcd2e3c
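When reading logs, it can help to split these names back into their parts. The following is a minimal sketch, not HBase's own parser: the class and function names are hypothetical, the region name is assumed to be in the printable form shown on the slide, and replica suffixes on the Region ID are not handled.

```python
from typing import NamedTuple


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int  # the "timestamp" part: when the process started


class RegionName(NamedTuple):
    table: str
    start_key: str
    region_id: int   # creation timestamp
    encoded: str     # the short name that also appears on its own in logs


def parse_server_name(name: str) -> ServerName:
    # Split from the right so only the last two commas are consumed.
    host, port, start_code = name.rsplit(",", 2)
    return ServerName(host, int(port), int(start_code))


def parse_region_name(name: str) -> RegionName:
    # The start key may itself contain commas, so peel the table off the
    # front and the "<region id>.<encoded name>." tail off the back.
    table, rest = name.split(",", 1)
    start_key, tail = rest.rsplit(",", 1)
    region_id, encoded, _ = tail.split(".")
    return RegionName(table, start_key, int(region_id), encoded)
```

The encoded name is usually the most convenient string to search for, since it appears unchanged in log lines and in directory names.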

Page 7: Apache HBase Internals you hoped you Never Needed to Understand


Regions

• A sorted “shard” of a table
• At least one “column family”
– Physical partitions
• Each family can have zero to many files
• Hosted by at most one RegionServer
– Can have many hosting RS’s for reads
• In-memory locks for certain intra-row operations

Page 8: Apache HBase Internals you hoped you Never Needed to Understand


Region Assignment

• Coordinated by the HBase Master
• A Region must only be hosted by one RegionServer
• State tracked in hbase:meta
– hbck to fix issues
• Region splits/merges make a hard problem even harder
• Moving towards ProcedureV2

Normal Region Assignment States (diagram): Closed, Offline, Pending Open, Opening, Open

Page 9: Apache HBase Internals you hoped you Never Needed to Understand


The File System

• HDFS “Compatible”
– Distributed, durable, “write leases”
• Physical storage of HBase Tables (HFiles)
• Write-ahead logs
• A parent directory in that FileSystem (hbase.rootdir)

Page 10: Apache HBase Internals you hoped you Never Needed to Understand


The File System

Physical Separation by HBase Namespace

/hbase/data/
/hbase/data/default/<table1>
/hbase/data/default/.tabledesc/.tableinfo…
/hbase/data/default/<table2>/<region_id1>
/hbase/data/default/<table2>/<region_id2>
/hbase/data/my_custom_ns/<table3>/…
/hbase/data/hbase/meta/…
/hbase/archive/…

/hbase/WALs/<regionserver_name>/…
/hbase/oldWALs/…
/hbase/corrupt/…

Page 11: Apache HBase Internals you hoped you Never Needed to Understand


The File System for one Region

/hbase/data/default/<table2>/<region_id1>

…/.regioninfo
…/.tmp
…/<family1>/<hfile>
…/<family1>/<hfile>
…/<family2>/<hfile>
…/<family3>/<hfile>
…/recovered.edits/<number>.seqid
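The layout on these two slides is regular enough to build paths by hand when poking around the file system. A small sketch, assuming the root directory is /hbase as on the slides; the function names are hypothetical helpers, not part of any HBase API.

```python
import posixpath


def region_dir(root_dir, namespace, table, region_dir_name):
    """Directory holding one Region: <root>/data/<namespace>/<table>/<region>."""
    return posixpath.join(root_dir, "data", namespace, table, region_dir_name)


def family_dir(root_dir, namespace, table, region_dir_name, family):
    """Directory holding the HFiles of one column family within a Region."""
    return posixpath.join(
        region_dir(root_dir, namespace, table, region_dir_name), family)


def wal_dir(root_dir, regionserver_name):
    """Directory holding the write-ahead logs of one RegionServer."""
    return posixpath.join(root_dir, "WALs", regionserver_name)
```

For example, `family_dir("/hbase", "default", "table2", "region_id1", "family1")` gives the directory to list when checking how many HFiles a family has accumulated.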

Page 12: Apache HBase Internals you hoped you Never Needed to Understand


Writes into HBase

• Mutations inserted into sorted in-memory structure and WAL
– Fast lookups of recent data
– Append-only log for durability and speed
• Mutations are collected by destination Region
• Beware of hot-spotting
• Data in memory eventually flush’ed into sorted (H)files
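The write path can be pictured with a toy model. This is a sketch under loud assumptions: the class name is hypothetical, the WAL and HFiles are plain Python lists, and the flush trigger counts cells, whereas a real flush is driven by memory use.

```python
import bisect


class RegionWriteSketch:
    def __init__(self, flush_threshold=3):
        self.flush_threshold = flush_threshold  # cells, not bytes (assumption)
        self.wal = []      # append-only log, in arrival order
        self.keys = []     # in-memory keys, kept sorted
        self.values = {}   # in-memory key -> latest value
        self.hfiles = []   # each flush adds one immutable sorted list

    def put(self, key, value):
        self.wal.append((key, value))        # append to the log first
        if key not in self.values:
            bisect.insort(self.keys, key)    # keep the in-memory view sorted
        self.values[key] = value
        if len(self.keys) >= self.flush_threshold:
            self.flush()

    def flush(self):
        if not self.keys:
            return
        # Write the sorted in-memory contents out as one new "file".
        self.hfiles.append([(k, self.values[k]) for k in self.keys])
        self.keys, self.values = [], {}
```

The log keeps arrival order while every flushed file is sorted, which is why writes stay cheap and reads later need a merge.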

Page 13: Apache HBase Internals you hoped you Never Needed to Understand


Compactions and Flushes

• Flush: Taking Key-Values from the In-Memory map and creating an HFile
• Minor Compaction: Rewriting a subset of HFiles for a Region into one HFile
• Major Compaction: Rewriting all HFiles for a Region into one HFile
• Compactions balance improved query performance with cost of rewriting data
– Compactions are good!
– Must understand SLA’s to properly tune compactions
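The difference between minor and major compaction can be sketched as a merge of sorted files. This is an illustration only: cells are simplified to hypothetical (row, timestamp, value) tuples, only the newest cell per row is kept, and delete markers, multiple versions, and file selection policy are all ignored.

```python
import heapq


def sort_key(cell):
    row, timestamp, _value = cell
    return (row, -timestamp)  # rows ascending, newest cell first within a row


def compact(hfiles):
    """Merge already-sorted files into one sorted file."""
    merged = []
    for cell in heapq.merge(*hfiles, key=sort_key):
        if merged and merged[-1][0] == cell[0]:
            continue  # an older cell for a row that was already kept
        merged.append(cell)
    return merged


def minor_compaction(hfiles, count):
    """Rewrite only the first `count` files into one; leave the rest alone."""
    return [compact(hfiles[:count])] + hfiles[count:]


def major_compaction(hfiles):
    """Rewrite every file of the Region into a single file."""
    return [compact(hfiles)]
```

Fewer files means fewer streams to merge on every read, and the price is the I/O spent rewriting data that was already on disk.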

Page 14: Apache HBase Internals you hoped you Never Needed to Understand


Reads into HBase

• Merge-Sort over multiple streams of data
– Memory
– Disk (many files)

hbase:meta is the definitive source of where to find Regions

Diagram labels: RowKey, Region, hbase:meta, RegionServer, ZooKeeper
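Finding the Region for a RowKey amounts to a search over sorted start keys. A minimal sketch, assuming hbase:meta can be modelled as a list of (start key, region, server) rows; the class name and the sample data are hypothetical, and this is not the client's real lookup or caching code.

```python
import bisect


class MetaSketch:
    def __init__(self, regions):
        # regions: iterable of (start_key, region_name, server) tuples.
        # The first Region of a table has the empty start key b"".
        self._rows = sorted(regions)
        self._start_keys = [row[0] for row in self._rows]

    def locate(self, row_key):
        # The hosting Region is the one with the greatest start key
        # that is less than or equal to the row key.
        index = bisect.bisect_right(self._start_keys, row_key) - 1
        _start_key, region, server = self._rows[index]
        return region, server
```

Because Regions are contiguous sorted ranges, one binary search is enough, and a stale answer shows up as a request sent to a server that no longer hosts the Region.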

Page 15: Apache HBase Internals you hoped you Never Needed to Understand


Apache ZooKeeper™

• Distributed coordination is really hard
• Obvious use cases
– Service Discovery
– Cluster Membership
– “Root Table”
• Non-obvious use cases
– Assignment (sometimes)
– Region Recovery
– WAL Splitting
– Cluster Replication
– Distributed Procedures
– HBase Snapshots

Apache ZooKeeper is a trademark of the Apache Software Foundation

Page 16: Apache HBase Internals you hoped you Never Needed to Understand


Apache ZooKeeper™

• Discovery/Leader ZNodes
– /hbase/rs/…
– /hbase/master/…
– /hbase/backup-masters/…
• Consensus
– /hbase/splitWAL/…
– /hbase/flush-table-proc/...
– /hbase/table-lock/...
– /hbase/region-in-transition/...
– /hbase/recovering-regions/...

Page 17: Apache HBase Internals you hoped you Never Needed to Understand


Distributed Procedures

• Resiliency in an unreliable system
– How do we create a table?
• “Procedure V2”
– Resilient, finite state machine
• HBase operations represented as “procedures”
• Clients are agnostic of Master state
– Clients track procedure state

https://issues.apache.org/jira/secure/attachment/12679960/ProcedureV2.pdf

Page 18: Apache HBase Internals you hoped you Never Needed to Understand


Distributed Procedures

• Procedures are durable via Write-Ahead Log
– /hbase/MasterProcWALs/…
• Procedures only executed by the active HBase Master
• Reusable framework for the future
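The idea of a durable, resumable procedure can be shown in a few lines. This is a sketch of the general technique rather than the Procedure V2 implementation: the function, the step names, and the in-memory list standing in for the procedure log are all hypothetical.

```python
def run_procedure(proc_id, steps, log, crash_before=None):
    """Run `steps` in order, recording each finished step in `log`.

    `log` stands in for the durable procedure log. Steps already recorded
    for `proc_id` are skipped, so a newly active Master can call this again
    and continue where the previous one stopped.
    """
    finished = {name for logged_id, name in log if logged_id == proc_id}
    for name, action in steps:
        if name in finished:
            continue
        if name == crash_before:
            raise RuntimeError("master failed before step " + name)
        action()
        log.append((proc_id, name))


# Hypothetical steps of a "create table" procedure.
effects = []
create_table = [
    ("write_table_descriptor", lambda: effects.append("descriptor")),
    ("create_region_dirs", lambda: effects.append("regions")),
    ("assign_regions", lambda: effects.append("assigned")),
]
```

In this sketch a failure between `action()` and the log append would run that step twice after recovery, so each step has to be safe to repeat.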

Page 19: Apache HBase Internals you hoped you Never Needed to Understand


HBase RPCs

Overview
• Internal and External HBase Communication
• Half-Sync/Half-Async Model
• Many knobs to tweak

Components
• Listener
• Readers
• Scheduler
• Call Queues
• Call Runners/Handlers

Page 20: Apache HBase Internals you hoped you Never Needed to Understand


HBase RPCs

Request to Execution (diagram): Listener, Readers (four shown), Scheduler, Call Queues (Priority, Read, Write, Replication), Handlers
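The scheduler, call queue, and handler stages can be imitated with standard queues and threads. This is a rough sketch, not the HBase RPC server: the call dictionaries, the `classify` rules, and the handler counts are assumptions made for illustration, and the Listener and Reader stages are left out.

```python
import queue
import threading

QUEUE_NAMES = ("priority", "read", "write", "replication")


def classify(call):
    """Pick a call queue for a decoded request (hypothetical rules)."""
    if call.get("system_table"):
        return "priority"
    if call["method"] == "replicate":
        return "replication"
    if call["method"] in ("get", "scan"):
        return "read"
    return "write"


class SchedulerSketch:
    def __init__(self, handlers_per_queue=2):
        self.queues = {name: queue.Queue() for name in QUEUE_NAMES}
        self.results = []
        self._lock = threading.Lock()
        for name, call_queue in self.queues.items():
            for _ in range(handlers_per_queue):
                threading.Thread(target=self._handle,
                                 args=(name, call_queue),
                                 daemon=True).start()

    def dispatch(self, call):
        # Scheduler step: route the call to the queue for its class.
        self.queues[classify(call)].put(call)

    def _handle(self, name, call_queue):
        # Handler step: each handler thread serves exactly one queue.
        while True:
            call = call_queue.get()
            with self._lock:
                self.results.append((name, call["method"]))
            call_queue.task_done()

    def drain(self):
        """Block until every dispatched call has been handled."""
        for call_queue in self.queues.values():
            call_queue.join()
```

Separate queues keep a flood of one kind of request from starving the others, and the queue and handler counts are the sort of knobs the previous slide refers to.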

Page 21: Apache HBase Internals you hoped you Never Needed to Understand


Disaster Recovery

• Multiple tools to ensure copies of data in the face of catastrophic failure
• CopyTable
– MapReduce job which reads all data from a source, writing to destination
• Snapshots
– A collection of Regions, their HFiles, and metadata
• Backup & Restore
– HBASE-7912, currently targeted for HBase-2.0.0
– Incremental and full backup/restore

Page 22: Apache HBase Internals you hoped you Never Needed to Understand


Kerberos

• Strong authentication for untrusted networks
• “Standard” across Apache Hadoop and friends
• Requirements:
– Forward/Reverse DNS
– Unlimited Strength Java Cryptography Extension
• SASL used to build RPC systems
• “Practical Kerberos with Apache HBase” https://goo.gl/y0d9ZO

Page 23: Apache HBase Internals you hoped you Never Needed to Understand


Finding a Hypothesis

• Logs logs logs
– Application and System
• Metrics exposed by JMX
• Graphing solutions
– Ambari Metrics Server + Grafana

Page 24: Apache HBase Internals you hoped you Never Needed to Understand


Thank you