HBase: Where Online Meets Low Latency


Description

Speakers: Nick Dimiduk (Hortonworks) and Nicolas Liochon (Scaled Risk). HBase is an online database, so response latency is critical. This talk examines the sources of latency in HBase, detailing the steps along the read and write paths. We'll examine the entire request lifecycle, from client to server and back again. We'll also look at the different factors that impact latency, including GC, cache misses, and system failures. Finally, the talk highlights some of the work done in 0.96+ to improve the reliability of HBase.

Transcript of HBase: Where Online Meets Low Latency

Page 1: HBase Low Latency

Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)

HBaseCon, May 5, 2014

Page 2: Agenda

• Latency, what it is, how to measure it
• Write path
• Read path
• Next steps

Page 3: What's low latency?

• Latency is about percentiles
  • Long-tail issue
  • There are often orders of magnitude between "average" and "95th percentile"
  • Post-99% = the "magical 1%". Work in progress here.
• "Low latency" means anything from microseconds (high-frequency trading) to seconds (interactive queries)
  • In this talk: milliseconds

Page 4: Measure latency – during test

• bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation (example invocation below)
  • More HBase-specific options: autoflush, replicas, …
  • Latency measured in microseconds
  • Easier for internal analysis
• YCSB
  • Useful for comparisons between tools
  • Set of workloads already defined
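For example, a hedged invocation (the exact flags vary by HBase version, so treat the options as assumptions and run the tool without arguments to see the usage for your version):

  bin/hbase org.apache.hadoop.hbase.PerformanceEvaluation --nomapred --rows=100000 randomRead 10

This runs ten read clients inside a single process (--nomapred) instead of as a MapReduce job.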

Page 5: Measure latency – exposed by HBase

RegionServer metrics include latency histograms, for example:

"QueueCallTime_num_ops" : 33044,
"QueueCallTime_min" : 0,
"QueueCallTime_max" : 86,
"QueueCallTime_mean" : 0.2525420651252875,
"QueueCallTime_median" : 0.0,
"QueueCallTime_75th_percentile" : 0.0,
"QueueCallTime_95th_percentile" : 1.0,
"QueueCallTime_99th_percentile" : 1.0,

"SyncTime_num_ops" : 379081,
"SyncTime_min" : 0,
"SyncTime_max" : 865,
"SyncTime_mean" : 3.0293341000999785,
"SyncTime_median" : 2.0,
"SyncTime_75th_percentile" : 3.0,
"SyncTime_95th_percentile" : 4.0,
"SyncTime_99th_percentile" : 253.5899999999999,

Page 6: HBase write path – high level

Page 7: Deeper in the write path

• Two parts
  • Single put (WAL): the client just sends the put (see the sketch after this list)
  • Multiple puts from the client (new behavior since 0.96): the client is much smarter
• Four stages to look at for latency
  • Start (establish TCP connections, etc.)
  • Steady: when expected conditions are met
  • Machine failure: expected as well
  • Overloaded system: you may need to add machines or tune your workload
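A minimal single-put sketch against the 0.96-era client API; the table name, column family, and values are hypothetical:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SinglePutExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "my_table");   // hypothetical table
      Put put = new Put(Bytes.toBytes("row-1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      table.put(put);  // returns once the WAL sync and memstore write complete
      table.close();
    }
  }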

Page 8: Single put: communication

• Create a "Call" object, with an id, as queries are multiplexed
• Protobuf it
• TCP write (in trunk it can be queued for a separate thread as well)
• Wait for the answer
  • Separate thread, separate queue
• Un-protobuf the answer
• Implies locks and multiple threads communicating through queues

Page 9: Single put: server-side scheduling

• Threads to receive "Call" objects
• Threads to handle the call execution
• Threads to write the answer on the wire
• Multiple threads, communicating through queues

Page 10: Single put: real work

• The server must
  • Take a row lock (HBase strong consistency)
  • Write into the WAL queue
  • Write into the memstore
  • Sync the queue (HDFS flush; see the sketch after this list)
  • Free the lock
• The WAL queue is shared between all the regions/handlers
  • The sync is skipped if another handler already did the work
  • You may flush more than expected
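Since the HDFS sync dominates this path, a client can trade durability for latency. This technique is not from the slides; a hedged sketch using the 0.96-era Durability API (names hypothetical; skipping the WAL risks data loss on RegionServer failure):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Durability;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class SkipWalPutExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "my_table");
      Put put = new Put(Bytes.toBytes("row-1"));
      put.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
      put.setDurability(Durability.SKIP_WAL);  // no WAL write, no HDFS sync
      table.put(put);
      table.close();
    }
  }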

Page 11: Latency sources

• Candidate one: the network
  • 0.5 ms within a datacenter
• Candidate two: the HDFS flush
• Millisecond world: everything can go wrong
  • Network
  • OS scheduler
  • All this goes into the post-99% percentile

Metric  Time in ms
Mean    0.33
50%     0.26
95%     0.59
99%     1.24

Page 12: Latency sources

• Splits (and presplits)
  • Autosharding is great!
  • Puts have to wait
  • Impact: seconds
• Balancing
  • Regions move
  • Triggers a retry for the client
  • hbase.client.pause = 100 ms since HBase 0.96
• Garbage collection
  • Impact: tens of milliseconds, even with a good config
  • Covered in the read-path part of this talk

Page 13: From steady to loaded and overloaded

• The number of concurrent tasks is a function of
  • the number of cores
  • the number of disks
  • the number of remote machines used
• Difficult to estimate
  • Queues are doomed to happen
• So, for low latency
  • Specific scheduler since HBase 0.98 (HBASE-8884). Requires specific code.
  • Priorities: work in progress.

Page 14: Loaded & overloaded

• Step 1: loaded system
  • Tasks are queued: this creates latency
  • Specific metric in HBase
• Step 2: limits reached
  • The MemStore takes too much room: blocks until it's flushed
    • hbase.regionserver.global.memstore.size.lower.limit
    • hbase.regionserver.global.memstore.size
    • hbase.hregion.memstore.block.multiplier
  • Too many HFiles: blocks until compactions catch up
    • hbase.hstore.blockingStoreFiles
  • Too many WAL files
    • Don't change this

Page 15: Machine failure

• Failure handling
  • Detect
  • Reallocate
  • Replay the WAL
• Replaying the WAL is NOT required for puts
• Failure = detect + reallocate + retry
  • That's in the range of ~1 s for simple failures
  • Silent failures put you in the 10 s range if the hardware does not help

Page 16: Single puts

• Millisecond range
• Spikes do happen in steady mode
  • 100 ms
  • Causes: GC, load, splits

Page 17: Streaming puts

HTable#setAutoFlushTo(false)
HTable#put
HTable#flushCommits
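Put together, a hedged sketch of this buffered write pattern (0.96-era API; table and column names hypothetical):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class StreamingPutsExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "my_table");
      table.setAutoFlushTo(false);        // buffer puts client-side
      for (int i = 0; i < 10000; i++) {
        Put p = new Put(Bytes.toBytes("row-" + i));
        p.add(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v" + i));
        table.put(p);                     // usually returns immediately
      }
      table.flushCommits();               // drain the buffer, handle retries
      table.close();
    }
  }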

Page 18: Streaming puts

• Writes go into a buffer
• When the buffer is full, in the background
  • Select the puts that match the load conditions
  • Send them
  • Manage retries and delays
• The buffer is freed for other client operations
• Blocks only on a non-retryable error or if the buffer is full

Page 19: Multiple puts

• Client-side task limits (tuning sketch below)
  • hbase.client.max.total.tasks (default 100)
  • hbase.client.max.perserver.tasks (default 5)
  • hbase.client.max.perregion.tasks (default 1)
• Decouples the client from a latency peak on one region server
• Increases throughput by 50%
• Does not solve the problem of an unbalanced cluster
  • But makes splits and GC more transparent
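These are plain client-side configuration properties; a hedged tuning sketch (the values are illustrative, not recommendations):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;

  public class ClientTaskLimitsExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      conf.setInt("hbase.client.max.total.tasks", 200);     // all servers combined
      conf.setInt("hbase.client.max.perserver.tasks", 10);  // per region server
      conf.setInt("hbase.client.max.perregion.tasks", 1);   // keeps ordering per region
      HTable table = new HTable(conf, "my_table");          // picks up the limits
      table.close();
    }
  }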

Page 20: Conclusion on the write path

• Single puts can be very fast
  • It's not a "hard real time" system: there are peaks
• Latency peaks can be hidden when streaming puts
  • Including autosplits

Page 21: And now for the read path

Page 22: HBase read path – high level

Page 23: Deeper in the read path

• Gets/short scans are assumed for low-latency operations
• Again, two APIs (see the sketch after this list)
  • Single get: HTable#get(Get)
  • Multi-get: HTable#get(List<Get>)
• Four stages, same as the write path
  • Start (TCP connection, …)
  • Steady: when expected conditions are met
  • Machine failure: expected as well
  • Overloaded system: you may need to add machines or tune your workload
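A hedged sketch of both APIs (table and row names hypothetical):

  import java.io.IOException;
  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class GetExamples {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HTable table = new HTable(conf, "my_table");

      // Single get: one round trip to one region server
      Result single = table.get(new Get(Bytes.toBytes("row-1")));

      // Multi-get: the client groups gets by region server (next slides)
      List<Get> gets = new ArrayList<Get>();
      for (int i = 0; i < 100; i++) {
        gets.add(new Get(Bytes.toBytes("row-" + i)));
      }
      Result[] results = table.get(gets);
      table.close();
    }
  }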

Page 24: Multi get / Client

Page 25: Multi get / Client

Group Gets by RegionServer

Page 26: Multi get / Client

Execute them one by one

Page 27: Multi get / Server

Page 28: Multi get / Server

http://hadoop-hbase.blogspot.com/2012/05/hbasecon.html

Page 29: Access latency magnitudes

Numbers from Dean (2009):

• Disk seek = 10 ms
• Memory is 100,000x faster than disk! (a 10 ms seek vs. a ~100 ns memory reference)

Page 30: Known unknowns

• For each candidate HFile
  • Exclude by file metadata
    • Timestamp
    • Rowkey range
  • Exclude by bloom filter (see the sketch after this list)
• StoreFileManager (0.96, HBASE-7678)

StoreFileScanner#shouldUseScanner()
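Bloom filters are a per-column-family setting; a hedged sketch of enabling a row-level bloom at table-creation time (0.96-era admin API; table and family names hypothetical):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.HBaseAdmin;
  import org.apache.hadoop.hbase.regionserver.BloomType;

  public class BloomFilterExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      HColumnDescriptor cf = new HColumnDescriptor("cf");
      cf.setBloomFilterType(BloomType.ROW);  // lets gets exclude HFiles by rowkey
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
      desc.addFamily(cf);
      admin.createTable(desc);
      admin.close();
    }
  }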

Page 31: Unknown knowns

• Merge-sort results polled from the Stores
  • Seek each scanner to a reference KeyValue
  • Retrieve candidate data from disk
• Multiple HFiles => multiple seeks
  • hbase.storescanner.parallel.seek.enable=true
• Short-circuit reads
  • dfs.client.read.shortcircuit=true
• Block locality
  • Happy clusters compact!

HFileBlock#readBlockData()

Page 32: Remembered knowns: BlockCache

• Reuse previously read data
• Smaller BLOCKSIZE => better utilization (see the sketch after this list)
• TODO: compression (HBASE-8894)

BlockCache#getBlock()
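BLOCKSIZE is also set per column family; a hedged sketch (16 KB here is illustrative; 64 KB is the default):

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class BlockSizeExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      HBaseAdmin admin = new HBaseAdmin(conf);
      HColumnDescriptor cf = new HColumnDescriptor("cf");
      cf.setBlocksize(16 * 1024);  // smaller blocks: finer-grained caching
      HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("my_table"));
      desc.addFamily(cf);
      admin.createTable(desc);
      admin.close();
    }
  }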

Page 33: BlockCache showdown

• LruBlockCache
  • Quite good most of the time
  • < 30 GB
• BucketCache
  • Off-heap alternative
  • > 30 GB

http://www.n10k.com/blog/blockcache-showdown/

Page 34: Latency enemies: compactions

• Fewer HFiles => fewer seeks
• But compactions evict data blocks!
  • They evict index blocks!!
    • hfile.block.index.cacheonwrite
  • They evict bloom blocks!!!
    • hfile.block.bloom.cacheonwrite
• OS buffer cache to the rescue
  • Compacted data is still fresh
  • Better than going all the way back to disk

Page 35: Latency enemies: garbage collection

• Use heap. Not too much. With CMS.
  • Max heap: 30 GB, probably less
• Healthy cluster load
  • Regular, reliable collections
  • 25-100 ms pauses at regular intervals
• An overloaded RegionServer suffers GC overmuch

Page 36: Off-heap to the rescue?

• BucketCache (0.96, HBASE-7404)
• Network interfaces (HBASE-9535)
• MemStore et al. (HBASE-10191)

Page 37: Failure

• Machine failure
  • Detect + reallocate + replay
• Strong consistency requires replay
• The cache starts from scratch

Page 38: Read latency in summary

• Steady mode
  • Cache hit: < 1 ms
  • Cache miss: +10 ms per seek
  • Writing while reading: cache churn
  • GC: 25-100 ms pauses at regular intervals

Expected read latency ≈ network request + (1 - P(cache hit)) * 10 ms
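For example, with an (illustrative) 95% cache hit rate and a 0.5 ms network round trip: 0.5 ms + 0.05 * 10 ms = 1 ms on average, while the 5% of misses show up directly in the 95th-percentile-and-beyond numbers.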

• Same long-tail issues as the write path
• Overloaded: same scheduling issues as the write path
• Partial failures hurt a lot

Page 39: Hedging our bets

• HDFS hedged reads (since HDFS 2.4)
  • Strongly consistent
  • Works at the HDFS level
• Timeline consistency (HBASE-10070)
  • Reads on secondary regions
  • If a region does not answer quickly enough, go to another one
  • Not strongly consistent
  • Helps read-path latency a lot (see the sketch after this list)
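A hedged sketch of both knobs. The HDFS property names are the HDFS 2.4 hedged-read settings; the Consistency API is the shape HBASE-10070 took when it later shipped, so treat it as an assumption:

  import java.io.IOException;
  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Consistency;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class HedgedReadsExample {
    public static void main(String[] args) throws IOException {
      Configuration conf = HBaseConfiguration.create();
      // HDFS-level hedged reads: fire a second read if the first is slow
      conf.setInt("dfs.client.hedged.read.threadpool.size", 20);
      conf.setLong("dfs.client.hedged.read.threshold.millis", 10);

      HTable table = new HTable(conf, "my_table");  // hypothetical table
      Get get = new Get(Bytes.toBytes("row-1"));
      get.setConsistency(Consistency.TIMELINE);     // allow secondary replicas
      Result result = table.get(get);
      if (result.isStale()) {
        // answered by a secondary region: may lag the primary
      }
      table.close();
    }
  }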

Page 40: HBase ranges for 99% latency

         Put                   Streamed Multiput  Get                   Timeline get
Steady   milliseconds          milliseconds       milliseconds          milliseconds
Failure  seconds               seconds            seconds               milliseconds
GC       10's of milliseconds  milliseconds       10's of milliseconds  milliseconds

Page 41: What's next

• Less GC
  • Use fewer objects
  • Off-heap
• Preferred location (HBASE-4755)
• The "magical 1%"
  • Most tools stop at the 99% latency
    • YCSB, for example
  • What happens beyond it is much more complex
  • But it's key to improving the average

Page 42: Thanks!

Nick Dimiduk, Hortonworks (@xefyr)
Nicolas Liochon, Scaled Risk (@nkeywal)

HBaseCon, May 5, 2014