Page 1: HBase: Extreme makeover

HBase: Extreme makeover

Vladimir Rodionov
Hadoop/HBase architect
Founder of BigBase.org

HBaseCon 2014
Features & Internal Track

Page 2: HBase: Extreme makeover

Agenda

Page 3: HBase: Extreme makeover

About myself
• Principal Platform Engineer @ Carrier IQ, Sunnyvale, CA
• Prior to Carrier IQ, I worked @ GE, eBay, Plumtree/BEA.
• HBase user since 2009.
• HBase hacker since 2013.
• Areas of expertise include (but are not limited to) Java, HBase, Hadoop, Hive, large-scale OLAP/Analytics, and in-memory data processing.
• Founder of BigBase.org

Page 4: HBase: Extreme makeover

What?

Page 5: HBase: Extreme makeover

BigBase

Page 6: HBase: Extreme makeover

BigBase = EM(HBase)

Page 7: HBase: Extreme makeover

BigBase = EM(HBase)

EM(*) = ?

Page 8: HBase: Extreme makeover

BigBase = EM(HBase)

EM(*) =

Page 9: HBase: Extreme makeover

BigBase = EM(HBase)

EM(*) =

Seriously?

Page 10: HBase: Extreme makeover

BigBase = EM(HBase)

EM(*) =

Seriously?
for HBase
It’s a Multi-Level Caching solution

Page 11: HBase: Extreme makeover

Real Agenda
• Why BigBase?
• Brief history of BigBase.org project
• BigBase MLC high level architecture (L1/L2/L3)
• Level 1 - Row Cache.
• Level 2/3 - Block Cache RAM/SSD.
• YCSB benchmark results
• Upcoming features in R1.5, 2.0, 3.0.
• Q&A

Page 12: HBase: Extreme makeover
Page 13: HBase: Extreme makeover

HBase

• Still lacks some of the original BigTable features.
• Still cannot use all RAM efficiently.
• No good mixed storage (SSD/HDD) support.
• Single Level Caching only. Simple.
• HBase + Large JVM Heap (MemStore) = ?

Page 14: HBase: Extreme makeover

BigBase

• Adds Row Cache and block cache compression.
• Uses all RAM (TBs) efficiently.
• Supports mixed storage (SSD/HDD).
• Has Multi Level Caching. Not that simple.
• Will move MemStore off heap in R2.

Page 15: HBase: Extreme makeover

BigBase History

Page 16: HBase: Extreme makeover

Koda (2010)
• Koda: a Java off-heap object cache, similar to Terracotta’s BigMemory.
• Delivers 4x more transactions …
• 10x better latencies than BigMemory 4.
• Compression (Snappy, LZ4, LZ4HC, Deflate).
• Disk persistence and periodic cache snapshots.
• Tested up to 240GB.

Page 17: HBase: Extreme makeover

Karma (2011-12)
• Karma: a Java off-heap BTree implementation to support fast in-memory queries.
• Supports extra large heaps, 100s of millions to billions of objects.
• Stores 300M objects in less than 10G of RAM.
• Block Compression.
• Tested up to 240GB.
• Off Heap MemStore in R2.

Page 18: HBase: Extreme makeover

Yamm (2013)
• Yet Another Memory Manager.
  – Pure 100% Java memory allocator.
  – Replaced jemalloc in Koda.
  – Now Koda is 100% Java.
  – Karma is the next (still on jemalloc).
  – Similar to memcached slab allocator.
• BigBase project started (Summer 2013).
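
The comparison to the memcached slab allocator can be made concrete with a small sketch. The idea behind a slab allocator is to round every request up to one of a fixed set of chunk sizes, so that freed chunks are always reusable and fragmentation stays bounded. The class below is hypothetical and is not YAMM's code; the minimum size, maximum size, and growth factor are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative slab-style size classes; a hypothetical sketch, not YAMM's actual code. */
final class SlabSizeClasses {
    private final int[] classSizes;

    /**
     * @param minSize smallest chunk size in bytes (assumed value)
     * @param maxSize largest chunk size in bytes (assumed value)
     * @param factor  growth factor between consecutive classes (assumed value)
     */
    SlabSizeClasses(int minSize, int maxSize, double factor) {
        if (minSize <= 0 || maxSize < minSize || factor <= 1.0) {
            throw new IllegalArgumentException("invalid size class parameters");
        }
        List<Integer> sizes = new ArrayList<>();
        int size = minSize;
        while (size < maxSize) {
            sizes.add(size);
            // Grow by the factor, then round up to a multiple of 8 bytes.
            int next = (int) Math.ceil(size * factor);
            next = (next + 7) & ~7;
            size = Math.max(next, size + 8); // always make progress
        }
        sizes.add(maxSize);
        classSizes = sizes.stream().mapToInt(Integer::intValue).toArray();
    }

    /** Returns the index of the smallest class that fits, or -1 if the request is too large. */
    int classFor(int requestedBytes) {
        for (int i = 0; i < classSizes.length; i++) {
            if (classSizes[i] >= requestedBytes) {
                return i;
            }
        }
        return -1;
    }

    /** Chunk size in bytes of the given class. */
    int sizeOf(int classIndex) {
        return classSizes[classIndex];
    }

    int classCount() {
        return classSizes.length;
    }
}
```

With `new SlabSizeClasses(64, 1024, 1.25)`, a 100-byte request lands in the 104-byte class, so at most a few bytes per chunk are wasted in exchange for simple, constant-time reuse of freed chunks.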

Page 19: HBase: Extreme makeover

BigBase Architecture

Page 20: HBase: Extreme makeover

MLC – Multi-Level Caching

[Diagram] HBase 0.94: Disk; JVM; RAM: LRUBlockCache

Page 21: HBase: Extreme makeover

MLC – Multi-Level Caching

[Diagram] HBase 0.94: Disk; JVM; RAM: LRUBlockCache
[Diagram] HBase 0.96: Disk; JVM; RAM: Bucket cache

One level of caching:
• RAM (L2)

Page 22: HBase: Extreme makeover

MLC – Multi-Level Caching

[Diagram] HBase 0.94: Disk; JVM; RAM: LRUBlockCache
[Diagram] HBase 0.96: Bucket cache; JVM; RAM

One level of caching:
• RAM (L2)
• Or DISK (L3)

Page 23: HBase: Extreme makeover

MLC – Multi-Level Caching

[Diagram] HBase 0.94: Disk; JVM; RAM: LRUBlockCache
[Diagram] HBase 0.96: Disk; JVM; RAM: Bucket cache
[Diagram] BigBase 1.0: Block Cache L3 (SSD); JVM; RAM: Row Cache L1, Block Cache L2

Page 24: HBase: Extreme makeover

MLC – Multi-Level Caching

[Diagram] HBase 0.94: Disk; JVM; RAM: LRUBlockCache
[Diagram] HBase 0.96: Disk; JVM; RAM: Bucket cache
[Diagram] BigBase 1.0: JVM; RAM: Row Cache L1, Block Cache L2; Block Cache L3 (Network)

Page 25: HBase: Extreme makeover

MLC – Multi-Level Caching

[Diagram] HBase 0.94: Disk; JVM; RAM: LRUBlockCache
[Diagram] HBase 0.96: Disk; JVM; RAM: Bucket cache
[Diagram] BigBase 1.0: JVM; RAM: Row Cache L1, Block Cache L2; Block Cache L3 (memcached)

Page 26: HBase: Extreme makeover

MLC – Multi-Level Caching

[Diagram] HBase 0.94: Disk; JVM; RAM: LRUBlockCache
[Diagram] HBase 0.96: Disk; JVM; RAM: Bucket cache
[Diagram] BigBase 1.0: JVM; RAM: Row Cache L1, Block Cache L2; Block Cache L3 (DynamoDB)
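
A multi-level lookup can be sketched in a few lines: check the fastest tier first, fall through to slower tiers, and only then read from the backing store. The sketch below is hypothetical and deliberately simplified. It treats all tiers uniformly (in BigBase the L1 tier caches rows while L2/L3 cache blocks), it has no size limits, and the "copy into every faster tier on a hit" promotion policy is an assumption, not something the slides state.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative multi-level lookup; tier names and promotion policy are assumptions. */
final class MultiLevelCache {
    /** One cache tier, for example "L2 RAM" or "L3 SSD". */
    private static final class Tier {
        final String name;
        final Map<String, byte[]> entries = new HashMap<>();
        int hits;

        Tier(String name) {
            this.name = name;
        }
    }

    private final List<Tier> tiers = new ArrayList<>();
    private final Map<String, byte[]> backingStore; // stands in for files on disk

    MultiLevelCache(Map<String, byte[]> backingStore, String... tierNames) {
        this.backingStore = backingStore;
        for (String name : tierNames) {
            tiers.add(new Tier(name));
        }
    }

    /** Looks the key up tier by tier, fastest first, then in the backing store. */
    byte[] get(String key) {
        for (int i = 0; i < tiers.size(); i++) {
            Tier tier = tiers.get(i);
            byte[] value = tier.entries.get(key);
            if (value != null) {
                tier.hits++;
                promote(key, value, i);
                return value;
            }
        }
        byte[] value = backingStore.get(key);
        if (value != null) {
            promote(key, value, tiers.size()); // populate every tier on a full miss
        }
        return value;
    }

    /** Copies the value into every tier faster than the one where it was found. */
    private void promote(String key, byte[] value, int foundAt) {
        for (int i = 0; i < foundAt; i++) {
            tiers.get(i).entries.put(key, value);
        }
    }

    /** Simulates eviction from one tier. */
    void evict(int tierIndex, String key) {
        tiers.get(tierIndex).entries.remove(key);
    }

    int hits(int tierIndex) {
        return tiers.get(tierIndex).hits;
    }

    String tierName(int tierIndex) {
        return tiers.get(tierIndex).name;
    }
}
```

Constructed as `new MultiLevelCache(disk, "L2 RAM", "L3 SSD")`, an entry evicted from the RAM tier is still served from the SSD tier on the next read instead of going back to disk, which is the benefit the diagrams point at.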

Page 27: HBase: Extreme makeover

BigBase Row Cache (L1)

Page 28: HBase: Extreme makeover

Where is BigTable’s Scan Cache?

• Scan Cache caches hot row data.
• Complementary to Block Cache.
• Still missing in HBase (as of 0.98).
• It’s very hard to implement in Java (off heap).
• Max GC pause is ~0.5-2 sec per 1GB of heap.
• G1 GC in Java 7 does not resolve the problem.
• We call it Row Cache in BigBase.
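
To show what "off heap" means in Java terms, here is a minimal sketch that keeps cached values in a direct `ByteBuffer`, which lives outside the garbage-collected heap, so the collector never has to scan the cached bytes. The class is hypothetical and far simpler than Koda or YAMM: it is append-only, never frees space, and still keeps its index on the heap.

```java
import java.nio.ByteBuffer;
import java.util.HashMap;
import java.util.Map;

/** Append-only off-heap value store; an illustrative, hypothetical class. */
final class OffHeapStore {
    private final ByteBuffer arena; // memory allocated outside the Java heap
    // key -> {offset, length}; a real cache would keep this index off heap as well
    private final Map<String, int[]> index = new HashMap<>();

    OffHeapStore(int capacityBytes) {
        arena = ByteBuffer.allocateDirect(capacityBytes);
    }

    /** Copies the value off heap; returns false when the arena has no room left. */
    boolean put(String key, byte[] value) {
        if (arena.remaining() < value.length) {
            return false;
        }
        int offset = arena.position();
        arena.put(value);
        index.put(key, new int[] {offset, value.length});
        return true;
    }

    /** Copies the value back onto the heap, or returns null if the key is absent. */
    byte[] get(String key) {
        int[] location = index.get(key);
        if (location == null) {
            return null;
        }
        byte[] out = new byte[location[1]];
        ByteBuffer view = arena.duplicate(); // shares content, has its own position
        view.position(location[0]);
        view.get(out);
        return out;
    }
}
```

The cost of this design is a copy on every read and write; the benefit is that heap size, and with it the GC pause, no longer grows with the amount of cached data.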

Page 29: HBase: Extreme makeover

Row Cache vs. Block Cache

[Diagram: five HFile blocks]

Page 30: HBase: Extreme makeover

Row Cache vs. Block Cache

Page 31: HBase: Extreme makeover

Row Cache vs. Block Cache

BLOCK CACHE

ROW CACHE

Page 32: HBase: Extreme makeover

Row Cache vs. Block Cache

ROW CACHE

BLOCK CACHE

Page 34: HBase: Extreme makeover

BigBase Row Cache

• Off Heap Scan Cache for HBase.
• Cache size: 100s of GBs to TBs.
• Eviction policies: LRU, LFU, FIFO, Random.
• Pure 100% - compatible Java.
• Sub-millisecond latencies, zero GC.
• Implemented as RegionObserver coprocessor.

[Diagram] Row Cache: YAMM, Codecs, Kryo SerDe, KODA

Page 35: HBase: Extreme makeover

BigBase Row Cache

• Read through cache.
• It caches rowkey:CF.
• Invalidates key on every mutation.
• Can be enabled/disabled per table and per table:CF.
• New ROWCACHE attribute.
• Best for small rows (< block size)

[Diagram] Row Cache: YAMM, Codecs, Kryo SerDe, KODA
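
The read-through and invalidate-on-mutation behavior listed above can be sketched with a plain on-heap LRU map. This is a hypothetical illustration of the caching pattern only, not BigBase's coprocessor code: the class name, the `loader` function (standing in for a real HBase read), and the choice of LRU are assumptions.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Read-through LRU cache keyed by "rowkey:CF"; an illustrative sketch. */
final class ReadThroughRowCache {
    private final Function<String, byte[]> loader; // stands in for a real HBase read
    private final Map<String, byte[]> lru;
    private int loads;

    ReadThroughRowCache(final int maxEntries, Function<String, byte[]> loader) {
        this.loader = loader;
        // accessOrder=true keeps entries ordered from least to most recently used.
        this.lru = new LinkedHashMap<String, byte[]>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, byte[]> eldest) {
                return size() > maxEntries; // evict the least recently used entry
            }
        };
    }

    static String key(String rowKey, String family) {
        return rowKey + ":" + family;
    }

    /** Serves from the cache when possible; otherwise loads and caches the value. */
    byte[] get(String rowKey, String family) {
        String k = key(rowKey, family);
        byte[] value = lru.get(k);
        if (value == null) {
            value = loader.apply(k);
            loads++;
            if (value != null) {
                lru.put(k, value);
            }
        }
        return value;
    }

    /** To be called on every mutation of the row: drops the cached entry. */
    void onMutation(String rowKey, String family) {
        lru.remove(key(rowKey, family));
    }

    /** Number of times the loader was invoked (cache misses). */
    int loads() {
        return loads;
    }
}
```

Invalidating rather than updating on mutation keeps the cache trivially consistent with the store: the next read simply reloads the row.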

Page 36: HBase: Extreme makeover

Performance-Scalability

• GET (small rows < 100 bytes): 175K operations per second per Region Server (from cache).
• MULTI-GET (small rows < 100 bytes): > 1M records per second (network limited) per Region Server.
• LATENCY: 99% < 1ms (for GETs) with 100K ops.
• Vertical scalability: tested up to 240GB (the maximum available in Amazon EC2).
• Horizontal scalability: limited by HBase scalability.
• No more memcached farms in front of HBase clusters.

Page 37: HBase: Extreme makeover

BigBase Block Cache (L2, L3)

Page 38: HBase: Extreme makeover

What is wrong with Bucket Cache?

Scalability LIMITED

Multi-Level Caching (MLC) NOT SUPPORTED

Persistence (‘offheap’ mode) NOT SUPPORTED

Low latency apps NOT SUPPORTED

SSD friendliness (‘file’ mode) NOT FRIENDLY

Compression NOT SUPPORTED

Page 41: HBase: Extreme makeover

What is wrong with Bucket Cache?

Scalability LIMITED

Multi-Level Caching (MLC) NOT SUPPORTED

Persistence (‘offheap’ mode) NOT SUPPORTED

Low latency apps ?

SSD friendliness (‘file’ mode) NOT FRIENDLY

Compression NOT SUPPORTED

Page 44: HBase: Extreme makeover

Here comes BigBase

Scalability HIGH

Multi-Level Caching (MLC) SUPPORTED

Persistence (‘offheap’ mode) SUPPORTED

Low latency apps SUPPORTED

SSD friendliness (‘file’ mode) SSD-FRIENDLY

Compression SNAPPY, LZ4, LZ4HC, DEFLATE
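
Why block cache compression matters can be shown with the JDK's built-in Deflate codec: if cached blocks compress well, the same amount of RAM or SSD holds proportionally more of them. The sketch below is a hypothetical helper, not BigBase code. Snappy and LZ4 need third-party libraries, so only Deflate from `java.util.zip` is used here, and the compression level is an arbitrary choice.

```java
import java.io.ByteArrayOutputStream;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

/** Compresses and restores a cached block with Deflate; an illustrative helper. */
final class BlockCompression {
    private BlockCompression() {
    }

    static byte[] compress(byte[] block) {
        Deflater deflater = new Deflater(Deflater.BEST_SPEED);
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buffer);
            out.write(buffer, 0, n);
        }
        deflater.end(); // release native memory held by the codec
        return out.toByteArray();
    }

    /** The caller must remember the original block length, e.g. in the cache entry header. */
    static byte[] decompress(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLength];
        int offset = 0;
        try {
            while (offset < originalLength && !inflater.finished()) {
                int n = inflater.inflate(out, offset, originalLength - offset);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    break; // truncated or unsupported input
                }
                offset += n;
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("corrupt compressed block", e);
        } finally {
            inflater.end();
        }
        if (offset != originalLength) {
            throw new IllegalArgumentException("unexpected decompressed length");
        }
        return out;
    }
}
```

The trade-off is CPU time on every cache read and write, which is why fast codecs are usually preferred for caches over codecs with the best ratio.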

Page 50: HBase: Extreme makeover

Wait, there is more …

Scalability HIGH
Multi-Level Caching (MLC) SUPPORTED
Persistence (‘offheap’ mode) SUPPORTED
Low latency apps SUPPORTED
SSD friendliness (‘file’ mode) SSD-FRIENDLY
Compression SNAPPY, LZ4, LZ4HC, DEFLATE
Non disk–based L3 cache SUPPORTED
RAM Cache optimization IBCO

Page 52: HBase: Extreme makeover

BigBase 1.0 vs. HBase 0.98

Feature                     BigBase                  HBase 0.98
Row Cache (L1)              YES                      NO
Block Cache RAM (L2)        YES (fully off heap)     YES (partially off heap)
Block Cache (L3) DISK       YES (SSD-friendly)       YES (not SSD-friendly)
Block Cache (L3) NON DISK   YES                      NO
Compression                 YES                      NO
RAM Cache persistence       YES (both L1 and L2)     NO
Low Latency optimized       YES                      NO
MLC support                 YES (L1, L2, L3)         NO (either L2 or L3)
Scalability                 HIGH                     MEDIUM (limited by JVM heap)

Page 53: HBase: Extreme makeover

YCSB Benchmark

Page 54: HBase: Extreme makeover

Test setup (AWS)

• Common: Whirr 0.8.2; 1 (Master + Zk) + 5 RS; m1.xlarge: 15GB RAM, 4 vCPU, 4x420GB HDD.
• HBase 0.94.15: RS: 11.5GB heap (6GB LruBlockCache on heap); Master: 4GB heap.
• HBase 0.96.2: RS: 4GB heap (6GB Bucket Cache off heap); Master: 4GB heap.
• BigBase 1.0 (0.94.15): RS: 4GB heap (6GB off-heap cache); Master: 4GB heap.
• Clients: 5 (30 threads each), collocated with Region Servers.
• Data sets: 100M and 200M. 120GB / 240GB approximately. Only 25% fits in a cache.
• Workloads: 100% read (read100, read200, hotspot100), 100% scan (scan100, scan200); Zipfian.
• YCSB 0.1.4 (modified to generate compressible data). We generated compressible data (with a factor of 2.5x) only for scan workloads, to evaluate the effect of compression in the BigBase block cache implementation.

Page 56: HBase: Extreme makeover

Benchmark results (RPS)

Workload     BigBase R1.0   HBase 0.96.2   HBase 0.94.15
read100      11405          6123           5553
read200      6265           4086           3850
hotspot100   15150          3512           2855
scan100      3224           1500           709
scan200      820            434            228

Page 57: HBase: Extreme makeover

Average latency (ms)

Workload     BigBase R1.0   HBase 0.96.2   HBase 0.94.15
read100      13             24             27
read200      23             36             39
hotspot100   10             44             52
scan100      48             102            223
scan200      187            375            700

Page 58: HBase: Extreme makeover

95% latency (ms)

Workload     BigBase R1.0   HBase 0.96.2   HBase 0.94.15
read100      51             91             100
read200      88             124            138
hotspot100   38             152            197
scan100      175            405            950
scan200      729            n/a            n/a

(n/a: value not recoverable from the chart; the axis ends at 1000 ms.)

Page 59: HBase: Extreme makeover

99% latency (ms)

Workload     BigBase R1.0   HBase 0.96.2   HBase 0.94.15
read100      133            190            213
read200      225            304            338
hotspot100   111            554            632
scan100      367            811            n/a
scan200      n/a            n/a            n/a

(n/a: value not recoverable from the chart; the axis ends at 900 ms.)

Page 61: HBase: Extreme makeover

YCSB 100% Read

Per Server

Data set   BigBase R1.0   HBase 0.94.15   Speedup   Fits cache
50M        3621           1308            2.77X     40%
100M       2281           1111            2.05X     20%
200M       1253           770             1.63X     10%

• What is the maximum?
• ~75X (hotspot 2.5/100)
• 56K (BB) vs. 750 (HBase)
• 100% in cache
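
The quoted speedups follow directly from the per-server chart values. The short program below recomputes them, reading the chart labels as BigBase 3621 / 2281 / 1253 and HBase 0.94.15 1308 / 1111 / 770 operations per server for the 50M / 100M / 200M data sets; that pairing is an interpretation of the chart, and the class name is hypothetical.

```java
import java.util.Locale;

/** Recomputes the BigBase vs. HBase speedup figures from the per-server throughput values. */
final class ReadSpeedup {
    private ReadSpeedup() {
    }

    static double speedup(double bigBaseOps, double hbaseOps) {
        return bigBaseOps / hbaseOps;
    }

    static String format(double ratio) {
        return String.format(Locale.ROOT, "%.2fX", ratio);
    }

    public static void main(String[] args) {
        // {data set size in millions, BigBase ops, HBase 0.94.15 ops}, as read from the chart
        int[][] perServer = {
            {50, 3621, 1308},
            {100, 2281, 1111},
            {200, 1253, 770},
        };
        for (int[] row : perServer) {
            System.out.println(row[0] + "M: " + format(speedup(row[1], row[2])));
        }
        // All data in cache: 56K (BigBase) vs. 750 (HBase)
        System.out.println("100% in cache: " + format(speedup(56_000, 750)));
    }
}
```

The three data-set ratios come out as 2.77X, 2.05X and 1.63X, matching the slide, and the all-in-cache case comes out as 74.67X, which the slide rounds to ~75X.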

Page 63: HBase: Extreme makeover

All data in cache

• Setup: BigBase 1.0, 48G RAM, (8/16) CPU cores, 5 nodes (1 + 4)
• Data set: 200M (300GB)
• Test: Read 100%, hotspot
• YCSB 0.1.4, 4 clients
• 40 threads: 100K
• 100 threads: 168K
• 200 threads: 224K
• 400 threads: 262K

Hotspot (2.5/100, 200M data): latency (ms)

Throughput (ops)   100,000   168,000   224,000   262,000
99%                1         2         3         7
95%                1         1         2         3
avg                0.4       0.6       0.9       1.5

100K ops: 99% < 1ms

Page 64: HBase: Extreme makeover

What is next?

• Release 1.1 (2014 Q2)
  – Support HBase 0.96, 0.98, trunk
  – Fully tested L3 cache (SSD)
• Release 1.5 (2014 Q3)
  – YAMM: memory allocator compacting mode.
  – Integration with Hadoop metrics.
  – Row Cache: merge rows on update (good for counters).
  – Block Cache: new eviction policy (LRU-2Q).
  – File read posix_fadvise (bypass OS page cache).
  – Row Cache: make it available for server-side apps

Page 65: HBase: Extreme makeover

What is next?

• Release 2.0 (2014 Q3)
  – HBASE-5263: Preserving cache data on compaction
  – Cache data blocks on memstore flush (configurable).
  – HBASE-10648: Pluggable Memstore. Off-heap implementation, based on Karma (off-heap BTree lib).
• Release 3.0 (2014 Q4)
  – Real Scan Cache: caches results of Scan operations on immutable store files.
  – Scan Cache integration with Phoenix and with other third-party libraries that provide a rich query API for HBase.

Page 66: HBase: Extreme makeover

Download/Install/Uninstall
• Download BigBase 1.0 from www.bigbase.org
• Installation/upgrade takes 10-20 minutes
• Beautification operator EM(*) is invertible: HBase = EM^-1(BigBase) (the same 10-20 min)

Page 67: HBase: Extreme makeover

Q & A

Vladimir Rodionov
Hadoop/HBase architect
Founder of BigBase.org

HBase: Extreme makeover
Features & Internal Track