CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA...

20
FP7-257993 Filesystem Layer & DeltaOMID Ivan Kelly, Yahoo! Inc Francisco Perez-Sorrosal, Yahoo! Inc. CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013

Transcript of CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA...

Page 1: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

Filesystem Layer & DeltaOMID

Ivan Kelly, Yahoo! Inc

Francisco Perez-Sorrosal, Yahoo! Inc.

CumuloNimbo

Final Review Meeting

Brussels, Nov 27th, 2013

Page 2: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Primary contributions to WP3

– Original goal was to build a filesystem, block cache

– Focus shifted due to existence of HBase/HDFS

– New focus:

• Tackle the fault tolerance issues in HDFS & HBase

• Provide direct Transactional access to HBase (OMID)

• Incremental processing framework on HBase (DeltaOMID)

• Year 3:

– HBase + BookKeeper

– DeltaOMID

Overview

Nov. 27th, 2013 CN Final Review 2

Page 3: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

Yahoo! Components in CN Stack

Nov. 27th, 2013 CN Final Review 3

HA Namenode

Delta

OMID

IP

App

OMID

HBase

BK

Page 4: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• HDFS namenode is SPOF for Hadoop

• Involved a large refactor of the namenode WAL, which was

previously file targeted

– JournalManager framework

– Bookkeeper implementation of JournalManager

• Ships with Hadoop since 2.0.3-alpha

• In production in Huawei, under evaluation in Yahoo

HDFS (D3.3)

Nov. 27th, 2013 CN Final Review 4

Page 5: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Scalable reliable distributed write ahead logging

• Started as subproject of Apache Zookeeper

• Many important features added during CumuloNimbo

– Fencing

– Speculative reads

– Separate write and ack quorums

– Auto rereplication

• Plan to graduate to Apache Top Level Project in Q1 2014

• In production in Yahoo, Twitter, Huawei, HubSpot

Bookkeeper (D3.1, D5.6)

Nov. 27th, 2013 CN Final Review 5

Page 6: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Through HDFS work, and speaking with HBase devs

discovered a persistence issue with HBase:

– WAL is only flushed out the sockets, no guarantee of hitting

disk

– In case of power loss → data loss

– Possible Split-brain

• Our solution was to integrate BookKeeper with HBase

– Strong persistence guarantees

– Fencing → prevents split-brain problems

HBase (D5.5)

Nov. 27th, 2013 CN Final Review 6

Page 7: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Evaluate whether Bookkeeper (BK) can give acceptable read

performance

– BK highly write optimized, reads a concern

– BK already shown to match the write throughput required by HBase

HBase (Pre-evaluation)

Nov. 27th, 2013 CN Final Review 7

• Benchmark simulated read

pattern during node failure

• Result → Bookkeeper

matched the read rate for >

100 regions per node (D5.7)

• Gave us confidence to

continue implementation

Page 8: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Prototype implementation of HBase+BK

– Functionally complete

– Benchmarks with YCSB revealed a very low-write throughput

(D5.7)

• Root cause: Synchronous architecture

– HBase Regionserver has a number of IPC handler threads

– IPC handlers block while handling a request

– Higher latency due to hitting disk = less requests handled per

second

– Increasing IPC handler count had minimal effect

HBase (Implementation + Benchmark)

Nov. 27th, 2013 CN Final Review 8

Page 9: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Underlying problem is HBase architecture:

– Not possible to add strong persistency guarantee w/o

destroying write perf

– Using HDFS's new fsync functionality will see same problem

• Therefore we would recommend against using HBase

on rotational storage, as fsync latency will cripple

throughput.

HBase Evaluation + Recommendation

Nov. 27th, 2013 CN Final Review 9

Page 10: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Incremental Processing on top of HBase:

– Big Data processing as small mutations to a large dataset

triggered by events

• Benefits:

– Low latency

– Data consistency

• Requires:

– Notifications → Allow register application interest in specific

data and trigger specific actions at the application level upon

updates → The Delta part

– Transaction management → Provides Concurrency and

Consistency of updates → The OMID part

DeltaOMID (D5.5)

Nov. 27th, 2013 CN Final Review 10

Page 11: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

DeltaOMID

Nov. 27th, 2013 CN Final Review 11

Notification Server Notification Server

App Container

Client App Obs Obs

Data

Store

ZK

Delta Client Library

Notification Server

HBase

Client HBase

Client Scanner

External

App/Event

Delta Omid Tree

Observers-Interests

Observers-App Instances

register () dataChanged()

Om

id T

x

Registration Flow

Notification Flow

Page 12: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

Transaction Management: OMID

11/29/2013 Omid - Yahoo! Research 12

• The Status Oracle (TSO)

• A Tx Manager on top of HBase

• Replicates Tx info (TSs) to

Transactional Clients (Reduce comm

overhead)

• Uses a BookKeeper to recover from

crashes

• Transactional Clients (TCs)

• Connect to the TSO to:

• Start Tx & Commit/Rollback Tx

• Perform the required operations in

HBase in the corresponding Tx Context

HBase

Clients

TS

s

HBase

Clients

TS

s

HBase

Clients

TS

s

App

HBase

Region

Servers

HBase

Region

Servers

Data

Store

HBase

Datastore

TCs

TS

s

TSO

Page 13: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

DeltaOMID + OMID Results

Nov. 27th, 2013 CN Final Review 13

Benchmarks to test Consistent IP Fmwk (D5.10 & 11) Simple App: 1 Observer/Actuator

• Injection rate 1000 updates/s

• Throughput and Idleness

Realistic App: 4 Observers/Actuators

• Injection rate 100 updates/s

• Throughput and Latency

0

50

100

150

200

250

300

350

400

0 2000 4000 6000 8000 10000 0

2000

4000

6000

8000

10000

12000

14000

16000

18000

Ope

ration

s/s

ec

Idle

ness C

oun

t

Time (s)

1 inst 2 inst 4 inst 8 inst 16 inst

Actuator processed updates/sec (1 min rate)Actuator idleness count

0

50

100

150

200

250

300

350

400

0 2000 4000 6000 8000 10000 0

2000

4000

6000

8000

10000

12000

14000

16000

18000

Ope

ration

s/s

ec

Idle

ness C

oun

t

Time (s)

1 inst 2 inst 4 inst 8 inst 16 inst

Actuator processed updates/sec (1 min rate)Actuator idleness count

0

50

100

150

200

250

300

350

400

0 2000 4000 6000 8000 10000 0

2000

4000

6000

8000

10000

12000

14000

16000

18000

Ope

ration

s/s

ec

Idle

ness C

oun

t

Time (s)

1 inst 2 inst 4 inst 8 inst 16 inst

Actuator processed updates/sec (1 min rate)Actuator idleness count

0

50

100

150

200

250

300

350

400

0 2000 4000 6000 8000 10000 0

2000

4000

6000

8000

10000

12000

14000

16000

18000

Ope

ration

s/s

ec

Idle

ness C

oun

t

Time (s)

1 inst 2 inst 4 inst 8 inst 16 inst

Actuator processed updates/sec (1 min rate)Actuator idleness count

0

20

40

60

80

100

120

0 20000 40000 60000 80000 100000 120000 140000

Thro

ug

hp

ut (o

ps/s

)

Time (s)

ContentInterests

Affinity

0

20000

40000

60000

80000

100000

120000

140000

0 20000 40000 60000 80000 100000 120000 140000

Late

ncy 9

9%

(m

s)

Time (s)

Inject -> ContentContent -> InterestInterest -> Affinity

0

20000

40000

60000

80000

100000

120000

140000

0 20000 40000 60000 80000 100000 120000 140000

Late

ncy 9

9%

(m

s)

Time (s)

Inject -> ContentContent -> InterestInterest -> Affinity

0

20

40

60

80

100

120

0 20000 40000 60000 80000 100000 120000 140000

Thro

ug

hp

ut (o

ps/s

)

Time (s)

ContentInterests

Affinity

Page 14: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Main Contributions

– HDFS HA for CN storage layer (D3.3)

– HBase unfeasible for CN platform (D5.5 & D.5.7)

– Prove BK as a reliable and scalable WAL mechanism for CN

platform

– OMID - Transactions on key value store

– Delta OMID Consistent Incremental Processing Platform for

low-latency apps running in CN (D5.5 & D.5.7)

– Deployed in Flexiant Cloud Solutions, Private & Public

(D5.10-D5.11)

– Use of SAP PMF Monitoring (Demo)

Y! Results

Nov. 27th, 2013 CN Final Review 14

Page 15: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

Questions?

Nov. 27th, 2013 CN Final Review 15

Page 16: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

Yahoo Inc.

Exploitation

Page 17: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Internal technology transfer

• Increase industry exposure through open source

– Raises profile of engineering in Yahoo

– Makes Yahoo attractive to developers

Yahoo Exploitation

Nov. 27th, 2013 CN Final Review 17

Page 18: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Distributed write ahead logging

• In production during last 6 months

• Providing persistence layer in a major

infrastructure service

• Apache Top Level Project in Q1 2014

BookKeeper

Nov. 27th, 2013 CN Final Review 18

Page 19: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Currently being evaluated by Grid Services Team

• Concurrent effort to build next generation

NameNode

– Will reuse much of the work from CumuloNimbo

• Already shipping with Hadoop 2.0.3-alpha

onwards

HDFS High Availability

Nov. 27th, 2013 CN Final Review 19

Page 20: CumuloNimbo Final Review Meeting Brussels, Nov 27th, 2013lsd.ls.fi.upm.es/Members/admin/07. Omid, HA for HDFS - Yahoo.pdf•Provide direct Transactional access to HBase (OMID) •Incremental

FP7-257993

• Omid open sourced on Github in 2011

• Pilot use-case discussed with product architects

• Will be presented at internal Yahoo conference in

December 2013

• Plan to open-source in Q1 2014

DeltaOMID

Nov. 27th, 2013 CN Final Review 20