HopsFS & ePipe - TU Berlin

boss.dima.tu-berlin.de/media/BOSS16-hopsfs-epipe.pdf

HopsFS & ePipe

Mahmoud Ismail, Gautier Berthou

www.hops.io · @hopshadoop · github.com/hopshadoop

Agenda

• From HDFS to HopsFS

• ePipe to enable a searchable HopsFS

• Tutorial

A File System with a million ops/sec and searchable in sub-seconds?!

Bill Gates’ biggest product regret?

WinFS

*http://www.zdnet.com/article/bill-gates-biggest-microsoft-product-regret-winfs/

•“WinFS was an attempt to bring the benefits of schema and relational databases to the Windows file system. …The WinFS effort was started around 1999 as the successor to the planned storage layer of Cairo and died in 2006 after consuming many thousands of hours of efforts from really smart engineers.” - [Brian Welcker]*

[Figure: HDFS and HopsFS architectures side by side. Row labels: Metadata Mgm, File Blks. HDFS panel labels: HDFS Clients, ANN, SbNN, Journal Nodes, ZooKeeper Nodes, Datanodes. HopsFS panel labels: HopsFS / HDFS Clients, NNs, Leader, DAL Driver, NDB, Datanodes.]

directory -> {f1, f2, ..}

file -> {b1, b2, ..}

block -> {r1, r2, ..}

...
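The three mappings on this slide can be pictured as plain in-memory maps. The sketch below is illustrative only: the variable names, the `locate` helper, and the sample paths and datanode IDs are assumptions made for the example, not HDFS identifiers.

```python
# Illustrative only: HDFS-style namespace metadata as three in-memory maps.
# All names and sample values are hypothetical.
dir_to_files = {"/logs": ["a.log", "b.log"]}        # directory -> {f1, f2, ..}
file_to_blocks = {"/logs/a.log": ["b1", "b2"]}      # file -> {b1, b2, ..}
block_to_replicas = {                               # block -> {r1, r2, ..}
    "b1": ["dn1", "dn2", "dn3"],
    "b2": ["dn2", "dn3", "dn4"],
}


def locate(path):
    """Return (block, replicas) pairs for a file by chaining the two lower maps."""
    return [(block, block_to_replicas[block]) for block in file_to_blocks[path]]


print(locate("/logs/a.log"))
```

Every entry in these maps lives in memory on one machine, which is what ties the size of the namespace to the size of a single heap.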

JVM Heap is the limit

• Storing NameNode metadata in JVM Heap

• Very efficient, yet

• Number of files/directories is limited

• Garbage collection pause times

One Lock to rule them all

[Figure: namespace tree rooted at /root with subdirectories dir1, dir2, ..., dir n]

multi-reader, single writer concurrency semantics
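The multi-reader, single-writer rule can be illustrated with a toy lock. This sketches the concurrency rule only, not the NameNode's actual lock. The class and method names are made up for the example, and it refuses instead of blocking to keep the demonstration short.

```python
import threading


class NamespaceLock:
    """Toy global lock: many readers or one writer (non-blocking, for illustration)."""

    def __init__(self):
        self._mutex = threading.Lock()  # protects the two fields below
        self._readers = 0
        self._writer = False

    def try_read(self):
        with self._mutex:
            if self._writer:
                return False  # a writer holds the whole namespace
            self._readers += 1
            return True

    def try_write(self):
        with self._mutex:
            if self._writer or self._readers:
                return False  # any reader or writer keeps a new writer out
            self._writer = True
            return True

    def release_read(self):
        with self._mutex:
            self._readers -= 1

    def release_write(self):
        with self._mutex:
            self._writer = False


lock = NamespaceLock()
print(lock.try_read(), lock.try_read())  # two readers share the lock
print(lock.try_write())                  # a writer is refused while readers hold it
```

Because there is a single lock for the whole tree, a write under one directory keeps out readers and writers everywhere else in the namespace as well.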

Move the NameNode metadata off the JVM Heap

• Stateless NameNode

• Multiple NameNodes to increase Throughput

• Throughput dependent on our chosen data store

• Choosing a data store?

• An in-memory storage system that can be efficiently queried and managed. Preferably Open-Source.

• Row-level locking

• Efficient cross-partition transactions

MySQL Cluster (NDB) to the rescue

• NewSQL (Relational) DB

• User-defined partitioning

• Row-level Locking

• Distribution-aware transactions

• Partition-pruned index scans

• Real-time, 2-Phase Commit

• 1.2 sec TransactionInactive timeouts

200 Million NoSQL Ops/sec

• Commodity Hardware

• Scales to 48 nodes

• Supports on-disk columns

• SQL API

• C++/Java Native API

• C++ Event API

HopsFS

[Figure: architecture diagram with labels HDFS Clients, ANN, SbNN, Journal Nodes, ZooKeeper Nodes, HopsFS / HDFS Clients, NNs, Leader, DAL Driver, NDB, Metadata Mgm, File Blks, Datanodes]

From Memory to Database

[Figure: namespace tree with nodes /root, 2014, dir1, us, ny, b.log, c.log, ..., and the labels INode and Quota]

Apache NameNode Internals

Client: mkdir, getblocklocations, createFile, ...

[Figure: request path through the NameNode, with components listed in the order the slide builds introduce them: Client; Listener (Nio Thread) and ConnectionList; Reader1 ... ReaderN; Call Queue; Handler1 ... HandlerM; Meta Data & In-Memory EditLog with the FSNameSystem Lock; EditLog Buffer; flush to EditLog1, EditLog2, EditLog3; Journal Nodes; ackIds; Done RPCs; Responder (Nio Thread)]

ipc.server.read.threadpool.size (default 1)

dfs.namenode.service.handlercount (default 10)

HopsFS NameNode Internals

Client: mkdir, getblocklocations, createFile, ...

[Figure: request path through the NameNode, with components listed in the order the slide builds introduce them: Client; Listener (Nio Thread) and ConnectionList; Reader1 ... ReaderN; Call Queue; Handler1 ... HandlerM; DAL API and DAL-Impl; NDB tables inodes, block_infos, replicas, leases, ...; ackIds; Done RPCs; Responder (Nio Thread)]

ipc.server.read.threadpool.size (default 1)

dfs.namenode.service.handlercount (default 10)

Fine-grained Locking

[Figure: namespace tree rooted at /root, with directories such as 2014, archive, eu, us, se, de, dept, dir1 ... dir n and d0 ... dn, and files a.log, b.log and c.log]

• Hierarchical Locking (Implicit Locking)

• Subtree Locking

Implicit Locking

[Figure: namespace tree with nodes /root, 2014, dir1, us, ny, b.log, c.log, ..., and the labels INode and Quota]

Pluggable DBs: Data Abstraction Layer

NameNode (Apache v2)

DAL API (Apache v2)

NDB-DAL-Impl (GPL v2)

Other DB (Other License)
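A pluggable layer of this kind can be sketched as an interface plus interchangeable backends. This is a minimal sketch of the idea, not the HopsFS DAL: the class names, method names and the `mkdir` helper are hypothetical, and a dictionary stands in for the database.

```python
from abc import ABC, abstractmethod


class InodeDataAccess(ABC):
    """Hypothetical DAL interface: NameNode-side code depends on this only."""

    @abstractmethod
    def put(self, parent_id, name, inode):
        ...

    @abstractmethod
    def get(self, parent_id, name):
        ...


class InMemoryInodeDataAccess(InodeDataAccess):
    """Stand-in backend; a real one would wrap a database driver."""

    def __init__(self):
        self._rows = {}

    def put(self, parent_id, name, inode):
        self._rows[(parent_id, name)] = inode

    def get(self, parent_id, name):
        return self._rows.get((parent_id, name))


def mkdir(dal, parent_id, name, new_id):
    """File system logic written against the interface, not a backend."""
    if dal.get(parent_id, name) is not None:
        raise FileExistsError(name)
    dal.put(parent_id, name, {"id": new_id, "is_dir": True})
    return new_id


dal = InMemoryInodeDataAccess()
print(mkdir(dal, 1, "logs", 2))
```

Keeping the interface and each backend in separate modules is also what allows them to carry different licenses, as the slide indicates.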

Spotify Workload

Op Name | Percentage | Op Name | Percentage
append file | 0.0% | content summary | 0.01%
mkdirs | 0.02% | set permissions | 0.03% [26.3%*]
set replication | 0.14% | set owner | 0.32% [100%*]
delete | 0.75% [3.5%*] | create file | 1.2%
rename | 1.3% [0.03%*] | add blocks | 1.5%
list (listStatus) | 9% [94.5%*] | stat (fileInfo) | 17% [23.3%*]
read (getBlkLoc) | 68.73% | |

Table 1: Relative frequency of operations on a Spotify HDFS cluster. List, read, and stat operations account for ≈ 95% of the metadata operations in the cluster.

* Of which, the relative percentage is on directories.
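The ≈ 95% figure in the caption can be checked by adding the three rows it names. The percentages below are copied from Table 1; only the variable names are introduced here.

```python
# Shares of list, stat and read operations, taken from Table 1 (in percent).
shares = {
    "list (listStatus)": 9.0,
    "stat (fileInfo)": 17.0,
    "read (getBlkLoc)": 68.73,
}

total = sum(shares.values())
print(f"{total:.2f}%")  # 94.73%, which the caption rounds to about 95%
```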

NameNode on which to execute file system operations. HopsFS clients periodically refresh the NameNode list, enabling new NameNodes to join an operational cluster. HDFS v2.x clients are fully compatible with HopsFS, although they do not distribute operations over NameNodes, as they assume there is a single active NameNode.

MySQL Cluster: NDB is the storage engine for MySQL Cluster and it is a shared-nothing, in-memory, auto-sharding, consistent, distributed, relational database [38]. NDB frequently checkpoints the data and supports both NDB datanode and cluster level recovery.

NDB horizontally partitions the tables among the NDB datanodes. It also supports application defined partitioning (ADP) for the tables. The transaction coordinators are located on all the NDB datanodes, enabling high performance transactions between data shards. Distribution aware transactions (DAT) are possible by providing a hint, based on the application defined partitioning scheme, to start a transaction on the NDB datanode containing the data read/updated by the transaction. In particular, single row read operations and partition pruned index scans (scan operations in which a single data shard participates) benefit from distribution aware transactions as they can read all their data locally [77]. Incorrect hints result in additional network traffic being incurred but otherwise correct system operation.
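The difference between a partition pruned scan and an unpruned scan can be shown with a toy sharded table. This is a sketch under stated assumptions, not NDB's API: the shard count, the modulo partition function, and the choice of a parent directory ID as the partition key are all illustrative.

```python
NUM_SHARDS = 4  # hypothetical number of data shards


def shard_for(partition_key):
    """Application defined partitioning: here simply the key modulo the shard count."""
    return partition_key % NUM_SHARDS


# Rows are (parent_id, name) pairs, partitioned by parent_id (an assumption).
shards = [[] for _ in range(NUM_SHARDS)]
for parent_id, name in [(7, "a.log"), (7, "b.log"), (8, "c.log"), (9, "d.log")]:
    shards[shard_for(parent_id)].append((parent_id, name))


def list_children(parent_id):
    """Partition pruned scan: the key identifies the single shard to read."""
    shard = shard_for(parent_id)  # the same value could serve as the transaction hint
    names = [n for p, n in shards[shard] if p == parent_id]
    return names, 1  # one shard touched


def find_by_name(name):
    """Unpruned scan: no partition key is known, so every shard participates."""
    hits = [(p, n) for shard in shards for p, n in shard if n == name]
    return hits, NUM_SHARDS


print(list_children(7))
print(find_by_name("c.log"))
```

In this toy model a wrong hint would start the work on a shard that does not hold the rows. That matches the text's point that incorrect hints cost extra network traffic without affecting correctness.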

NDB only supports read-committed transaction isolation, which does not allow dirty reads but phantom and fuzzy (non-repeatable) reads can happen in a transaction [7]. NDB supports row level locks, exclusive (write) locks, shared (read) locks, and read-committed locks. HopsFS uses locks to serialize conflicting file system operations.

3 Partitioning Scheme and Transactions

Metadata for hierarchical distributed file systems typically contains information on inodes, blocks, replicas, quotas, leases and mappings (directories to files, files to blocks, and blocks to replicas). When metadata is distributed, an application defined partitioning scheme is needed to shard the metadata and a distributed consensus protocol is required to ensure metadata integrity for operations that cross shards. Quorum-based consensus protocols, such as Paxos, provide high performance within a single shard, but are typically combined with transactions, implemented using the two-phase commit protocol, for operations that cross shards, as in Megastore [6] and Spanner [11]. File system operations in HopsFS are implemented primarily using transactions and row-level locks in MySQL Cluster to provide serializability [23] for metadata operations.

Figure 2: (a) Shows the relative cost of different operations in a NewSQL database. (b) HopsFS avoids FTS and IS operations as the cost of these operations is relatively higher than PPIS, B, and PK operations.

The choice of partitioning scheme for the hierarchical namespace is a key design decision for distributed metadata architectures. We base our partitioning scheme on the expected relative frequency of HDFS operations and the cost of different database operations that can be used to implement the file system operations. Table 1 shows the relative frequency of selected HDFS operations in a workload generated by Hadoop applications (Pig, Hive, HBase, MapReduce, Tez, Spark, and Giraph) at Spotify. List, stat and file read operations alone account for ≈ 95% of the operations in the HDFS cluster. These statistics are similar to the published workloads for Hadoop clusters at Yahoo [1], LinkedIn [53], and Facebook [66]. Figure 2a shows the relative cost of different database operations. We can see that the cost of a full table scan or an index scan, in which all database shards participate, is much higher than a partition pruned index scan in which only a single database shard participates. HopsFS implements common file system operations using only the low cost database operations, that is, primary key read, batched primary key reads and partition pruned index scans. For example, the read and directory listing operations (see Table 3), are implemented using only (batched) primary key lookups and partition pruned index scans. Index scans and full table scans were avoided, where possible, as they touch all database shards and scale poorly.

3.1 HopsFS Partitioned Metadata

In HopsFS, the file system metadata is stored in tables where a directory inode is represented by a single row in the Inode table. File inodes, however, have more associated metadata, such as a set of blocks, block locations, and checksums that are stored in separate tables, with

[Niazi, Salman, et al. "HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases." arXiv preprint (2016)]

To Infinity and Beyond

NDB Setup: 8 Nodes using Xeon E5-2620 2.40GHz Processors and 10GbE. NameNodes: Xeon E5-2620 2.40GHz Processors machines and 10GbE.

[Figure: throughput in ops/sec (axis from 0 to 700k) against the number of NameNodes (1 to 46) for the series HopsFS-Spotify and HDFS-Spotify; annotation: ~8.5X]

Page 63: HopsFS & ePipe - TU Berlinboss.dima.tu-berlin.de/media/BOSS16-hopsfs-epipe.pdf · HOPSFS & EPIPE < 10 > MySQL Cluster (NDB) to the rescue • NewSQL (Relational) DB • User-defined

><HOPSFS & EPIPE 20

Bigger Clusters

[Figure 7 plot: HDFS operation time (ms) and HopsFS operation time (sec) against the number of files in a directory (million), for rename and delete operations.]

Figure 7: Performance of rename and delete operations on large directories. Note the different time scales for HDFS and HopsFS. HopsFS is an order of magnitude slower as it reads all the descendant inodes of the subtree from an external database.

the memory. However, due to the low frequency of such operations in typical industrial workloads (see Table 1), we think it is an acceptable trade-off for the higher performance of common file system operations in HopsFS.

7.3 Metadata (Namespace) Scalability

In HDFS, as the entire namespace metadata must fit on the heap of a single JVM, the data structures are highly optimized to reduce the memory footprint [62]. In HDFS, a file with two blocks that are replicated three ways requires 448 + L bytes of metadata¹, where L represents the filename length. If the file names are 10 characters long, then a 1 GB JVM heap can store 2.3 million files. In reality the JVM heap size has to be significantly larger to accommodate secondary metadata, thousands of concurrent RPC requests, block reports that can each be tens of megabytes in size, as well as other temporary objects.

Memory    Number of files (HDFS)    Number of files (HopsFS)
1 GB      2.3 million               0.44 million
50 GB     115 million               22 million
100 GB    230 million               44 million
200 GB    460 million               88 million
500 GB    Does Not Scale            220 million
1 TB      Does Not Scale            440 million
24 TB     Does Not Scale            10.8 billion

Table 2: HDFS and HopsFS Metadata Scalability.

Migrating the metadata to a database causes an expansion in the amount of memory required to accommodate indexes, primary/foreign keys and padding. In order to calculate the size of each entity we use a tool called sizer [40]. In HopsFS the same file described above takes 2420 bytes if the metadata is replicated twice. For a highly available deployment with active and standby namenodes for HDFS, you will need twice the amount of memory; thus, HopsFS requires ≈ 2 times more memory than HDFS to store metadata that is highly available.

Table 2 shows the metadata scalability of HDFS and HopsFS. NDB supports up to 48 datanodes, which allows it to scale up to 24 TB of data in a cluster with 512 GB RAM on each NDB datanode. HopsFS can store up to 10.8 billion files using 24 TB of metadata, which is an order of magnitude higher (24 times) than HDFS.

¹ These size estimates are for HDFS version 2.0.4, from which HopsFS was forked. Newer versions of HDFS require additional memory for new features such as snapshots and extended attributes.
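The first row of Table 2 follows from simple division. A minimal Python sketch of that arithmetic, assuming 1 GB means 2^30 bytes and a 10-character file name; the function and constant names are illustrative and not taken from HopsFS:

```python
# Back-of-the-envelope check of the per-file metadata costs quoted above.
# Assumption: 1 GB = 2**30 bytes. All names below are illustrative only.

GB = 2**30

HDFS_BYTES_PER_FILE_BASE = 448  # two blocks, replicated three ways; add L bytes for the name
HOPSFS_BYTES_PER_FILE = 2420    # the same file with metadata replicated twice


def files_that_fit(memory_bytes: int, bytes_per_file: int) -> int:
    """Number of files whose metadata fits in the given amount of memory."""
    return memory_bytes // bytes_per_file


name_length = 10
hdfs_per_gb = files_that_fit(GB, HDFS_BYTES_PER_FILE_BASE + name_length)
hopsfs_per_gb = files_that_fit(GB, HOPSFS_BYTES_PER_FILE)

print(f"HDFS:   {hdfs_per_gb / 1e6:.2f} million files per GB")
print(f"HopsFS: {hopsfs_per_gb / 1e6:.2f} million files per GB")
```

Scaling the per-GB figures by the memory size gives values close to the remaining rows of the table, which seem to be rounded.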

7.4 Industrial Workload Experiments

We benchmarked HopsFS using workloads based on operational traces from Spotify, which operates a Hadoop cluster consisting of 1600+ nodes containing 60 petabytes of data. The namespace contains 13 million directories and 218 million files, where each file on average contains 1.3 blocks. The Hadoop cluster at Spotify runs on average forty thousand jobs every day from different applications, such as Pig, Hive, HBase, MapReduce, Tez, Spark, and Giraph. The file system workload generated by these applications is summarized in Table 1, which shows the relative frequency of HDFS operations. At Spotify the average file path depth is 7 and the average inode name length is 34 characters. On average each directory contains 16 files and 2 sub-directories. There are 289 million blocks stored on the datanodes. We use these statistics to generate file system workloads that approximate HDFS usage in production at Spotify.

7.4.1 Scalability Argument

Figure 8 shows that, for our industrial workload, HopsFS delivers 2.6 times the throughput of HDFS with 12 namenodes and 8 NDB nodes. Even for 2-node NDB deployments, HopsFS can outperform HDFS and scale linearly up to 8 namenodes.

As discussed before, in medium to large Hadoop clusters 5 to 8 servers are required to provide high availability for HDFS. With 6 servers, HopsFS delivers higher throughput than HDFS, increasing further as more namenodes are added to the system.

For higher numbers of namenodes, HopsFS' throughput levels off because NDB becomes overloaded. By increasing the number of NDB datanode instances to 4, HopsFS increases throughput up to 11 namenodes, and by increasing the number of NDB datanode instances to 8, HopsFS scales up to at least 12 namenodes. The 2-node NDB cluster has similar performance to the 4-node NDB cluster because each NDB datanode in the 2-node NDB cluster holds a complete copy of the entire metadata, which improves the performance of transactions as all the

[Figure 8 plot: throughput in ops/sec (20K to 140K) against the number of namenodes (1 to 12) for HopsFS with 2, 4, and 8 NDB nodes and for HDFS (using 5 servers).]

Figure 8: HopsFS and HDFS throughput for our industrial workload.

Tinker Friendly Metadata

• The Database (NDB) is the single source of truth
• Extending INodes (files/directories)
  - Adding a new table with a foreign key to the inodes table
• Attaching metadata to a file/directory
  - Schema-less
  - Schema-based
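The "new table with a foreign key" idea can be tried out on any relational database. A minimal sketch, using SQLite as a stand-in for NDB; the table and column names (`inodes`, `inode_tags`) and the cascade-on-delete behaviour are assumptions made for illustration, not the HopsFS schema:

```python
import sqlite3

# SQLite stands in for NDB here; table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE inodes (
        id        INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        parent_id INTEGER NOT NULL
    )
""")

# The extension: a new table whose rows point back at inodes.
conn.execute("""
    CREATE TABLE inode_tags (
        inode_id INTEGER NOT NULL REFERENCES inodes(id) ON DELETE CASCADE,
        tag      TEXT NOT NULL,
        PRIMARY KEY (inode_id, tag)
    )
""")

conn.executemany(
    "INSERT INTO inodes (id, name, parent_id) VALUES (?, ?, ?)",
    [(1, "/", 0), (2, "Users", 1), (3, "alice.txt", 2)],
)
conn.execute("INSERT INTO inode_tags VALUES (3, 'personal')")

# Deleting the inode removes its extension rows as well.
conn.execute("DELETE FROM inodes WHERE id = 3")
remaining_tags = conn.execute("SELECT COUNT(*) FROM inode_tags").fetchone()[0]

# A row that references a missing inode is rejected.
try:
    conn.execute("INSERT INTO inode_tags VALUES (99, 'orphan')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```

Because the database owns the relationship, extension rows cannot outlive or predate the inode they describe, which is one reading of "single source of truth".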

Schema-less Metadata

inodeID  Name       parentId
1        /          0
2        Users      1
3        alice.txt  2

attach /Users/alice.txt '{"age": 20, "gender": "female", "about": "I am alice"}'

inodeID  metadata
3        {"age": 20, "gender": "female", "about": "I am alice"}
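The attach step above can be sketched as a path lookup followed by a single insert. The following is a minimal illustration with SQLite standing in for NDB; `resolve`, `attach`, and the `inode_metadata` table are hypothetical names, and the real HopsFS command may work differently:

```python
import json
import sqlite3

# SQLite stands in for NDB; `attach`, `resolve`, and the table names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inodes (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER)")
conn.execute("CREATE TABLE inode_metadata (inode_id INTEGER PRIMARY KEY, metadata TEXT)")
conn.executemany(
    "INSERT INTO inodes VALUES (?, ?, ?)",
    [(1, "/", 0), (2, "Users", 1), (3, "alice.txt", 2)],
)

ROOT_ID = 1


def resolve(path: str) -> int:
    """Walk the path one component at a time with a (parent_id, name) lookup."""
    inode_id = ROOT_ID
    for component in filter(None, path.split("/")):
        row = conn.execute(
            "SELECT id FROM inodes WHERE parent_id = ? AND name = ?",
            (inode_id, component),
        ).fetchone()
        if row is None:
            raise FileNotFoundError(path)
        inode_id = row[0]
    return inode_id


def attach(path: str, metadata: dict) -> int:
    """Store an arbitrary JSON document against the inode the path resolves to."""
    inode_id = resolve(path)
    conn.execute(
        "INSERT OR REPLACE INTO inode_metadata VALUES (?, ?)",
        (inode_id, json.dumps(metadata)),
    )
    return inode_id


attached_id = attach("/Users/alice.txt", {"age": 20, "gender": "female", "about": "I am alice"})
stored = json.loads(
    conn.execute(
        "SELECT metadata FROM inode_metadata WHERE inode_id = ?", (attached_id,)
    ).fetchone()[0]
)
```

The metadata column is opaque text, so no schema change is needed when a new key appears in the JSON document.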

Schema-based Metadata

[Diagram labels: Template, Table, Column, INode, Metadata Row]

Searching through the Namespace

• hdfs dfs -find
  - limited
  - inefficient by design
• Pig/MapReduce
  - NameNode is a critical point, avoid overloading
• SQL Query on NDB
  - One size does not fit all

Polyglot Persistence: the Right Tool for the Job

Monolithic vs Polyglot Persistence

[http://martinfowler.com/]

Asynchronous Update of the Metadata

Eventual consistency for metadata. Metadata integrity is maintained by asynchronous replication and metadata immutability.

[Diagram: the Database (files, directories, metadata) is connected through ePipe to Elasticsearch (search indexes). Labels on the link: one-way replication, immutable data.]

HopsFS | ElasticSearch

[Diagram: MySQL Cluster, ePipe, and an ElasticSearch cluster. Component labels inside ePipe: Tailer, Table Unit, Batcher, Reader, ElasticHandler.]
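The general shape of such a one-way pipeline (read committed changes, group them into batches, apply each batch to the search index) can be sketched in a few lines. This is an illustration of the idea only, not ePipe's code: the `ChangeEvent` shape, the function names, and the use of a plain dict in place of Elasticsearch are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator, List, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One committed metadata change read from the database side (hypothetical shape)."""
    inode_id: int
    document: Optional[dict]  # None means the inode was deleted


def batches(events: Iterable[ChangeEvent], size: int) -> Iterator[List[ChangeEvent]]:
    """Group a stream of events into lists of at most `size` items."""
    batch: List[ChangeEvent] = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def apply_batch(index: Dict[int, dict], batch: List[ChangeEvent]) -> None:
    """Write one batch to the search index; a plain dict stands in for Elasticsearch."""
    for event in batch:
        if event.document is None:
            index.pop(event.inode_id, None)
        else:
            index[event.inode_id] = event.document


def replicate(events: Iterable[ChangeEvent], index: Dict[int, dict], batch_size: int = 2) -> int:
    """One-way flow: database events in, index updates out. Returns the number of batches."""
    count = 0
    for batch in batches(events, batch_size):
        apply_batch(index, batch)
        count += 1
    return count


search_index: Dict[int, dict] = {}
log = [
    ChangeEvent(2, {"name": "Users"}),
    ChangeEvent(3, {"name": "alice.txt", "age": 20}),
    ChangeEvent(2, None),
]
batch_count = replicate(log, search_index)
```

Nothing flows back from the index to the database in this sketch, so the index can lag behind but never becomes a second source of truth.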

Supporting Project-Level Multi-Tenancy

How can we introduce GitHub-style projects to Hadoop?

Tutorial

• Karamel Automated Installation
  - http://www.hops.io/?q=boss
• HopsWorks Clusters
  - http://52.210.205.125:8080/hopsworks/
  - http://52.209.143.68:8080/hopsworks/
  - http://52.18.186.135:8080/hopsworks/

Conclusions

• Hops is a next-generation distribution of Hadoop.

• HopsWorks is a frontend to Hops that supports multi-tenancy, free-text search, interactive analytics with Zeppelin/Flink/Spark, and batch jobs.

• Looking for contributors/committers

• Check out our GitHub (github.com/hopshadoop)

Questions

Karamel

Automated Installation

• Vagrant/Chef to spin up on a single host

• Karamel/Chef to deploy on AWS/GCE/OpenStack or on-premises

name: HopsWorks
ec2:
  type: m3.medium
cookbooks:
  hadoop:
    github: "hopshadoop/hopsworks-chef"
    version: "v0.1"
groups:
  ui:
    size: 1
    recipes:
      - hopsworks
  metadata:
    size: 2
    recipes:
      - hops::nn
      - hops::rm
  datanodes:
    size: 50
    recipes:
      - hops::dn
      - hops::nm

HopsWorks

Problem: Sensitive Data needs its own Cluster

[Diagram: Alice has access to the NSA DataSet and is given access to the User DataSet.]

• Alice can copy/cross-link between data sets.
• Alice has only one Kerberos identity. Neither attribute-based access control nor dynamic roles are supported in Hadoop.

Solution: Project-Specific UserIDs

[Diagram: NSA__Alice is a member of Project NSA; Users__Alice is a member of Project Users.]

HDFS enforces access control.

How can we share DataSets between Projects?

Sharing DataSets between Projects

[Diagram: NSA__Alice is a member of Project NSA and Users__Alice is a member of Project Users; a DataSet is owned by a project.]

Add members of Project NSA to the DataSet group.
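The naming scheme and the group-based sharing step can be modelled with two small dictionaries. This is a sketch under stated assumptions: the `Project__User` format is taken from the slides, while the in-memory model, the function names, the user Bob, and the dataset name `clicks` are invented for illustration, as is the choice of Project Users as the owning project.

```python
from typing import Dict, Set


def project_user(project: str, user: str) -> str:
    """Build the project-specific identity, e.g. NSA__Alice."""
    return f"{project}__{user}"


# Hypothetical in-memory model: project -> members, dataset group -> project-specific users.
project_members: Dict[str, Set[str]] = {
    "NSA": {"Alice"},
    "Users": {"Alice", "Bob"},
}
dataset_groups: Dict[str, Set[str]] = {}


def create_dataset(owner_project: str, dataset: str) -> None:
    """A new dataset group starts with the owning project's members."""
    dataset_groups[dataset] = {
        project_user(owner_project, u) for u in project_members[owner_project]
    }


def share_dataset(dataset: str, with_project: str) -> None:
    """Sharing adds the other project's members to the dataset group."""
    dataset_groups[dataset] |= {
        project_user(with_project, u) for u in project_members[with_project]
    }


def can_access(identity: str, dataset: str) -> bool:
    """Access is plain group membership, which the file system can enforce."""
    return identity in dataset_groups.get(dataset, set())


create_dataset("Users", "clicks")
before = can_access(project_user("NSA", "Alice"), "clicks")
share_dataset("clicks", "NSA")
after = can_access(project_user("NSA", "Alice"), "clicks")
```

Because NSA__Alice and Users__Alice are distinct identities, access granted to one does not leak to the other unless the dataset is explicitly shared.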

HopsWorks enforces dynamic Roles

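[Diagram: [email protected] authenticates with HopsWorks, which holds the Projects and acts as the project-specific users NSA__Alice and Users__Alice towards HopsFS, HopsYARN, and Kafka. Labels: Secure Impersonation, X.509 Certificates.]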

X.509 Certificate per project-specific user

[Diagram labels: [email protected], Authenticate, Project Mgr, Add/Del Users, Distributed Database, Insert/Remove Certs, Root CA, Cert Signing Requests, Services (Hadoop, Spark, Kafka, etc.)]

Project

• A project is a collection of
  - Members
  - HDFS DataSets
  - Kafka Topics
  - Notebooks, Jobs
• A project has an owner
• A project has quotas

[Diagram: a project containing HDFS datasets 1 to N and Kafka topics 1 to N.]

Project Roles

• Data Owner Privileges
  - Import/Export data
  - Manage Membership
  - Share DataSets, Topics
• Data Scientist Privileges
  - Write and Run code

We delegate administration of privileges to users.
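The two roles map naturally onto a small role-to-privilege table. A minimal sketch that mirrors the slide literally; the enum and function names are hypothetical, and whether a Data Owner may also write and run code is not stated in the slides, so that privilege is left out of the owner's set here:

```python
from enum import Enum, auto
from typing import Dict, FrozenSet


class Privilege(Enum):
    IMPORT_EXPORT_DATA = auto()
    MANAGE_MEMBERSHIP = auto()
    SHARE_DATASETS_AND_TOPICS = auto()
    WRITE_AND_RUN_CODE = auto()


class Role(Enum):
    DATA_OWNER = auto()
    DATA_SCIENTIST = auto()


# Mirrors the slide as written; this mapping is an illustration, not HopsWorks code.
ROLE_PRIVILEGES: Dict[Role, FrozenSet[Privilege]] = {
    Role.DATA_OWNER: frozenset({
        Privilege.IMPORT_EXPORT_DATA,
        Privilege.MANAGE_MEMBERSHIP,
        Privilege.SHARE_DATASETS_AND_TOPICS,
    }),
    Role.DATA_SCIENTIST: frozenset({Privilege.WRITE_AND_RUN_CODE}),
}


def is_allowed(role: Role, privilege: Privilege) -> bool:
    """Check whether a role carries a given privilege."""
    return privilege in ROLE_PRIVILEGES[role]


owner_can_manage = is_allowed(Role.DATA_OWNER, Privilege.MANAGE_MEMBERSHIP)
scientist_can_manage = is_allowed(Role.DATA_SCIENTIST, Privilege.MANAGE_MEMBERSHIP)
```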