Bamboo: An effort towards Hadoop as a Service


description

These are the slides for the talk given at the BigData.SG meetup on April 20. Bamboo is a prototype platform that provides Hadoop as a Service. It automates the provisioning of virtual Hadoop clusters and provides a web GUI in which users design Hadoop jobs by dragging and dropping blocks.

Transcript of Bamboo: An effort towards Hadoop as a Service

Page 1: Bamboo: An effort towards Hadoop as a Service

Bamboo An effort towards Hadoop As A Service

© Copyright 2012 HP Lab Singapore 1

Page 2: Bamboo: An effort towards Hadoop as a Service

Apache™ Hadoop™

• Inspired by Google Map/Reduce and Google File System

• Reliable and Scalable

• Open-source

• Wide adoption and active user community


Page 4: Bamboo: An effort towards Hadoop as a Service

After that …

1. Find a few commodity servers

2. Install Linux systems

3. Set up the network: wiring, configuring interfaces, SSH, etc.

4. Install JDK and Hadoop packages

5. Initialize HDFS

6. Start to write test code


Page 5: Bamboo: An effort towards Hadoop as a Service

But, not everyone wants to be a Hadoop expert


Page 6: Bamboo: An effort towards Hadoop as a Service

Two Problems

1. System infrastructure setup

2. Translating work into Map/Reduce jobs


Page 7: Bamboo: An effort towards Hadoop as a Service

Hadoop In Public Cloud

• Only simplifies the infrastructure setup

• Targeted end-users are experienced Hadoop users

Page 8: Bamboo: An effort towards Hadoop as a Service

How can we enable technical people who know little about Hadoop to utilize the power of Hadoop?

Page 9: Bamboo: An effort towards Hadoop as a Service

Bamboo!

Page 10: Bamboo: An effort towards Hadoop as a Service

What Is Bamboo Made Of


Job Assembler

Deployment & Configuration

Provisioning


Page 11: Bamboo: An effort towards Hadoop as a Service

Provisioning System


Page 12: Bamboo: An effort towards Hadoop as a Service

KVM Based Virtualization


Core

Driver

Web Service


Page 13: Bamboo: An effort towards Hadoop as a Service

Why Have Our Own Provisioning System?

Because

• We don’t want to have dedicated hypervisor servers

• Our private cloud should require minimal changes to both the servers and network devices

• We need to support more users with limited resources

• A simple way to customize VMs is required

• We can

Page 14: Bamboo: An effort towards Hadoop as a Service

Use Overlay Images

$ qemu-img create -f qcow2 -b <base image>.raw <new image>.qcow2

Base Image

Vm-0.qcow2 Vm-1.qcow2 Vm-n.qcow2

• Reduces image preparation time

• Identical VM images across the whole cluster

• Over-provisioning is possible

• Slight I/O performance penalty
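The overlay approach above can be sketched in Python as a helper that builds one `qemu-img create` command per VM. This is only an illustration, not Bamboo's actual code: the function names, the `vm-<i>.qcow2` naming, and the `base.raw` file name are assumptions. The `-F raw` option, which states the backing image format, is added here because recent QEMU releases require it; the slide's original command omits it.

```python
import shlex


def overlay_create_command(base_image, overlay_image, base_format="raw"):
    """Build the qemu-img command that creates one qcow2 overlay image."""
    return [
        "qemu-img", "create",
        "-f", "qcow2",        # format of the new overlay image
        "-b", base_image,     # backing (base) image shared by all overlays
        "-F", base_format,    # backing image format (needed by recent QEMU)
        overlay_image,
    ]


def overlay_commands(base_image, count, prefix="vm"):
    """Build commands for `count` overlays named <prefix>-0.qcow2, ... (hypothetical naming)."""
    return [
        overlay_create_command(base_image, f"{prefix}-{i}.qcow2")
        for i in range(count)
    ]


# Print the commands instead of running them, so the sketch has no side effects.
COMMANDS = overlay_commands("base.raw", 3)
for command in COMMANDS:
    print(shlex.join(command))
```

To actually create the images, each command list could be passed to `subprocess.run(command, check=True)` on a host where `qemu-img` is installed.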

Page 15: Bamboo: An effort towards Hadoop as a Service

Deployment & Configuration

Deploy/configure before the clusters boot up

1. Mount VM images on the hypervisors

2. chroot to the mounted partition

3. Deploy/configure packages in the VM

Page 16: Bamboo: An effort towards Hadoop as a Service

Mount VM Images To System

Raw images:

Mount

$ losetup /dev/loop0 <image>.raw

$ kpartx -a /dev/loop0

$ mount /dev/mapper/loop0p1 <mount point>

Umount

$ umount -l <mount point>

$ kpartx -d /dev/loop0

$ losetup -d /dev/loop0

Qcow2 images:

Mount

$ qemu-nbd -c /dev/nbd0 <image>.qcow2

$ mount /dev/nbd0p1 <mount point>

Umount

$ umount -l <mount point>

$ qemu-nbd -d /dev/nbd0
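The two mount and unmount sequences on this slide could be wrapped in a small helper that picks the right sequence from the image type. The following is a minimal sketch under assumptions: the function name `mount_plan`, the choice to detect qcow2 by file extension, and the default device names are illustrative and not taken from Bamboo. It only builds the command lists and does not execute them.

```python
def mount_plan(image, mount_point, device=None):
    """Return (mount_commands, unmount_commands) for a raw or qcow2 image.

    The image type is guessed from the file extension (an assumption made
    for this sketch); only the first partition is mounted, as on the slide.
    """
    if image.endswith(".qcow2"):
        dev = device or "/dev/nbd0"
        mount = [
            ["qemu-nbd", "-c", dev, image],        # attach image to an NBD device
            ["mount", f"{dev}p1", mount_point],    # mount its first partition
        ]
        unmount = [
            ["umount", "-l", mount_point],
            ["qemu-nbd", "-d", dev],               # detach the NBD device
        ]
    else:
        dev = device or "/dev/loop0"
        name = dev.rsplit("/", 1)[-1]              # e.g. "loop0"
        mount = [
            ["losetup", dev, image],               # attach image to a loop device
            ["kpartx", "-a", dev],                 # create partition mappings
            ["mount", f"/dev/mapper/{name}p1", mount_point],
        ]
        unmount = [
            ["umount", "-l", mount_point],
            ["kpartx", "-d", dev],                 # remove partition mappings
            ["losetup", "-d", dev],                # detach the loop device
        ]
    return mount, unmount


RAW_MOUNT, RAW_UNMOUNT = mount_plan("vm.raw", "/mnt/vm")
QCOW_MOUNT, QCOW_UNMOUNT = mount_plan("vm-0.qcow2", "/mnt/vm")
print(RAW_MOUNT)
print(QCOW_UNMOUNT)
```

Running the resulting commands requires root privileges on the hypervisor, and the qcow2 path assumes the `nbd` kernel module is already loaded.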

Page 17: Bamboo: An effort towards Hadoop as a Service

Extend To Other Cloud Platforms


Page 18: Bamboo: An effort towards Hadoop as a Service

Job Assembler


Page 19: Bamboo: An effort towards Hadoop as a Service

System Structure

Web-based GUI

Code Generator

Coordinator

Dataflow graph in JSON
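The slides do not show the JSON format that the GUI hands to the code generator, so the following is only a guess at what such a dataflow graph might look like and how it could be loaded. The schema (`nodes` with `id` and `op`, `edges` with `from` and `to`), the operation names, and the function `load_dataflow` are all hypothetical.

```python
import json

# Hypothetical dataflow graph; the real Bamboo schema is not shown in the slides.
GRAPH_JSON = """
{
  "nodes": [
    {"id": "load",  "op": "read_input"},
    {"id": "count", "op": "group_count"},
    {"id": "store", "op": "write_output"}
  ],
  "edges": [
    {"from": "load",  "to": "count"},
    {"from": "count", "to": "store"}
  ]
}
"""


def load_dataflow(text):
    """Parse the JSON text into ({node_id: op}, [(source, target), ...])."""
    document = json.loads(text)
    nodes = {node["id"]: node["op"] for node in document["nodes"]}
    edges = [(edge["from"], edge["to"]) for edge in document["edges"]]
    # Reject edges that point at nodes the graph does not define.
    for source, target in edges:
        if source not in nodes or target not in nodes:
            raise ValueError(f"edge {source!r} -> {target!r} references an unknown node")
    return nodes, edges


NODES, EDGES = load_dataflow(GRAPH_JSON)
print(NODES)
print(EDGES)
```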

Page 20: Bamboo: An effort towards Hadoop as a Service

Web-Based GUI

• An open-source JavaScript library (MIT License)

• Convenient for creating dataflow graphs

• Supports drag-and-drop editing

Page 21: Bamboo: An effort towards Hadoop as a Service


Page 22: Bamboo: An effort towards Hadoop as a Service

Code Generator

NetworkX: a Python package under the BSD License

>>> import networkx as nx

>>> G = nx.Graph()

>>> G.add_node("spam")

>>> G.add_edge(1, 2)

>>> print(list(G.nodes()))

['spam', 1, 2]

>>> print(list(G.edges()))

[(1, 2)]

Page 23: Bamboo: An effort towards Hadoop as a Service

Dataflow = Directed Acyclic Graph (DAG)

[Figure: a DAG with nodes 1 to 5]

Nodes: Map/Reduce operations

Edges: Dependencies

Topological sorting: 5 3 1 4 2
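Topological sorting gives an order in which every Map/Reduce operation runs only after the operations it depends on. Below is a minimal sketch using Kahn's algorithm from the standard library only; it is not Bamboo's code generator. The edges of the slide's figure are not recoverable from the transcript, so the edge list here is hypothetical and merely chosen to be consistent with the order shown on the slide.

```python
from collections import deque


def topological_order(nodes, edges):
    """Return the nodes in dependency order using Kahn's algorithm.

    Each edge (before, after) means `before` must run before `after`.
    """
    successors = {node: [] for node in nodes}
    indegree = {node: 0 for node in nodes}
    for before, after in edges:
        successors[before].append(after)
        indegree[after] += 1

    # Start with the operations that depend on nothing.
    ready = deque(node for node in nodes if indegree[node] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for following in successors[node]:
            indegree[following] -= 1
            if indegree[following] == 0:
                ready.append(following)

    # Any node left unprocessed sits on a cycle, so the graph is not a DAG.
    if len(order) != len(nodes):
        raise ValueError("dataflow graph contains a cycle")
    return order


# Hypothetical edges; the slide's actual figure is not available.
NODES = [1, 2, 3, 4, 5]
EDGES = [(5, 3), (5, 1), (3, 1), (3, 4), (1, 4), (4, 2)]
ORDER = topological_order(NODES, EDGES)
print(ORDER)  # [5, 3, 1, 4, 2]
```

With NetworkX, the package named on the previous slide, the same result could be obtained from a `DiGraph` with `nx.topological_sort`.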

Page 24: Bamboo: An effort towards Hadoop as a Service

Job Coordinator

Messaging Queue

Driver

Web Service

Page 25: Bamboo: An effort towards Hadoop as a Service

Driven By Messages

[Diagram labels: Client NewbieTest; Client KDD2012; X “hadoop”; Q: “NewbieTest”; Q: “KDD2012”; Coordinator; MongoDB]

Page 26: Bamboo: An effort towards Hadoop as a Service

Run A Hadoop Job


Page 27: Bamboo: An effort towards Hadoop as a Service

Track 1: Predict which users one might follow on Tencent Weibo (one of the largest micro-blogging websites in China)

Dataset: user profiles, demographics, follow history, social graph, and item categories of 200 million registered users

Page 28: Bamboo: An effort towards Hadoop as a Service

Feature Extraction

User actions

User     Target   Action  Re-tweet  Comment
1000006  1675399  0       53        12
1000006  1760322  2       21        0

User     Target   Action  Re-tweet  Comment
1000006  1675399  0%      1%        3%
1000006  1760322  10%     100%      0%

Page 29: Bamboo: An effort towards Hadoop as a Service

Hadoop As A Service

REST API

Authentication

Bamboo

Individuals

Applications

Other Services

Page 30: Bamboo: An effort towards Hadoop as a Service

Thank You


Dr. Liu Xiaohui HP Labs Singapore [email protected]