Bamboo: An effort towards Hadoop as a Service
Uploaded by: xiaohui-liu
Bamboo An effort towards Hadoop As A Service
© Copyright 2012 HP Lab Singapore 1
Apache™ Hadoop™
• Inspired by Google Map/Reduce and Google File System
• Reliable and Scalable
• Open-source
• Wide adoption and active user community
The Hadoop Journey Often Starts With
• Online materials
– Yahoo! Hadoop Tutorial
– Apache Map/Reduce Tutorial
– Cloudera Training
• Books
After that …
1. Find a few commodity servers
2. Install Linux systems
3. Set up the network: wiring, configuring interfaces, SSH, etc.
4. Install JDK and Hadoop packages
5. Initialize HDFS
6. Start to write test code
But, not everyone wants to be a Hadoop expert
Two Problems
1. System infrastructure setup
2. Translating work into Map/Reduce jobs
Hadoop In Public Cloud
• Only simplifies the infrastructure setup
• Targeted end-users are experienced Hadoop users
How can we enable technical people who know little about Hadoop to utilize the power of Hadoop?
Bamboo!
What Is Bamboo Made Of
Job Assembler
Deployment & Configuration
Provisioning
Provisioning System
KVM Based Virtualization
Core
Driver
Web Service
Why Have Our Own Provisioning System?
Because
• We don’t want to have dedicated hypervisor servers
• Our private cloud should require minimum changes to both the servers and network devices
• We need to support more users with limited resources
• A simple way to customize VMs is required
• We can
Use Overlay Images
$ qemu-img create -f qcow2 -b <base image>.raw -F raw <new image>.qcow2
[Figure: one Base Image backing the overlays Vm-0.qcow2, Vm-1.qcow2, Vm-n.qcow2]
• Reduces image preparation time
• Identical VM images across the whole cluster
• Over-provisioning is possible
• Slight IO performance penalty
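The overlay step can be scripted when many VMs share one base image. The following is a minimal sketch, not Bamboo's actual provisioning code: the function name `overlay_commands`, the `Vm-<n>.qcow2` naming and the file name `base.raw` are assumptions for illustration. It only builds the `qemu-img create` command lines, and it passes `-F` to state the backing image format explicitly, which recent QEMU releases expect.

```python
import shlex


def overlay_commands(base_image, vm_count, prefix="Vm", base_format="raw"):
    """Build one `qemu-img create` argv list per overlay image.

    The helper name and the naming scheme are hypothetical; nothing is
    executed here, the commands are only assembled.
    """
    commands = []
    for index in range(vm_count):
        overlay = f"{prefix}-{index}.qcow2"
        commands.append([
            "qemu-img", "create",
            "-f", "qcow2",       # format of the new overlay image
            "-b", base_image,    # shared, read-only backing image
            "-F", base_format,   # format of the backing image
            overlay,
        ])
    return commands


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe to try.
    for argv in overlay_commands("base.raw", 3):
        print(shlex.join(argv))
```

Each overlay stores only the blocks that differ from the base image, which is why preparation is fast and why the slide notes a slight IO penalty.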
Deployment & Configuration
Deploy/configure before the cluster boots up:
1. Mount the VM images on the hypervisors
2. chroot to the mounted partition
3. Deploy/configure packages in the VM
Mount VM Images To System
Raw images:
Mount
$ losetup /dev/loop0 <image>.raw
$ kpartx -a /dev/loop0
$ mount /dev/mapper/loop0p1 <mount point>
Unmount
$ umount -l <mount point>
$ kpartx -d /dev/loop0
$ losetup -d /dev/loop0

Qcow2 images:
Mount
$ qemu-nbd -c /dev/nbd0 <image>.qcow2
$ mount /dev/nbd0p1 <mount point>
Unmount
$ umount -l <mount point>
$ qemu-nbd -d /dev/nbd0
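The mount, chroot and unmount steps follow the same pattern for both image formats, so they can be generated from the image name. This is a rough sketch under stated assumptions: the helper names `mount_commands` and `deployment_plan`, the setup script path and the choice of `/bin/sh` inside the chroot are all hypothetical, and the code only assembles the command lists from the slide without running them.

```python
def mount_commands(image, mount_point, device=None):
    """Return (mount, unmount) command lists for a raw or qcow2 image.

    Hypothetical helper: the format is guessed from the file extension and
    the first partition (p1) is assumed to hold the root file system.
    """
    if image.endswith(".qcow2"):
        device = device or "/dev/nbd0"
        attach = [["qemu-nbd", "-c", device, image]]
        partition = f"{device}p1"
        detach = [["qemu-nbd", "-d", device]]
    else:
        device = device or "/dev/loop0"
        attach = [["losetup", device, image], ["kpartx", "-a", device]]
        partition = f"/dev/mapper/{device.rsplit('/', 1)[-1]}p1"
        detach = [["kpartx", "-d", device], ["losetup", "-d", device]]
    mount = attach + [["mount", partition, mount_point]]
    unmount = [["umount", "-l", mount_point]] + detach
    return mount, unmount


def deployment_plan(image, mount_point, setup_script):
    """Mount the image, run a setup script inside a chroot, then unmount.

    `setup_script` is an assumed path inside the image.
    """
    mount, unmount = mount_commands(image, mount_point)
    configure = [["chroot", mount_point, "/bin/sh", setup_script]]
    return mount + configure + unmount


if __name__ == "__main__":
    for argv in deployment_plan("vm-0.qcow2", "/mnt/vm", "/root/setup.sh"):
        print(" ".join(argv))
```

Keeping the plan as data makes it easy to log or review the commands before handing them to something like `subprocess.run` on the hypervisor.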
Extend To Other Cloud Platforms
Job Assembler
System Structure
Web-based GUI
Code Generator
Coordinator
Dataflow graph in JSON
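The slides do not show the JSON format that the GUI hands to the code generator. As a sketch only, the example below assumes a hypothetical schema with a `nodes` list (`id`, `op`) and an `edges` list (`from`, `to`), and shows how such a document could be turned into a dependency map. The field names, the operation names and the function `load_dataflow` are all illustrative assumptions.

```python
import json

# Hypothetical dataflow document; the real Bamboo schema is not given in the slides.
EXAMPLE_GRAPH = """
{
  "nodes": [
    {"id": "load",   "op": "map"},
    {"id": "filter", "op": "map"},
    {"id": "count",  "op": "reduce"}
  ],
  "edges": [
    {"from": "load",   "to": "filter"},
    {"from": "filter", "to": "count"}
  ]
}
"""


def load_dataflow(text):
    """Parse the assumed JSON schema into (operations, dependencies)."""
    graph = json.loads(text)
    operations = {node["id"]: node["op"] for node in graph["nodes"]}
    dependencies = {node_id: [] for node_id in operations}
    for edge in graph["edges"]:
        source, target = edge["from"], edge["to"]
        if source not in operations or target not in operations:
            raise ValueError(f"edge refers to an unknown node: {edge}")
        # An edge source -> target means target depends on source.
        dependencies[target].append(source)
    return operations, dependencies


if __name__ == "__main__":
    operations, dependencies = load_dataflow(EXAMPLE_GRAPH)
    print(operations)
    print(dependencies)
```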
Web-Based GUI
• An open-source JavaScript library (MIT License)
• Convenient for creating dataflow graphs
• Supports drag-and-drop editing
Code Generator
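NetworkX: a Python package under the BSD License
>>> import networkx as nx
>>> G = nx.Graph()        # empty undirected graph
>>> G.add_node("spam")    # any hashable object can be a node
>>> G.add_edge(1, 2)      # adds nodes 1 and 2 implicitly
>>> print(G.nodes())      # nodes appear in insertion order on current releases
['spam', 1, 2]
>>> print(G.edges())
[(1, 2)]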
Dataflow = Directed Acyclic Graph (DAG)
[Figure: example DAG with nodes 1 to 5]
• Nodes: Map/Reduce operations
• Edges: Dependencies
• Topological sorting: 5 3 1 4 2
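Topological sorting gives an order in which every Map/Reduce operation runs only after the operations it depends on. The sketch below uses Kahn's algorithm from the standard library alone; it is not taken from Bamboo (whose code generator builds on NetworkX), and the job names in the example graph are hypothetical because the edges of the slide's figure are not recoverable.

```python
from collections import deque


def topological_order(dependencies):
    """Kahn's algorithm.

    `dependencies` maps every node to the list of nodes it depends on;
    each node must appear as a key, even if its list is empty.
    """
    remaining = {node: len(deps) for node, deps in dependencies.items()}
    dependents = {node: [] for node in dependencies}
    for node, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(node)

    # Start with the nodes that depend on nothing (sorted for a stable result).
    ready = deque(sorted(node for node, count in remaining.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in dependents[node]:
            remaining[child] -= 1
            if remaining[child] == 0:
                ready.append(child)

    if len(order) != len(dependencies):
        raise ValueError("dataflow graph contains a cycle")
    return order


if __name__ == "__main__":
    # Hypothetical jobs: "join" needs both "parse" and "count", and so on.
    jobs = {
        "parse": [],
        "count": [],
        "join": ["parse", "count"],
        "rank": ["join"],
        "report": ["rank"],
    }
    print(topological_order(jobs))
```

The cycle check matters for a GUI-driven tool: a user can draw a loop by accident, and the sort is the natural place to reject it.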
Job Coordinator
Messaging Queue
Driver
Web Service
Driven By Messages
[Figure: message flow. Labels: Client "NewbieTest", Client "KDD2012", X "hadoop", Q: "NewbieTest", Q: "KDD2012", Coordinator, MongoDB]
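The message-driven design can be illustrated with an in-process stand-in. The slides name neither the message broker nor the message format, so this sketch makes several assumptions: it uses Python's `queue.Queue` instead of a real broker, keeps one queue per client name as the figure labels suggest, and uses a plain dictionary where the figure shows MongoDB (assumed here to hold job records). The class and method names are hypothetical.

```python
import queue


class Coordinator:
    """Toy coordinator: one queue per client, results kept in a dict.

    Stand-ins only: a real deployment would use a message broker and a
    database rather than in-process objects.
    """

    def __init__(self):
        self.queues = {}      # client name -> queue of submitted jobs
        self.job_store = {}   # client name -> list of job records

    def submit(self, client, job):
        """Enqueue a job description on the client's own queue."""
        self.queues.setdefault(client, queue.Queue()).put(job)

    def run_pending(self, run_job):
        """Drain every queue, run each job, and record the result."""
        completed = 0
        for client, pending in self.queues.items():
            while True:
                try:
                    job = pending.get_nowait()
                except queue.Empty:
                    break
                record = {"job": job, "result": run_job(job)}
                self.job_store.setdefault(client, []).append(record)
                completed += 1
        return completed


if __name__ == "__main__":
    coordinator = Coordinator()
    coordinator.submit("NewbieTest", "wordcount")            # hypothetical job names
    coordinator.submit("KDD2012", "feature-extraction")
    coordinator.run_pending(lambda job: f"finished {job}")
    print(coordinator.job_store)
```

Separate queues per client keep one user's backlog from blocking another's, which fits the goal of serving more users with limited resources.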
Run A Hadoop Job
Track 1: Predict which users one might follow on Tencent Weibo (one of the largest micro-blogging websites in China)
Dataset: user profiles, demographics, follow history, social graph, and item categories of 200 million registered users
Feature Extraction
User actions
User     Target   Action  Re-tweet  Comment
1000006  1675399  0       53        12
1000006  1760322  2       21        0
…

User     Target   Action  Re-tweet  Comment
1000006  1675399  0%      1%        3%
1000006  1760322  10%     100%      0%
…
Hadoop As A Service
REST API
Authentication
Bamboo
Individuals
Applications
Other Services