Apache Hadoop on the Open Cloud David Dobbins Nirmal Ranganathan.

Apache Hadoop on the Open Cloud

David Dobbins

Nirmal Ranganathan

Who is using Apache Hadoop

• Traditionally: developers

• Increasingly: business users and data scientists

• Why does this matter?

Configuring and managing a Hadoop cluster is hard

Resources / Expertise

Multiple Performance and Design Variables

The cloud solves some of these problems

Advantages of using the cloud

Fast

Easy

Flexible

You still require expertise

Let's check out another option

Hadoop in the Cloud: Use Cases

Development / POC Clusters

Dynamic Clusters

Growth Clusters

Your data is already in the Cloud

Demo

Run an actual job

Swift Filesystem for Hadoop: HADOOP-8545

• New filesystem URL scheme: swift://

• Read from and write to local and remote Swift clusters

• Keep long-lived data in Swift; upload it while the Hadoop cluster is offline
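A minimal sketch of how a swift:// URL might be broken into its parts, assuming the host portion has the form container.service, which is the layout documented for the Hadoop OpenStack connector that grew out of HADOOP-8545. The helper names, the container `logs`, and the service `rackspace` are hypothetical and only for illustration.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class SwiftLocation:
    container: str
    service: str
    object_path: str


def parse_swift_url(url: str) -> SwiftLocation:
    """Split swift://container.service/path into its parts (assumed layout)."""
    parsed = urlparse(url)
    if parsed.scheme != "swift":
        raise ValueError(f"not a swift:// URL: {url}")
    # Assumption: the host is "<container>.<service>"; split on the first dot only.
    container, sep, service = parsed.netloc.partition(".")
    if not sep or not container or not service:
        raise ValueError(f"expected swift://container.service/...: {url}")
    # Swift object names are flat strings, so only the leading slash(es) are dropped.
    return SwiftLocation(container, service, parsed.path.lstrip("/"))


loc = parse_swift_url("swift://logs.rackspace/2013/06/access.log")
print(loc.container, loc.service, loc.object_path)
```

Splitting on the first dot only leaves room for service names that themselves contain dots.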

The challenges of running MapReduce jobs against Swift:

• Identity management

• Block size

• Object store vs file paths

• Direct API into Swift from HDFS
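Swift stores whole objects rather than HDFS blocks, so a connector has to report some block size for Hadoop to plan input splits. The sketch below mirrors the split rule in Hadoop's FileInputFormat: split size = max(minSize, min(maxSize, blockSize)), and the last split may run up to 10% over (SPLIT_SLOP = 1.1). The 32 MB reported block size used here is an assumption chosen for illustration, not a documented Swift default.

```python
SPLIT_SLOP = 1.1  # slop factor used by Hadoop's FileInputFormat
LONG_MAX = 2**63 - 1


def compute_split_size(block_size: int, min_size: int = 1, max_size: int = LONG_MAX) -> int:
    """Clamp the (reported) block size between the configured min and max."""
    return max(min_size, min(max_size, block_size))


def plan_splits(length: int, block_size: int,
                min_size: int = 1, max_size: int = LONG_MAX) -> list[tuple[int, int]]:
    """Return (offset, length) pairs for one object, FileInputFormat-style."""
    split_size = compute_split_size(block_size, min_size, max_size)
    splits = []
    remaining = length
    # Cut full-size splits while more than 1.1 splits' worth of data remains.
    while remaining / split_size > SPLIT_SLOP:
        splits.append((length - remaining, split_size))
        remaining -= split_size
    # Whatever is left (at most 1.1 x split_size) becomes the final split.
    if remaining:
        splits.append((length - remaining, remaining))
    return splits


MB = 1024 * 1024
# A 100 MB object with an assumed 32 MB reported block size.
for offset, size in plan_splits(100 * MB, 32 * MB):
    print(offset // MB, size // MB)
```

For the same object, a smaller reported block size produces more splits and therefore more map tasks, which is one reason block size appears on the list of Swift challenges.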

MapReduce to Swift (via “HDFS”)

[Diagram: Application X → MapReduce → HDFS, compared with Application X → MapReduce → HDFS Proxy → Swift]
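The proxy idea in the diagram, where Application X and MapReduce talk to something that looks like a filesystem while the bytes live in Swift, can be sketched as an adapter. In this sketch `ObjectStore` is a hypothetical in-memory stand-in for one Swift container (flat object names, no real directories), and `ObjectStoreFileSystem` emulates directory listing with a prefix and delimiter and emulates rename as copy-then-delete, which is where the object-store-versus-file-path mismatch shows up. The class names and the word-count "application" are illustrative, not the connector's actual API.

```python
from collections import Counter


class ObjectStore:
    """Hypothetical in-memory stand-in for one Swift container: flat names -> bytes."""

    def __init__(self) -> None:
        self.objects: dict[str, bytes] = {}

    def put(self, name: str, data: bytes) -> None:
        self.objects[name] = data

    def get(self, name: str) -> bytes:
        return self.objects[name]

    def delete(self, name: str) -> None:
        del self.objects[name]

    def list(self, prefix: str = "") -> list[str]:
        return sorted(n for n in self.objects if n.startswith(prefix))


class ObjectStoreFileSystem:
    """Filesystem-style view over the flat store (the "HDFS proxy" role)."""

    def __init__(self, store: ObjectStore) -> None:
        self.store = store

    def list_dir(self, path: str) -> list[str]:
        # Emulate one directory level: group names under the prefix by the next "/".
        prefix = path.rstrip("/") + "/" if path else ""
        entries = set()
        for name in self.store.list(prefix):
            head, sep, _ = name[len(prefix):].partition("/")
            entries.add(prefix + head + sep)  # a trailing "/" marks a pseudo-directory
        return sorted(entries)

    def read_text(self, path: str) -> str:
        return self.store.get(path).decode("utf-8")

    def rename_dir(self, src: str, dst: str) -> int:
        # A flat namespace has no directory rename: copy each object, then delete it.
        src_prefix, dst_prefix = src.rstrip("/") + "/", dst.rstrip("/") + "/"
        moved = 0
        for name in self.store.list(src_prefix):
            self.store.put(dst_prefix + name[len(src_prefix):], self.store.get(name))
            self.store.delete(name)
            moved += 1
        return moved


def word_count(fs: ObjectStoreFileSystem, input_dir: str) -> Counter:
    """Toy stand-in for "Application X": count words across files in one directory."""
    counts: Counter = Counter()
    for entry in fs.list_dir(input_dir):
        if not entry.endswith("/"):  # skip pseudo-directories
            counts.update(fs.read_text(entry).split())
    return counts


store = ObjectStore()
store.put("input/part-0.txt", b"swift hadoop swift")
store.put("input/part-1.txt", b"hadoop openstack")
store.put("input/archive/old.txt", b"ignored")
fs = ObjectStoreFileSystem(store)
print(fs.list_dir("input"))
print(word_count(fs, "input"))
```

Renaming a pseudo-directory touches every object under it, whereas HDFS renames a directory with a single metadata operation on the NameNode; jobs that commit output by renaming a temporary directory feel this difference.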

Hadoop + OpenStack

Cloud Big Data Platform

• Hortonworks Data Platform

• HDP 1.1

• HDP 1.3

• Pig, Hive, HCatalog

• Coming soon: HDP 2.0

Cloud Big Data Platform

• Secure by default

• Comes pre-optimized

• Web UI, CLI, REST API

Built on OpenStack

Why an Open Platform Matters

[Diagram: Sandbox on Rackspace Cloud; Sandbox VM; RAX; Resell]

Cool stuff

@caffiend

@rnirmal

http://www.rackspace.com/big-data