Riding the Elephant - Hadoop 2.0

Post on 06-May-2015

description

Hadoop 2.0, and in particular YARN, has opened up a lot of potential applications beyond MapReduce. This presentation explains some of the ways this happened, and what you can now do that you couldn't before. It also introduces some new tools (Spark) and infrastructure pieces (Mesos) to achieve even more efficient cluster use.

Transcript of Riding the Elephant - Hadoop 2.0

Simon Elliston Ball Head of Big Data - Red Gate Ventures

@sireb

Riding the Elephant: Hadoop 2.0

http://bit.ly/RidingElephants

Append only distributed file-system

In the beginning…

Map Reduce

Java.

JVM Based (scala, groovy, jython, clojure)

More languages

Streaming (python, whatever)

HDP for Windows and .NET SDK
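To make the streaming idea concrete, here is a minimal word-count sketch in Python. It assumes the usual streaming convention of one tab-separated key/value record per line, with records reaching the reducer sorted by key. The function names (`mapper`, `reducer`, `run_locally`) are illustrative only, and `run_locally` just imitates the shuffle/sort step in a single process rather than calling Hadoop.

```python
from itertools import groupby


def mapper(lines):
    """Emit one tab-separated (word, 1) record per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reducer(sorted_records):
    """Sum the counts for each word; the input must already be sorted by key."""
    pairs = (record.split("\t", 1) for record in sorted_records)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


def run_locally(lines):
    """Imitate map -> shuffle/sort -> reduce in one process (no Hadoop involved)."""
    return list(reducer(sorted(mapper(lines))))
```

Calling `run_locally(["the elephant", "The yarn the"])` returns the three records `elephant 1`, `the 3` and `yarn 1` (tab-separated). In a real streaming job the mapper and reducer would be separate scripts reading standard input and writing standard output.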

Abstraction

Photo: https://www.flickr.com/photos/puroticorico/

Hive, Pig

Cascading

Scalding

SQL on Hadoop

Learning to share the toys

HBase

Solr on Hadoop

Sharing HDFS…

Map Reduce v1

[Diagram: a Job is submitted to the JobTracker on the Head Node. Each of three Data Nodes runs a TaskTracker with map slots (m slot 1 … m slot n) and reduce slots (r slot 1 … r slot n), and Tasks (Map / Reduce) run in those slots.]
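The fixed map and reduce slots in the MapReduce v1 picture can be sketched as a small model. This is a toy under stated assumptions, not Hadoop's scheduler: `TaskTracker` and `schedule` are hypothetical names, and the only rule modelled is that a task may start only in a free slot of its own kind.

```python
from dataclasses import dataclass


@dataclass
class TaskTracker:
    """One worker node with a fixed number of map and reduce slots (hypothetical model)."""
    map_slots: int
    reduce_slots: int
    running_maps: int = 0
    running_reduces: int = 0

    def try_start(self, kind: str) -> bool:
        """Start a task only if a slot of the matching kind is free."""
        if kind == "map" and self.running_maps < self.map_slots:
            self.running_maps += 1
            return True
        if kind == "reduce" and self.running_reduces < self.reduce_slots:
            self.running_reduces += 1
            return True
        return False


def schedule(trackers, pending):
    """Offer each pending task kind to the trackers in order; return the tasks left waiting."""
    waiting = []
    for kind in pending:
        if not any(tracker.try_start(kind) for tracker in trackers):
            waiting.append(kind)
    return waiting
```

With three trackers of two map and two reduce slots each, scheduling eight map tasks leaves two of them waiting while all six reduce slots sit idle. That stranded capacity is the kind of waste the slot model invites.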

Map Reduce v1

[Diagram: JobTracker on the Head Node and TaskTrackers with map and reduce slots on three Data Nodes, with MR Status labels added.]

Typical Hadoop 1.x setup

HBase

Production

Adhoc

YARN architecture

[Diagram: a YARN Client talks to the ResourceManager. Each of three Data Nodes runs a Node Manager; the nodes host Containers and two Application Masters, and one node has a Free Slot.]
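The YARN picture can be sketched in the same toy style: instead of typed slots, each node offers a pool of memory and any application can take a container from it. The classes, the "most free memory" placement rule and the megabyte figures are all assumptions for illustration. Real YARN has the Application Master negotiate containers with the ResourceManager, which is collapsed into a single `submit` call here.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """One worker node offering a pool of memory for containers (hypothetical model)."""
    name: str
    total_mb: int
    containers: list = field(default_factory=list)  # (app, role, mb) tuples

    @property
    def free_mb(self) -> int:
        return self.total_mb - sum(mb for _, _, mb in self.containers)


class ResourceManager:
    """Cluster-wide allocator: hands out containers without caring what runs in them."""

    def __init__(self, nodes):
        self.nodes = nodes

    def allocate(self, app, role, mb):
        """Place a container on the node with the most free memory; None if nothing fits."""
        node = max(self.nodes, key=lambda n: n.free_mb)
        if node.free_mb < mb:
            return None
        node.containers.append((app, role, mb))
        return node.name

    def submit(self, app, am_mb, task_mb, tasks):
        """Start the Application Master first, then request its task containers."""
        placements = [self.allocate(app, "application-master", am_mb)]
        placements += [self.allocate(app, "task", task_mb) for _ in range(tasks)]
        return placements
```

On two 4096 MB nodes, `submit("job", 1024, 2048, 3)` places the four containers on `n1`, `n2`, `n1`, `n2`. A further 2048 MB request then returns `None`, while a 1024 MB request still fits on `n1`, whatever kind of application asks for it.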

Removing the choke point

Advantages

60%-150% better usage

Long running applications

Not quite…

Operating system for Big Data?

Security

…but a framework for Big Data Apps

Data Access abstraction

Storm on YARN

A whole batch of new applications

HOYA

Tez (Stinger)

MapReduce 2

Giraph

<Insert your application here>

Batch applications

Spinning YARNs with Spring

Services

Direct to YARN APIs

Spring Data Hadoop abstraction

Streaming

Why?

Machine Learning

Graphs

Services

Distributed Shell - Anything.

Spark

A higher abstraction

Hadoop based?

… but can run on YARN

In Memory

Distributed

Fault tolerant

Real-time

✓✓✓

✓❌

RDDs
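Spark's resilient distributed datasets combine the in-memory and fault-tolerant points above: each dataset remembers the transformations that produced it, so a lost in-memory partition can be recomputed from the source. The class below is a single-process toy, not Spark's API. `ToyRDD`, `lose_partition` and the list-of-lists partition layout are assumptions made for illustration.

```python
class ToyRDD:
    """A toy, single-process stand-in for an RDD: partitions plus a recorded lineage."""

    def __init__(self, source_partitions, lineage=()):
        self._source = source_partitions   # the durable input, e.g. blocks on HDFS
        self._lineage = tuple(lineage)     # ordered (kind, function) transformations
        self._cache = {}                   # partition index -> computed in-memory data

    def map(self, fn):
        """Record a map step; nothing is computed yet."""
        return ToyRDD(self._source, self._lineage + (("map", fn),))

    def filter(self, fn):
        """Record a filter step; nothing is computed yet."""
        return ToyRDD(self._source, self._lineage + (("filter", fn),))

    def _compute(self, index):
        """Compute one partition by replaying the lineage over its source data."""
        if index not in self._cache:
            data = list(self._source[index])
            for kind, fn in self._lineage:
                if kind == "map":
                    data = [fn(x) for x in data]
                else:
                    data = [x for x in data if fn(x)]
            self._cache[index] = data
        return self._cache[index]

    def lose_partition(self, index):
        """Simulate a failed worker by dropping one partition's in-memory copy."""
        self._cache.pop(index, None)

    def collect(self):
        """Action: materialise every partition and return the combined result."""
        return [x for i in range(len(self._source)) for x in self._compute(i)]
```

For example, `ToyRDD([[1, 2, 3], [4, 5, 6]]).map(lambda x: x * 10).filter(lambda x: x > 20).collect()` returns `[30, 40, 50, 60]`. Calling `lose_partition(1)` and collecting again gives the same result, because the missing partition is rebuilt from its lineage.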

Mesos

Wider sharing

Hadoop

Spark

Aurora

Mesos Framework

Hardware

YARN

MapReduce

HBase

etc

HDFS

Hadoop is more than MapReduce

The new world

YARN opens up new paradigms

Infrastructure maturing: better sharing

Hadoop and beyond!

Thank you

Questions?

Simon Elliston Ball Head of Big Data - Red Gate Ventures

@sireb

simon@simonellistonball.com

http://bit.ly/RidingElephants