Introduction to the Hadoop Ecosystem (FrOSCon Edition)


Page 1: Introduction to the Hadoop Ecosystem (FrOSCon Edition)

Sankt Augustin, 24-25.08.2013

Introduction to the Hadoop Ecosystem

uweseiler

Page 2: About me

Big Data Nerd

Travelpirate

Photography Enthusiast

Hadoop Trainer

MongoDB Author

Page 3: About us

is a bunch of…

Big Data Nerds

Agile Ninjas

Continuous Delivery Gurus

Enterprise Java Specialists

Performance Geeks

Join us!

Page 4: Agenda

• What is Big Data & Hadoop?

• Core Hadoop

• The Hadoop Ecosystem

• Use Cases

• What’s next? Hadoop 2.0!

Page 5: Why Big Data?

The volume of datasets is constantly growing…

Page 6: Volume

2008: 200 PB a day

2009: 2.5 PB user data, 15 TB a day

2009: 6.5 PB user data, 50 TB a day

2011: ~200 PB data

Page 7: Why Big Data?

The velocity of data generation is getting faster and faster…

Page 8: Velocity

Page 9: Why Big Data?

The variety of data is increasing…

Page 10: Variety

Structured data

Semi-structured data

Unstructured data

Page 11: The 3 V’s of Big Data

Variety, Volume, Velocity

Page 12: My favorite definition

Page 13: Why Hadoop?

Traditional data stores are expensive to scale and, by design, difficult to distribute.

Scale out is the way to go!

Page 14: How to scale data?

Diagram: several workers each read (r) a part of the “Data” in parallel, and each writes (w) its part of the “Result”.
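The scale-out idea in this diagram can be sketched on a single machine. The data is split into chunks, several workers compute partial results in parallel, and the partial results are then combined. This is a minimal illustration that assumes threads as stand-ins for worker machines and a simple sum as the computation. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Scatter/gather sketch: split the data into chunks, let each worker compute
// a partial result in parallel, then combine the partial results.
// Workers are threads here; in Hadoop they would be processes on other machines.
class ScatterGather {

    static long parallelSum(List<Integer> data, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> partials = new ArrayList<>();
            int chunk = (data.size() + workers - 1) / workers; // ceiling division
            for (int start = 0; start < data.size(); start += chunk) {
                List<Integer> slice = data.subList(start, Math.min(start + chunk, data.size()));
                // Scatter: each worker reads only its own slice and returns a partial sum.
                partials.add(pool.submit(() -> {
                    long sum = 0;
                    for (int x : slice) sum += x;
                    return sum;
                }));
            }
            // Gather: combine the partial results into the final result.
            long total = 0;
            for (Future<Long> partial : partials) total += partial.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        List<Integer> data = new ArrayList<>();
        for (int i = 1; i <= 1000; i++) data.add(i);
        System.out.println(parallelSum(data, 4)); // 500500
    }
}
```

On real machines, moving the data to the workers becomes the expensive part. That cost is why Hadoop ships the computation to the nodes that already hold the data.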

Page 15: But…

Parallel processing is complicated!

Page 16: But…

Data storage is not trivial!

Page 17: What is Hadoop?

Distributed Storage and Computation Framework

Page 18: What is Hadoop?

«Big Data» != Hadoop

Page 19: What is Hadoop?

Hadoop != Database

Page 20: What is Hadoop?

Page 21: What is Hadoop?

“Swiss army knife of the 21st century”

http://www.guardian.co.uk/technology/2011/mar/25/media-guardian-innovation-awards-apache-hadoop

Page 22: The Hadoop App Store

HDFS, MapRed, HCat, Pig, Hive, HBase, Ambari, Avro, Cassandra, Chukwa, Intel, Sync, Flume, Hana, Hypertable, Impala, Mahout, Nutch, Oozie, Sqoop, Scribe, Tez, Vertica, Whirr, ZooKeeper, Hortonworks, Cloudera, MapR, EMC, IBM, Talend, TeraData, Pivotal, Informatica, Microsoft, Pentaho, Jasper, Kognitio, Tableau, Splunk, Platfora, Rack, Karma, Actuate, MicroStrategy

Page 23

Functionality: less → more

Apache Hadoop
• HDFS
• MapReduce
• Hadoop Ecosystem
• Hadoop YARN

Hadoop Distributions: Apache Hadoop +
• Test & Packaging
• Installation
• Monitoring
• Business Support

Big Data Suites: Hadoop Distributions +
• Integrated Environment
• Visualization
• (Near-)Realtime analysis
• Modeling
• ETL & Connectors

The Hadoop App Store

Page 24: The essentials …

Core Hadoop

Page 25: Data Storage

OK, first things first!

I want to store all of my <<Big Data>>

Page 26: Data Storage

Page 27: Hadoop Distributed File System

• Distributed file system for redundant storage

• Designed to reliably store data on commodity hardware

• Built to expect hardware failures

Page 28: Hadoop Distributed File System

Intended for

• large files

• batch inserts
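A little block arithmetic shows why HDFS favours large files. The sketch below assumes a 64 MB block size (the Hadoop 1.x default) and a replication factor of 3. Both values are configurable, and the class name is hypothetical.

```java
// Back-of-the-envelope HDFS arithmetic. Assumes a 64 MB block size (the
// Hadoop 1.x default) and a replication factor of 3; both are configurable.
class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    // A file occupies ceil(size / blockSize) blocks; the last one may be partial.
    static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // A partial last block only uses its actual size on disk,
    // so raw usage is simply file size times replication factor.
    static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long blockSize = 64 * MB;
        // One large 1000 MB file: 16 blocks (15 full, 1 partial) for the NameNode to track.
        System.out.println(blockCount(1000 * MB, blockSize));     // 16
        // The same 1000 MB as 1000 small 1 MB files: 1000 blocks (and 1000 file entries).
        System.out.println(1000 * blockCount(1 * MB, blockSize)); // 1000
        // Raw disk used by the large file with 3 replicas, in MB.
        System.out.println(rawBytes(1000 * MB, 3) / MB);          // 3000
    }
}
```

The NameNode keeps metadata for every file and block in memory. Storing the same data as many small files therefore costs far more NameNode memory than storing it as a few large ones.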

Page 29: HDFS Architecture

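• NameNode (master): keeps the block map and a journal log

• Secondary NameNode (helper): performs periodical merges of the journal log

• DataNodes (slaves) in Rack 1 and Rack 2: store the blocks (#1, #2) of each file, with block #1 replicated on three DataNodes

• Client: reads and writes the File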
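Replica placement across racks can be sketched as follows. This is a simplified, deterministic version of the default policy described in the HDFS architecture documentation: the first replica goes on the writer's node, the second on a node in a different rack, and the third on another node in that same remote rack. The node and rack names are hypothetical. Real HDFS picks nodes randomly and also weighs load and free space.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified, deterministic sketch of rack-aware placement for 3 replicas:
// 1st replica on the writer's node, 2nd on a node in a different rack,
// 3rd on another node in the same rack as the 2nd.
class ReplicaPlacement {

    static List<String> place(Map<String, String> nodeToRack, String writerNode) {
        List<String> replicas = new ArrayList<>();
        replicas.add(writerNode);
        String localRack = nodeToRack.get(writerNode);

        // 2nd replica: first node found on a different rack.
        String second = null;
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (!e.getValue().equals(localRack)) {
                second = e.getKey();
                break;
            }
        }
        if (second == null) return replicas; // single-rack cluster: not handled in this sketch
        replicas.add(second);

        // 3rd replica: another node on the same remote rack as the 2nd.
        String remoteRack = nodeToRack.get(second);
        for (Map.Entry<String, String> e : nodeToRack.entrySet()) {
            if (e.getValue().equals(remoteRack) && !e.getKey().equals(second)) {
                replicas.add(e.getKey());
                break;
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        Map<String, String> topology = new LinkedHashMap<>();
        topology.put("dn1", "rack1");
        topology.put("dn2", "rack1");
        topology.put("dn3", "rack2");
        topology.put("dn4", "rack2");
        System.out.println(place(topology, "dn1")); // [dn1, dn3, dn4]
    }
}
```

Losing a whole rack never loses every replica of a block. Meanwhile, two of the three replicas share a rack, which limits cross-rack write traffic.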

Page 30: Data Processing

Data stored, check!

Now I want to create insights from my data!

Page 31: Data Processing

Page 32: MapReduce

• Programming model for distributed computations at a massive scale

• Execution framework for organizing and performing such computations

• Data locality is king

Page 33: Typical large-data problem

• Iterate over a large number of records

• Extract something of interest from each

• Shuffle and sort intermediate results

• Aggregate intermediate results

• Generate final output

Map: the first two steps; Reduce: the last two steps
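The five steps above can be traced in a small in-memory program. It is only a sketch. The log format (`<url> <status>`) and the task of counting requests per HTTP status code are assumptions chosen for illustration, and everything runs in one JVM rather than on a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// In-memory sketch of the five steps on hypothetical web-server log lines of the
// form "<url> <status>", counting requests per HTTP status code.
class LargeDataSteps {

    static Map<String, Integer> countStatuses(List<String> logLines) {
        // Steps 1-2 (map): iterate over the records and extract the status code,
        // emitting an intermediate (status, 1) pair per record.
        List<Map.Entry<String, Integer>> intermediate = new ArrayList<>();
        for (String line : logLines) {
            String status = line.split(" ")[1];
            intermediate.add(Map.entry(status, 1));
        }
        // Step 3: shuffle and sort, i.e. group the values by key (TreeMap keeps keys sorted).
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : intermediate) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        // Steps 4-5 (reduce): aggregate each group and generate the final output.
        Map<String, Integer> output = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : grouped.entrySet()) {
            int sum = 0;
            for (int v : group.getValue()) sum += v;
            output.put(group.getKey(), sum);
        }
        return output;
    }

    public static void main(String[] args) {
        List<String> log = List.of("/index.html 200", "/missing 404", "/about.html 200");
        System.out.println(countStatuses(log)); // {200=2, 404=1}
    }
}
```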

Page 34: MapReduce Flow

Input: key/value pairs

Map (4 mappers): (a, 1) (b, 2) | (c, 3) (c, 6) | (a, 3) (c, 2) | (b, 7) (c, 8)

Combine: (a, 1) (b, 2) | (c, 9) | (a, 3) (c, 2) | (b, 7) (c, 8)

Partition (one per mapper)

Shuffle and Sort: a → [1, 3], b → [2, 7], c → [2, 8, 9]

Reduce (3 reducers): (a, 4) (b, 9) (c, 19)
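The flow on this slide can be replayed in plain Java using the same intermediate pairs. This sketch assumes a summing combiner and reducer. It partitions keys with the same formula as Hadoop's default `HashPartitioner`, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`. The class name is hypothetical, and in-memory maps stand in for the real spill files and network transfer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Replays the slide: four mappers, a per-mapper combiner,
// hash partitioning onto three reducers, shuffle/sort, and a summing reducer.
class MapReduceFlow {

    // Same formula as Hadoop's default HashPartitioner.
    static int partition(String key, int numReducers) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReducers;
    }

    // Combiner: pre-aggregates one mapper's output locally (here: sum per key).
    static Map<String, Integer> combine(List<Map.Entry<String, Integer>> mapOutput) {
        Map<String, Integer> combined = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapOutput) {
            combined.merge(kv.getKey(), kv.getValue(), Integer::sum);
        }
        return combined;
    }

    static Map<String, Integer> run(List<List<Map.Entry<String, Integer>>> mapOutputs, int numReducers) {
        // One sorted "inbox" per reducer: key -> list of values (shuffle and sort).
        List<TreeMap<String, List<Integer>>> inboxes = new ArrayList<>();
        for (int r = 0; r < numReducers; r++) inboxes.add(new TreeMap<>());

        for (List<Map.Entry<String, Integer>> mapperOutput : mapOutputs) {
            for (Map.Entry<String, Integer> kv : combine(mapperOutput).entrySet()) {
                inboxes.get(partition(kv.getKey(), numReducers))
                       .computeIfAbsent(kv.getKey(), k -> new ArrayList<>())
                       .add(kv.getValue());
            }
        }
        // Reduce: sum all values per key.
        Map<String, Integer> output = new TreeMap<>();
        for (TreeMap<String, List<Integer>> inbox : inboxes) {
            inbox.forEach((key, values) ->
                output.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        }
        return output;
    }

    public static void main(String[] args) {
        List<List<Map.Entry<String, Integer>>> mapOutputs = List.of(
            List.of(Map.entry("a", 1), Map.entry("b", 2)),
            List.of(Map.entry("c", 3), Map.entry("c", 6)),
            List.of(Map.entry("a", 3), Map.entry("c", 2)),
            List.of(Map.entry("b", 7), Map.entry("c", 8)));
        System.out.println(run(mapOutputs, 3)); // {a=4, b=9, c=19}
    }
}
```

With three reducers, the keys a, b and c hash to reducers 1, 2 and 0 respectively, so each reducer receives exactly one key, as on the slide. The combiner shrinks the second mapper's output from two pairs to one before anything crosses the network.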

Page 35: Combined Hadoop Architecture

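• Client: stores a File in HDFS via the NameNode and submits a Job to the JobTracker

• NameNode (master), with the Secondary NameNode (helper)

• JobTracker (master)

• Slaves (three shown): each runs a TaskTracker and a DataNode; the DataNode stores Blocks and the TaskTracker runs Tasks on the same machine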
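Data locality can be sketched as a scheduling rule. Each map task runs on a node that already stores a replica of its input block. Only when no such node has a free slot does the task fall back to another node, which must then read the block over the network. This is a simplified model with hypothetical names. The real JobTracker hands out tasks when TaskTrackers report free slots in their heartbeats, and it also distinguishes rack-local from off-rack placement.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Sketch of locality-aware task assignment: each map task processes one block.
// Prefer a node with a free slot that stores a replica of that block; otherwise
// fall back to any node with a free slot (remote read over the network).
class LocalityScheduler {

    // Note: decrements the free slot counts in `freeSlots` as tasks are assigned.
    static Map<String, String> assign(Map<String, Set<String>> blockReplicas,
                                      Map<String, Integer> freeSlots) {
        Map<String, String> assignment = new LinkedHashMap<>();
        for (Map.Entry<String, Set<String>> block : blockReplicas.entrySet()) {
            String chosen = null;
            for (String node : freeSlots.keySet()) {           // 1st choice: data-local
                if (freeSlots.get(node) > 0 && block.getValue().contains(node)) {
                    chosen = node;
                    break;
                }
            }
            if (chosen == null) {                               // fallback: any free slot
                for (String node : freeSlots.keySet()) {
                    if (freeSlots.get(node) > 0) {
                        chosen = node;
                        break;
                    }
                }
            }
            if (chosen == null) break;                          // cluster full: remaining tasks wait
            freeSlots.merge(chosen, -1, Integer::sum);
            assignment.put(block.getKey(), chosen);
        }
        return assignment;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> replicas = new LinkedHashMap<>();
        replicas.put("blk_1", Set.of("node1", "node2"));
        replicas.put("blk_2", Set.of("node2", "node3"));
        replicas.put("blk_3", Set.of("node1", "node3"));
        Map<String, Integer> slots = new LinkedHashMap<>();
        slots.put("node1", 1);
        slots.put("node2", 1);
        slots.put("node3", 1);
        System.out.println(assign(replicas, slots)); // {blk_1=node1, blk_2=node2, blk_3=node3}
    }
}
```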

Page 36: Word Count Mapper in Java

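import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits (word, 1) for every whitespace-separated token of an input line
// (classic org.apache.hadoop.mapred API).
public class WordCountMapper extends MapReduceBase
    implements Mapper<LongWritable, Text, Text, IntWritable>
{
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text(); // reused to avoid allocating a Text per token

    public void map(LongWritable key, Text value,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        String line = value.toString(); // key = byte offset of the line, unused here
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens())
        {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}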

Page 37: Word Count Reducer in Java

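import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums all counts emitted for a word (classic org.apache.hadoop.mapred API).
public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterator<IntWritable> values,
        OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException
    {
        int sum = 0;
        while (values.hasNext())
        {
            sum += values.next().get(); // typed iterator: no cast needed
        }
        output.collect(key, new IntWritable(sum));
    }
}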

Page 38: Scripting for Hadoop

Java for MapReduce? I dunno, dude…

I’m more of a scripting guy…

Page 39: Scripting for Hadoop

Page 40: Apache Pig

• High-level data flow language

• Made of two components:
  – Data processing language Pig Latin
  – Compiler to translate Pig Latin to MapReduce

Page 41: Pig in the Hadoop ecosystem

HDFS: Hadoop Distributed File System

MapReduce: Distributed Programming Framework

HCatalog: Metadata Management

Pig: Scripting

Page 42: Pig Latin

-- Top 10 URLs visited by users aged 18 to 50 (comma-separated input files)
users = LOAD 'users.txt' USING PigStorage(',') AS (name:chararray, age:int);
pages = LOAD 'pages.txt' USING PigStorage(',') AS (user:chararray, url:chararray);
filteredUsers = FILTER users BY age >= 18 AND age <= 50;
joinResult = JOIN filteredUsers BY name, pages BY user;
grouped = GROUP joinResult BY url;
summed = FOREACH grouped GENERATE group, COUNT(joinResult) AS clicks;
sorted = ORDER summed BY clicks DESC;
top10 = LIMIT sorted 10;
STORE top10 INTO 'top10sites';
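For comparison, here is the same data flow written against in-memory Java collections. It is an illustrative sketch, not what Pig generates. The sample rows are made up, the input arrives already split into `{name, age}` and `{user, url}` arrays, and ties in the click count are broken by URL only to make the output deterministic.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Java-streams rendering of the Pig Latin data flow, run on small hypothetical
// in-memory rows instead of HDFS files: users are {name, age}, pages are {user, url}.
class TopSites {

    static List<Map.Entry<String, Long>> top(List<String[]> users, List<String[]> pages, int n) {
        // FILTER users BY age >= 18 AND age <= 50 (counted per name, so the join
        // below keeps inner-join semantics even if a name occurs more than once)
        Map<String, Long> filteredUsers = users.stream()
            .filter(u -> {
                int age = Integer.parseInt(u[1]);
                return age >= 18 && age <= 50;
            })
            .collect(Collectors.groupingBy(u -> u[0], Collectors.counting()));

        // JOIN filteredUsers BY name, pages BY user; GROUP BY url; COUNT(...) AS clicks
        Map<String, Long> clicks = new HashMap<>();
        for (String[] page : pages) {
            long matches = filteredUsers.getOrDefault(page[0], 0L);
            if (matches > 0) clicks.merge(page[1], matches, Long::sum);
        }

        // ORDER BY clicks DESC (ties broken by url for determinism); LIMIT n
        return clicks.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String[]> users = List.of(
            new String[] {"alice", "25"}, new String[] {"bob", "17"}, new String[] {"carol", "40"});
        List<String[]> pages = List.of(
            new String[] {"alice", "a.com"}, new String[] {"bob", "a.com"}, new String[] {"carol", "b.com"},
            new String[] {"alice", "b.com"}, new String[] {"carol", "c.com"});
        System.out.println(top(users, pages, 2)); // [b.com=2, a.com=1]
    }
}
```

Pig splits the same steps across several MapReduce jobs and chooses the join and sort strategies itself. The Java version keeps every intermediate result in memory, which only works for small data.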

Page 43: Pig Execution Plan

Page 44: Try that with Java…

Page 45: SQL for Hadoop

OK, Pig seems quite useful…

But I’m more of a SQL person…

Page 46: SQL for Hadoop

Page 47: Apache Hive

• Data Warehousing Layer on top of Hadoop

• Allows analysis and queries using a SQL-like language

Page 48: Hive in the Hadoop ecosystem

HDFS: Hadoop Distributed File System

MapReduce: Distributed Programming Framework

HCatalog: Metadata Management

Pig: Scripting

Hive: Query

Page 49: Hive Architecture

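• Entry points: the Hive Shell and the Hive Server; Thrift, JDBC and ODBC applications connect to the Hive Server through the Hive Thrift Driver, Hive JDBC Driver and Hive ODBC Driver

• Hive Engine: uses the Meta-store for table metadata and runs queries as MapReduce jobs over data in HDFS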

Page 50: Hive Example

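-- Input files are comma-separated
CREATE TABLE users (name STRING, age INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

CREATE TABLE pages (user STRING, url STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

LOAD DATA INPATH '/user/sandbox/users.txt' INTO TABLE users;

LOAD DATA INPATH '/user/sandbox/pages.txt' INTO TABLE pages;

-- ORDER BY guarantees a total order; SORT BY would only sort within each reducer
SELECT pages.url, count(*) AS clicks
FROM users JOIN pages ON (users.name = pages.user)
WHERE users.age >= 18 AND users.age <= 50
GROUP BY pages.url
ORDER BY clicks DESC
LIMIT 10;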

Page 51: But there’s still more…

More components of the Hadoop Ecosystem

Page 52

HDFS: Data storage

MapReduce: Data processing

HCatalog: Metadata Management

Pig: Scripting

Hive: SQL-like queries

HBase: NoSQL Database

Mahout: Machine Learning

ZooKeeper: Cluster Coordination

Sqoop: Import & Export of relational data

Ambari: Cluster installation & management

Oozie: Workflow automation

Flume: Import & Export of data flows

Page 53: Bringing it all together…

Use Cases

Page 54

Layers: Data Sources, Data Systems, Applications

• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP, …)

• Data Systems: Traditional Systems (RDBMS, EDW, MPP, …)

• Applications: Business Intelligence, Business Applications, Custom Applications

• Alongside all layers: Operation (Manage & Monitor) and Dev Tools (Build & Test)

Classical enterprise platform

Page 55

Layers: Data Sources, Data Systems, Applications

• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP, …) and New Sources (Logs, Mails, Sensor, …, Social Media)

• Data Systems: Traditional Systems (RDBMS, EDW, MPP, …) and the Enterprise Hadoop Platform

• Applications: Business Intelligence, Business Applications, Custom Applications

• Alongside all layers: Operation (Manage & Monitor) and Dev Tools (Build & Test)

Big Data Platform

Page 56

Layers: Data Sources, Data Systems, Applications

• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP, …) and New Sources (Logs, Mails, Sensor, …, Social Media)

• Data Systems: Traditional Systems (RDBMS, EDW, MPP, …) and the Enterprise Hadoop Platform

• Applications: Business Intelligence, Business Applications, Custom Applications

1. Capture all data
2. Process the data
3. Exchange using traditional systems
4. Process & visualize with traditional applications

Pattern #1: Refine data

Page 57

Layers: Data Sources, Data Systems, Applications

• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP, …) and New Sources (Logs, Mails, Sensor, …, Social Media)

• Data Systems: Traditional Systems (RDBMS, EDW, MPP, …) and the Enterprise Hadoop Platform

• Applications: Business Intelligence, Business Applications, Custom Applications

1. Capture all data
2. Process the data
3. Explore the data using applications with support for Hadoop

Pattern #2: Explore data

Page 58

Layers: Data Sources, Data Systems, Applications

• Data Sources: Traditional Sources (RDBMS, OLTP, OLAP, …) and New Sources (Logs, Mails, Sensor, …, Social Media)

• Data Systems: Traditional Systems (RDBMS, EDW, MPP, …) and the Enterprise Hadoop Platform

• Applications: Business Applications, Custom Applications

1. Capture all data
2. Process the data
3. Directly ingest the data

Pattern #3: Enrich data

Page 59: Bringing it all together…

One example…

Page 60: Digital Advertising

• 6 billion ad deliveries per day

• Reports (and bills) for the advertising companies were needed

• The in-house C++ solution did not scale

• Adding functions was a nightmare

Page 61

AdServing Architecture (diagram), components:

• Ad Servers (FFM, AMS), each with a TCP Interface

• Custom Flume Sources (one per TCP Interface), a Flume HDFS Sink, local files

• Hadoop Cluster holding the logs in a binary log format, plus campaign data synchronised from the Campaign Database

• Pig and Hive jobs; temporary data; NAS

• Aggregated data for the Report Engine and for direct download

• Job Scheduler, started from a Config UI via a Job Config XML

Page 62: What’s next?

Hadoop 2.0 aka YARN

Page 63

Hadoop 1.0: built for web-scale batch apps

Diagram: several separate HDFS clusters, each running single batch applications (Single App / Batch)

Page 64: MapReduce is good for…

• Embarrassingly parallel algorithms

• Summing, grouping, filtering, joining

• Off-line batch jobs on massive datasets

• Analyzing an entire large dataset

Page 65: MapReduce is OK for…

• Iterative jobs (e.g., graph algorithms)
  – Each iteration must read/write data to disk
  – I/O and latency cost of an iteration is high
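This cost becomes concrete when a graph algorithm runs as a chain of passes. The sketch below computes connected components by min-label propagation. Like a chain of MapReduce jobs, it writes each iteration's result to disk and reads it back in the next iteration. A local temp directory stands in for HDFS, and all names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Connected components by min-label propagation, run as a chain of passes.
// Like chained MapReduce jobs, every pass reads the previous state from disk
// and writes its own result back to disk (a local directory stands in for HDFS).
class IterativeComponents {

    static Map<Integer, Integer> run(int[][] edges, Path workDir) {
        try {
            // Initial state: every vertex carries its own id as label.
            Map<Integer, Integer> initial = new TreeMap<>();
            for (int[] e : edges) {
                initial.put(e[0], e[0]);
                initial.put(e[1], e[1]);
            }
            Path state = workDir.resolve("labels-0.txt");
            write(state, initial);

            for (int pass = 1; ; pass++) {
                Map<Integer, Integer> prev = read(state);           // read the previous pass from disk
                Map<Integer, Integer> next = new TreeMap<>(prev);
                for (int[] e : edges) {                             // both ends adopt the smaller label
                    int min = Math.min(prev.get(e[0]), prev.get(e[1]));
                    next.merge(e[0], min, Math::min);
                    next.merge(e[1], min, Math::min);
                }
                state = workDir.resolve("labels-" + pass + ".txt");
                write(state, next);                                 // write this pass to disk
                if (next.equals(prev)) return next;                 // converged: no label changed
            }
        } catch (IOException ex) {
            throw new UncheckedIOException(ex);
        }
    }

    static void write(Path file, Map<Integer, Integer> labels) throws IOException {
        List<String> lines = new ArrayList<>();
        labels.forEach((vertex, label) -> lines.add(vertex + "\t" + label));
        Files.write(file, lines);
    }

    static Map<Integer, Integer> read(Path file) throws IOException {
        Map<Integer, Integer> labels = new TreeMap<>();
        for (String line : Files.readAllLines(file)) {
            String[] fields = line.split("\t");
            labels.put(Integer.parseInt(fields[0]), Integer.parseInt(fields[1]));
        }
        return labels;
    }

    public static void main(String[] args) throws IOException {
        int[][] edges = {{1, 2}, {2, 3}, {4, 5}};
        Path dir = Files.createTempDirectory("iterations");
        System.out.println(run(edges, dir)); // {1=1, 2=1, 3=1, 4=4, 5=4}
    }
}
```

This small example needs three passes, the last of which only confirms that nothing changed. On a large graph with a long diameter, every extra pass repeats the full read/write cycle.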

Page 66: MapReduce is not good for…

• Jobs that need shared state/coordination
  – Tasks are shared-nothing
  – Shared state requires a scalable state store

• Low-latency jobs

• Jobs on small datasets

• Finding individual records

Page 67: MapReduce limitations

• Scalability
  – Maximum cluster size ~4,500 nodes
  – Maximum concurrent tasks: 40,000
  – Coarse synchronization in JobTracker

• Availability
  – JobTracker failure kills all queued and running jobs

• Hard partition of resources into map & reduce slots
  – Low resource utilization

• Lacks support for alternate paradigms and services
  – Iterative applications implemented using MapReduce are 10x slower

Page 68

Hadoop 2.0: Next-gen platform

Hadoop 1.0: single-use system (batch apps)
• HDFS: redundant, reliable storage
• MapReduce: cluster resource management + data processing

Hadoop 2.0: multi-purpose platform (batch, interactive, streaming, …)
• HDFS 2.0: redundant, reliable storage
• YARN: cluster resource management
• MapReduce: data processing
• Others: data processing

Page 69: YARN: Taking Hadoop beyond batch

Applications run natively in Hadoop:
• Batch: MapReduce
• Interactive: Tez
• Online: HOYA
• Streaming: Storm, …
• Graph: Giraph
• In-Memory: Spark
• Other: Search, …

YARN: cluster resource management

HDFS 2.0: redundant, reliable storage

Store all data in one place, interact with data in multiple ways

Page 70: A brief history of YARN

• Originally conceived & architected by the team at Yahoo!
  – Arun Murthy created the original JIRA in 2008 and is now the YARN release manager

• The team at Hortonworks has been working on YARN for 4 years
  – 90% of code from Hortonworks & Yahoo!

• YARN-based architecture running at scale at Yahoo!
  – Deployed on 35,000 nodes for 6+ months

• Going GA at the end of 2013?

Page 71: YARN concepts

• Application
  – An application is a job submitted to the framework
  – Example: a MapReduce job

• Container
  – Basic unit of allocation
  – Fine-grained resource allocation across multiple resources (memory, CPU, disk, network, GPU, …), e.g.
      container_0 = 2 GB, 1 CPU
      container_1 = 1 GB, 6 CPU
  – Replaces the fixed map/reduce slots
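The difference between fixed slots and containers can be sketched with a resource vector. The node size (16 GB, 8 cores) and the helper names are assumptions, while the two container shapes come from the slide. Whichever resource runs out first limits how many containers fit on the node.

```java
// Sketch: a YARN-style container is a vector of resources rather than a fixed
// "map slot" or "reduce slot". The node size and helper names are hypothetical;
// the two container shapes are the ones from the slide.
class ContainerFit {

    static final class Resource {
        final int memoryMb;
        final int vcores;

        Resource(int memoryMb, int vcores) {
            this.memoryMb = memoryMb;
            this.vcores = vcores;
        }
    }

    // The scarcest resource decides how many identical containers fit on a node.
    static int maxContainers(Resource node, Resource request) {
        return Math.min(node.memoryMb / request.memoryMb, node.vcores / request.vcores);
    }

    // Greedily place a sequence of (possibly different) requests on one node;
    // returns how many of them were granted.
    static int placeAll(Resource node, Resource... requests) {
        int freeMem = node.memoryMb;
        int freeCores = node.vcores;
        int granted = 0;
        for (Resource r : requests) {
            if (r.memoryMb <= freeMem && r.vcores <= freeCores) {
                freeMem -= r.memoryMb;
                freeCores -= r.vcores;
                granted++;
            }
        }
        return granted;
    }

    public static void main(String[] args) {
        Resource node = new Resource(16 * 1024, 8);
        Resource c0 = new Resource(2 * 1024, 1); // container_0 = 2 GB, 1 CPU
        Resource c1 = new Resource(1024, 6);     // container_1 = 1 GB, 6 CPU
        System.out.println(maxContainers(node, c0));        // 8
        System.out.println(maxContainers(node, c1));        // 1 (CPU-bound)
        System.out.println(placeAll(node, c1, c0, c0, c0)); // 3: c1 leaves only 2 cores
    }
}
```

A fixed-slot model would instead give the node a set number of map and reduce slots, regardless of what each task actually needs.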

Page 72: YARN architecture

Split up the two major functions of the JobTracker: cluster resource management and application life-cycle management

Diagram: a ResourceManager with its Scheduler and eight NodeManagers; application master AM 1 uses Containers 1.1 and 1.2, AM 2 uses Containers 2.1, 2.2 and 2.3

Page 73: YARN architecture

• Resource Manager
  – Global resource scheduler
  – Hierarchical queues

• Node Manager
  – Per-machine agent
  – Manages the life-cycle of containers
  – Container resource monitoring

• Application Master
  – Per-application
  – Manages application scheduling and task execution
  – e.g. MapReduce Application Master
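The division of labour between these three components can be sketched as a toy model. All names are hypothetical and heavily simplified; this is not the YARN client API. The ResourceManager only grants capacity, each ApplicationMaster asks for containers for its tasks, and the NodeManagers start the granted containers.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model of how YARN splits responsibilities. All class and method names here are
// hypothetical and deliberately simplified; this is not the real YARN API.
//  - ResourceManager: global view of cluster capacity, grants containers
//  - NodeManager: per-machine agent that launches and tracks containers
//  - ApplicationMaster: one per application, requests containers and launches its tasks
class YarnToy {

    static class NodeManager {
        final String host;
        int freeMb;                                   // capacity not yet granted by the RM
        final List<String> running = new ArrayList<>();

        NodeManager(String host, int freeMb) {
            this.host = host;
            this.freeMb = freeMb;
        }

        void launch(String containerId) { running.add(containerId); }
    }

    static class Container {
        final String id;
        final NodeManager node;

        Container(String id, NodeManager node) {
            this.id = id;
            this.node = node;
        }

        @Override public String toString() { return id + "@" + node.host; }
    }

    static class ResourceManager {
        final List<NodeManager> nodes = new ArrayList<>();

        // First-fit scheduler: grant up to `count` containers of `mb` MB each.
        List<Container> allocate(String appId, int count, int mb) {
            List<Container> granted = new ArrayList<>();
            for (int i = 1; i <= count; i++) {
                for (NodeManager nm : nodes) {
                    if (nm.freeMb >= mb) {
                        nm.freeMb -= mb;
                        granted.add(new Container("container_" + appId + "." + i, nm));
                        break;
                    }
                }
            }
            return granted;
        }
    }

    static class ApplicationMaster {
        final String appId;

        ApplicationMaster(String appId) { this.appId = appId; }

        // Ask the RM for one container per task, then have each NodeManager start it.
        List<Container> run(ResourceManager rm, int tasks, int mbPerTask) {
            List<Container> containers = rm.allocate(appId, tasks, mbPerTask);
            for (Container c : containers) c.node.launch(c.id);
            return containers;
        }
    }

    public static void main(String[] args) {
        ResourceManager rm = new ResourceManager();
        rm.nodes.add(new NodeManager("nm1", 4096));
        rm.nodes.add(new NodeManager("nm2", 4096));
        System.out.println(new ApplicationMaster("1").run(rm, 2, 2048)); // [container_1.1@nm1, container_1.2@nm1]
        System.out.println(new ApplicationMaster("2").run(rm, 3, 2048)); // [container_2.1@nm2, container_2.2@nm2]
    }
}
```

In real YARN, requests that do not fit stay pending until capacity is freed, and the ApplicationMaster itself also runs inside a container. This toy simply drops the third request of application 2.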

Page 74: YARN architecture

Diagram: the ResourceManager (with its Scheduler) shares twelve NodeManagers among several frameworks at the same time:
• MapReduce 1: map 1.1, map 1.2, reduce 1.1
• MapReduce 2: map 2.1, map 2.2, map 2.3, reduce 2.1, reduce 2.2
• Tez: vertex 1 to vertex 4
• HOYA: HBase Master, Region servers 1 to 3
• Storm: nimbus 1, nimbus 2

Page 75: YARN summary

1. Scale

2. New programming models & Services

3. Improved cluster utilization

4. Agility

5. Beyond Java

Page 76: Getting started…

One more thing…

Page 77: User Groups

HUG Rhein-Ruhr (Düsseldorf): https://www.xing.com/net/hugrheinruhr/

HUG Rhein-Main (Frankfurt): https://www.xing.com/net/hugrheinmain/ and http://www.meetup.com/HUG-Rhein-Main/

Big Data Beers (Berlin): http://www.meetup.com/Big-Data-Beers/

HUG München: http://www.meetup.com/Hadoop-User-Group-Munich/

HUG Karlsruhe/Stuttgart: http://www.meetup.com/Hadoop-and-Big-Data-User-Group-in-Karlsruhe-Stuttgart/

Page 78: Books about Hadoop

Hadoop: The Definitive Guide; Tom White; 3rd ed.; O’Reilly; 2012.

Hadoop in Action; Chuck Lam; Manning; 2011.

Programming Pig; Alan Gates; O’Reilly; 2011.

Hadoop Operations; Eric Sammer; O’Reilly; 2012.

Page 79: Hortonworks Sandbox

http://hortonworks.com/products/hortonworks-sandbox

Page 80: Hadoop Training

• Programming with Hadoop
  – 4-day class
  – 16.09. – 19.09.2013, München
  – 28.10. – 31.10.2013, Frankfurt
  – 02.12. – 05.12.2013, Düsseldorf

• Administration of Hadoop
  – 3-day class
  – 23. – 25.09.2013, München
  – 04. – 06.11.2013, Frankfurt
  – 09. – 11.12.2013, Düsseldorf

http://www.codecentric.de/portfolio/schulungen-und-workshops