Introduction to the Hadoop Ecosystem (SEACON Edition)

Post on 20-Aug-2015


Introduction to the Hadoop ecosystem

About me

About us

Why Hadoop?

How to scale data?

[Diagram: writes w1-w3 and reads r1-r3 distributed across multiple machines]

But…

What is Hadoop?

The Hadoop App Store

[Slide: word cloud of Hadoop-ecosystem projects and vendors: HDFS, MapReduce, HCatalog, Pig, Hive, HBase, Ambari, Avro, Cassandra, Chukwa, Flume, HANA, Hypertable, Impala, Mahout, Nutch, Oozie, Sqoop, Scribe, Tez, Vertica, Whirr, ZooKeeper; Intel, Syncsort, Hortonworks, Cloudera, MapR, EMC, IBM, Talend, Teradata, Pivotal, Informatica, Microsoft, Pentaho, Jaspersoft, Kognitio, Tableau, Splunk, Platfora, Rackspace, Karmasphere, Actuate, MicroStrategy]

Data Storage

Hadoop Distributed File System

HDFS Architecture
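HDFS stores a file as fixed-size blocks (64 MB was the classic default; later versions use 128 MB), each replicated across DataNodes (replication factor 3 by default), while the NameNode tracks block locations. A minimal sketch of the resulting storage arithmetic; the class and method names here are illustrative, not Hadoop APIs:

```java
// Illustrative HDFS block-storage arithmetic (hypothetical names, no Hadoop needed).
public class HdfsBlockMath {

    // Number of blocks a file of the given size occupies (ceiling division).
    static long blockCount(long fileSizeBytes, long blockSizeBytes) {
        return (fileSizeBytes + blockSizeBytes - 1) / blockSizeBytes;
    }

    // Raw bytes consumed across the cluster once replication is applied.
    static long rawBytesStored(long fileSizeBytes, int replicationFactor) {
        return fileSizeBytes * replicationFactor;
    }

    public static void main(String[] args) {
        long blockSize = 64L * 1024 * 1024;   // 64 MB, the classic HDFS default
        long fileSize = 200L * 1024 * 1024;   // a 200 MB file
        System.out.println(blockCount(fileSize, blockSize));   // prints 4
        System.out.println(rawBytesStored(fileSize, 3));       // 3x the file size in raw storage
    }
}
```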

Data Processing

MapReduce

Typical large-data problem

MapReduce Flow

Generic flow:     (k1,v1) (k2,v2) (k3,v3) (k4,v4) (k5,v5) (k6,v6)

Mapper output:    (a,1) (b,2) (c,3) (c,6) (a,3) (c,2) (b,7) (c,8)
After combining:  (a,1) (b,2) (c,9) (a,3) (c,2) (b,7) (c,8)
Shuffle and sort: a -> [1, 3]   b -> [2, 7]   c -> [2, 8, 9]
Reducer output:   (a,4) (b,9) (c,19)
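The flow above can be simulated in plain Java without a cluster; the sketch below reproduces the shuffle step (group values by key) and the reduce step (sum per key) on the slide's data. Class and method names are illustrative:

```java
import java.util.*;

// Plain-Java simulation of MapReduce's shuffle-and-reduce phases
// (illustrative names; no Hadoop required).
public class ShuffleReduceDemo {

    // Shuffle: group mapper output (key, value) pairs by key, preserving value order.
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> mapped) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> kv : mapped) {
            grouped.computeIfAbsent(kv.getKey(), k -> new ArrayList<>()).add(kv.getValue());
        }
        return grouped;
    }

    // Reduce: collapse each group by summing its values.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> out = new TreeMap<>();
        grouped.forEach((k, vs) -> out.put(k, vs.stream().mapToInt(Integer::intValue).sum()));
        return out;
    }

    public static void main(String[] args) {
        // The mapper output from the slide, after combining.
        List<Map.Entry<String, Integer>> mapped = List.of(
            Map.entry("a", 1), Map.entry("b", 2), Map.entry("c", 9),
            Map.entry("a", 3), Map.entry("c", 2), Map.entry("b", 7), Map.entry("c", 8));
        System.out.println(reduce(shuffle(mapped))); // prints {a=4, b=9, c=19}
    }
}
```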

Combined Hadoop Architecture

Word Count Mapper in Java

public class WordCountMapper extends MapReduceBase implements

Mapper<LongWritable, Text, Text, IntWritable>

{

private final static IntWritable one = new IntWritable(1);

private Text word = new Text();

public void map(LongWritable key, Text value, OutputCollector<Text,

IntWritable> output, Reporter reporter) throws IOException

{

String line = value.toString();

StringTokenizer tokenizer = new StringTokenizer(line);

while (tokenizer.hasMoreTokens())

{

word.set(tokenizer.nextToken());

output.collect(word, one);

}

}

}

Word Count Reducer in Java

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class WordCountReducer extends MapReduceBase
    implements Reducer<Text, IntWritable, Text, IntWritable>
{
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output,
                       Reporter reporter) throws IOException
    {
        // Sum all counts emitted for this word and emit the total.
        int sum = 0;
        while (values.hasNext())
        {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
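The mapper and reducer logic can be exercised locally without a cluster. The following Hadoop-free sketch mirrors the two classes: tokenize each line with the same StringTokenizer the mapper uses, emit a count of 1 per word, and sum per word as the reducer does. Class and method names are illustrative:

```java
import java.util.*;

// Local, Hadoop-free word count mirroring the mapper/reducer pair
// (illustrative names; not a Hadoop API).
public class LocalWordCount {

    static Map<String, Integer> wordCount(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            // Same whitespace tokenization as the mapper.
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                // Emit (word, 1) and let merge() play the reducer's summing role.
                counts.merge(tokenizer.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(wordCount(List.of("to be or not to be")));
        // prints {be=2, not=1, or=1, to=2}
    }
}
```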

Scripting for Hadoop

Apache Pig


Pig in the Hadoop ecosystem

[Diagram: layer stack with Pig as the scripting layer above the distributed programming framework (MapReduce), metadata management, and the Hadoop Distributed File System]

Pig Latin

users = LOAD 'users.txt' USING PigStorage(',') AS (name, age);
pages = LOAD 'pages.txt' USING PigStorage(',') AS (user, url);
filteredUsers = FILTER users BY age >= 18 AND age <= 50;
joinResult = JOIN filteredUsers BY name, pages BY user;
grouped = GROUP joinResult BY url;
summed = FOREACH grouped GENERATE group, COUNT(joinResult) AS clicks;
sorted = ORDER summed BY clicks DESC;
top10 = LIMIT sorted 10;
STORE top10 INTO 'top10sites';

Pig Execution Plan

Try that with Java…
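For comparison, a plain-Java (streams) version of the same top-ten-sites pipeline the Pig script expresses: filter users by age, join page views on user name, group by URL, count clicks, sort descending, keep ten. The record types and sample data are made up for illustration:

```java
import java.util.*;
import java.util.stream.*;

// Hadoop-free sketch of the top-10-sites pipeline in plain Java streams
// (illustrative types and data; contrast with the Pig Latin script above).
public class Top10Sites {
    record User(String name, int age) {}
    record PageView(String user, String url) {}

    static List<Map.Entry<String, Long>> top10(List<User> users, List<PageView> pages) {
        // FILTER users BY age >= 18 AND age <= 50
        Set<String> filteredUsers = users.stream()
                .filter(u -> u.age() >= 18 && u.age() <= 50)
                .map(User::name)
                .collect(Collectors.toSet());
        // JOIN on name = user, GROUP BY url, COUNT, ORDER BY clicks DESC, LIMIT 10
        return pages.stream()
                .filter(p -> filteredUsers.contains(p.user()))
                .collect(Collectors.groupingBy(PageView::url, Collectors.counting()))
                .entrySet().stream()
                .sorted(Map.Entry.<String, Long>comparingByValue().reversed())
                .limit(10)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<User> users = List.of(new User("amy", 30), new User("bob", 12));
        List<PageView> pages = List.of(
            new PageView("amy", "a.com"), new PageView("amy", "b.com"),
            new PageView("bob", "a.com"), new PageView("amy", "a.com"));
        System.out.println(top10(users, pages)); // prints [a.com=2, b.com=1]
    }
}
```

Even this toy version takes more code than the nine-line Pig script, which is the point the slide makes.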

SQL for Hadoop

Apache Hive

Hive in the Hadoop ecosystem

[Diagram: layer stack with Hive as the query layer alongside Pig (scripting), above the distributed programming framework, metadata management, and the Hadoop Distributed File System]

Hive Architecture

Hive Example

CREATE TABLE users(name STRING, age INT);
CREATE TABLE pages(user STRING, url STRING);

LOAD DATA INPATH '/user/sandbox/users.txt' INTO TABLE users;
LOAD DATA INPATH '/user/sandbox/pages.txt' INTO TABLE pages;

SELECT pages.url, count(*) AS clicks
FROM users JOIN pages ON (users.name = pages.user)
WHERE users.age >= 18 AND users.age <= 50
GROUP BY pages.url
SORT BY clicks DESC
LIMIT 10;

Bringing it all together…

Online AdServing

AdServing Architecture

Getting started…

Hortonworks Sandbox

Hadoop Training
