Introduction to the Hadoop ecosystem by Uwe Seiler
Uploaded by codemotion, category: Technology
![Page 1: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/1.jpg)
Introduction to the Hadoop ecosystem
![Page 2: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/2.jpg)
About me
![Page 3: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/3.jpg)
About us
![Page 4: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/4.jpg)
Why Hadoop?
![Page 5: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/5.jpg)
Why Hadoop?
![Page 6: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/6.jpg)
Why Hadoop?
![Page 7: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/7.jpg)
Why Hadoop?
![Page 8: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/8.jpg)
Why Hadoop?
![Page 9: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/9.jpg)
Why Hadoop?
![Page 10: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/10.jpg)
Why Hadoop?
![Page 11: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/11.jpg)
How to scale data?
(Diagram labels: w1, w2, w3 and r1, r2, r3)
![Page 12: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/12.jpg)
But…
![Page 13: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/13.jpg)
But…
![Page 14: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/14.jpg)
What is Hadoop?
![Page 15: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/15.jpg)
What is Hadoop?
![Page 16: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/16.jpg)
What is Hadoop?
![Page 17: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/17.jpg)
What is Hadoop?
![Page 18: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/18.jpg)
The Hadoop App Store
HDFS, MapReduce, HCatalog, Pig, Hive, HBase, Ambari, Avro, Cassandra, Chukwa, Intel, Sync, Flume, HANA, Hypertable, Impala, Mahout, Nutch, Oozie, Sqoop, Scribe, Tez, Vertica, Whirr, ZooKeeper, Cloudera, Hortonworks, MapR, EMC, IBM, Talend, Teradata, Pivotal, Informatica, Microsoft, Pentaho, Jasper, Kognitio, Tableau, Splunk, Platfora, Rack, Karma, Actuate, MicroStrategy
![Page 19: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/19.jpg)
Data Storage
![Page 20: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/20.jpg)
Data Storage
![Page 21: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/21.jpg)
Hadoop Distributed File System
![Page 22: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/22.jpg)
Hadoop Distributed File System
![Page 23: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/23.jpg)
HDFS Architecture
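HDFS stores a file as a sequence of large blocks and keeps several replicas of each block on different DataNodes. As a rough illustration of that bookkeeping, the sketch below computes the block layout and raw disk usage for a single file. It assumes the Hadoop 2 defaults of 128 MB for `dfs.blocksize` and 3 for `dfs.replication` (both configurable per cluster and per file), and `HdfsBlockMath` is a hypothetical class, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: models how one file is cut into HDFS blocks and how much
// raw disk its replicas consume. Plain arithmetic, no Hadoop classes involved.
class HdfsBlockMath {
    static final long MB = 1024L * 1024L;

    // Sizes of the blocks a file of fileSize bytes is split into. Every block is
    // blockSize bytes except possibly the last, which holds only the remainder
    // (HDFS does not pad the final block).
    static List<Long> blockSizes(long fileSize, long blockSize) {
        List<Long> blocks = new ArrayList<>();
        for (long offset = 0; offset < fileSize; offset += blockSize) {
            blocks.add(Math.min(blockSize, fileSize - offset));
        }
        return blocks;
    }

    // Raw bytes stored across the cluster once every block has `replication` copies.
    static long rawStorage(long fileSize, int replication) {
        return fileSize * replication;
    }

    public static void main(String[] args) {
        long fileSize = 300 * MB;   // example file
        long blockSize = 128 * MB;  // assumed dfs.blocksize
        int replication = 3;        // assumed dfs.replication

        List<Long> blocks = blockSizes(fileSize, blockSize);
        System.out.println("blocks: " + blocks.size());
        for (long b : blocks) {
            System.out.println("  " + (b / MB) + " MB");
        }
        System.out.println("block replicas: " + blocks.size() * replication);
        System.out.println("raw storage: " + rawStorage(fileSize, replication) / MB + " MB");
    }
}
```

Under these assumptions a 300 MB file becomes three blocks (128 MB, 128 MB and 44 MB) and nine block replicas, about 900 MB of raw storage. Because the last block is not padded, a small remainder does not occupy a full block on disk.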
![Page 24: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/24.jpg)
Data Processing
![Page 25: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/25.jpg)
Data Processing
![Page 26: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/26.jpg)
MapReduce
![Page 27: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/27.jpg)
Typical large-data problem
![Page 28: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/28.jpg)
MapReduce Flow
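• Input: (k1, v1) (k2, v2) (k3, v3) (k4, v4) (k5, v5) (k6, v6)
• Map: (a, 1) (b, 2) (c, 3) (c, 6) (a, 3) (c, 2) (b, 7) (c, 8)
• Combine: (a, 1) (b, 2) (c, 9) (a, 3) (c, 2) (b, 7) (c, 8)
• Shuffle and sort: a → [1, 3], b → [2, 7], c → [2, 8, 9]
• Reduce: (a, 4) (b, 9) (c, 19)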
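The flow in this figure can be replayed in plain Java, without Hadoop, to make each phase explicit. The sketch starts from the figure's map output (a 1, b 2, c 3, c 6, a 3, c 2, b 7, c 8), since the original input records k1…k6 are not shown. It assumes four mappers emitting two pairs each, which is consistent with the combine step merging only c 3 and c 6, and it uses a sum as both combiner and reducer. `MapReduceFlow` and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory replay of map -> combine -> shuffle/sort -> reduce.
// Not Hadoop code: each mapper's output is just a list of (key, value) pairs.
class MapReduceFlow {

    // Combiner: sums values per key within ONE mapper's output (local pre-aggregation).
    static List<Map.Entry<String, Integer>> combine(List<Map.Entry<String, Integer>> mapperOutput) {
        Map<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : mapperOutput) {
            sums.merge(pair.getKey(), pair.getValue(), Integer::sum);
        }
        return new ArrayList<>(sums.entrySet());
    }

    // Shuffle and sort: gather every value for a key across all mappers;
    // the TreeMap keeps keys in sorted order, as the framework does.
    static Map<String, List<Integer>> shuffle(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            for (Map.Entry<String, Integer> pair : output) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        return grouped;
    }

    // Reduce: sum the grouped values of each key.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        grouped.forEach((key, values) ->
            result.put(key, values.stream().mapToInt(Integer::intValue).sum()));
        return result;
    }

    // Whole pipeline: combine each mapper's output, then shuffle and reduce.
    static Map<String, Integer> run(List<List<Map.Entry<String, Integer>>> mapperOutputs) {
        List<List<Map.Entry<String, Integer>>> combined = new ArrayList<>();
        for (List<Map.Entry<String, Integer>> output : mapperOutputs) {
            combined.add(combine(output));
        }
        return reduce(shuffle(combined));
    }

    public static void main(String[] args) {
        // Map output from the figure, assumed to come from four mappers.
        List<List<Map.Entry<String, Integer>>> mapped = List.of(
            List.of(Map.entry("a", 1), Map.entry("b", 2)),
            List.of(Map.entry("c", 3), Map.entry("c", 6)),
            List.of(Map.entry("a", 3), Map.entry("c", 2)),
            List.of(Map.entry("b", 7), Map.entry("c", 8)));
        System.out.println(run(mapped)); // {a=4, b=9, c=19}
    }
}
```

Running `main` prints `{a=4, b=9, c=19}`, matching the reduce output in the figure. Reusing the reducer as a combiner only works because summing is associative and commutative; a reducer that computes an average could not be reused this way.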
![Page 29: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/29.jpg)
Jobs & Tasks
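A MapReduce job runs one map task per input split, and for splittable files `FileInputFormat` derives the split size from the HDFS block size. The sketch below mirrors that computation in simplified form: the split size is `max(minSize, min(maxSize, blockSize))`, and splitting stops once the remainder is at most 10% larger than one split (the `SPLIT_SLOP` factor of 1.1 in Hadoop's source). It ignores compression, non-splittable formats and block locations; `SplitPlanner` is a hypothetical name and the 128 MB block size is an assumed default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of how FileInputFormat cuts one splittable
// file into input splits; each resulting split becomes one map task.
class SplitPlanner {
    static final long MB = 1024L * 1024L;
    // A trailing chunk up to 10% larger than splitSize is kept as a single split.
    static final double SPLIT_SLOP = 1.1;

    static long splitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Returns the length of every split for a file of `length` bytes.
    static List<Long> splitLengths(long length, long splitSize) {
        List<Long> splits = new ArrayList<>();
        long remaining = length;
        while ((double) remaining / splitSize > SPLIT_SLOP) {
            splits.add(splitSize);
            remaining -= splitSize;
        }
        if (remaining != 0) {
            splits.add(remaining); // last, possibly slightly oversized, split
        }
        return splits;
    }

    public static void main(String[] args) {
        // Assumed defaults: 128 MB blocks, minimum split size 1, no maximum
        long size = splitSize(128 * MB, 1, Long.MAX_VALUE);
        System.out.println("300 MB file: " + splitLengths(300 * MB, size).size() + " map tasks"); // 3
        System.out.println("140 MB file: " + splitLengths(140 * MB, size).size() + " map tasks"); // 1
    }
}
```

`main` reports 3 map tasks for the 300 MB file but only 1 for the 140 MB file, because 140 MB is within the 10% slop of a single 128 MB split.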
![Page 30: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/30.jpg)
Combined Hadoop Architecture
![Page 31: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/31.jpg)
Word Count Mapper in Java
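```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;

// Emits (word, 1) for every whitespace-separated token (classic mapred API).
public class WordCountMapper extends MapReduceBase
        implements Mapper<LongWritable, Text, Text, IntWritable> {
    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text(); // reused for every record emitted

    // key: byte offset of the line (TextInputFormat); value: the line's text
    public void map(LongWritable key, Text value,
                    OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        StringTokenizer tokenizer = new StringTokenizer(value.toString());
        while (tokenizer.hasMoreTokens()) {
            word.set(tokenizer.nextToken());
            output.collect(word, one);
        }
    }
}
```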
![Page 32: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/32.jpg)
Word Count Reducer in Java
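```java
import java.io.IOException;
import java.util.Iterator;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

// Sums all counts emitted for one word (classic mapred API).
public class WordCountReducer extends MapReduceBase
        implements Reducer<Text, IntWritable, Text, IntWritable> {
    // Called once per distinct key with an iterator over all of its values
    public void reduce(Text key, Iterator<IntWritable> values,
                       OutputCollector<Text, IntWritable> output, Reporter reporter)
            throws IOException {
        int sum = 0;
        while (values.hasNext()) {
            sum += values.next().get();
        }
        output.collect(key, new IntWritable(sum));
    }
}
```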
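Because the mapper and reducer above are ordinary Java methods, their logic can be checked without a cluster. The sketch below re-implements the same steps with plain collections: tokenize each line with `StringTokenizer` and emit (word, 1), group the pairs by word (the shuffle, here a sorted `TreeMap`), then sum each group as `WordCountReducer` does. It does not call the Hadoop API, and `LocalWordCount` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical local stand-in for the WordCount job: same map and reduce logic,
// no Hadoop types, everything in memory.
class LocalWordCount {

    // Map step: one (word, 1) pair per whitespace-separated token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokenizer = new StringTokenizer(line);
        while (tokenizer.hasMoreTokens()) {
            out.add(Map.entry(tokenizer.nextToken(), 1));
        }
        return out;
    }

    // Reduce step: sum the counts collected for one word.
    static int reduce(List<Integer> values) {
        int sum = 0;
        for (int v : values) {
            sum += v;
        }
        return sum;
    }

    static Map<String, Integer> run(List<String> lines) {
        // Shuffle: group emitted values by key; the TreeMap mimics the sort by key.
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
            }
        }
        Map<String, Integer> counts = new TreeMap<>();
        grouped.forEach((word, values) -> counts.put(word, reduce(values)));
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of("to be or not to be", "to do")));
        // {be=2, do=1, not=1, or=1, to=3}
    }
}
```

Like the Hadoop mapper, this splits on whitespace only, so `Hadoop` and `hadoop,` would be counted as different words.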
![Page 33: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/33.jpg)
Scripting for Hadoop
![Page 34: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/34.jpg)
Scripting for Hadoop
![Page 35: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/35.jpg)
Apache Pig
![Page 36: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/36.jpg)
Pig in the Hadoop ecosystem
(Diagram layers: Hadoop Distributed File System, Distributed Programming Framework, Metadata Management, Scripting)
![Page 37: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/37.jpg)
Pig Latin
```
users = LOAD 'users.txt' USING PigStorage(',') AS (name, age);
pages = LOAD 'pages.txt' USING PigStorage(',') AS (user, url);
filteredUsers = FILTER users BY age >= 18 AND age <= 50;
joinResult = JOIN filteredUsers BY name, pages BY user;
grouped = GROUP joinResult BY url;
summed = FOREACH grouped GENERATE group, COUNT(joinResult) AS clicks;
sorted = ORDER summed BY clicks DESC;
top10 = LIMIT sorted 10;
STORE top10 INTO 'top10sites';
```
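To see what each Pig Latin statement computes, the sketch below runs the same pipeline over small in-memory lists with Java streams: filter users aged 18 to 50, join on user name, group by URL, count clicks, sort descending and keep the top n. It is single-machine Java for illustration only (Pig itself compiles the script into MapReduce jobs); the `User` and `PageView` classes and the sample rows are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical single-machine equivalent of the Pig script (no Pig, no Hadoop).
class TopSites {
    // One row of users.txt: (name, age)
    static final class User {
        final String name;
        final int age;
        User(String name, int age) { this.name = name; this.age = age; }
    }

    // One row of pages.txt: (user, url)
    static final class PageView {
        final String user;
        final String url;
        PageView(String user, String url) { this.user = user; this.url = url; }
    }

    static List<Map.Entry<String, Long>> top(List<User> users, List<PageView> pages, int n) {
        // filteredUsers = FILTER users BY age >= 18 AND age <= 50
        List<User> filtered = users.stream()
            .filter(u -> u.age >= 18 && u.age <= 50)
            .collect(Collectors.toList());
        // joinResult = JOIN filteredUsers BY name, pages BY user
        // (inner join; only the url of each matching pair is kept here)
        List<String> joinedUrls = filtered.stream()
            .flatMap(u -> pages.stream().filter(p -> p.user.equals(u.name)).map(p -> p.url))
            .collect(Collectors.toList());
        // grouped/summed = GROUP BY url, then COUNT(joinResult) AS clicks
        Map<String, Long> clicks = joinedUrls.stream()
            .collect(Collectors.groupingBy(url -> url, Collectors.counting()));
        // sorted/top10 = ORDER BY clicks DESC, then LIMIT n (ties broken by url for determinism)
        return clicks.entrySet().stream()
            .sorted(Map.Entry.<String, Long>comparingByValue().reversed()
                .thenComparing(Map.Entry.<String, Long>comparingByKey()))
            .limit(n)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical sample rows
        List<User> users = List.of(new User("amy", 25), new User("bob", 17), new User("cat", 40));
        List<PageView> pages = List.of(
            new PageView("amy", "a.com"), new PageView("amy", "b.com"),
            new PageView("bob", "a.com"), new PageView("cat", "a.com"));
        System.out.println(top(users, pages, 10)); // [a.com=2, b.com=1]
    }
}
```

`main` prints `[a.com=2, b.com=1]`: bob is outside the age range, so his click on a.com is not counted. The nested-loop join is fine for a handful of rows; on a cluster, Pig instead turns the join and grouping into MapReduce stages.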
![Page 38: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/38.jpg)
Pig Execution Plan
![Page 39: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/39.jpg)
Try that with Java…
![Page 40: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/40.jpg)
SQL for Hadoop
![Page 41: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/41.jpg)
SQL for Hadoop
![Page 42: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/42.jpg)
Apache Hive
![Page 43: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/43.jpg)
Hive in the Hadoop ecosystem
(Diagram layers: Hadoop Distributed File System, Distributed Programming Framework, Metadata Management, Scripting, Query)
![Page 44: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/44.jpg)
Hive Architecture
![Page 45: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/45.jpg)
Hive Example
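```sql
-- Assumes the same comma-separated files as the Pig example
CREATE TABLE users (name STRING, age INT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
-- `user` is a reserved word in newer Hive releases, hence the backticks
CREATE TABLE pages (`user` STRING, url STRING)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';

-- LOAD DATA INPATH moves the HDFS files into the table's directory
LOAD DATA INPATH '/user/sandbox/users.txt' INTO TABLE users;
LOAD DATA INPATH '/user/sandbox/pages.txt' INTO TABLE pages;

-- ORDER BY gives a total order; SORT BY would only sort within each reducer
SELECT pages.url, count(*) AS clicks
FROM users JOIN pages ON (users.name = pages.`user`)
WHERE users.age >= 18 AND users.age <= 50
GROUP BY pages.url
ORDER BY clicks DESC
LIMIT 10;
```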
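For a top-10 query, the choice between HiveQL's `SORT BY` and `ORDER BY` matters: `ORDER BY` produces a total order over the whole result (through a single reducer), while `SORT BY` only sorts the rows within each reducer, so with more than one reducer a following `LIMIT` is not guaranteed to return the global top rows. The sketch below simulates that difference in plain Java. The two-reducer partitioning and the click counts are made up for illustration, the "concatenate reducer outputs, then take the first n" step models one possible outcome rather than Hive's exact execution, and `SortBySimulation` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical simulation contrasting ORDER BY (one global sort)
// with SORT BY (each reducer sorts only its own partition).
class SortBySimulation {
    static final Comparator<Map.Entry<String, Long>> BY_CLICKS_DESC =
        Map.Entry.<String, Long>comparingByValue().reversed();

    // ORDER BY clicks DESC LIMIT n: sort all rows together, then cut.
    static List<String> orderByLimit(List<Map.Entry<String, Long>> rows, int n) {
        return rows.stream().sorted(BY_CLICKS_DESC).limit(n)
            .map(Map.Entry::getKey).collect(Collectors.toList());
    }

    // SORT BY clicks DESC LIMIT n: each reducer sorts its own rows; the reducer
    // outputs are concatenated in reducer order and the first n rows are kept.
    static List<String> sortByLimit(List<List<Map.Entry<String, Long>>> reducerInputs, int n) {
        List<Map.Entry<String, Long>> concatenated = new ArrayList<>();
        for (List<Map.Entry<String, Long>> partition : reducerInputs) {
            partition.stream().sorted(BY_CLICKS_DESC).forEach(concatenated::add);
        }
        return concatenated.stream().limit(n)
            .map(Map.Entry::getKey).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Two reducers, each holding part of the (url, clicks) rows.
        List<Map.Entry<String, Long>> r0 = List.of(Map.entry("a.com", 5L), Map.entry("b.com", 1L));
        List<Map.Entry<String, Long>> r1 = List.of(Map.entry("c.com", 9L), Map.entry("d.com", 7L));
        List<Map.Entry<String, Long>> all = new ArrayList<>(r0);
        all.addAll(r1);
        System.out.println("ORDER BY: " + orderByLimit(all, 2));          // [c.com, d.com]
        System.out.println("SORT BY: " + sortByLimit(List.of(r0, r1), 2)); // [a.com, b.com]
    }
}
```

With per-reducer sorting, the first rows returned come from whichever reducer's output is read first, so here the two busiest URLs (c.com and d.com) are missed entirely.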
![Page 46: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/46.jpg)
Bringing it all together…
![Page 47: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/47.jpg)
Online Advertising
![Page 48: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/48.jpg)
Getting started…
![Page 49: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/49.jpg)
Hortonworks Sandbox
![Page 50: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/50.jpg)
Hadoop Training
![Page 51: Introduction to the hadoop ecosystem by Uwe Seiler](https://reader030.fdocuments.in/reader030/viewer/2022020110/5564eb24d8b42ab34e8b4aa2/html5/thumbnails/51.jpg)
The end…or the beginning?