Hadoop 101 (v1) (20150730)

Hadoop 101 Big Data Technology

Transcript of Hadoop 101 (v1) (20150730)

What is Big Data?

Big Data is ...

- A technology that is capable of handling:
  - massive and complex data (petabytes+)
  - streams of data in (near) real time
  - extremely large infrastructure

$ whoami

Firman Gautama
Software Engineer at ADSKOM Indonesia

What is Hadoop?

- Hadoop is:
  - scalable.
  - a “Framework”.
  - not a drop-in replacement for RDBMS.
  - great for pipelining massive amounts of data to achieve the end result.

Hadoop Fun Facts

- Hadoop was created by Doug Cutting and Mike Cafarella. Cutting, who was working at Yahoo! at the time, named it after his son’s toy elephant.

- Yahoo! has the largest single Hadoop cluster in the world (4,500 nodes), according to the Apache Hadoop website.

- Yes, there is a Hadoop GPU Framework available!

Hadoop Core Components

Hadoop Core Components (Details)

Hadoop 1.x
- HDFS (storage)
  - NameNode
  - DataNode
  - Secondary NameNode*
- MapReduce (processing)
  - JobTracker
  - TaskTrackers
  - JobHistoryServer

Hadoop 2.x
- HDFS (storage)
  - NameNode
  - DataNode
  - Secondary NameNode*
- YARN (resource management/processing)
  - ResourceManager
  - ApplicationMaster
  - NodeManager
  - JobHistoryServer
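To make the processing side more concrete, here is a minimal, single-process Python sketch of the MapReduce model (map, then shuffle/sort, then reduce) using the classic word-count example. It only simulates the data flow and does not use Hadoop's Java API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. On a real cluster the map and reduce tasks run in parallel on many nodes (scheduled by the JobTracker in 1.x, or by YARN in 2.x) and read their input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Mapper (hypothetical name): emit (word, 1) for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle/sort: group every value emitted under the same key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Hadoop hands keys to each reducer in sorted order; mimic that here.
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reducer: sum the partial counts for one word."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map -> shuffle -> reduce over the whole input in one process."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["Hadoop stores data in HDFS", "YARN schedules Hadoop jobs"]
    print(word_count(docs))
```

The split into three small functions mirrors the contract a real job follows: mappers never see each other's output, and the framework (not user code) is responsible for the grouping step in between.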

Hadoop Compatible Components (1)

- Manipulating/Querying Data:
  - Apache Hive (SQL-like queries)
  - Cloudera Impala (SQL-like queries)
  - Apache Pig (script-based queries)
  - MapReduce (library)

- Key-Value Storage:
  - HBase
  - Cassandra
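HBase is often described as a sparse, sorted map: each cell is addressed by row key, column (written `family:qualifier`), and timestamp, and rows are kept in row-key order so range scans are cheap. The toy in-memory sketch below models only that logical layout for HBase (not Cassandra); the class `ToyTable`, its methods, and the default of 3 retained versions are hypothetical and say nothing about HBase's real client API or storage engine.

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical toy model of HBase's logical data layout (NOT the HBase client API):
# row key -> "family:qualifier" -> list of (timestamp, value), newest first.
Cell = Tuple[int, bytes]


class ToyTable:
    def __init__(self, max_versions: int = 3) -> None:
        self.max_versions = max_versions  # arbitrary illustrative default
        self.row_keys: List[bytes] = []  # kept sorted, like HBase row keys
        self.rows: Dict[bytes, Dict[str, List[Cell]]] = {}

    def put(self, row: bytes, column: str, ts: int, value: bytes) -> None:
        if row not in self.rows:
            bisect.insort(self.row_keys, row)  # maintain row-key order
            self.rows[row] = {}
        versions = self.rows[row].setdefault(column, [])
        versions.append((ts, value))
        versions.sort(key=lambda cell: cell[0], reverse=True)  # newest first
        del versions[self.max_versions:]  # keep only the newest N versions

    def get(self, row: bytes, column: str) -> Optional[bytes]:
        """Return the newest value of a cell, or None if it does not exist."""
        versions = self.rows.get(row, {}).get(column)
        return versions[0][1] if versions else None

    def scan(self, start: bytes, stop: bytes) -> List[bytes]:
        """Return the row keys in [start, stop), in sorted order."""
        lo = bisect.bisect_left(self.row_keys, start)
        hi = bisect.bisect_left(self.row_keys, stop)
        return self.row_keys[lo:hi]
```

Because rows are sorted by key, choosing a row-key design (for example a `user#NNN` prefix) decides which queries become cheap range scans.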

Hadoop Compatible Components (2)

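- Message Queueing:
  - Kafka (distributed publish-subscribe messaging, similar in role to RabbitMQ)

- Advanced Processing:
  - Spark (in-memory processing; the Spark project cites up to 100x faster than MapReduce for in-memory workloads)

- Scheduler/Workflow:
  - Oozie (workflow scheduler for Hadoop jobs; its time-based triggers are similar to crontab)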
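One notable difference from a traditional queue such as RabbitMQ: Kafka keeps messages in topics split into partitions, each partition is an append-only log, a message's key decides its partition, and every consumer tracks its own read position (offset), so reading does not remove a message. The sketch below is a hypothetical in-memory model of those ideas only (`ToyTopic` and `ToyConsumer` are made-up names, not the Kafka client API), and it uses `zlib.crc32` as a stand-in for Kafka's own partitioning hash.

```python
import zlib
from typing import Dict, List, Tuple


class ToyTopic:
    """Hypothetical Kafka-style topic: N append-only partition logs."""

    def __init__(self, num_partitions: int) -> None:
        self.partitions: List[List[Tuple[str, str]]] = [[] for _ in range(num_partitions)]

    def produce(self, key: str, value: str) -> Tuple[int, int]:
        # Same key -> same partition, so ordering is preserved per key.
        # crc32 is a stand-in; Kafka's default partitioner uses a different hash.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)


class ToyConsumer:
    """Each consumer keeps its own offset per partition; the log is never trimmed."""

    def __init__(self, topic: ToyTopic) -> None:
        self.topic = topic
        self.offsets: Dict[int, int] = {p: 0 for p in range(len(topic.partitions))}

    def poll(self) -> List[Tuple[int, int, str, str]]:
        """Return every record not yet seen by this consumer, then advance offsets."""
        records = []
        for p, log in enumerate(self.topic.partitions):
            for offset in range(self.offsets[p], len(log)):
                key, value = log[offset]
                records.append((p, offset, key, value))
            self.offsets[p] = len(log)  # "commit" everything just read
        return records
```

Two independent consumers can therefore read the same messages, which is what makes Kafka convenient for feeding one stream into several Hadoop pipelines.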

Hadoop Compatible Components (3)

- Data Export/Import:
  - Flume (streaming text files/logs into HDFS)
  - Sqoop (RDBMS to HDFS, or vice versa)

and many more... :)
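For Sqoop, the Sqoop User Guide describes how an import is parallelized: it retrieves the low and high values of a splitting column (chosen with `--split-by`, by default the primary key) and gives each map task (count set with `--num-mappers`) an evenly sized slice of that range. Its own example: with `id` from 0 to 1000 and 4 tasks, the slices are (0, 250), (250, 500), (500, 750), (750, 1001). The sketch below reproduces that arithmetic; `split_ranges`, `split_queries`, and the table/column names are hypothetical, and Sqoop's real generated SQL differs in detail.

```python
from typing import List, Tuple


def split_ranges(lo: int, hi: int, num_mappers: int) -> List[Tuple[int, int]]:
    """Divide the inclusive key range [lo, hi] into contiguous half-open slices."""
    if num_mappers < 1:
        raise ValueError("num_mappers must be at least 1")
    if hi < lo:
        return []
    bounds = [lo + (i * (hi - lo)) // num_mappers for i in range(num_mappers + 1)]
    bounds[-1] = hi + 1  # make the last slice include the maximum key
    # Drop empty slices that appear when there are fewer keys than mappers.
    return [(a, b) for a, b in zip(bounds, bounds[1:]) if a < b]


def split_queries(table: str, column: str, lo: int, hi: int, num_mappers: int) -> List[str]:
    """Illustrative per-mapper SELECT statements (hypothetical, not Sqoop's exact SQL)."""
    return [
        f"SELECT * FROM {table} WHERE {column} >= {a} AND {column} < {b}"
        for a, b in split_ranges(lo, hi, num_mappers)
    ]


if __name__ == "__main__":
    for query in split_queries("orders", "id", 0, 1000, 4):
        print(query)
```

Even slices only mean even work when keys are spread uniformly; a skewed split column leaves some mappers with far more rows than others, which is why the choice of `--split-by` column matters.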

Most Popular Hadoop Distributions

source: datanami.com

Real Example of Using Hadoop* (1)

Real Example of Using Hadoop* (2)

Real Example of Using Hadoop* (3)

(near) Real Time Analytics

Q&A Session

Join our LinkedIn Group

Big Data Indonesia: https://www.linkedin.com/grp/home?gid=6970225

Hadoop 101

Thank You

# EOF

Unless otherwise stated, all images used in these slides belong to their respective owners.