Hadoop core concepts

Post on 16-Jan-2017

Hadoop Basic Concepts

Marian Faryna

Buzzword Bingo

● Big Data
● Ecosystem
● HDFS
● Replica factor
● Data locality
● Hive
● Data variety
● Column-based storage
● Commodity hardware
● Resource manager
● MapReduce
● High Availability
● Coordination service
● Shuffling
● Eventual consistency
● Name Node

WHY HADOOP?

PROBLEMS WITH RELATIONAL DATABASES

HADOOP CAN:

● Process large data sets effectively
● Work with structured and unstructured data
● Process data in different modes

ADDITIONAL HADOOP PROS

● High Availability
● Horizontal Scalability
● Commodity Hardware
● BASE Principle

Hadoop BASE Principle:

● Basically Available
● Soft state
● Eventually consistent

Who is using Hadoop

100-node cluster: 800 scrobbles per second, 40 million per day

532-node Hadoop cluster: 120 million active users, 300+ million search queries daily

1,100 nodes processing 12 PB of stored data: 200+ million active users, 30 million of whom update their status at least once a day

1,650-node cluster: 75+ million active users, 30+ million songs, 1+ billion plays per day

WHAT IS HADOOP?

Hadoop is an ecosystem for distributed storage and distributed processing of large data sets

HADOOP Ecosystem

column-based storage

coordination service

New Node

Journal Node

Basically Available

Highly Available

replica factor

HDFS

Eventual Consistency

BASE Principle

Soft State

BASE Principle

MAPREDUCE

MapReduce algorithm

MapReduce example
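
The transcript does not preserve the example shown on this slide, so the following is only a minimal sketch of the three MapReduce phases (map, shuffle, reduce), using word count as an assumed illustration. It runs in a single Python process and does not use any Hadoop API; the function names map_phase, shuffle_phase, reduce_phase and word_count are hypothetical.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values collected for one key into a total."""
    return key, sum(values)


def word_count(lines):
    """Run the three phases in sequence over an in-memory list of lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data big cluster", "data node"]))
```

On a real cluster the map and reduce steps would run in parallel on many nodes and the shuffle would move data across the network; here all three are plain function calls so the data flow is easy to follow.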

Altogether

data locality

resource manager

IT’S ALL ABOUT DATA PROCESSING

And remember

Thanks! Any questions?