Big Data & Hadoop

Post on 14-Apr-2017

BIG DATA AND HADOOP

By Ram and Raghavendra

BIG DATA

What is big data?

Big data is data made up of a large volume of files in the form of videos, audio, pictures, documents, etc.

SOURCES OF BIG DATA

• Facebook
• Twitter
• Google
• OneDrive
• Yahoo
• Media, government, Flipkart, etc.

Types of big data

1. Structured data

2. Unstructured data

Structured data:

In this presentation, structured data means data of one similar type: a set of files that all belong to the same category. Examples:

1. Text files: text file 1, text file 2, ...

2. Pictures: picture 1, picture 2, ...

Unstructured data:

It is a combination of different types of data, for example videos, audio, pictures, and documents stored together.

Three Characteristics of Big Data (the 3 Vs)

• Volume: data quantity
• Velocity: data speed
• Variety: data types

ABOUT BIG DATA

• Every day we create 2.5 quintillion bytes of data

• 90% of the data in the world has been created in the last two years

• Facebook generates 500+ terabytes of data per day

Difficulties with big data

Big data is too difficult to manage with standard database management systems such as DBMS and RDBMS. The difficult tasks include:

1. Analysis
2. Capture
3. Curation
4. Search
5. Sharing
6. Storage
7. Transfer
8. Visualization and information privacy

WHAT IS HADOOP?

• Hadoop is a framework of tools.
• It is open source (Apache).
• Objective: Hadoop supports running applications on big data.

Challenges Hadoop addresses

• Volume
• Velocity
• Variety

Traditional Approach

• Enterprise approach: big data is processed by a single powerful computer.
• Processing limit: only so much data can be processed by one powerful computer.

Breaking the Data

• Big data is broken into pieces.
• The computation is moved to the data.
• The partial results are combined into one result.
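The idea of breaking the data into pieces, running the computation on each piece, and combining the results can be sketched in a few lines of Python. This is only an illustration, not Hadoop code: the function names (`split_into_pieces`, `process_piece`, `combine`) are made up for this example, and worker threads stand in for separate machines.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_pieces(data, piece_size):
    """Break a large list into consecutive pieces of at most piece_size items."""
    return [data[i:i + piece_size] for i in range(0, len(data), piece_size)]


def process_piece(piece):
    """The computation that runs where a piece lives; here it simply sums numbers."""
    return sum(piece)


def combine(partial_results):
    """Merge the partial results into one final answer."""
    return sum(partial_results)


def run(data, piece_size=4, workers=3):
    pieces = split_into_pieces(data, piece_size)
    # Each worker thread stands in for one machine that holds one piece (assumption).
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_results = list(pool.map(process_piece, pieces))
    return combine(partial_results)


total = run(list(range(1, 11)))
print(total)  # 55, the same as summing the whole list on one machine
```

The combined answer matches what a single machine would compute, but no single worker ever has to hold or process the whole data set.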

ARCHITECTURE

• MapReduce
• File system (HDFS)
• Projects

DISTRIBUTED MODEL

1. The machines in the cluster are low-cost computers.
2. Hadoop works on Linux-based machines.

(Diagram: four Linux machines.)

TASK TRACKER AND DATA NODES

(Diagram: each slave machine runs a Task Tracker and a Data Node. The master machine runs the Job Tracker and the Name Node, together with a Task Tracker and a Data Node of its own.)

COMPONENTS

• MapReduce
• File system (HDFS)

MapReduce

(Diagram: the MapReduce layer spans the cluster, with the Job Tracker on the master and a Task Tracker on every slave.)
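The slides do not show a MapReduce program, so here is a small word-count sketch of the map, shuffle, and reduce steps in plain Python. It assumes nothing about the real Hadoop API; `map_phase`, `shuffle`, and `reduce_phase` are hypothetical names used only for this illustration.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    pairs = [pair for line in lines for pair in map_phase(line)]
    groups = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in groups.items())


counts = word_count(["big data", "Big Data and Hadoop"])
print(counts)  # {'big': 2, 'data': 2, 'and': 1, 'hadoop': 1}
```

In a real cluster the map calls would run on many Task Trackers in parallel, each close to its own piece of the input; here they run one after another in a single process.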

HDFS

(Diagram: the HDFS layer spans the cluster, with the Name Node on the master and a Data Node on every slave.)
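As a rough picture of how the Name Node and Data Nodes divide the work, the sketch below splits a file into blocks, stores each block on a Data Node, and records the location of every block in a table kept by the Name Node. It is a toy model with assumptions throughout: the 4-byte block size, the round-robin placement, and all function names are invented for the example, and real HDFS blocks are far larger.

```python
def split_into_blocks(content, block_size):
    """Cut the file content into fixed-size blocks (the last one may be shorter)."""
    return [content[i:i + block_size] for i in range(0, len(content), block_size)]


def store_file(filename, content, data_nodes, block_size=4):
    """Store a file; return the Name Node table and the per-node block storage."""
    name_node_table = {}                       # (filename, block index) -> Data Node
    storage = {node: {} for node in data_nodes}
    for index, block in enumerate(split_into_blocks(content, block_size)):
        node = data_nodes[index % len(data_nodes)]  # round-robin placement (assumption)
        storage[node][(filename, index)] = block
        name_node_table[(filename, index)] = node
    return name_node_table, storage


def read_file(filename, name_node_table, storage):
    """Look up every block in the table and stitch the blocks back together in order."""
    indices = sorted(i for (name, i) in name_node_table if name == filename)
    return b"".join(
        storage[name_node_table[(filename, i)]][(filename, i)] for i in indices
    )


table, storage = store_file("a.txt", b"hello hadoop", ["dn1", "dn2"])
print(table[("a.txt", 1)])                 # dn2
print(read_file("a.txt", table, storage))  # b'hello hadoop'
```

The table holds only locations, not data, which is why losing it would make the stored blocks impossible to reassemble.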

Batch processing

(Diagram: applications wait in an application queue at the Job Tracker on the master and are processed as batch jobs by the slaves.)

Job Tracker

(Diagram: the Job Tracker runs on the master and coordinates the Task Trackers on the slaves.)

FAULT TOLERANCE FOR DATA NODE

(Diagram: HDFS provides fault tolerance for the Data Nodes on the master and the slaves.)
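The slide does not spell out the mechanism, but HDFS achieves this by keeping several copies (replicas) of each block on different Data Nodes, three by default. The sketch below shows why that helps. The round-robin placement and the names `place_replicas` and `readable_blocks` are assumptions made for this example, not HDFS's actual placement policy.

```python
def place_replicas(block_ids, data_nodes, replication=3):
    """Assign every block to `replication` distinct Data Nodes (round-robin, an assumption)."""
    if replication > len(data_nodes):
        raise ValueError("not enough data nodes for the requested replication")
    placement = {}
    for i, block in enumerate(block_ids):
        placement[block] = [
            data_nodes[(i + k) % len(data_nodes)] for k in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block
        for block, nodes in placement.items()
        if any(node not in failed_nodes for node in nodes)
    }


placement = place_replicas(["b0", "b1", "b2", "b3"], ["dn1", "dn2", "dn3", "dn4"])
survivors = readable_blocks(placement, failed_nodes={"dn2"})
print(sorted(survivors))  # ['b0', 'b1', 'b2', 'b3'], nothing is lost when one node fails
```

With three replicas on distinct nodes, a block becomes unreadable only if all three of its nodes fail at the same time.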

FAULT TOLERANCE FOR PROCESSING

(Diagram: MapReduce provides fault tolerance for processing, across the Job Tracker on the master and the Task Trackers on the slaves.)
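On the processing side, a task that fails is scheduled again, possibly on a different Task Tracker. The sketch below imitates that behaviour with plain Python functions standing in for Task Trackers. The names and the retry limit are assumptions chosen for this example, not Hadoop's real scheduler.

```python
def run_with_retries(task, workers, max_attempts=3):
    """Try the task on successive workers until one of them succeeds."""
    errors = []
    for attempt in range(max_attempts):
        worker = workers[attempt % len(workers)]
        try:
            return worker(task), attempt + 1
        except RuntimeError as exc:
            errors.append(str(exc))  # remember the failure and move to the next worker
    raise RuntimeError(f"task failed after {max_attempts} attempts: {errors}")


def broken_worker(task):
    """Stands in for a Task Tracker that has gone down (hypothetical)."""
    raise RuntimeError("task tracker is down")


def healthy_worker(task):
    """Stands in for a working Task Tracker; the 'task' is just doubling a number."""
    return task * 2


result, attempts = run_with_retries(21, [broken_worker, healthy_worker])
print(result, attempts)  # 42 2
```

The caller sees only the final result; the failed first attempt is handled inside the scheduling loop.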

Master Backup

(Diagram: the tables held on the master, which runs the Job Tracker, are backed up.)

Easy programming

Programmers do not need to worry about:

1. Where the file is located

2. How to manage failures

3. How to break computations into pieces

4. Scalability

Name

• The name was given by Doug Cutting
• Created by Doug Cutting and Mike Cafarella (Yahoo) in 2005
• Yahoo donated Hadoop to Apache in 2006

Usage Areas

• Social media
• Retail
• Financial services
• Searching tools
• Government
• Intelligence

Companies

• Yahoo
• Facebook
• Amazon
• eBay
• American Airlines
• The New York Times
• Chevron
• IBM
• Federal Reserve Board

Future outlook

Yahoo

By 2015, 50% of enterprise data will be processed by Hadoop.

Thank you