Hadoop

CYS14011 - Rithu P Ravi, CYS14012 - Saumya K


Hadoop: An Overview

Transcript of Hadoop

  • CYS14011 - Rithu P Ravi

    CYS14012 - Saumya K

  • Big Data Hadoop... HDFS Map Reduce

    Why Hadoop, and what is it?

    Apache Hadoop is an open-source software framework

    A tool to process big data

    Rithu P Ravi,SaumyaK HADOOP 2/30

  • Outline

    1 Big Data

    2 Hadoop...

    3 HDFS

    4 Map Reduce


  • Big Data

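    Data that exceeds the storage capacity and processing power of conventional systems

    The 3 Vs

    Volume: the sheer amount of data

    Velocity: the speed at which data is generated and must be processed

    Variety: the many forms data takes (structured, semi-structured, unstructured)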


  • Big Data

    Exponential growth of data

    This posed challenges for Google, Yahoo, Microsoft and Amazon

    They needed to process terabytes (TBs) and petabytes (PBs) of data

    Existing tools became inadequate to process such large data sets


  • Big elephant, or numerous small chickens?

    (One powerful server, or many commodity machines working together?)
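
A back-of-the-envelope calculation suggests why many small machines can beat one big one when scanning data. The numbers are illustrative assumptions, not measurements: 1 TB of data, disks that each read at 100 MB/s, and data spread evenly over disks that are all read in parallel. `read_time_seconds` is a hypothetical helper.

```python
def read_time_seconds(total_bytes: float, disks: int, mb_per_sec: float = 100.0) -> float:
    """Time to scan total_bytes split evenly over `disks` disks read in parallel.

    Assumes (for illustration) every disk streams at mb_per_sec decimal MB/s.
    """
    bytes_per_sec = mb_per_sec * 1_000_000
    return total_bytes / disks / bytes_per_sec

ONE_TB = 1_000_000_000_000  # decimal terabyte (assumed data size)

one_disk = read_time_seconds(ONE_TB, disks=1)         # 10,000 s, roughly 2.8 hours
hundred_disks = read_time_seconds(ONE_TB, disks=100)  # 100 s, under 2 minutes
print(f"1 disk: {one_disk / 3600:.1f} h; 100 disks: {hundred_disks:.0f} s")
```

The speed-up assumes the work splits cleanly; keeping the machines reliable and combining their results are exactly the issues raised next.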


  • How do we handle data this big?

    Issues

    How to handle machines going up and down (failures)?

    How to combine the data from all the systems?


  • Problem 1: Systems going up and down

    Commodity hardware is used for data storage and analysis

    The chance of some machine failing is very high

    Solution: replicate data across several machines

    GFS (Google File System)

    Divides data into chunks and stores them across the machines of the cluster

    Can store petabytes of data
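
The chunk-and-replicate idea can be illustrated with a small Python sketch. The tiny chunk size, the machine names, the replication factor of 3 and the round-robin placement rule are assumptions chosen for readability; GFS and HDFS use their own placement policies, and every function here is hypothetical.

```python
from typing import Dict, List, Set

def split_into_chunks(data: bytes, chunk_size: int) -> List[bytes]:
    """Divide a file's contents into fixed-size chunks (the last may be shorter)."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]

def place_replicas(num_chunks: int, machines: List[str],
                   replication: int = 3) -> Dict[int, List[str]]:
    """Hypothetical round-robin policy: put each chunk on `replication` distinct machines."""
    if replication > len(machines):
        raise ValueError("need at least as many machines as replicas")
    placement = {}
    for chunk_id in range(num_chunks):
        start = chunk_id % len(machines)
        # Consecutive machines from `start`, wrapping around, so all replicas differ.
        placement[chunk_id] = [machines[(start + k) % len(machines)] for k in range(replication)]
    return placement

def all_chunks_readable(placement: Dict[int, List[str]], failed: Set[str]) -> bool:
    """A chunk survives as long as at least one replica sits on a live machine."""
    return all(any(m not in failed for m in replicas) for replicas in placement.values())

data = b"x" * 10
chunks = split_into_chunks(data, 4)  # chunk sizes 4, 4, 2
machines = ["m1", "m2", "m3", "m4"]
placement = place_replicas(len(chunks), machines)
print(placement)
print(all_chunks_readable(placement, failed={"m1", "m2"}))  # two machines down, data still readable
```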


  • Problem 2: How to combine the data?

    Data has to be analyzed across different machines

    Merging the partial results means data has to travel across the network

    Doing this is notoriously challenging

    Again, Google's answer: MapReduce


  • Map Reduce

    Provides a programming model

    Abstracts away disk reads and writes

    Expresses the computation in terms of (key, value) pairs

    Two phases: Map and Reduce
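
The classic word-count example shows how a computation is expressed as (key, value) pairs in two phases. This is a single-process Python sketch of the programming model only, with hypothetical function names; in Hadoop the framework runs many mappers and reducers on different machines and performs the grouping (shuffle) step between the phases itself.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def map_phase(line: str) -> List[Tuple[str, int]]:
    """Map (hypothetical): turn one input line into (word, 1) pairs."""
    return [(word, 1) for word in line.lower().split()]

def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework between the phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce (hypothetical): combine all counts for one word."""
    return key, sum(values)

lines = ["big data big ideas", "Big clusters"]
intermediate = [pair for line in lines for pair in map_phase(line)]
counts = dict(reduce_phase(k, v) for k, v in shuffle(intermediate).items())
print(counts)  # {'big': 3, 'data': 1, 'ideas': 1, 'clusters': 1}
```

Because the map function looks at one line at a time and the reduce function at one key at a time, both can be run on many machines without the programmer writing any distribution code.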



  • Hadoop

    A reliable shared storage system (HDFS)

    An analysis system (MapReduce)


  • History

    Google was the first to build GFS and MapReduce

    Published papers on GFS (2003) and MapReduce (2004)

    A brand new technology

    Already well proven inside Google by 2004


  • History

    Doug Cutting created an open-source version of the MapReduce system, called Hadoop

    Yahoo and others rallied around to support this effort

    Hadoop is now a core part of the infrastructure at Facebook, Yahoo, LinkedIn and Twitter


  • Core Concepts

    HDFS

    Map Reduce



  • HDFS: Hadoop Distributed File System

    Designed for storing very large files, with streaming data access, on a commodity cluster

    1 Very large files: MBs to PBs

    2 Streaming data access

    Write-once, read-many approach

    Files are not modified in place once written

    The time to read the whole data set matters more than the latency of reading the first record

    3 Commodity cluster

    No high-end servers

    Yes, a high chance of failure (but HDFS is fault-tolerant)

    Data is replicated across nodes
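
The "very large files" and "replication" points translate into simple arithmetic. This sketch assumes a 128 MB block size and a replication factor of 3, which are common defaults in Hadoop 2 and later (older releases used 64 MB blocks); `hdfs_footprint` is a hypothetical helper, not an HDFS API.

```python
import math
from typing import Tuple

MB = 1024 * 1024

def hdfs_footprint(file_size: int, block_size: int = 128 * MB,
                   replication: int = 3) -> Tuple[int, int]:
    """Return (number of blocks, raw bytes stored in the cluster) for one file.

    Assumed defaults: 128 MB blocks, 3 replicas of every block.
    """
    num_blocks = math.ceil(file_size / block_size)
    # A final, partially filled block only uses as much disk as it holds,
    # so raw usage is simply the file size times the number of replicas.
    raw_bytes = file_size * replication
    return num_blocks, raw_bytes

blocks, raw = hdfs_footprint(1000 * MB)  # a 1000 MB file
print(blocks, raw // MB)                 # 8 3000
```

Large blocks keep the number of blocks (and hence the metadata the master must track) small, while the replication factor is the price paid in raw disk for tolerating failures of commodity machines.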


  • HDFS: Hadoop Distributed File System (continued)

    Services

    Masters: Name Node, Secondary Name Node, Job Tracker

    Slaves: Data Node, Task Tracker


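  • HDFS: Hadoop Distributed File System (continued)

    Name Node

    The master node

    Maintains the file system namespace

    Holds the metadata (which blocks make up each file and where they are stored)

    Secondary Name Node

    Periodically merges the edit log into the fsimage file (checkpointing)

    Data Node

    Slave nodes

    Provide the actual storage for data blocks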
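
To separate the Name Node's and Secondary Name Node's roles, here is a toy model. It assumes, for illustration only, that the namespace is a dictionary from file path to block IDs and that changes since the last checkpoint are kept in an edit log; checkpointing then replays that log into a new fsimage. `NameNode` and `checkpoint` are hypothetical names; real HDFS keeps the fsimage and edit log as files in its own formats.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Edit = Tuple[str, str, List[int]]  # (operation, path, block IDs)

@dataclass
class NameNode:
    """Toy Name Node (hypothetical): namespace snapshot plus edit log."""
    fsimage: Dict[str, List[int]] = field(default_factory=dict)  # path -> block IDs
    edits: List[Edit] = field(default_factory=list)              # changes since last checkpoint

    # Toy simplification: changes are only logged here; a real Name Node also
    # applies them to its in-memory namespace right away.
    def create(self, path: str, blocks: List[int]) -> None:
        self.edits.append(("create", path, blocks))

    def delete(self, path: str) -> None:
        self.edits.append(("delete", path, []))

def checkpoint(fsimage: Dict[str, List[int]], edits: List[Edit]) -> Dict[str, List[int]]:
    """Secondary Name Node's periodic job: replay the edit log onto a copy of fsimage."""
    merged = {path: list(blocks) for path, blocks in fsimage.items()}
    for op, path, blocks in edits:
        if op == "create":
            merged[path] = list(blocks)
        elif op == "delete":
            merged.pop(path, None)
    return merged

nn = NameNode()
nn.create("/logs/a.txt", [1, 2])
nn.create("/logs/b.txt", [3])
nn.delete("/logs/a.txt")
nn.fsimage = checkpoint(nn.fsimage, nn.edits)  # the merged image replaces the old one
nn.edits.clear()                               # and the edit log starts afresh
print(nn.fsimage)  # {'/logs/b.txt': [3]}
```

Without periodic checkpoints the edit log would keep growing, making a Name Node restart slower; this is why the merge is offloaded to a separate node.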


  • HDFS Architecture

    [Figure: HDFS architecture diagram]


  • Map Reduce

    Large-scale data processing in parallel

    It provides:

    Automatic parallelization and distribution

    Fault tolerance

    Two phases in Map Reduce: Map and Reduce
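
Automatic parallelization means the programmer writes a function that handles one piece of input and the framework runs many copies of it at once. As a rough single-machine analogy (an assumption, not how Hadoop schedules work), Python threads can run a hypothetical `map_task` over several made-up input splits; Hadoop instead distributes such tasks across the machines of the cluster.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List

def map_task(split: List[str]) -> int:
    """Hypothetical map task: count the words in one input split."""
    return sum(len(line.split()) for line in split)

# Made-up input, already divided into three splits.
splits = [["a b c", "d e"], ["f g"], ["h", "i j k l"]]

# The executor, not map_task, decides which worker handles which split;
# pool.map returns results in the same order as the input splits.
with ThreadPoolExecutor(max_workers=3) as pool:
    partial_counts = list(pool.map(map_task, splits))

total = sum(partial_counts)  # combining partial results plays the role of a reduce
print(partial_counts, total)  # [5, 2, 5] 12
```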


  • Map Reduce

    Job Tracker

    Master

    Manages the jobs in the cluster

    Task Tracker

    Slaves

    Run the map and reduce tasks assigned by the Job Tracker
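
Fault tolerance at the task level can be pictured as the Job Tracker re-running a failed task on another Task Tracker. This toy sketch assumes a simple round-robin retry policy and uses hypothetical names (`run_job`, tracker IDs such as `tt1`); Hadoop's actual scheduler is considerably more involved (for example, it tries to run map tasks close to their data).

```python
from typing import Callable, Dict, List

def run_job(tasks: List[str], trackers: List[str],
            execute: Callable[[str, str], bool]) -> Dict[str, str]:
    """Toy Job Tracker (hypothetical): give each task to a Task Tracker and,
    if that attempt fails, retry it on the next tracker in round-robin order.
    Returns a mapping task -> tracker that completed it."""
    completed: Dict[str, str] = {}
    for i, task in enumerate(tasks):
        for attempt in range(len(trackers)):
            tracker = trackers[(i + attempt) % len(trackers)]
            if execute(task, tracker):
                completed[task] = tracker
                break
        else:  # no tracker succeeded
            raise RuntimeError(f"task {task} failed on every tracker")
    return completed

dead = {"tt2"}  # hypothetical: this Task Tracker has crashed
result = run_job(["map-0", "map-1", "map-2"], ["tt1", "tt2", "tt3"],
                 execute=lambda task, tracker: tracker not in dead)
print(result)  # {'map-0': 'tt1', 'map-1': 'tt3', 'map-2': 'tt3'}
```

The job still completes because "map-1" is simply re-executed elsewhere; this works since map and reduce tasks are designed to be safely re-runnable.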


  • Map Reduce

    [Figure: Map Reduce]

  • Map Reduce

    Map Phase

    map(inKey, inValue) → list(outKey, intermediateValue)

    Processes an input key/value pair

    Produces a set of intermediate key/value pairs

    Reduce Phase

    reduce(outKey, list(intermediateValue)) → list(outValue)

    Combines all intermediate values for a particular key

    Produces a set of merged output values (usually just one)
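
The two signatures can be exercised with a small, made-up example: finding the maximum temperature per year from "year,temperature" records. `map_fn` and `reduce_fn` are hypothetical names that follow the signatures above. Hadoop's default text input supplies the byte offset of each line as the input key; for simplicity this sketch uses the line index instead, and sorting plus `itertools.groupby` stands in for the framework's shuffle.

```python
from itertools import groupby
from operator import itemgetter
from typing import Iterable, List, Tuple

# map(inKey, inValue) -> list(outKey, intermediateValue)
def map_fn(in_key: int, in_value: str) -> List[Tuple[str, int]]:
    """in_key: line index (unused here); in_value: a 'year,temperature' record."""
    year, temp = in_value.split(",")
    return [(year, int(temp))]

# reduce(outKey, list(intermediateValue)) -> list(outValue)
def reduce_fn(out_key: str, values: Iterable[int]) -> List[int]:
    """Merge all readings for one year into a single output value: the maximum."""
    return [max(values)]

records = ["1950,22", "1949,31", "1950,-11", "1949,28"]  # made-up input

intermediate = [pair for idx, line in enumerate(records) for pair in map_fn(idx, line)]
intermediate.sort(key=itemgetter(0))  # bring equal keys together, as the shuffle would
result = {year: reduce_fn(year, (v for _, v in group))
          for year, group in groupby(intermediate, key=itemgetter(0))}
print(result)  # {'1949': [31], '1950': [22]}
```

Here each reduce call returns exactly one value, matching the "usually just one" case on the slide.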




  • Happy Hadooping! :)

