Intro hadoop ecosystem components, hadoop ecosystem tools

Transcript
  • 8/17/2019 Intro hadoop ecosystem components, hadoop ecosystem tools


Introduction to Hadoop: MapReduce and HDFS

    Big Data Applications


Hadoop History and General Information

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.

All the modules in Hadoop are designed with a fundamental assumption that hardware failures are common and should be automatically handled by the framework.


    Hadoop Main Components

Hadoop consists of MapReduce, the Hadoop Distributed File System (HDFS) and a number of related projects such as Apache Hive, HBase and ZooKeeper.

MapReduce and the Hadoop Distributed File System (HDFS) are the main components of Hadoop.


    Hadoop Architecture

The Hadoop framework includes the following four modules:

Hadoop Common: These are Java libraries and utilities required by other Hadoop modules. These libraries provide filesystem and OS level abstractions and contain the necessary Java files and scripts required to start Hadoop.

Hadoop YARN: This is a framework for job scheduling and cluster resource management.

Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.

Hadoop MapReduce: This is a YARN-based system for parallel processing of large data sets.


    Hadoop Cluster

Normally, any set of loosely or tightly connected computers that work together as a single system is called a cluster. In simple words, a computer cluster used for Hadoop is called a Hadoop Cluster.

A Hadoop cluster is a special type of computational cluster designed for storing and analyzing vast amounts of unstructured data in a distributed computing environment.

These clusters run on low-cost commodity computers.

Hadoop clusters are often referred to as "shared nothing" systems because the only thing that is shared between nodes is the network that connects them.

Large Hadoop clusters are arranged in several racks. Network traffic between different nodes in the same rack is much more desirable than network traffic across the racks.


    Core Components of Hadoop Cluster

A Hadoop cluster has three components:

    Client

    Master 

    Slave


    Task Tracker

1. Each Task Tracker is responsible to execute and manage the individual tasks assigned by the Job Tracker.

2. Task Tracker also handles the data motion between the map and reduce phases.

3. One prime responsibility of Task Tracker is to constantly communicate with the Job Tracker to signal the status of its tasks.

4. If the JobTracker fails to receive a heartbeat from a TaskTracker within a specified amount of time, it will assume the TaskTracker has crashed and will resubmit the corresponding tasks to other nodes in the cluster.
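    The heartbeat-and-resubmit behaviour described above can be sketched in a few lines. This is an illustrative toy, not Hadoop's actual implementation; the class and method names (`HeartbeatMonitor`, `tasks_to_resubmit`) are hypothetical.

    ```python
    import time

    # Toy sketch of the JobTracker-side heartbeat check: if a TaskTracker has
    # not reported within `timeout` seconds, its tasks are collected so the
    # scheduler can resubmit them to other nodes.
    class HeartbeatMonitor:
        def __init__(self, timeout):
            self.timeout = timeout
            self.last_seen = {}   # tracker name -> last heartbeat timestamp
            self.tasks = {}       # tracker name -> list of assigned task ids

        def heartbeat(self, tracker, now=None):
            self.last_seen[tracker] = now if now is not None else time.time()

        def assign(self, tracker, task):
            self.tasks.setdefault(tracker, []).append(task)

        def failed_trackers(self, now=None):
            now = now if now is not None else time.time()
            return [t for t, seen in self.last_seen.items()
                    if now - seen > self.timeout]

        def tasks_to_resubmit(self, now=None):
            # Drain the task lists of every tracker presumed crashed.
            resubmit = []
            for t in self.failed_trackers(now):
                resubmit.extend(self.tasks.pop(t, []))
            return resubmit
    ```

    For example, a tracker that heartbeats at t=0 and then goes silent past the timeout has its tasks handed back for rescheduling.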


    Hadoop Heart - MapReduce

MapReduce is a programming model which is used to process large data sets in a parallel, distributed manner.

    A MapReduce program is composed of

a Map() procedure that performs filtering and sorting (such as sorting students by first name into queues, one queue for each name),

and a Reduce() procedure that performs a summary operation (such as counting the number of students in each queue, yielding name frequencies).
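    The student-counting example above can be shown end to end. This is a toy illustration of the model, not the Hadoop API: map emits (name, 1) pairs, a shuffle step groups them by key, and reduce counts each queue, yielding name frequencies.

    ```python
    from collections import defaultdict

    def map_phase(students):
        # Map: emit one (key, value) pair per input record.
        for name in students:
            yield (name, 1)

    def shuffle(pairs):
        # Shuffle: group all values by key (one "queue" per name).
        groups = defaultdict(list)
        for key, value in pairs:
            groups[key].append(value)
        return groups

    def reduce_phase(groups):
        # Reduce: summarize each queue, here by counting its entries.
        return {name: sum(counts) for name, counts in groups.items()}

    students = ["Ann", "Bob", "Ann", "Cy", "Bob", "Ann"]
    frequencies = reduce_phase(shuffle(map_phase(students)))
    # frequencies == {"Ann": 3, "Bob": 2, "Cy": 1}
    ```

    In Hadoop the shuffle between map and reduce is done by the framework itself; only the map and reduce functions are supplied by the programmer.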


    Facts about MapReduce

Apache Hadoop Map-Reduce is an open source implementation of Google's MapReduce framework.

Although there are many map-reduce implementations, such as Dryad from Microsoft and Disco from Nokia, which have been developed for distributed systems, Hadoop is the most popular among them, offering an open source implementation of Map-Reduce.

    Hadoop Map-Reduce framework works on Master/Slave architecture.


    MapReduce Architecture

    Hadoop MapReduce is composed of two components

Job Tracker, playing the role of master, runs on the master node (NameNode).

Task Tracker, playing the role of slave, runs on each data node, one per node.


    Job Tracker

Job Tracker is the one to which client applications submit MapReduce programs (jobs).

Job Tracker schedules clients' jobs and allocates tasks to the slave task trackers running on individual worker machines (data nodes).

Job Tracker manages the overall execution of a Map-Reduce job.

Job Tracker manages the resources of the cluster:

It manages the data nodes, i.e. the task trackers.

It keeps track of consumed and available resources.

It keeps track of already running tasks, providing fault tolerance for task execution.
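    The allocation step in the list above can be sketched as a simple slot-based scheduler. This is a hypothetical illustration (the function `allocate` and its arguments are invented for this sketch); real Hadoop also weighs data locality and rack placement when handing out tasks.

    ```python
    # Toy model: each TaskTracker advertises a number of free task slots, and
    # the JobTracker hands pending tasks to trackers with spare capacity.
    def allocate(pending_tasks, free_slots):
        """free_slots: dict mapping tracker name -> number of free slots."""
        assignments = {tracker: [] for tracker in free_slots}
        queue = list(pending_tasks)
        for tracker, slots in free_slots.items():
            while slots > 0 and queue:
                assignments[tracker].append(queue.pop(0))
                slots -= 1
        # Tasks left in the queue wait for the next heartbeat round.
        return assignments, queue
    ```

    For example, three pending map tasks spread over two trackers with two free slots each leave no task waiting.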


    Hadoop HDFS

Hadoop File System was developed using distributed file system design and runs on commodity hardware. Unlike other distributed systems, HDFS is highly fault-tolerant and designed using low-cost hardware.

HDFS holds very large amounts of data and provides easier access. To store such huge data, the files are stored across multiple machines. These files are stored in a redundant fashion to rescue the system from possible data losses in case of failure. HDFS also makes applications available to parallel processing.
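    The redundant-storage idea can be sketched as splitting a file into blocks and copying each block onto several data nodes, so losing one node loses no block. The names and sizes below are purely illustrative; real HDFS defaults to much larger blocks (128 MB) and a replication factor of 3, and chooses nodes rack-aware.

    ```python
    def split_into_blocks(data, block_size):
        # A file is stored as a sequence of fixed-size blocks.
        return [data[i:i + block_size] for i in range(0, len(data), block_size)]

    def place_replicas(num_blocks, nodes, replication=3):
        # Toy round-robin placement: each block gets `replication` copies
        # on distinct data nodes.
        placement = {}
        for b in range(num_blocks):
            placement[b] = [nodes[(b + r) % len(nodes)]
                            for r in range(replication)]
        return placement

    blocks = split_into_blocks(b"0123456789", block_size=4)   # 3 blocks
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    # Every block lives on 3 distinct nodes, surviving any single failure.
    for replicas in placement.values():
        assert len(set(replicas)) == 3
    ```

    With three replicas per block, any single data node can fail and every block still has two live copies elsewhere.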


    Features of HDFS

It is suitable for distributed storage and processing.

    Hadoop provides a command interface to interact with HDFS.

The built-in servers of NameNode and DataNode help users to easily check the status of the cluster.

Streaming access to file system data.

HDFS provides file permissions and authentication.


    Hadoop - Big Data Solutions

In this approach, an enterprise will have a computer to store and process big data. Here, data will be stored in an RDBMS like Oracle Database or MS SQL Server, and sophisticated software can be written to interact with the database, process the required data and present it to the users for analysis purposes.


    Thank you!

    REBECCA THO, HADOOP DEVELOPER AT KYVOS INSIGHTS

    HTTP://WWW.KYVOSINSIGHTS.COM