Hadoop course online


  • Hadoop Course Online


  • The Contents

    About Hadoop

    Types of data that come under big data

    Benefits of big data

    Solution to process big data

  • The Contents

    Hadoop architecture

    MapReduce

    Hadoop Distributed File System

    Working with Hadoop

  • The Contents

    Advantages of Hadoop

    Hadoop Environment setup

    Overview of HDFS

    Features of HDFS

  • The Contents

    Architecture of HDFS

    Operations of HDFS

    Hadoop MapReduce

    Big Data And Hadoop For Beginners

  • About Hadoop

    Hadoop is an open-source software

    framework which allows the user to store

    and process large amounts of data.

    Hadoop runs on computer clusters which

    are built from commodity hardware.

    The Hadoop framework is developed by the

    Apache Software Foundation, and version

    1.0 was released in December 2011.

    The storage part is referred to as the

    Hadoop Distributed File System (HDFS), and

    the processing part is performed using the

    MapReduce programming model.

  • Types of data that come under big data

    Big data includes data generated by

    various applications and devices.

    Different types of data which come

    under the category of big data include:

    black box data,

    social media data,

    power grid data,

    search engine data,

    transport data,

    stock exchange data.

  • Benefits of big data

    In hospitals, big data plays a vital role:

    data analysts store patients' medical

    histories using big data platforms.

    This helps doctors provide quicker

    service to patients.

    The main challenges of big data are

    data capture, storage, curation,

    search, transfer, sharing, analysis,

    and presentation.

  • Solution to process big data

    If we have a small amount of data, we

    can store and process it using the

    traditional approach.

    In this approach, the data is typically

    stored in an RDBMS such as MS SQL Server,

    Oracle Database, etc.

    When dealing with huge amounts of data,

    however, storing and processing it with a

    traditional database is no longer feasible.

  • Hadoop architecture

    The Hadoop framework has four modules:

    Hadoop YARN, Hadoop Common,

    Hadoop MapReduce, and the Hadoop

    Distributed File System (HDFS).

  • MapReduce

    Hadoop MapReduce is a software

    framework used to write applications

    that process vast amounts of data

    across thousands of nodes.

  • Hadoop Distributed File System


    HDFS is a distributed file system that

    runs on large clusters of small commodity

    machines in a fault-tolerant and reliable

    manner. This distributed system is based

    on the Google File System (GFS).

  • Working with Hadoop

    The application or user submits a job to the Hadoop client. The client then sends the job and its configuration to the job tracker, which is responsible for splitting the work and distributing the configuration to the slaves. The job tracker also schedules the tasks and monitors them, so that it can report status back to the job client. The task trackers on the different nodes then execute the job using the MapReduce algorithm, and finally the output files are stored in the file system.
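    From the client side, the flow described above can be sketched with the classic job-tracker-era commands below (the jar, class, and directory names are hypothetical):

    ```shell
    # Submit a job jar to the cluster: the client passes the job and its
    # configuration to the job tracker, which splits the work among the
    # task trackers on the slave nodes.
    hadoop jar myjob.jar com.example.MyJob /user/input /user/output

    # Ask the job tracker for the list and status of the jobs it runs.
    hadoop job -list
    ```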

  • Advantages of Hadoop

    Since Hadoop is an open-source framework,

    it is compatible with all platforms.

    We can add or remove servers

    dynamically; this process doesn't interrupt

    Hadoop in any way.

    Users can write and test distributed

    systems quickly using the Hadoop framework.

  • Hadoop Environment setup

    The Hadoop framework is supported on the

    Linux operating system, so Linux users can

    easily set up the Hadoop environment on

    their computers. Before installing

    Hadoop, users have to set up SSH

    (Secure Shell) access on Linux. Users

    whose OS is not Linux can instead

    install software such as VirtualBox

    and run a Linux OS inside it.
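    As a rough sketch, the SSH and environment prerequisites described above can be prepared on a Linux machine like this (the installation path is illustrative, not prescriptive):

    ```shell
    # Generate an SSH key pair and authorize it for passwordless login;
    # Hadoop's start/stop scripts use SSH to reach the cluster nodes.
    ssh-keygen -t rsa -P "" -f ~/.ssh/id_rsa
    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    chmod 0600 ~/.ssh/authorized_keys

    # Hadoop also requires a Java runtime; verify it is installed.
    java -version

    # Point the shell at the Hadoop installation (illustrative path).
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    ```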

  • Overview of HDFS

    HDFS stores huge amounts of data and

    provides easy access to it. This distributed

    file system is fault tolerant, and it is

    designed to run on low-cost hardware.

  • Features of HDFS

    It provides file permissions and

    authentication.

    Interaction with HDFS is possible through a

    common command interface which is provided

    by Hadoop.

    HDFS is perfectly suitable for distributed

    storage and processing purposes.

  • Architecture of HDFS

    The Hadoop Distributed File System follows a

    master-slave architecture with the

    following components: the namenode, the

    datanodes, and blocks.

    Goals of HDFS:

    Process large datasets efficiently

    Fault detection and recovery

    Hardware at data (move computation close

    to where the data resides)

  • Operations of HDFS

    First, users have to format the

    configured HDFS file system and start the

    distributed file system.

    They can then list the files stored in

    HDFS to see what is on the server.

    After that, users can insert data into the

    Hadoop Distributed File System.

    Retrieve data from HDFS.

    Finally, shut down HDFS.
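    The steps above map roughly onto the HDFS shell commands below (directory and file names are illustrative):

    ```shell
    # Format the configured HDFS file system (first run only).
    hdfs namenode -format

    # Start the distributed file system.
    start-dfs.sh

    # List the files stored in HDFS.
    hdfs dfs -ls /

    # Insert data: create a directory and copy a local file into it.
    hdfs dfs -mkdir -p /user/input
    hdfs dfs -put localfile.txt /user/input/

    # Retrieve data from HDFS back to the local file system.
    hdfs dfs -get /user/input/localfile.txt ./localfile-copy.txt

    # Finally, shut down HDFS.
    stop-dfs.sh
    ```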


  • Hadoop MapReduce

    MapReduce is a processing technique used to handle enormous amounts of data. The algorithm performs two tasks to process the data completely: map and reduce. The map task converts one set of data into another set of data.

  • Hadoop MapReduce

    The individual elements are broken down into tuples (key/value pairs). The output of the map task is then taken as the input by the reduce task, which combines the data tuples into a smaller set of tuples. A MapReduce program executes in three stages: the map stage, the shuffle stage, and the reduce stage.
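    As a concrete illustration of the three stages, the word-count example bundled with Hadoop can be run as follows (the jar path varies by version and is only indicative):

    ```shell
    # Put some input text into HDFS (illustrative paths).
    hdfs dfs -put input.txt /user/input/

    # Run the bundled word-count job: the map stage emits (word, 1)
    # tuples, the shuffle stage groups them by word, and the reduce
    # stage sums the counts for each word.
    hadoop jar $HADOOP_HOME/share/hadoop/mapreduce/hadoop-mapreduce-examples-*.jar \
        wordcount /user/input /user/output

    # Inspect the reduced output.
    hdfs dfs -cat /user/output/part-r-00000
    ```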

  • Big Data And Hadoop For Beginners

    Beginners, students, managers, and

    developers can take this course if they are

    interested in learning Big Data. This 3-hour

    Hadoop course online has six sections with

    one article and six supplemental resources.

    The main goal of this course is to help you

    understand the Hadoop components and

    their complex architectures.

  • Hadoop Course Online 2017

    Hadoop Tutorial

    Learn Big Data

    Learn Hadoop And MapReduce For

    Big Data Problems

    Big Data And Hadoop For Beginners

    Hadoop Course Online



