Learn to set up a Hadoop Multi Node Cluster

www.edureka.co/hadoop-admin


What will you learn today?

Let us have a quick poll: do you know the following topics?

• Hadoop Components and Configurations
• Modes of a Hadoop Cluster
• Hadoop Multi Node Cluster
• Setting up a Cluster (Hands-On)

Hadoop Components and Configurations

Hadoop 2.x Core Components

• HDFS (Storage): NameNode (Master), SecondaryNameNode, DataNode (Slave)
• YARN (Processing): Resource Manager (Master), Node Manager (Slave)

Hadoop 2.x Core Components

HDFS Components
• NameNode:
  » Master of the system
  » Maintains the file system namespace and manages the blocks that are stored on the DataNodes
• DataNodes:
  » Slaves that are deployed on each machine and provide the actual storage
  » Responsible for serving read and write requests from clients

MapReduce Components (YARN)
• Client:
  » Submits a MapReduce job
• Resource Manager:
  » Cluster-level resource manager
  » Long-lived; should run on high-quality hardware
• Node Manager:
  » One per DataNode
  » Monitors resources on its DataNode

Hadoop Cluster: A Typical Use Case

Role                  RAM     Hard disk   Processor       Ethernet      OS             Power
Active NameNode       64 GB   1 TB        Xeon, 8 cores   3 x 10 Gb/s   64-bit CentOS  Redundant power supply
Standby NameNode      64 GB   1 TB        Xeon, 8 cores   3 x 10 Gb/s   64-bit CentOS  Redundant power supply
Secondary NameNode    32 GB   1 TB        Xeon, 4 cores   3 x 10 Gb/s   64-bit CentOS  Redundant power supply
DataNode (x2)         16 GB   6 x 2 TB    Xeon, 2 cores   3 x 10 Gb/s   64-bit CentOS  Not specified
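The DataNode disks decide how much data this cluster can hold. Each replica of an HDFS block is stored on a different DataNode, so usable capacity is roughly the raw disk capacity divided by the replication factor. With the two DataNodes in this use case, raw capacity is 2 x 6 x 2 TB = 24 TB. The HDFS default replication factor of 3 cannot be met with only two DataNodes, so the worked example below uses 2. This is a back-of-the-envelope sketch, not a sizing tool. The function name usable_hdfs_capacity_tb is hypothetical, and the sketch ignores reserved non-HDFS space and intermediate MapReduce output.

```python
def usable_hdfs_capacity_tb(datanodes: int, disks_per_node: int,
                            disk_tb: float, replication: int) -> float:
    """Rough upper bound on how much data HDFS can hold, in TB.

    Hypothetical helper. It ignores space reserved for non-HDFS use
    (dfs.datanode.du.reserved), intermediate MapReduce output and
    filesystem overhead, so the real figure is lower.
    """
    if replication < 1:
        raise ValueError("replication must be at least 1")
    if replication > datanodes:
        # Each replica of a block is stored on a different DataNode.
        raise ValueError("replication factor exceeds the number of DataNodes")
    raw_tb = datanodes * disks_per_node * disk_tb
    return raw_tb / replication


# The two DataNodes in the table above, each with 6 x 2 TB disks.
print(usable_hdfs_capacity_tb(datanodes=2, disks_per_node=6, disk_tb=2, replication=2))  # 12.0
```

Adding a third DataNode would allow replication 3. The usable figure would then stay at about 36 TB / 3 = 12 TB, in exchange for surviving two node failures instead of one.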

Hadoop 2.x Configuration Files

Configuration File   Description
hadoop-env.sh        Environment variables that are used in the scripts to run Hadoop.
core-site.xml        Configuration settings for Hadoop Core, such as I/O settings that are common to HDFS and MapReduce.
hdfs-site.xml        Configuration settings for the HDFS daemons: the NameNode, the Secondary NameNode and the DataNodes.
mapred-site.xml      Configuration settings for MapReduce applications.
yarn-site.xml        Configuration settings for the ResourceManager and NodeManager.
masters              A list of machines (one per line) that each run a Secondary NameNode.
slaves               A list of machines (one per line) that each run a DataNode and a NodeManager.

Hadoop 2.x Configuration Files – Apache Hadoop

• Core: core-site.xml
• HDFS: hdfs-site.xml
• YARN: yarn-site.xml
• MapReduce: mapred-site.xml

core-site.xml

-------------------------------------------------core-site.xml-----------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- core-site.xml -->
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

------------------------------------------------core-site.xml-----------------------------------------------------

fs.defaultFS: the name of the default file system. The URI's authority is used to determine the host, port, etc. for a filesystem.
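The value hdfs://localhost:9000 only works when every daemon runs on one machine. In a multi-node cluster, fs.defaultFS on every node has to name the NameNode host instead. The minimal sketch below renders a Hadoop *-site.xml file from a Python dict, so the same file can be generated once and copied to all nodes. The helper name hadoop_site_xml and the hostname master are hypothetical, and port 9000 is simply reused from the slide. The sketch needs Python 3.9+ for ET.indent.

```python
import xml.etree.ElementTree as ET


def hadoop_site_xml(properties: dict[str, str]) -> str:
    """Render a Hadoop *-site.xml document from name -> value pairs (hypothetical helper)."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value  # ElementTree escapes &, <, >
    ET.indent(root, space="  ")  # pretty-print; needs Python 3.9+
    header = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>\n'
    )
    return header + ET.tostring(root, encoding="unicode") + "\n"


# "master" is a hypothetical hostname for the NameNode machine. Every node in a
# multi-node cluster gets the same file, so none of them should say localhost.
print(hadoop_site_xml({"fs.defaultFS": "hdfs://master:9000"}))
```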

hdfs-site.xml

---------------------------------------------------------hdfs-site.xml-------------------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- hdfs-site.xml -->
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/edureka/hadoop-2.2.0/hadoop2_data/hdfs/datanode</value>
  </property>
</configuration>

---------------------------------------------------------hdfs-site.xml-------------------------------------------------------------

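• dfs.replication: Default block replication. The actual number of replications can be specified when the file is created; the default is used if replication is not specified at create time. The value 1 shown here keeps a single copy of each block, which suits only a single-node setup (the HDFS default is 3).
• dfs.permissions: If "true", enable permission checking in HDFS. If "false", permission checking is turned off. (In Hadoop 2.x the current name of this key is dfs.permissions.enabled; dfs.permissions is still accepted as a deprecated alias.)
• dfs.namenode.name.dir: Determines where on the local filesystem the DFS name node should store the name table (fsimage).
• dfs.datanode.data.dir: Determines where on the local filesystem a DFS data node should store its blocks.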
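When this single-node hdfs-site.xml is reused on a multi-node cluster, dfs.replication is the setting most likely to be wrong. The value must not exceed the number of DataNodes, and a value of 1 gives no protection against losing a node. The sketch below reads the properties out of a *-site.xml file and flags both cases. Both function names are hypothetical, and the DataNode count is assumed to come from the slaves file.

```python
import xml.etree.ElementTree as ET


def read_site_properties(xml_text: str) -> dict[str, str]:
    """Return the name -> value pairs found in a Hadoop *-site.xml document."""
    # Encode first so an XML declaration with an encoding attribute parses cleanly.
    root = ET.fromstring(xml_text.encode("utf-8"))
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name is not None:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def check_replication(props: dict[str, str], datanode_count: int) -> list[str]:
    """Return warnings about dfs.replication for a cluster with this many DataNodes."""
    # 3 is the HDFS default when dfs.replication is not set.
    replication = int(props.get("dfs.replication", "3"))
    warnings = []
    if replication > datanode_count:
        warnings.append(
            f"dfs.replication={replication} exceeds {datanode_count} DataNode(s); "
            "blocks will stay under-replicated"
        )
    if replication == 1 and datanode_count > 1:
        warnings.append("dfs.replication=1 keeps one copy of each block; "
                        "losing a DataNode loses the blocks stored on it")
    return warnings


sample = """<?xml version="1.0" encoding="UTF-8"?>
<configuration>
  <property><name>dfs.replication</name><value>1</value></property>
</configuration>"""
for warning in check_replication(read_site_properties(sample), datanode_count=2):
    print(warning)
```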

mapred-site.xml

-----------------------------------------------mapred-site.xml---------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- mapred-site.xml -->
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

-----------------------------------------------mapred-site.xml---------------------------------------------------

mapreduce.framework.name: the runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn.

yarn-site.xml

-----------------------------------------------yarn-site.xml---------------------------------------------------

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- yarn-site.xml -->
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <!-- The class property is named after the service: aux-services.<service-name>.class -->
    <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
</configuration>

-----------------------------------------------yarn-site.xml---------------------------------------------------

• yarn.nodemanager.aux-services: the auxiliary service name.
• yarn.nodemanager.aux-services.mapreduce_shuffle.class: the auxiliary service class to use.
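mapred-site.xml and yarn-site.xml have to agree. When mapreduce.framework.name is yarn, the NodeManagers need the mapreduce_shuffle auxiliary service. Our understanding of Hadoop 2.2+ is that the service's class is read from a key built from the service name, yarn.nodemanager.aux-services.<service-name>.class. A key spelled differently, such as one with a dot where the service name has an underscore, is therefore not applied to that service. The sketch below checks both points on two already-parsed property dicts. The function name is hypothetical, and the check is a convenience, not an exhaustive validator.

```python
AUX_SERVICES = "yarn.nodemanager.aux-services"
AUX_PREFIX = AUX_SERVICES + "."


def check_mapreduce_on_yarn(mapred: dict[str, str], yarn: dict[str, str]) -> list[str]:
    """Return problems found in the MapReduce-on-YARN shuffle settings (hypothetical helper)."""
    problems = []
    # mapred-default.xml ships "local" as the default framework.
    framework = mapred.get("mapreduce.framework.name", "local")
    if framework not in ("local", "classic", "yarn"):
        problems.append(f"unknown mapreduce.framework.name: {framework!r}")
    if framework != "yarn":
        return problems

    # aux-services is a comma-separated list of service names.
    services = [s.strip() for s in yarn.get(AUX_SERVICES, "").split(",") if s.strip()]
    if "mapreduce_shuffle" not in services:
        problems.append("yarn.nodemanager.aux-services should include mapreduce_shuffle")

    # Class keys look like aux-services.<service>.class; flag ones naming no configured service.
    for key in yarn:
        if key.startswith(AUX_PREFIX) and key.endswith(".class"):
            service = key[len(AUX_PREFIX):-len(".class")]
            if service not in services:
                problems.append(f"{key} does not match any configured aux service")
    return problems


mapred = {"mapreduce.framework.name": "yarn"}
yarn = {
    "yarn.nodemanager.aux-services": "mapreduce_shuffle",
    "yarn.nodemanager.aux-services.mapreduce.shuffle.class": "org.apache.hadoop.mapred.ShuffleHandler",
}
print(check_mapreduce_on_yarn(mapred, yarn))
```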

Per-Process RunTime Environment

• hadoop-env.sh: set the JAVA_HOME parameter, which tells Hadoop which JVM to use.

• This file also offers a way to provide custom parameters for each of the servers.

• hadoop-env.sh is sourced by all of the Hadoop Core scripts. It is located in the etc/hadoop directory of the Hadoop installation (hadoop-2.2.0/etc/hadoop).

• Examples of environment variables that you can specify:

export HADOOP_HEAPSIZE="512"

export HADOOP_DATANODE_HEAPSIZE="128"

• NameNode status: http://localhost:50070/dfshealth.jsp
• ResourceManager status: http://localhost:8088/cluster
• MapReduce JobHistory Server status: http://localhost:19888/jobhistory
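On a multi-node cluster the same status pages are served by the master rather than localhost. The sketch below builds the three URLs for a given host and probes each one with the standard library. It assumes, for simplicity, that all three daemons run on the same master host and use the Hadoop 2.x default ports listed above. status_urls and is_reachable are hypothetical helper names.

```python
import urllib.request

# Hadoop 2.x default web UI ports and paths, as listed above.
WEB_UIS = {
    "NameNode": (50070, "/dfshealth.jsp"),
    "ResourceManager": (8088, "/cluster"),
    "MapReduce JobHistory Server": (19888, "/jobhistory"),
}


def status_urls(host: str) -> dict[str, str]:
    """Build each daemon's status-page URL, assuming all run on `host`."""
    return {name: f"http://{host}:{port}{path}" for name, (port, path) in WEB_UIS.items()}


def is_reachable(url: str, timeout: float = 5.0) -> bool:
    """True if the page answers with a non-error HTTP status within the timeout."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 400
    except OSError:
        # URLError, HTTPError and socket timeouts are all OSError subclasses.
        return False
```

After starting the daemons, calling is_reachable on each value of status_urls("master") gives a quick smoke test. Here "master" stands in for your own NameNode hostname.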

Master & Slave nodes for Hadoop Multi Node Cluster

Slaves and Masters

Masters
• The ‘masters’ file on the master node contains the hostname (or IP address) of the Secondary NameNode server.
• The ‘masters’ file on a slave node is blank.

Slaves
• The ‘slaves’ file on the master node contains a list of hosts that run a DataNode and a NodeManager.
• The ‘slaves’ file on a slave node contains its own IP address.
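The rules above can be captured in a few lines. The sketch below produces the contents of the masters and slaves files for any node, given the cluster layout. It also parses such a file back into a host list, skipping blank lines and '#' comments. The comment handling is our assumption about how the Hadoop start scripts read these files, so check it against your version. All function names and hostnames are hypothetical.

```python
def read_host_list(text: str) -> list[str]:
    """Parse a masters/slaves file: one host per line.

    Blank lines and anything after '#' are ignored (assumed to match how the
    Hadoop start scripts read these files).
    """
    hosts = []
    for line in text.splitlines():
        host = line.split("#", 1)[0].strip()
        if host:
            hosts.append(host)
    return hosts


def host_files(node: str, master: str, secondary_namenode: str,
               workers: list[str]) -> dict[str, str]:
    """Contents of the masters and slaves files on `node`, per the rules above."""
    if node == master:
        # Master: masters names the Secondary NameNode, slaves lists every worker.
        return {"masters": secondary_namenode + "\n", "slaves": "\n".join(workers) + "\n"}
    if node in workers:
        # Slave: masters is blank, slaves holds only the node's own address.
        return {"masters": "", "slaves": node + "\n"}
    raise ValueError(f"{node!r} is neither the master nor a listed worker")


# Hypothetical hostnames; substitute your own hostnames or IP addresses.
layout = {"master": "master", "secondary_namenode": "master", "workers": ["slave1", "slave2"]}
for node in ["master", "slave1"]:
    print(node, host_files(node, **layout))
```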

Modes of a Hadoop Cluster

Hadoop Cluster Modes

Hadoop can run in any of the following three modes:

Standalone (or Local) Mode
• No daemons; everything runs in a single JVM.
• Suitable for running MapReduce programs during development.
• Has no DFS.

Pseudo-Distributed Mode
• Hadoop daemons run on the local machine.

Fully-Distributed Mode
• Hadoop daemons run on a cluster of machines.
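The three modes differ mainly in two settings seen earlier: where fs.defaultFS points, and which framework mapreduce.framework.name selects. The sketch below is a rough heuristic that guesses the mode from those two values. It treats only localhost-style names as "the local machine", so a pseudo-distributed setup that uses the machine's real hostname would be reported as fully distributed. The function name and the set of local host names are assumptions.

```python
from urllib.parse import urlparse

# Host names the heuristic treats as "this machine" (an assumption).
LOCAL_HOSTS = {"localhost", "127.0.0.1", "::1"}


def cluster_mode(core: dict[str, str], mapred: dict[str, str]) -> str:
    """Guess the Hadoop mode from core-site and mapred-site settings (heuristic)."""
    # Built-in defaults: fs.defaultFS is file:/// and the framework is local.
    fs = urlparse(core.get("fs.defaultFS", "file:///"))
    framework = mapred.get("mapreduce.framework.name", "local")
    if fs.scheme == "file" and framework == "local":
        return "standalone"          # single JVM, no daemons, no DFS
    if fs.scheme == "hdfs" and fs.hostname in LOCAL_HOSTS:
        return "pseudo-distributed"  # daemons on the local machine
    if fs.scheme == "hdfs":
        return "fully-distributed"   # NameNode addressed by a cluster hostname
    return "unknown"


print(cluster_mode({"fs.defaultFS": "hdfs://localhost:9000"}, {"mapreduce.framework.name": "yarn"}))
```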

Terminal Commands

Setting up a Hadoop Multi Node Cluster

Course Details

Edureka's Hadoop Administration course:
• The Hadoop Cluster Administration training course is designed to provide the knowledge and skills needed to become a successful Hadoop Architect. It starts with the fundamental concepts of Apache Hadoop and Hadoop Cluster, and covers how to deploy, configure, manage, monitor, and secure a Hadoop Cluster.
• Online Live Courses: 24 hours
• Assignments: 30 hours
• Project: 20 hours
• Lifetime Access + 24 x 7 Support

Go to www.edureka.co/hadoop-admin

Batch starts from 7 November (Weekend Batch)

Hadoop Administration Course