Hadoop basics

Outline Introduction Big Data Sources of Big Data Tools HDFS Installation Configuration Starting & Stopping Map Reduce Execution

Big Data & Hadoop

D. Praveen Kumar

Junior Research Fellow

Department of Computer Science & Engineering

Indian Institute of Technology (Indian School of Mines)

Dhanbad, Jharkhand, India

Head of IT & ITES, Skill Subsist Impels Ltd, Tirupati.

March 25, 2017

Sree Venkateswara College of Engineering, Nellore, A. P.

1 Introduction

2 Big Data

3 Sources of Big Data

4 Tools

5 HDFS

6 Installation

7 Configuration

8 Starting & Stopping

9 Map Reduce

10 Execution

Data

Data means a value or set of values.

Examples:

March 1st, 2017

20, 30, 40

ΨΦϕ

Information

Data that has been processed into a meaningful form is called information.

Data Types

A data type is the kind of data that may appear in a computer.

Examples:

int

float

char

double

Abstract data types (user-defined data types)

Traditional approaches

Traditional approaches to storing and processing data:

1 File system

2 RDBMS (Relational Database Management Systems)

3 Data Warehouse & Mining Tools

4 Grid Computing

5 Volunteer Computing

Guests = 4

Transportation from the railway station to your home (one auto or car is sufficient).

Mom can prepare food or snacks without difficulty.

Your house is sufficient for accommodation.

Facilities such as beds, bathrooms, water and TV are the ones you already use at home.

You can talk to each other, crack jokes and keep the guests happy.

Expenditure is nearly Rs. 1000/-

Guests = 100

Transportation = 25 autos or cars, or two buses

Food = catering

Accommodation = lodge

Facilities = AC, TV and all other facilities

Maintenance = somewhat difficult

Expenditure = nearly Rs. 90,000/-

Guests = 10000

Transportation = 2500 autos or 500 buses

Food = catering

Accommodation = all lodges, function halls and cottages in the town

Facilities = AC, TV and all other facilities are somewhat difficult to provide

Maintenance = more difficult

Expenditure = nearly Rs. 2,00,000/-

Grid Computing

Volunteer Computing

Guests = 10000000

Transportation = how many autos?

Food = ?

Accommodation = ?

Facilities = ?

Maintenance = ?

Cost = ?

Problems

The same problems arise in a computing environment:

It is difficult to handle a huge and ever-growing amount of data.

Processing the data is not possible with only a few machines.

Distributing large data sets is difficult.

Constructing online or offline models is very difficult.
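To make the "few machines" problem concrete, here is a rough back-of-the-envelope sketch. The data set size (1 TB) and the per-disk read rate (100 MB/s) are assumed figures chosen for illustration, not measurements, and the class and method names are hypothetical.

```java
public class ScanTimeSketch {

    // Seconds needed to read totalBytes when `machines` disks each read
    // bytesPerSecond in parallel and the data is split evenly among them.
    public static double scanSeconds(double totalBytes, double bytesPerSecond, int machines) {
        return totalBytes / (bytesPerSecond * machines);
    }

    public static void main(String[] args) {
        double oneTerabyte = 1e12; // assumed data set size
        double diskRate = 100e6;   // assumed read rate of one disk: 100 MB/s

        System.out.println("1 machine:    " + scanSeconds(oneTerabyte, diskRate, 1) + " s");
        System.out.println("100 machines: " + scanSeconds(oneTerabyte, diskRate, 100) + " s");
    }
}
```

Under these assumptions a single machine needs about 10,000 seconds (roughly 2.8 hours) just to read the data once, while 100 machines reading in parallel need about 100 seconds, provided the data is already spread across them.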

Solution

A single solution to all these problems is

What is Big Data?

Big data refers to voluminous amounts of structured or unstructured data that organizations can potentially mine and analyze.

Big data is a huge collection of large data sets characterized by

Data generation

How Data Is Generated

Internet of Events

The Internet is the main source of the vast amount of data being generated.

4 Internet of Events

4 Questions of Data Analysts

1 What happened?

2 Why did it happen?

3 What will happen?

4 What is the best that can happen?

Big Data Platforms and Analytical Software

Hadoop

Here we go with Hadoop.

Hadoop History

Hadoop was created by Doug Cutting, the creator of Lucene.

He was also involved in a project called Nutch, an open-source web search engine from which Hadoop originated.

Nutch included implementations of MapReduce and NDFS (Nutch Distributed File System).

Later, these parts were moved out of Nutch into a separate project named Hadoop (MapReduce + HDFS, the Hadoop Distributed File System).

Hadoop

Apache Hadoop is an open-source software framework for distributed storage and distributed processing of very large data sets on computer clusters built from commodity hardware.

The base Apache Hadoop framework is composed of the following modules:

Hadoop Common: contains libraries and utilities needed by other Hadoop modules

Hadoop Distributed File System (HDFS): a distributed file system that stores data

Hadoop YARN: a resource-management platform

Hadoop MapReduce: for large-scale data processing

Hadoop Components

HDFS- Goals

The design goals of HDFS

1 Very Large files

2 Streaming Data Access

3 Commodity Hardware

HDFS - Where It Falls Short

HDFS is not a good fit for:

1 Lots of small files

2 Low-latency database access

3 Multiple writers, arbitrary file modifications

HDFS- Concepts

1 Blocks

2 Namenodes

3 Datanodes

4 HDFS Federation

5 HDFS High Availability

Requirements

Necessary

Java >= 7

ssh

Linux OS (Ubuntu >=14.04)

Hadoop framework

Optional

Eclipse

Internet connection

Java 7 & Installation

Hadoop requires a working Java installation; Java 1.7 or later is recommended.

Use one of the following commands to install Java on Linux:

sudo apt-get install openjdk-7-jdk

(or)

sudo apt-get install default-jdk

Java PATH Setup

We need to set the Java path.

Open the .bashrc file located in the home directory:

gedit ~/.bashrc

Add the line below at the end:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Installation & Configuration of SSH

Hadoop requires SSH (Secure Shell) access to manage its nodes, i.e. remote machines plus your local machine if you want to use Hadoop on it.

Install SSH using the following command:

sudo apt-get install ssh

First, generate a DSA SSH key for the user, then add the public key to the list of authorized keys:

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa

cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Download & Extract Hadoop

Download Hadoop from the Apache Download Mirrors

http://mirror.fibergrid.in/apache/hadoop/common/

Extract the contents of the Hadoop package to a location of your choice. I picked /usr/local/hadoop.

$ cd /usr/local

$ sudo tar xzf hadoop-2.7.2.tar.gz

$ sudo mv hadoop-2.7.2 hadoop

Add Hadoop configuration in .bashrc

Add the Hadoop configuration to the .bashrc file in the home directory:

export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"

Create tmp, DataNode & NameNode directories

Execute the command below to create the NameNode directory:

mkdir -p /usr/local/hadoopdata/hdfs/namenode

Execute the command below to create the DataNode directory:

mkdir -p /usr/local/hadoopdata/hdfs/datanode

Execute the commands below to create the Hadoop tmp directory and give the Hadoop user (hadoop1 here) ownership of it:

sudo mkdir -p /app/hadoop/tmp
sudo chown hadoop1:hadoop1 /app/hadoop/tmp
sudo chmod 750 /app/hadoop/tmp

Files to Configure

The following are the files we need to configure

core-site.xml

hadoop-env.sh

mapred-site.xml

hdfs-site.xml

Add properties in /usr/local/hadoop/etc/hadoop/core-site.xml

Add the following snippets between the <configuration> ... </configuration> tags in the core-site.xml file.

Add the property below to specify the location of the tmp directory:

<property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
</property>

Add the property below to specify the default file system and its port number:

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:9000</value>
</property>
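To check that an edited configuration file is well-formed and holds the values you expect, a small sketch like the following may help. It uses only the JDK's built-in XML parser, not Hadoop's own Configuration class; the class name SiteXmlSketch and the inline XML string are illustrative assumptions.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class SiteXmlSketch {

    // Reads every <property> element and returns its <name> -> <value> pair,
    // in the order the properties appear in the document.
    public static Map<String, String> readProperties(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> result = new LinkedHashMap<>();
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                String name = property.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = property.getElementsByTagName("value").item(0).getTextContent().trim();
                result.put(name, value);
            }
            return result;
        } catch (Exception e) {
            // Malformed XML or a property without <name>/<value> ends up here.
            throw new IllegalArgumentException("Not a valid configuration document", e);
        }
    }

    public static void main(String[] args) {
        String xml =
            "<configuration>"
            + "<property><name>hadoop.tmp.dir</name><value>/app/hadoop/tmp</value></property>"
            + "<property><name>fs.default.name</name><value>hdfs://localhost:9000</value></property>"
            + "</configuration>";
        for (Map.Entry<String, String> entry : readProperties(xml).entrySet()) {
            System.out.println(entry.getKey() + " = " + entry.getValue());
        }
    }
}
```

A missing closing tag or a property placed outside the expected structure is a common reason for Hadoop failing to start, so a parse check of this kind can save time.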

Add properties in /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Uncomment the JAVA_HOME line and give the correct path for Java:

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Add property in /usr/local/hadoop/etc/hadoop/mapred-site.xml

In this file we add the host name and port that the MapReduce job tracker runs at. Add the following in mapred-site.xml:

<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
</property>

Add properties in ... etc/hadoop/hdfs-site.xml

In the file hdfs-site.xml add the following:

Add the replication factor:

<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>

Specify the NameNode directory:

<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/namenode</value>
</property>

Specify the DataNode directory:

<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoopdata/hdfs/datanode</value>
</property>
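The effect of dfs.replication on storage can be estimated with simple arithmetic. The sketch below assumes a block size of 128 MB (the usual default in Hadoop 2.x, but configurable) and a 1 GB example file; the class and method names are hypothetical.

```java
public class BlockMathSketch {

    // Number of HDFS blocks needed for a file: fileBytes / blockBytes, rounded up.
    public static long blockCount(long fileBytes, long blockBytes) {
        return (fileBytes + blockBytes - 1) / blockBytes;
    }

    // Raw bytes stored across the cluster: every block is stored `replication` times.
    public static long rawBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long fileSize = 1024 * mb;  // assumed example file of 1 GB
        long blockSize = 128 * mb;  // assumed block size

        System.out.println("Blocks: " + blockCount(fileSize, blockSize));
        System.out.println("Raw MB with replication 1: " + rawBytes(fileSize, 1) / mb);
        System.out.println("Raw MB with replication 3: " + rawBytes(fileSize, 3) / mb);
    }
}
```

A replication factor of 1 suits a single-node setup, since there is only one DataNode to hold each block. On a multi-node cluster a higher factor trades storage for fault tolerance.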

Formatting the HDFS filesystem via the NameNode

The first step in starting up your Hadoop installation is formatting the Hadoop file system.

You need to do this the first time you set up Hadoop.

Do not format a running Hadoop file system, as you will lose all the data currently in HDFS.

To format the file system, run the command:

hadoop namenode -format

Starting single-node cluster

Run the command:

start-all.sh

This starts a NameNode, SecondaryNameNode, DataNode, ResourceManager and NodeManager on your machine.

A nifty tool for checking whether the expected Hadoop processes are running is jps:

hadoop1@hadoop1:/usr/local/hadoop$ jps
2598 NameNode
3112 ResourceManager
3523 Jps
2917 SecondaryNameNode
2727 DataNode
3242 NodeManager
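The jps check can also be automated. The following sketch takes the text printed by jps and reports which of the five expected daemons are missing; the class name and the way the output is passed in as a string are assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class JpsCheckSketch {

    // Daemons a single-node cluster is expected to run, as listed above.
    static final List<String> EXPECTED = Arrays.asList(
        "NameNode", "SecondaryNameNode", "DataNode", "ResourceManager", "NodeManager");

    // Returns the expected daemons that do not appear in the jps output.
    // Each jps line has the form "<pid> <name>"; names are compared exactly,
    // so "SecondaryNameNode" does not count as "NameNode".
    public static Set<String> missingDaemons(String jpsOutput) {
        Set<String> missing = new LinkedHashSet<>(EXPECTED);
        for (String line : jpsOutput.split("\\R")) {
            String[] parts = line.trim().split("\\s+");
            if (parts.length == 2) {
                missing.remove(parts[1]);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        String sample = "2598 NameNode\n3112 ResourceManager\n3523 Jps\n"
            + "2917 SecondaryNameNode\n2727 DataNode\n3242 NodeManager\n";
        System.out.println("Missing daemons: " + missingDaemons(sample));
    }
}
```

If a daemon is reported missing, its log file in the Hadoop logs directory is usually the first place to look.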

Stopping your single-node cluster

Run the command:

stop-all.sh

This stops all the daemons running on your machine. The output will look like this:

stopping NodeManager
localhost: stopping ResourceManager
stopping NameNode
localhost: stopping DataNode
localhost: stopping SecondaryNameNode

Map-Reduce Framework

MapReduce is a programming paradigm.

It relies on two functions, Map and Reduce.

MapReduce is used to manage many large-scale computations.

The framework takes care of scheduling tasks, monitoring them and re-executing failed tasks.

The framework tries to schedule tasks on the nodes where the data is already present.

Map-Reduce Computation Steps

The key-value pairs from each Map task are collected by a master controller and sorted by key. The keys are divided among all the Reduce tasks, so all key-value pairs with the same key wind up at the same Reduce task.

The Reduce tasks work on one key at a time, and combine all the values associated with that key in some way. The manner of combination of values is determined by the code written by the user for the Reduce function.
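The steps above can be imitated on a single machine in plain Java, without Hadoop, to see how the data flows. This is only a sketch: the class MapReduceSketch and its methods are hypothetical names, and a real cluster runs the map and reduce steps as distributed tasks rather than as method calls.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

public class MapReduceSketch {

    // Map: turn one input line into (word, 1) pairs.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            pairs.add(new SimpleEntry<String, Integer>(tokens.nextToken(), 1));
        }
        return pairs;
    }

    // Shuffle: collect all values that share a key; the TreeMap keeps keys sorted.
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            List<Integer> values = grouped.get(pair.getKey());
            if (values == null) {
                values = new ArrayList<>();
                grouped.put(pair.getKey(), values);
            }
            values.add(pair.getValue());
        }
        return grouped;
    }

    // Reduce: combine all values of one key; here the combination is a sum.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Runs map on every line, shuffles the pairs, then reduces each key.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            counts.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return counts;
    }

    public static void main(String[] args) {
        // prints {big=2, data=1, hadoop=1}
        System.out.println(wordCount(Arrays.asList("big data", "big hadoop")));
    }
}
```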

Hadoop - MapReduce

Hadoop - MapReduce (Word Count) Example

MapReduce - WordCountMapper

In the WordCountMapper class we perform the following operations:

Read a line from the file

Split the line into words

Assign the count 1 to each word

WordCountMapper source code

public static class WordCountMapper
    extends Mapper<Object, Text, Text, IntWritable> {

  // Reused for every emitted pair: the constant count 1 and the current word.
  private final static IntWritable one = new IntWritable(1);
  private Text word = new Text();

  public void map(Object key, Text value, Context context)
      throws IOException, InterruptedException {
    // Split the input line into whitespace-separated tokens.
    StringTokenizer itr = new StringTokenizer(value.toString());
    while (itr.hasMoreTokens()) {
      word.set(itr.nextToken());
      context.write(word, one); // emit (word, 1)
    }
  }
}

MapReduce - WordCountReducer

In the WordCountReducer class we perform the following operations:

Sum the list of values

Assign the sum to the corresponding word

WordCountReducer source code

public static class WordCountReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  // Reused to hold the total for the current word.
  private IntWritable result = new IntWritable();

  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {
    // Add up all counts received for this word.
    int sum = 0;
    for (IntWritable val : values) {
      sum += val.get();
    }
    result.set(sum);
    context.write(key, result); // emit (word, total)
  }
}

WordCountJob

public class WordCountJob {

  // WordCountMapper and WordCountReducer are declared as static classes,
  // so they belong inside this class.

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Job.getInstance replaces the deprecated new Job(conf, name) constructor.
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCountJob.class);
    job.setMapperClass(WordCountMapper.class);
    job.setCombinerClass(WordCountReducer.class);
    job.setReducerClass(WordCountReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    // args[0] is the input path, args[1] the output path.
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
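The job registers WordCountReducer as the combiner as well as the reducer. This is safe for word count because addition gives the same total whether the counts are summed all at once or first summed per map task and then summed again. The plain-Java sketch below (no Hadoop involved; class and method names are hypothetical) compares the two routes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class CombinerSketch {

    // The reduce logic of word count: add up all values.
    public static int sum(List<Integer> values) {
        int total = 0;
        for (int value : values) {
            total += value;
        }
        return total;
    }

    // Without a combiner: every single count travels to the reducer.
    public static int reduceWithoutCombiner(List<List<Integer>> countsPerMapTask) {
        List<Integer> all = new ArrayList<>();
        for (List<Integer> counts : countsPerMapTask) {
            all.addAll(counts);
        }
        return sum(all);
    }

    // With a combiner: each map task sends one partial sum to the reducer.
    public static int reduceWithCombiner(List<List<Integer>> countsPerMapTask) {
        List<Integer> partialSums = new ArrayList<>();
        for (List<Integer> counts : countsPerMapTask) {
            partialSums.add(sum(counts));
        }
        return sum(partialSums);
    }

    public static void main(String[] args) {
        // Counts for one word, as emitted by two separate map tasks.
        List<List<Integer>> counts = Arrays.asList(
            Arrays.asList(1, 1, 1),
            Arrays.asList(1, 1));
        System.out.println(reduceWithoutCombiner(counts)); // prints 5
        System.out.println(reduceWithCombiner(counts));    // prints 5
    }
}
```

The benefit is less data sent over the network: in this example the reducer receives two numbers instead of five. A reduce function that is not associative and commutative, such as computing an average directly, cannot be reused as a combiner in this way.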

Imports to include

import java.io.IOException;

import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.io.IntWritable;

import org.apache.hadoop.io.Text;

import org.apache.hadoop.mapreduce.Job;

import org.apache.hadoop.mapreduce.Mapper;

import org.apache.hadoop.mapreduce.Reducer;

import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import org.apache.hadoop.util.GenericOptionsParser;

Execution of Hadoop Program in Eclipse

Step 1:

1 Start Hadoop in a terminal using the command: $ start-all.sh

2 Use the jps command to check whether all Hadoop services have started.

Step 2: Open Eclipse.

Step 3: Go to File ⇒ New ⇒ Project. Select Java Project and click the Next button. Enter a project name and click the Finish button.

Step 4: Eclipse creates the project and lists it in the workspace.

1 Right-click the project ⇒ New ⇒ Class

2 Enter the class name and click Finish

3 Write the MapReduce program in that class

Step 5: Write the Java program

Step 6: Import the JAR files

1 Right-click the project and select Properties (Alt+Enter)

2 Select Java Build Path ⇒ click Libraries, then click Add External JARs

3 Select the JARs from the following Hadoop library directories:

/usr/local/hadoop/share/hadoop/common/libs

/usr/local/hadoop/share/hadoop/hdfs/libs

/usr/local/hadoop/share/hadoop/httpfs/libs

/usr/local/hadoop/share/hadoop/mapreduce/libs

/usr/local/hadoop/share/hadoop/yarn/libs

/usr/local/hadoop/share/hadoop/tools/

Step 7: Prepare the input files

1 Create a folder in your home directory

2 Copy text files into it

3 Note the path of this input folder

Step 8: Set the input and output paths

1 Right-click the source file ⇒ Run As ⇒ Run Configurations ⇒ Arguments

2 Enter your input and output paths, separated by a single space

3 Click Run
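Before clicking Run, it can help to check the two arguments, because the job fails if the input path is missing, and Hadoop refuses to write into an output directory that already exists. The sketch below is a hypothetical helper, not part of the program shown earlier, and it assumes both paths are on the local file system, as is typical when running from Eclipse.

```java
import java.nio.file.Files;
import java.nio.file.Paths;

public class ArgsCheckSketch {

    // Returns null when the arguments look usable, otherwise a message
    // describing the first problem found.
    public static String validate(String[] args) {
        if (args.length != 2) {
            return "Usage: WordCountJob <input path> <output path>";
        }
        if (!Files.exists(Paths.get(args[0]))) {
            return "Input path does not exist: " + args[0];
        }
        if (Files.exists(Paths.get(args[1]))) {
            return "Output path already exists: " + args[1];
        }
        return null;
    }

    public static void main(String[] args) {
        String problem = validate(args);
        System.out.println(problem == null ? "Arguments look fine" : problem);
    }
}
```

When rerunning the job, either delete the previous output folder or pass a new output path.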

Thank You
