HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP...

42
HADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap And Garrett Poppe

Transcript of HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP...

Page 1: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

HADOOP

Installation and Deployment of a Single Node on a Linux System

Presented by:

Liv Nguekap

And

Garrett Poppe

Page 2: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Topics

●  Create hadoopuser and group

●  Edit sudoers

●  Set up SSH

●  Install JDK

●  Install Hadoop

●  Editting Hadoop settings

●  Running Hadoop

●  Resources

Page 3: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Add Hadoopuser

Page 4: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Edit sudoers

Page 5: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Set up SSH

●  sudo chown hadoopuser ~/.ssh

●  sudo chmod 700 ~/.ssh

●  sudo chmod 600 ~/.ssh/id_rsa

●  sudo cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

●  sudo chmod 600 ~/.ssh/authorized_keys

●  Edit /etc/ssh/sshd_config

Page 6: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Install JDK

●  Login as hadoopuser ●  Uninstall previous

versions of JDK ●  Download current

version of JDK ●  Install JDK ●  Edit JAVA_HOME

and PATH variables in “~/.bashrc” file

Page 7: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Install Hadoop ●  Download current stable release

●  Untar the download

●  tar xzvf hadoop-2.4.1.tar.gz

●  Move the untarred folder

●  sudo mv hadoop-2.4.1 /usr/local/hadoop

●  Change ownership and create nodes

●  sudo chown -R hadoopuser:hadoopgroup /usr/local/hadoop

●  mkdir -p ~/hadoopspace/hdfs/namenode

●  mkdir -p ~/hadoopspace/hdfs/datanode

Page 8: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Install Hadoop

●  Edit Hadoop variables in “~/.bashrc” file

●  After editing file, use command to apply.

●  “source ~/.bashrc”

Page 9: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Editing Hadoop settings

●  Go to directory located at /usr/local/hadoop/etc/hadoop

●  Create a copy of mapred-site.xml.template as mapred-site.xml

Page 10: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Editing Hadoop settings

●  Edit mapred-site.xml ●  Add code between

<configuration> tabs

<property> <name>mapreduce.fra

mework.name </name> <value>yarn</value> </property>

Page 11: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Editing Hadoop settings

●  Edit yarn-site.xml ●  Add code between

<configuration> tabs

<property> <name>yarn.nodemana

ger.aux-services </name> <value> mapreduce_shuffle </

value> </property>

Page 12: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Editing Hadoop settings

●  Edit core-site.xml ●  Add code between

<configuration> tabs

<property> <name> fs.default.name </name> <value> hdfs://localhost:9000 </value> </property>

Page 13: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Editing Hadoop settings

●  Edit hdfs-site.xml

●  Add code between <configuration> tabs

<property>

<name>

dfs.replication

</name>

<value>

1

</value>

</property>

<property>

<name>

dfs.name.dir

</name>

<value>

file:///home/hadoopuser/hadoopspace/hdfs/namenode

</value>

</property>

<property>

<name>

dfs.data.dir

</name>

<value>

file:///home/hadoopuser/hadoopspace/hdfs/datanode

</value>

</property>

Page 14: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Editing Hadoop settings

●  Edit “hadoop-env.sh” ●  Create the

JAVA_HOME variable using current JDK path.

Page 15: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Editting Hadoop settings

●  Format the namenode using the command “hdfs namenode -format”

Page 16: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Running Hadoop

●  Start services ●  “start-dfs.sh” ●  “start-yarn.sh”

Page 17: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Running Hadoop

●  Use jps command to make sure all services are running.

Page 18: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Running Hadoop

●  Open web browser. ●  Type “localhost:

50070” into address bar to access web interface.

Page 19: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

● WRITING MAPREDUCE PROGRAMS FOR HADOOP

Part 2

Page 20: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Languages/scripts used

●  We will talk about two languages used to write mapreduce programs in Hadoop:

●  1) Pig Script (also called Pig Latin) ●  2) Java

Page 21: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Pig

●  What is Pig? ●  Pig is a high-level platform for creating

MapReduce programs used with Hadoop. ●  It is somewhat similar to SQL

Page 22: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

How Pig Works

●  Pig has two modes of execution:

●  1) Local Mode - To run Pig in local mode, you need access to a single machine.

●  2) Mapreduce Mode - To run Pig in mapreduce mode, you need access to a Hadoop cluster and HDFS installation.

Page 23: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Syntax to run Pig

●  To run Pig in Local Mode, use: ●  pig -x local id.pig

●  To run Pig in Mapreduce Mode, use: ●  pig id.pig or pig -x mapreduce id.pig

Page 24: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Ways to run Pig

●  Whether in local or mapreduce mode, there are 3 ways of running Pig:

●  1) Grunt shell ●  2) Batch or script file ●  3) Embedded Program

Page 25: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Sample Grunt Shell Code

Page 26: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Grunt Shell Commands

Page 27: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Grunt Shell Commands

Page 28: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Batch

●  To run Pig with batch files, the pig script is written entirely into a Pig file and the file run with Pig.

●  A sample syntax for the file totalmiles.pig is: ●  Pig totalmiles.pig

Page 29: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Content of file totalmiles.pig

Page 30: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Content of 1987 flight data file

Page 31: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

JAVA

●  We tested the mapreduce function of Hadoop on a java program called WordCount.java

●  The wordcount.class is provided in the examples that come with hadoop installation

Page 32: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Where to find the Hadoop Examples

Page 33: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

JAVA

Page 34: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Launching WordCount job

Page 35: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

WordCount Processing

Page 36: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

WordCount Processing

Page 37: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Results

Page 38: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Results

Page 39: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

WordCount.Java - Map

Page 40: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

WordCount.java - Reduce

Page 41: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

● Fin

● Thank YOU!!

Page 42: HADOOP - Computer Science Departmentcsc.csudh.edu/btang/seminar/slides/liv_garrett.pdfHADOOP Installation and Deployment of a Single Node on a Linux System Presented by: Liv Nguekap

Resources

●  http://alanxelsys.com/hadoop-v2-single-node-installation-on-centos-6-5/

●  http://tecadmin.net/setup-hadoop-2-4-single-node-cluster-on-linux/

●  http://hadoop.apache.org/ ●  http://cs.smith.edu/dftwiki/index.php/

Hadoop_Tutorial_1_--_Running_WordCount ●  https://pig.apache.org/docs/r0.10.0/basic.html