Lab 2: Running a Hadoop Application


Zubair Nabi


April 18, 2013

Zubair Nabi 2: Running a Hadoop Application April 18, 2013

Running Hadoop

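The first order of the day is to format the Hadoop DFS (HDFS). Formatting erases any data already stored in HDFS, so do it only on a fresh installation.

Jump to the Hadoop directory and execute: bin/hadoop namenode -format

To start the Hadoop daemons (HDFS and MapReduce): bin/start-all.sh

To stop them: bin/stop-all.sh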
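To confirm that bin/start-all.sh actually brought everything up, one option is the JDK's jps tool, which lists running Java processes as "<pid> <ClassName>". The sketch below assumes the single-node setup from the reference tutorial, where the NameNode, DataNode, SecondaryNameNode, JobTracker and TaskTracker all run on one machine; the function name missing_daemons is hypothetical and not part of Hadoop.

```shell
# Reads `jps` output on stdin and prints each expected Hadoop 1.x daemon
# that is not listed. Returns 0 only if all five daemons are present.
missing_daemons() {
  listing=$(cat)
  status=0
  for daemon in NameNode DataNode SecondaryNameNode JobTracker TaskTracker; do
    # Compare the whole class-name field, so "NameNode" does not
    # accidentally match "SecondaryNameNode"
    if ! printf '%s\n' "$listing" |
         awk -v d="$daemon" '$2 == d { found = 1 } END { exit !found }'; then
      echo "$daemon"
      status=1
    fi
  done
  return "$status"
}
```

Usage, after bin/start-all.sh: jps | missing_daemons && echo "all daemons are running"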


Generating a dataset

Create a temporary directory to hold the data: mkdir /tmp/gutenberg

Jump to it: cd /tmp/gutenberg

Download the following e-books (the Plain Text UTF-8 edition of each):
  wget www.gutenberg.org/etext/20417
  wget www.gutenberg.org/etext/5000
  wget www.gutenberg.org/etext/4300
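Depending on how the site responds, wget on these /etext/ URLs may save a book's HTML landing page rather than its plain text, and WordCount would then count markup as words. A quick check, assuming the downloads sit in /tmp/gutenberg; find_html_files is a hypothetical helper, and looking for an HTML tag in the first 1 KiB is only a heuristic.

```shell
# Prints every regular file in directory $1 whose first 1 KiB looks like
# HTML. Returns 1 if any such file is found, 0 otherwise.
find_html_files() {
  dir=$1
  status=0
  for f in "$dir"/*; do
    [ -f "$f" ] || continue
    # -i ignores case, -E enables the "|" alternation
    if head -c 1024 "$f" | grep -qiE '<html|<!doctype html'; then
      echo "$f"
      status=1
    fi
  done
  return "$status"
}
```

Usage: find_html_files /tmp/gutenberg || echo "re-download the listed books as plain text"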

Copying the dataset to the HDFS

Jump to the Hadoop directory and execute: bin/hadoop dfs -copyFromLocal /tmp/gutenberg /ccw/gutenberg

Running Wordcount

Execute: bin/hadoop jar hadoop-examples-1.0.4.jar wordcount /ccw/gutenberg /ccw/gutenberg-output

The output directory must not already exist; the job fails if it does.
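As a sanity check, the stock WordCount example tokenizes each line with java.util.StringTokenizer, which splits on spaces, tabs, newlines, carriage returns and form feeds. The sketch below approximates that locally with awk and sort so the result can be compared with the job's output. The name wordcount_local is hypothetical, and close agreement is only expected for plain ASCII/UTF-8 input and a single reducer (so the job output is one sorted file).

```shell
# Counts words in the given files, splitting on space, tab, CR and form
# feed (newlines already end each awk record), and prints
# "word<TAB>count" lines sorted in byte order.
wordcount_local() {
  LC_ALL=C awk 'BEGIN { FS = "[ \t\r\f]+" }
    {
      # Leading/trailing separators produce empty fields; skip them
      for (i = 1; i <= NF; i++) if ($i != "") count[$i]++
    }
    END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$@" |
    LC_ALL=C sort
}
```

For example, wordcount_local /tmp/gutenberg/* > /tmp/wordcount-local, and once the results have been retrieved from HDFS, diff /tmp/wordcount-local /tmp/gutenberg-output.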

Retrieving results from the HDFS

Merge the output part files into a single file on the local FS: bin/hadoop dfs -getmerge /ccw/gutenberg-output /tmp/gutenberg-output
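The merged file holds one word/count pair per line, sorted by word. To see the most frequent words instead, a small sketch, assuming the tab-separated "word<TAB>count" format produced by the stock example; top_words is a hypothetical helper.

```shell
# Prints the N most frequent words (default 10) from a merged
# WordCount output file whose lines are "word<TAB>count".
top_words() {
  file=$1
  n=${2:-10}
  tab=$(printf '\t')
  # Numeric sort on the count column, highest first; ties broken by word
  LC_ALL=C sort -t "$tab" -k2,2nr -k1,1 "$file" | head -n "$n"
}
```

Usage: top_words /tmp/gutenberg-output 20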

Accessing the web interface

JobTracker: http://localhost:50030

TaskTracker: http://localhost:50060


Reference(s)

Running Hadoop on Ubuntu Linux (Single-Node Cluster): http://www.michael-noll.com/tutorials/running-hadoop-on-ubuntu-linux-single-node-cluster/
