Hadoop Setup
webis @ Bauhaus-Universität Weimar
October 27, 2014
1 / 22
:q
2 / 22
Starting the VM for the first time...
3 / 22
Starting without graphical output:
- master:
VBoxHeadless -s "BigData Hadoop VM (master)"
- slave1:
VBoxHeadless -s "BigData Hadoop VM (slave1)"
- slave2:
VBoxHeadless -s "BigData Hadoop VM (slave2)"
4 / 22
Logging into the VM (Linux / OS X)
- master: ssh -p 2222 hadoop-admin@localhost
- slave1: ssh -p 2223 hadoop-admin@localhost
- slave2: ssh -p 2224 hadoop-admin@localhost
Password: hadoop-admin
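Typing the port each time gets old; an optional `~/.ssh/config` on the host machine shortens the commands (a sketch -- the host aliases below are our own choice, not required by anything):

```
# ~/.ssh/config on the host machine
Host hadoop-master
    HostName localhost
    Port 2222
    User hadoop-admin

Host hadoop-slave1
    HostName localhost
    Port 2223
    User hadoop-admin

Host hadoop-slave2
    HostName localhost
    Port 2224
    User hadoop-admin
```

Then `ssh hadoop-master` is enough; only the password still has to be typed.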
5 / 22
Logging into the VM (Windows)
- PuTTY...
- KiTTY...
- OpenSSH (Cygwin)...
Same ports, same user, same password.
6 / 22
Basic Linux shell commands
- Changing directory: cd DIRNAME
- Printing current working directory: pwd
- Creating a directory: mkdir DIRNAME
- Deleting a file: rm FILENAME
- Deleting an (empty) directory: rmdir DIRNAME
- Editing a text file: nano FILENAME / vim FILENAME
- Making a file executable: chmod +x FILENAME
- Running a command as root: sudo COMMAND
- Getting the hell outta there: exit / logout
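The commands above can be tried end-to-end in a throwaway directory (the path and file names below are just for practice, nothing Hadoop-specific):

```shell
# practice run in a throwaway directory (safe to delete afterwards)
mkdir -p /tmp/shell-practice            # create a directory
cd /tmp/shell-practice
pwd                                     # prints the current directory
printf '#!/bin/sh\necho hello\n' > hello.sh
chmod +x hello.sh                       # make the script executable
./hello.sh                              # prints: hello
rm hello.sh                             # delete the file
cd /tmp
rmdir shell-practice                    # delete the (now empty) directory
```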
7 / 22
Download and unpack Hadoop
- Download Hadoop:
wget http://webis5/bigdata/hadoop-2.5.1.tar.gz
- Unpack it:
tar xf hadoop-2.5.1.tar.gz
- Move it to /opt:
sudo mv hadoop-2.5.1 /opt/hadoop
8 / 22
Set necessary environment variables
- Create the file /etc/profile.d/99-hadoop.sh with the following contents:
export HADOOP_PREFIX="/opt/hadoop"
export PATH="$PATH:/opt/hadoop/bin:/opt/hadoop/sbin"
- Make it executable and source it:
sudo chmod +x /etc/profile.d/99-hadoop.sh
source /etc/profile
- Set JAVA_HOME in /opt/hadoop/etc/hadoop/hadoop-env.sh, line 25:
export JAVA_HOME="/usr/lib/jvm/java-7-oracle"
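To check that the profile snippet behaves as intended, you can source a stand-in copy in the current shell and echo the results (a sketch using /tmp so it works even before /etc/profile.d is in place):

```shell
# sanity check of the profile snippet, using a stand-in copy in /tmp
cat > /tmp/99-hadoop.sh <<'EOF'
export HADOOP_PREFIX="/opt/hadoop"
export PATH="$PATH:/opt/hadoop/bin:/opt/hadoop/sbin"
EOF
. /tmp/99-hadoop.sh
echo "$HADOOP_PREFIX"                       # prints: /opt/hadoop
echo "$PATH" | grep -o '/opt/hadoop/bin'    # prints: /opt/hadoop/bin
```

On the VM itself, `echo $HADOOP_PREFIX` after `source /etc/profile` should show the same value.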
9 / 22
Test Hadoop Binary
$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs           run a generic filesystem user client
  version      print the version
  jar <jar>    run a jar file
  ...
10 / 22
Configure HDFS
- /opt/hadoop/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
11 / 22
Configure HDFS
- /opt/hadoop/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop-admin/dfs/dn</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop-admin/dfs/nn</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop-admin</value>
</property>
</configuration>
12 / 22
Configure MapReduce
- /opt/hadoop/etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
13 / 22
Configure YARN
- /opt/hadoop/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
14 / 22
Configure slave host names
- /opt/hadoop/etc/hadoop/slaves:
slave1
slave2
15 / 22
Copy configuration to slave hosts
- Log into each slave:
ssh slave[1|2]
- Copy over the Hadoop distribution and environment scripts:
scp -r master:/opt/hadoop .
scp master:/etc/profile.d/99-hadoop.sh .
sudo mv hadoop /opt/hadoop
sudo cp 99-hadoop.sh /etc/profile.d/
rm 99-hadoop.sh
16 / 22
Start Hadoop
- On the master node, format the HDFS...
hdfs namenode -format
- ...and start it:
start-dfs.sh
- Then start YARN:
start-yarn.sh
17 / 22
Test Hadoop
- Check with jps that everything is running:
$ jps
5178 Jps
4646 NameNode
1339 SecondaryNameNode
4076 ResourceManager
- On the slave nodes:
$ jps
3900 DataNode
4161 Jps
3994 NodeManager
18 / 22
Test Hadoop
If everything looks fine, you should be able to access the web UIs in your browser:
- NameNode: http://10.42.23.101:50070/
- ResourceManager: http://10.42.23.101:8088/
19 / 22
Create user home directory on HDFS
You can now access the HDFS, but you need to create a home directory for your user:
hadoop fs -mkdir -p /user/hadoop-admin
Browsing the HDFS:
- List files: hadoop fs -ls DIRNAME
- Remove (empty) directory: hadoop fs -rmdir DIRNAME
- Remove file: hadoop fs -rm FILENAME
- Copy from local FS to HDFS: hadoop fs -copyFromLocal SOURCE DEST
20 / 22
Start a first MapReduce Job
Once everything is set up, we can start one of the standard example MapReduce jobs:
cd /opt/hadoop/share/hadoop/mapreduce/
yarn jar hadoop-mapreduce-examples-*.jar pi 16 1000000
Output:
Number of Maps = 16
Samples per Map = 1000000
Wrote input for Map #0
Wrote input for Map #1
...
Job Finished in 186.733 seconds
Estimated value of Pi is 3.14159125000000000000
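Roughly, each map task samples points in the unit square and counts how many fall inside the quarter circle; the job then combines the counts. A single-machine sketch of the same idea in awk (plain Monte Carlo with random points -- not Hadoop's exact sampling scheme):

```shell
# single-machine sketch of the pi estimate: the fraction of points in
# the unit square that land inside the quarter circle, times 4
awk 'BEGIN {
    srand(7)                          # seed for repeatability
    n = 100000; hits = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x*x + y*y <= 1.0) hits++
    }
    printf "%.3f\n", 4 * hits / n     # close to 3.14
}'
```

More samples (or, in the Hadoop job, more maps) tighten the estimate, which is why the example takes two arguments: number of maps and samples per map.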
21 / 22
Questions?
22 / 22