Hadoop Setup
webis @ Bauhaus-Universität Weimar
October 27, 2014
1 / 22
:q
2 / 22
Starting the VM for the first time...
3 / 22
Starting without graphical output:
- master:
VBoxHeadless -s "BigData Hadoop VM (master)"
- slave1:
VBoxHeadless -s "BigData Hadoop VM (slave1)"
- slave2:
VBoxHeadless -s "BigData Hadoop VM (slave2)"
4 / 22
Logging into the VM (Linux / OS X)
- master: ssh -p 2222 hadoop-admin@localhost
- slave1: ssh -p 2223 hadoop-admin@localhost
- slave2: ssh -p 2224 hadoop-admin@localhost
Password: hadoop-admin
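Typing the port each time gets old; an optional `~/.ssh/config` on the host machine shortens the commands (a sketch -- the host aliases below are our own choice, not required by anything):

```
# ~/.ssh/config on the host machine
Host hadoop-master
    HostName localhost
    Port 2222
    User hadoop-admin

Host hadoop-slave1
    HostName localhost
    Port 2223
    User hadoop-admin

Host hadoop-slave2
    HostName localhost
    Port 2224
    User hadoop-admin
```

Then `ssh hadoop-master` is enough; only the password still has to be typed.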
5 / 22
Logging into the VM (Windows)
- PuTTY...
- KiTTY...
- OpenSSH (Cygwin)...
Same ports, same user, same password.
6 / 22
Basic Linux shell commands
- Changing directory: cd DIRNAME
- Printing current working directory: pwd
- Creating a directory: mkdir DIRNAME
- Deleting a file: rm FILENAME
- Deleting an (empty) directory: rmdir DIRNAME
- Editing a text file: nano FILENAME / vim FILENAME
- Making a file executable: chmod +x FILENAME
- Running a command as root: sudo COMMAND
- Getting the hell outta there: exit / logout
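The commands above can be tried end-to-end in a throwaway directory (the path and file names below are just for practice, nothing Hadoop-specific):

```shell
# practice run in a throwaway directory (safe to delete afterwards)
mkdir -p /tmp/shell-practice            # create a directory
cd /tmp/shell-practice
pwd                                     # prints the current directory
printf '#!/bin/sh\necho hello\n' > hello.sh
chmod +x hello.sh                       # make the script executable
./hello.sh                              # prints: hello
rm hello.sh                             # delete the file
cd /tmp
rmdir shell-practice                    # delete the (now empty) directory
```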
7 / 22
Download and unpack Hadoop
- Download Hadoop:
wget http://webis5/bigdata/hadoop-2.5.1.tar.gz
- Unpack it:
tar xf hadoop-2.5.1.tar.gz
- Move it to /opt:
sudo mv hadoop-2.5.1 /opt/hadoop
8 / 22
Set necessary environment variables
- Create the file /etc/profile.d/99-hadoop.sh with the following contents:
export HADOOP_PREFIX="/opt/hadoop"
export PATH="$PATH:/opt/hadoop/bin:/opt/hadoop/sbin"
- Make it executable and source it:
sudo chmod +x /etc/profile.d/99-hadoop.sh
source /etc/profile
- Set JAVA_HOME in /opt/hadoop/etc/hadoop/hadoop-env.sh, line 25:
export JAVA_HOME="/usr/lib/jvm/java-7-oracle"
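To check that the profile snippet behaves as intended, you can source a stand-in copy in the current shell and echo the results (a sketch using /tmp so it works even before /etc/profile.d is in place):

```shell
# sanity check of the profile snippet, using a stand-in copy in /tmp
cat > /tmp/99-hadoop.sh <<'EOF'
export HADOOP_PREFIX="/opt/hadoop"
export PATH="$PATH:/opt/hadoop/bin:/opt/hadoop/sbin"
EOF
. /tmp/99-hadoop.sh
echo "$HADOOP_PREFIX"                       # prints: /opt/hadoop
echo "$PATH" | grep -o '/opt/hadoop/bin'    # prints: /opt/hadoop/bin
```

On the VM itself, `echo $HADOOP_PREFIX` after `source /etc/profile` should show the same value.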
9 / 22
Test Hadoop Binary
$ hadoop
Usage: hadoop [--config confdir] COMMAND
where COMMAND is one of:
  fs           run a generic filesystem user client
  version      print the version
  jar <jar>    run a jar file
  ...
10 / 22
Configure HDFS
- /opt/hadoop/etc/hadoop/core-site.xml:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
</configuration>
11 / 22
Configure HDFS
- /opt/hadoop/etc/hadoop/hdfs-site.xml:
<configuration>
<property>
<name>dfs.datanode.data.dir</name>
<value>/home/hadoop-admin/dfs/dn</value>
</property>
<property>
<name>dfs.namenode.name.dir</name>
<value>/home/hadoop-admin/dfs/nn</value>
</property>
<property>
<name>dfs.permissions.superusergroup</name>
<value>hadoop-admin</value>
</property>
</configuration>
12 / 22
Configure MapReduce
- /opt/hadoop/etc/hadoop/mapred-site.xml:
<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
13 / 22
Configure YARN
- /opt/hadoop/etc/hadoop/yarn-site.xml:
<configuration>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>master</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
14 / 22
Configure slave host names
- /opt/hadoop/etc/hadoop/slaves:
slave1
slave2
15 / 22
Copy configuration to slave hosts
- Log into each slave:
ssh slave[1|2]
- Copy over the Hadoop distribution and environment scripts:
scp -r master:/opt/hadoop .
scp master:/etc/profile.d/99-hadoop.sh .
sudo mv hadoop /opt/hadoop
sudo cp 99-hadoop.sh /etc/profile.d/
rm 99-hadoop.sh
16 / 22
Start Hadoop
- On the master node, format the HDFS...
hdfs namenode -format
- ...and start it:
start-dfs.sh
- Then start YARN:
start-yarn.sh
17 / 22
Test Hadoop
- Check with jps that everything is running:
$ jps
5178 Jps
4646 NameNode
1339 SecondaryNameNode
4076 ResourceManager
- On the slave nodes:
$ jps
3900 DataNode
4161 Jps
3994 NodeManager
18 / 22
Test Hadoop
If everything looks fine, you should be able to access the web UIs in your browser:
- NameNode: http://10.42.23.101:50070/
- ResourceManager: http://10.42.23.101:8088/
19 / 22
Create user home directory on HDFS
You can now access the HDFS, but you need to create a home directory for your user:
hadoop fs -mkdir -p /user/hadoop-admin
Browsing the HDFS:
- List files: hadoop fs -ls DIRNAME
- Remove (empty) directory: hadoop fs -rmdir DIRNAME
- Remove file: hadoop fs -rm FILENAME
- Copy from local FS to HDFS: hadoop fs -copyFromLocal SOURCE DEST
20 / 22
Start a first MapReduce Job
Once everything is set up, we can start one of the standard example MapReduce jobs:
cd /opt/hadoop/share/hadoop/mapreduce/
yarn jar hadoop-mapreduce-examples-*.jar pi 16 1000000
Output:
Number of Maps = 16
Samples per Map = 1000000
Wrote input for Map #0
Wrote input for Map #1
...
Job Finished in 186.733 seconds
Estimated value of Pi is 3.14159125000000000000
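Roughly, each map task samples points in the unit square and counts how many fall inside the quarter circle; the job then combines the counts. A single-machine sketch of the same idea in awk (plain Monte Carlo with random points -- not Hadoop's exact sampling scheme):

```shell
# single-machine sketch of the pi estimate: the fraction of points in
# the unit square that land inside the quarter circle, times 4
awk 'BEGIN {
    srand(7)                          # seed for repeatability
    n = 100000; hits = 0
    for (i = 0; i < n; i++) {
        x = rand(); y = rand()
        if (x*x + y*y <= 1.0) hits++
    }
    printf "%.3f\n", 4 * hits / n     # close to 3.14
}'
```

More samples (or, in the Hadoop job, more maps) tighten the estimate, which is why the example takes two arguments: number of maps and samples per map.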
21 / 22
Questions?
22 / 22