Hadoop Tutorials: Hadoop 2.6 - Installing on Ubuntu 14.04 (Single-Node Cluster)

K Hong, bogotobogo
Source: http://www.bogotobogo.com/Hadoop/BigData_ha...
Retrieved: Sunday 31 January 2016 09:27 PM



In this chapter, we'll install a single-node Hadoop cluster backed by the Hadoop Distributed File System on Ubuntu.

Installing Java

The Hadoop framework is written in Java, so the first step is to install a JDK.

k@laptop:~$ cd ~

# Update the source list
k@laptop:~$ sudo apt-get update

# The OpenJDK project is the default version of Java
# that is provided from a supported Ubuntu repository.
k@laptop:~$ sudo apt-get install default-jdk

k@laptop:~$ java -version
java version "1.7.0_65"
OpenJDK Runtime Environment (IcedTea 2.5.3) (7u71-2.5.3-0ubuntu0.1
OpenJDK 64-Bit Server VM (build 24.65-b04, mixed mode)

Adding a dedicated Hadoop user

k@laptop:~$ sudo addgroup hadoop
Adding group `hadoop' (GID 1002) ...
Done.

k@laptop:~$ sudo adduser --ingroup hadoop hduser
Adding user `hduser' ...
Adding new user `hduser' (1001) with group `hadoop' ...
Creating home directory `/home/hduser' ...
Copying files from `/etc/skel' ...
Enter new UNIX password:
Retype new UNIX password:
passwd: password updated successfully
Changing the user information for hduser
Enter the new value, or press ENTER for the default
        Full Name []:
        Room Number []:
        Work Phone []:
        Home Phone []:


        Other []:
Is the information correct? [Y/n] Y

Installing SSH

ssh has two main components:

1. ssh: the command we use to connect to remote machines - the client.
2. sshd: the daemon that runs on the server and allows clients to connect to it.

The ssh client is pre-enabled on Linux, but in order to start the sshd daemon, we need to install the ssh package first. Use this command to do that:

k@laptop:~$ sudo apt-get install ssh

This will install ssh on our machine. If we get output similar to the following, we can assume it is set up properly:

k@laptop:~$ which ssh
/usr/bin/ssh

k@laptop:~$ which sshd
/usr/sbin/sshd
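If which finds nothing, the binaries may simply not be on a non-root PATH (sshd lives in /usr/sbin). Here is a small sketch that checks both locations; the have_bin name is our own, not part of the tutorial:

```shell
# have_bin: report whether a binary is reachable either on PATH or in the
# sbin directories, where daemons such as sshd are installed.
have_bin() {
  command -v "$1" >/dev/null 2>&1 || [ -x "/usr/sbin/$1" ] || [ -x "/sbin/$1" ]
}

have_bin ssh  && echo "ssh: ok"  || echo "ssh: missing"
have_bin sshd && echo "sshd: ok" || echo "sshd: missing"
```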

Create and Set Up SSH Certificates

Hadoop requires SSH access to manage its nodes, i.e. remote machines plus our local machine. For our single-node setup of Hadoop, we therefore need to configure SSH access to localhost.


So, we need to have SSH up and running on our machine and configure it to allow SSH public key authentication.

Hadoop uses SSH (to access its nodes), which would normally require the user to enter a password. However, this requirement can be eliminated by creating and setting up SSH certificates using the following commands. If asked for a filename, just leave it blank and press the enter key to continue.

k@laptop:~$ su hduser
Password:
k@laptop:~$ ssh-keygen -t rsa -P ""
Generating public/private rsa key pair.
Enter file in which to save the key (/home/hduser/.ssh/id_rsa):
Created directory '/home/hduser/.ssh'.
Your identification has been saved in /home/hduser/.ssh/id_rsa.
Your public key has been saved in /home/hduser/.ssh/id_rsa.pub.
The key fingerprint is:
50:6b:f3:fc:0f:32:bf:30:79:c2:41:71:26:cc:7d:e3 hduser@laptop
The key's randomart image is:
+--[ RSA 2048]----+
|      .oo.o      |
|     . .o=. o    |
|    . + . o .    |
|     o = E       |
|      S +        |
|       . +       |
|        O +      |
|        O o      |
|        o..      |
+-----------------+

hduser@laptop:/home/k$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/aut

The second command adds the newly created key to the list of authorized keys so that Hadoop can use ssh without prompting for a password.

We can check if ssh works:

hduser@laptop:/home/k$ ssh localhost
The authenticity of host 'localhost (127.0.0.1)' can't be establis
ECDSA key fingerprint is e1:8b:a0:a5:75:ef:f4:b4:5e:a9:ed:be:64:be
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of know
Welcome to Ubuntu 14.04.1 LTS (GNU/Linux 3.13.0-40-generic x86_64)
...
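A quick non-interactive way to confirm that key-based login works (a one-liner of our own, assuming sshd is running on localhost): with BatchMode, ssh refuses to fall back to a password prompt, so success here means the key setup above took effect.

```shell
# BatchMode=yes makes ssh fail instead of prompting for a password, so the
# echo tells us whether passwordless login to localhost actually works.
if ssh -o BatchMode=yes -o StrictHostKeyChecking=no localhost true 2>/dev/null; then
  echo "passwordless ssh: OK"
else
  echo "passwordless ssh: NOT working"
fi
```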


Install Hadoop

hduser@laptop:~$ wget http://mirrors.sonic.net/apache/hadoop/commo
hduser@laptop:~$ tar xvzf hadoop-2.6.0.tar.gz

We want to move the Hadoop installation to the /usr/local/hadoop directory using the following command:

hduser@laptop:~/hadoop-2.6.0$ sudo mv * /usr/local/hadoop
[sudo] password for hduser:
hduser is not in the sudoers file. This incident will be reported.

Oops!... We got:

"hduser is not in the sudoers file. This incident will be reported."

This error can be resolved by logging in as a user with root privileges and then adding hduser to the sudo group:

hduser@laptop:~/hadoop-2.6.0$ su k
Password:

k@laptop:/home/hduser$ sudo adduser hduser sudo
[sudo] password for k:
Adding user `hduser' to group `sudo' ...
Adding user hduser to group sudo
Done.

Now that hduser has root privileges, we can move the Hadoop installation to the /usr/local/hadoop directory without any problem:

k@laptop:/home/hduser$ sudo su hduser

hduser@laptop:~/hadoop-2.6.0$ sudo mv * /usr/local/hadoop
hduser@laptop:~/hadoop-2.6.0$ sudo chown -R hduser:hadoop /usr/loc
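To confirm that the chown took effect, GNU stat can print a path's owner and group. A small sketch of ours (check_owner is our helper name, not a Hadoop tool):

```shell
# check_owner PATH EXPECTED - succeeds when PATH is owned by EXPECTED,
# where EXPECTED has the form "user:group" (e.g. hduser:hadoop).
check_owner() {
  [ "$(stat -c %U:%G "$1" 2>/dev/null)" = "$2" ]
}

check_owner /usr/local/hadoop hduser:hadoop && echo "ownership OK" \
  || echo "ownership wrong - re-run the chown above"
```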


Setup Configuration Files

The following files will have to be modified to complete the Hadoop setup:

1. ~/.bashrc
2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh
3. /usr/local/hadoop/etc/hadoop/core-site.xml
4. /usr/local/hadoop/etc/hadoop/mapred-site.xml.template
5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

1. ~/.bashrc:

Before editing the .bashrc file in our home directory, we need to find the path where Java has been installed, in order to set the JAVA_HOME environment variable, using the following command:

hduser@laptop:~$ update-alternatives --config java
There is only one alternative in link group java (providing /usr/b
Nothing to configure.

Now we can append the following to the end of ~/.bashrc:

hduser@laptop:~$ vi ~/.bashrc

#HADOOP VARIABLES START


export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64
export HADOOP_INSTALL=/usr/local/hadoop
export PATH=$PATH:$HADOOP_INSTALL/bin
export PATH=$PATH:$HADOOP_INSTALL/sbin
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
#HADOOP VARIABLES END

hduser@laptop:~$ source ~/.bashrc
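After sourcing, a quick self-check that the variables from the block above actually took effect (a sketch of our own, not part of the original tutorial):

```shell
# Print each Hadoop-related variable exported above, flagging any still unset.
for v in JAVA_HOME HADOOP_INSTALL HADOOP_MAPRED_HOME HADOOP_COMMON_HOME \
         HADOOP_HDFS_HOME YARN_HOME; do
  eval "val=\${$v:-}"
  if [ -n "$val" ]; then
    echo "$v=$val"
  else
    echo "$v is unset - check ~/.bashrc and re-run 'source ~/.bashrc'"
  fi
done
```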

Note that JAVA_HOME should be set to the path just before '.../bin/':

hduser@ubuntu-VirtualBox:~$ javac -version
javac 1.7.0_75

hduser@ubuntu-VirtualBox:~$ which javac
/usr/bin/javac

hduser@ubuntu-VirtualBox:~$ readlink -f /usr/bin/javac
/usr/lib/jvm/java-7-openjdk-amd64/bin/javac
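That last readlink output gives us JAVA_HOME mechanically: strip the trailing /bin/javac. A small helper sketch (derive_java_home is our own name, not a standard tool):

```shell
# derive_java_home: given the resolved path of javac, drop the trailing
# "/bin/javac" suffix to obtain the value JAVA_HOME should be set to.
derive_java_home() {
  printf '%s\n' "${1%/bin/javac}"
}

derive_java_home /usr/lib/jvm/java-7-openjdk-amd64/bin/javac
# -> /usr/lib/jvm/java-7-openjdk-amd64
```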

2. /usr/local/hadoop/etc/hadoop/hadoop-env.sh

We need to set JAVA_HOME by modifying the hadoop-env.sh file.

hduser@laptop:~$ vi /usr/local/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/lib/jvm/java-7-openjdk-amd64

Adding the above statement to the hadoop-env.sh file ensures that the value of the JAVA_HOME variable will be available to Hadoop whenever it is started up.

3. /usr/local/hadoop/etc/hadoop/core-site.xml:


The /usr/local/hadoop/etc/hadoop/core-site.xml file contains configuration properties that Hadoop uses when starting up. This file can be used to override the default settings that Hadoop starts with.

hduser@laptop:~$ sudo mkdir -p /app/hadoop/tmp
hduser@laptop:~$ sudo chown hduser:hadoop /app/hadoop/tmp

Open the file and enter the following in between the <configuration></configuration> tags:

hduser@laptop:~$ vi /usr/local/hadoop/etc/hadoop/core-site.xml

<configuration>
 <property>
  <name>hadoop.tmp.dir</name>
  <value>/app/hadoop/tmp</value>
  <description>A base for other temporary directories.</description>
 </property>
 <property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
 </property>
</configuration>
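A malformed edit here (for example, an unclosed property element) makes Hadoop fail at startup with an XML parse error, so a crude sanity check is worth ten seconds. This sketch just counts opening and closing tags line by line; check_balanced is our own helper, and a real XML validator such as xmllint is stricter when available:

```shell
# check_balanced FILE TAG - succeeds when FILE contains the same number of
# lines with <TAG> as with </TAG> (a rough, line-based pairing check).
check_balanced() {
  [ -f "$1" ] || return 1
  [ "$(grep -c "<$2>" "$1")" -eq "$(grep -c "</$2>" "$1")" ]
}

check_balanced /usr/local/hadoop/etc/hadoop/core-site.xml property \
  && echo "property tags look balanced" \
  || echo "check core-site.xml for an unclosed <property>"
```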

4. /usr/local/hadoop/etc/hadoop/mapred-site.xml

By default, the /usr/local/hadoop/etc/hadoop/ folder contains the /usr/local/hadoop/etc/hadoop/mapred-site.xml.template file, which has to be renamed/copied with the name mapred-site.xml:

hduser@laptop:~$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.t

The mapred-site.xml file is used to specify which framework is being used for MapReduce. We need to enter the following content in between the <configuration></configuration> tags:

<configuration>
 <property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.
  </description>
 </property>
</configuration>

5. /usr/local/hadoop/etc/hadoop/hdfs-site.xml

The /usr/local/hadoop/etc/hadoop/hdfs-site.xml file needs to be configured for each host in the cluster that is being used. It is used to specify the directories which will be used as the namenode and the datanode on that host.

Before editing this file, we need to create two directories which will contain the namenode and the datanode for this Hadoop installation. This can be done using the following commands:

hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/nameno
hduser@laptop:~$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datano
hduser@laptop:~$ sudo chown -R hduser:hadoop /usr/local/hadoop_sto
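The commands above (their tails are cut off in this copy) create the storage directories that hdfs-site.xml points at and hand them to hduser. An equivalent idempotent sketch, with make_hdfs_dirs as our own helper name and the layout taken from the dfs.namenode.name.dir / dfs.datanode.data.dir values configured next:

```shell
# make_hdfs_dirs BASE - create the namenode/datanode storage tree under
# BASE (the tutorial uses /usr/local/hadoop_store); mkdir -p is a no-op
# when the directories already exist, so this is safe to re-run.
make_hdfs_dirs() {
  mkdir -p "$1/hdfs/namenode" "$1/hdfs/datanode"
}

# For /usr/local, run it under sudo and then give the tree to the hadoop
# user with: sudo chown -R hduser:hadoop /usr/local/hadoop_store
make_hdfs_dirs "${HADOOP_STORE:-/tmp/hadoop_store}"
```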

Open the file and enter the following content in between the <configuration></configuration> tags:

hduser@laptop:~$ vi /usr/local/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.
  </description>
 </property>
 <property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
 </property>
 <property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
 </property>
</configuration>

Format the New Hadoop Filesystem

Now, the Hadoop file system needs to be formatted so that we can start to use it. The format command should be issued with write permission since it creates a current directory under the /usr/local/hadoop_store/hdfs/namenode folder:

hduser@laptop:~$ hadoop namenode -format
DEPRECATED: Use of this script to execute hdfs command is deprecat
Instead use the hdfs command for it.

15/04/18 14:43:03 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = laptop/192.168.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.6.0
STARTUP_MSG:   classpath = /usr/local/hadoop/etc/hadoop...
STARTUP_MSG:   java = 1.7.0_65
************************************************************/
15/04/18 14:43:03 INFO namenode.NameNode: createNameNode [-format]
15/04/18 14:43:07 WARN util.NativeCodeLoader: Unable to load nativ
Formatting using clusterid: CID-e2f515ac-33da-45bc-8466-5b1100a2bf
15/04/18 14:43:09 INFO namenode.FSNamesystem: No KeyProvider found
15/04/18 14:43:10 INFO namenode.FSNamesystem: HA Enabled: false
...
15/04/18 14:43:12 INFO namenode.FSImage: Allocated new BlockPoolId
15/04/18 14:43:12 INFO common.Storage: Storage directory /usr/loca
15/04/18 14:43:12 INFO util.ExitUtil: Exiting with status 0
15/04/18 14:43:12 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at laptop/192.168.1.1
************************************************************/

Note that the hadoop namenode -format command should be executed only once, before we start using Hadoop. If this command is executed again after Hadoop has been used, it'll destroy all the data on the Hadoop file system.
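Because a second format wipes HDFS, it can be worth guarding the command. A successful format leaves a current directory under the namenode folder, so a wrapper can refuse to run twice; safe_format and its echo placeholder are our own sketch, not part of Hadoop:

```shell
# safe_format NAME_DIR - refuse to format when NAME_DIR already holds a
# "current" directory (the marker left by a previous successful format).
safe_format() {
  if [ -d "$1/current" ]; then
    echo "refusing: $1 looks already formatted" >&2
    return 1
  fi
  echo "would run: hadoop namenode -format"  # replace the echo with the real command
}
```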

Starting Hadoop

Now it's time to start the newly installed single-node cluster. We can use start-all.sh or (start-dfs.sh and start-yarn.sh).

k@laptop:~$ cd /usr/local/hadoop/sbin

k@laptop:/usr/local/hadoop/sbin$ ls
distribute-exclude.sh    start-all.cmd        stop-balancer.sh
hadoop-daemon.sh         start-all.sh         stop-dfs.cmd
hadoop-daemons.sh        start-balancer.sh    stop-dfs.sh
hdfs-config.cmd          start-dfs.cmd        stop-secure-dns.sh
hdfs-config.sh           start-dfs.sh         stop-yarn.cmd
httpfs.sh                start-secure-dns.sh  stop-yarn.sh
kms.sh                   start-yarn.cmd       yarn-daemon.sh
mr-jobhistory-daemon.sh  start-yarn.sh        yarn-daemons.sh
refresh-namenodes.sh     stop-all.cmd
slaves.sh                stop-all.sh

k@laptop:/usr/local/hadoop/sbin$ sudo su hduser

hduser@laptop:/usr/local/hadoop/sbin$ start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn
15/04/18 16:43:13 WARN util.NativeCodeLoader: Unable to load nativ
Starting namenodes on [localhost]
localhost: starting namenode, logging to /usr/local/hadoop/logs/ha
localhost: starting datanode, logging to /usr/local/hadoop/logs/ha
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/hadoop/
15/04/18 16:43:58 WARN util.NativeCodeLoader: Unable to load nativ
starting yarn daemons
starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-h
localhost: starting nodemanager, logging to /usr/local/hadoop/logs


We can check if it's really up and running:

hduser@laptop:/usr/local/hadoop/sbin$ jps
9026 NodeManager
7348 NameNode
9766 Jps
8887 ResourceManager
7507 DataNode

The output means that we now have a functional instance of Hadoop running on our VPS (Virtual Private Server).
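Eyeballing jps works, but a script can do the comparison. Here is a sketch (check_daemons is our own name) that greps the jps output for every daemon a single-node cluster should show; note that the SecondaryNameNode can be absent until a restart, as the note near the end of this tutorial mentions:

```shell
# check_daemons "<jps output>" - verify each expected daemon appears as a
# whole word in the output; prints the first missing one and fails.
check_daemons() {
  for d in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
    if ! printf '%s\n' "$1" | grep -qw "$d"; then
      echo "missing: $d"
      return 1
    fi
  done
  echo "all daemons running"
}

check_daemons "$(jps 2>/dev/null)" || true
```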

Another way to check is using netstat:

hduser@laptop:~$ netstat -plten | grep java
(Not all processes could be identified, non-owned process info
 will not be shown, you would have to be root to see it all.)
tcp   0  0 0.0.0.0:50020    0.0.0.0:*
tcp   0  0 127.0.0.1:54310  0.0.0.0:*
tcp   0  0 0.0.0.0:50090    0.0.0.0:*
tcp   0  0 0.0.0.0:50070    0.0.0.0:*
tcp   0  0 0.0.0.0:50010    0.0.0.0:*
tcp   0  0 0.0.0.0:50075    0.0.0.0:*
tcp6  0  0 :::8040          :::*
tcp6  0  0 :::8042          :::*
tcp6  0  0 :::8088          :::*
tcp6  0  0 :::49630         :::*
tcp6  0  0 :::8030          :::*
tcp6  0  0 :::8031          :::*
tcp6  0  0 :::8032          :::*
tcp6  0  0 :::8033          :::*

Stopping Hadoop

$ pwd
/usr/local/hadoop/sbin

$ ls
distribute-exclude.sh  httpfs.sh                start-all.sh
hadoop-daemon.sh       mr-jobhistory-daemon.sh  start-balancer.sh
hadoop-daemons.sh      refresh-namenodes.sh     start-dfs.cmd
hdfs-config.cmd        slaves.sh                start-dfs.sh
hdfs-config.sh         start-all.cmd            start-secure-dns.s

We run stop-all.sh or (stop-dfs.sh and stop-yarn.sh) to stop all the daemons running on our machine:

hduser@laptop:/usr/local/hadoop/sbin$ pwd
/usr/local/hadoop/sbin
hduser@laptop:/usr/local/hadoop/sbin$ stop-all.sh
This script is Deprecated. Instead use stop-dfs.sh and stop-yarn.s
15/04/18 15:46:31 WARN util.NativeCodeLoader: Unable to load nativ
Stopping namenodes on [localhost]
localhost: stopping namenode
localhost: stopping datanode
Stopping secondary namenodes [0.0.0.0]
0.0.0.0: no secondarynamenode to stop
15/04/18 15:46:59 WARN util.NativeCodeLoader: Unable to load nativ
stopping yarn daemons
stopping resourcemanager
localhost: stopping nodemanager
no proxyserver to stop

Hadoop Web Interfaces

Let's start Hadoop again and look at its web UI:

hduser@laptop:/usr/local/hadoop/sbin$ start-all.sh

http://localhost:50070/ - web UI of the NameNode daemon
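The same liveness information is available without a browser. This loop (a sketch of ours, using bash's /dev/tcp redirection) probes the default Hadoop 2.x web UI ports: 50070 for the NameNode, 50090 for the SecondaryNameNode, and 8088 for the YARN ResourceManager, all three of which also appear in the netstat listing above:

```shell
# bash-specific: /dev/tcp/HOST/PORT opens a TCP connection on redirection.
# A successful connect means the corresponding daemon's web UI is up.
for port in 50070 50090 8088; do
  if (exec 3<>"/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "port $port: listening"
  else
    echo "port $port: not listening"
  fi
done
```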


SecondaryNameNode web UI (screenshot)


(Note) I had to restart Hadoop before this SecondaryNameNode process showed up.

DataNode
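The NameNode, SecondaryNameNode, and DataNode status pages are each served by a running HDFS daemon, so a quick way to confirm they all came up is `jps`. A minimal sketch, assuming Hadoop was installed under /usr/local/hadoop as earlier in this chapter:

```shell
# List the JVMs started by start-dfs.sh / start-yarn.sh.
# On a healthy single-node setup this should show NameNode,
# DataNode, SecondaryNameNode, ResourceManager, and NodeManager.
jps

# If SecondaryNameNode is missing (see the note above),
# restarting HDFS usually brings it up:
/usr/local/hadoop/sbin/stop-dfs.sh
/usr/local/hadoop/sbin/start-dfs.sh
```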


Using Hadoop

If we have an application that is set up to use Hadoop, we can fire that up and start using it with our Hadoop installation!
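For example, we can smoke-test the installation with the example jar that ships with Hadoop. The sketch below assumes the install directory and version from earlier in this chapter (/usr/local/hadoop, Hadoop 2.6.0), and `hduser` stands in for whatever dedicated Hadoop user you created; adjust all of these to match your setup:

```shell
cd /usr/local/hadoop

# Stage some input files in HDFS...
bin/hdfs dfs -mkdir -p /user/hduser/input
bin/hdfs dfs -put etc/hadoop/*.xml /user/hduser/input

# ...run the bundled wordcount example...
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar \
    wordcount /user/hduser/input /user/hduser/output

# ...and inspect the result.
bin/hdfs dfs -cat '/user/hduser/output/part-r-*' | head
```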

Big Data & Hadoop Tutorials

Hadoop 2.6 - Installing on Ubuntu 14.04 (Single-Node Cluster)

Hadoop - Running MapReduce Job

Hadoop - Ecosystem

CDH5.3 Install on four EC2 instances (1 Name node and 3 Datanodes) using Cloudera Manager 5

CDH5 APIs

QuickStart VMs for CDH 5.3

QuickStart VMs for CDH 5.3 II - Testing with wordcount

QuickStart VMs for CDH 5.3 II - Hive DB query

Scheduled start and stop CDH services

Zookeeper & Kafka Install

Zookeeper & Kafka - single node single broker cluster

Zookeeper & Kafka - single node multiple broker cluster

OLTP vs OLAP

Apache Hadoop Tutorial I with CDH - Overview

Apache Hadoop Tutorial II with CDH - MapReduce Word Count

Apache Hadoop Tutorial III with CDH - MapReduce Word Count 2

Apache Hadoop (CDH 5) Hive Introduction

CDH5 - Hive Upgrade from 1.2 to 1.3

Apache Hadoop : Creating HBase table with HBase shell and HUE

Apache Hadoop : Creating HBase table with Java API


Apache Hadoop : HBase in Pseudo-Distributed mode

Apache Hadoop HBase : Map, Persistent, Sparse, Sorted, Distributed and Multidimensional

Apache Hadoop - Flume with CDH5: a single-node Flume deployment (telnet example)

Apache Hadoop (CDH 5) Flume with VirtualBox : syslog example via NettyAvroRpcClient

List of Apache Hadoop hdfs commands

Apache Hadoop : Creating Wordcount Java Project with Eclipse Part 1

Apache Hadoop : Creating Wordcount Java Project with Eclipse Part 2

Apache Hadoop : Creating Card Java Project with Eclipse using Cloudera VM UnoExample for CDH5 - local run

Apache Hadoop : Creating Wordcount Maven Project with Eclipse

Wordcount MapReduce with Oozie workflow with Hue browser - CDH 5.3 Hadoop cluster using VirtualBox and QuickStart VM

Spark 1.2 using VirtualBox and QuickStart VM - wordcount

Spark Programming Model : Resilient Distributed Dataset (RDD)

Apache Spark 1.3 with PySpark (Spark Python API) Shell

Apache Spark 1.2 with PySpark (Spark Python API) Wordcount using CDH5

Apache Spark 1.2 Streaming


