Table of Contents

Lab Overview
    HOL-SDC-1309 - vSphere Big Data Extensions Lab Modules
    Verify Hadoop Clusters Have Started
Module 1 - Hadoop POC In Under An Hour
    Module Overview
    Manage Hadoop Pooled Resources
    Create Basic Hadoop Cluster via Web Client
    Create Hadoop Cluster with Serengeti CLI
    Add Data and Run a MapReduce Job
    Scale Out Hadoop Cluster via the Web UI
    Scale Out Cluster via Serengeti CLI
Module 2 - Fast and Easy Deployment of Hadoop Clusters
    Module Overview
    Configure and Deploy Hadoop Clusters
    Resize Hadoop Cluster After Creation
    Export Configuration and Create Customized Cluster
Module 3 - Compute Only Clusters on Shared HDFS
    Module Overview
    Create Compute Only Cluster
    Hadoop Filesystem Commands Within CLI
Module 4 - Highly Available Hadoop
    Module Overview
    How to Create Hadoop Cluster with HA Enabled
    Kill the NameNode and Verify HA Restart
Module 5 - Fast and Easy Deployment of HBase Clusters
    Module Overview
    Configure and Deploy HBase Clusters
    Manage Hadoop Pooled Resources
Module 6 - Elastic Hadoop
    Module Overview
    Manage Existing Tier1 and Tier2 Clusters
    Manual Hadoop Elasticity
    Automatic Hadoop Elasticity
HOL-SDC-1309
Page 1VMware Beta Program CONFIDENTIAL
Lab Overview
HOL-SDC-1309 - vSphere Big Data Extensions Lab Modules

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers, designed to scale up from single servers to thousands of machines, each offering local computation and storage. Hadoop is being used by enterprises across verticals for Big Data analytics to help make better business decisions based on large data sets.
VMware enables you to easily and efficiently deploy and use Hadoop on your existing virtual infrastructure through vSphere Big Data Extensions (BDE). BDE makes Hadoop virtualization-aware, improves performance in virtual environments, and enables deployment of highly available Hadoop clusters in minutes. vSphere BDE automates deployment of a Hadoop cluster, and thus provides better Hadoop manageability and usability.
In this lab you will execute 15-minute Lightning labs to configure and deploy Hadoop, as well as HBase, clusters on local storage in minutes. You will also create compute-only clusters that allow the use of shared storage across multiple MapReduce clusters, providing multi-tenancy and enabling easy scale-in or scale-out of compute resources. You will also add vSphere High Availability (HA) to improve the resiliency of your Hadoop clusters.
There is a full-length lab to simulate a complete Hadoop proof of concept. In the POC module, you will configure and deploy your cluster, add data to HDFS, and run MapReduce jobs against your deployed cluster.
In the final module, you will configure manual and automatic scaling of your Hadoop clusters. You will use resource pools with differing priorities and run MapReduce jobs to see how vSphere will scale cluster nodes in or out based on your priorities and the resource demands placed on the system.
Note: Some of the lab modules contain lengthy command lines that must be typed into the PuTTY session. To ease this process, there is a Commands.txt file on the desktop in the "Lab Files" folder. You can copy the relevant commands from this file and paste them into PuTTY if you don't want to type them manually.
The modules and timing are as follows:
Hadoop POC In Under an Hour
• Add resources
• Create cluster Hadoop/HBase
• Put data into HDFS
• Execute MR/HBase jobs
• Visualize results (Partner product)
Fast and Easy Deployment of Hadoop Cluster 15 Minutes
• Create and resize standard Hadoop clusters with multiple distros and configs
• Modify Hadoop configuration after creation (e.g. change scheduler from FIFO to Fair)
• Manage resources (add/delete Networks, Resource Pools, Datastores)
Create Compute only clusters on shared HDFS 15 Minutes
• Deploy with HVE to enable locality
• Show node placement policy controls in Serengeti
Highly Available Hadoop 15 Minutes
• Deploy master nodes on shared storage with HA enabled
• Kill the NameNode process and see the node automatically restart
Create HBase Cluster 15 Minutes
• Create and resize HBase cluster
• Manage resources (add/delete Networks, Resource Pools, Datastores)
Simulate Elasticity POC 45 Minutes
• Create Tier 1 and Tier 2 clusters
• Execute MR jobs on both clusters
• Show manual elasticity
• Show automated elasticity
Lab Captains: Michael West, Andy Hill, Robert Jensen
Verify Hadoop Clusters Have Started

As part of the deployment of your lab, the Hadoop clusters that were created for you have been automatically started. It is possible that the clusters you need did not start successfully. Before starting any module, perform the following steps to verify that your clusters have started.
Four clusters have been pre-created for you. The small_Hbase cluster is only needed if you are going to do the Fast and Easy Deployment of HBase Clusters module. If not, do not start this cluster, as it takes significant system resources.
Connect to Serengeti CLI
From the Windows Desktop perform the following steps:
1) Click on the PuTTY icon
2) Select the SerengetiCLI session
3) Click the Open button
4) At the OS login prompt, enter the password. It is password
Connect to Serengeti CLI to Verify Running Clusters
1. To open the CLI, type serengeti. Note that CLI commands are case sensitive.
2. Type connect --host localhost:8443 to connect to the management server. The username is administrator@corp and the password is VMware1!
You are now in a command line environment that can interact directly with your Hadoop clusters.
Listing Hadoop Cluster Details
1. To see your clusters, type cluster list (note that the up arrow will let you see your command history).
2. The small_cluster, Tier1 and Tier2 clusters must have a STATUS of RUNNING. If the status is STOPPED or ERROR, you will need to start the cluster again.
3. The HBase cluster only needs to be running if you are going to do the HBase Deployment module.
4. Note: Clusters take several minutes to start, so you don't want to start a cluster you are not going to use.
Start a Hadoop Cluster
1) Type cluster start --name "cluster name". Replace "cluster name" with the name of the cluster that needs to be started.
Note: You do not need to wait for the clusters to start, since the first few steps in each module do not depend on the clusters running. Feel free to continue, and check back on the status of the start command.
Module 1 - Hadoop POC In Under An Hour
Module Overview

Hadoop clusters typically require specialized expertise and dedicated hardware infrastructure to deploy. In this module we will explore the benefits of running Hadoop on VMware vSphere. By virtualizing Hadoop clusters, you are able to deploy multiple VMs per host, which also allows you to separate data from compute. By doing this, you can seamlessly scale the compute layer within your Hadoop cluster, while keeping the data separate. Other benefits of running Hadoop on vSphere include:
• Run multiple compute workloads on the same physical hardware, optimizing resource utilization
• Eliminate the need for dedicated hardware to run Hadoop workloads
• Inherit better reliability and flexibility due to the High Availability (HA), vMotion, and DRS features of the vSphere platform
In this module, we will simulate a rapid proof of concept using vSphere Big Data Extensions. We will explore the following key concepts:
• Mapping vSphere resources to Big Data Extensions resources for consumption by Hadoop
• Quickly create multiple types of Hadoop clusters
• Load data and run MapReduce jobs
• Run a Pig script via the Serengeti CLI
• Simple scale-out of a Hadoop compute node on vSphere
Note: If you have not already done so, you MUST run the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module.
Let's get started!
Manage Hadoop Pooled Resources

Hadoop makes excellent use of the system resources that are made available to it. In an environment with shared physical resources that have been virtualized, it is important to appropriately assign the resources that can be used by your Hadoop clusters. vSphere allows you to specifically make available CPU, RAM, storage and virtual networks to your Hadoop clusters. In this module, you will use the vSphere Big Data Extensions plugin to add network and storage resources to the Hadoop clusters.
Login to vSphere Web Client
Open Firefox and log in to the vSphere Web Client by checking the "Use Windows session authentication" checkbox and clicking the Login button. The username is corp\administrator and the password is VMware1! (Note: the ! is part of the password.)
Explore the vSphere Environment
In the vSphere Web Client, click the "Hosts and Clusters" icon.
Hosts and Clusters View
First, take a look at the resource pools that are configured in this vSphere environment. vSphere Big Data Extensions will leverage these resource pools to ensure our Hadoop clusters have the resources they need based upon business need, while also ensuring they do not overconsume resources and impact other applications.
Datastore View
Next, click over to the Datastores tab, just to get a sense of the datastores and networks that are configured in this environment.

Notice that there is an NFS volume configured, and there are also local VMFS volumes configured on each ESXi host. In the next steps, we'll configure our Hadoop clusters to use both shared and local storage, which is a key benefit of using vSphere Big Data Extensions.
Navigate to Big Data Extensions Plugin
To get to the Big Data Extensions plugin, first click the "Home" icon, then choose "Big Data Extensions" from the sidebar menu.
Explore BDE Plugin
First, let's take a look at the Hadoop clusters that are already configured in this environment. Click on the "Big Data Clusters" item in the sidebar menu.
View Hadoop Clusters
Notice that there are four Hadoop clusters configured in this vSphere environment. The columnar view on the right indicates each cluster's name, status, which distribution it is running, which vSphere resource pool it belongs to, and the list of nodes. As we saw in the last lesson, resource pools are an important way to manage how Hadoop consumes the underlying physical resources.

This is an important differentiator over using dedicated physical hardware for Hadoop, where resources may be wasted when Hadoop jobs are not running. vSphere allows you to run a mix of workloads, while also guaranteeing resources based upon business needs.
View Cluster Actions
Right-click on one of the clusters in the right-hand pane, and note all the actions that can be taken on a cluster from within the vSphere Web Client. We will investigate these further in a future lesson.

Now click back to the Big Data Extensions main menu by clicking the button indicated in step 2 in the screenshot above.
Click Resources
Click the "Resources" item under Inventory lists, highlighted above.
Map vCenter Resources to BDE Inventory Items
This screen is where we map vSphere datastores into constructs that Big Data Extensions will allocate to Hadoop clusters. Notice that several mappings have already been made.

Big Data Extensions can consume both shared and local storage as appropriate for the specific need. In this screen, we can see that there is a "defaultDSShared" item that is mapped to the ds-site-a-nfs01 vSphere datastore. There is also a "dsLOCAL" item that is mapped to any vSphere datastore that is local to a host and begins with the name "esx". Wildcards allow multiple datastores to be easily managed and consumed by our Hadoop clusters.
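To see how such a wildcard pattern picks up multiple datastores, here is a small illustrative sketch. The local datastore names are hypothetical, and Python's fnmatch is used only to mimic the glob-style match; it is not how BDE implements it internally:

```python
from fnmatch import fnmatch

# Hypothetical inventory: two host-local VMFS volumes plus the shared NFS volume.
datastores = ["esx-01a-local", "esx-02a-local", "ds-site-a-nfs01"]

# A pattern like "esx*" matches every datastore whose name begins with "esx",
# so all per-host local volumes are covered by a single mapping.
local = [d for d in datastores if fnmatch(d, "esx*")]
print(local)  # ['esx-01a-local', 'esx-02a-local']
```

The benefit is that when a new host (with a new local datastore matching the pattern) joins the cluster, no new mapping is needed.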
Go ahead and click the plus sign to walk through creating a new datastore mapping.
In the Add Datastore popup, you would enter any name that you choose, the actual vSphere datastore name, and then indicate whether the datastore is shared or local. Since we already have all the mappings we need, click Cancel when you are done.
Create Basic Hadoop Cluster via Web Client

In this lesson, we will create a Hadoop cluster via the vSphere Web Client.
Navigate to Big Data Extensions Plugin
Click on "Big Data Extensions" in the side bar.
Simulate Creating a Basic Hadoop Cluster
A Basic Hadoop cluster mimics the standard deployment you see in physical Hadoop clusters. The DataNode and TaskTracker reside within a single VM. In other lessons you will see that it is advantageous to separate these services into their own VMs.
Click "Big Data Clusters"
Click the New Cluster button
Click on the icon indicated above, which is the "new cluster" button.
Name and type
You will choose your preferred Hadoop distribution. Supported distros include Cloudera, MapR, Hortonworks, and Pivotal HD. We will use the open-source Apache distribution in this module.
There are several deployment types for your clusters. You can mimic the typical physical Hadoop deployment with the Basic Hadoop Cluster type, which separates the NameNode and JobTracker into their own virtual machines while each TaskTracker and DataNode combination shares a single virtual machine. You also have the option of separating the compute (TaskTracker) from the DataNode using the Data/Compute Separation Hadoop option. This facilitates the elastic scaling of compute that you can see in Module 6.
For this Module, select the following options:
Big Data Cluster Name : Basic Hadoop
Hadoop distribution: apache
Deployment Type: Basic Hadoop Cluster
Select the custom template
Each distinct Hadoop node configuration is called a NodeGroup. You will see specific NodeGroups based on the Deployment Type you selected, but you can also use the Command Line Interface, or the Customize Deployment Type in the UI, to define any type of NodeGroup you want. In this section, you are sizing the virtual machine CPU, RAM and data storage for each NodeGroup. You will also define the number of instances of a specific NodeGroup to deploy. In the image above, you are going to deploy 3 Worker nodes (each containing a TaskTracker and DataNode), 1 ComputeMaster (JobTracker) and 1 DataMaster (NameNode).
Click the Resource template button, and select Customize
Customize the template
Note that you can select shared or local storage. Typically, Hadoop has been deployed with local storage to provide the data locality that is central to its performance. You can see that each NodeGroup can be configured with its own datastore type. This means that, for instance, your DataNodes can run on local storage, while you have the JobTracker and NameNode on shared storage. This allows the use of vSphere HA or FT to improve the availability of those nodes while still ensuring data locality.
Change the defaults to:
vCpu number : 1
Memory size : 3748
Storage Size : 10
Datastore type : Shared
Click OK.
Select the Resources for the Cluster
Make sure to select the Customize option and size each NodeGroup's resources as in the previous step.
Set the number of nodes for each worker to 1.
Network and resource pool
Leave the Hadoop topology and Network settings at their default values.
Click the select button to select a resource pool.
Select Resource Pools
Select one of the resource pools listed above, and click OK.
Cancel Creation
Depending on the size of the cluster, it takes anywhere from 6 to 20 minutes to deploy and be running. Due to resource and time constraints for the lab, we will not actually create the cluster.

Click Cancel to cancel the deployment, and watch the video below to see a deployment of a Hadoop cluster.
Video
Create Hadoop Cluster with Serengeti CLI

In the last lesson, we used the vSphere Web Client to walk through creating a new Hadoop cluster. We will now run through the same process using the Serengeti CLI. The CLI allows you to have finer-grained control over cluster creation, including the ability to specify which roles run on which nodes in the cluster.
Use Putty to SSH to management-server
Click the PuTTY icon on the desktop, choose the SerengetiCLI item, and click Open.

Log in as root with a password of 'password'.
Connect to the Serengeti CLI
1. Type serengeti
2. Type connect --host localhost:8443 to connect to the management server. The username is administrator@corp and the password is VMware1!
Explore the Serengeti CLI
Try out the following commands in the CLI to get an idea of how the environment is configured:

• cluster list - lists all the Hadoop clusters and some of their configuration
• resourcepool list - lists vSphere resource pools
• datastore list - lists the Serengeti datastores
• network list - lists the network mappings
Create a Hadoop Cluster via the CLI
Now we will walk through how to create a Hadoop cluster via the CLI. This process is similar to using the vSphere Web Client, but there are more options available.
View Specfile
Hadoop cluster configuration can be controlled via spec files. Let's take a look inside one of the spec files before we create the cluster.
From the Serengeti management Server:
1. cd /opt/serengeti/conf
2. more small_cluster.json
Using a JSON file via the CLI allows more control over the configuration of the cluster, including role placement across nodes in the cluster.
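As a sketch of what such a spec file can contain, the fragment below shows the general shape of a NodeGroup definition. The field names follow the Serengeti spec-file convention, but the values are illustrative; the actual contents of small_cluster.json in your lab may differ:

```json
{
  "nodeGroups": [
    {
      "name": "master",
      "roles": ["hadoop_namenode", "hadoop_jobtracker"],
      "instanceNum": 1,
      "cpuNum": 1,
      "memCapacityMB": 3748,
      "storage": { "type": "SHARED", "sizeGB": 10 }
    },
    {
      "name": "worker",
      "roles": ["hadoop_datanode", "hadoop_tasktracker"],
      "instanceNum": 3,
      "cpuNum": 1,
      "memCapacityMB": 3748,
      "storage": { "type": "LOCAL", "sizeGB": 10 }
    }
  ]
}
```

Note how the roles list is what lets the CLI place specific Hadoop services on specific NodeGroups, which is the finer-grained control the Web Client wizard does not expose.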
Create the Cluster - Video
This video shows the process to create a compute-only cluster using an existing HDFS filesystem. We won't actually create another cluster in this lab due to time constraints, but here is the command to use in the CLI along with the JSON file:
cluster create --name SharedHDFSTest --specfile SharedHDFS.json
Add Data and Run a MapReduce Job

In this section we will:

• Use the HDFS put command from the CLI to add files to the Hadoop filesystem
• Run a MapReduce job in an existing Hadoop cluster
• Run a Pig script in an existing Hadoop cluster
• From the vSphere Web Client, use the Hadoop management pages to view job status and the results file
Use Putty to SSH to management-server
You should still be connected to the Serengeti CLI, but if not, re-connect as follows:
Click the PuTTY icon, choose the SerengetiCLI session, and click the Open button.

Log in as serengeti with a password of 'password'.
Connect to the Serengeti CLI
1. Type serengeti
2. Type connect --host localhost:8443 to connect to the management server. The username is administrator@corp and the password is VMware1!
Select the small_cluster as our target
To choose the small_cluster as the target we will be working with, enter the following command into the CLI:
cluster target --name small_cluster
Put data into HDFS
As a simple example of a MapReduce job, we will do a word count on the Serengeti User's Guide. We first need to upload a text version of the document into the HDFS filesystem:
fs put --from /home/serengeti/serengeti.txt --to /tmp/input/serengetitest
Open the MapReduce Status Page
Back in the vSphere Web Client, open the MapReduce status page by right-clicking on the small_cluster line and choosing "Open MapReduce Status Page" from the context menu.

Once this page opens, you can return to the Serengeti CLI window. We will come back to this status page after we execute the MapReduce job.
Run MapReduce
To run our MapReduce job, enter the following command in the CLI:
mr jar --jarfile /opt/serengeti/cli/lib/hadoop-examples-1.2.1.jar --mainclass org.apache.hadoop.examples.WordCount --args "/tmp/input /tmp/output"

This command executes the WordCount MapReduce job that is included in the hadoop-examples jar file that comes with Serengeti. This class reads the input from the /tmp/input directory, executes the MR job, and stores the results in the /tmp/output directory.
View Map Reduce Status Page
Go back to your web browser and scroll down to view the MapReduce results.

Click refresh in the browser address bar.

Scroll down and look at the running and completed jobs sections. The job we submitted does not take long to run, so it may already be completed by the time you view the page.

Once the job completes (you may have to refresh the page a couple of times), click on the hyperlinked Jobid to view some details about the job.
View MR Job Details
Feel free to explore this page and look at the statistics for the job we submitted.
When we executed the MapReduce job to do a word count on the Serengeti User's Guide, here is what happened, at a simplified level:
1. Map Step: The master node takes the input data, divides it into smaller units of work, and distributes these to the worker nodes, which further subdivide them. In the WordCount example, each line in the file is broken into words, and the map function outputs key/value pairs containing the word and the number of occurrences in that line.
2. Reduce Step: The master node collects all the results back from the worker nodes, sums the values for each word (key), and outputs a single key/value pair with the word and its sum.
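The two steps above can be sketched locally in a few lines of Python. This is only an illustration of the WordCount map/reduce logic on a tiny in-memory input, not the distributed Hadoop implementation:

```python
from collections import defaultdict

def map_phase(lines):
    # Map step: emit a (word, 1) pair for every word on every line.
    for line in lines:
        for word in line.split():
            yield (word, 1)

def reduce_phase(pairs):
    # Reduce step: sum the values for each distinct key (word).
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

text = ["the quick brown fox", "the lazy dog"]
result = reduce_phase(map_phase(text))
print(result["the"])  # prints 2: "the" appears once on each line
```

In a real Hadoop run, the map output is partitioned and shuffled across worker nodes before the reduce step, which is what lets the same two functions scale to very large inputs.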
Sort the Results Using Pig
The default results file output by the MapReduce job is sorted alphabetically by word. To make our results file easier to understand, we will run a simple Pig script to sort the file by the number of occurrences of each word, in ascending order.
Back in the Serengeti CLI window, type the following command:
pig script /home/serengeti/sort.pig
Once the command completes as pictured above, move on to the next step.
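The transformation the Pig script performs is conceptually simple: read the (word, count) pairs from the WordCount output and re-emit them ordered by count. Here is a local Python sketch of the same idea, assuming the tab-separated "word<TAB>count" line format that WordCount emits (the sample words are hypothetical):

```python
def sort_by_count(lines):
    # Each WordCount output line is "word\tcount"; parse it, then sort ascending by count.
    pairs = []
    for line in lines:
        word, count = line.rsplit("\t", 1)
        pairs.append((word, int(count)))
    pairs.sort(key=lambda p: p[1])
    return [f"{word}\t{count}" for word, count in pairs]

sample = ["hadoop\t5", "the\t12", "serengeti\t3"]
print(sort_by_count(sample))  # least frequent first: serengeti, hadoop, the
```

The Pig script does this as a distributed job over HDFS, so the sort scales to files far larger than local memory.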
Open HDFS Status Page
Back in the vSphere Web Client, make sure you are on the Hadoop Clusters page.
Right-click on the small_cluster and choose the "Open HDFS Status Page" option.
Browse HDFS Filesystem
Click the "Browse the filesystem" link as shown in the screenshot above.
Navigate HDFS Filesystem
We need to navigate to the directory /tmp/output/wordcount-sorted. You can simply type this directory into the "Goto:" field and click the Go button, or manually click through the directories until you reach that folder.
View the Sorted Results
Now that we are in the right directory, simply click the "part-r-0000" file to view the results that we sorted with the Pig script.
Scroll through the file
To view the complete file, you will need to click the "View Next chunk" link near the top of the page.

The word count results are sorted by the number of occurrences of each word, in ascending order. If you scroll to the bottom of the last chunk, you will see the most common words, such as "the" and "Hadoop."
Scale Out Hadoop Cluster via the Web UI

This lesson will show you how to scale out a Hadoop cluster via the vSphere Web Client. The next lesson will walk through the same procedure using the CLI.

The ability to easily scale your Hadoop clusters up and down is a key benefit of running Hadoop on vSphere. It is very difficult and costly to achieve the same results on dedicated physical hardware.
Navigate to Big Data Extensions Plugin
If you are not already in the Big Data Extensions plugin, navigate back to it by clicking the Home icon, then choosing "Big Data Extensions" from the sidebar menu as shown above.
Click Hadoop Clusters
Click "Big Data Clusters" in the sidebar menu.
Scale Out the small_cluster
Right-Click on small_cluster and choose Scale Out...
Change Instance Number to 2
Change the instance number to 2, and click OK.

Upon execution of this task, Big Data Extensions would clone a new worker node and handle adding it to the Hadoop cluster automatically.

Note that in this lab environment, nothing will actually happen when you click OK, due to resource constraints. Go ahead and click Cancel.
Video of the Resize Process
This video shows you the resize process in action.
Scale Out Cluster via Serengeti CLI

In this lesson, we will scale out a cluster using the CLI.
Open the Command Line Interface (CLI)
Some Hadoop cluster management can be done through the vSphere Web Client, as we saw in the last lesson. We are going to look at changing cluster size through the CLI.
From the Windows Desktop perform the following steps:
1. Click on the PuTTY icon
2. Select the SerengetiCLI session
3. Click the Load button
4. Click the Open button
5. At the OS login prompt, enter the password. It is password
Open the Serengeti CLI
1. To open the CLI, type serengeti. Note that CLI commands are case sensitive.
2. Type connect --host localhost:8443 to connect to the management server. The username is administrator@corp, and the password is VMware1!

You are now in a command line environment that can interact directly with your Hadoop clusters.
Listing Hadoop Cluster Details
1. To see your clusters, type cluster list (note that the up arrow will let you see your command history).
2. Notice that we currently have 4 workers in the Tier1 cluster. In the next step, we will expand this to 5 workers.
View Cluster Resize Help
First, let's look at the help for the cluster resize command.
Type:
help cluster resize
Take a look at the keywords for the command.
Enter the Resize Command
As you can see from the help information, the command we need to enter in order to resize the Tier1 cluster to 5 worker nodes is:
cluster resize --name Tier1 --nodeGroup worker --instanceNum 5
Note: If you plan to take additional modules in this lab, you may not want to actually run this command, since it will take time to complete and may impact the functionality of other lab modules.

If you choose to, you may enter the command, and you will see the screen above that indicates Tier1-worker-4 is cloning. If you have the vSphere Web Client open, you will see the progress of the clone VM task. It may take several minutes for the operation to complete.

Eventually, the task status in the CLI will update to "waiting for ip," then to "VM ready," and through several other steps as the cluster reconfigures itself.

These steps can take some time, and you do not need to wait for them to complete, since this is the last step of this lab module.
Module 2 - Fast and Easy Deployment of Hadoop Clusters
Module Overview

Hadoop clusters typically require specialized expertise and dedicated hardware infrastructure to deploy. In this module you will see how easy it is to configure your Hadoop cluster nodes, size the virtual machines - including CPU, memory and storage - and deploy into your existing vSphere environment. As resource demands change over time - or throughout the day - you can resize the Hadoop cluster to accommodate these changes. Lastly, once a cluster is configured, you will see how to export that configuration and use it to create or update other Hadoop clusters.
Note: You MUST run the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module.
Configure and Deploy Hadoop Clusters

In this module, you will deploy an Apache Hadoop cluster using the vSphere Web Client and vSphere Big Data Extensions.
Navigate to Hosts and Clusters
Click on Hosts and Clusters
Create Resource pool
Resource Pools allow you to limit the amount of CPU and Memory that can be consumed by your Hadoop cluster, and as you will see in Module 6, they are also the mechanism for establishing the priority of one cluster over another in the case of resource contention.
Right-click on the cluster named Cluster Site A, and select New Resource Pool.
Configure resource pool
Name the resource pool MyHadoopCluster.
Leave all settings at the default level and click OK.
Return to homepage
Click the Home button at the top to return to the homepage.
Navigate to Big Data Extensions Plugin
This is a vCenter Plugin providing specific capabilities to configure, deploy, and manage your Big Data environment.
Click on the "Big Data Extensions" tab
Select Hadoop Clusters
Four Hadoop clusters have been created for this lab. If any cluster that you need has not started or has an error status, follow the directions in the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module.
Click on the Hadoop clusters tab.
Create Hadoop Cluster
Click Create New Hadoop Cluster
Name and type
You will choose your preferred Hadoop Distribution. Supported distros include Cloudera, MapR, Hortonworks, and PivotalHD. We will use the open source Apache distribution in this module.
There are several deployment types for your clusters. You can mimic the typical physical Hadoop deployment with the Basic Hadoop Cluster. This type will separate the Namenode and Jobtracker into their own Virtual Machines; however, each Tasktracker and Datanode combination will be in a single Virtual Machine. You also have the option of separating the Compute (Tasktracker) from the Datanode using the Data/Compute Separation Hadoop option. This facilitates the elastic scaling of Compute you can see in Module 6.
For this Module, select the following options:
Hadoop Cluster Name : Basic Hadoop
Hadoop Distro: Apache
Deployment Type: Basic Hadoop Cluster.
Select the custom template
Each distinct Hadoop node configuration is called a NodeGroup. You will see specific NodeGroups based on the Deployment Type you selected, but you can also use the Command Line Interface to define any type of NodeGroup you want. In this section, you are sizing the virtual machine CPU, RAM, and data storage for each NodeGroup. You will also define the number of nodes of a specific NodeGroup to deploy. In the image above, you are going to deploy 3 Worker Nodes, each containing a TaskTracker and DataNode, 1 ComputeMaster (Jobtracker), and 1 DataMaster (NameNode).
Click the Resource template button, and select Customize
Customize the template
Note that you can select Shared or Local storage. Typically, Hadoop has been deployed with local storage to provide the data locality that is central to its performance. You can see that each NodeGroup can be configured with its own Datastore type. This means that, for instance, your DataNodes can run on Local storage, while you have the Jobtracker and Namenode on Shared storage. This allows the use of vSphere HA or FT to improve the availability of those nodes while still ensuring data locality.
Change the defaults to:
vCPU number: 1
Memory size: 3748
Storage size: 10
Datastore type: Shared
Click OK.
Select the Resources for the Cluster
Make sure to select the Customize option and size each NodeGroup's resources as in the previous step.
Set the number of nodes for each worker to 1.
Network and resource pool
Leave the network set to Defaultnetwork.
Click the Select button to choose a resource pool.
Select the proper resource pool
Select the resource pool, MyHadoopCluster, that you created in an earlier step.
Click OK.
Cancel creation
Depending on the size of the cluster, it takes anywhere from 6 to 20 minutes to deploy and be running. Due to resource and time constraints for the lab, we will not actually create the cluster.
Click Cancel to cancel the deployment, then watch the video below to see a deployment of a Hadoop cluster.
Video
Resize Hadoop cluster after creation
As resource demands change over time - or throughout the day - you can resize the Hadoop cluster to accommodate these changes. In this module, you will use the vSphere Big Data Extensions Plugin to resize an existing cluster.
Navigate to Big Data Extensions Plugin
Click on the "Big Data Extensions" tab
Select Hadoop Clusters
Click on the Hadoop clusters tab.
Select the Cluster
You may choose any of the running clusters for the resize process. Because of resource and timing constraints in the lab environment, we will not actually complete the creation of additional nodes.
Right-click the cluster in the Center Panel list of clusters.
Select Scale Out
Scaling out in our environment creates additional nodes for the NodeGroup you select. vSphere will automatically provision the Virtual Machines, install and configure the appropriate Hadoop components for your selected NodeGroup, and start up the services.
Select Scale Out.
Select the NodeGroup to resize
Select the node group you want to resize.
Select the new number of instances.
Click Cancel.
Due to the time it takes to make configuration changes and resource constraints in the lab environment, we will not be making any changes to the cluster.
Watch the video below to see the scale out of a cluster.
Video
Export configuration and create customized cluster
Once a Hadoop cluster is configured, you can export that configuration and use it to create or update the configuration of other Hadoop clusters. In this module, you will export a running configuration and deploy a customized cluster from that configuration.
Connect to the Big Data Extensions Command Line Interface (CLI)
Open Putty from the Windows Desktop, and select the SerengetiCLI server.
Click Open.
Login to the management appliance
vSphere Big Data Extensions takes advantage of the open source Serengeti project started by VMware last year. You are now connecting to the Serengeti Management Appliance.
Login to the management appliance, using
login : serengeti
password : password
Connect to the Big Data Extensions CLI
Once you have logged into the Serengeti Management Appliance, you will start the Command Line Interface (CLI).
At the Linux operating system prompt, type
serengeti
to start the Serengeti CLI.
Connect to Serengeti server
Now that you are in the CLI, you need to connect to the specific Serengeti Server you want to use. (Note: This step may seem unnecessary because you are already logged in to the Serengeti Server; however, the CLI can be run on your client machine as well. In that case, the need to connect to a specific server is obvious.)
Type
connect --host localhost:8443
Username : administrator@corp
password : VMware1!
to connect to the Serengeti server.
List Cluster Information
Locate the running cluster by typing:
cluster list --name small_cluster
Export cluster configuration
To change the cluster's configuration, we must first export it to a configuration file.
Type:
cluster export --name small_cluster --specFile /home/serengeti/small_cluster.json
Configuration file
The cluster configuration file is stored as a JSON file. To see its contents, exit the Serengeti CLI and
type the command "more /home/serengeti/small_cluster.json"
You can edit it with your favorite text editor, and when you are done, just save it. Notice that the configuration includes definitions of the NodeGroups and specific Hadoop configurations.
Due to time constraints for the lab, we won't be editing the file. A sample of the file is provided below.
small_cluster.json

{
  "nodeGroups" : [
    {
      "name" : "master",
      "roles" : [
        "hadoop_namenode",
        "hadoop_jobtracker"
      ],
      "instanceNum" : 1,
      "storage" : {
        "type" : "shared",
        "shares" : "NORMAL",
        "sizeGB" : 2
      },
      "cpuNum" : 1,
      "memCapacityMB" : 1024,
      "swapRatio" : 1.0,
      "haFlag" : "on",
      "configuration" : {
        "hadoop" : {}
      }
    },
    {
      "name" : "worker",
      "roles" : [
        "hadoop_datanode",
        "hadoop_tasktracker"
      ],
      "instanceNum" : 1,
      "storage" : {
        "type" : "shared",
        "shares" : "NORMAL",
        "sizeGB" : 2
      },
      "cpuNum" : 1,
      "memCapacityMB" : 1024,
      "swapRatio" : 1.0,
      "haFlag" : "off",
      "configuration" : {
        "hadoop" : {}
      }
    },
    {
      "name" : "client",
      "roles" : [
        "hadoop_client",
        "pig",
        "hive",
        "hive_server"
      ],
      "instanceNum" : 1,
      "storage" : {
        "type" : "shared",
        "shares" : "NORMAL",
        "sizeGB" : 2
      },
      "cpuNum" : 1,
      "memCapacityMB" : 1024,
      "swapRatio" : 1.0,
      "haFlag" : "off",
      "configuration" : {
        "hadoop" : {}
      }
    }
  ],
  "configuration" : {
    "hadoop" : {
      "core-site.xml" : {},
      "hdfs-site.xml" : {},
      "mapred-site.xml" : {},
      "hadoop-env.sh" : {},
      "log4j.properties" : {},
      "fair-scheduler.xml" : {},
      "capacity-scheduler.xml" : {},
      "mapred-queue-acls.xml" : {}
    }
  },
  "specFile" : false
}
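Before passing a spec file like the one above to the cluster create command, a quick grep-based sanity check can catch a mistyped key. The sketch below is illustrative only: it writes a minimal hypothetical spec to a temp file, whereas in the lab you would point SPEC at your exported /home/serengeti/small_cluster.json.

```shell
# Sketch: quick grep-based sanity check of a cluster spec before "cluster create".
# The minimal spec written below is hypothetical scratch data; in the lab you
# would point SPEC at /home/serengeti/small_cluster.json instead.
SPEC=$(mktemp)
cat > "$SPEC" <<'EOF'
{
  "nodeGroups" : [
    { "name" : "worker", "instanceNum" : 1, "cpuNum" : 1, "memCapacityMB" : 1024 }
  ]
}
EOF

# Flag any expected key that is absent (a mistyped key would show up here).
for key in nodeGroups instanceNum cpuNum memCapacityMB; do
  grep -q "\"$key\"" "$SPEC" || echo "missing key: $key"
done
echo "check complete"
rm -f "$SPEC"
```

This does not validate the JSON syntax itself, but it is a fast way to spot a missing or misspelled key before a long cluster deployment fails.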
Changing the CPU count using the Spec File
Open a new Putty session from the Windows Desktop, and select the SerengetiCLI.
Click Open.
Login to the management appliance
Login to the management appliance, using
Login : serengeti
password : password
Edit the JSON file
Open the JSON file in the vi editor by typing:
vi /opt/serengeti/conf/small_cluster.json
Change cpu count
Move the cursor to the line with "cpuNum".
Press
i
on the keyboard to enter insert mode.
Change the number 1 to 2 and press
esc
on the keyboard to exit insert mode.
Type
:wq
on the keyboard to save the changes and quit vi.
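If you prefer a non-interactive edit, the same change can be made with sed. The sketch below is illustrative: it works on a scratch file containing just the key being changed, rather than the lab's real /opt/serengeti/conf/small_cluster.json, and assumes GNU sed (as found on the Linux-based appliance) for the -i in-place flag.

```shell
# Sketch: the same cpuNum edit done non-interactively with sed instead of vi.
# Illustrative scratch file; in the lab you would copy
# /opt/serengeti/conf/small_cluster.json here first.
cat > /tmp/small_cluster.json <<'EOF'
{ "cpuNum" : 1, "memCapacityMB" : 1024 }
EOF

# The spec file writes the key with a space before the colon, so match that.
sed -i 's/"cpuNum" : 1/"cpuNum" : 2/' /tmp/small_cluster.json

grep '"cpuNum"' /tmp/small_cluster.json
# prints: { "cpuNum" : 2, "memCapacityMB" : 1024 }
```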
Deploy custom cluster
To create a custom cluster from the file you just edited, you would enter the Serengeti CLI and type the command below.
cluster create --name small_cluster_2cpu --specFile /home/serengeti/small_cluster.json
Due to time and resource constraints in our lab environment, we will not execute the command, but we have created a video showing the above command.
Video
Module 3 - Compute Only Clusters on Shared HDFS
Module Overview
Hadoop clusters typically require specialized expertise and dedicated hardware infrastructure to deploy. In the previous module you deployed a Basic Hadoop cluster that separated the Namenode and Jobtracker into their own Virtual Machines and kept each Tasktracker and Datanode combination in a single Virtual Machine. In this module you will see how easy it is to not only separate your Jobtracker and Namenode, but also to put Tasktrackers and Datanodes into their own VMs as well. This separation of Compute and Data is the key element of the Elastic Scaling demonstrated in Module 6 of this lab. Specifically, you will create a Compute Only cluster that deploys Jobtracker, Namenode and Tasktracker nodes, but does not create new Datanodes. Instead, you will point to an existing Hadoop File System (HDFS) that was previously created. The value in this is that many organizations today have isolated Hadoop clusters that make use of some of the same data. You can now easily spin up a cluster and point it to existing data in HDFS instead of copying it into a new filesystem.
Note: If you have not done so in a previous module, you MUST run the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module.
Create Compute Only cluster
You will deploy a Hadoop compute-only cluster that uses an external HDFS filesystem and HVE.
Hadoop Virtualization Extensions (HVE) are changes VMware has submitted to the open source Apache community to make Hadoop run better on virtualized infrastructure. HVE refines Hadoop's replica placement, task scheduling, and balancer policies. Hadoop clusters implemented on virtualized infrastructure have full awareness of the topology on which they are running, which enhances their reliability and performance. For more information about HVE, you can refer to https://issues.apache.org/jira/browse/HADOOP-8468.
Connect to the Big Data Extensions CLI
Open Putty, and select the SerengetiCLI.
Click Open.
Login to the management appliance
Login to the management appliance, using
Login : serengeti
password : password
Start Big Data Extensions Command Line Interface (CLI)
Type
serengeti
to start the Serengeti CLI.
Connect to Serengeti server
Type
connect --host localhost:8443
Username : administrator@corp
password : VMware1!
to connect to the Serengeti server.
Hadoop Rack topology
Hadoop makes placement and execution decisions based on datacenter topology. Administrators provide their datacenter topology via a topology file. It specifies, for instance, the racks in the datacenter and the servers on each rack. In a virtual environment we have introduced the concept of a nodegroup to represent servers (that are actually VMs) that are running on a specific ESXi host. You can make Hadoop topology aware by uploading your topology file through the Big Data Extensions CLI. We are showing you a very simple example that only defines the racks and physical hosts.
To do this, upload the rack topology file by typing:
topology upload --fileName /opt/serengeti/conf/rack_topology.txt
The content of the file is :
rack1: esx-01a.corp.local, esx-02a.corp.local, esx-03a.corp.local
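Before uploading, it can be worth sanity-checking the topology file format. The sketch below is illustrative: it writes a scratch copy of the one-line file shown above rather than touching /opt/serengeti/conf/rack_topology.txt, and uses only awk.

```shell
# Sketch: sanity-check a rack topology file of the format shown above.
# Illustrative only - works on a scratch copy, not the lab's real file.
cat > /tmp/rack_topology.txt <<'EOF'
rack1: esx-01a.corp.local, esx-02a.corp.local, esx-03a.corp.local
EOF

# Every line should look like "rackname: host, host, ..." - flag any that do not.
awk -F': ' 'NF != 2 { print "bad line " NR ": " $0 }' /tmp/rack_topology.txt

# Count the hosts listed for each rack.
awk -F': ' '{ n = split($2, h, /, */); print $1 " has " n " hosts" }' /tmp/rack_topology.txt
# prints: rack1 has 3 hosts
rm -f /tmp/rack_topology.txt
```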
List topology
Verify that the topology has been uploaded by typing:
topology list
You should see that rack1 contains the 3 ESXi hosts in your vCenter.
Configuring Compute Only Hadoop Cluster
As we saw in Modules 1 and 2, Hadoop clusters can be created directly through the vSphere Big Data Extensions plugin. They can also be created through the CLI using a JSON specFile. The specFile contains the cluster configuration and points to the external Hadoop filesystem using the "externalHDFS" tag. This tag points to the Namenode of an existing Hadoop cluster.
This enables the new cluster to use the already existing HDFS filesystem while deploying Master and compute resources.
{
  "externalHDFS": "hdfs://192.168.110.123:8020",
  "distro": "PivotalHD",
  "nodeGroups": [
    {
      "name": "master",
      "roles": [
        "hadoop_jobtracker"
      ],
      "instanceNum": 1,
      "storage": {
        "type": "SHARED",
        "sizeGB": 1
      },
      "cpuNum": 1,
      "memCapacityMB": 1024,
      "haFlag": "off",
      "rpNames": [
        "Tier2RP"
      ]
    },
    {
      "name": "worker",
      "roles": [
        "hadoop_tasktracker"
      ],
      "instanceNum": 1,
      "cpuNum": 1,
      "memCapacityMB": 1024,
      "storage": {
        "type": "LOCAL",
        "sizeGB": 1
      },
      "rpNames": [
        "Tier2RP" // change this to the resource pool added via Serengeti CLI
      ]
    },
    {
      "name": "client",
      "roles": [
        "hadoop_client"
      ],
      "instanceNum": 1,
      "cpuNum": 1,
      "memCapacityMB": 1024,
      "storage": {
        "type": "SHARED",
        "sizeGB": 1
      },
      "rpNames": [
        "Tier2RP"
      ]
    }
  ],
  "configuration": {
    "hadoop": {
      "core-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/core-default.html
        // note: any value (int, float, boolean, string) must be enclosed in double quotes; here is a sample:
        // "io.file.buffer.size": "4096"
      },
      "hdfs-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hdfs-default.html
      },
      "mapred-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/mapred-default.html
      },
      "hadoop-env.sh": {
        // "HADOOP_HEAPSIZE": "",
        // "HADOOP_NAMENODE_OPTS": "",
        // "HADOOP_DATANODE_OPTS": "",
        // "HADOOP_SECONDARYNAMENODE_OPTS": "",
        // "HADOOP_JOBTRACKER_OPTS": "",
        // "HADOOP_TASKTRACKER_OPTS": "",
        // "HADOOP_CLASSPATH": "",
        // "JAVA_HOME": "",
        // "PATH": ""
      },
      "log4j.properties": {
        // "hadoop.root.logger": "INFO,RFA",
        // "log4j.appender.RFA.MaxBackupIndex": "10",
        // "log4j.appender.RFA.MaxFileSize": "100MB",
        // "hadoop.security.logger": "DEBUG,DRFA"
      }
    }
  }
}
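To double-check which Namenode a compute-only spec points at, the externalHDFS value can be extracted with a one-line sed. This is an illustrative sketch using a hypothetical scratch file that mirrors the "externalHDFS" line of the spec, not a lab step.

```shell
# Sketch: extract the Namenode endpoint from a compute-only spec, to confirm
# which HDFS the new cluster will attach to. Hypothetical scratch data.
cat > /tmp/compute_only.json <<'EOF'
{ "externalHDFS": "hdfs://192.168.110.123:8020", "distro": "PivotalHD" }
EOF

sed -n 's/.*"externalHDFS": *"\([^"]*\)".*/\1/p' /tmp/compute_only.json
# prints: hdfs://192.168.110.123:8020
rm -f /tmp/compute_only.json
```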
Deploy Hadoop Cluster With PivotalHD Distro
From the Big Data Extensions CLI you can deploy the Compute Only cluster with PivotalHD as the distro and take advantage of HVE to provide virtual topology awareness. Below is an example of the command used to deploy an alternate distro to Apache. In this example, the file Pivotal.txt would specify the PivotalHD distro to be used. We will not actually execute this command in the lab because the PivotalHD distro has not been installed on the Serengeti server.
Type:
cluster create --name Pivotal --topology HVE --distro PivotalHD --specFile /opt/serengeti/conf/Pivotal.txt
Due to time and resource constraints in our lab environment, do not execute the command. The video below shows the deployment of a Compute Only Hadoop cluster.
Video
Hadoop Filesystem Commands Within CLI
In the previous section you saw the creation of a Compute Only Hadoop cluster that used an external HDFS filesystem. In this section you will use the Big Data Extensions CLI to upload files to the HDFS filesystem and verify that they have been uploaded.
Connect to Serengeti Server With Putty
Open Putty, and select the SerengetiCLI.
Click Open.
Login to the management appliance
Login to the management appliance, using
Login : serengeti
password : password
Start Big Data Extensions Command Line Interface (CLI)
Type
serengeti
to start the Serengeti CLI.
Connect to Serengeti server
Type
connect --host localhost:8443
Username : administrator@corp
password : VMware1!
to connect to the Serengeti server.
Connect to Cluster
We are going to load data into the small_cluster, so we need to point the CLI at that target.
Connect to the target cluster by typing:
cluster target --name small_cluster
Upload and download data from HDFS
We can use traditional put/get commands within the CLI to upload/download data from the Hadoop Filesystem (HDFS).
To upload the file /etc/inittab from the Linux EXT4 filesystem to /tmp/input/inittab in HDFS, issue the following command:
fs put --from /etc/inittab --to /tmp/input/inittab
To download the data to a new Linux file, type:
fs get --from /tmp/input/inittab --to /tmp/local-inittab
List files in HDFS
Type:
fs ls /tmp/input
to list the files there.
Verify that the file you just uploaded is there. Note: you can also use the Big Data Extensions Plugin to launch the Hadoop HDFS page to browse the filesystem from a web page. Simply go to your list of clusters in the vSphere Web Client, right-click the small_cluster in the Center Panel, and select Open HDFS Status Page. You can browse the filesystem from there.
Module 4 - Highly Available Hadoop
Module Overview
This is a single 15-minute lab.
vSphere provides a well-known capability to automatically restart VMs when a physical infrastructure failure occurs. If an ESXi host fails, vSphere HA will automatically restart the failed VM on another host in your vSphere cluster. vSphere Big Data Extensions adds to this capability by monitoring specific Hadoop processes and restarting the nodes when those processes fail. In this lab we will take a running Hadoop cluster, kill the Namenode process, and see that vSphere detects the process failure and automatically restarts the node.
Note: You MUST run the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module.
How to Create a Hadoop Cluster with HA Enabled
Let's start by getting comfortable with the Big Data Extensions vCenter plugin and see how to create a Hadoop cluster with HA enabled.
Accessing the Big Data Extensions in vCenter
Open Firefox from your Desktop and enter Username corp\administrator and password VMware1!
Navigate to Big Data Extensions
Click on the Home icon at the top of the screen. In the Inventories panel, click on the Big Data Extensions icon.
Working with Clusters
Click on Hadoop Clusters in the Inventory Lists panel. You can now view details of Hadoop clusters already deployed.
Hadoop and HBase clusters are already running.
Notice that 4 Hadoop clusters have previously been created for you. We will be working with the small_cluster in this module. Click on the small_cluster to drill into the details.
Hadoop Cluster Nodes are Virtual Machines
Notice that the Hadoop cluster is made up of 3 VMs. The NodeGroup defines the Hadoop roles that have been enabled on those VMs and, ultimately, the Hadoop processes that are running. As a reminder, the Namenode keeps the directory tree of the Hadoop file system (HDFS) and tracks where data is stored across the filesystem. The Namenode does not actually store the data, but if it is down, the data is unavailable. In this Hadoop cluster the data is stored in the Worker Node VM. The small_cluster-master-0 VM contains the Hadoop Namenode process. Click on the small_cluster-master-0 VM to see its details.
Virtual Machine Availability Enabled
You are now looking at the small_cluster-master-0 Virtual Machine (VM) detail information. Click on the Summary tab to drill in further.
Namenode Virtual Machine Details
Now that you are looking at the Virtual Machine details for the Namenode VM, you can see whether it is protected by HA. Hover the mouse over the icon highlighted above to see the protection level. Next we will open the Command Line Interface (CLI) to see how this cluster was created.
Command Line Interface (CLI) for granular configuration
Hadoop cluster creation can be done through the vCenter UI; please try the other modules in this lab for details on that process. We are going to look at detailed configuration through the CLI.
From the Windows Desktop perform the following steps:
1) Click on the Putty Icon
2) Select the SerengetiCLI session
3) Click the Load button
4) Click the Open button
5) At the OS login enter the Password. It is password
Cluster Configuration using JSON file
Cluster definitions are done using JSON files. These specfiles define the nodes that make up your Hadoop clusters, including types of nodes, which Hadoop roles will be configured in each node, how many to deploy, resources allocated to each node, HA/FT on or off, node placement on hosts, and even affinity between types of nodes. (The modules on creating Hadoop clusters go into more detail on this.)
1) type cd /opt/serengeti/conf to move to the directory that contains the JSON files
2) type ls -al to list the files in that directory.
The cluster we looked at with the vSphere Web Client was named small_cluster. small_cluster.json is the file that was used to define that cluster.
small_cluster is defined by the small_cluster.json file
1) Type more small_cluster.json
Notice the NodeGroup with the name "master". The master NodeGroup contains two roles: Jobtracker and Namenode. These roles map directly to Chef recipes that are used to orchestrate the provisioning of the VMs. Also notice that HA is set to ON for the master NodeGroup and OFF for the worker NodeGroup. When we create the cluster through the command line, we simply reference this specfile in the cluster create command. We have already done that for you in this lab.
Kill the Namenode and Verify HA Restart
Now we are going to kill the Namenode process and see what happens.
Connect to the Namenode VM
From the Windows Desktop perform the following steps:
1) Click on the Putty Icon
2) Select the Namenode session
3) Click the Load button
4) Click the Open button
5) At the OS login enter the Password. It is password
Find the Namenode Process
1) type ps -ef | grep proc_namenode
This command lists the processes running on the system and searches for the string "proc_namenode". Remember the process ID; you will use it in the next step.
Kill the Namenode process
1) Type sudo kill -9 "Process_ID", replacing "Process_ID" with the process ID you identified in the previous step.
2) Type ps -ef | grep proc_namenode again to verify that the process is terminated.
This command terminates the process you identified. The Namenode service is now not running, and this Hadoop cluster cannot access data stored in the HDFS filesystem.
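The two manual steps above (find the PID, then kill it) can be combined into one scripted pattern. The sketch below is illustrative and deliberately targets a harmless sleep process rather than the real proc_namenode, so it is safe to try anywhere.

```shell
# Sketch: the manual "find the PID, then kill it" steps as one scripted pattern.
# Illustrative only - targets a harmless sleep process instead of proc_namenode.
sleep 300 &
TARGET=$!

# Same shape as: ps -ef | grep proc_namenode  (the PID is the second column)
PID=$(ps -ef | grep 'sleep 300' | grep -v grep | awk '{print $2}' | head -1)

kill -9 "$PID"

# In the lab you would re-run the ps|grep to confirm; here we ask the shell.
rc=0; wait "$TARGET" 2>/dev/null || rc=$?
echo "exit status $rc (137 means killed by SIGKILL)"
```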
Watch the Restart of the Namenode
We will navigate to the Console screen of the Namenode VM. Go back to the vSphere Web Client using Firefox.
1) Click on the Home icon or tab
2) Click on the Hosts and Clusters icon
Find the Namenode VM and Launch the Console
1. Expand the inventory on the left-hand side until you find the vSphere Big Data Extensions Resource Pool.
2. Expand the vSphere Big Data Extensions Server Small_cluster Resource Pool.
3. Expand the Master Resource Pool.
4. Click on the Small Cluster_Master-0 VM and launch the Console. Notice that the VM is restarting. It should take about 2 minutes to restart.
View the Namenode restart
Notice that the VM is restarting. It takes a couple of minutes for HA to determine that the Namenode process has failed and to initiate a restart.
Verify that the Namenode has Restarted
Verify New Namenode Process
The Namenode process is again running in the small_cluster-master-0 VM. vSphere identified the failure of the Namenode process and initiated an automatic restart, reducing the potential downtime for this Hadoop cluster.
Module 5 - Fast and Easy Deployment of HBase Clusters
Module Overview
Hadoop clusters typically require specialized expertise and dedicated hardware infrastructure to deploy. In this module you will see how easy it is to go beyond Hadoop deployment to configure your HBase cluster nodes, size the virtual machines - including CPU, Memory and Storage - and deploy into your existing vSphere environment.
Note: If you have not done so in a previous module, you MUST run the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module.
Configure and Deploy HBase Clusters
In this module, you will see how to configure and deploy an HBase cluster using the vSphere Big Data Extensions Plugin.
Navigate to Hosts and Clusters
From the vSphere Web Client, click on Hosts and Clusters.
Create Resource pool
If you already created a resource pool in a previous module, skip down to the step "Return to homepage". Resource Pools allow you to limit the amount of CPU and Memory that can be consumed by your clusters, and as you will see in Module 6, they are also the mechanism for establishing the priority of one cluster over another in the case of resource contention.
Right-click on the cluster named Cluster Site A, and select New Resource Pool.
Configure resource pool
Name the resource pool MyHadoopCluster.
Leave all settings at the default level and click OK.
Return to homepage
Click the home button at the top to return to the homepage.
Navigate to Big Data Extensions Plugin
This is a vCenter Plugin providing specific capabilities to configure, deploy, and manage your Big Data environment.
Click on the "Big Data Extensions" tab
Select Hadoop Clusters
Click on the Hadoop clusters tab.
Create Hadoop Cluster
Four Hadoop clusters have been created for this lab. If any cluster that you need has not started or has an error status, follow the directions in the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module.
Click Create New Hadoop Cluster
Name and type
You will choose your preferred Hadoop Distribution. Supported distros include Cloudera, MapR, Hortonworks, and PivotalHD. We will use the open source Apache distribution in this module.
There are several deployment types for your clusters. You can mimic the typical physical Hadoop deployment with the Basic Hadoop Cluster. This type will separate the Namenode and Jobtracker into their own Virtual Machines; however, each Tasktracker and Datanode combination will be in a single Virtual Machine. You also have the option of separating the Compute (Tasktracker) from the Datanode using the Data/Compute Separation Hadoop option. This facilitates the elastic scaling of Compute you can see in Module 6.
For this module you will be deploying an HBase cluster.
Select the following options:
Hadoop Cluster Name : HBase
Hadoop Distro: Apache
Deployment Type: HBase Cluster
Select the custom template
Each distinct Hadoop node configuration is called a NodeGroup. You will see specific NodeGroups based on the Deployment Type you selected, but you can also use the Command Line Interface to define any type of NodeGroup you want. In this section, you are sizing the virtual machine CPU, RAM, and data storage for each NodeGroup. You will also define the number of nodes of a specific NodeGroup to deploy. In the image above, you are going to deploy 3 Worker Nodes, each containing a TaskTracker and DataNode, 1 ComputeMaster (Jobtracker), and 1 DataMaster (NameNode).
Click the Resource template button, and select Customize
Customize the template
Note that you can select Shared or Local storage. Typically, Hadoop has been deployed with local storage to provide the data locality that is central to its performance. You can see that each NodeGroup can be configured with its own Datastore type. This means that, for instance, your DataNodes can run on Local storage, while you have the Jobtracker and Namenode on Shared storage. This allows the use of vSphere HA or FT to improve the availability of those nodes while still ensuring data locality.
Change the defaults to:
vCPU number: 1
Memory size: 1024
Storage size: 2
Datastore type: Shared
Click OK.
Select the Resources for the Hbase Cluster
Make sure to select the Customize option and size each NodeGroup's resources as in the previous step.
Set the number of nodes for each worker and client NodeGroup to 1.
Network and resource pool
Leave the network set to Defaultnetwork.
Click the Select button to choose a resource pool.
Select the proper resource pool
Select the resource pool, MyHadoopCluster, that you created in an earlier step.
Click OK.
Cancel creation
Depending on the size of the cluster, it takes anywhere from 6 to 20 minutes to deploy and be running. Due to resource and time constraints for the lab, we will not actually create the cluster.
Click Cancel to cancel the deployment, and watch the video below to see a deployment of an HBase cluster.
Video
Manage Hadoop Pooled Resources

Hadoop makes excellent use of the system resources that are made available to it. In an environment with shared physical resources that have been virtualized, it is important to appropriately assign the resources that can be used by your Hadoop clusters. vSphere allows you to specifically make available CPU, RAM, storage, and virtual networks to your Hadoop clusters. In this module, you will use the vSphere Big Data Extensions plugin to add network and storage resources to the Hadoop clusters.
Navigate to Big Data Extensions Plugin
Click on the "Big Data Extensions" tab
Select Resources
Click on the Resources tab.
Find Your Datastores
This process is not creating new datastores. It is simply allowing the administrator to determine which datastores can be used when creating Hadoop clusters. vSphere will then create virtual disks across those datastores during cluster creation.
Select the Datastores tab.
Add datastore
Click on the plus sign in the upper left corner to open the add datastore window.
Add datastore details
Fill out the information for the datastores you want to add. The Name you specify can be used in specfiles to refer to this set of datastores.
Name : Test datastores
The display name for this set of datastores.
Datastore: test*
This selects all datastores whose names begin with "test".
Datastore type: Shared
Select whether the datastores are shared or local storage.
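The "test*" pattern behaves like a shell-style wildcard. As an illustration of that matching rule (the datastore names below are invented for the example):

```python
# Shell-style glob matching, as used by the datastore name pattern above:
# "test*" selects every datastore whose name begins with "test".
from fnmatch import fnmatch

datastores = ["test01", "test02", "prod-ds01", "testing-local"]
selected = [ds for ds in datastores if fnmatch(ds, "test*")]
print(selected)  # ['test01', 'test02', 'testing-local']
```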
Select Cancel, because we have already added the datastores to your environment.
Networks
You are able to easily segment network traffic for specific clusters by adding multiple networks and using them in the cluster create specfiles.
Select the Networks tab.
Add network
Click on the plus sign in the upper left corner to open the add networks window.
Network information
Fill out the information for your selected network.
Name: The name you will refer to when creating your cluster specfiles.
Port group name: The name of the port group to which the network is attached.
Use DHCP to obtain IP address: Check this if there is DHCP on the network.
IP range: Type the IP range that the VMs can use.
Subnet mask: The subnet mask of the network.
Gateway: The gateway of the network.
DNS: The DNS server of the network.
Select Cancel to exit the wizard.
Module 6 - Elastic Hadoop
Module Overview

vSphere Big Data Extensions adds to the resource monitoring and sharing capabilities of vSphere. You will configure manual and automatic scaling of your Hadoop clusters. You will use resource pools with differing priorities and run MapReduce jobs to see how vSphere will scale cluster nodes in or out based on your priorities and the resource demands placed on the system. We will begin by introducing the vCenter extensions that provide the new Big Data functionality and show you how to monitor resource consumption of your clusters. Next you will manually resize your clusters, including creation of new cluster nodes, in support of increased resource demand. Finally, you will execute a MapReduce job called Pi on two separate clusters with different priorities. You will see how vSphere can automatically respond to resource contention by powering down lower-priority cluster nodes.
Note: You MUST run the "Verify Hadoop Clusters Have Started" step under the Lab Overview section prior to doing this module. The Tier1 and Tier2 clusters must have a status of RUNNING.
Elastic Hadoop Video
If you are running short of time and do not want to complete the Elastic Hadoop lab, we have included this video to show it in action.
Manage Existing Tier1 and Tier2 Clusters

We will get familiar with the clusters pre-created for this lab and use the Hadoop administrative views. We will also see the CPU performance views that will be used in the later part of the module.
Accessing the Big Data Extensions in vCenter
Open Firefox from your desktop and enter the username corp\administrator and the password VMware1!
We are going to navigate to the Big Data Extensions functionality in vCenter.
Navigate to Big Data Extensions
Click on the Home icon at the top of the screen. In the Inventories panel, click on the Big Data Extensions icon.
Working with Clusters
Click on Hadoop Clusters in the Inventory Lists panel. You can now view details of Hadoop clusters already deployed.
Manually Scale Out Hadoop Cluster
Those of you familiar with vSphere are comfortable with the idea of scaling individual VMs up or down. With Big Data Extensions, not only can we add resources to individual Hadoop nodes, but we can also add new nodes to existing clusters or power down nodes that are not needed for current workloads. To add new nodes to an existing cluster:
1) Right Click on the Tier1 Cluster
2) Select the Scale Out menu item
Choose The Number of Instances to Deploy
You can now choose the NodeGroup that you want to change and the total number of instances to deploy. Just a reminder that NodeGroups define the Hadoop roles that are configured on all VMs associated with a particular NodeGroup. We do not want to add instances in this NodeGroup because space is limited in our lab environment.
1) Click the Cancel Button.
View Hadoop File System (HDFS) Details
Deployed Hadoop clusters contain administrative pages that are available via your web browser. You can access those pages directly from vCenter. To view Hadoop File System (HDFS) information:
1) Right Click on the Tier1 Cluster
2) Click on Open HDFS Status Page
Note that this page is deployed in a separate tab in your Firefox browser.
Hadoop NameNode Page
Click on a few of the links to see the wealth of information that is available on thesepages.
View MapReduce Job Details
Deployed Hadoop clusters contain administrative pages that are available via your web browser. You can access those pages directly from vCenter. To view MapReduce information:
1) Right Click on the Tier1 Cluster
2) Click on Open MapReduce Status Page
Note that this page is deployed in a separate tab in your Firefox browser.
MapReduce Job Details Page
Click on a few of the links to see the wealth of information that is available on thesepages.
vSphere Performance Views
We are only going to look at CPU usage information in this lab. You should note that vSphere will monitor, and take action based upon, CPU and memory usage.
1) Click on the Tier1 cluster
Find One of The Worker VMs in Tier1 Cluster
In our clusters, the Data VMs contain the DataNode role for Hadoop. The Worker VMs contain the TaskTracker role and are responsible for executing the tasks that make up a job. Our goal is to make sure that we have the right number of Worker (TaskTracker) VMs available for the workload and prioritization defined for the clusters. Here we are going to monitor the performance of a single worker node from each of our two clusters.
1) Click on the Tier1-worker-0 VM in the Tier1 cluster.
Navigate to Advanced CPU Monitoring
Navigate to the Advanced CPU Performance tab for the Tier1-worker-0 VM. Get familiar with the CPU usage on this chart. Later in the module we will configure a specific chart view to monitor the load on the VM.
1. Click on the Monitor tab
2. Click on the Performance tab
3. Click on the Advanced tab
4. Click to close the Advanced panel
Manual Hadoop Elasticity

We will use the CLI to see how to deploy clusters into specific resource pools. We will also see how to directly access Hadoop clusters to scale in (power down) nodes and to resize (add new nodes) using manual commands.
Command Line Interface (CLI) For Manual Elasticity
Some Hadoop cluster management can be done through the vCenter UI. Please try the other modules in this lab for details on that process. We are going to look at changing cluster size through the CLI.
From the Windows Desktop perform the following steps:
1) Click on the Putty Icon
2) Select the SerengetiCLI session
3) Click the Load button
4) Click the Open button
5) At the OS login, enter the password: password
Cluster Configuration using JSON
Cluster definitions are done using JSON files. These are specfiles that define the nodes that make up your Hadoop clusters, including types of nodes, what Hadoop roles will be configured in each node, how many to deploy, resources allocated to each node, HA/FT on or off, node placement on hosts, and even affinity between types of nodes. (The modules on creating Hadoop clusters go into more detail on this.)
1) Type cd /opt/serengeti/conf to move to the directory that contains the JSON files
2) Type ls -al to list the files in that directory.
The cluster we looked at with the vSphere Web Client was named Tier1. Tier1.json is the file that was used to define that cluster.
Tier1 Cluster is Defined by Tier1.json
1) Type more Tier1.json
Notice the NodeGroup with the name Master. The Master NodeGroup contains two roles: JobTracker and NameNode. These roles map directly to Chef recipes that are used to orchestrate the provisioning of the VMs. Also notice that HA is set to ON for the Master NodeGroup. When we create the cluster through the command line, we simply reference this specfile in the cluster create command. We have already done that for you in this lab. Notice that we have specified the resource pool that this cluster will be deployed into. This is important for prioritization of clusters, as you will see later in the module.
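As a rough illustration of what such a specfile looks like, here is a hand-written sketch following the general Serengeti spec format. This is not the lab's actual Tier1.json; the field values, sizes, and resource pool name are assumptions for illustration only:

```json
{
  "nodeGroups": [
    {
      "name": "Master",
      "roles": ["hadoop_jobtracker", "hadoop_namenode"],
      "instanceNum": 1,
      "cpuNum": 2,
      "memCapacityMB": 2048,
      "storage": { "type": "SHARED", "sizeGB": 20 },
      "haFlag": "on",
      "rpNames": ["Tier1 Hadoop Clusters"]
    },
    {
      "name": "Worker",
      "roles": ["hadoop_tasktracker"],
      "instanceNum": 4,
      "cpuNum": 1,
      "memCapacityMB": 1024,
      "storage": { "type": "LOCAL", "sizeGB": 10 },
      "haFlag": "off",
      "rpNames": ["Tier1 Hadoop Clusters"]
    }
  ]
}
```

The points the text calls out are visible here: the Master NodeGroup carries the JobTracker and NameNode roles with HA on and shared storage, while Workers use local storage, and the resource pool is named per NodeGroup.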
Serengeti CLI
1. To open the CLI, type Serengeti. The CLI commands are case-sensitive.
2. Type connect --host localhost:8443 to connect to our management server.

The username is root and the password is VMware1!
You are now in a command-line environment that can interact directly with your Hadoop clusters.
Listing Hadoop Cluster Details
1. To see your clusters, type cluster list (note that the up arrow will let you see your command history). Notice that AUTO ELASTIC is set to Disabled for both our Tier1 and Tier2 clusters. This means that if you want to power a node in a Hadoop cluster on or off because a workload has changed, you must do it manually. We will automate this later in the lab.
Cluster List with More Details
1. Type cluster list --name Tier1 --detail to see additional details of the Tier1 cluster. Note that all 4 Worker nodes have a STATUS of Service Ready. This means the VMs are powered on and the Hadoop services are running.
Manually Change Running Cluster Nodes
If you need to change the number of worker nodes that are running, you will execute a single cluster command. Executing this command will take a couple of minutes and may impact the results of the Automated Elasticity lab. We recommend that you view the results in the video at the beginning of the module. Also, note that you can perform this operation through the vCenter Big Data Extensions plugin GUI by right-clicking on the cluster name and selecting "Scale Out".
1) The correct command is:
cluster setParam --name Tier1 --elasticityMode MANUAL --targetComputeNodeNum 3
These commands are case-sensitive. Notice that one of the VMs has been powered off.
Resizing Hadoop Clusters
Note: Do not execute the cluster resize command. This lab environment does not have enough storage to allocate additional nodes.
1. You can also manually add new nodes to an existing cluster (creating new VMs to extend the nodes in the cluster) with the cluster resize command. Type help cluster resize to see the syntax. Also, note that you can perform this operation through the vCenter Big Data Extensions plugin GUI by right-clicking on the cluster name and selecting "Scale Up/Down".
Automatic Hadoop Elasticity

We will execute MapReduce jobs on both our Tier1 and Tier2 clusters and see how vSphere responds to the consumption of CPU from multiple clusters with different priority levels. vSphere also supports scaling clusters in or out based on memory contention; however, we will focus on CPU contention in this lab. Note: The resources available to this lab are highly dependent upon the number of labs being deployed in the HOL environment. Your results may be different from those shown in the screenshots.
Start MapReduce Job on Tier2 Cluster
1. From the Windows desktop, click on Putty
2. Click on Tier2JobtrackerNode
3. Click on Load
4. Click on Open. The password is password
Show the MapReduce Script
1. Type cd /usr/lib/hadoop. This moves you to the Hadoop directory, which contains our script.
2. Type ls -al run* to see the Python scripts that call MapReduce Java apps. We are going to use the runPi.py script.
Run Pi MapReduce on Tier2
1. Type python runPi.py. This will start a Pi calculation MapReduce job that will saturate the CPU usage on the worker VMs of your Tier2 cluster. This script executes a CPU-heavy MapReduce job that will use 100% of the available resources in the worker (TaskTracker) VMs in our cluster. Note: It is possible that your results could be significantly different based on the total resource usage in the HOL environment.
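A wrapper like runPi.py typically just launches Hadoop's bundled Pi estimator example, a Monte Carlo MapReduce job. The sketch below is an assumption about what such a script might do; the jar name, path, and argument values are invented and may not match the lab image's actual runPi.py:

```python
# Hypothetical sketch of a runPi.py-style wrapper: it launches Hadoop's
# bundled "pi" example, a Monte Carlo MapReduce job that estimates Pi.
# The jar name and argument values below are assumptions for illustration.
import subprocess


def build_pi_command(maps=10, samples=1_000_000):
    """Build the command line for the Hadoop Pi estimator example.

    More map tasks and more samples per map mean more CPU load on the
    worker (TaskTracker) VMs and a more accurate Pi estimate.
    """
    return ["hadoop", "jar", "hadoop-examples.jar",
            "pi", str(maps), str(samples)]


def run_pi():
    # On a real cluster node this blocks until the MapReduce job completes.
    subprocess.run(build_pi_command(), check=True)
```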
Check Tier2 CPU Usage through Web Client
Click on the Home icon at the top of the screen. In the Inventories panel, click on the Big Data Extensions icon.
View Your Cluster List
Click on Hadoop Clusters in the Inventory Lists panel. You can now view details of Hadoop clusters already deployed.
Select Your Tier2 Cluster
1) Click on Tier2 Cluster
Find The Worker-0 VMs in Tier2 Cluster
In our clusters, the Data VMs contain the DataNode role for Hadoop. The Worker VMs contain the TaskTracker role and are responsible for executing the tasks that make up a job. Our goal is to make sure that we have the right number of Worker (TaskTracker) VMs available for the workload and prioritization defined for the clusters. Here we are going to monitor the performance of a single worker node from each of our two clusters.
1) Click on the Tier2-worker-0 VM in the Tier2 cluster.
Navigate to Advanced CPU Monitoring
Navigate to the Advanced CPU Performance tab for the Tier2-worker-0 VM.
1. Click on the Monitor tab
2. Click on the Performance tab
3. Click on the Advanced tab
4. Click to close the Advanced panel
Create Custom Chart for Tier2-worker-0 VM
You are going to create a custom chart that contains CPU Usage and CPU Ready Time. You will save this as a chart called "Elasticity Testing".
1) Click on Chart Options
Select the Performance Metrics for Your Custom Chart
1) Make sure that Target Object 0 is deselected and that Tier2-worker-0 is selected
2) Select the Ready Counter
3) Select the Usage Counter
Create "Elasticity Testing" View For Tier2 Worker-0 VM
1) Select "Save Options As" and type "Elasticity Testing" as the name.
This chart will now let you see CPU Ready Time and CPU Usage in a single pane.
A quick note on reading these numbers. This data is accumulated in 20-second intervals. You are looking at the average CPU utilization % over that interval. Ready time is a measure of the amount of time that a vCPU is ready to run but has not yet been scheduled on a physical CPU. This number should be less than 10% per vCPU. The collection interval is 20 seconds (or 20,000 milliseconds). We are running with 1 vCPU per VM, so Ready time above 2,000 milliseconds potentially signals that there is contention for resources and we may need to power down a Hadoop node VM to optimize performance of the clusters. Note: Because of the nature of our HOL environment, there can be spikes in Ready time that are unrelated to the workload within your individual labs. This means that VMs will tend to power on or off more often than in other physical infrastructure. It is also possible that you will not see any VMs power down. If you do not see results in two to three minutes, move on in the lab, because the Ready time did not exceed the threshold needed to invoke the power off. You can see the expected behavior in the video at the beginning of the module.
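The arithmetic behind that 2,000-millisecond rule of thumb can be sketched as follows. The 10%-per-vCPU guideline and 20-second sample interval are from the text above; the helper function name is our own:

```python
# Convert a raw CPU Ready value (milliseconds per sample) from a vSphere
# real-time performance chart into a percentage of the sample interval.
SAMPLE_INTERVAL_MS = 20_000  # real-time charts sample every 20 seconds


def ready_percent(ready_ms, num_vcpus=1):
    """Percent of the interval each vCPU spent ready but not scheduled."""
    return ready_ms / (SAMPLE_INTERVAL_MS * num_vcpus) * 100


# Rule of thumb from the lab: keep Ready below 10% per vCPU. With 1 vCPU,
# that works out to 2,000 ms per 20-second sample.
print(ready_percent(2000))  # 10.0 -> at the contention threshold
print(ready_percent(500))   # 2.5 -> healthy
```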
View Elasticity Testing Chart
1) Select "Elasticity Testing" from the Chart View Drop List.
Tier2-worker-0 Resource Consumption
Notice that we are using 100% of the one vCPU that is assigned to this VM. The Ready Time number should be relatively low; however, as mentioned in the note above, our lab environment will cause some Ready Time spikes due to the extreme over-allocation of resources to support thousands of VMs with limited physical hardware.
Start MapReduce Job on Tier1 Cluster
Now we want to repeat our previous process and start the MapReduce job on our Tier1 cluster.
1. From the Windows desktop, click on Putty
2. Click on Tier1JobtrackerNode
3. Click on Load
4. Click on Open. The password is password
Show the MapReduce Script
1. Type cd /usr/lib/hadoop. This moves you to the Hadoop directory, which contains our script.
2. Type ls -al run* to see the Python scripts that call MapReduce Java apps. We are going to use the runPi.py script.
Run Pi MapReduce on Tier1
1. Type python runPi.py. This will start a Pi calculation MapReduce job that will saturate the CPU usage on the worker VMs of your Tier1 cluster. This script executes a CPU-heavy MapReduce job that will use 100% of the available resources in the worker (TaskTracker) VMs in our cluster. Note: Because of the nature of our lab environment, it is possible that you will not see 100% CPU. You can see the expected result in the video at the beginning of the module.
Check Tier1 CPU Usage Through the Web Client
If you have not left the Performance Chart page we used to view Tier2 CPU, then click twice on the navigation drop list to go back to your cluster list. You can also navigate directly there from the drop list or by taking the path we used previously: Home -> Big Data Extensions -> Hadoop Clusters -> Tier1
Select Your Tier1 Cluster
1) Click on Tier1 Cluster
Find Your Tier1-worker-0 VM
In our clusters, the Data VMs contain the DataNode role for Hadoop. The Worker VMs contain the TaskTracker role and are responsible for executing the tasks that make up a job. Our goal is to make sure that we have the right number of Worker (TaskTracker) VMs available for the workload and prioritization defined for the clusters. Here we are going to monitor the performance of a single worker node from each of our two clusters.
1) Click on the Tier1-worker-0 VM in the Tier1 cluster.
Create "Elasticity Testing" View For Tier1 Worker-0 VM
1) Make sure that Target Object 0 is deselected and that Tier1-worker-0 is selected
2) Select the Ready Counter
3) Select the Usage Counter
4) Select "Save Options As" and name it "Elasticity Testing"
Tier1-worker-0 VM Resource Consumption
You should see CPU for this VM at 100% usage, as expected. You also should be seeing some increase in Ready Time. Note: As previously mentioned, due to the nature of our lab environment, you may not see 100% CPU usage. To see the expected behavior, you can view the video at the beginning of the module.
Tiered Service Levels - Set Resource Pool Priorities
We now want to show how to increase the priority of the Tier1 Hadoop cluster. We do that by setting the CPU shares in the Tier1 cluster's resource pool to High. Note that the shares were already set to High for you.
1) Click on the Home Icon or Home Tab
2) Click on Hosts and Clusters
Raise the Priority on Your Tier1 Resource Pool
Raising the priority of a resource pool that contains a Hadoop cluster means that the cluster will get a higher share of resources than clusters that are created in lower-priority resource pools.
1) Expand the inventory list on the left-hand side of the screen and click on the Tier1 Hadoop Clusters resource pool
2) Click on the Manage tab in the middle panel of the screen

Notice that CPU Shares is already set to High, but this is where you can change this setting.
Edit Tier1 Clusters Resource Pool CPU Shares
1) Click on Edit
2) Notice Shares are set to High
3) Click on OK
Constrain CPU Resource
Because of the nuances of our Hands-on Lab environment, we are going to arbitrarily limit the amount of CPU available to non-Tier1 VMs by setting a CPU reservation on the Tier1 Hadoop Clusters resource pool. This is not something you need to do in your own environment to enable elastic scaling.
1) Right click on the Tier1 Hadoop Clusters Resource Pool and select Edit Settings
2) Set a CPU reservation of 4144 MHz and click OK
As you view the performance charts later in the lab, you might like to come back here and play with this reservation amount. Increasing it will starve the Tier2 cluster, resulting in increases in CPU Ready time for its VMs.
Verify Worker Node VMs are Powered on
1) Click on Related Objects
2) Click on Virtual Machines
Verify that all Tier1 worker nodes are powered on. They should be, unless you powered them off in a previous lab.
Verify Tier2 Worker Nodes are Powered On
1) Click on Tier2 Hadoop Clusters Resource Pool
2) Click on Related Objects
3) Click on Virtual Machines
Verify that all Tier2 worker nodes are powered on. They should be, unless you powered them off in a previous lab.
Change Elasticity Mode to Auto
Now that we have set the priority of the Tier1 cluster resource pool to High, we want vSphere to automatically manage the number of Hadoop nodes that are running, based on the workloads and that prioritization. We will set the elasticity level through the CLI.
From the Windows Desktop perform the following steps:
1) Click on the Putty Icon
2) Select the SerengetiCLI session
3) Click the Load button
4) Click the Open button
5) At the OS login, enter the password: password
Connect to Serengeti CLI to Set Elasticity Mode
1. To open the CLI, type Serengeti. The CLI commands are case-sensitive.
2. Type connect --host localhost:8443 to connect to our management server.

The username is root and the password is VMware1!
You are now in a command-line environment that can interact directly with your Hadoop clusters. Note: We have sometimes seen the cluster entirely CPU-bound during this test, which can make connecting to Serengeti and running this command difficult. This is an artifact of our HOL environment. If you are unable to execute this portion of the lab, please see the video at the beginning of the lab for the expected results.
Listing Hadoop Cluster Details
1. To see your clusters, type cluster list (note that the up arrow will let you see your command history). Notice that AUTO ELASTIC is set to Disabled for both our Tier1 and Tier2 clusters.
Turn on Auto Elasticity Mode
To Turn on Auto Elasticity:
1) Type cluster setParam --name Tier2 --elasticityMode auto
2) Type cluster setParam --name Tier1 --elasticityMode auto
Note: You can also change the elasticity mode through the Big Data Extensions vCenter plugin: right-click on the cluster you want to set, select Set Elasticity, then select Auto.

Because you are consuming host CPU by running the runPi workload, this command takes a little longer than normal. Expect about 2 minutes for each cluster setParam command.
Monitor Power Off/On Tasks
It may take a few minutes for vSphere to determine that a node needs to be powered off.
1) Go back to the vSphere Web Client
2) On the right side of the screen, you will see the Recent Tasks panel. Click on All
3) Click on More Tasks.
VMs Powering On/Off
In a couple of minutes you should see VMs in your Tier1 and Tier2 clusters begin to power down. As mentioned before, the nature of our Hands-on Lab infrastructure will make this somewhat unpredictable, but generally you will see more Tier2 VMs power off than Tier1. Note: You should click the refresh button on this page to view the updated tasks more quickly. If you do not see this occur in a couple of minutes, please view the video at the beginning of the module for the expected result. Sometimes the Ready time threshold for powering down is not met and the VMs may not power off.
Monitoring CPU Performance Metrics
We will navigate back to our custom performance views to see what is happening with CPU Usage and Ready time.
1) Click on the Home icon at the top of the screen.
2) In the Inventories panel, click on the Big Data Extensions icon.
View Your Cluster List
Click on Hadoop Clusters in the Inventory Lists panel. You can now view details of Hadoop clusters already deployed.
Select Your Tier1 Cluster
1) Click on Tier1 Cluster
Find The Worker-0 VMs in Tier1 Cluster
In our clusters, the Data VMs contain the DataNode role for Hadoop. The Worker VMs contain the TaskTracker role and are responsible for executing the tasks that make up a job. Our goal is to make sure that we have the right number of Worker (TaskTracker) VMs available for the workload and prioritization defined for the clusters. Here we are going to monitor the performance of a single worker node from each of our two clusters.
1) Click on the Tier1-worker-0 VM in the Tier1 cluster.
Monitor Ready Time Reduction
1. Click on Monitor
2. Click on Performance
3. Click on the Chart Options view drop list and select "Elasticity Testing". This will give you your CPU Usage and Ready view.

You should see some reduction in the Ready time spikes based on a reduction in the CPU consumption across the cluster. Note: this will be dependent upon the infrastructure anomalies described earlier in the module.
Conclusion

Thank you for participating in the VMware 2013 Hands-on Labs. Be sure to visit http://hol.vmware.com/ to continue your lab experience online.
Lab SKU: HOL-SDC-1309
Version: 20140213-184824