Table of Contents - VMwaredocs.hol.vmware.com/HOL-2016/hol-sdc-1609_pdf_en.pdf · Table of Contents...

Table of Contents

Lab Overview - HOL-SDC-1609 - vSphere Big Data Extensions
    Lab Guidance
    Virtualizing Big Data on vSphere Overview
    vSphere Big Data Extensions 2.2
    Verify Hadoop Clusters are Running
Module 1 - Hadoop POC in Under an Hour (45 min)
    Module Overview
    Managed Hadoop Pooled Resources
    Create a Basic Hadoop Cluster Via Web Client
    Create a Hadoop Cluster with the Serengeti CLI
    Scale Out a Hadoop Cluster Via the Web Client
    Scale Out a Hadoop Cluster Via the Serengeti CLI
Module 2 - Fast and Easy Deployment of Hadoop Clusters (15 Min)
    Module Overview
    Configure and Deploy Hadoop Clusters
    Resize Hadoop Cluster After Creation
    Export Configuration and Create Customized Cluster
    Manage Hadoop Pooled Resources
Module 3 - Compute Only Clusters on Shared HDFS (15 Min)
    Module Overview
    Create a Compute Only Cluster
Module 4 - Managing Hadoop Clusters with Ambari Manager (30 Min)
    Module Overview
    Introduction to Ambari Manager
Module 5 - Troubleshooting Big Data Extensions (30 Min)
    Module Overview
    Finding and Viewing Log Files
    Starting and Stopping Services


Lab Overview - HOL-SDC-1609 - vSphere Big Data Extensions


Lab Guidance

The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers, designed to scale up from single servers to thousands of machines, each offering local compute and storage resources. Hadoop is being used by enterprises across verticals for Big Data analytics to help make better business decisions based on large data sets.

VMware enables you to easily and efficiently deploy and use Hadoop on your existing virtual infrastructure through vSphere Big Data Extensions (BDE). BDE makes Hadoop virtualization-aware, improves performance in virtual environments and enables deployment of Highly Available (HA) Hadoop clusters in minutes. vSphere BDE automates deployment of a Hadoop cluster, and thus provides better Hadoop manageability and usability.

In this lab, you will execute 15-minute "lightning labs" to configure and deploy Hadoop and HBase clusters on local storage in minutes. You will also create compute-only clusters that allow the use of shared storage across multiple Map Reduce clusters, providing multi-tenancy and enabling easy scaling in or scaling out of compute resources.

There is a full length lab to simulate a complete Hadoop Proof Of Concept (POC). In the POC module, you will configure and deploy your cluster, add data to HDFS and run Map Reduce jobs against your deployed cluster.

The modules and timing are as follows:

Hadoop POC In Under an Hour: 45 min.

• Add resources
• Create cluster Hadoop/HBase

Fast and Easy Deployment of Hadoop Cluster: 15 min.

• Create and resize standard Hadoop Clusters with multiple distros and config
• Export a Hadoop configuration for later use
• Manage resources (Add/delete Network, Resource Pools, Datastores)

Create Compute-only and No-Distro clusters on shared HDFS: 15 min.

• Create Cluster with No Hadoop Distribution installed.

Manage Hadoop Clusters with Ambari Manager: 30 min.

• Start and stop processes with a third party manager.
• Add additional services to the cluster.


Lab Captains: Julie Roman, Matthew Portnoy, Chris Saunders, David Morse

This lab manual can be downloaded from the Hands-on Labs Document site found here:

http://docs.hol.pub/catalog/

This lab may be available in other languages. To set your language preference and have a localized manual deployed with your lab, you may utilize this document to help guide you through the process:

http://docs.hol.vmware.com/announcements/nee-default-language.pdf

Activation Prompt or Watermark

When you first start your lab, you may notice a watermark on the desktop indicating that Windows is not activated.

One of the major benefits of virtualization is that virtual machines can be moved and run on any platform. The Hands-on Labs utilizes this benefit and we are able to run the labs out of multiple datacenters. However, these datacenters may not have identical processors, which triggers a Microsoft activation check through the Internet.

Rest assured, VMware and the Hands-on Labs are in full compliance with Microsoft licensing requirements. The lab that you are using is a self-contained pod and does not have full access to the Internet, which is required for Windows to verify the activation. Without full access to the Internet, this automated process fails and you see this watermark.

This cosmetic issue has no effect on your lab. If you have any questions or concerns, please feel free to use the support made available to you either at VMworld in the Hands-on Labs area, in your Expert-led Workshop, or online via the survey comments as we are always looking for ways to improve your hands on lab experience.


Virtualizing Big Data on vSphere Overview

Big Data applications are moving from the workstations and test beds of the development groups into the data centers of corporations around the world. But what is Big Data, and as a virtualization practitioner, why should you care? And what impact can it have back to the business?

What is Hadoop?

Hadoop is an Apache open source project that provides scalable and distributed computing. The model provides a framework that can process large amounts of data by leveraging the parallel and distributed processing of many compute nodes arrayed in a cluster. These clusters can be configured as a single host or scaled up to utilize thousands of machines depending on the workload. As the software in the project matures, more companies are using it in lieu of the traditionally more expensive and more complex commercial databases like Oracle or Microsoft SQL Server.

Why should you care? The multi-node architecture of Hadoop with its dynamic scalability is a natural fit for the virtual infrastructure you already know.

Why should the business care?

Better resource utilization:

Collocating virtual machines containing Hadoop roles with virtual machines containing different workloads on the same set of VMware ESXi™ server hosts can balance the use of the system. This leads to lower operating expenses and lower capital expenses as you can leverage the existing infrastructure and skills in the data center and you do not have to invest in bare-metal servers for your Hadoop deployment.

Alternative storage options:

Originally, Hadoop was developed with local storage in mind, and this type of storage scheme can be used with vSphere also. The shared storage that is frequently used as a basis for vSphere can also be leveraged for Hadoop workloads. This reinforces leveraging the existing investment in storage technologies for greater efficiencies in the enterprise.

Isolation:

This includes running different versions of Hadoop itself on the same cluster or running Hadoop alongside other applications, forming an elastic environment, or different Hadoop tenants. Isolation can reduce your overall security risk, ensure you are meeting your SLAs, and support Hadoop as a service back to the lines of business.


Availability and fault tolerance:

The NameNode, the Resource Manager and other Hadoop components, such as Hive Metastore and HCatalog, can be single points of failure in a system. vSphere services such as VMware vSphere High Availability (vSphere HA) and VMware vSphere Fault Tolerance (vSphere FT) can protect these components from server failure and improve availability. Resource management tools such as VMware vSphere vMotion® and VMware vSphere Distributed Resource Scheduler™ (vSphere DRS) can provide availability during planned maintenance and can be used to balance the load across the vSphere cluster. Uptime of critical applications is just as important in a Hadoop environment. Why would the enterprise want to go back in time to a place where the servers and server components were single points of failure? Leverage the existing investment in vSphere to meet SLAs and provide an excellent service back to the business.

Hadoop Projects

Like most open source projects, the solution itself has a core set of functionality and a number of add-in modules that users can choose to use, or not. The core projects are made up of:

Hadoop Common - The utilities that support the other Hadoop modules.

Hadoop Distributed File System - The distributed file system used by most Hadoop distributions. Also known by its initials, HDFS.

Hadoop YARN - Used to manage cluster resources and schedule jobs.

Hadoop Map Reduce - YARN-based system of processing large amounts of data.
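
To make the Map Reduce model listed above more concrete, here is a minimal single-process sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample input are assumptions chosen for illustration. It only mirrors the map, shuffle, and reduce stages that a Hadoop cluster would run in parallel across many nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, as happens between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical input; a real job would read its data from HDFS.
    sample = ["big data on vSphere", "Big Data Extensions"]
    print(word_count(sample))
```

On a real cluster the map and reduce functions run on many nodes at once and the framework performs the shuffle over the network; the sketch keeps everything in one process so the data flow is easy to follow.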

Additional Hadoop Projects

In addition to the core modules just described there are others that provide specific and specialized capabilities to this distributed processing framework. These are just some of the tools, and yes, they all seem to have unusual names:

Ambari - A web-based tool for provisioning, management, and monitoring of Hadoop clusters.

HBase - Distributed database that supports structured data storage.


Hive - Data warehouse model with data summarization and ad hoc query capability.

Pig - Data flow language.

ZooKeeper - Coordination service for distributed applications.

Why should you care? As you, or your company, begin to deploy Big Data workloads, know that there are tools available to augment the capabilities of the core distribution to provide more sophisticated data manipulation, faster execution, graphical computation, and many other data processing features.

These are modules available from the Apache open-source project, but there are also more than thirty other companies that provide Hadoop distributions that include the open-source code as well as adding competing management solutions, processing engines, and many other features. Some of the best known and most widely used are distributed by Cloudera, MapR, and Hortonworks.

VMware Big Data Extensions

Big Data Extensions, also known as BDE, is an automated big data provisioning and management solution that lets you deploy and centrally operate Hadoop and HBase clusters.

Big Data Extensions simplifies the deployment and provisioning process, giving you a cluster-wide, real time view of all running services and the status of their virtual hosts. With Big Data Extensions, you have a single place from which to manage and monitor your cluster as well as elastic scaling to help you optimize cluster performance and utilization.

Why should you care?

Big Data Extensions is included as part of vSphere Enterprise or Enterprise Plus Editions. It facilitates Hadoop deployment on vSphere through a vCenter Web Client Plugin. When the data analysts in your organization are talking about standing up Hadoop clusters, you can offer them an environment that provides availability, manageability, security, and performance comparable to a physical deployment.

Why should the business care?

Rapid provisioning:

vSphere Big Data Extensions enables rapid deployment, management, and scalability of Hadoop in virtual and cloud environments. Virtualization tools ranging from simple cloning to sophisticated end-user provisioning products such as VMware vRealize Automation™ can speed up the deployment of Hadoop, which in most cases requires multiple nodes with different configurations. A virtualized infrastructure with scale in/out capabilities built into Big Data Extensions tools enables on-demand Hadoop instances. This enables IT to be a service provider back to the business and provide Hadoop as a service back to the different lines of business, providing faster time to market. This will further enable today's IT to be a value driver rather than being seen as a cost center.

Learn more about Big Data Extensions by doing this Hands On Lab!


vSphere Big Data Extensions 2.2

RELEASE HIGHLIGHTS:

• Fully Qualified Domain Name Management. Big Data Extensions integrates with a Dynamic DNS server in its network through which it provides meaningful host names to the nodes in a Hadoop cluster.

• Centralized User Management. You can specify an Active Directory or LDAP server for user authentication, letting you manage users from a central point for all the Hadoop nodes managed by BDE.

• Resize Hadoop Clusters on Demand. You can reduce the number of virtual machines in a running Hadoop cluster. The virtual machines are deleted, releasing all resources such as memory, CPU, and storage.

• Increase Cloning Performance and Resource Usage of Virtual Machines. BDE can rapidly clone and deploy virtual machines using Instant Clone, a feature of vSphere 6.0.

• Centralized Logging with vRealize Log Insight or other External Syslog Servers. You can configure Big Data Extensions to use an external (or remote) syslog server, such as VMware vRealize Log Insight.

• Quiesce Big Data Extensions for Backup and Maintenance Procedures. You can quiesce Big Data Extensions so you can safely back up and restore your environment, or perform other maintenance tasks.

• Support for the Latest Hadoop Distributions. Big Data Extensions supports Bigtop 0.8, Cloudera CDH 5.4, Hortonworks HDP 2.2, MapR 4.1, and Pivotal PHD 3.0.

• Support for the Latest Partner Hadoop Management Tools. Big Data Extensions supports Cloudera Manager 5.4 and Ambari 1.7.

• Ability to Deploy Different Types of Hadoop Clusters with 3rd Party Application Manager. In addition to basic Hadoop clusters, with Big Data Extensions you can deploy HBase clusters, MapReduce clusters, compute-only clusters, data-compute separated clusters, and several variations of customized cluster types to meet your requirements.

• Support for EMC Isilon OneFS 7.2. Big Data Extensions provides an automated process to deploy and manage compute-only clusters on EMC Isilon OneFS 7.2.

• International Language Support. Big Data Extensions is localized into six additional languages: Chinese (Simplified), Chinese (Traditional), French, German, Korean, and Japanese. This additional language support provides easy access to a Web interface and documentation that is fully translated.

• Big Data Extensions Upgrade. You can upgrade Big Data Extensions 2.1 to the current version, Big Data Extensions 2.2, and preserve all the data from the clusters created in Big Data Extensions. All of your existing clusters can be managed by Big Data Extensions 2.2 once the upgrade completes.


Verify Hadoop Clusters are Running

As part of the deployment of your lab, the two Hadoop clusters that were created for you will need to be started.

If you restart this lab, please ensure you repeat the steps below prior to starting each module to verify the clusters are running.

Use PuTTY to SSH to management-server

1. Double-click the PuTTY icon on the desktop.

2. Click the SerengetiCLI session.

3. Click Open.


Connect to the Serengeti CLI

You should automatically be logged into the Serengeti management server.

The ./serengeti-shell script will automatically log in to the Serengeti CLI (NOTE: you can type ./ser and hit TAB to auto-complete the command).

This script automates the following tasks:

• Launches the serengeti shell (located at /opt/serengeti/sbin/serengeti)
• Connects to the local management server (connect --host localhost:8443)
• Enters the username (administrator@corp)
• Enters the password (VMware1!)

You should then see "Connected" followed by the "serengeti>" prompt, as shown above.

Listing Hadoop Cluster Details

1. To see your clusters, type cluster list (note: the up arrow will let you cycle through the command history).

2. The ambarihbase and bigtopyarn clusters must have a STATUS of RUNNING. If you are just starting this lab, or the status is STOPPED or ERROR, you will need to start the cluster (see the next step for instructions).

3. ambarihbase only needs to be running if you are going to do the HBase Deployment Module.

4. Note: Clusters take several minutes to start, so you don't want to start a cluster you are not going to use.


Start a Hadoop Cluster

NOTE: This step is only necessary if any clusters are in the STOPPED or ERROR state, as listed in the previous step. If they're all RUNNING, don't try to start them.

1) Type cluster start --name "cluster name". Replace "cluster name" with the name of the cluster that needs to be started.

You do not need to wait for the clusters to start, since the first few steps in each module do not depend on the clusters running. Feel free to continue, and check back on the status of the start command.
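
The check-then-start logic in the steps above can be sketched in Python. This sketch does not call the Serengeti CLI; it only takes cluster names and statuses that you have read from the cluster list output and builds the cluster start commands to type. The function name start_commands and the sample statuses are assumptions for illustration; the command syntax and the STOPPED, ERROR, and RUNNING states come from the steps above.

```python
# States for which the lab says a cluster must be started.
NEEDS_START = {"STOPPED", "ERROR"}


def start_commands(statuses, wanted=None):
    """Return the Serengeti CLI commands needed to start clusters.

    statuses: dict mapping cluster name -> STATUS as read from `cluster list`.
    wanted:   optional iterable of cluster names you plan to use; clusters
              outside it are skipped, since starting takes several minutes.
    """
    selected = set(statuses) if wanted is None else set(wanted)
    commands = []
    for name in sorted(statuses):
        if name in selected and statuses[name].upper() in NEEDS_START:
            commands.append(f"cluster start --name {name}")
    return commands


if __name__ == "__main__":
    # Hypothetical statuses; read the real ones from `cluster list`.
    observed = {"ambarihbase": "STOPPED", "bigtopyarn": "RUNNING"}
    for command in start_commands(observed):
        print(command)
```

Passing wanted lets you leave out a cluster you will not use, which matches the advice not to start clusters unnecessarily.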


Exit the Serengeti Session

Type exit in the command window, or exit the Serengeti session by selecting the X at the top right of the window to close the session.

Choose OK at the Exit Confirmation screen.


Module 1 - Hadoop POC in Under an Hour (45 min)


Module Overview

Hadoop clusters typically require specialized expertise and dedicated hardware infrastructure to deploy. In this module, we will explore the benefits of running Hadoop on VMware vSphere. By virtualizing Hadoop clusters, you are able to deploy multiple VMs per host, which allows you to separate data from compute. By doing this, you can seamlessly scale the compute layer within your Hadoop cluster, while keeping the data separate.

Other benefits of running Hadoop on vSphere include:

• Run multiple compute workloads on the same physical hardware, optimizing resource utilization
• Eliminate the need for dedicated hardware to run Hadoop workloads
• Inherit better reliability and flexibility due to High Availability (HA), vMotion, and DRS features of the vSphere platform

In this module, we will simulate a rapid proof of concept using vSphere Big Data Extensions. We will explore the following key concepts:

• Mapping vSphere resources to Big Data Extensions resources for consumption by Hadoop
• Quickly creating multiple types of Hadoop clusters
• Simple scale-out of Hadoop compute nodes on vSphere

NOTE: If you have not already done so, please ensure all the clusters are running by following the instructions in the initial section of this lab.

Let's get started!


Managed Hadoop Pooled Resources

Hadoop makes excellent use of the system resources that are made available to it. In an environment with shared physical resources that have been virtualized, it is important to appropriately assign the resources that can be used by your Hadoop clusters. vSphere allows you to dedicate specific CPU, RAM, storage and networking resources to your Hadoop clusters.

In this module, you will use the vSphere Big Data Extensions Plugin to add network and storage resources to the Hadoop clusters.

Open the Firefox Browser

Select the Firefox Browser from the desktop.

Login to vSphere Web Client

1. If the vCenter Single Sign-On screen does not appear, choose the Site A Web Client from the shortcuts at the top of the screen.

2. Check the Use Windows session authentication checkbox to log in to the vSphere Web Client.

3. Click the Login button. In case this fails, you can uncheck the box and specify a username of CORP\Administrator with a password of VMware1! (Note: ! is part of the password)


Explore the vSphere Environment

In the vSphere Web Client, click the Hosts and Clusters icon as shown above.


Hosts and Clusters View

First, take a look at the resource pools that are configured in this vSphere environment.

The vSphere Big Data Extensions will leverage these resource pools to ensure our Hadoop clusters have the resources they need, while also ensuring resources are not overcommitted, which could negatively impact other applications.


Storage View

Next, click on the Storage and Networking icons to get a sense of the datastores and networks that are configured in this environment.

Notice that there is one NFS datastore (ds-site-a-nfs01) configured, and there are also local VMFS volumes (Local-ds-esx-0na) for each of the ESXi hosts.

In the next steps, we'll configure our Hadoop clusters to use both shared and local storage, a key benefit of using the vSphere Big Data Extensions.


Navigate to Big Data Extensions Plugin

To get to the Big Data Extensions plugin, first click the Home icon, then choose Big Data Extensions from the sidebar menu.


Explore BDE Plugin

First, let's take a look at the Hadoop clusters that are already configured in this environment. Click on the Big Data Clusters item in the sidebar menu, as shown above.


View Big Data Clusters

Notice that there are two Hadoop clusters configured in this vSphere environment. The columnar view on the right indicates each cluster's name, status, which Hadoop distribution is running, the resource pool it belongs to, and the list of nodes. As we saw in the last lesson, resource pools are how we manage how Hadoop consumes the underlying physical resources.

This is an important differentiator versus using dedicated physical hardware for Hadoop, where the resources may be wasted when Hadoop jobs are not running. vSphere allows you to run a mix of workloads, while also guaranteeing resources based upon business needs.


View Cluster Actions

1. Click on All Actions, and note all the actions that can be taken on a cluster from within the vSphere Web Client. We will investigate these further in a future lesson.

2. Return to the main BDE menu by clicking the Big Data Extensions back button indicated above.


Click Resources

Click the Resources item under Inventory lists, as highlighted above.


Map vCenter Resources to BDE Inventory Items

This screen is where we map vSphere datastores into constructs that the Big Data Extensions will allocate to Hadoop clusters. Notice that a couple of mappings are already made. (Note: Depending on the lab environment, there may be no datastores available to be shown. Refer to the illustration above instead.)

The Big Data Extensions can consume both shared and local storage, depending upon the specific need. In this screen, we can see that there is a dsLOCAL (local datastore) item that maps to the local storage on each of the 4 ESX hosts in our datacenter. There is also a defaultDSSHARED datastore item that is mapped to the ds-site-a-nfs01 NFS vSphere datastore. Checkboxes allow multiple datastores to be easily managed and consumed by our Hadoop clusters.

1. Click the Add icon (plus sign) as shown above to view the Add Datastore dialog. You can enter an arbitrary name, specify whether the datastore will be local or shared, and check which datastores will be used. (Your view may not include additional datastores.)

2. Since we already have all the mappings we need, click Cancel to close this dialog.

3. Click the Home icon to return to the vSphere Web Client home page.


Create a Basic Hadoop Cluster Via Web Client

In this lesson, we will create a Hadoop cluster via the vSphere Web Client.

Navigate to Big Data Extensions Plugin

Click on Big Data Extensions in the side bar.


Simulate Creating a Basic Hadoop Cluster

A basic Hadoop cluster mimics the standard deployment you'd see with physical Hadoop clusters, in that the Datanode and Task Tracker reside within a single machine. In other lessons, you will see that it can be advantageous to separate these services into their own VMs.

Click Big Data Clusters under Inventory Lists.

Click the New Cluster button

Click on the New Big Data Cluster icon, as indicated above.


Specify Cluster Details

1. Enter Basic_Hadoop as the name of the new cluster.

2. Select Default as the Application Manager choice.

3. Select bigtop as the Hadoop distribution. Cloudera CDH, MapR, Hortonworks HDP, and Pivotal Hadoop distributions are also supported.

Click Next to continue.

Set Deployment Type

Select Basic Hadoop Cluster as the deployment type. This mimics a traditional physical Hadoop deployment.

Here is a description of the deployment types available:

• Basic Hadoop Cluster: For simple Hadoop deployments for proof of concept projects and other small-scale data processing tasks.

• Basic HBase Cluster: HBase clusters can contain JobTracker or TaskTracker nodes to run HBase MapReduce jobs.

• Compute-only Hadoop Cluster: For running MapReduce jobs; they read data from external HDFS clusters, and don't store data.

• Compute Workers Only Cluster: If you already have a physical Hadoop cluster and want to do more CPU or memory intensive operations, you can increase the compute capacity by provisioning a workers only cluster. With the compute workers only clusters, you can "burst out to virtual." Worker only clusters are not supported on Ambari and Cloudera Manager application managers.

• HBase Only Cluster: Contains only HBase Master, HBase RegionServer, and Zookeeper nodes, but not Namenodes and Datanodes. The advantage of having an HBase only cluster is that multiple HBase clusters can use the same external HDFS.

• Data-Compute Separation Hadoop Cluster: Allows you to separate the data and compute nodes, which allows control of where nodes are placed on your ESXi hosts. Also facilitates elastic scaling of compute nodes as shown later in this lab.

• Customized Cluster: Allows creation of clusters using the same configuration file as previously created clusters. You can also edit the file to further customize the cluster configuration.


Select the Custom Template

Each distinct Hadoop node configuration is called a Node Group. The Node Groups you will see depend on the Deployment Type selected in the previous step. Node Groups allow for deployment of similar types of nodes, and share common attributes such as vCPU, RAM, and datastore. In the image above, you are going to deploy 1 DataMaster node (which runs the NameNode service), 1 ComputeMaster node (which runs the JobTracker service), and 3 Worker nodes (which contain the DataNode and TaskTracker services).

• Under the DataMaster Node Group, choose Customize... in the Resource template drop down list.

NOTE: You can also use the command-line interface or Customize... to define any Node Group you want.


Customize the Template

Change the defaults to these values, as shown above:

1. vCPU number: 1

2. Memory size: 3748 MB

3. Storage size: 10 GB

4. Datastore type: Shared (NOTE: this is customizable for each Node Group, so you could choose to have your Worker nodes use Local storage, but put the DataMaster/NameNode and/or ComputeMaster/JobTracker on Shared storage. This allows the use of vSphere HA or FT to improve the availability of those nodes, while still ensuring data locality for compute nodes.)

5. Click OK to continue.
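The per-Node-Group datastore choice described in step 4 can be sketched in a few lines of Python. This is only an illustration of the placement rule, not part of Big Data Extensions: the helper name `datastore_type` and the `MASTER_GROUPS` set are hypothetical, and the lowercase value "local" is assumed by analogy with the "shared" value used in exported cluster spec files.

```python
# Hypothetical helper illustrating the placement rule from step 4.
# "shared" appears in exported spec files; "local" is assumed by analogy.
MASTER_GROUPS = {"DataMaster", "ComputeMaster"}


def datastore_type(node_group):
    """Return the datastore type to use for a given Node Group.

    Master nodes go on shared storage so vSphere HA or FT can protect them;
    every other group stays on local storage to preserve data locality.
    """
    return "shared" if node_group in MASTER_GROUPS else "local"


for group in ("DataMaster", "ComputeMaster", "Worker"):
    print(f"{group}: {datastore_type(group)}")
```

The rule is deliberately a simple set lookup: the trade-off is availability for the few master nodes versus locality for the many worker nodes.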


Specify the Resources for the Cluster

1. Choose the Customize... option for each Node Group, and size each Node Group's resources as shown in the previous step, except for the Worker nodes (keep them as Local storage and 50 GB).

2. Set the Number of nodes for each Node Group to 1.

3. Once you've ensured your choices match the ones shown in the screenshot above, click Next to continue.
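To get a feel for what the cluster sized above will consume, the settings can be totalled with a short Python sketch. The dictionary keys and the `cluster_totals` function are hypothetical names chosen for illustration, and the sketch assumes the Worker group uses the same vCPU and memory values as the master groups, with only its storage differing.

```python
# Node Group sizes as entered in the wizard (one node per group).
# Assumption: Worker uses the same vCPU/memory as the masters.
node_groups = [
    {"name": "DataMaster", "nodes": 1, "vcpu": 1, "mem_mb": 3748,
     "storage_gb": 10, "datastore": "shared"},
    {"name": "ComputeMaster", "nodes": 1, "vcpu": 1, "mem_mb": 3748,
     "storage_gb": 10, "datastore": "shared"},
    {"name": "Worker", "nodes": 1, "vcpu": 1, "mem_mb": 3748,
     "storage_gb": 50, "datastore": "local"},
]


def cluster_totals(groups):
    """Sum vCPU, memory, and per-datastore-type storage across all groups."""
    totals = {"vcpu": 0, "mem_mb": 0, "shared_gb": 0, "local_gb": 0}
    for g in groups:
        n = g["nodes"]
        totals["vcpu"] += n * g["vcpu"]
        totals["mem_mb"] += n * g["mem_mb"]
        totals[g["datastore"] + "_gb"] += n * g["storage_gb"]
    return totals


print(cluster_totals(node_groups))
```

Under these assumptions the three-node cluster needs 3 vCPUs, 11244 MB of memory, 20 GB of shared storage, and 50 GB of local storage.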


Network and Resource Pool

Leave the Hadoop topology and Network settings at their default values.

Click Next to continue.

Select Resource Pool

Check the Cluster Site A resource pool as shown above, and click Next to continue.


Set User Information

Choose an administrative password for the Big Data Cluster.

The password and confirmation password must match.

Type VMw@re15 as the password. (The password must contain at least one uppercase character, one lowercase character, one digit, and one of the following special characters: _, @, $, %, ^, &, *.)
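The password rule stated above can be checked with a small Python sketch. The function name `is_valid_password` is hypothetical, and the sketch checks only the four character-class requirements listed here; any additional rules the product may enforce (such as length limits) are not modelled.

```python
# Special characters accepted by the rule quoted in the lab text.
SPECIAL_CHARACTERS = set("_@$%^&*")


def is_valid_password(password):
    """Check the four character-class requirements from the lab text."""
    return (
        any(c.isupper() for c in password)      # at least one uppercase
        and any(c.islower() for c in password)  # at least one lowercase
        and any(c.isdigit() for c in password)  # at least one digit
        and any(c in SPECIAL_CHARACTERS for c in password)  # one special
    )


print(is_valid_password("VMw@re15"))  # the password used in this lesson
print(is_valid_password("vmware15"))  # no uppercase or special character
```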

Choose Next to continue.


Cancel Creation

Review the settings for the Big Data cluster; they should look similar to the screenshot above.

NOTE!!! Due to resource and time constraints for the lab, DO NOT CLICK FINISH! Click Cancel, as creating a cluster is too resource and time intensive for the HOL environment.

Click Cancel to cancel the deployment. Watch the video below to see a deployment of a Hadoop cluster.


Video

Watch this video on YouTube: http://www.youtube.com/watch?v=DnIYlA3E0TU


Create a Hadoop Cluster with the Serengeti CLI

In the last lesson, we used the vSphere Web Client to walk through creating a new Hadoop cluster. We will now run through the same process using the Serengeti CLI. The CLI allows you to have finer-grained control over cluster creation, including the ability to specify what roles run on which nodes in the cluster.

Use Putty to SSH to management-server

1. Double-click the PuTTY icon on the desktop.

2. Click the SerengetiCLI session.

3. Click Open.


Connect to the Serengeti CLI


Explore the Serengeti CLI

Try out the following commands in the CLI to get an idea of how the environment is configured:

• cluster list - lists all the Hadoop clusters and some of their configuration

• resourcepool list - lists vSphere resource pools

• datastore list - lists the Serengeti datastores

• network list - lists the network mappings

• help - shows a list of all available commands within this shell

Create a Hadoop Cluster via the CLI

Now we will walk through how to create a Hadoop cluster via the CLI. This process is similar to using the vSphere Web Client, but there are more options available.

View specFile

Hadoop cluster configurations are defined by JSON specification files. Let's take a look inside one of these spec files.

1. If you're in the Serengeti shell, type exit to get back to a Linux bash prompt.

2. Change to the right directory: cd /opt/serengeti/samples/

3. Type less compute_only_cluster.json to look at a sample specification file.

4. Scroll through the file using the arrow keys, and press q to quit when you're done.

Using a JSON file via the CLI allows more control over the configuration of the cluster, including role placement across nodes in the cluster.

Exit the Serengeti Session

Exit the Serengeti Session by selecting the X at the top right of the window to close the session.

Choose OK at the Exit Confirmation screen.


Create the Cluster - Video

This video shows the process to create a compute-only cluster using an existing HDFS filesystem. We won't actually create another cluster in this lab due to time constraints, but here is the command to use in the CLI along with the JSON file:

cluster create --name SharedHDFSTest --specFile /opt/serengeti/samples/compute_only_cluster.json

Watch this video on YouTube: http://www.youtube.com/watch?v=GZ8ni2DgBgo


Scale Out a Hadoop Cluster Via the Web Client

This lesson will show you how to scale out a Hadoop cluster via the vSphere Web Client. The next lesson will walk through the same procedure using the CLI.

The ability to easily scale your Hadoop clusters up and down is a key benefit of running Hadoop on vSphere. It is very difficult and costly to achieve the same results on dedicated physical hardware.

Navigate to Big Data Extensions Plugin

1. If you are not already in the Big Data Extensions plugin, navigate back to it by clicking the Home icon.

2. Select Big Data Extensions from the sidebar menu as shown above.


Select Big Data Clusters

Select Big Data Clusters in the sidebar menu.


Scale Out the bigtopyarn Cluster

1. Right-click on bigtopyarn.

2. Choose 'Scale Out/In...'.


Increase Worker Instances

If we wanted to increase the number of compute nodes (workers), we would increase the Adjustment from 0 to 1.

Upon execution of this task, Big Data Extensions would clone a new worker node, and automatically add it to the bigtopyarn Hadoop cluster.

Scale In would allow us to reduce the number of workers by changing the Adjustment to -1. That would remove one worker on execution.
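The Adjustment field is a delta applied to the current node count, whereas the Serengeti CLI resize command takes the resulting total. A minimal Python sketch of that relationship follows; the function name `target_instance_count` is hypothetical, and rejecting negative totals is an assumed sanity check rather than documented product behavior.

```python
def target_instance_count(current, adjustment):
    """Convert a Scale Out/In adjustment (delta) into a total node count."""
    target = current + adjustment
    if target < 0:
        # Assumed sanity check: a node group cannot have a negative size.
        raise ValueError("adjustment would make the node count negative")
    return target


# Example with a node group that currently has 3 workers.
print(target_instance_count(3, 1))   # scale out by one worker
print(target_instance_count(3, -1))  # scale in by one worker
```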

NOTE!! Due to time and resource constraints, please don't scale the cluster either In or Out at this time.

Click Cancel.

Video of the Resize Process


This video shows you the resize process in action.


Scale Out a Hadoop Cluster Via the Serengeti CLI

In this lesson, we will scale out a Hadoop cluster using the Serengeti CLI.

Use Putty to SSH to management-server

1. Double-click the PuTTY icon on the desktop.

2. Click the SerengetiCLI session.

3. Click Open.

Connect to the Serengeti CLI

Type ./serengeti-shell to automatically log in to the Serengeti CLI (NOTE: you can type ./ser and hit TAB to auto-complete the command).


This script automates the following tasks:

• Launches the serengeti shell (located at /opt/serengeti/sbin/serengeti)

• Connects to the local management server (connect --host localhost:8443)

• Enters the username (administrator@corp)

• Enters the password (VMware1!)

You should then see "Connected" followed by the "serengeti>" prompt, as shown above.


Listing Hadoop Cluster Details

To see your clusters, type cluster list. Note that the up arrow will let you see your command history.

Notice that we currently have 3 workers in the bigtopyarn cluster. In the next step, we will see how to expand this.


View Cluster Resize Syntax

First, let's look at the help for the cluster resize command. Type help cluster resize and press ENTER.

Take a look at the keywords for the command.

Enter the Resize Command

As you can see from the help information, the command syntax we'd use to scale out the basicyarn cluster to 5 worker nodes is:

cluster resize --name basicyarn --nodeGroup worker --instanceNum 5

Note!! Due to HOL constraints, do not actually run this command, as it is resource intensive and time-consuming.
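If you script resize operations, the command shown above can be assembled from its three parameters. The Python sketch below only builds the command text and does not run anything; `build_resize_command` is a hypothetical helper, it uses only the flags that appear in the command above, and the positive-count check is an assumed sanity check.

```python
def build_resize_command(name, node_group, instance_num):
    """Build the Serengeti CLI text for resizing one node group.

    instance_num is the desired total number of nodes in the group,
    not a delta relative to the current size.
    """
    if instance_num < 1:
        # Assumed sanity check, not documented product behavior.
        raise ValueError("instance_num must be a positive integer")
    return (
        f"cluster resize --name {name} "
        f"--nodeGroup {node_group} --instanceNum {instance_num}"
    )


print(build_resize_command("basicyarn", "worker", 5))
```

The resulting string can be pasted at the serengeti> prompt in an environment where running the resize is appropriate.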


Exit the Serengeti Session

Exit the Serengeti Session by selecting the X at the top right of the window to close the session.

Choose OK at the Exit Confirmation screen.


Module 2 - Fast and Easy Deployment of Hadoop Clusters (15 Min)


Module Overview

Hadoop clusters typically require specialized expertise and dedicated hardware infrastructure to deploy. In this module you will see how easy it is to configure your Hadoop cluster nodes, size the virtual machines - including CPU, Memory and Storage - and deploy into your existing vSphere environment. As resource demands change over time - or throughout the day - you can resize the Hadoop cluster to accommodate these changes. Lastly, once a cluster is configured, you will see how to export that configuration, and use it to create or update other Hadoop clusters.

Note: You MUST run the "Verify Hadoop Clusters Are Running" step under the Lab Overview section prior to doing this module.


Configure and Deploy Hadoop Clusters

In this module, you will deploy an Apache Hadoop cluster using the vSphere Web Client and the vSphere Big Data Extensions.

Login to vSphere Web Client

Open Firefox and log in to the vSphere Web Client by

1. checking the Use Windows session authentication checkbox, and

2. clicking the Login button.

In case this fails, you can uncheck the box and specify a username of CORP\Administrator with a password of VMware1! (Note: ! is part of the password)


Navigate to Hosts and Clusters

Click on Hosts and Clusters.


Create a Resource Pool

Resource Pools allow you to limit the amount of CPU and Memory that can be consumed by your Hadoop cluster, but as you will see in Module 6, they also are the mechanism for establishing the priority of one cluster over another in the case of resource contention.

Right-click on the cluster named Cluster Site A, and select 'New Resource Pool...'.


Configure Resource Pool

1. Name the resource pool MyResourcePool.

2. Leave all settings at the defaults and click OK.

Return to the Home Page

Click the Home icon at the top and select Home to return to the Home page.


Navigate to Big Data Extensions Plugin

This is a vCenter Plugin providing specific capability to configure, deploy, and manage your Big Data environment.

Click on the Big Data Extensions tab.


Select Big Data Clusters

Two Hadoop clusters have been created for this lab. If any cluster that you need has not started, or if it has an error status, please follow the directions in the Verify Hadoop Clusters are Running step under the Lab Overview section prior to doing this module.

Click on Big Data Clusters in the sidebar.

Create a Hadoop Cluster

Click the New Big Data Cluster icon in the objects tab.


Select Cluster Name and Hadoop Distribution

You will choose your preferred Hadoop Distribution. Supported distros include Cloudera, MapR, Hortonworks, and Pivotal HD. We will use the bigtop distribution in this module.

There are several deployment types for your clusters. You can mimic the typical physical Hadoop deployment with the Basic Hadoop Cluster. This type will separate the Namenode and Jobtracker into their own Virtual Machines; however, each Tasktracker and Datanode combination will be in a single Virtual Machine. You also have the option of separating the Compute (Tasktracker) from the Datanode using the Data/Compute Separation Hadoop option. This facilitates the elastic scaling of Compute you can see in Module 6.

For this Module, enter or select the following options:

Name : BasicHadoop

Application Manager: Default

Hadoop distribution: bigtop

Click Next to continue.

Select the Deployment Type

Select Basic Hadoop Cluster for the Deployment Type if it is not already selected.


Select the custom template

Each distinct Hadoop Node configuration is called a Node Group. You will see specific Node Groups based on the Deployment Type you selected, but you can also use the Command Line Interface to define any type of Node Group you want. In this section, you are sizing the virtual machine CPU, RAM, and Data storage for each Node Group. You will also define the number of nodes to deploy in each Node Group. In the image above, you are going to deploy 3 Worker Nodes, each containing a NodeManager and DataNode, 1 ComputeMaster (ResourceManager), and 1 DataMaster (NameNode).

1. Click the Resource template button

2. Select Customize.

Customize the Template

Note that you can select Shared or Local storage. Typically, Hadoop has been deployed with local storage to provide the data locality that is central to its performance. You can see that each Node Group can be configured with its own Datastore type. This means that, for instance, your DataNodes can run on Local storage, while you have the ResourceManager and Namenodes on Shared storage. This allows the use of vSphere HA or FT to improve the availability of those nodes while still ensuring data locality.

Change the defaults to:


vCPU number: 1

Memory size: 3748 MB

Storage size: 10 GB

Datastore type: Shared

Click OK.


Select Resources for the Cluster

Make sure to select the Customize option and size each Node Group's resources as in the previous step. (Note: The Worker Node group will require a Storage size of 20 GB.)

Set the number of nodes for each worker to 1.

Click Next to continue.


Topology and Network

Leave the default settings (which may be slightly different than shown above).

Click the Next button to continue.

Select the proper Resource Pool

Select the resource pool, MyResourcePool, that you created in an earlier step.

Click Next.


Set Cluster Password

Set a custom administrative password for the nodes.

Use the password 'VMware1$'.

Select Next to continue.


Cancel Creation

Review the deployment settings.

NOTE!!! Due to resource and time constraints for the lab, DO NOT CLICK FINISH! Click Cancel, as creating a cluster is too resource and time intensive for the HOL environment.

Depending on the size of the cluster, it takes anywhere from 6 to 20 minutes to deploy and be running.

Click Cancel to cancel the deployment. Watch the video below to see a deployment of a Hadoop cluster.


Video


Resize Hadoop Cluster After Creation

As resource demands change over time - or throughout the day - you can resize the Hadoop cluster to accommodate these changes. In this module, you will use the vSphere Big Data Extensions Plugin to resize an existing cluster.

Navigate to Big Data Extensions Plugin

Click on the "Big Data Extensions" tab


Select Big Data Clusters

Click on the Big Data clusters tab.


Select the Cluster

Choose the bigtopyarn cluster for the Resize process. Because of resource and timing constraints in the lab environment, we will not actually complete the creation of additional nodes.

Right-click the bigtopyarn cluster from the Center Panel list of Clusters.


Select Scale Out/In

Scaling Out in our environment will create additional nodes for the Node Group you select. vSphere will automatically provision the Virtual Machine, install and configure the appropriate Hadoop components for your selected Node Group, and start up the services. Scaling In will decrease the number of nodes in the selected Node Group.

Select Scale Out/In.


Select the Node Group to resize

Select the node group you want to resize. Here, only the worker and client groups can be adjusted.

Adjust the number of instances. Adjusting up will add nodes to the group, while adjusting down will remove nodes from the group.

Note!! Due to the time it takes to make configuration changes and resource constraints in the lab environment, we will not be making any changes to the cluster.

Click Cancel.

Watch the video below to see the scale out of a cluster.

Video


Export Configuration and Create Customized Cluster

Once a Hadoop cluster is configured, you will be able to export that configuration and use it to create or update the configuration of other Hadoop clusters. In this module, you will export a running configuration, and deploy a customized cluster from that configuration.

Use Putty to SSH to management-server

1. Double-click the PuTTY icon on the desktop.

2. Click the SerengetiCLI session.

3. Click Open.


Connect to the Serengeti CLI

Type ./serengeti-shell to automatically log in to the Serengeti CLI (NOTE: you can type ./ser and hit TAB to auto-complete the command).

This script automates the following tasks:

• Launches the serengeti shell (located at /opt/serengeti/sbin/serengeti)

• Connects to the local management server (connect --host localhost:8443)

• Enters the username (administrator@corp)

• Enters the password (VMware1!)

You should then see "Connected" followed by the "serengeti>" prompt, as shown above.

List bigtopyarn Information

Locate the running cluster by typing:

cluster list --name bigtopyarn

Export bigtopyarn Configuration

To change the cluster's configuration, we must first export it to a configuration file.

Type :

cluster export --name bigtopyarn --specFile /home/serengeti/bigtopyarn.json

Configuration File

The cluster configuration file is stored as a JSON file. To see its contents, exit the Serengeti CLI by typing:

quit

Then type:

more /home/serengeti/bigtopyarn.json

Press the spacebar to advance a page; press 'q' to quit the more command.

You can edit the file with your favorite text editor and save it when you are done. Notice that the configuration includes the definitions of the node groups and the specific Hadoop configurations.

Due to time constraints for the lab, we will not be editing the file. A sample of the file is provided below.

{
  "appManager" : "Default",
  "nodeGroups" : [
    {
      "name" : "master",
      "roles" : [
        "hadoop_namenode",
        "hadoop_resourcemanager"
      ],
      "instanceNum" : 1,
      "instanceType" : "MEDIUM",
      "storage" : {
        "type" : "shared",
        "sizeGB" : 10
      },
      "cpuNum" : 1,
      "memCapacityMB" : 3748,
      "swapRatio" : 1.0,
      "haFlag" : "off",
      "configuration" : {
        "hadoop" : { }
      }
    },
    {
      "name" : "worker",
      "roles" : [
        "hadoop_datanode",
        "hadoop_nodemanager"
      ],
      "instanceNum" : 3,
      "instanceType" : "SMALL",
      "storage" : {
        "type" : "local",
        "sizeGB" : 10
      },
      "cpuNum" : 1,
      "memCapacityMB" : 3748,
      "swapRatio" : 1.0,
      "haFlag" : "off",
      "configuration" : {
        "hadoop" : { }
      }
    },
    {
      "name" : "client",
      "roles" : [
        "hadoop_client",
        "pig",
        "hive",
        "hive_server"
      ],
      "instanceNum" : 1,
      "instanceType" : "MEDIUM",
      "storage" : {
        "type" : "local",
        "sizeGB" : 10
      },
      "cpuNum" : 1,
      "memCapacityMB" : 3748,
      "swapRatio" : 1.0,
      "haFlag" : "off",
      "configuration" : {
        "hadoop" : { }
      }
    }
  ],
  "configuration" : {
    "hadoop" : {
      "hdfs-site.xml" : { },
      "yarn-env.sh" : { },
      "hadoop-env.sh" : { },
      "core-site.xml" : { },
      "yarn-site.xml" : { },
      "log4j.properties" : { },
      "mapred-site.xml" : { },
      "capacity-scheduler.xml" : { },
      "fair-scheduler.xml" : { }
    }
  },
  "clusterCloneType" : "instant",
  "hostnamePrefix" : ""
}
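Because the exported spec is plain JSON, it can be inspected with any JSON library before you decide what to change. The following is a minimal Python sketch, not part of the lab itself, that prints one summary line per node group. The SAMPLE string is a trimmed copy of the file shown above, and the helper name summarize_node_groups is our own invention for illustration.

```python
import json

# Trimmed copy of the exported spec shown above (only the fields used here).
SAMPLE = """
{
  "nodeGroups": [
    {"name": "master", "roles": ["hadoop_namenode", "hadoop_resourcemanager"],
     "instanceNum": 1, "cpuNum": 1, "memCapacityMB": 3748},
    {"name": "worker", "roles": ["hadoop_datanode", "hadoop_nodemanager"],
     "instanceNum": 3, "cpuNum": 1, "memCapacityMB": 3748},
    {"name": "client", "roles": ["hadoop_client", "pig", "hive", "hive_server"],
     "instanceNum": 1, "cpuNum": 1, "memCapacityMB": 3748}
  ]
}
"""


def summarize_node_groups(spec):
    """Return one summary line per node group in a parsed spec (hypothetical helper)."""
    lines = []
    for group in spec["nodeGroups"]:
        lines.append(
            "{name}: {n} x {cpu} vCPU, {mem} MB, roles={roles}".format(
                name=group["name"],
                n=group["instanceNum"],
                cpu=group["cpuNum"],
                mem=group["memCapacityMB"],
                roles=",".join(group["roles"]),
            )
        )
    return lines


if __name__ == "__main__":
    for line in summarize_node_groups(json.loads(SAMPLE)):
        print(line)
```

To run it against the real export, you would replace json.loads(SAMPLE) with json.load(open("/home/serengeti/bigtopyarn.json")).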

Deploying the Custom Cluster

If you did edit the file, for example by changing the number of vCPUs in the worker node group from 1 to 2, you could then deploy the customized cluster by typing:

cluster create --name bigtopyarn_2cpu --specFile /home/serengeti/bigtopyarn.json

after reconnecting to the serengeti CLI.

NOTE!! Creating a cluster is too time and resource intensive for the HOL environment.

Do not enter this command.
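The edit described above (raising the worker group's cpuNum from 1 to 2) can also be scripted instead of done by hand. This is a hedged Python sketch, not something the lab asks you to run. The helper names set_group_cpus and rewrite_spec are hypothetical, and the field names (nodeGroups, name, cpuNum) are taken from the sample file shown earlier.

```python
import json


def set_group_cpus(spec, group_name, cpu_num):
    """Return a copy of the spec with cpuNum changed for one node group."""
    updated = json.loads(json.dumps(spec))  # deep copy via a JSON round trip
    matches = [g for g in updated["nodeGroups"] if g["name"] == group_name]
    if not matches:
        raise ValueError("no node group named %r" % group_name)
    for group in matches:
        group["cpuNum"] = cpu_num
    return updated


def rewrite_spec(src_path, dst_path, group_name, cpu_num):
    """Read a spec file, change one group's cpuNum, and write a new file."""
    with open(src_path) as src:
        spec = json.load(src)
    with open(dst_path, "w") as dst:
        json.dump(set_group_cpus(spec, group_name, cpu_num), dst, indent=2)


if __name__ == "__main__":
    demo = {"nodeGroups": [{"name": "master", "cpuNum": 1},
                           {"name": "worker", "cpuNum": 1}]}
    print(set_group_cpus(demo, "worker", 2))
```

Writing the result to a new file rather than overwriting the export keeps the original configuration available for comparison. The new file's path is what you would then pass to --specFile.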

Exit the Serengeti Session

Exit the Serengeti session by selecting the X at the top right of the window to close the session.

Choose OK at the Exit Confirmation screen.

Video

Due to time and resource constraints in our lab environment, we will not execute the command, but we have created a video showing the above command.

Manage Hadoop Pooled Resources

Hadoop makes excellent use of the system resources that are made available to it. In an environment with shared physical resources that have been virtualized, it is important to appropriately assign the resources that can be used by your Hadoop clusters. vSphere allows you to specifically make available CPU, RAM, Storage and Virtual Networks to your Hadoop clusters. In this module, you will use the vSphere Big Data Extensions Plugin to add network and storage resources to the Hadoop Clusters.

Navigate to Big Data Extensions Plugin

1.) Click on the "Big Data Extensions" tab

Select Resources

1.) Click on the Resources tab.

Find Your Datastores

This process is not creating new datastores. It is simply allowing the administrator to determine which datastores can be used when creating Hadoop clusters. vSphere will then create virtual disks across those datastores during cluster creation.

1.) Select the Datastores tab.

Add datastore

1.) Click on the plus sign in the upper left corner to open the add datastore window.

Add datastore details

Fill out the information for the datastores you want to add. The Name you specify can be used in specFiles to refer to this set of datastores.

1.) Name : Test datastores

2.) Datastore type : Shared

Select if the datastores are shared or local storage.

Select Cancel because we have already added the datastores into your environment.

Networks

You can easily segment network traffic for specific clusters by adding multiple networks and using them in the cluster create specFiles.

1.) Select the Networks tab.

Add Network

1.) Click on the plus sign in the upper left corner to open the add networks window.

Network information

Fill out the information for your selected network.

Name : This will be the name you refer to when creating your cluster specFiles. Enter Test Network.

Port group name : The name of the port group to which the network is attached. Do not select one.

Use DHCP to obtain IP addresses : Check this if there is DHCP on the network. Do not check.

IP range : Type the IP range that the VMs can use. Enter 192.168.1.2 and 192.168.1.254

Network mask : The subnet mask of the network. Enter 255.255.255.0

Gateway : The gateway of the network. Enter 192.168.1.1

DNS : The DNS server of the network. Enter 192.168.1.1

Select Cancel to exit the guide.
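Static network settings like the ones above are easy to mistype, so it can help to sanity-check them before entering them in the dialog. The sketch below is an illustration only, using Python's standard ipaddress module. The function name check_static_network and the particular checks it performs are our own assumptions, not rules enforced by Big Data Extensions.

```python
import ipaddress


def check_static_network(start, end, netmask, gateway, dns):
    """Return a list of problems found; an empty list means the values look consistent."""
    problems = []
    first = ipaddress.IPv4Address(start)
    last = ipaddress.IPv4Address(end)
    gw = ipaddress.IPv4Address(gateway)
    ipaddress.IPv4Address(dns)  # raises ValueError if the DNS address is malformed
    # Derive the subnet from the first address in the range and the mask.
    network = ipaddress.IPv4Network("%s/%s" % (start, netmask), strict=False)
    if first > last:
        problems.append("range start is after range end")
    if last not in network:
        problems.append("range end is outside %s" % network)
    if gw not in network:
        problems.append("gateway is outside %s" % network)
    if first <= gw <= last:
        problems.append("gateway falls inside the VM address range")
    return problems


if __name__ == "__main__":
    # Values from the Network information step above.
    print(check_static_network("192.168.1.2", "192.168.1.254",
                               "255.255.255.0", "192.168.1.1", "192.168.1.1"))
```

With the lab's values the list of problems is empty: the range 192.168.1.2 to 192.168.1.254 sits inside 192.168.1.0/24 and leaves 192.168.1.1 free for the gateway.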

Module 3 - Compute Only Clusters on Shared HDFS (15 Min)

Module Overview

Hadoop clusters typically require specialized expertise and dedicated hardware infrastructure to deploy. In the previous module you deployed a Basic Hadoop cluster that separated the Namenode and Jobtracker into their own Virtual Machines, but kept each Tasktracker and Datanode combination in a single Virtual Machine. In this module you will see how easy it is to not only separate your Jobtracker and Namenode, but also to put Tasktrackers and Datanodes into their own VMs as well. This separation of Compute and Data is the key element of the Elastic Scaling that is demonstrated in Module 6 of this lab. Specifically, you will create a Compute Only Cluster that deploys JobTracker, Namenode and Tasktracker nodes, but does not create new Datanodes. Instead, you will point to an existing Hadoop File System (HDFS) that was previously created. The value in this is that many organizations have isolated Hadoop clusters today that make use of some of the same data. You can now easily spin up a cluster and point it to existing data in HDFS instead of copying it into a new filesystem.

Note: If you have not done so in a previous module, you MUST run the "Verify Hadoop Clusters Have Started" step under the lab overview section prior to doing this module.

Create a Compute Only Cluster

You will deploy a Hadoop compute only cluster that uses an external HDFS filesystem and HVE.

Hadoop Virtualization Extensions (HVE) are changes VMware has submitted to the open source Apache community to make Hadoop run better on virtualized infrastructure. HVE refines Hadoop's replica placement, task scheduling and balancer policies. Hadoop clusters implemented on virtualized infrastructure have full awareness of the topology on which they are running. Thus, the reliability and performance of these clusters are enhanced. For more information about HVE, you can refer to https://issues.apache.org/jira/browse/HADOOP-8468.

Connect to the Big Data Extensions CLI

1. Open PuTTY

2. Select the SerengetiCLI

3. Click Open

Start Big Data Extensions Command Line Interface (CLI)

You will be automatically connected to the Big Data Extensions server and automatically logged into the serengeti CLI.

Hadoop Rack topology

Hadoop makes placement and execution decisions based on datacenter topology. Administrators provide their datacenter topology via a topology file. It specifies, for instance, the racks in the datacenter and the servers on each rack. In a virtual environment we have introduced the concept of a nodegroup to represent servers (that are actually VMs) that are running on a specific ESXi host. You can make Hadoop topology aware by uploading your topology file through the Big Data Extensions CLI. We are showing you a very simple example that only defines the racks and physical hosts.

To do this, you would upload the file rack_topology.txt by typing:

topology upload --fileName /opt/serengeti/conf/rack_topology.txt

However, the file has already been uploaded for you in this lab, so you will see a notification if you attempt to upload the file again.

(Note: if you would like to upload the file you can type "rm -rf /opt/serengeti/conf/rack_topology.txt" and follow the steps above to upload the file.)

There is a copy of the file in the Lab Files directory on the desktop if you want to examine it.

The contents of the rack_topology.txt file are:

rack1: esx-01a.corp.local,

rack2: esx-02a.corp.local,

rack3: esx-03a.corp.local,

rack4: esx-04a.corp.local

List Topology

Verify that the topology has been uploaded, by typing :

topology list

And see that the rack/host table is as expected.
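To make the rack/host relationship concrete, here is a small Python sketch that parses text in the shape of the rack_topology.txt contents shown above into a rack-to-hosts mapping. The format rules are inferred from that sample only (one rack per line, a colon after the rack name, comma-separated hosts, trailing commas tolerated), and the function names are hypothetical.

```python
TOPOLOGY = """\
rack1: esx-01a.corp.local,
rack2: esx-02a.corp.local,
rack3: esx-03a.corp.local,
rack4: esx-04a.corp.local
"""


def parse_rack_topology(text):
    """Map each rack name to the list of hosts listed after its colon."""
    racks = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        rack, _, hosts = line.partition(":")
        # Drop empty entries so a trailing comma does not create a blank host.
        racks[rack.strip()] = [h.strip() for h in hosts.split(",") if h.strip()]
    return racks


def host_to_rack(racks):
    """Invert the mapping so each host points at its rack."""
    return {host: rack for rack, hosts in racks.items() for host in hosts}


if __name__ == "__main__":
    racks = parse_rack_topology(TOPOLOGY)
    for rack, hosts in sorted(racks.items()):
        print(rack, hosts)
```

In this lab each rack holds a single ESXi host, so the inverted mapping has one rack per host.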

Configuring Compute Only Hadoop Cluster

As we saw in Modules 1 and 2, Hadoop clusters can be created directly through the vSphere Big Data Extensions plugin. They can also be created through the CLI using a JSON specFile. The specFile contains the cluster configuration and points to the external Hadoop filesystem using the "externalHDFS" tag. This tag points to the Namenode of an existing Hadoop cluster.

This enables the new cluster to use the already existing HDFS filesystem, while deploying Master and compute resources.

{
  "externalHDFS": "hdfs://192.168.110.123:8020",
  "distro": "PivotalHD",
  "nodeGroups": [
    {
      "name": "master",
      "roles": [
        "hadoop_jobtracker"
      ],
      "instanceNum": 1,
      "storage": {
        "type": "SHARED",
        "sizeGB": 1
      },
      "cpuNum": 1,
      "memCapacityMB": 1024,
      "haFlag": "off",
      "rpNames": [
        "Tier2RP"
      ]
    },
    {
      "name": "worker",
      "roles": [
        "hadoop_tasktracker"
      ],
      "instanceNum": 1,
      "cpuNum": 1,
      "memCapacityMB": 1024,
      "storage": {
        "type": "LOCAL",
        "sizeGB": 1
      },
      "rpNames": [
        "Tier2RP" // change this to the resource pool added via Serengeti CLI
      ]
    },
    {
      "name": "client",
      "roles": [
        "hadoop_client"
      ],
      "instanceNum": 1,
      "cpuNum": 1,
      "memCapacityMB": 1024,
      "storage": {
        "type": "SHARED",
        "sizeGB": 1
      },
      "rpNames": [
        "Tier2RP"
      ]
    }
  ],
  "configuration": {
    "hadoop": {
      "core-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/core-default.html
        // note: any value (int, float, boolean, string) must be enclosed in double quotes and here is a sample:
        // "io.file.buffer.size": "4096"
      },
      "hdfs-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/hdfs-default.html
      },
      "mapred-site.xml": {
        // check for all settings at http://hadoop.apache.org/docs/stable/mapred-default.html
      },
      "hadoop-env.sh": {
        // "HADOOP_HEAPSIZE": "",
        // "HADOOP_NAMENODE_OPTS": "",
        // "HADOOP_DATANODE_OPTS": "",
        // "HADOOP_SECONDARYNAMENODE_OPTS": "",
        // "HADOOP_JOBTRACKER_OPTS": "",
        // "HADOOP_TASKTRACKER_OPTS": "",
        // "HADOOP_CLASSPATH": "",
        // "JAVA_HOME": "",
        // "PATH": ""
      },
      "log4j.properties": {
        // "hadoop.root.logger": "INFO,RFA",
        // "log4j.appender.RFA.MaxBackupIndex": "10",
        // "log4j.appender.RFA.MaxFileSize": "100MB",
        // "hadoop.security.logger": "DEBUG,DRFA"
      }
    }
  }
}
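The sample spec above carries // comments, which a strict JSON parser rejects, and it relies on the externalHDFS tag instead of Datanode roles. The Python sketch below is an illustration under those observations only: it strips // comments that sit outside string literals (so the hdfs:// and http:// URLs survive), parses the result, and checks that no node group asks for a Datanode. The helper names and the trimmed SAMPLE text are our own, not part of Big Data Extensions.

```python
import json

# Trimmed, commented spec in the same style as the sample above.
SAMPLE = """
{
  "externalHDFS": "hdfs://192.168.110.123:8020",
  "nodeGroups": [
    {"name": "worker", "roles": ["hadoop_tasktracker"],
     "rpNames": ["Tier2RP" // change this to your resource pool
     ]}
  ],
  "configuration": {"hadoop": {"core-site.xml": {
    // "io.file.buffer.size": "4096"
  }}}
}
"""


def strip_line_comments(text):
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    i, n = 0, len(text)
    while i < n:
        ch = text[i]
        if in_string:
            out.append(ch)
            if ch == "\\" and i + 1 < n:
                out.append(text[i + 1])  # keep the escaped character as-is
                i += 2
                continue
            if ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            end = text.find("\n", i)  # skip to end of line, keep the newline
            i = n if end == -1 else end
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def is_compute_only(spec):
    """True if the spec points at an external HDFS and creates no Datanodes."""
    if not spec.get("externalHDFS"):
        return False
    roles = {r for g in spec.get("nodeGroups", []) for r in g.get("roles", [])}
    return "hadoop_datanode" not in roles


if __name__ == "__main__":
    spec = json.loads(strip_line_comments(SAMPLE))
    print(spec["externalHDFS"], is_compute_only(spec))
```

A character-by-character scan is used rather than a regular expression because a naive pattern for // would also cut the URL inside the externalHDFS value.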

Deploy Hadoop Cluster With PivotalHD Distro

From the Big Data Extensions CLI you can deploy the Compute Only Cluster with PivotalHD as the distro and take advantage of HVE to provide virtual topology awareness. Below is an example of the command used to deploy a distro other than Apache. In this example, the file Pivotal.txt would specify the PivotalHD distro to be used. We will not actually execute this command in the lab because the PivotalHD distro has not been installed on the Serengeti server.

Type :

cluster create --name Pivotal --topology HVE --distro PivotalHD --specFile /opt/serengeti/conf/Pivotal.txt

Due to time and resource constraints in our lab environment, do not execute the command. The video at the end of this section shows the deployment of a Compute Only Hadoop Cluster.

Exit the Serengeti Session

Exit the Serengeti session by selecting the X at the top right of the window to close the session.

Choose OK at the Exit Confirmation screen.

Video

Module 4 - Managing Hadoop Clusters with Ambari Manager (30 Min)

Module Overview

The Apache Ambari project is aimed at making Hadoop management simpler by developing software for provisioning, managing, and monitoring Apache Hadoop clusters. Ambari provides an intuitive, easy-to-use Hadoop management web UI backed by its RESTful APIs.

Ambari enables System Administrators to:

Manage a Hadoop Cluster

- Ambari provides central management for starting, stopping, and reconfiguring Hadoop services across the entire cluster.

Monitor a Hadoop Cluster

- Ambari provides a dashboard for monitoring health and status of the Hadoop cluster.

- Ambari leverages Ambari Metrics System for metrics collection.

- Ambari leverages Ambari Alert Framework for system alerting and will notify you when your attention is needed (e.g., a node goes down, remaining disk space is low, etc.).

Ambari enables Application Developers and System Integrators to:

Easily integrate Hadoop provisioning, management, and monitoring capabilities into their own applications with the Ambari REST APIs.

In this module you will get an overview of the Ambari management interface, monitor Hadoop services, start and restart services, and finally see how to add services to help you manage your Hadoop environment.

From apache.org

Introduction to Ambari Manager

If you have tried the earlier modules, you have used VMware vCenter with the Big Data Extensions plugin to create, manage, and monitor Hadoop clusters on a VMware infrastructure. Big Data Extensions also supports using some third party Hadoop managers. If your organization is using the Cloudera distribution of Hadoop, it will be familiar with using Cloudera Manager to manage those Hadoop clusters. Many distributions use the Apache Ambari Manager. Both of these are currently supported. This lab will showcase working with Ambari Manager to manage and monitor Hadoop clusters.

Connect to the Ambari Manager

Open the Firefox browser.

Right click on the Ambari Manager shortcut and select Open in a New Tab.

Sign In to Ambari

Sign into the Ambari Manager.

Enter admin as the Username.

Enter admin as the Password.

Click the Sign in button to continue.

The Dashboard - Metrics

Across the top of the page are a number of tabs, including Dashboard, which is where you should arrive. The Dashboard provides a compact overview of the health of the cluster, showing a number of key metrics that you can examine. If you hover the mouse over many of the portlets, you can get additional information about that metric.

Services

To the left of the metrics is a section that shows the various services that have been added to this cluster and their current state. All of the services are green.

Choose the HBase link to investigate further.

HBase Services

This displays a summary of the metrics and status particular to this service. Next to the Summary tab, the Configs tab gives you a look at how the nodes have been configured and the opportunity to make adjustments if necessary.

From the Summary tab, select the Active HBase Master link.

Stop the HBase Master

The HBase Master is Started.

Click on the Started button to get the menu.

Select Stop. When the Confirmation screen asks, 'Are you sure?', choose OK.

Starting Up

A status screen appears displaying the progress of the operation. When it completes, select OK.

On the Components panel the Active HBase Master has now been stopped, as shown by the red warning icon.

Choose the Dashboard tab at the top of the screen. The Services area also shows that the HBase service has an issue.

Hosts

Select the Hosts tab.

Components

All of the nodes that comprise the Hadoop cluster are displayed. Each node here is actually a virtual machine that we could see back in vCenter if we chose to. The far right column shows which processes or components are running on each node.

Examine the Components on the master node by choosing the hyperlink in the Components column.

Starting Components

We can see the red warning icon on the master node indicating a problem.

Select master node hyperlink to investigate further.

Restarting the HBase Master

The HBase Master service is stopped.

Click on the Stopped button, and choose Start to restart the service.

Respond to the 'Are you sure?' confirmation with OK.

The HBase Master will restart. Choose OK to clear the Operations window. The HBase Master has been restarted.
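The stop and start operations performed in the web UI can also be driven through Ambari's REST API. The Python sketch below only builds the HTTP request and does not send it. It assumes the commonly documented Ambari v1 endpoint shape, where a host component's desired state is set with a PUT (INSTALLED for stopped, STARTED for running) and an X-Requested-By header is supplied. The port 8080, the cluster name, and the host name used here are placeholders, so check them against your own Ambari server before relying on this.

```python
import base64
import json
import urllib.request


def build_component_request(base_url, cluster, host, component, state, user, password):
    """Build (but do not send) a PUT that sets a host component's desired state."""
    url = "%s/api/v1/clusters/%s/hosts/%s/host_components/%s" % (
        base_url.rstrip("/"), cluster, host, component)
    body = {
        "RequestInfo": {"context": "Set %s to %s" % (component, state)},
        "Body": {"HostRoles": {"state": state}},
    }
    token = base64.b64encode(("%s:%s" % (user, password)).encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode(),
        method="PUT",
        headers={"Authorization": "Basic " + token, "X-Requested-By": "ambari"},
    )


if __name__ == "__main__":
    # Placeholder cluster and host names; admin/admin is the lab's Ambari login.
    for state in ("INSTALLED", "STARTED"):
        req = build_component_request(
            "http://ambari.corp.local:8080", "demo_cluster", "master.corp.local",
            "HBASE_MASTER", state, "admin", "admin")
        print(req.get_method(), req.full_url, req.data.decode())
```

Passing the returned object to urllib.request.urlopen would submit the request. Ambari processes such changes asynchronously, much like the progress window shown in the UI.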

Return to Hosts

Select Back to return to the Hosts screen.

The Service is restarted

Now that the service has been restarted, the node is now healthy.

Adding New Services

Select the Dashboard tab. Some of our metrics are not displaying due to the lack of a Ganglia service.

In the Services area, select the Actions button and choose Add Service.

Choose Services

Services are loaded into the Hadoop platform as needed. Examine the options to see what functionality could be added to the environment.

Select Ganglia by clicking the checkbox.

Choose Next to continue.

Linking Services

A warning appears that explains we may need more than just the Ganglia service. While the Ganglia service provides metric collections, the Nagios service adds monitoring and alerts, and it is suggested to add both.

Select Cancel to clear the warning. Then select Nagios by clicking the checkbox and choose Next to continue.

Assign Masters

The Wizard places the services on the client node, but you can opt to run them elsewhere.

Choose the defaults and select Next to continue.

Assign Slaves and Masters

Again the Wizard chooses where the components should be run. Keep the defaults and choose Next to continue.

Customize Services

The wizard has again done some configuration. The red '2' indicates that there are still mandatory configurations that must be completed before we can continue.

Enter a password ('admin' for example) into the password field, and reenter it for validation in the second password field.

Enter a valid email address ('[email protected]' for example) into the email field.

As each requirement is successfully completed, the red highlight will disappear and the number will decrement until it reaches zero and disappears.

Before you select Next to continue, choose the Ganglia tab to investigate the configurations that are possible with that service.

Review (but do NOT deploy)

You can review the services before continuing to the actual deployment and installation.

Because of the time and resource constraints of this lab, DO NOT select Deploy.

Instead, choose the X at the top right corner of the Service Wizard screen to cancel adding the services.

Module 5 - Troubleshooting Big Data Extensions (30 Min)

Module Overview

In this section you will find information about administering and troubleshooting components of the Big Data Extensions environment. Taken from the viewpoint of a Big Data admin, we will show you the locations of log files and which ones are important to leverage when you run into issues.

Finding and Viewing Log Files

In order to troubleshoot issues when things go wrong, we need to know where to look! This module will cover the important log files to help troubleshoot your environment. And of course you can always ship these off to your favorite syslog analyzer, such as VMware Log Insight.

Log in to Serengeti Server

1. Double Click on Putty on the Desktop

2. Click on SerengetiCLI

3. Click on Load

4. Click on Open

Log in as Serengeti User

1. The PuTTY session should log you in automatically to the Serengeti CLI. Then type "exit".

Navigate to Serengeti Log Files

1.) Type cd /opt/serengeti/logs to enter the correct directory

2.) Enter ls to list the files in the directory

3.) Notice the key log file serengeti.log

/opt/serengeti/logs is the most important directory to look into for BDE logging. This is a good place for admins to look when troubleshooting issues in the environment.

Search the serengeti.log for WARN

1. Run a query using the grep command. Type cat ./serengeti.log | grep WARN

What if I only care about failures and errors?

cat ./serengeti.log | grep fail

or

cat ./serengeti.log | grep error
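If you would rather run one case-insensitive search for several keywords, or count matches, the same filtering can be sketched in a few lines of Python. This is an illustration only: the function name filter_log is hypothetical, and the placeholder lines in the demo are invented for the example and are not real serengeti.log output.

```python
import re


def filter_log(lines, pattern, ignore_case=True):
    """Return the lines that match a regular expression, like a simple grep."""
    flags = re.IGNORECASE if ignore_case else 0
    rx = re.compile(pattern, flags)
    return [line for line in lines if rx.search(line)]


if __name__ == "__main__":
    # Placeholder lines, invented for this example.
    demo = [
        "placeholder line 1 INFO all good",
        "placeholder line 2 WARN something looks odd",
        "placeholder line 3 ERROR operation failed",
    ]
    for line in filter_log(demo, r"fail|error"):
        print(line)
    print(len(filter_log(demo, "WARN", ignore_case=False)), "WARN line(s)")
```

To apply it to the real file, you would pass an open file object, for example filter_log(open("/opt/serengeti/logs/serengeti.log"), r"fail|error").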

Starting and Stopping Services

When in doubt, restart! Every admin has done this: when something is not working, our first inclination is to restart the service! In this module we will show you how to start and stop the various services from the command line.

Log in to Serengeti Server

1. Double click on putty on the desktop

2. Select SerengetiCLI

3. Click on Load

4. Click on Open

Log in to the Serengeti CLI

1. The PuTTY session should log you in automatically to the Serengeti CLI. Then type "exit".

Checking Status and Starting Tomcat

1. Type "sudo service tomcat status" and notice the status of the service. Your screen may look different.

2. Type "sudo service tomcat start". Alternatively, we can type "sudo service tomcat stop" to stop tomcat and "sudo service tomcat restart" to restart tomcat.

3. Notice the status now
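If you script these checks, it is worth restricting the actions a script may issue. The following Python sketch is a hypothetical helper of our own, not something shipped with Big Data Extensions: it builds the argument list for the service commands shown above and rejects anything other than status, start, stop, or restart. The demo only prints the commands and does not execute them.

```python
ALLOWED_ACTIONS = ("status", "start", "stop", "restart")


def service_command(service, action, use_sudo=True):
    """Build the argv list for `service <name> <action>`, validating the action."""
    if action not in ALLOWED_ACTIONS:
        raise ValueError("unsupported action: %r" % action)
    argv = ["service", service, action]
    return ["sudo"] + argv if use_sudo else argv


if __name__ == "__main__":
    # Print the commands rather than running them. On the Serengeti server the
    # list could be handed to subprocess.run(...) to execute it for real.
    for action in ("status", "restart", "status"):
        print(" ".join(service_command("tomcat", action)))
```

Building an argument list, rather than a single shell string, avoids quoting problems if a service name ever contains unexpected characters.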

Log in to Ambari Server

1. Select ambari.corp.local

2. Click Open

3. Type password VMware1! to log in

Checking Status, Stopping and Starting ambari-server

1. Check ambari server status by typing "ambari-server status"

2. Stop ambari server by typing "ambari-server stop"

3. Start ambari server by typing "ambari-server start"

Conclusion

Thank you for participating in the VMware Hands-on Labs. Be sure to visit http://hol.vmware.com/ to continue your lab experience online.

Lab SKU: HOL-SDC-1609

Version: 20160128-161234
