Administration of Hadoop Summer 2014 Lab Guide v3.1


    Cluster Admin on Hadoop

This Certified Training Services Partner Program Guide (the "Program Guide") is protected under

U.S. and international copyright laws, and is the exclusive property of MapR Technologies, Inc.

© 2014 MapR Technologies, Inc. All rights reserved.

PROPRIETARY AND CONFIDENTIAL INFORMATION


Contents

Administration of Hadoop Lab Guide

Get Started

Get Started 1: Set up a lab environment in Amazon Web Services (AWS)

Lab 1.1: Pre-install validation downloads, setup and clustershell

Practice: Creating and Rem…

Lab 5.3: Mirr…

Lab 6.2: Set Up SMTP

Lab 6.3: Metrics, M…

Get Started

Get Started 1: Set up a lab environment in Amazon Web Services (AWS)


    $ passwd mapr

then type the password.


To restart the instances, repeat these steps, and select "Start" in step 5.


Get Started 2: Set up passwordless ssh access between

    nodes

When testing hardware in Hadoop…


    Get Started 3: Log into the class cluster


1. Log in as ec2-user, passw…


    Get Started 4: Explore the MapR Control System



    Lab Procedure

    Log on and explore different views of the cluster




    Lesson 1: Pre-install

Lab Overview

In this lesson you will learn where to download a collection of tools and scripts that we will

use to prepare the cluster hardware for the parallel execution of tests, and then to test and

measure the performance of the hardware components to determine that they are

functioning properly and within the specifications for a Hadoop installation. We will also

identify the current firmware for each of the new hardware components in the cluster, and

update these components to make sure that they have matching firmware.

    Lab 1.1: Pre-install validation downloads, setup and clustershell

    Lab 1.2: Network, Memory and IO

    Lab Procedures

    Lab 1.1: Pre-install validation

    Note: One of the most common causes for a failure when installing Hadoop is that the hardware

    is not within the necessary specifications. You can see a list of the current hardware and OS

    specifications at: http://doc.mapr.com/display/MapR/Preparing+Each+Node

    The Professional Services team at MapR has developed a collection of all of the tools and scripts

    that we will need to validate our hardware and prepare it for installation.

    1. Download the cluster-validation package onto your master node from:

    https://github.com/jbenninghoff/cluster-validation/archive/master.zip

Extract master.zip and move the pre-install and post-install folders directly under /root for

simplicity.

2. Here, we will find two directories, pre-install and post-install. We will use the tools and

    scripts inside the pre-install directory to validate our new hardware prior to installing

    Hadoop. We will use tools and scripts in the post-install later, to test our new cluster

    after we have completed our install.


    Note: The tools and files in this collection are updated frequently, so we should always make

    sure we download the latest package when preparing for a new Hadoop installation.

    3. To prepare the cluster for these validation tests, choose one node on the cluster to be

    your set up master node. Generate ssh keys on this node, and make sure that it has

passwordless ssh access to all other nodes on the cluster. You can find steps for how to do this at the end of this guide.
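The key setup in step 3 typically looks like the following sketch (the key path is an example, and node1 is a placeholder host; ssh-copy-id is one common way to distribute the public key):

```shell
# Generate a passphrase-less key pair (example path, not the default ~/.ssh).
rm -f /tmp/lab_id_rsa /tmp/lab_id_rsa.pub
ssh-keygen -q -t rsa -N '' -f /tmp/lab_id_rsa

# The .pub file is what gets appended to ~/.ssh/authorized_keys on each node;
# ssh-copy-id automates that step, e.g.:
#   ssh-copy-id -i /tmp/lab_id_rsa.pub root@node1
awk '{ print $1 }' /tmp/lab_id_rsa.pub    # key type, e.g. ssh-rsa
```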

    4. Inside the pre-install directory is a clustershell rpm. Install this rpm on the master node

    with passwordless ssh access to the rest of our cluster. We will be making all further

    commands for this exercise from this master node, using clush to propagate those

    commands throughout the rest of our hardware.

5. Once installed, update the /etc/clustershell/groups file to include an entry named

all, followed by the host names of the nodes we will use, such as:

all: node[0-19]
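The bracketed host list uses clustershell's node-set syntax, which clush expands into individual host names (clustershell's nodeset utility does this with `nodeset -e 'node[0-19]'`). A tiny awk stand-in, handling only the simple `name[a-b]` form, shows what the expansion produces:

```shell
# Expand a clustershell-style range such as "node[0-3]" into host names.
# (Simplified illustration; real expansion is done by the nodeset utility.)
echo 'node[0-3]' | awk -F'[][-]' '{ for (i = $2; i <= $3; i++) print $1 i }'
# -> node0 node1 node2 node3 (one per line)
```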

6. Once we have our node names listed, type the following to copy the /root/pre-install

directory to all of our node hardware:

# clush -a --copy /root/pre-install

7. When that is complete, type the following to confirm that all of the nodes have a copy of the package:

# clush -Ba ls /root/pre-install

8. After we have a copy of the pre-install package on all nodes, we are ready to start our

hardware validation tests. First, we will run an audit of our hardware to see exactly

what we have on each node, and to verify that they all have a similar configuration. To

run the cluster-audit.sh script, type:

# /root/pre-install/cluster-audit.sh | tee cluster-audit.log

    This will list hardware specifications from each of the new nodes.

    We can examine the output log to look for hardware or software that does not match the

    requirements to install Hadoop, or discrepancies in the hardware or software from one node to

    the next.
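One quick way to spot such discrepancies in the log is to count distinct values per audited item; here is a sketch over a hypothetical three-node excerpt (the file name, format, and figures are made up for illustration):

```shell
# Hypothetical cluster-audit-style excerpt: "node: item: value" lines.
cat > /tmp/audit-sample.log <<'EOF'
node1: MemTotal: 64 GB
node2: MemTotal: 64 GB
node3: MemTotal: 32 GB
EOF

# Strip the node name and count distinct values; more than one line of
# output for the same item means the nodes do not match.
cut -d: -f2- /tmp/audit-sample.log | sort | uniq -c
```

Here the count column immediately shows that one node carries a different amount of RAM than the other two.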


Note: the audit output will give us deltas when looking at things like the RAM. It will tell us

the total amount of RAM, the number of slots, and the types of DIMMs found, but it will not tell

us which exact DIMMs are in which slots. Also, if only one DIMM type is listed, then all slots

have the same DIMM type.

    Lab 1.2: Network, Memory and IO

    1. Evaluate the network interconnect bandwidth.

Inside the pre-install directory, update the network-test.sh file so that the half1

    and half2 arrays contain the correct IP addresses for our hardware nodes. Next delete

    the exit command, and save the file.

    2. When the file has been updated, type:

# /root/pre-install/network-test.sh | tee network-test.log

This will run an RPC test to validate our network bandwidth. This test should take about 2

    minutes to run, maybe a little longer.

    We should expect to see results of about 90% of our peak bandwidth. Thus, with a

    1GbE network, we should expect to see results of about 115MB/sec, or with a 10GbE

    network, look for results around 1100MB/sec. If we are not seeing results in this range,

    then we need to check with our network administrators to verify the connections and

    firmware.
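The expected figures above follow from simple arithmetic: divide the line rate by 8 to convert bits to bytes, then take roughly 90% of that peak. A quick sketch (the figures are illustrative, not output of the test script):

```shell
# Line rate (Gbit/s) -> peak MB/s (divide by 8) -> ~90% achievable.
awk 'BEGIN {
  for (gbit = 1; gbit <= 10; gbit *= 10) {
    peak = gbit * 1000 / 8              # MB/s at full line rate
    printf "%2d GbE: peak %4d MB/s, expect ~%.0f MB/s\n", gbit, peak, peak * 0.9
  }
}'
```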

3. Next, we will evaluate the raw memory performance. Type the following to run the stream59 utility:

# clush -Ba '/root/pre-install/memory-test.sh | grep Triad' | tee memory-test.log

    This tests the memory performance of the cluster. The exact bandwidth of memory is

    highly variable and is dependent on the speed of the DIMMs, the number of memory

    channels and to a lesser degree, the CPU frequency.
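As a rough reference point for reading the Triad numbers, the theoretical peak is approximately channels × transfer rate × 8 bytes per transfer; the figures below (4-channel DDR3-1600) are illustrative assumptions, and measured Triad results will land well below this peak:

```shell
# Theoretical peak: channels * MT/s * 8 bytes per transfer, in GB/s.
awk 'BEGIN {
  channels = 4; mts = 1600             # illustrative: 4-channel DDR3-1600
  printf "theoretical peak: %.1f GB/s\n", channels * mts * 8 / 1000
}'
```

For these example figures the sketch prints a 51.2 GB/s theoretical peak.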

4. Evaluate the raw disk performance. The disk-test.sh script will run IOzone on our

    hard drives to test their performance.

    Note: This process is destructive to any existing data, so make sure the drives do not have any

    needed data on them, and that you do not run this test after you have installed MapR Hadoop

    on the cluster.


    Type:

    # clush -ab /root/pre-install/disk-test.sh

    When you first run this script, it will list out the spindles to be tested. We need to verify

    that this list is correct, and then edit the script to run the test.

    The comments in the script will direct us to the edits that we need to make. When we are done,

    we save the file and run the script again to perform the test.

    If we have a large number of total drives, the summIOzone.sh script will provide us with a

    summary of the disk-test.sh output.

    We will keep the results of this test with the other benchmark tests for post installation

    comparison.

    Conclusion

    Now that we have run all of our hardware tests, and compiled benchmarks for all of our

    components, we have one final task to prepare our new hardware for installation.

    The firmware for the new hardware must be up to date with vendor specifications and match

    across each of the nodes of the same type. The BIOS versions and settings must also match for

    similar nodes. In addition, the firmware for the management interfaces needs to be the same

    on each of these nodes. Any other hardware components that we may have in our system, such

    as NICs or onboard RAID controllers also need to have updated and matching firmware.

    We will need to refer to the manual for each node vendor that we are including, and update the

    firmware and BIOS according to their specifications. If there is a discrepancy in our BIOS or

    firmware between nodes from the same vendor, then we can see inconsistent performance

    across nodes.


    Lesson 2: Install MapR software

    Lab Overview

In this exercise you will install a MapR cluster. It is also important to consider how many of each

service will be running on the entire cluster to ensure that you have a robust…

1. Log into the master node of your cluster as described above, or as described by your

instructor.

2. Navigate to the /home/mapr directory:

$ cd /home/mapr

3. Download the mapr-setup package:

$ wget http://package.mapr.com/releases/v.3.1.1//mapr-setup

4. Download the pem key to the master node in your cluster.

5. Set the permissions on the setup-mapr file and pem key:

$ chmod 755 mapr-setup

$ chmod 600

6. Run the mapr-setup script. Note: this script will create the /opt/mapr-installer directory and

additional subdirectories.

$ sudo ./mapr-setup

===============================================

Self Extracting Installer for MapR Installation

===============================================

Extracting installer.......

Copying setup files to "/opt/mapr-installer"......

Installed to "/opt/mapr-installer"

====================================


    Run "/opt/mapr-installer/bin/install" as super user, to

    begin install process

    [root@ip-10-170-125-38 ec2-user]#

7. Copy the students07172012.pem key to the /opt/mapr-installer/bin directory:

$ mv /opt/mapr-installer/bin

8. If you are using a config file with the installer, edit the config.example file to specify the

control nodes and data nodes information.

$ vi config.example

This information can also be input when running the installer, if not using a config file.

Additional information can be specified in the config file as well, including:

o Disks used. Note: for Amazon, disks are the following: /dev/xvdf, /dev/xvdg

o MySql database information (see your instructor for IP address)

o Repositories (can be local)

o Version

o Security

o M7

o Clustername

o Etc.

# Each Node section can specify nodes in the following format
# Node: disk1, disk2, disk3
# Specifying disks is optional. In which case the default disk information
# from the Default section will be picked up

[Control_Nodes]
ip-10-171-56-175: /dev/xvdf, /dev/xvdg
ip-10-171-35-188: /dev/xvdf, /dev/xvdg
ip-10-174-16-186: /dev/xvdf, /dev/xvdg

[Data_Nodes]
ip-10-171-23-228: /dev/xvdf, /dev/xvdg
ip-10-170-116-127: /dev/xvdf, /dev/xvdg
ip-10-174-23-41: /dev/xvdf, /dev/xvdg

[Client_Nodes]
#C1
#C2

[Options]


MapReduce = true
YARN = false

mapr-install.py [-h] [-s] [-U SUDO_USER] [-u REMOTE_USER]


[--private-key PRIVATE_KEY_FILE] [-k] [-K]
[--skip-checks] [--quiet] [--cfg CFG_LOCATION]
[--debug] [--password REMOTE_PASS]
[--sudo-password SUDO_PASS]
{new,add} ...

positional arguments:
{new,add}
new                    Start new Installation
add                    Add to an existing Installation

optional arguments:
--cfg CFG_LOCATION     config file to use
--debug                run installer in debug mode
--password REMOTE_PASS
                       remote ssh user password
--private-key PRIVATE_KEY_FILE
                       use this file to authenticate the connection
--quiet                run installer in non-interactive mode
--skip-checks          skip pre-checks (DANGEROUS)
--sudo-password SUDO_PASS
                       sudo user password
-K, --ask-sudo-pass    ask for sudo password
-U SUDO_USER, --sudo-user SUDO_USER
                       desired sudo user (default=root)
-h, --help             show this help message and exit
-k, --ask-pass         ask for SSH password
-s, --sudo             run operations with sudo (nopasswd)
-u REMOTE_USER, --user REMOTE_USER

9. A. If you are not using a config file, run the installer:

$ sudo /opt/mapr-installer/bin/install -s --private-key students07172012.pem -u ec2-user -U root --debug new

and fill in the cluster details when prompted, listed above.


B. If you are using a config file, run the installer to determine if the parameters you have

specified are correct.

$ sudo /opt/mapr-installer/bin/install -s --cfg config.example --private-key students07172012.pem -u ec2-user -U root --debug new

10. In the summary response area choose (a)bort after examining your parameters.

11. Rerun the installer with the --quiet argument for non-interactive mode, and with a

trailing & to background the installer in case the window is lost or the laptop goes to

hibernate mode. This time, select (c) to continue with the install after reviewing the parameters.

A. $ sudo /opt/mapr-installer/bin/install -s --private-key students07172012.pem -u ec2-user -U root --debug --quiet new &

OR

B. $ sudo /opt/mapr-installer/bin/install -s --cfg config.example --private-key students07172012.pem -u ec2-user -U root --debug --quiet new &
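The trailing & simply runs the command as a background job so a lost terminal does not interrupt it (nohup is a common companion for the same reason); a tiny self-contained illustration, using sleep as a stand-in for the long-running installer:

```shell
# Run a "long" command in the background, then wait for it and read its log.
sh -c 'sleep 1; echo installer-finished' > /tmp/install.log 2>&1 &
wait                      # block until the background job completes
cat /tmp/install.log      # -> installer-finished
```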

Note: View details about installing on an OS other than Red Hat at

http://www.mapr.com/doc/display/MapR/Installing+MapR+Software

The administrative user who should be given full permission is "mapr" and the user

password is "mapr".

When registering your cluster select an M7 Trial license. Also, be sure to apply your

M7 license before you close the License Management dialog.

12. Watch the installation process and look for the various packages being installed. After

the control nodes have been installed (usually 20-30 min) log into the MCS by pointing your

browser to the IP address of one of the control nodes, at port 8443:

http://ControlNodeIP:8443/

13. Accept the MapR agreement, and select the licenses link in the upper right corner.


14. Apply the temporary M7 license received when registering for the course. If you do not

have a temporary license, contact training@mapr.com or ask your instructor if you are

taking a classroom or virtual training class.

15. After you have successfully applied a trial license you may notice that some of the nodes

in the cluster have orange icons in the heatmap indicating that they have degraded

service.

16. As the installer continues to install packages, and the warden service starts the services

on each node, we will begin to see the nodes turn green. Eventually all of the nodes will

be green, indicating that all nodes are active and healthy.

Conclusion

Plan your service layout prior to installing the MapR software:

o Make sure that you have identified where the key management services (CLDB,

Zookeeper, JobTracker, Webserver) will be running in the cluster

o Ensure that you have enough instances of the management services to maintain the

level of service that is appropriate for your organization

Follow the procedures outlined in the MapR documentation under the Installation Guide:

o http://mapr.com/doc/display/MapR/Installation+Guide

Use the MCS to verify that the cluster installation is complete and that the cluster is now

active.

Discussion

1. Once you see that the cluster is active, try exploring the MCS by clicking on the different

links in the Navigation pane and on the Dashboard. What will you be able to monitor

once you begin to use your cluster?

2. What would your next step be after installing the cluster?


    Lesson 3: Post-install

    Lab Overview

    If you remember, the package that we downloaded in our pre-install lesson contained a post-

    install directory. That directory contains all of the tools and scripts that we need to run post

    install benchmarks to make sure our new cluster is performing as expected.

    First, we will test the drive throughput. As with our pre-install tests, we will use clush to push

    this test to all of the nodes on our cluster.

    Lab Procedures

3.1 Run RWSpeedTest

1. Log into the master node that we used for our pre-install tests and navigate to the

directory /root/post-install. In here we will find the file runRWSpeedTest.sh.

2. Note: This script uses an HDFS API to stress test the I/O subsystem. The output provides

an estimate of the maximum throughput the I/O subsystem can deliver. To begin the test,

    type:

    # clush -Ba /root/post-install/runRWSpeedTest.sh | tee

    RWSpeedTest.log

3. After we run RWSpeedTest, we can compare our results to our pre-install IOzone tests. We should expect to see similar results, within 10-15% of the pre-installation test.

    3.2 TeraGen/TeraSort

Teragen is a map/reduce program that will generate 1GB of synthetic data, and Terasort

samples this data and uses map/reduce to sort it into a total order. These two tests together

will challenge the upper limits of our cluster's performance.

    1. Type:

# maprcli volume create -name data1 -replication 1 -mount 1 -path /data1

    # mkdir data1/out1

    # mkdir data1/out2

    2. Verify that the new directories exist, then type:

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar teragen 10000000 /data1/out1
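Teragen writes 100-byte rows, so the row count determines the data size; a quick check of the figure used above:

```shell
# 10,000,000 rows x 100 bytes per row = 1 GB of generated data.
awk 'BEGIN { printf "%.1f GB\n", 10000000 * 100 / 1e9 }'   # -> 1.0 GB
```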


3. This will create 1GB worth of small number data. Once teragen has finished, type the

following to sort the newly created data:

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar terasort /data1/out1 /data1/out2

    When we are running Terasort, we can use the MCS to watch the node usage. When we set the

    heatmap to show Disk Usage, we can see the load on each node. We are looking for the load

    to be spread evenly across our cluster. Hotspots suggest a problem with a hard drive or its

    controller. We can change the view of our heatmap to look at the load of different resources of

    our cluster as we run our tests.

    In addition to the heatmap views, we can look at the services and jobs. Since we are using

    synthetic code, we know that it functions properly. If we have a job or task failure, then we

    have an issue with our hardware.

    When Terasort is finished, we can compare the results with our RWSpeedTest results. We

    should expect to see our Terasort throughput to be between 50% to 70% of our RWSpeedTest

    throughput. Since we know the Terasort job code does not have any errors, if we see

performance that doesn't match our expectations, we know we have a problem with the

    hardware in our cluster.


    Lesson 4: Configure Cluster Storage

    Resources

    Lab Overview

    The labs in this chapter cover all the basics of cluster storage resources, including:

    Topology and Storage Architecture:

    o the physical layer, including nodes, disks & storage pools

    o the logical layer, including files, chunks, containers

    Volumes, including with mirrors, snapshots and remote mirrors

    These labs provide insight into how data is managed in a MapR cluster, and teach hands-on

    experience configuring topologies, volumes and quotas. You have a great degree of control over

your organization's MapR storage resources. Configuring a cluster with appropriate topologies

    and volumes has long-term impacts on performance, reliability and ease-of-management. This

    lab is broken into three separate exercises that build on each other.

    Lab Procedures

Always set up node topology before deploying the cluster. Never leave nodes in

/data/default-rack.

    Key Tips:

    Create volumes to contain different types of data on the cluster before deploying the

    cluster. (E.g., create one volume per user, one volume per project, distinct volumes for

production work and development work, etc.) Don't let data accumulate at the root

    level of the cluster.

    MapR separates the concepts of volume ownership and quota accounting. Project

    members can have full ownership of files and folders for a project, while the collective

    storage for the whole project is restricted by a quota independent of individual users.

    Rack Layout

    In this training lab environment, our physical rack layout is hypothetical. If you were

    configuring node topology in a physical cluster environment, then you would coordinate with

    the team responsible for the physical setup of the cluster to build a diagram of the physical rack

layout. For this lab, let's assume our cluster's nodes are contained in two racks.


    Note: If applicable, you may need to coordinate your activities on your Team# cluster with the

    other members of your team.

    Lab 4.1: Configure Node Topology

    The first step in getting a cluster ready for data storage is to set up the node topology. Node

    topology describes the logical organization of the cluster. Grouping nodes into proximity-based

    topologies, i.e. racks, helps to distribute data across physical failure domains, thus decreasing

the probability of data loss. It is also important to define higher-level logical topologies, typically named /data and /decommissioned, which serve as staging areas for nodes when

    transitioning into and out of service.

    /

    data/

    rack1/

    r1_node1

    r1_node2

    r1_nodeN

    rack2/

    r2_node1

    r2_node2

    r2_nodeN

decommissioned/


    5. Set the default physical topology using the CLI. You can change the default topology,

    such that any new node added to the cluster will appear in the specified topology. In

this step, you are going to change the default topology to /data.

    a. Open a SSH session with a node in the cluster.

    b. Type the following command at a command line.

    maprcli config load json | grep default

    c. Notice the default topology.

    d. To change it you would do the following:

    maprcli config save -values

    '{"cldb.default.volume.topology":"/data"}'

    6. Verify that all nodes are assigned to a physical topology.

    a. In the MCS Navigation pane under the Cluster group, click Nodes.

    b. Look at the Topology pane and confirm that each node in the cluster appears in a

specific rack, and that no nodes remain under /default-rack.


    Lab 4.2: Create Volumes and Set Quotas

    In this lab exercise you will learn how to manage a MapR cluster in a shared environment.

    Imagine that your cluster is going to be shared by up to 5 different groups each with multiple

users working on development and production projects. You need to manage the resources of the cluster so all of these groups can work simultaneously without consuming more than their

    share of storage and compute resources. You also need to make sure that development

    projects do not impinge upon production work.

    In this exercise you will create independent volumes for each user and project, and then you will

    impose quotas on those volumes.

    Important!

Don't store data in the root volume (/).

    If all data is in the root volume, you lose the ability to specify location, quota, or HA properties

    for different types of data.

    As soon as you set up your cluster, start creating volumes to organize data on the cluster. As this

    lab will demonstrate, MapR recommends that you create at least the following volumes:

    1. Create a separate volume for each user.

    2. For active projects, create separate volumes for development work and production

    activity.

    Note: In order for a MapR cluster to function correctly, the user accounts and groups must be

    set up identically across all nodes.


    Lab 4.2 Overview

    The diagram below illustrates the key concepts of this exercise. In this case user01 and user02

    are in the Log Analysis Development group (loganalysis_dev). Each of these users has

    permission to read and write data to the project volume as well as their own user volume. The

cumulative storage used by these volumes rolls up to a group referred to as an Accounting Entity. Each user, volume, and Accounting Entity can have a separate disk quota for flexible

    management of cluster disk usage.


    Lab 4.2 Set-up

    1. Set up the users and groups on all your cluster nodes. Note: they must all have the

    same UID and GID on every node in the cluster. This is an opportunity to use the clush

    utility if you wish.

    # yum install clustershell

    For example, run groupadd on every node in your cluster.

    # groupadd -g 5000 loganalysis_dev

    Add individual users on every node.

# useradd -u 5001 -g loganalysis_dev user17

Or, with clush (substitute your own group and user names for the placeholders):

# clush -a "groupadd -g 8000 <groupname>"

# clush -a "useradd -u 8001 -g 8000 <username>"
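The commands above can also be generated from a small script, so that every node receives identical GIDs and UIDs. The sketch below only prints the commands; the GIDs and user-to-group assignments follow the lab's Team 5 example and are assumptions to adapt for your own team.

```shell
# Print idempotent group/user creation commands with fixed GIDs/UIDs so the
# same commands can be replayed on every node (e.g. via clush -a).
# GIDs and names below follow the lab's Team 5 example (an assumption).
emit_team() {
  gid=$1; grp=$2; shift 2
  echo "groupadd -g $gid $grp"
  uid=$((gid + 1))
  for usr in "$@"; do
    echo "useradd -u $uid -g $grp $usr"
    uid=$((uid + 1))
  done
}

emit_team 5000 loganalysis_dev  user17 user18
emit_team 5100 loganalysis_prod user19 user20
```

Review the printed commands, then distribute them across the cluster (for example, save them to a file and run it on every node with clush).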

2. Add the user in the MCS permissions popup.


Username/login ID    Group name              Team name/Cluster name

user01               webcrawl_dev            Team1
user02               webcrawl_dev            Team1
user03               webcrawl_prod           Team1
user04               webcrawl_prod           Team1
user05               frauddetect_dev         Team2
user06               frauddetect_dev         Team2
user07               frauddetect_prod        Team2
user08               frauddetect_prod        Team2
user09               recommendations_dev     Team3
user10               recommendations_dev     Team3
user11               recommendations_prod    Team3
user12               recommendations_prod    Team3
user13               twittersentiment_dev    Team4
user14               twittersentiment_dev    Team4
user15               twittersentiment_prod   Team4
user16               twittersentiment_prod   Team4
user17               loganalysis_dev         Team5
user18               loganalysis_dev         Team5
user19               loganalysis_prod        Team5
user20               loganalysis_prod        Team5


    Lab 4.2 Steps

    Examine the volumes already on the cluster

    1. Connect to the MCS for your cluster

2. Click Volumes under MapR-FS in the left navigation pane:

Notice how many volumes are listed. Do these include system volumes? Hint: notice whether or not the System check box is selected on the upper menu.

    Display only the non-system volumes by de-selecting the System check box on

    the upper menu.

    Locate the New Volume button that lets you create a new volume.

What other volume actions are allowed in the Volume Actions menu?

    Examine volume properties from the volumes list

    1. From the list of volumes, choose a volume to examine.

    Look across the columns to find whether the volume of interest contains data, and if

    so, what is the data size?

    What is the replication factor listed for the volume you are examining?

2. Find more details for this volume on the Volume Properties pane. Hint: Open the pane

    by clicking the highlighted name of the volume.

    What is the minimum replication factor for this volume?

    Does the volume have a quota?


    Practice Creating and Removing Volumes

    3. Click the New Volume button.

    4. Select Standard Volume for the Volume Type in the new pop up window.

    5. Enter a volume name using your name (or some other unique name) and designate

    volume number 1 (e.g. name-vol1 where name is your name) in the Volume Name

    field.

    6. Type the mount path /name-vol1 in the Mount Path field.

Note: MapR MCS will not create any parent directories above the mount point, so make them

    beforehand if necessary with the mkdir command.

    7. Verify /data is displayed in the Topology field (This is the default topology; we will

    discuss topology in the next lecture).

    8. Verify the default replication factor and minimum replication settings. Are they set to

    what was recommended in the Volumes lecture?

    9. At the bottom of the popup window, click OK to create the volume.

10. Verify that your new volume appears in the volumes list. Do you see the volumes

    created by the other students in the class?

(Note: If not, you will need to go to the volume name filter at the top and remove the

filter by clicking the minus sign.)

Repeat the process above to create a second user volume (e.g. name-vol2).

    Verify that your new volume appears in the volumes list.

    Once again remove the filter so that you can view the full list of non-system

    volumes.


    Remove one volume

    1. Decide which of your own volumes you want to remove and select it by clicking the

    check box by the volume name.

    2. Select Remove on the modify Volume menu. You will see this dialog box:

    Make your choice for what style of removal you want and click the Remove Volume

    button on lower right.

    Verify in the volumes list that one of your volumes has disappeared.

    Create a volume for each user

In this step, you will create a home volume for each project member, if applicable. On each user

    volume:

    Restrict the volume to the /data/rack2 topology, which prevents users from consuming

    storage resources on /data/rack1.

    Assign the Accounting Entity of the user volume to the appropriate group for that user.

    Assigning this Accounting Entity prevents the members of the group from collectively

    overshooting a storage quota for the project.

    Set quotas for the user volume.

    Note: user17 and loganalysis_dev are used as examples below. Be sure to substitute the

    appropriate user name and group when you create the volumes for your team members.

1. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes.

    2. In the Volumes tab click the New Volume button.

    3. Following the example below, enter the volume settings for each user volume in the

    New Standard Volume dialog box.

    Volume Setup section


    Volume Type: Standard Volume

    Volume Name: user17-homedir

Mount Path: /mapr/<cluster>/home/user17/vol

    Topology: /data/rack2

    User/Group (This specifies the Accounting Entity)

    Group loganalysis_dev

    Note: group must exist on all nodes in the cluster

Permissions: u:user17 fc

Usage Tracking

o Quotas (This specifies disk quota for the volume itself)

o Volume Advisory Quota: 100G

    o Volume Hard Quota: 128G

    4. Click OK.


    Command Line

    It is also possible to create a new volume at the command line. For example:

    maprcli volume create -path /home/user17/vol \

    -ae loganalysis_dev -aetype 1 -topology /data/rack2 \

    -quota 128G -advisoryquota 100G \

    -user user17:fc -name user17-homedir

Note: The maprcli volume create command requires specific ordering of

arguments. Make sure that the -name option comes last.

    You can change quotas later at the command line. For example:

    maprcli volume modify -quota 20G -advisoryquota 15G \

    -name user17-homedir
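Since all twenty user volumes follow the same pattern, the maprcli invocations can be generated in a loop. This sketch only prints the commands (it does not run maprcli); the flags mirror the user17 example above, and the user/group pairs shown are example assumptions to adapt for your team.

```shell
# Print one `maprcli volume create` command per user. Nothing is executed
# here; review the output, then pipe it to `sh` on a cluster node.
make_vol_cmd() {
  user=$1; group=$2
  printf 'maprcli volume create -path /home/%s/vol -ae %s -aetype 1 -topology /data/rack2 -quota 128G -advisoryquota 100G -user %s:fc -name %s-homedir\n' \
    "$user" "$group" "$user" "$user"
}

for u in user17 user18; do   # example users; substitute your team's list
  make_vol_cmd "$u" loganalysis_dev
done
```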

    5. Change ownership of the volume for the user. At a command line type:

chown user17 /mapr/<cluster>/home/user17/

Create a volume for your team project

    In this step, you will create a volume for your team project, if applicable. Bear in mind the

following criteria for your project volume:

    Restrict development volumes to the /data/rack2 topology, which prevents development

projects from consuming storage resources on /data/rack1.

    Production volumes should be allowed to span the entire cluster, so they will have a

    topology of /data

    Set group permissions on each volume:

For development volumes, members of both prod and dev groups get full control

For production volumes, only members of the prod group get full control

    Assign your group as the Accounting Entity

    Set quotas for the project volume:

Development volumes: the Advisory Quota is 9T and the Hard Quota is 10T

Production volumes: the Advisory Quota is 19T and the Hard Quota is 20T

    Note: loganalysis_dev is used in the examples below. Be sure to substitute the appropriate user

    name and group when you create the volumes for your project.


1. Create the top-level project directory under /mapr/<cluster>/home/, if it

doesn't exist. For example, at a command line type:

mkdir /mapr/<cluster>/home/<team name>/

    2. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes.

    3. Create the project volume. In the Volumes tab click the New Volume button.

    4. Following the example below, enter the volume settings for the project volume in the

    New Standard Volume dialog box.

    Volume Setup section

    Volume Type: Standard Volume

    Volume Name: loganalysis-dev

Mount Path: /mapr/<cluster>/home/loganalysis_dev/vol

Note: the example below is for a development group volume. If you are creating a

volume for a production group, then the topology would be /data

Topology: /data/rack2

    Permissions section

    Note: the example below is for a development group volume. If you are creating a

    volume for a production group, do not add permissions for the development group.

    g:loganalysis_dev fc

    g:loganalysis_prod fc

    Usage Tracking

    User/Group (This specifies the Accounting Entity)

    Group loganalysis_dev

    Quotas (This specifies disk quota for the volume itself)

    Note: the examples below are for a development group volume. If you are creating a

    volume for a production group the Advisory Quota is 19T and the Hard Quota is 20T.

    Volume Advisory Quota: 9T

    Volume Hard Quota: 10T

    5. Click OK.

    6. Change ownership and permissions of the project volume. At a command line type:


chgrp loganalysis_dev /mapr/<cluster>/home/loganalysis_dev/vol

chmod g+rwx /mapr/<cluster>/home/loganalysis_dev/vol

    Verify that the volumes are set up correctly

    1. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes. The

    Volumes view appears, listing all volumes in the cluster.

    2. Confirm that all of the volumes you created are listed in the Volumes view. Other

    volumes that are part of the default cluster configuration may also appear here. You can

    use the Filter option to list, for example, only the volumes with a mount path matching

    /home*, as shown below.

    3. Navigate the volumes at the command line and verify that they have been mounted. For

    example:

ls -al /mapr/<cluster>/home/

ls -al /mapr/<cluster>/home/loganalysis_dev/vol

    You should see the volumes you just created in the previous steps mounted in these

    locations.


    Set disk usage quotas for your project Accounting Entity

    By setting a quota on an Accounting Entity, we can make sure that all volumes assigned to the

    Accounting Entity (including user volumes and project volumes) do not collectively overshoot a

    project maximum.

1. In the MCS, in the Navigation pane under the MapR-FS group, click User Disk Usage.

    The User Disk Usage panel displays all users and groups that have been assigned as an

    Accounting Entity (e.g. loganalysis_dev).

    2. Click on your project Accounting Entity. The Group Properties dialog box appears.

    3. Following the example below, enter the quota settings for your project Accounting

Entity in the Usage Tracking section of the Group Properties dialog box.

    For development projects:

    Turn on User/Group Advisory Quota. Enter 9T

    Turn on User/Group Hard Quota. Enter 10T

    For production projects:

    Turn on User/Group Advisory Quota. Enter 19T

    Turn on User/Group Hard Quota. Enter 20T


    Command Line

    It is also possible to set the Accounting Entity quotas at the command line. For example:

    maprcli entity modify -quota 10T -advisoryquota 9T \

    -name loganalysis_dev -type 1
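When comparing configured quotas against reported usage, it helps to normalize sizes to a single unit. This is an unofficial helper sketch; it only assumes the single-letter M/G/T suffixes used by the quota strings in this lab.

```shell
# Convert a quota string like 100G or 10T to megabytes.
# Assumes a single-letter suffix M, G, or T (as used in this lab).
quota_to_mb() {
  n=${1%?}           # numeric part (everything but the last character)
  unit=${1#"$n"}     # trailing unit letter
  case $unit in
    M) echo "$n" ;;
    G) echo $((n * 1024)) ;;
    T) echo $((n * 1024 * 1024)) ;;
    *) echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}

quota_to_mb 100G   # advisory quota, in MB
quota_to_mb 10T    # hard quota, in MB
```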

    Conclusion

Before you begin adding data to your cluster or submitting jobs, make a decision about topology

(node/data placement) and implement this decision on your cluster.

    Create volumes early and often. It is much easier to manage cluster data at a volume level than

    managing all of the data on the cluster as one enormous data set. Imagine trying to manage

    petabytes of data!

Creating separate volumes provides flexibility of resource management by separating ownership

from accounting.

Do not use the / or /data/default-rack topology for data placement.


    Lesson 5: Data Ingestion, Access &

    Availability

    Labs Overview

    Lesson 5 labs cover the following topics:

    Accessing the cluster using NFS

    Snapshots

    Mirrors

    Multiple Clusters and Disaster Recovery

    5.1 Get Data into an NFS Cluster

    Topics and tasks in this first lab will help you to

    understand the significance of NFS in MapR

    learn how to get data into a cluster using NFS

    view and manipulate data directly on your cluster using standard Linux file commands via

    NFS

    Before you begin the lab steps, the cluster filesystem must be mounted on the data instance.

    Create Input Directory for Data

    Copy Data from Data Instance to Input Directory on Cluster

1. SSH to the data instance (the NFS node for this exercise, listed in your hosts file)

mkdir /mapr/<cluster>

2. Mount your cluster on the NFS client node

# mount -t nfs <NFS server>:/mapr /mapr/<cluster>
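Before copying data, it is worth confirming that the NFS mount actually succeeded. The snippet below demonstrates the check against a canned sample line so it is self-contained; the hostname and mount path in the sample are placeholders, and on the node you would pipe the real `mount` output into the same filter.

```shell
# Filter mount output for an NFS entry. `sample` stands in for real
# `mount` output here; on a cluster node run: mount | grep 'type nfs'
sample='nodeA:/mapr on /mapr/cluster1 type nfs (rw,nolock,hard)'
check_nfs() { grep -q 'type nfs'; }

echo "$sample" | check_nfs && echo "NFS mount found"
```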

    3. Copy the data from the /etc directory on the data instance to the input directory on

    your project volume that you created in the previous step

cp -v /etc/*.conf /mapr/<cluster>/home/loganalysis_dev/input

4. Verify that the data is now in the input directory on your cluster volume


ls /mapr/<cluster>/home/loganalysis_dev/input

    You should see a collection of files that end in .conf

    5. Verify that the data you moved from the data instance is now on the cluster in your

    project volume

ls /mapr/<cluster>/home/loganalysis_dev/input

    Run a MapReduce Job on Data

    1. Run a MapReduce job on the data

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \

wordcount /home/loganalysis_dev/input /home/loganalysis_dev/output1

    2. View the output of the MapReduce job

ls /mapr/<cluster>/home/loganalysis_dev/output1

    Modify Data and Run MapReduce again

1. From the /home/<team name>/input directory, use sed to add some files to your input data directory:

for i in `ls *`; do cp $i `echo $i | sed "s/.conf/AA.conf/g"`; done
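If you want to preview what this loop produces before touching cluster data, you can run it in a scratch directory. The version below also anchors and escapes the dot in the sed pattern (\.conf$), which is slightly safer than the lab's pattern; behavior on ordinary *.conf names is the same.

```shell
# Reproduce the duplication loop in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch alpha.conf beta.conf

for i in *.conf; do
  cp "$i" "$(echo "$i" | sed 's/\.conf$/AA.conf/')"
done

ls "$workdir"   # lists the two originals plus their AA copies
```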

    2. Re-run the same MapReduce job on the data sending the output to a new directory

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \

wordcount /home/<team name>/dev/input /home/<team name>/dev/output2

    Compare Results f rom Both MapReduce Jobs

    1. Compare the output from the MapReduce jobs

diff /mapr/my.cluster.com/home/<team name>/output1/part-r-00000 \

/mapr/my.cluster.com/home/<team name>/output2/part-r-00000

    You should see the change you made in the previous step


    Conclusion

    In this lab you experienced copying data from an external data source to the cluster storage via

    NFS. You were able to do so with standard Linux file commands that are familiar to system

    administrators. This process would have been much more technically challenging and taken a

    significantly longer time to perform without NFS.

Lab 5.2: Snapshots

Explore how snapshots work by creating snapshots at various points in time of a volume

    containing changing data to see that each snapshot shows data from a fixed point in time. Also

    see that snapshot creation is almost instantaneous and that the snapshot can preserve data that

    has since been deleted or changed. Learn to apply a schedule so that snapshots are

    automatically created at fixed intervals. Schedules also allow snapshots to expire at a time you

designate. This lab has 4 exercises:

    Create a snapshot in two ways

    Show how snapshots capture frozen views of past state

    Show snapshots preserve deleted data

    Create a snapshot schedule from the MCS

    Create snapshot in two ways

    This exercise will create a snapshot of a volume at a particular point in time, using two different

    methods for making a snapshot.

    Preparation: Before starting this exercise, you should have created a volume for your

experiments and mounted it. If you haven't already created such a volume, do so now using the

    MCS. Make sure that your volume is different from the volumes other students are using for

    this exercise to minimize confusion about who is doing what.

Note: The diff and sed commands you used above are standard Linux commands. Because the cluster

    filesystem is mounted via NFS, any standard Linux programs that operate on text files (sed,

    awk, grep, etc.) can be used with data on your cluster. This would not be possible without

NFS. You would need to copy the file out of the cluster first before performing your task, and then copy the resultant file back into the cluster.


    Put some sample data into your volume

    1. Use ssh to log in to a node in your cluster. Use your own user id here.

    $ ssh mapr@classnode-cluster

    2. Change directory to your personal volume.

$ cd /mapr/<cluster>/snapshot_lab_mnt_user01

    3. Create a data file called STATIC in your personal user-volume containing whatever

    data you choose.

    $ cat /etc/hosts > STATIC

    Create a volume snapshot of your volume using MCS

    Use the MCS to create a snapshot, as shown here:

    Select New Snapshot from the pull down menu under Modify Volume on top bar, provide a

    name for your snapshot, and click OK to create a snapshot of the selected volume, in this case,

    snapshot_lab_vol_user01. This will create a snapshot of the volume you have selected.


Create new data files in your volume by running a shell script

Run the following commands:

    $ cd /mapr//snapshot_lab_mnt_user01

    $ while true; do

    touch file-$(date +%T)

    date >> log; sleep 13

    done &

    This creates a new file every 13 seconds as this script runs in the background. The file name of

each file will contain the time the file is created. The last command will also log the time each file is created. This log file will look something like this:

Thu Dec 13 17:15:44 PST 2012

Thu Dec 13 17:15:57 PST 2012

Thu Dec 13 17:16:10 PST 2012

Thu Dec 13 17:16:23 PST 2012

    The files created will look something like this:

    $ ls

file-17:15:44 file-17:16:23 log

file-17:15:57 file-17:16:10 STATIC

    $

    Create a new snapshot, wait about 30 seconds, then create another snapshot

Record the time at which you created each snapshot by putting a line into the log file:

$ maprcli volume snapshot create -volume snapshot_lab_vol_user01 \

-snapshotname snapshot3_user01; echo "snapped $(date)" >> log


    Explore the snapshot directory from CLI

    1. Change directory into the mount point of the volume you created the snapshots for

    earlier

    2. List all files and directories there using "ls -a". Note that you won't see the .snapshot

    directory because it is hidden. You can see the contents of the .snapshot directory if

you explicitly give its name, but you won't see it otherwise.

Even though you don't see the .snapshot directory using ls in the volume mount point, it is still

    there and you can look inside. Do this:

$ ls -alh .snapshot

    total 2.5K

    drwxr-xr-x. 5 root root 3 Jul 16 12:58 .

    drwxr-xr-x. 2 root root 2 Jul 16 12:57 ..

    drwxr-xr-x. 2 root root 1 Jul 16 12:24 snapshot2_user01

    drwxr-xr-x. 2 root root 2 Jul 16 12:57 snapshot3_user01

drwxr-xr-x. 2 root root 1 Jul 16 12:24 SNP_of_lab_vol_user01_---2013-07-16.12-31-44

    You should see the snapshots that you created earlier.

Note: You can also see a list of snapshots in the MCS along with details like when they were created and when they will expire. You will not, however, be able to see the contents of the

    snapshots from the MCS.

    1. List the contents of each snapshot. You should see that more files appear in each

    subsequent snapshot, like this:

    $ ls .snapshot/*

.snapshot/snapshot1:

STATIC

.snapshot/snapshot2:

file-08:39:16 file-08:39:55 file-08:40:34 file-08:41:13

file-08:39:29 file-08:40:08 file-08:40:47 log
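A quick way to watch a volume's history grow is to count the entries in each snapshot. The loop below is demonstrated against a scratch directory tree so it is runnable anywhere; the snapshot and file names in the scratch tree are illustrative. On the cluster, cd to the volume mount point and run just the for-loop.

```shell
# Build a tiny stand-in for a volume's .snapshot tree, then count entries
# per snapshot. On a real volume, only the for-loop is needed.
demo=$(mktemp -d)
mkdir -p "$demo/.snapshot/snapshot1" "$demo/.snapshot/snapshot2"
touch "$demo/.snapshot/snapshot1/STATIC"
touch "$demo/.snapshot/snapshot2/STATIC" "$demo/.snapshot/snapshot2/log"
cd "$demo" || exit 1

for s in .snapshot/*/; do
  printf '%s %s\n' "$s" "$(ls "$s" | wc -l | tr -d ' ')"
done
```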


Schedule snapshots from the MCS

    You will need a schedule for this next part of the lab. Schedules are independent of volumes,

    snapshots or mirrors. A schedule simply expresses a policy in terms of frequency and retention

    times.

    Create a custom schedule

    Using the MCS to create a schedule:

    1. Click Schedules under MapR-FS in the navigation pane

    2. Click the New Schedule button

3. Give the schedule a name (every_5_minutes) and a rule (say, every 5 minutes, and

retain (expire) after 45 minutes)

4. Click the Save Schedule button

    Note: the schedule is not currently applied to any volumes

    Apply the schedule

Now you should use the MCS to apply the custom schedule as a snapshot schedule for one of

your volumes:

    1. Click Volumes under MapR-FS in the Navigation pane

    2. Click the name of one of your volumes

    3. Scroll down to the Snapshot Scheduling section


    Lab 5.3: Mirrors and schedules

    Gain experience with making mirrors manually via the MCS and CLI. Also learn to apply a

    schedule to update data from the source volume. This lab has three parts:

    Create a mirror from the MapR Control System (MCS)

    Apply a schedule to the mirror

    Create a mirror from CLI and initiate a mirror sync

    Create a mirror from the MCS

    Create a local mirror based on a source volume

    1. Connect to the cluster MCS

    2. Choose one of the ways to set up a mirror volume, for instance, choose volumes from

    the left bar to display volumes of choice for the source volume. If possible pick a volume

containing data for the source volume so you will be able to verify that it is copied to the

    new mirror volume.

3. Select New Volume from the top menu and fill in the template to make a

local mirror (mounting is optional)


    Now you have created the mirror volume, but no data has been copied to it.

4. Verify your new mirror volume exists by selecting Mirror Volumes on the left bar menu to

    display names of all mirrors.

    Copy data to your new mirror volume

    1. Use the MCS to start mirroring by selecting this option from the Modify Volume

    button drop down menu.

    2. Verify that data are copied to your mirror volume by watching the display of mirror

    volumes. If there is a lot of data, you will see an indication that the copying is in

    progress:


Apply a schedule to the mirror

    1. Use the MCS to apply a schedule to update your mirror volume.

Create a mirror from CLI and initiate a mirror sync

Use the CLI to create a new local mirror volume of a different source volume.

1. Use the CLI to manually create a mirror volume.

CLI example:

Determine the schedule IDs available:

# maprcli schedule list

# maprcli volume create -name <mirror_vol> \

-source <source_vol>@<clusterName> -type 1 -schedule <schedule_id>

    2. Use CLI to initiate a mirror sync

# maprcli volume mirror start -name <mirror_vol>

OR

# maprcli volume mirror push -name <source_vol>


    Set Up

1. Verify all nodes in the source cluster have a unique name for the source cluster (line 1)

and configure all nodes to be aware of the destination cluster (line 2)

    2. SSH to the node you are configuring on the source cluster

3. Verify in /opt/mapr/conf/mapr-clusters.conf that the cluster name (e.g. Team1) is there

4. Add a second line in /opt/mapr/conf/mapr-clusters.conf for the remote

cluster in the format:

cluster2 <CLDB node 1>:7222 <CLDB node 2>:7222

cluster2 is the name of the destination cluster

<CLDB node 1> and <CLDB node 2> are the CLDB nodes in the destination cluster

5. Restart the Warden on all your node(s)

    service mapr-warden restart

Note: there is a small bug (in 3.0.2) that requires you to add more than one remote cluster to

mapr-clusters.conf for the remote cluster to be visible in the GUI
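Because this file is edited by hand on every node, a typo is easy to make, so a quick format check helps. This sketch validates the "clustername host:7222 ..." line shape against a temporary sample file; the node names are placeholders, and it checks only the line format, not whether the CLDB nodes are reachable.

```shell
# Write an illustrative mapr-clusters.conf and verify each line has a
# cluster name plus at least one host:7222 entry.
conf=$(mktemp)
cat > "$conf" <<'EOF'
cluster1 nodeA:7222 nodeB:7222
cluster2 nodeC:7222 nodeD:7222
EOF

check_clusters_conf() {
  awk 'NF < 2 || $2 !~ /:7222$/ { print "bad line " NR; bad = 1 }
       END { exit bad }' "$1"
}

check_clusters_conf "$conf" && echo "format OK"
```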

    Configure all nodes in the destination cluster

    Configure all nodes in the destination cluster with a unique name for the destination cluster

    and configure all nodes to be aware of the source cluster

    1. SSH to the nodes you are configuring on the destination cluster

2. Edit /opt/mapr/conf/mapr-clusters.conf and verify your cluster name

3. Add a second line in /opt/mapr/conf/mapr-clusters.conf for the source

cluster in the format:

Teamname2 <CLDB node 1>:7222 <CLDB node 2>:7222

Teamname2 is the name of the source cluster

<CLDB node 1> and <CLDB node 2> are the CLDB nodes in the source cluster

    4. Restart the Warden on your nodes

    service mapr-warden restart

    Note: the remainder of the steps should be completed by each team


    Verify that each cluster has a unique name

    Verify that each cluster has a unique name and is aware of the other cluster

    1. Log on to the MCS of the source cluster

    2. Verify that the cluster name is cluster1

    3. Click the + symbol next to the cluster1

    4. Verify that cluster2 is listed under Available Clusters

    5. Log on to the MCS of the destination cluster

    6. Verify that the cluster name is cluster2

    7. Click the + symbol next to the cluster2

    8. Verify that cluster1 is listed under Available Clusters


    Create a remote mirror volume on the destination cluster

    You should be logged into the MCS on the destination cluster

    1. Select Volumes in the Navigation pane

    2. Click the New Volume button

    o Volume Type: Remote Mirror Volumeo Enter a unique name for your mirror volumeo Enter the name of the source volumeo Source Cluster Name: cluster1o Enter a unique mount path for the mirror volume

    The parent directory must already exist

    o Ensure that the Mounted checkbox is checked
    o Topology: /data

    3. Click the OK button

    You should see confirmation at the top of the MCS indicating that the mirror volume was created


    Initiate mirroring to the destination cluster

    1. If not already selected, click Volumes in the Navigation pane

    2. Select the mirror volume you created in Step 4

    3. Click Volume Actions

    4. Select Start Mirroring
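    The same operation can also be started from the command line with maprcli; a sketch, where the volume name is a placeholder for the mirror volume you created:

    ```
    # run on a destination-cluster node; substitute your mirror volume's name
    maprcli volume mirror start -name mirror-volume-name
    ```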


    Verify data from source cluster was copied to destination cluster

    1. SSH to any node on the destination cluster

    2. List the contents of the destination mirror volume

    hadoop fs -ls /

    or, if the cluster filesystem is mounted via NFS

    ls /mapr/Team2/

    You should see the exact same contents in the mirror volume as you do in the original source

    volume

    Conclusion

    In this lab you learned how to copy data from one cluster to another using remote mirroring. As you learned earlier in this course, MapR volumes allow you a greater degree of control over how to manage data in the cluster. Mirroring the volumes that contain your business-critical data to a remote cluster can significantly reduce the amount of key data you would lose and the time it would take to resume productivity in the event of a disaster.

    Lab 5.5: Using the HBase shell

    The objective of this lab is to get you started with HBase shell and perform operations to create

    a table, put data into the table, retrieve data from the table and delete data from the table.

    Start HBase shell

    1. Get a help listing which demonstrates some basic commands.

    a. Get help specifically on the "put" command.

    2. Create a table called 'Blog' with the following schema: blog title, blog topic, author first

    name, author last name. The blog title and topic must be grouped together as they will

    be saved together and retrieved together. Author first and last name must also be

    grouped together.

    3. List the new table you created in its directory, to confirm it was created.

    4. Insert the following data to the 'Blog' table.


    Where Title and Topic are in column family info and First and Last are in column family author

    ID  Title                                      Topic      First     Last
    1   MapR M7 is Now Available on Amazon EMR     cloud      Diana     Truman
    2   Enterprise Grade Solutions for HBase       highavail  Roopesh   Nair
    3   A Comparison of NoSQL Database Platforms   nosql      Jonathan  Morgan

    5. Count the number of rows. Make sure every row is printed to the screen as it is counted.

    6. Retrieve the entire record with ID '2'.

    7. Retrieve only the title and topic for record with ID '3'.

    8. Change the last name of the author with title "A Comparison of NoSQL Database

    Platforms".

    Display the record to verify the change.

    Display both the new and old value. Can you explain why both values are there?

    9. Display all the records.

    10.Display the title and last name of all the records.

    11.Display the title and topic of the first two records.

    12.Delete the record with title "Enterprise Grade Solutions for HBase".

    Verify that the record was deleted by scanning all records, or

    Try to select just that record.

    13.Drop the table 'Blog'.


    Create a Table using MapR Control System (MCS)

    1. Connect to MCS from a browser using the notes from the instructor. Log in with your

    account.

    2. Create a table called 'Blogtest' with the following schema: blog title, blog topic, author first name, author last name. The blog title and topic must be grouped together as they

    will be saved together and retrieved together. Author first and last name must also be

    grouped together.

    3. List the new table you created in its directory, to confirm it was created.

    4. Insert some data and test. Also change the number of versions of cells you can keep

    and test.

    MapR Tables - Solutions

    1. Use cases fit for MapR tables:

    A data store comprised of petabytes of semi-structured data.

    A data store that will be accessed by large numbers of client requests, for example

    thousands of reads per second.

    2. Use cases not fit for MapR tables:

    Access normalized relational data with SQL

    Full text search

    3. Columns may be created when data is inserted; they don't have to be defined up front.

    MapR can scale up to very large numbers of columns per column family. However, table

    name and column family have to be defined before data is inserted.
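    To see this behavior for yourself (a sketch; the table path and the 'info:subtitle' column qualifier here are just examples), you can put to a column qualifier that was never declared, as long as its column family exists:

    ```
    hbase> put '/user/user01/Blog','1','info:subtitle','columns are created on write'
    hbase> get '/user/user01/Blog','1',{COLUMNS=>'info:subtitle'}
    ```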

    4. In addition to using the list command in the HBase shell, you can use standard Linux

    ls to list all tables (and files) stored in a particular directory.


    HBase Shell Solution

    You can find the commands below in the file Lab1_hbase_shell_commands.txt.

    1. Start an HBase shell in your command window

    user02@ip-10-196-89-226:~$ hbase shell

    2. HBase help commands

    hbase> help

    hbase> help "put"

    3. Create a table /user/user01/Blog with column families info and author

    hbase> create '/user/user01/Blog', {NAME=>'info'}, {NAME=>'author'}

    Since it was required that title and topic be grouped they will be stored as columns that

    belong to the 'info' column family while 'first' and 'last' will belong to the 'author'

    column family.

    4. List the table:

    hbase> list '/user/user01/'

    5. Execute the following put statements to insert the records into Blog table:

    hbase> put '/user/user01/Blog','1','info:title','MapR M7 is Now Available on Amazon EMR'

    hbase>put '/user/user01/Blog','1','info:topic','cloud'

    hbase>put '/user/user01/Blog','1','author:first','Diana'

    hbase>put '/user/user01/Blog','1','author:last','Truman'

    hbase>put '/user/user01/Blog','2','info:title','Enterprise Grade Solutions for HBase'

    hbase>put '/user/user01/Blog','2','info:topic','highavail'

    hbase>put '/user/user01/Blog','2','author:first','Roopesh'

    hbase>put '/user/user01/Blog','2','author:last','Nair'

    hbase> put '/user/user01/Blog','3','info:title','A Comparison of NoSQL Database Platforms'

    hbase>put '/user/user01/Blog','3','info:topic','nosql'


    hbase> put '/user/user01/Blog','3','author:first','Jonathan'

    hbase>put '/user/user01/Blog','3','author:last','Morgan'

    6. Count the number of rows of data inserted

    hbase> count '/user/user01/Blog',INTERVAL=>1

    7. Retrieve the entire record with ID 2

    hbase> get '/user/user01/Blog','2'

    8. Retrieve only the title and topic for record with ID '3'.

    hbase> get '/user/user01/Blog','3',{COLUMNS=>['info:title','info:topic']}

    9. The record with title "A Comparison of NoSQL Database Platforms" has ID 3. To update

    its value execute a put operation with that ID.

    hbase>put '/user/user01/Blog', '3','author:last','Smith'

    To verify the put worked, select the record:

    hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last'}

    To display both versions, specify the number of versions in a get operation:

    hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>3}

    The reason we see the old value is that cells keep up to three versions by default in MapR

    tables.
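    If you want more (or fewer) versions retained, the column family can be altered. A sketch, assuming the same table path as above and an illustrative limit of 5:

    ```
    hbase> alter '/user/user01/Blog', {NAME=>'author', VERSIONS=>5}
    hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>5}
    ```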

    10.Display all the records.

    hbase> scan '/user/user01/Blog'


    11.Display the title and last name of all the records.

    hbase> scan '/user/user01/Blog',{COLUMNS=>['info:title','author:last']}

    12.Display the title and topic of the first two records.

    hbase> scan '/user/user01/Blog',{COLUMNS=>['info:title','info:topic'],LIMIT=>2}

    13.The record with title "Enterprise Grade Solutions for HBase" has record ID '2'; delete all

    columns for record with ID '2':

    hbase> delete '/user/user01/Blog','2','info:title'

    hbase> delete '/user/user01/Blog','2','info:topic'

    hbase> delete '/user/user01/Blog','2','author:first'

    hbase> delete '/user/user01/Blog','2','author:last'

    14.To delete a table in HBase shell, the table must first be disabled, and then you can drop

    it.

    hbase> disable '/user/user01/Blog'

    hbase> drop '/user/user01/Blog'

    Troubleshooting

    NameError: undefined local variable or method `interval' for #

    Happens for hbase> count '/user/user02/Blog', interval=>1

    Use uppercase INTERVAL example: hbase> count '/user/user02/Blog', INTERVAL=>1


    HBase shell commands (optional)

    The objective of this optional lab is to run scripts from the HBase shell. These commands

    can be run individually in an HBase shell, or they can be pasted into a script and run. Example:

    hbase> source "hbase_script.txt"

    1. Open a vi session and insert the following into your script.

    2. Adjust all references to home directory to the appropriate directory

    3. Name your script

    4. Run your script

    Additional commands to experiment with

    # NOTE: You can copy-paste multiple lines at a time
    # into HBase shell. Or, you can source a script.
    # Example: hbase> source "hbase_script.txt"

    # Background information on HBase Shell at:
    # http://wiki.apache.org/hadoop/Hbase/Shell
    ##########################################################
    # Solution to Lab 1
    # NOTE: Change the table paths to your own user directory
    # so your actions don't conflict with other students.
    # Example: create '/home/user12/atable', {NAME=>'cf1'}
    ##########################################################

    help
    help "put"

    create '/home/user01/Blog', {NAME=>'info'}, {NAME=>'author'}
    list '/home/user01/'

    put '/home/user01/Blog','1','info:title','MapR M7 is Now Available on Amazon EMR'
    put '/home/user01/Blog','1','info:topic','cloud'
    put '/home/user01/Blog','1','author:first','Diana'
    put '/home/user01/Blog','1','author:last','Truman'

    put '/home/user01/Blog','2','info:title','Enterprise Grade Solutions for HBase'
    put '/home/user01/Blog','2','info:topic','highavail'
    put '/home/user01/Blog','2','author:first','Roopesh'
    put '/home/user01/Blog','2','author:last','Nair'
    put '/home/user01/Blog','3','info:title','A Comparison of NoSQL Database Platforms'
    put '/home/user01/Blog','3','info:topic','nosql'
    put '/home/user01/Blog','3','author:first','Jonathan'


    put '/home/user01/Blog','3','author:last','Morgan'

    count '/home/user01/Blog', INTERVAL=>1

    get '/home/user01/Blog','2'
    get '/home/user01/Blog','3',{COLUMNS=>['info:title','info:topic']}

    put '/home/user01/Blog','3','author:last','Smith'

    get '/home/user01/Blog','3',{COLUMNS=>'author:last'}

    get '/home/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>3}

    scan '/home/user01/Blog'
    scan '/home/user01/Blog',{COLUMNS=>['info:title','author:last']}
    scan '/home/user01/Blog',{COLUMNS=>['info:title','info:topic'],LIMIT=>2}

    delete '/home/user01/Blog','2','info:title'
    delete '/home/user01/Blog','2','info:topic'
    delete '/home/user01/Blog','2','author:first'
    delete '/home/user01/Blog','2','author:last'

    #disable '/home/user01/Blog'
    #drop '/home/user01/Blog'

    ##########################################################
    # Additional commands to experiment with
    # NOTE: You can copy-paste multiple lines at a time
    # into HBase shell. Or, you can source a script.
    # Example: hbase> source "hbase_script.txt"
    ##########################################################

    # add content column-family to table
    alter '/home/user01/Blog', {NAME=>'content'}

    # insert row 1
    put '/home/user01/Blog','Diana-001','info:title','MapR M7 is Now Available on Amazon EMR'
    put '/home/user01/Blog','Diana-001','info:author','Diana'
    put '/home/user01/Blog','Diana-001','info:date','2013.05.06'
    put '/home/user01/Blog','Diana-001','content:post','Lorem ipsum dolor sit amet, consectetur adipisicing elit'

    # insert row 2
    put '/home/user01/Blog','Diana-002','info:title','Implementing Timeouts with FutureTask'
    put '/home/user01/Blog','Diana-002','info:author','Diana'
    put '/home/user01/Blog','Diana-002','info:date','2011.02.14'
    put '/home/user01/Blog','Diana-002','content:post','Sed ut perspiciatis unde omnis iste natus error sit'

    # insert row 3


    put '/home/user01/Blog','Roopesh-003','info:title','Enterprise Grade Solutions for HBase'
    put '/home/user01/Blog','Roopesh-003','info:author','Roopesh'
    put '/home/user01/Blog','Roopesh-003','info:date','2012.10.20'
    put '/home/user01/Blog','Roopesh-003','content:post','At vero eos et accusamus et iusto odio dignissimos ducimus'

    # insert row 4
    put '/home/user01/Blog','Jonathan-004','info:title','A Comparison of NoSQL Database Platforms'
    put '/home/user01/Blog','Jonathan-004','info:author','Jonathan'
    put '/home/user01/Blog','Jonathan-004','info:date','2013.01.08'
    put '/home/user01/Blog','Jonathan-004','content:post','Duis aute irure dolor in reprehenderit in voluptate velit'

    # insert row 5
    put '/home/user01/Blog','Sylvia-005','info:title','NetBeans IDE 7.3.1 Introduces Java EE 7 Support'
    put '/home/user01/Blog','Sylvia-005','info:author','Sylvia'
    put '/home/user01/Blog','Sylvia-005','info:date','2012.07.20'
    put '/home/user01/Blog','Sylvia-005','content:post','Excepteur sint occaecat cupidatat non proident, sunt in culpa'

    # count the data you inserted above, INTERVAL specifies how often counts are displayed
    count '/home/user01/Blog', {INTERVAL=>2}
    count '/home/user01/Blog', {INTERVAL=>1}

    # this get won't return anything as the rowkey doesn't exist
    get '/home/user01/Blog','unknownRowKey'

    # retrieve ALL columns for the provided rowkey
    get '/home/user01/Blog','Jonathan-004'
    # retrieve specific columns for the provided rowkey
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>['info:author','content:post']}

    # retrieve data for specific columns and time-stamp
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>['info:author','content:post'], TIMESTAMP=>1326061625690}

    # exercise different scan options
    scan '/home/user01/Blog'
    scan '/home/user01/Blog', {STOPROW=>'Sylvia'}

    scan '/home/user01/Blog', {COLUMNS=>'info:title', STARTROW=>'Sylvia', STOPROW=>'Jonathan'}

    # update the record a few times and then retrieve back multiple versions
    # only 3 versions are kept by default
    put '/home/user01/Blog','Jonathan-004','info:date','2012.01.09'
    put '/home/user01/Blog','Jonathan-004','info:date','2012.01.10'
    put '/home/user01/Blog','Jonathan-004','info:date','2012.01.11'


    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>2}
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>1}

    # selects 1 by default
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date'}

    # delete a record, deletes all versions of the cell
    get '/home/user01/Blog','Roopesh-003','info:date'
    delete '/home/user01/Blog','Roopesh-003','info:date'
    get '/home/user01/Blog','Roopesh-003','info:date'

    # delete the versions before the provided timestamp
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}
    delete '/home/user01/Blog','Jonathan-004','info:date', 1326254739791
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}

    # drop the table
    list '/home/user01/'
    disable '/home/user01/Blog'
    drop '/home/user01/Blog'
    list '/home/user01/'

    Using importtsv and copytable

    The objective of this lab is to get you started with HBase shell and perform operations to create a table, import flat tab-separated data into the table, retrieve data from the table and delete data from the table.

    View Existing Table Using MCS

    1. Log onto the MCS.

    2. Select MapR-FS => MapR Tables

    3. Click on /user/mapr/ under Recently opened tables

    If /user/mapr/ is not displayed under Recently opened tables, enter

    /user/mapr/ in the Go to table field and click the Go button

    4. Look at the information available in the Regions tab

    Each row represents one region of data

    The columns (Start Key, End Key, Physical Size, Logical size, etc.) represent

    meaningful data about the table regions


    Note: Highlighted area is syntax for 3.1 permissions

    Notice the first column is defined as HBASE_ROW_KEY; this will take the first field of data

    (namely the numerical index field) and make it the row key.

    Important: also notice that the command above identifies each column in the data file as well as the column family it belongs in. The column families used in the example below are cf1, cf2 and cf3. If

    the table you are importing into has a different column family name, then you will need to

    modify the command below to match the correct column family name.
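    A typical ImportTsv invocation has this shape (a sketch; the table path, the column names under cf1/cf2/cf3, and the input path are placeholders to adapt to your lab data):

    ```
    # first field becomes the row key; remaining fields map to family:qualifier
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1,cf2:col2,cf3:col3 \
      /user/mapr/mytable /path/to/input.tsv
    ```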

    6. While the import job is processing, look at the MCS to view changes to the table and

    puts being processed on the node:

    Click Nodes under Cluster

    Click the Overview dropdown and change the value to Performance

    If necessary, scroll to the right so you can see the Gets, Puts and Scans columns.

    You should see a large number of puts across several nodes while your import is

    processing

    Click MapR Tables under MapR-FS

    Click the name of the table you used for the import under Recently opened tables

    Select the Regions tab

    You should see that your table automatically split into a number of regions during

    the import

    7. In an hbase shell examine the data that has been imported

    [root@CentOS001 data2]# hbase shell

    HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell

    Ve