Administration of Hadoop Summer 2014 Lab Guide v3.1


    Cluster Admin on Hadoop

This Certified Training Services Partner Program Guide (the "Program Guide") is protected under

U.S. and international copyright laws, and is the exclusive property of MapR Technologies, Inc.

© 2014 MapR Technologies, Inc. All rights reserved.

PROPRIETARY AND CONFIDENTIAL INFORMATION


Contents

Administration of Hadoop Lab Guide

Get Started

Get Started 1: Set up a lab environment in Amazon Web Services (AWS)

Lab 1.1: Pre-install validation downloads, setup and clustershell

Practice: Creating and Rem…

Lab 5.3: Mirr…

Lab 6.2: Set Up SMTP

Lab 6.3: Metrics, M…

Get Started

Get Started 1: Set up a lab environment in Amazon Web Services (AWS)


    $ passwd mapr

then type the password.


To restart the instances, repeat these steps, and select "Start" in step 5.


Get Started 2: Set up passwordless ssh access between

    nodes

When testing hardware in Hadoop…


    Get Started 3: Log into the class cluster


1. Log in as ec2-user, passw…


    Get Started 4: Explore the MapR Control System



    Lab Procedure

    Log on and explore different views of the cluster




    Lesson 1: Pre-install

Lab Overview

In this lesson you will learn where to download a collection of tools and scripts that we will

use to prepare the cluster hardware for the parallel execution of tests, and then to test and

measure the performance of the hardware components to determine that they are

functioning properly and within the specifications for a Hadoop installation. We will also

identify the current firmware for each of the new hardware components in the cluster, and

update these components to make sure that they have matching firmware.

    Lab 1.1: Pre-install validation downloads, setup and clustershell

    Lab 1.2: Network, Memory and IO

    Lab Procedures

    Lab 1.1: Pre-install validation

    Note: One of the most common causes for a failure when installing Hadoop is that the hardware

    is not within the necessary specifications. You can see a list of the current hardware and OS

    specifications at: http://doc.mapr.com/display/MapR/Preparing+Each+Node

    The Professional Services team at MapR has developed a collection of all of the tools and scripts

    that we will need to validate our hardware and prepare it for installation.

    1. Download the cluster-validation package onto your master node from:

    https://github.com/jbenninghoff/cluster-validation/archive/master.zip

Extract master.zip and move the pre-install and post-install folders directly under /root for

simplicity.

2. Here, we will find two directories, pre-install and post-install. We will use the tools and

    scripts inside the pre-install directory to validate our new hardware prior to installing

    Hadoop. We will use tools and scripts in the post-install later, to test our new cluster

    after we have completed our install.


    Note: The tools and files in this collection are updated frequently, so we should always make

    sure we download the latest package when preparing for a new Hadoop installation.

    3. To prepare the cluster for these validation tests, choose one node on the cluster to be

    your set up master node. Generate ssh keys on this node, and make sure that it has

passwordless ssh access to all other nodes on the cluster. You can find steps for how to do this at the end of this guide.
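The key setup in step 3 typically looks like the following sketch (the key path is an example, and node1 is a placeholder host; ssh-copy-id is one common way to distribute the public key):

```shell
# Generate a passphrase-less key pair (example path, not the default ~/.ssh).
rm -f /tmp/lab_id_rsa /tmp/lab_id_rsa.pub
ssh-keygen -q -t rsa -N '' -f /tmp/lab_id_rsa

# The .pub file is what gets appended to ~/.ssh/authorized_keys on each node;
# ssh-copy-id automates that step, e.g.:
#   ssh-copy-id -i /tmp/lab_id_rsa.pub root@node1
awk '{ print $1 }' /tmp/lab_id_rsa.pub    # key type, e.g. ssh-rsa
```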

    4. Inside the pre-install directory is a clustershell rpm. Install this rpm on the master node

    with passwordless ssh access to the rest of our cluster. We will be making all further

    commands for this exercise from this master node, using clush to propagate those

    commands throughout the rest of our hardware.

5. Once installed, update the /etc/clustershell/groups file to include an entry named

all, followed by the host names of the nodes we will use, such as:

all: node[0-19]
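The bracketed host list uses clustershell's node-set syntax, which clush expands into individual host names (clustershell's nodeset utility does this with `nodeset -e 'node[0-19]'`). A tiny awk stand-in, handling only the simple `name[a-b]` form, shows what the expansion produces:

```shell
# Expand a clustershell-style range such as "node[0-3]" into host names.
# (Simplified illustration; real expansion is done by the nodeset utility.)
echo 'node[0-3]' | awk -F'[][-]' '{ for (i = $2; i <= $3; i++) print $1 i }'
# -> node0 node1 node2 node3 (one per line)
```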

6. Once we have our node names listed, type the following to copy the /root/pre-install

directory to all of our node hardware:

# clush -a --copy /root/pre-install

7. When that is complete, type the following to confirm that all of the nodes have a copy of the package:

# clush -Ba ls /root/pre-install

8. After we have a copy of the pre-install package on all nodes, we are ready to start our

hardware validation tests. First, we will run an audit of our hardware to see exactly

what we have on each node, and to verify that they all have a similar configuration. To

run the cluster-audit.sh script, type:

# /root/pre-install/cluster-audit.sh | tee cluster-audit.log

    This will list hardware specifications from each of the new nodes.

    We can examine the output log to look for hardware or software that does not match the

    requirements to install Hadoop, or discrepancies in the hardware or software from one node to

    the next.
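One quick way to spot such discrepancies in the log is to count distinct values per audited item; here is a sketch over a hypothetical three-node excerpt (the file name, format, and figures are made up for illustration):

```shell
# Hypothetical cluster-audit-style excerpt: "node: item: value" lines.
cat > /tmp/audit-sample.log <<'EOF'
node1: MemTotal: 64 GB
node2: MemTotal: 64 GB
node3: MemTotal: 32 GB
EOF

# Strip the node name and count distinct values; more than one line of
# output for the same item means the nodes do not match.
cut -d: -f2- /tmp/audit-sample.log | sort | uniq -c
```

Here the count column immediately shows that one node carries a different amount of RAM than the other two.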


Note: the audit output will give us deltas when looking at things like the RAM. It will tell us

the total amount of RAM, the number of slots, and the types of DIMMs found, but it will not tell

us which exact DIMMs are in which slots. Also, if only one DIMM type is listed, then all slots

have the same DIMM type.

    Lab 1.2: Network, Memory and IO

    1. Evaluate the network interconnect bandwidth.

Inside the pre-install directory, update the network-test.sh file so that the half1

    and half2 arrays contain the correct IP addresses for our hardware nodes. Next delete

    the exit command, and save the file.

    2. When the file has been updated, type:

# /root/pre-install/network-test.sh | tee network-test.log

This will run an RPC test to validate our network bandwidth. This test should take about 2

    minutes to run, maybe a little longer.

    We should expect to see results of about 90% of our peak bandwidth. Thus, with a

    1GbE network, we should expect to see results of about 115MB/sec, or with a 10GbE

    network, look for results around 1100MB/sec. If we are not seeing results in this range,

    then we need to check with our network administrators to verify the connections and

    firmware.
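The expected figures above follow from simple arithmetic: divide the line rate by 8 to convert bits to bytes, then take roughly 90% of that peak. A quick sketch (the figures are illustrative, not output of the test script):

```shell
# Line rate (Gbit/s) -> peak MB/s (divide by 8) -> ~90% achievable.
awk 'BEGIN {
  for (gbit = 1; gbit <= 10; gbit *= 10) {
    peak = gbit * 1000 / 8              # MB/s at full line rate
    printf "%2d GbE: peak %4d MB/s, expect ~%.0f MB/s\n", gbit, peak, peak * 0.9
  }
}'
```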

3. Next, we will evaluate the raw memory performance. Type the following to run the stream59 utility:

# clush -Ba '/root/pre-install/memory-test.sh | grep Triad' | tee memory-test.log

    This tests the memory performance of the cluster. The exact bandwidth of memory is

    highly variable and is dependent on the speed of the DIMMs, the number of memory

    channels and to a lesser degree, the CPU frequency.
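As a rough reference point for reading the Triad numbers, the theoretical peak is approximately channels × transfer rate × 8 bytes per transfer; the figures below (4-channel DDR3-1600) are illustrative assumptions, and measured Triad results will land well below this peak:

```shell
# Theoretical peak: channels * MT/s * 8 bytes per transfer, in GB/s.
awk 'BEGIN {
  channels = 4; mts = 1600             # illustrative: 4-channel DDR3-1600
  printf "theoretical peak: %.1f GB/s\n", channels * mts * 8 / 1000
}'
```

For these example figures the sketch prints a 51.2 GB/s theoretical peak.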

4. Evaluate the raw disk performance. The disk-test.sh script will run IOzone on our

    hard drives to test their performance.

    Note: This process is destructive to any existing data, so make sure the drives do not have any

    needed data on them, and that you do not run this test after you have installed MapR Hadoop

    on the cluster.


    Type:

    # clush -ab /root/pre-install/disk-test.sh

    When you first run this script, it will list out the spindles to be tested. We need to verify

    that this list is correct, and then edit the script to run the test.

    The comments in the script will direct us to the edits that we need to make. When we are done,

    we save the file and run the script again to perform the test.

    If we have a large number of total drives, the summIOzone.sh script will provide us with a

    summary of the disk-test.sh output.

    We will keep the results of this test with the other benchmark tests for post installation

    comparison.

    Conclusion

    Now that we have run all of our hardware tests, and compiled benchmarks for all of our

    components, we have one final task to prepare our new hardware for installation.

    The firmware for the new hardware must be up to date with vendor specifications and match

    across each of the nodes of the same type. The BIOS versions and settings must also match for

    similar nodes. In addition, the firmware for the management interfaces needs to be the same

    on each of these nodes. Any other hardware components that we may have in our system, such

    as NICs or onboard RAID controllers also need to have updated and matching firmware.

    We will need to refer to the manual for each node vendor that we are including, and update the

    firmware and BIOS according to their specifications. If there is a discrepancy in our BIOS or

    firmware between nodes from the same vendor, then we can see inconsistent performance

    across nodes.


    Lesson 2: Install MapR software

    Lab Overview

In this exercise you will install a MapR cluster. It is also important to consider how many of each

service will be running on the entire cluster to ensure that you have a robust…

1. Log into the master node of your cluster as described above, or as described by your

instructor.

2. Navigate to the /home/mapr directory:

$ cd /home/mapr

3. Download the mapr-setup package:

$ wget http://package.mapr.com/releases/v.3.1.1//mapr-setup

4. Download the pem key to the master node in your cluster.

5. Set the permissions on the setup-mapr file and pem key:

$ chmod 755 mapr-setup

$ chmod 600

6. Run the mapr-setup script. Note: this script will create the /opt/mapr-installer directory and

additional subdirectories.

$ sudo ./mapr-setup

===============================================

Self Extracting Installer for MapR Installation

===============================================

Extracting installer.......

Copying setup files to "/opt/mapr-installer"......

Installed to "/opt/mapr-installer"

====================================


    Run "/opt/mapr-installer/bin/install" as super user, to

    begin install process

    [root@ip-10-170-125-38 ec2-user]#

7. Copy the students07172012.pem key to the /opt/mapr-installer/bin directory:

$ mv /opt/mapr-installer/bin

8. If you are using a config file with the installer, edit the config.example file to specify the

control nodes and data nodes information.

$ vi config.example

This information can also be input when running the installer, if not using a config file.

Additional information can be specified in the config file as well, including:

o Disks used. Note: for Amazon, disks are the following: /dev/xvdf, /dev/xvdg

o MySql database information (see your instructor for IP address)

o Repositories (can be local)

o Version

o Security

o M7

o Clustername

o Etc.

# Each Node section can specify nodes in the following format
# Node: disk1, disk2, disk3
# Specifying disks is optional. In which case the default disk information
# from the Default section will be picked up

[Control_Nodes]
ip-10-171-56-175: /dev/xvdf, /dev/xvdg
ip-10-171-35-188: /dev/xvdf, /dev/xvdg
ip-10-174-16-186: /dev/xvdf, /dev/xvdg

[Data_Nodes]
ip-10-171-23-228: /dev/xvdf, /dev/xvdg
ip-10-170-116-127: /dev/xvdf, /dev/xvdg
ip-10-174-23-41: /dev/xvdf, /dev/xvdg

[Client_Nodes]
#C1
#C2

[Options]


MapReduce = true
YARN = false

mapr-install.py [-h] [-s] [-U SUDO_USER] [-u REMOTE_USER]


[--private-key PRIVATE_KEY_FILE] [-k] [-K]
[--skip-checks] [--quiet] [--cfg CFG_LOCATION]
[--debug] [--password REMOTE_PASS]
[--sudo-password SUDO_PASS]
{new,add} ...

positional arguments:
{new,add}
new                    Start new Installation
add                    Add to an existing Installation

optional arguments:
--cfg CFG_LOCATION     config file to use
--debug                run installer in debug mode
--password REMOTE_PASS
                       remote ssh user password
--private-key PRIVATE_KEY_FILE
                       use this file to authenticate the connection
--quiet                run installer in non-interactive mode
--skip-checks          skip pre-checks (DANGEROUS)
--sudo-password SUDO_PASS
                       sudo user password
-K, --ask-sudo-pass    ask for sudo password
-U SUDO_USER, --sudo-user SUDO_USER
                       desired sudo user (default=root)
-h, --help             show this help message and exit
-k, --ask-pass         ask for SSH password
-s, --sudo             run operations with sudo (nopasswd)
-u REMOTE_USER, --user REMOTE_USER

9. A. If you are not using a config file, run the installer:

$ sudo /opt/mapr-installer/bin/install -s --private-key students07172012.pem -u ec2-user -U root --debug new

and fill in the cluster details when prompted, listed above.


B. If you are using a config file, run the installer to determine if the parameters you have

specified are correct.

$ sudo /opt/mapr-installer/bin/install -s --cfg config.example --private-key students07172012.pem -u ec2-user -U root --debug new

10. In the summary response area choose (a)bort after examining your parameters.

11. Rerun the installer with the --quiet argument for non-interactive mode, and with a

trailing & to background the installer in case the window is lost or the laptop goes to

hibernate mode. This time, select (c) to continue with the install after reviewing the parameters.

A. $ sudo /opt/mapr-installer/bin/install -s --private-key students07172012.pem -u ec2-user -U root --debug --quiet new &

OR

B. $ sudo /opt/mapr-installer/bin/install -s --cfg config.example --private-key students07172012.pem -u ec2-user -U root --debug --quiet new &
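The trailing & simply runs the command as a background job so a lost terminal does not interrupt it (nohup is a common companion for the same reason); a tiny self-contained illustration, using sleep as a stand-in for the long-running installer:

```shell
# Run a "long" command in the background, then wait for it and read its log.
sh -c 'sleep 1; echo installer-finished' > /tmp/install.log 2>&1 &
wait                      # block until the background job completes
cat /tmp/install.log      # -> installer-finished
```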

Note: View details about installing on an OS other than Red Hat at

http://www.mapr.com/doc/display/MapR/Installing+MapR+Software

The administrative user who should be given full permission is "mapr" and the user

password is "mapr".

When registering your cluster select an M7 Trial license. Also, be sure to apply your

M7 license before you close the License Management dialog.

12. Watch the installation process and look for the various packages being installed. After

the control nodes have been installed (usually 20-30 min) log into the MCS by pointing your

browser to the IP address of one of the control nodes, at port 8443:

http://ControlNodeIP:8443/

13. Accept the MapR agreement, and select the licenses link in the upper right corner.


14. Apply the temporary M7 license received when registering for the course. If you do not

have a temporary license, contact training@mapr.com or ask your instructor if you are

taking a classroom or virtual training class.

15. After you have successfully applied a trial license you may notice that some of the nodes

in the cluster have orange icons in the heatmap indicating that they have degraded

service.

16. As the installer continues to install packages, and the warden service starts the services

on each node, we will begin to see the nodes turn green. Eventually all of the nodes will

be green, indicating that all nodes are active and healthy.

Conclusion

Plan your service layout prior to installing the MapR software:

o Make sure that you have identified where the key management services (CLDB,

Zookeeper, JobTracker, Webserver) will be running in the cluster

o Ensure that you have enough instances of the management services to maintain the

level of service that is appropriate for your organization

Follow the procedures outlined in the MapR documentation under the Installation Guide:

o http://mapr.com/doc/display/MapR/Installation+Guide

Use the MCS to verify that the cluster installation is complete and that the cluster is now

active.

Discussion

1. Once you see that the cluster is active, try exploring the MCS by clicking on the different

links in the Navigation pane and on the Dashboard. What will you be able to monitor

once you begin to use your cluster?

2. What would your next step be after installing the cluster?


    Lesson 3: Post-install

    Lab Overview

    If you remember, the package that we downloaded in our pre-install lesson contained a post-

    install directory. That directory contains all of the tools and scripts that we need to run post

    install benchmarks to make sure our new cluster is performing as expected.

    First, we will test the drive throughput. As with our pre-install tests, we will use clush to push

    this test to all of the nodes on our cluster.

    Lab Procedures

3.1 Run RWSpeedTest

1. Log into the master node that we used for our pre-install tests and navigate to the

directory /root/post-install. In here we will find the file runRWSpeedTest.sh.

2. Note: This script uses an HDFS API to stress test the I/O subsystem. The output provides

an estimate of the maximum throughput the I/O subsystem can deliver. To begin the test,

    type:

    # clush -Ba /root/post-install/runRWSpeedTest.sh | tee

    RWSpeedTest.log

3. After we run RWSpeedTest, we can compare our results to our pre-install IOzone tests. We should expect to see similar results, within 10-15% of the pre-installation test.

    3.2 TeraGen/TeraSort

Teragen is a map/reduce program that will generate 1GB of synthetic data, and Terasort

samples this data and uses map/reduce to sort it into a total order. These two tests together

will challenge the upper limits of our cluster's performance.

    1. Type:

# maprcli volume create -name data1 -replication 1 -mount 1 -path /data1

    # mkdir data1/out1

    # mkdir data1/out2

    2. Verify that the new directories exist, then type:

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar teragen 10000000 /data1/out1
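Teragen writes 100-byte rows, so the row count determines the data size; a quick check of the figure used above:

```shell
# 10,000,000 rows x 100 bytes per row = 1 GB of generated data.
awk 'BEGIN { printf "%.1f GB\n", 10000000 * 100 / 1e9 }'   # -> 1.0 GB
```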


3. This will create 1GB worth of small number data. Once teragen has finished, type the

following to sort the newly created data:

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar terasort /data1/out1 /data1/out2

    When we are running Terasort, we can use the MCS to watch the node usage. When we set the

    heatmap to show Disk Usage, we can see the load on each node. We are looking for the load

    to be spread evenly across our cluster. Hotspots suggest a problem with a hard drive or its

    controller. We can change the view of our heatmap to look at the load of different resources of

    our cluster as we run our tests.

    In addition to the heatmap views, we can look at the services and jobs. Since we are using

    synthetic code, we know that it functions properly. If we have a job or task failure, then we

    have an issue with our hardware.

    When Terasort is finished, we can compare the results with our RWSpeedTest results. We

    should expect to see our Terasort throughput to be between 50% to 70% of our RWSpeedTest

    throughput. Since we know the Terasort job code does not have any errors, if we see

performance that doesn't match our expectations, we know we have a problem with the

    hardware in our cluster.


    Lesson 4: Configure Cluster Storage

    Resources

    Lab Overview

    The labs in this chapter cover all the basics of cluster storage resources, including:

    Topology and Storage Architecture:

    o the physical layer, including nodes, disks & storage pools

    o the logical layer, including files, chunks, containers

    Volumes, including with mirrors, snapshots and remote mirrors

    These labs provide insight into how data is managed in a MapR cluster, and teach hands-on

    experience configuring topologies, volumes and quotas. You have a great degree of control over

your organization's MapR storage resources. Configuring a cluster with appropriate topologies

    and volumes has long-term impacts on performance, reliability and ease-of-management. This

    lab is broken into three separate exercises that build on each other.

    Lab Procedures

Always set up node topology before deploying the cluster. Never leave nodes in

/data/default-rack.

    Key Tips:

    Create volumes to contain different types of data on the cluster before deploying the

    cluster. (E.g., create one volume per user, one volume per project, distinct volumes for

production work and development work, etc.) Don't let data accumulate at the root

    level of the cluster.

    MapR separates the concepts of volume ownership and quota accounting. Project

    members can have full ownership of files and folders for a project, while the collective

    storage for the whole project is restricted by a quota independent of individual users.

    Rack Layout

    In this training lab environment, our physical rack layout is hypothetical. If you were

    configuring node topology in a physical cluster environment, then you would coordinate with

    the team responsible for the physical setup of the cluster to build a diagram of the physical rack

layout. For this lab, let's assume our cluster's nodes are contained in two racks.


    Note: If applicable, you may need to coordinate your activities on your Team# cluster with the

    other members of your team.

    Lab 4.1: Configure Node Topology

    The first step in getting a cluster ready for data storage is to set up the node topology. Node

    topology describes the logical organization of the cluster. Grouping nodes into proximity-based

    topologies, i.e. racks, helps to distribute data across physical failure domains, thus decreasing

the probability of data loss. It is also important to define higher-level logical topologies, typically named /data and /decommissioned, which serve as staging areas for nodes when

    transitioning into and out of service.

    /

    data/

    rack1/

    r1_node1

    r1_node2

    r1_nodeN

    rack2/

    r2_node1

    r2_node2

    r2_nodeN

decommissioned/


    5. Set the default physical topology using the CLI. You can change the default topology,

    such that any new node added to the cluster will appear in the specified topology. In

this step, you are going to change the default topology to /data.

    a. Open a SSH session with a node in the cluster.

    b. Type the following command at a command line.

    maprcli config load json | grep default

    c. Notice the default topology.

    d. To change it you would do the following:

    maprcli config save -values

    '{"cldb.default.volume.topology":"/data"}'

    6. Verify that all nodes are assigned to a physical topology.

    a. In the MCS Navigation pane under the Cluster group, click Nodes.

    b. Look at the Topology pane and confirm that each node in the cluster appears in a

specific rack, and that no nodes remain under /default-rack.


    Lab 4.2: Create Volumes and Set Quotas

    In this lab exercise you will learn how to manage a MapR cluster in a shared environment.

    Imagine that your cluster is going to be shared by up to 5 different groups each with multiple

users working on development and production projects. You need to manage the resources of the cluster so all of these groups can work simultaneously without consuming more than their

    share of storage and compute resources. You also need to make sure that development

    projects do not impinge upon production work.

    In this exercise you will create independent volumes for each user and project, and then you will

    impose quotas on those volumes.

    Important!

Don't store data in the root volume (/).

    If all data is in the root volume, you lose the ability to specify location, quota, or HA properties

    for different types of data.

    As soon as you set up your cluster, start creating volumes to organize data on the cluster. As this

    lab will demonstrate, MapR recommends that you create at least the following volumes:

    1. Create a separate volume for each user.

    2. For active projects, create separate volumes for development work and production

    activity.

    Note: In order for a MapR cluster to function correctly, the user accounts and groups must be

    set up identically across all nodes.


    Lab 4.2 Overview

    The diagram below illustrates the key concepts of this exercise. In this case user01 and user02

    are in the Log Analysis Development group (loganalysis_dev). Each of these users has

    permission to read and write data to the project volume as well as their own user volume. The

cumulative storage used by these volumes rolls up to a group referred to as an Accounting Entity. Each user, volume, and Accounting Entity can have a separate disk quota for flexible

    management of cluster disk usage.


    Lab 4.2 Set-up

    1. Set up the users and groups on all your cluster nodes. Note: they must all have the

    same UID and GID on every node in the cluster. This is an opportunity to use the clush

    utility if you wish.

    # yum install clustershell

    For example, run groupadd on every node in your cluster.

    # groupadd -g 5000 loganalysis_dev

    Add individual users on every node.

# useradd -u 5001 -g loganalysis_dev user17

Or, with clush (substitute your own group and user names for the placeholders):

# clush -a "groupadd -g 8000 <groupname>"

# clush -a "useradd -u 8001 -g 8000 <username>"
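The commands above can also be generated from a small script, so that every node receives identical GIDs and UIDs. The sketch below only prints the commands; the GIDs and user-to-group assignments follow the lab's Team 5 example and are assumptions to adapt for your own team.

```shell
# Print idempotent group/user creation commands with fixed GIDs/UIDs so the
# same commands can be replayed on every node (e.g. via clush -a).
# GIDs and names below follow the lab's Team 5 example (an assumption).
emit_team() {
  gid=$1; grp=$2; shift 2
  echo "groupadd -g $gid $grp"
  uid=$((gid + 1))
  for usr in "$@"; do
    echo "useradd -u $uid -g $grp $usr"
    uid=$((uid + 1))
  done
}

emit_team 5000 loganalysis_dev  user17 user18
emit_team 5100 loganalysis_prod user19 user20
```

Review the printed commands, then distribute them across the cluster (for example, save them to a file and run it on every node with clush).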

2. Add the user in the MCS permissions popup.


Username/login ID    Group name              Team name/Cluster name

user01               webcrawl_dev            Team1
user02               webcrawl_dev            Team1
user03               webcrawl_prod           Team1
user04               webcrawl_prod           Team1
user05               frauddetect_dev         Team2
user06               frauddetect_dev         Team2
user07               frauddetect_prod        Team2
user08               frauddetect_prod        Team2
user09               recommendations_dev     Team3
user10               recommendations_dev     Team3
user11               recommendations_prod    Team3
user12               recommendations_prod    Team3
user13               twittersentiment_dev    Team4
user14               twittersentiment_dev    Team4
user15               twittersentiment_prod   Team4
user16               twittersentiment_prod   Team4
user17               loganalysis_dev         Team5
user18               loganalysis_dev         Team5
user19               loganalysis_prod        Team5
user20               loganalysis_prod        Team5


    Lab 4.2 Steps

    Examine the volumes already on the cluster

    1. Connect to the MCS for your cluster

2. Click Volumes under MapR-FS in the left navigation pane:

Notice how many volumes are listed. Do these include system volumes? Hint: notice whether or not the System check box is selected on the upper menu.

    Display only the non-system volumes by de-selecting the System check box on

    the upper menu.

    Locate the New Volume button that lets you create a new volume.

What other volume actions are allowed in the Volume Actions menu?

    Examine volume properties from the volumes list

    1. From the list of volumes, choose a volume to examine.

    Look across the columns to find whether the volume of interest contains data, and if

    so, what is the data size?

    What is the replication factor listed for the volume you are examining?

2. Find more details for this volume on the Volume Properties pane. Hint: Open the pane

    by clicking the highlighted name of the volume.

    What is the minimum replication factor for this volume?

    Does the volume have a quota?


    Practice Creating and Removing Volumes

    3. Click the New Volume button.

    4. Select Standard Volume for the Volume Type in the new pop up window.

    5. Enter a volume name using your name (or some other unique name) and designate

    volume number 1 (e.g. name-vol1 where name is your name) in the Volume Name

    field.

    6. Type the mount path /name-vol1 in the Mount Path field.

Note: MapR MCS will not create any parent directories above the mount point, so make them

    beforehand if necessary with the mkdir command.

    7. Verify /data is displayed in the Topology field (This is the default topology; we will

    discuss topology in the next lecture).

    8. Verify the default replication factor and minimum replication settings. Are they set to

    what was recommended in the Volumes lecture?

    9. At the bottom of the popup window, click OK to create the volume.

10. Verify that your new volume appears in the volumes list. Do you see the volumes

    created by the other students in the class?

(Note: If not, you will need to go to the volume name filter at the top and remove the

filter by clicking the minus sign.)

Repeat the process above to create a second user volume (e.g. name-vol2).

    Verify that your new volume appears in the volumes list.

    Once again remove the filter so that you can view the full list of non-system

    volumes.


    Remove one volume

    1. Decide which of your own volumes you want to remove and select it by clicking the

    check box by the volume name.

    2. Select Remove on the modify Volume menu. You will see this dialog box:

    Make your choice for what style of removal you want and click the Remove Volume

    button on lower right.

    Verify in the volumes list that one of your volumes has disappeared.

    Create a volume for each user

In this step, you will create a home volume for each project member, if applicable. On each user

    volume:

    Restrict the volume to the /data/rack2 topology, which prevents users from consuming

    storage resources on /data/rack1.

    Assign the Accounting Entity of the user volume to the appropriate group for that user.

    Assigning this Accounting Entity prevents the members of the group from collectively

    overshooting a storage quota for the project.

    Set quotas for the user volume.

    Note: user17 and loganalysis_dev are used as examples below. Be sure to substitute the

    appropriate user name and group when you create the volumes for your team members.

1. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes.

    2. In the Volumes tab click the New Volume button.

    3. Following the example below, enter the volume settings for each user volume in the

    New Standard Volume dialog box.

    Volume Setup section


    Volume Type: Standard Volume

    Volume Name: user17-homedir

Mount Path: /mapr/<cluster>/home/user17/vol

    Topology: /data/rack2

    User/Group (This specifies the Accounting Entity)

    Group loganalysis_dev

    Note: group must exist on all nodes in the cluster

Permissions: u:user17 fc

Usage Tracking

o Quotas (This specifies disk quota for the volume itself)

o Volume Advisory Quota: 100G

    o Volume Hard Quota: 128G

    4. Click OK.


    Command Line

    It is also possible to create a new volume at the command line. For example:

    maprcli volume create -path /home/user17/vol \

    -ae loganalysis_dev -aetype 1 -topology /data/rack2 \

    -quota 128G -advisoryquota 100G \

    -user user17:fc -name user17-homedir

Note: The maprcli volume create command requires specific ordering of

arguments. Make sure that the -name option comes last.

    You can change quotas later at the command line. For example:

    maprcli volume modify -quota 20G -advisoryquota 15G \

    -name user17-homedir
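Since all twenty user volumes follow the same pattern, the maprcli invocations can be generated in a loop. This sketch only prints the commands (it does not run maprcli); the flags mirror the user17 example above, and the user/group pairs shown are example assumptions to adapt for your team.

```shell
# Print one `maprcli volume create` command per user. Nothing is executed
# here; review the output, then pipe it to `sh` on a cluster node.
make_vol_cmd() {
  user=$1; group=$2
  printf 'maprcli volume create -path /home/%s/vol -ae %s -aetype 1 -topology /data/rack2 -quota 128G -advisoryquota 100G -user %s:fc -name %s-homedir\n' \
    "$user" "$group" "$user" "$user"
}

for u in user17 user18; do   # example users; substitute your team's list
  make_vol_cmd "$u" loganalysis_dev
done
```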

    5. Change ownership of the volume for the user. At a command line type:

chown user17 /mapr/<cluster>/home/user17/

Create a volume for your team project

    In this step, you will create a volume for your team project, if applicable. Bear in mind the

following criteria for your project volume:

    Restrict development volumes to the /data/rack2 topology, which prevents development

projects from consuming storage resources on /data/rack1.

    Production volumes should be allowed to span the entire cluster, so they will have a

    topology of /data

    Set group permissions on each volume:

For development volumes, members of both prod and dev groups get full control

For production volumes, only members of the prod group get full control

    Assign your group as the Accounting Entity

    Set quotas for the project volume:

Development volumes: the Advisory Quota is 9T and the Hard Quota is 10T

Production volumes: the Advisory Quota is 19T and the Hard Quota is 20T

    Note: loganalysis_dev is used in the examples below. Be sure to substitute the appropriate user

    name and group when you create the volumes for your project.


1. Create the top-level project directory under /mapr/<cluster>/home/, if it

doesn't exist. For example, at a command line type:

mkdir /mapr/<cluster>/home/<team name>/

    2. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes.

    3. Create the project volume. In the Volumes tab click the New Volume button.

    4. Following the example below, enter the volume settings for the project volume in the

    New Standard Volume dialog box.

    Volume Setup section

    Volume Type: Standard Volume

    Volume Name: loganalysis-dev

Mount Path: /mapr/<cluster>/home/loganalysis_dev/vol

Note: the example below is for a development group volume. If you are creating a

volume for a production group, then the topology would be /data

Topology: /data/rack2

    Permissions section

    Note: the example below is for a development group volume. If you are creating a

    volume for a production group, do not add permissions for the development group.

    g:loganalysis_dev fc

    g:loganalysis_prod fc

    Usage Tracking

    User/Group (This specifies the Accounting Entity)

    Group loganalysis_dev

    Quotas (This specifies disk quota for the volume itself)

    Note: the examples below are for a development group volume. If you are creating a

    volume for a production group the Advisory Quota is 19T and the Hard Quota is 20T.

    Volume Advisory Quota: 9T

    Volume Hard Quota: 10T

    5. Click OK.

    6. Change ownership and permissions of the project volume. At a command line type:


chgrp loganalysis_dev /mapr/<cluster>/home/loganalysis_dev/vol

chmod g+rwx /mapr/<cluster>/home/loganalysis_dev/vol

    Verify that the volumes are set up correctly

    1. In the MCS, in the Navigation pane under the MapR-FS group, click Volumes. The

    Volumes view appears, listing all volumes in the cluster.

    2. Confirm that all of the volumes you created are listed in the Volumes view. Other

    volumes that are part of the default cluster configuration may also appear here. You can

    use the Filter option to list, for example, only the volumes with a mount path matching

    /home*, as shown below.

    3. Navigate the volumes at the command line and verify that they have been mounted. For

    example:

ls -al /mapr/<cluster>/home/

ls -al /mapr/<cluster>/home/loganalysis_dev/vol

    You should see the volumes you just created in the previous steps mounted in these

    locations.


    Set disk usage quotas for your project Accounting Entity

    By setting a quota on an Accounting Entity, we can make sure that all volumes assigned to the

    Accounting Entity (including user volumes and project volumes) do not collectively overshoot a

    project maximum.

1. In the MCS, in the Navigation pane under the MapR-FS group, click User Disk Usage.

    The User Disk Usage panel displays all users and groups that have been assigned as an

    Accounting Entity (e.g. loganalysis_dev).

    2. Click on your project Accounting Entity. The Group Properties dialog box appears.

    3. Following the example below, enter the quota settings for your project Accounting

Entity in the Usage Tracking section of the Group Properties dialog box.

    For development projects:

    Turn on User/Group Advisory Quota. Enter 9T

    Turn on User/Group Hard Quota. Enter 10T

    For production projects:

    Turn on User/Group Advisory Quota. Enter 19T

    Turn on User/Group Hard Quota. Enter 20T


    Command Line

    It is also possible to set the Accounting Entity quotas at the command line. For example:

    maprcli entity modify -quota 10T -advisoryquota 9T \

    -name loganalysis_dev -type 1
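When comparing configured quotas against reported usage, it helps to normalize sizes to a single unit. This is an unofficial helper sketch; it only assumes the single-letter M/G/T suffixes used by the quota strings in this lab.

```shell
# Convert a quota string like 100G or 10T to megabytes.
# Assumes a single-letter suffix M, G, or T (as used in this lab).
quota_to_mb() {
  n=${1%?}           # numeric part (everything but the last character)
  unit=${1#"$n"}     # trailing unit letter
  case $unit in
    M) echo "$n" ;;
    G) echo $((n * 1024)) ;;
    T) echo $((n * 1024 * 1024)) ;;
    *) echo "unknown unit: $unit" >&2; return 1 ;;
  esac
}

quota_to_mb 100G   # advisory quota, in MB
quota_to_mb 10T    # hard quota, in MB
```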

    Conclusion

Before you begin adding data to your cluster or submitting jobs, make a decision about topology

(node/data placement) and implement this decision on your cluster.

    Create volumes early and often. It is much easier to manage cluster data at a volume level than

    managing all of the data on the cluster as one enormous data set. Imagine trying to manage

    petabytes of data!

Creating separate volumes provides flexibility of resource management by separating ownership

from accounting.

Do not use the / or /data/default-rack topology for data placement.


    Lesson 5: Data Ingestion, Access &

    Availability

    Labs Overview

    Lesson 5 labs cover the following topics:

    Accessing the cluster using NFS

    Snapshots

    Mirrors

    Multiple Clusters and Disaster Recovery

    5.1 Get Data into an NFS Cluster

    Topics and tasks in this first lab will help you to

    understand the significance of NFS in MapR

    learn how to get data into a cluster using NFS

    view and manipulate data directly on your cluster using standard Linux file commands via

    NFS

    Before you begin the lab steps, the cluster filesystem must be mounted on the data instance.

    Create Input Directory for Data

    Copy Data from Data Instance to Input Directory on Cluster

1. SSH to the data instance (the NFS node for this exercise, listed in your hosts file)

mkdir /mapr/<cluster>

2. Mount your cluster on the NFS client node

# mount -t nfs <NFS server>:/mapr /mapr/<cluster>
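Before copying data, it is worth confirming that the NFS mount actually succeeded. The snippet below demonstrates the check against a canned sample line so it is self-contained; the hostname and mount path in the sample are placeholders, and on the node you would pipe the real `mount` output into the same filter.

```shell
# Filter mount output for an NFS entry. `sample` stands in for real
# `mount` output here; on a cluster node run: mount | grep 'type nfs'
sample='nodeA:/mapr on /mapr/cluster1 type nfs (rw,nolock,hard)'
check_nfs() { grep -q 'type nfs'; }

echo "$sample" | check_nfs && echo "NFS mount found"
```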

    3. Copy the data from the /etc directory on the data instance to the input directory on

    your project volume that you created in the previous step

cp -v /etc/*.conf /mapr/<cluster>/home/loganalysis_dev/input

4. Verify that the data is now in the input directory on your cluster volume


ls /mapr/<cluster>/home/loganalysis_dev/input

    You should see a collection of files that end in .conf

    5. Verify that the data you moved from the data instance is now on the cluster in your

    project volume

ls /mapr/<cluster>/home/loganalysis_dev/input

    Run a MapReduce Job on Data

    1. Run a MapReduce job on the data

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \

wordcount /home/loganalysis_dev/input /home/loganalysis_dev/output1

    2. View the output of the MapReduce job

ls /mapr/<cluster>/home/loganalysis_dev/output1

    Modify Data and Run MapReduce again

1. From the /home/<team name>/input directory, use sed to add some files to your input data directory:

for i in `ls *`; do cp $i `echo $i | sed "s/.conf/AA.conf/g"`; done
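If you want to preview what this loop produces before touching cluster data, you can run it in a scratch directory. The version below also anchors and escapes the dot in the sed pattern (\.conf$), which is slightly safer than the lab's pattern; behavior on ordinary *.conf names is the same.

```shell
# Reproduce the duplication loop in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch alpha.conf beta.conf

for i in *.conf; do
  cp "$i" "$(echo "$i" | sed 's/\.conf$/AA.conf/')"
done

ls "$workdir"   # lists the two originals plus their AA copies
```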

    2. Re-run the same MapReduce job on the data sending the output to a new directory

hadoop jar /opt/mapr/hadoop/hadoop-0.20.2/hadoop-0.20.2-dev-examples.jar \

wordcount /home/<team name>/dev/input /home/<team name>/dev/output2

    Compare Results f rom Both MapReduce Jobs

    1. Compare the output from the MapReduce jobs

diff /mapr/my.cluster.com/home/<team name>/output1/part-r-00000 \

/mapr/my.cluster.com/home/<team name>/output2/part-r-00000

    You should see the change you made in the previous step


    Conclusion

    In this lab you experienced copying data from an external data source to the cluster storage via

    NFS. You were able to do so with standard Linux file commands that are familiar to system

    administrators. This process would have been much more technically challenging and taken a

    significantly longer time to perform without NFS.

Lab 5.2: Snapshots

Explore how snapshots work by creating snapshots at various points in time of a volume

    containing changing data to see that each snapshot shows data from a fixed point in time. Also

    see that snapshot creation is almost instantaneous and that the snapshot can preserve data that

    has since been deleted or changed. Learn to apply a schedule so that snapshots are

    automatically created at fixed intervals. Schedules also allow snapshots to expire at a time you

designate. This lab has 4 exercises:

    Create a snapshot in two ways

    Show how snapshots capture frozen views of past state

    Show snapshots preserve deleted data

    Create a snapshot schedule from the MCS

    Create snapshot in two ways

    This exercise will create a snapshot of a volume at a particular point in time, using two different

    methods for making a snapshot.

    Preparation: Before starting this exercise, you should have created a volume for your

experiments and mounted it. If you haven't already created such a volume, do so now using the

    MCS. Make sure that your volume is different from the volumes other students are using for

    this exercise to minimize confusion about who is doing what.

Note: The diff and sed commands you used above are standard Linux commands. Because the cluster

    filesystem is mounted via NFS, any standard Linux programs that operate on text files (sed,

    awk, grep, etc.) can be used with data on your cluster. This would not be possible without

NFS. You would need to copy the file out of the cluster first before performing your task, and then copy the resultant file back into the cluster.


    Put some sample data into your volume

    1. Use ssh to log in to a node in your cluster. Use your own user id here.

    $ ssh mapr@classnode-cluster

    2. Change directory to your personal volume.

$ cd /mapr/<cluster>/snapshot_lab_mnt_user01

    3. Create a data file called STATIC in your personal user-volume containing whatever

    data you choose.

    $ cat /etc/hosts > STATIC

    Create a volume snapshot of your volume using MCS

    Use the MCS to create a snapshot, as shown here:

    Select New Snapshot from the pull down menu under Modify Volume on top bar, provide a

    name for your snapshot, and click OK to create a snapshot of the selected volume, in this case,

    snapshot_lab_vol_user01. This will create a snapshot of the volume you have selected.


Create new data files in your volume by running a shell script

Run the following commands:

    $ cd /mapr//snapshot_lab_mnt_user01

    $ while true; do

    touch file-$(date +%T)

    date >> log; sleep 13

    done &

    This creates a new file every 13 seconds as this script runs in the background. The file name of

each file will contain the time the file is created. The last command will also log the time each file is created. This log file will look something like this:

Thu Dec 13 17:15:44 PST 2012

Thu Dec 13 17:15:57 PST 2012

Thu Dec 13 17:16:10 PST 2012

Thu Dec 13 17:16:23 PST 2012

    The files created will look something like this:

    $ ls

file-17:15:44 file-17:16:23 log

file-17:15:57 file-17:16:10 STATIC

    $

    Create a new snapshot, wait about 30 seconds, then create another snapshot

Record the time at which you created each snapshot by putting a line into the log file:

$ maprcli volume snapshot create -volume snapshot_lab_vol_user01 \

-snapshotname snapshot3_user01; echo "snapped $(date)" >> log


    Explore the snapshot directory from CLI

    1. Change directory into the mount point of the volume you created the snapshots for

    earlier

    2. List all files and directories there using "ls -a". Note that you won't see the .snapshot

    directory because it is hidden. You can see the contents of the .snapshot directory if

you explicitly give its name, but you won't see it otherwise.

Even though you don't see the .snapshot directory using ls in the volume mount point, it is still

    there and you can look inside. Do this:

$ ls -alh .snapshot

    total 2.5K

    drwxr-xr-x. 5 root root 3 Jul 16 12:58 .

    drwxr-xr-x. 2 root root 2 Jul 16 12:57 ..

    drwxr-xr-x. 2 root root 1 Jul 16 12:24 snapshot2_user01

    drwxr-xr-x. 2 root root 2 Jul 16 12:57 snapshot3_user01

drwxr-xr-x. 2 root root 1 Jul 16 12:24 SNP_of_lab_vol_user01_---2013-07-16.12-31-44

    You should see the snapshots that you created earlier.

Note: You can also see a list of snapshots in the MCS along with details like when they were created and when they will expire. You will not, however, be able to see the contents of the

    snapshots from the MCS.

    1. List the contents of each snapshot. You should see that more files appear in each

    subsequent snapshot, like this:

    $ ls .snapshot/*

.snapshot/snapshot1:

STATIC

.snapshot/snapshot2:

file-08:39:16 file-08:39:55 file-08:40:34 file-08:41:13

file-08:39:29 file-08:40:08 file-08:40:47 log
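A quick way to watch a volume's history grow is to count the entries in each snapshot. The loop below is demonstrated against a scratch directory tree so it is runnable anywhere; the snapshot and file names in the scratch tree are illustrative. On the cluster, cd to the volume mount point and run just the for-loop.

```shell
# Build a tiny stand-in for a volume's .snapshot tree, then count entries
# per snapshot. On a real volume, only the for-loop is needed.
demo=$(mktemp -d)
mkdir -p "$demo/.snapshot/snapshot1" "$demo/.snapshot/snapshot2"
touch "$demo/.snapshot/snapshot1/STATIC"
touch "$demo/.snapshot/snapshot2/STATIC" "$demo/.snapshot/snapshot2/log"
cd "$demo" || exit 1

for s in .snapshot/*/; do
  printf '%s %s\n' "$s" "$(ls "$s" | wc -l | tr -d ' ')"
done
```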


Schedule snapshots from the MCS

    You will need a schedule for this next part of the lab. Schedules are independent of volumes,

    snapshots or mirrors. A schedule simply expresses a policy in terms of frequency and retention

    times.

    Create a custom schedule

    Using the MCS to create a schedule:

    1. Click Schedules under MapR-FS in the navigation pane

    2. Click the New Schedule button

3. Give the schedule a name (every_5_minutes) and a rule (say, every 5 minutes, and

retain (expire) after 45 minutes)

4. Click the Save Schedule button

    Note: the schedule is not currently applied to any volumes

    Apply the schedule

Now you should use the MCS to apply the custom schedule as a snapshot schedule for one of

your volumes:

    1. Click Volumes under MapR-FS in the Navigation pane

    2. Click the name of one of your volumes

    3. Scroll down to the Snapshot Scheduling section


    Lab 5.3: Mirrors and schedules

    Gain experience with making mirrors manually via the MCS and CLI. Also learn to apply a

    schedule to update data from the source volume. This lab has three parts:

    Create a mirror from the MapR Control System (MCS)

    Apply a schedule to the mirror

    Create a mirror from CLI and initiate a mirror sync

    Create a mirror from the MCS

    Create a local mirror based on a source volume

    1. Connect to the cluster MCS

    2. Choose one of the ways to set up a mirror volume, for instance, choose volumes from

    the left bar to display volumes of choice for the source volume. If possible pick a volume

containing data for the source volume so you will be able to verify that it is copied to the

    new mirror volume.

3. Select New Volume from the top menu and fill in the template to make a

local mirror (mounting is optional)


    Now you have created the mirror volume, but no data has been copied to it.

4. Verify your new mirror volume exists by selecting Mirror Volumes on the left bar menu to

    display names of all mirrors.

    Copy data to your new mirror volume

    1. Use the MCS to start mirroring by selecting this option from the Modify Volume

    button drop down menu.

    2. Verify that data are copied to your mirror volume by watching the display of mirror

    volumes. If there is a lot of data, you will see an indication that the copying is in

    progress:


Apply a schedule to the mirror

    1. Use the MCS to apply a schedule to update your mirror volume.

Create a mirror from CLI and initiate a mirror sync

Use the CLI to create a new local mirror volume of a different source volume.

1. Use the CLI to manually create a mirror volume.

CLI example:

Determine the schedule IDs available:

# maprcli schedule list

# maprcli volume create -name <mirror_vol> \

-source <source_vol>@<clusterName> -type 1 -schedule <schedule_id>

    2. Use CLI to initiate a mirror sync

# maprcli volume mirror start -name <mirror_vol>

OR

# maprcli volume mirror push -name <source_vol>


    Set Up

1. Verify all nodes in the source cluster have a unique name for the source cluster (line 1)

and configure all nodes to be aware of the destination cluster (line 2)

    2. SSH to the node you are configuring on the source cluster

3. Verify in /opt/mapr/conf/mapr-clusters.conf that the cluster name (e.g. Team1) is there

4. Add a second line in /opt/mapr/conf/mapr-clusters.conf for the remote

cluster in the format:

cluster2 <CLDB node 1>:7222 <CLDB node 2>:7222

cluster2 is the name of the destination cluster

<CLDB node 1> and <CLDB node 2> are the CLDB nodes in the destination cluster

5. Restart the Warden on all your node(s)

    service mapr-warden restart

Note: there is a small bug (in 3.0.2) that requires you to add more than one remote cluster to

mapr-clusters.conf for the remote cluster to be visible in the GUI
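Because this file is edited by hand on every node, a typo is easy to make, so a quick format check helps. This sketch validates the "clustername host:7222 ..." line shape against a temporary sample file; the node names are placeholders, and it checks only the line format, not whether the CLDB nodes are reachable.

```shell
# Write an illustrative mapr-clusters.conf and verify each line has a
# cluster name plus at least one host:7222 entry.
conf=$(mktemp)
cat > "$conf" <<'EOF'
cluster1 nodeA:7222 nodeB:7222
cluster2 nodeC:7222 nodeD:7222
EOF

check_clusters_conf() {
  awk 'NF < 2 || $2 !~ /:7222$/ { print "bad line " NR; bad = 1 }
       END { exit bad }' "$1"
}

check_clusters_conf "$conf" && echo "format OK"
```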

    Configure all nodes in the destination cluster

    Configure all nodes in the destination cluster with a unique name for the destination cluster

    and configure all nodes to be aware of the source cluster

    1. SSH to the nodes you are configuring on the destination cluster

2. Edit /opt/mapr/conf/mapr-clusters.conf and verify your cluster name

3. Add a second line in /opt/mapr/conf/mapr-clusters.conf for the source

cluster in the format:

Teamname2 <CLDB node 1>:7222 <CLDB node 2>:7222

Teamname2 is the name of the source cluster

<CLDB node 1> and <CLDB node 2> are the CLDB nodes in the source cluster

    4. Restart the Warden on your nodes

    service mapr-warden restart

    Note: the remainder of the steps should be completed by each team


    Verify that each cluster has a unique name

    Verify that each cluster has a unique name and is aware of the other cluster

    1. Log on to the MCS of the source cluster

    2. Verify that the cluster name is cluster1

    3. Click the + symbol next to the cluster1

    4. Verify that cluster2 is listed under Available Clusters

    5. Log on to the MCS of the destination cluster

    6. Verify that the cluster name is cluster2

    7. Click the + symbol next to the cluster2

    8. Verify that cluster1 is listed under Available Clusters


    Create a remote mirror volume on the destination cluster

    You should be logged into the MCS on the destination cluster

    1. Select Volumes in the Navigation pane

    2. Click the New Volume button

    o Volume Type: Remote Mirror Volumeo Enter a unique name for your mirror volumeo Enter the name of the source volumeo Source Cluster Name: cluster1o Enter a unique mount path for the mirror volume

    The parent directory must already exist

    o Ensure that the Mounted checkbox is checked
    o Topology: /data

    3. Click the OK button

    You should see confirmation at the top of the MCS indicating that the mirror volume was created


    Initiate mirroring to the destination cluster

    1. If not already selected, click Volumes in the Navigation pane

    2. Select the mirror volume you created in Step 4

    3. Click Volume Actions

    4. Select Start Mirroring
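    The same operation can also be started from the command line with maprcli; a sketch, where the volume name is a placeholder for the mirror volume you created:

    ```
    # run on a destination-cluster node; substitute your mirror volume's name
    maprcli volume mirror start -name mirror-volume-name
    ```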


    Verify data from source cluster was copied to destination cluster

    1. SSH to any node on the destination cluster

    2. List the contents of the destination mirror volume

    hadoop fs -ls /

    or, if the cluster filesystem is mounted via NFS

    ls /mapr/Team2/

    You should see the exact same contents in the mirror volume as you do in the original source

    volume

    Conclusion

    In this lab you learned how to copy data from one cluster to another using remote mirroring. As you learned earlier in this course, MapR volumes allow you a greater degree of control over how to manage data in the cluster. Mirroring the volumes that contain your business-critical data to a remote cluster can significantly reduce the amount of key data you would lose and the time it would take to resume productivity in the event of a disaster.

    Lab 5.5: Using the HBase shell

    The objective of this lab is to get you started with HBase shell and perform operations to create

    a table, put data into the table, retrieve data from the table and delete data from the table.

    Start HBase shell

    1. Get a help listing which demonstrates some basic commands.

    a. Get help specifically on the "put" command.

    2. Create a table called 'Blog' with the following schema: blog title, blog topic, author first

    name, author last name. The blog title and topic must be grouped together as they will

    be saved together and retrieved together. Author first and last name must also be

    grouped together.

    3. List the new table you created in its directory, to confirm it was created.

    4. Insert the following data to the 'Blog' table.


    Where Title and Topic are in column family info and First and Last are in column family author

    ID  Title                                      Topic      First     Last
    1   MapR M7 is Now Available on Amazon EMR     cloud      Diana     Truman
    2   Enterprise Grade Solutions for HBase       highavail  Roopesh   Nair
    3   A Comparison of NoSQL Database Platforms   nosql      Jonathan  Morgan

    5. Count the number of rows. Make sure every row is printed to the screen as it is counted.

    6. Retrieve the entire record with ID '2'.

    7. Retrieve only the title and topic for record with ID '3'.

    8. Change the last name of the author with title "A Comparison of NoSQL Database

    Platforms".

    Display the record to verify the change.

    Display both the new and old value. Can you explain why both values are there?

    9. Display all the records.

    10.Display the title and last name of all the records.

    11.Display the title and topic of the first two records.

    12.Delete the record with title "Enterprise Grade Solutions for HBase".

    Verify that the record was deleted by scanning all records, or

    Try to select just that record.

    13.Drop the table 'Blog'.


    Create a Table using MapR Control System (MCS)

    1. Connect to MCS from a browser using the notes from the instructor. Log in with your

    account.

    2. Create a table called 'Blogtest' with the following schema: blog title, blog topic, author first name, author last name. The blog title and topic must be grouped together as they

    will be saved together and retrieved together. Author first and last name must also be

    grouped together.

    3. List the new table you created in its directory, to confirm it was created.

    4. Insert some data and test. Also change the number of versions of cells you can keep

    and test.

    MapR Tables - Solutions

    1. Use cases fit for MapR tables:

    A data store comprised of petabytes of semi-structured data.

    A data store that will be accessed by large numbers of client requests, for example

    thousands of reads per second.

    2. Use cases not fit for MapR tables:

    Access normalized relational data with SQL

    Full text search

    3. Columns may be created when data is inserted; they don't have to be defined up front.

    MapR can scale up to very large numbers of columns per column family. However, table

    name and column family have to be defined before data is inserted.
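    To see this behavior for yourself (a sketch; the table path and the 'info:subtitle' column qualifier here are just examples), you can put to a column qualifier that was never declared, as long as its column family exists:

    ```
    hbase> put '/user/user01/Blog','1','info:subtitle','columns are created on write'
    hbase> get '/user/user01/Blog','1',{COLUMNS=>'info:subtitle'}
    ```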

    4. In addition to using the list command in the HBase shell, you can use standard Linux

    ls to list all tables (and files) stored in a particular directory.


    HBase Shell Solution

    You can find the commands below in the file Lab1_hbase_shell_commands.txt.

    1. Start an HBase shell in your command window

    user02@ip-10-196-89-226:~$ hbase shell

    2. HBase help commands

    hbase> help

    hbase> help "put"

    3. Create a table /user/user01/Blog with column families info and author

    hbase> create '/user/user01/Blog', {NAME=>'info'}, {NAME=>'author'}

    Since it was required that title and topic be grouped they will be stored as columns that

    belong to the 'info' column family while 'first' and 'last' will belong to the 'author'

    column family.

    4. List the table:

    hbase> list '/user/user01/'

    5. Execute the following put statements to insert the records into Blog table:

    hbase> put '/user/user01/Blog','1','info:title','MapR M7 is Now Available on Amazon EMR'

    hbase>put '/user/user01/Blog','1','info:topic','cloud'

    hbase>put '/user/user01/Blog','1','author:first','Diana'

    hbase>put '/user/user01/Blog','1','author:last','Truman'

    hbase>put '/user/user01/Blog','2','info:title','Enterprise Grade Solutions for HBase'

    hbase>put '/user/user01/Blog','2','info:topic','highavail'

    hbase>put '/user/user01/Blog','2','author:first','Roopesh'

    hbase>put '/user/user01/Blog','2','author:last','Nair'

    hbase> put '/user/user01/Blog','3','info:title','A Comparison of NoSQL Database Platforms'

    hbase>put '/user/user01/Blog','3','info:topic','nosql'


    hbase> put '/user/user01/Blog','3','author:first','Jonathan'

    hbase>put '/user/user01/Blog','3','author:last','Morgan'

    6. Count the number of rows of data inserted

    hbase> count '/user/user01/Blog',INTERVAL=>1

    7. Retrieve the entire record with ID 2

    hbase> get '/user/user01/Blog','2'

    8. Retrieve only the title and topic for record with ID '3'.

    hbase> get '/user/user01/Blog','3',{COLUMNS=>['info:title','info:topic']}

    9. The record with title "A Comparison of NoSQL Database Platforms" has ID 3. To update

    its value execute a put operation with that ID.

    hbase>put '/user/user01/Blog', '3','author:last','Smith'

    To verify the put worked, select the record:

    hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last'}

    To display both versions, specify the number of versions in a get operation:

    hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>3}

    The reason we see the old value is that cells keep up to three versions by default in MapR

    tables.
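    If you want more (or fewer) versions retained, the column family can be altered. A sketch, assuming the same table path as above and an illustrative limit of 5:

    ```
    hbase> alter '/user/user01/Blog', {NAME=>'author', VERSIONS=>5}
    hbase> get '/user/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>5}
    ```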

    10.Display all the records.

    hbase> scan '/user/user01/Blog'


    11.Display the title and last name of all the records.

    hbase> scan '/user/user01/Blog',{COLUMNS=>['info:title','author:last']}

    12.Display the title and topic of the first two records.

    hbase> scan '/user/user01/Blog',{COLUMNS=>['info:title','info:topic'],LIMIT=>2}

    13.The record with title "Enterprise Grade Solutions for HBase" has record ID '2'; delete all

    columns for record with ID '2':

    hbase> delete '/user/user01/Blog','2','info:title'

    hbase> delete '/user/user01/Blog','2','info:topic'

    hbase> delete '/user/user01/Blog','2','author:first'

    hbase> delete '/user/user01/Blog','2','author:last'

    14.To delete a table in HBase shell, the table must first be disabled, and then you can drop

    it.

    hbase> disable '/user/user01/Blog'

    hbase> drop '/user/user01/Blog'

    Troubleshooting

    NameError: undefined local variable or method `interval' for #

    Happens for hbase> count '/user/user02/Blog', interval=>1

    Use uppercase INTERVAL example: hbase> count '/user/user02/Blog', INTERVAL=>1


    HBase shell commands (optional)

    The objective of this optional lab is to run scripts from the HBase shell. These commands

    can be run individually in an HBase shell, or they can be pasted into a script and run. Example:

    hbase> source "hbase_script.txt"

    1. Open a vi session and insert the following into your script.

    2. Adjust all references to home directory to the appropriate directory

    3. Name your script

    4. Run your script

    Additional commands to experiment with

    # NOTE: You can copy-paste multiple lines at a time
    # into HBase shell. Or, you can source a script.
    # Example: hbase> source "hbase_script.txt"

    # Background information on HBase Shell at:
    # http://wiki.apache.org/hadoop/Hbase/Shell
    ##########################################################
    # Solution to Lab 1
    # NOTE: Change the table paths to your own user directory
    # so your actions don't conflict with other students.
    # Example: create '/home/user12/atable', {NAME=>'cf1'}
    ##########################################################

    help
    help "put"

    create '/home/user01/Blog', {NAME=>'info'}, {NAME=>'author'}
    list '/home/user01/'

    put '/home/user01/Blog','1','info:title','MapR M7 is Now Available on Amazon EMR'
    put '/home/user01/Blog','1','info:topic','cloud'
    put '/home/user01/Blog','1','author:first','Diana'
    put '/home/user01/Blog','1','author:last','Truman'

    put '/home/user01/Blog','2','info:title','Enterprise Grade Solutions for HBase'
    put '/home/user01/Blog','2','info:topic','highavail'
    put '/home/user01/Blog','2','author:first','Roopesh'
    put '/home/user01/Blog','2','author:last','Nair'
    put '/home/user01/Blog','3','info:title','A Comparison of NoSQL Database Platforms'
    put '/home/user01/Blog','3','info:topic','nosql'
    put '/home/user01/Blog','3','author:first','Jonathan'


    put '/home/user01/Blog','3','author:last','Morgan'

    count '/home/user01/Blog', INTERVAL=>1

    get '/home/user01/Blog','2'
    get '/home/user01/Blog','3',{COLUMNS=>['info:title','info:topic']}

    put '/home/user01/Blog','3','author:last','Smith'

    get '/home/user01/Blog','3',{COLUMNS=>'author:last'}

    get '/home/user01/Blog','3',{COLUMNS=>'author:last', VERSIONS=>3}

    scan '/home/user01/Blog'
    scan '/home/user01/Blog',{COLUMNS=>['info:title','author:last']}
    scan '/home/user01/Blog',{COLUMNS=>['info:title','info:topic'],LIMIT=>2}

    delete '/home/user01/Blog','2','info:title'
    delete '/home/user01/Blog','2','info:topic'
    delete '/home/user01/Blog','2','author:first'
    delete '/home/user01/Blog','2','author:last'

    #disable '/home/user01/Blog'
    #drop '/home/user01/Blog'

    ##########################################################
    # Additional commands to experiment with
    # NOTE: You can copy-paste multiple lines at a time
    # into HBase shell. Or, you can source a script.
    # Example: hbase> source "hbase_script.txt"
    ##########################################################

    # add content column-family to table
    alter '/home/user01/Blog', {NAME=>'content'}

    # insert row 1
    put '/home/user01/Blog','Diana-001','info:title','MapR M7 is Now Available on Amazon EMR'
    put '/home/user01/Blog','Diana-001','info:author','Diana'
    put '/home/user01/Blog','Diana-001','info:date','2013.05.06'
    put '/home/user01/Blog','Diana-001','content:post','Lorem ipsum dolor sit amet, consectetur adipisicing elit'

    # insert row 2
    put '/home/user01/Blog','Diana-002','info:title','Implementing Timeouts with FutureTask'
    put '/home/user01/Blog','Diana-002','info:author','Diana'
    put '/home/user01/Blog','Diana-002','info:date','2011.02.14'
    put '/home/user01/Blog','Diana-002','content:post','Sed ut perspiciatis unde omnis iste natus error sit'

    # insert row 3


    put '/home/user01/Blog','Roopesh-003','info:title','Enterprise Grade Solutions for HBase'
    put '/home/user01/Blog','Roopesh-003','info:author','Roopesh'
    put '/home/user01/Blog','Roopesh-003','info:date','2012.10.20'
    put '/home/user01/Blog','Roopesh-003','content:post','At vero eos et accusamus et iusto odio dignissimos ducimus'

    # insert row 4
    put '/home/user01/Blog','Jonathan-004','info:title','A Comparison of NoSQL Database Platforms'
    put '/home/user01/Blog','Jonathan-004','info:author','Jonathan'
    put '/home/user01/Blog','Jonathan-004','info:date','2013.01.08'
    put '/home/user01/Blog','Jonathan-004','content:post','Duis aute irure dolor in reprehenderit in voluptate velit'

    # insert row 5
    put '/home/user01/Blog','Sylvia-005','info:title','NetBeans IDE 7.3.1 Introduces Java EE 7 Support'
    put '/home/user01/Blog','Sylvia-005','info:author','Sylvia'
    put '/home/user01/Blog','Sylvia-005','info:date','2012.07.20'
    put '/home/user01/Blog','Sylvia-005','content:post','Excepteur sint occaecat cupidatat non proident, sunt in culpa'

    # count the data you inserted above, INTERVAL specifies how often counts are displayed
    count '/home/user01/Blog', {INTERVAL=>2}
    count '/home/user01/Blog', {INTERVAL=>1}

    # this get won't return anything as the rowkey doesn't exist
    get '/home/user01/Blog','unknownRowKey'

    # retrieve ALL columns for the provided rowkey
    get '/home/user01/Blog','Jonathan-004'
    # retrieve specific columns for the provided rowkey
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>['info:author','content:post']}

    # retrieve data for specific columns and time-stamp
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>['info:author','content:post'], TIMESTAMP=>1326061625690}

    # exercise different scan options
    scan '/home/user01/Blog'
    scan '/home/user01/Blog', {STOPROW=>'Sylvia'}

    scan '/home/user01/Blog', {COLUMNS=>'info:title', STARTROW=>'Sylvia', STOPROW=>'Jonathan'}

    # update the record a few times and then retrieve back multiple versions
    # only 3 versions are kept by default
    put '/home/user01/Blog','Jonathan-004','info:date','2012.01.09'
    put '/home/user01/Blog','Jonathan-004','info:date','2012.01.10'
    put '/home/user01/Blog','Jonathan-004','info:date','2012.01.11'


    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>2}
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>1}

    # selects 1 by default
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date'}

    # delete a record, deletes all versions of the cell
    get '/home/user01/Blog','Roopesh-003','info:date'
    delete '/home/user01/Blog','Roopesh-003','info:date'
    get '/home/user01/Blog','Roopesh-003','info:date'

    # delete the versions before the provided timestamp
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}
    delete '/home/user01/Blog','Jonathan-004','info:date', 1326254739791
    get '/home/user01/Blog','Jonathan-004',{COLUMN=>'info:date', VERSIONS=>3}

    # drop the table
    list '/home/user01/'
    disable '/home/user01/Blog'
    drop '/home/user01/Blog'
    list '/home/user01/'

    Using importtsv and copytable

    The objective of this lab is to get you started with HBase shell and perform operations to create a table, import flat tab-separated data into the table, retrieve data from the table and delete data from the table.

    View Existing Table Using MCS

    1. Log onto the MCS.

    2. Select MapR-FS => MapR Tables

    3. Click on /user/mapr/ under Recently opened tables

    If /user/mapr/ is not displayed under Recently opened tables, enter

    /user/mapr/ in the Go to table field and click the Go button

    4. Look at the information available in the Regions tab

    Each row represents one region of data

    The columns (Start Key, End Key, Physical Size, Logical size, etc.) represent

    meaningful data about the table regions


    Note: Highlighted area is syntax for 3.1 permissions

    Notice the first column is defined as HBASE_ROW_KEY; this will take the first field of data

    (namely the numerical index field) and make it the row key.

    Important: also notice that the command above identifies each column in the data file as well as the column family it belongs in. The column families used in the example below are cf1, cf2 and cf3. If

    the table you are importing into has a different column family name, then you will need to

    modify the command below to match the correct column family name.
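    A typical ImportTsv invocation has this shape (a sketch; the table path, the column names under cf1/cf2/cf3, and the input path are placeholders to adapt to your lab data):

    ```
    # first field becomes the row key; remaining fields map to family:qualifier
    hbase org.apache.hadoop.hbase.mapreduce.ImportTsv \
      -Dimporttsv.columns=HBASE_ROW_KEY,cf1:col1,cf2:col2,cf3:col3 \
      /user/mapr/mytable /path/to/input.tsv
    ```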

    6. While the import job is processing, look at the MCS to view changes to the table and

    puts being processed on the node:

    Click Nodes under Cluster

    Click the Overview dropdown and change the value to Performance

    If necessary, scroll to the right so you can see the Gets, Puts and Scans columns.

    You should see a large number of puts across several nodes while your import is

    processing

    Click MapR Tables under MapR-FS

    Click the name of the table you used for the import under Recently opened tables

    Select the Regions tab

    You should see that your table automatically split into a number of regions during

    the import

    7. In an hbase shell examine the data that has been imported

    [root@CentOS001 data2]# hbase shell

    HBase Shell; enter 'help' for list of supported commands. Type "exit" to leave the HBase Shell

    Ve