Hadoop Workshop using Cloudera on Amazon EC2
![Page 1: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/1.jpg)
Danairat T., 2013, [email protected] Data Hadoop – Hands On Workshop 1
Hadoop Workshop using Cloudera on Amazon EC2
May 2015
Dr. Thanachart Numnonda, IMC Institute
Modified from the original version by Danairat T., Certified Java Programmer, TOGAF – Silver
![Page 2: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/2.jpg)
Danairat T., , [email protected]: Thanachart Numnonda, [email protected] May 2015Hadoop Workshop using Cloudera on Amazon EC2
Hands-On: Launch a virtual server on EC2 Amazon Web Services
![Page 3: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/3.jpg)
![Page 4: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/4.jpg)
Virtual Server
This lab uses an EC2 virtual server to install a Hadoop server with the following configuration:
1. Ubuntu Server 14.04 LTS
2. m3.xlarge: 4 vCPUs, 15 GB memory, 80 GB SSD
3. Security group: create new
4. Keypair: imchadoop
![Page 5: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/5.jpg)
Select the EC2 service and click Launch Instance
![Page 6: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/6.jpg)
Select an Amazon Machine Image (AMI): choose Ubuntu Server 14.04 LTS (PV)
![Page 7: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/7.jpg)
Choose the m3.xlarge instance type
![Page 8: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/8.jpg)
Leave configuration details as default
![Page 9: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/9.jpg)
Add Storage: 30 GB
![Page 10: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/10.jpg)
Name the instance
![Page 11: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/11.jpg)
Select Create a new security group > Add Rule as follows
![Page 12: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/12.jpg)
Click Launch and choose imchadoop as a key pair
![Page 13: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/13.jpg)
Review the instance, then click Connect for instructions on connecting to it
![Page 14: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/14.jpg)
Connect to an instance from Mac/Linux
![Page 15: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/15.jpg)
Connect to an instance from Windows using PuTTY
![Page 16: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/16.jpg)
Connect to the instance
![Page 17: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/17.jpg)
Hands-On: Installing Cloudera on EC2
![Page 18: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/18.jpg)
Download Cloudera Manager
1) Type command > wget http://archive.cloudera.com/cm5/installer/latest/cloudera-manager-installer.bin
2) Type command > chmod u+x cloudera-manager-installer.bin
3) Type command > sudo ./cloudera-manager-installer.bin
![Page 19: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/19.jpg)
![Page 20: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/20.jpg)
![Page 21: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/21.jpg)
Login to Cloudera Manager
Wait several minutes for the Cloudera Manager Server to complete its startup.
Then open a web browser at http://<public-ip>:7180
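
The wait for the server can also be scripted instead of reloading the browser by hand. The following is only a sketch under stated assumptions: the class `PortWaiter` and its methods are hypothetical helpers, not part of Cloudera Manager, and the only value taken from the slide is port 7180. It repeatedly tries to open a TCP connection to the port until one succeeds or the attempts run out.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper (not part of Cloudera Manager): polls a TCP port until it accepts connections.
final class PortWaiter {

    // Returns true if a TCP connection to host:port succeeds within timeoutMillis.
    static boolean isPortOpen(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false; // refused, timed out, or host not reachable
        }
    }

    // Tries up to maxAttempts times, sleeping pauseMillis between failed attempts.
    static boolean waitForPort(String host, int port, int maxAttempts, long pauseMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (isPortOpen(host, port, 2000)) {
                return true;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and give up
                    return false;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        if (args.length == 0) {
            System.out.println("usage: java PortWaiter <public-ip>");
            return;
        }
        // 7180 is the Cloudera Manager port named on the slide; 30 tries x 10 s is an arbitrary choice.
        boolean up = waitForPort(args[0], 7180, 30, 10_000L);
        System.out.println(up ? "Open http://" + args[0] + ":7180 in a browser"
                              : "Port 7180 is still closed on " + args[0]);
    }
}
```

An open port only shows that something is listening; the login page may still take a moment to appear, and the instance's security group must allow inbound traffic on that port.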
![Page 22: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/22.jpg)
Select Cloudera Express Edition
![Page 23: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/23.jpg)
![Page 24: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/24.jpg)
Provide the <public ip> address of each instance in the cluster
![Page 25: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/25.jpg)
![Page 26: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/26.jpg)
![Page 27: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/27.jpg)
![Page 28: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/28.jpg)
![Page 29: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/29.jpg)
Browse to the private key file (imchadoop.pem) that was downloaded in the previous part. Leave the passphrase blank.
![Page 30: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/30.jpg)
![Page 31: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/31.jpg)
If you see the above error, do not worry: it is a known issue. You can find the list of known issues at Cloudera Issue List.
Click the “Back” button until you reach the home screen, then click the “Continue” button.
![Page 32: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/32.jpg)
![Page 33: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/33.jpg)
If you see the above error, do not worry: it is a known issue. You can find the list of known issues at Cloudera Issue List.
Click the “Back” button until you reach the home screen, then click the “Continue” button.
![Page 34: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/34.jpg)
![Page 35: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/35.jpg)
You will now find a “Currently Managed Hosts” tab listing the hosts with their private DNS names and private IP addresses. Select all of them and click “Continue”.
![Page 36: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/36.jpg)
![Page 37: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/37.jpg)
![Page 38: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/38.jpg)
![Page 39: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/39.jpg)
![Page 40: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/40.jpg)
![Page 41: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/41.jpg)
![Page 42: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/42.jpg)
![Page 43: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/43.jpg)
![Page 44: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/44.jpg)
![Page 45: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/45.jpg)
Finish
![Page 46: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/46.jpg)
![Page 47: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/47.jpg)
Running Hue
![Page 48: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/48.jpg)
Running Hue
![Page 49: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/49.jpg)
Sign in to Hue
![Page 50: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/50.jpg)
Starting Hue on Cloudera
![Page 51: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/51.jpg)
![Page 52: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/52.jpg)
Viewing HDFS
![Page 53: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/53.jpg)
![Page 54: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/54.jpg)
Hands-On: Importing/Exporting Data to HDFS
![Page 55: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/55.jpg)
Importing Data to Hadoop
Download War and Peace Full Text
www.gutenberg.org/ebooks/2600
$ hadoop fs -mkdir input
$ hadoop fs -mkdir output
$ hadoop fs -copyFromLocal Downloads/pg2600.txt input
![Page 56: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/56.jpg)
Review file in Hadoop HDFS
List HDFS File
Read HDFS File
[hdadmin@localhost bin]$ hadoop fs -cat input/pg2600.txt
Retrieve HDFS File to Local File System
[hdadmin@localhost bin]$ hadoop fs -copyToLocal input/pg2600.txt tmp/file.txt
Please see also http://hadoop.apache.org/docs/r1.0.4/commands_manual.html
![Page 57: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/57.jpg)
Review file in Hadoop HDFS using File Browser
![Page 58: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/58.jpg)
Review file in Hadoop HDFS using Hue
![Page 59: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/59.jpg)
Hadoop Port Numbers

| Group | Daemon | Default Port | Configuration Parameter in conf/*-site.xml |
|---|---|---|---|
| HDFS | Namenode | 50070 | dfs.http.address |
| HDFS | Datanodes | 50075 | dfs.datanode.http.address |
| HDFS | Secondarynamenode | 50090 | dfs.secondary.http.address |
| MR | JobTracker | 50030 | mapred.job.tracker.http.address |
| MR | Tasktrackers | 50060 | mapred.task.tracker.http.address |
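
As a small illustration of how these defaults are used, the sketch below builds the web UI address of each daemon from a host name and the ports listed above. The class name `HadoopWebUis` and the placeholder host are hypothetical, and a managed cluster may override any of these ports in its configuration, so treat the values as defaults only.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: builds daemon web UI addresses from the default ports listed above.
final class HadoopWebUis {

    // Default HTTP ports as listed on the slide; a cluster may override them in conf/*-site.xml.
    static final Map<String, Integer> DEFAULT_PORTS = new LinkedHashMap<>();
    static {
        DEFAULT_PORTS.put("Namenode", 50070);
        DEFAULT_PORTS.put("Datanodes", 50075);
        DEFAULT_PORTS.put("Secondarynamenode", 50090);
        DEFAULT_PORTS.put("JobTracker", 50030);
        DEFAULT_PORTS.put("Tasktrackers", 50060);
    }

    // Returns the web UI URL for one daemon, or fails loudly for a name that is not in the list.
    static String webUiUrl(String host, String daemon) {
        Integer port = DEFAULT_PORTS.get(daemon);
        if (port == null) {
            throw new IllegalArgumentException("Unknown daemon: " + daemon);
        }
        return "http://" + host + ":" + port + "/";
    }

    public static void main(String[] args) {
        // "localhost" is only a placeholder; pass the instance's public IP as the first argument.
        String host = args.length > 0 ? args[0] : "localhost";
        for (String daemon : DEFAULT_PORTS.keySet()) {
            System.out.println(daemon + " -> " + webUiUrl(host, daemon));
        }
    }
}
```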
![Page 60: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/60.jpg)
Removing data from HDFS using Shell Command
[hdadmin@localhost detach]$ hadoop fs -rm input/pg2600.txt
Deleted hdfs://localhost:54310/input/pg2600.txt
[hdadmin@localhost detach]$
![Page 61: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/61.jpg)
Hands-On: Writing a Map/Reduce Program in Eclipse
![Page 62: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/62.jpg)
Starting Eclipse in Cloudera VM
![Page 63: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/63.jpg)
Create a Java Project
Let's name it HadoopWordCount
![Page 64: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/64.jpg)
Add dependencies to the project
● Add the following two JARs to your build path: hadoop-common.jar and hadoop-mapreduce-client-core.jar. Both can be found at /usr/lib/hadoop/client
● Perform the following steps:
– Add a folder named lib to the project
– Copy the two JARs into this folder
– Right-click on the project name >> select Build Path >> then Configure Build Path
– Click on Add Jars, then select these two JARs from the lib folder
![Page 65: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/65.jpg)
Add dependencies to the project
![Page 66: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/66.jpg)
Writing the source code
● Right-click the project, then select New >> Package
● Name the package org.myorg
● Right-click on org.myorg, then select New >> Class
● Name the class WordCount
● Write the source code as shown in the previous slides
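
The WordCount source itself is not captured in this transcript. As a rough, Hadoop-free sketch of the logic such a job implements, the class below splits each line into words (the map step) and keeps a running total per word (the reduce step), all in memory. The class name `WordCountSketch` and its method are illustrative assumptions; this is not the `org.myorg.WordCount` class used in the lab, which would extend Hadoop's Mapper and Reducer types instead.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Illustrative only: word counting without Hadoop, to show what the job computes.
final class WordCountSketch {

    // Emits every whitespace-separated token of every line and adds 1 to that token's total.
    static Map<String, Integer> countWords(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>(); // sorted by word, like sorted job output keys
        for (String line : lines) {
            StringTokenizer tokens = new StringTokenizer(line);
            while (tokens.hasMoreTokens()) {
                counts.merge(tokens.nextToken(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // Made-up sample lines, not taken from pg2600.txt.
        List<String> lines = Arrays.asList("war and peace", "war and war");
        for (Map.Entry<String, Integer> entry : countWords(lines).entrySet()) {
            System.out.println(entry.getKey() + "\t" + entry.getValue());
        }
    }
}
```

In a real job the two steps run on different machines, with the framework grouping the emitted words by key between them; the sketch collapses that into a single loop.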
![Page 67: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/67.jpg)
![Page 68: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/68.jpg)
Building a Jar file
● Right-click the project, then select Export
● Select Java and then JAR file
● Provide the JAR name: wordcount.jar
● Leave the JAR package options as default
● In the JAR Manifest Specification section at the bottom, specify the Main class
● In this case, select WordCount
● Click on Finish
● The JAR file will be built and located at cloudera/workspace
Note: you may need to resize the dialog font by selecting
Windows >> Preferences >> Appearance >> Colors and Fonts
![Page 69: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/69.jpg)
![Page 70: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/70.jpg)
Hands-On: Running Map Reduce and Deploying to Hadoop Runtime Environment
![Page 71: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/71.jpg)
Running Map Reduce Program
![Page 72: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/72.jpg)
Reviewing MapReduce Job in Hue
![Page 73: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/73.jpg)
Reviewing MapReduce Job in Hue
![Page 74: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/74.jpg)
Reviewing MapReduce Output Result
![Page 75: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/75.jpg)
Reviewing MapReduce Output Result
![Page 76: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/76.jpg)
Reviewing MapReduce Output Result
![Page 77: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/77.jpg)
Hands-On: Running Map Reduce using Oozie workflow
![Page 78: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/78.jpg)
Using Hue: select WorkFlow >> Editor
![Page 79: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/79.jpg)
Create a new workflow
● Click the Create button; the following screen will be displayed
● Name the workflow WordCountWorkflow
![Page 80: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/80.jpg)
Select a Java job for the workflow
● From the Oozie editor, drag Java and drop between start and end
![Page 81: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/81.jpg)
Edit the Java Job
● Assign the following values:
– Name: WordCount
– Jar name: wordcount.jar (select … choose upload from local machine)
– Main Class: org.myorg.WordCount
– Arguments: input/* output/wordcount_output2
![Page 82: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/82.jpg)
Submit the workflow
● Click Done, followed by Save
● Then click Submit
![Page 83: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/83.jpg)
Hands-On: Working with CSV data
![Page 84: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/84.jpg)
Sample CSV data
● The input data is a set of access logs in the following form:
Date, Requesting-IP-Address
● We will write a MapReduce program to count the number of hits to the website per country.
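
Before looking at the Hadoop classes, it may help to see the same computation in plain Java. The sketch below is an in-memory approximation under stated assumptions: the class `HitsByCountrySketch`, the sample records, and the lookup function passed in by the caller are all hypothetical. It parses `Date, Requesting-IP-Address` records, skips malformed ones, and counts hits per country.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Illustrative only: counts hits per country in memory, without Hadoop.
final class HitsByCountrySketch {

    // Extracts the IP field from a "Date,Requesting-IP-Address" record; returns null if it is missing.
    static String ipOf(String record) {
        String[] row = record.split(",");
        if (row.length < 2) {
            return null;
        }
        String ip = row[1].trim();
        return ip.isEmpty() ? null : ip;
    }

    // countryOf is supplied by the caller and must return a non-null country name for every IP.
    static Map<String, Integer> hitsByCountry(List<String> records, Function<String, String> countryOf) {
        Map<String, Integer> hits = new TreeMap<>();
        for (String record : records) {
            String ip = ipOf(record);
            if (ip == null) {
                continue; // malformed record: skip it
            }
            hits.merge(countryOf.apply(ip), 1, Integer::sum);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Made-up records and a made-up lookup rule, for demonstration only.
        List<String> records = Arrays.asList(
                "2015-05-01,10.0.0.1", "2015-05-01, 10.0.0.2", "bad-line", "2015-05-02,192.168.1.5");
        Function<String, String> lookup = ip -> ip.startsWith("10.") ? "India" : "US";
        System.out.println(hitsByCountry(records, lookup));
    }
}
```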
![Page 85: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/85.jpg)
HitsByCountryMapper.java

    package learning.bigdata.mapreduce;

    import java.io.IOException;

    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    public class HitsByCountryMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

        private final static String[] COUNTRIES = { "India", "UK", "US", "China" };
        private Text outputKey = new Text();
        private IntWritable outputValue = new IntWritable();

        @Override
        protected void setup(Context context) throws IOException, InterruptedException {
            super.setup(context);
        }

        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            try {
                String valueString = value.toString();

                // Split the value string to get Date and ipAddress
                String[] row = valueString.split(",");
![Page 86: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/86.jpg)
HitsByCountryMapper.java
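
                // row[0] = Date and row[1] = ipAddress
                String ipAddress = row[1];

                // Get the country name to which the ipAddress belongs
                String countryName = getCountryNameFromIpAddress(ipAddress);
                if (countryName != null) {
                    // Skip records with an empty IP field: Text.set(null) would throw
                    outputKey.set(countryName);
                    outputValue.set(1);
                    context.write(outputKey, outputValue);
                }
            } catch (ArrayIndexOutOfBoundsException ex) {
                // Malformed record (no IP column): count it and move on
                context.getCounter("Custom counters", "MAPPER_EXCEPTION_COUNTER").increment(1);
                ex.printStackTrace();
            }
        }

        // Placeholder lookup: picks a country from the hash of the IP string, not a real geolocation
        private static String getCountryNameFromIpAddress(String ipAddress) {
            if (ipAddress != null && !ipAddress.isEmpty()) {
                // Take the remainder before Math.abs: Math.abs(Integer.MIN_VALUE) is still negative
                int randomIndex = Math.abs(ipAddress.hashCode() % COUNTRIES.length);
                return COUNTRIES[randomIndex];
            }
            return null;
        }
    }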
![Page 87: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/87.jpg)
HitsByCountryReducer.java
package learning.bigdata.mapreduce;

import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class HitsByCountryReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    private Text outputKey = new Text();
    private IntWritable outputValue = new IntWritable();

    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the 1s emitted by the mapper for this country
        int count = 0;
        for (IntWritable value : values) {
            count += value.get();
        }
        outputKey.set(key);
        outputValue.set(count);
        context.write(outputKey, outputValue);
    }
}
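The mapper and reducer above can be reasoned about without a cluster. The sketch below is not Hadoop code: it replays the same steps (turn each `date,ipAddress` line into a `(country, 1)` pair, group by country, sum per country) with plain Java collections. The class name `HitsByCountryLocalSketch`, the `countHits` helper, and the sample lines are illustrative assumptions; the country lookup copies the hash-based placeholder from the mapper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java illustration of the map -> group-by-key -> reduce flow.
// It does not use the Hadoop API; it only mirrors the logic of the two classes above.
class HitsByCountryLocalSketch {

    private static final String[] COUNTRIES = { "India", "UK", "US", "China" };

    // Same placeholder lookup as the mapper: a hash of the address picks a country.
    static String countryOf(String ipAddress) {
        if (ipAddress == null || ipAddress.isEmpty()) {
            return null;
        }
        return COUNTRIES[Math.abs(ipAddress.hashCode()) % COUNTRIES.length];
    }

    static Map<String, Integer> countHits(List<String> lines) {
        // A TreeMap keeps the keys sorted, much as reducer input arrives sorted by key.
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            String[] row = line.split(",");
            if (row.length < 2) {
                continue; // malformed record: the real mapper increments a counter here
            }
            String country = countryOf(row[1]);
            if (country == null) {
                continue; // empty address field
            }
            // "Map" emits (country, 1); "reduce" adds up the 1s for each country.
            counts.merge(country, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "2015-05-01,10.0.0.1",
                "2015-05-01,10.0.0.2",
                "2015-05-02,10.0.0.1",
                "bad-line-without-comma");
        System.out.println(countHits(lines));
    }
}
```

Because the country is derived from a hash, the same address always lands in the same bucket, so repeated hits from one address add up under one key.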
![Page 88: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/88.jpg)
HitsByCountry.java
package learning.bigdata.main;

import learning.bigdata.mapreduce.HitsByCountryMapper;
import learning.bigdata.mapreduce.HitsByCountryReducer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class HitsByCountry extends Configured implements Tool {

    private static final String JOB_NAME = "Calculating hits by country";

    public static void main(String[] args) throws Exception {
        if (args.length < 2) {
            System.out.println("Usage: HitsByCountry <comma separated input directories> <output dir>");
            System.exit(-1);
        }
        int result = ToolRunner.run(new HitsByCountry(), args);
        System.exit(result);
    }
![Page 89: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/89.jpg)
HitsByCountry.java
    @Override
    public int run(String[] args) throws Exception {
        try {
            Configuration conf = getConf();
            Job job = Job.getInstance(conf);

            job.setJarByClass(HitsByCountry.class);
            job.setJobName(JOB_NAME);

            job.setMapperClass(HitsByCountryMapper.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            job.setReducerClass(HitsByCountryReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);

            job.setInputFormatClass(TextInputFormat.class);
            job.setOutputFormatClass(TextOutputFormat.class);

            FileInputFormat.setInputPaths(job, args[0]);
            FileOutputFormat.setOutputPath(job, new Path(args[1]));

            boolean success = job.waitForCompletion(true);
            return success ? 0 : 1;
        } catch (Exception e) {
            e.printStackTrace();
            return 1;
        }
    }
}
![Page 90: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/90.jpg)
![Page 91: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/91.jpg)
Lecture: Developing Complex Hadoop MapReduce Applications
![Page 92: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/92.jpg)
Choosing appropriate Hadoop data types
● Hadoop uses classes based on the Writable interface as the data types for MapReduce computations.
● Choosing the appropriate Writable data types for your input, intermediate, and output data can have a large effect on the performance and the programmability of your MapReduce programs.
● To be used as a value data type, a data type must implement the org.apache.hadoop.io.Writable interface.
● To be used as a key data type, a data type must implement the org.apache.hadoop.io.WritableComparable<T> interface, because keys are sorted before they reach the reducers.
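The difference between value types and key types comes down to ordering. The sketch below illustrates that with stand-in interfaces so it compiles without Hadoop on the classpath: `SketchWritable` and `SketchWritableComparable` are local assumptions that copy the method names of `org.apache.hadoop.io.Writable` and `org.apache.hadoop.io.WritableComparable<T>`, and `CountryDayKey` is a hypothetical composite key.

```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Objects;

// Stand-in for org.apache.hadoop.io.Writable (same two method names).
interface SketchWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Stand-in for org.apache.hadoop.io.WritableComparable<T>: a Writable that can also be ordered.
interface SketchWritableComparable<T> extends SketchWritable, Comparable<T> {
}

// Hypothetical composite key (country, day). It qualifies as a key type because it defines an ordering.
class CountryDayKey implements SketchWritableComparable<CountryDayKey> {
    private String country = "";
    private String day = "";

    CountryDayKey() { }

    CountryDayKey(String country, String day) {
        this.country = country;
        this.day = day;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(country);
        out.writeUTF(day);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        country = in.readUTF();
        day = in.readUTF();
    }

    // Order by country first, then by day.
    @Override
    public int compareTo(CountryDayKey other) {
        int byCountry = country.compareTo(other.country);
        return byCountry != 0 ? byCountry : day.compareTo(other.day);
    }

    // equals/hashCode should agree with compareTo; hash-based partitioning relies on hashCode.
    @Override
    public boolean equals(Object o) {
        return o instanceof CountryDayKey && compareTo((CountryDayKey) o) == 0;
    }

    @Override
    public int hashCode() {
        return Objects.hash(country, day);
    }

    @Override
    public String toString() {
        return country + "/" + day;
    }
}

class KeyOrderingSketch {
    // Sorts the keys the way a framework could, using only compareTo.
    static List<String> sortedKeys(List<CountryDayKey> keys) {
        List<CountryDayKey> copy = new ArrayList<>(keys);
        Collections.sort(copy);
        List<String> out = new ArrayList<>();
        for (CountryDayKey key : copy) {
            out.add(key.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(sortedKeys(Arrays.asList(
                new CountryDayKey("US", "2015-05-02"),
                new CountryDayKey("India", "2015-05-03"),
                new CountryDayKey("US", "2015-05-01"))));
    }
}
```

A type that only implemented the two serialization methods could still travel as a value, but nothing could sort it, which is why it could not serve as a key.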
![Page 94: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/94.jpg)
Hadoop built-in data types
● Text: This stores UTF-8 text
● BytesWritable: This stores a sequence of bytes
● VIntWritable and VLongWritable: These store variable-length integer and long values
● NullWritable: This is a zero-length Writable type that can be used when you don't want to use a key or value type
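To see why a variable-length integer type can save space, here is a minimal sketch of a generic base-128 encoding with a zigzag step. This is an assumption-laden illustration of the general idea only: it is not the byte format that VIntWritable or VLongWritable actually use, and the class name `VarLongSketch` is hypothetical.

```java
import java.io.ByteArrayOutputStream;

// Generic variable-length encoding of a long (illustrative; NOT Hadoop's wire format).
class VarLongSketch {

    // Uses 1 to 10 bytes: 7 payload bits per byte, high bit set when more bytes follow.
    static byte[] encode(long value) {
        // Zigzag maps small negative numbers to small unsigned ones (-1 -> 1, 1 -> 2, -2 -> 3).
        long zigzag = (value << 1) ^ (value >> 63);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((zigzag & ~0x7FL) != 0) {
            out.write((int) ((zigzag & 0x7F) | 0x80));
            zigzag >>>= 7;
        }
        out.write((int) zigzag);
        return out.toByteArray();
    }

    static long decode(byte[] bytes) {
        long zigzag = 0;
        int shift = 0;
        for (byte b : bytes) {
            zigzag |= (long) (b & 0x7F) << shift;
            shift += 7;
        }
        // Undo the zigzag step.
        return (zigzag >>> 1) ^ -(zigzag & 1);
    }

    public static void main(String[] args) {
        long[] samples = { 0L, 1L, -1L, 300L, 1000000L, Long.MAX_VALUE };
        for (long sample : samples) {
            byte[] encoded = encode(sample);
            System.out.println(sample + " takes " + encoded.length
                    + " byte(s), decodes to " + decode(encoded));
        }
    }
}
```

Small counts, which are common in MapReduce output, fit in one or two bytes here, whereas a fixed-width long always takes eight.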
![Page 95: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/95.jpg)
Hadoop built-in data types
● The following Hadoop built-in collection data types can only be used as value types.
– ArrayWritable: This stores an array of values belonging to a Writable type.
– TwoDArrayWritable: This stores a matrix of values belonging to the same Writable type.
– MapWritable: This stores a map of key-value pairs. Keys and values should be of the Writable data types.
– SortedMapWritable: This stores a sorted map of key-value pairs. Keys should implement the WritableComparable interface.
![Page 96: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/96.jpg)
Implementing a custom Hadoop Writable data type
● We can easily write a custom Writable data type by implementing the org.apache.hadoop.io.Writable interface.
● Writable interface-based types can be used as value types in Hadoop MapReduce computations.
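A custom value type boils down to two methods that write and read the fields in the same order. The sketch below shows that round trip using a local stand-in interface, `MiniWritable`, which is an assumption that copies the two method names of `org.apache.hadoop.io.Writable` so the code runs without Hadoop. `HitStats` and its fields are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Stand-in for org.apache.hadoop.io.Writable (same method names, local to this sketch).
interface MiniWritable {
    void write(DataOutput out) throws IOException;
    void readFields(DataInput in) throws IOException;
}

// Hypothetical value type carrying two numbers per country.
class HitStats implements MiniWritable {
    long hits;
    long bytesSent;

    // A no-argument constructor lets a framework create an empty instance and then call readFields().
    HitStats() { }

    HitStats(long hits, long bytesSent) {
        this.hits = hits;
        this.bytesSent = bytesSent;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeLong(hits);
        out.writeLong(bytesSent);
    }

    // Fields must be read back in exactly the order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        hits = in.readLong();
        bytesSent = in.readLong();
    }

    static byte[] serialize(MiniWritable value) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            value.write(out);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    static HitStats deserialize(byte[] data) {
        try {
            HitStats stats = new HitStats();
            stats.readFields(new DataInputStream(new ByteArrayInputStream(data)));
            return stats;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        HitStats copy = deserialize(serialize(new HitStats(3, 2048)));
        System.out.println(copy.hits + " hits, " + copy.bytesSent + " bytes");
    }
}
```

With the real interface, only the `implements` clause and the import would change; the field-by-field pattern stays the same.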
![Page 100: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/100.jpg)
Choosing a suitable Hadoop InputFormat for your input data format
● Hadoop supports processing of many different formats and types of data through InputFormat.
● The InputFormat of a Hadoop MapReduce computation generates the key-value pair inputs for the mappers by parsing the input data.
● InputFormat also performs the splitting of the input data into logical partitions.
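The idea of splitting input into logical partitions can be sketched as carving a file into byte ranges, one per map task. The code below is a deliberately simplified assumption: the class `SplitSketch`, the `Range` type, and the fixed `splitSize` parameter are hypothetical, and Hadoop's own split calculation takes more factors into account than this.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration of logical input splits (not Hadoop's actual algorithm).
class SplitSketch {

    // One logical partition: a byte range of the input file.
    static final class Range {
        final long start;
        final long length;

        Range(long start, long length) {
            this.start = start;
            this.length = length;
        }

        @Override
        public String toString() {
            return "[start=" + start + ", length=" + length + "]";
        }
    }

    // Cuts a file of fileLength bytes into ranges of at most splitSize bytes.
    static List<Range> computeSplits(long fileLength, long splitSize) {
        if (splitSize <= 0) {
            throw new IllegalArgumentException("splitSize must be positive");
        }
        List<Range> splits = new ArrayList<>();
        for (long start = 0; start < fileLength; start += splitSize) {
            splits.add(new Range(start, Math.min(splitSize, fileLength - start)));
        }
        return splits;
    }

    public static void main(String[] args) {
        // A 300-byte file with 128-byte splits yields three ranges; the last one is shorter.
        System.out.println(computeSplits(300, 128));
    }
}
```

The splits are logical: they describe which part of the data each mapper should read, without copying or rewriting the file.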
![Page 101: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/101.jpg)
InputFormats that Hadoop provides
● TextInputFormat: This is used for plain text files. TextInputFormat generates a key-value record for each line of the input text files.
● NLineInputFormat: This is used for plain text files. NLineInputFormat splits the input files into logical splits of a fixed number of lines.
● SequenceFileInputFormat: This is used for Hadoop SequenceFile input data.
● DBInputFormat: This supports reading the input data for a MapReduce computation from a SQL table.
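For the two plain-text formats, it helps to see what the records look like. With TextInputFormat, each record's key is the byte offset of the line within the file and the value is the line itself. The sketch below imitates that in plain Java and also groups lines N at a time in the spirit of NLineInputFormat. The class `LineRecordsSketch` and both helper methods are hypothetical names, and the code assumes `\n` line endings and UTF-8 text.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java imitation of line-oriented input records (no Hadoop classes involved).
class LineRecordsSketch {

    // Produces (byte offset -> line) pairs, like the key-value records a mapper receives.
    static Map<Long, String> toRecords(String text) {
        Map<Long, String> records = new LinkedHashMap<>();
        int pos = 0;
        long offset = 0;
        while (pos < text.length()) {
            int newline = text.indexOf('\n', pos);
            int end = (newline == -1) ? text.length() : newline;
            String line = text.substring(pos, end);
            records.put(offset, line);
            // Offsets count bytes, not characters; add 1 for the newline when present.
            offset += line.getBytes(StandardCharsets.UTF_8).length + (newline == -1 ? 0 : 1);
            pos = (newline == -1) ? text.length() : newline + 1;
        }
        return records;
    }

    // Groups lines into chunks of at most n lines, one chunk per logical split.
    static List<List<String>> groupLines(List<String> lines, int n) {
        if (n <= 0) {
            throw new IllegalArgumentException("n must be positive");
        }
        List<List<String>> groups = new ArrayList<>();
        for (int i = 0; i < lines.size(); i += n) {
            groups.add(new ArrayList<>(lines.subList(i, Math.min(i + n, lines.size()))));
        }
        return groups;
    }

    public static void main(String[] args) {
        System.out.println(toRecords("2015-05-01,10.0.0.1\n2015-05-02,10.0.0.2\n"));
        System.out.println(groupLines(Arrays.asList("a", "b", "c"), 2));
    }
}
```

This is why the mapper in the hands-on declares `LongWritable` as its input key type and `Text` as its input value type.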
![Page 102: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/102.jpg)
Implementing new input data formats
● Hadoop enables us to implement and specify custom InputFormat implementations for our MapReduce computations.
● An InputFormat implementation should extend the org.apache.hadoop.mapreduce.InputFormat<K,V> abstract class, overriding the createRecordReader() and getSplits() methods.
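Most of the work in a custom input format sits in the record reader that createRecordReader() returns. As a rough sketch under stated assumptions, the code below uses a local stand-in class, `SketchRecordReader`, that borrows three method names from `org.apache.hadoop.mapreduce.RecordReader` (`nextKeyValue`, `getCurrentKey`, `getCurrentValue`) and leaves out the rest of the real contract. `DateIpRecordReader` is a hypothetical reader for the `date,ipAddress` lines used in the hands-on.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Stand-in borrowing a subset of method names from org.apache.hadoop.mapreduce.RecordReader.
abstract class SketchRecordReader<K, V> {
    abstract boolean nextKeyValue();
    abstract K getCurrentKey();
    abstract V getCurrentValue();
}

// Hypothetical reader: parses "date,ipAddress" lines so that key = date and value = ipAddress.
class DateIpRecordReader extends SketchRecordReader<String, String> {
    private final Iterator<String> lines;
    private String date;
    private String ipAddress;

    DateIpRecordReader(List<String> lines) {
        this.lines = lines.iterator();
    }

    // Advances to the next well-formed record; returns false when the input is exhausted.
    @Override
    boolean nextKeyValue() {
        while (lines.hasNext()) {
            String[] row = lines.next().split(",");
            if (row.length >= 2) {
                date = row[0];
                ipAddress = row[1];
                return true;
            }
            // Malformed line: skip it and keep looking.
        }
        return false;
    }

    @Override
    String getCurrentKey() {
        return date;
    }

    @Override
    String getCurrentValue() {
        return ipAddress;
    }

    // Drains a reader the way a framework loop would, collecting "key -> value" strings.
    static List<String> readAll(List<String> lines) {
        DateIpRecordReader reader = new DateIpRecordReader(lines);
        List<String> out = new ArrayList<>();
        while (reader.nextKeyValue()) {
            out.add(reader.getCurrentKey() + " -> " + reader.getCurrentValue());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(readAll(Arrays.asList(
                "2015-05-01,10.0.0.1", "oops", "2015-05-02,10.0.0.2")));
    }
}
```

Moving the parsing into the reader means the mapper receives ready-made fields and no longer needs to split each line itself.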
![Page 103: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/103.jpg)
Formatting the results of MapReduce computations – using Hadoop OutputFormats
● It is important to store the result of a MapReduce computation in a format that can be consumed efficiently by the target application.
● We can use a Hadoop OutputFormat to define the data storage format.
● An OutputFormat prepares the output location and provides a RecordWriter implementation to perform the actual serialization and storage of the data.
● Hadoop uses org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<K,V> as the default OutputFormat.
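The default text output is one line per record, with the key and value separated by a tab. The sketch below imitates what a record writer for such a format does, using only the JDK. `SeparatedTextWriter` is a hypothetical class, it writes into memory rather than to HDFS, and it is not the Hadoop RecordWriter API.

```java
// Plain-Java imitation of a text record writer (no Hadoop classes involved).
class SeparatedTextWriter {
    private final StringBuilder sink = new StringBuilder();
    private final String separator;

    SeparatedTextWriter(String separator) {
        this.separator = separator;
    }

    // Serializes one record as "key<separator>value" followed by a newline.
    void write(Object key, Object value) {
        sink.append(key).append(separator).append(value).append('\n');
    }

    String contents() {
        return sink.toString();
    }

    public static void main(String[] args) {
        SeparatedTextWriter tabWriter = new SeparatedTextWriter("\t");
        tabWriter.write("India", 3);
        tabWriter.write("US", 5);
        System.out.print(tabWriter.contents());

        // A different separator produces a format another tool may prefer, such as CSV.
        SeparatedTextWriter csvWriter = new SeparatedTextWriter(",");
        csvWriter.write("India", 3);
        System.out.print(csvWriter.contents());
    }
}
```

Choosing the output format is therefore mostly a question of what the consumer of the results can read most cheaply.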
![Page 104: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/104.jpg)
Hands-On: Analytics Using MapReduce
![Page 105: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/105.jpg)
![Page 106: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/106.jpg)
![Page 107: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/107.jpg)
![Page 108: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/108.jpg)
![Page 109: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/109.jpg)
![Page 110: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/110.jpg)
![Page 111: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/111.jpg)
![Page 112: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/112.jpg)
![Page 113: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/113.jpg)
![Page 114: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/114.jpg)
![Page 115: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/115.jpg)
![Page 116: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/116.jpg)
![Page 117: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/117.jpg)
![Page 118: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/118.jpg)
Lecture: Understanding HBase
![Page 119: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/119.jpg)
Introduction
An open source, non-relational, distributed database
HBase is an open source, non-relational, distributed database modeled after Google's BigTable and written in Java. It is developed as part of the Apache Software Foundation's Apache Hadoop project and runs on top of HDFS (Hadoop Distributed File System), providing BigTable-like capabilities for Hadoop. That is, it provides a fault-tolerant way of storing large quantities of sparse data.
![Page 120: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/120.jpg)
HBase Features
● Hadoop database modelled after Google's Bigtable
● Column-oriented data store, known as the Hadoop Database
● Supports random real-time CRUD operations (unlike HDFS)
● NoSQL database
● Open source, written in Java
● Runs on a cluster of commodity hardware
![Page 121: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/121.jpg)
When to use HBase?
● When you need high volume data to be stored
● Unstructured data
● Sparse data
● Column-oriented data
● Versioned data (same data template, captured at various times; time-elapsed data)
● When you need high scalability
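Versioned data means that a single cell can hold several values, each tagged with a timestamp. The following is a minimal in-memory sketch of that idea using a sorted map. `VersionedCellSketch` and its methods are hypothetical and unrelated to the HBase client API.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of one versioned cell: timestamp -> value (illustrative only, not HBase code).
class VersionedCellSketch {
    private final TreeMap<Long, String> versions = new TreeMap<>();

    // Stores a value under the given timestamp; older versions are kept.
    void put(long timestamp, String value) {
        versions.put(timestamp, value);
    }

    // Returns the most recent value, or null if the cell is empty.
    String latest() {
        return versions.isEmpty() ? null : versions.lastEntry().getValue();
    }

    // Returns the value that was current at the given time, or null if none existed yet.
    String asOf(long timestamp) {
        Map.Entry<Long, String> entry = versions.floorEntry(timestamp);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        VersionedCellSketch cell = new VersionedCellSketch();
        cell.put(100L, "val1");
        cell.put(200L, "val2");
        System.out.println("latest = " + cell.latest());
        System.out.println("as of 150 = " + cell.asOf(150L));
    }
}
```

Sparse data fits the same picture: a cell that was never written simply has no entries, so it costs no storage.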
![Page 122: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/122.jpg)
Which one to use?
● HDFS
– Append-only dataset (no random write)
– Read the whole dataset (no random read)
● HBase
– Need random write and/or read
– Thousands of operations per second on TB+ of data
● RDBMS
– Data fits on one big node
– Need full transaction support
– Need real-time query capabilities
![Page 123: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/123.jpg)
![Page 124: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/124.jpg)
![Page 125: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/125.jpg)
HBase Components
● Region
– Where the rows of a table are stored
● Region Server
– Hosts the regions of the tables
● Master
– Coordinates the Region Servers
● ZooKeeper
● HDFS
● API
– The Java Client API
![Page 126: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/126.jpg)
HBase Shell Commands
![Page 127: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/127.jpg)
Hands-On: Running HBase
![Page 128: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/128.jpg)
Starting HBase shell
[hdadmin@localhost ~]$
[hdadmin@localhost ~]$ hbase shell
HBase Shell; enter 'help<RETURN>' for list of supported commands.
Type "exit<RETURN>" to leave the HBase Shell
Version 0.94.10, r1504995, Fri Jul 19 20:24:16 UTC 2013
hbase(main):001:0>
![Page 129: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/129.jpg)
Create a table and insert data in HBase
hbase(main):009:0> create 'test', 'cf'
0 row(s) in 1.0830 seconds
hbase(main):010:0> put 'test', 'row1', 'cf:a', 'val1'
0 row(s) in 0.0750 seconds
hbase(main):011:0> scan 'test'
ROW COLUMN+CELL
row1 column=cf:a, timestamp=1375363287644, value=val1
1 row(s) in 0.0640 seconds
hbase(main):002:0> get 'test', 'row1'
COLUMN CELL
cf:a timestamp=1375363287644, value=val1
1 row(s) in 0.0370 seconds
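The shell session above maps onto a simple mental model: a table is a sorted map from row key to columns, and each column is named `family:qualifier`. The sketch below replays the same put, get, and scan against an in-memory toy. `ToyTable` and its methods are hypothetical and are not the HBase Java client API; timestamps are left out to keep the example short.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// In-memory toy mirroring the shell's put/get/scan (illustrative only, not HBase code).
class ToyTable {
    // row key -> ("family:qualifier" -> value); both levels are kept sorted.
    private final TreeMap<String, TreeMap<String, String>> rows = new TreeMap<>();

    // Like: put 'test', 'row1', 'cf:a', 'val1'
    void put(String row, String column, String value) {
        rows.computeIfAbsent(row, r -> new TreeMap<>()).put(column, value);
    }

    // Like: get 'test', 'row1' (returns an empty map for an unknown row)
    Map<String, String> get(String row) {
        TreeMap<String, String> cells = rows.get(row);
        return cells == null
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(cells);
    }

    // Like: scan 'test' (walks every row in row-key order)
    String scan() {
        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, TreeMap<String, String>> row : rows.entrySet()) {
            for (Map.Entry<String, String> cell : row.getValue().entrySet()) {
                out.append(row.getKey())
                   .append(" column=").append(cell.getKey())
                   .append(", value=").append(cell.getValue())
                   .append('\n');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        ToyTable test = new ToyTable();
        test.put("row1", "cf:a", "val1");
        System.out.print(test.scan());
        System.out.println(test.get("row1"));
    }
}
```

Looking up one row is a direct map access, while a scan walks the rows in key order, which is the access pattern difference the earlier "Which one to use?" slide relies on.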
![Page 130: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/130.jpg)
Using Data Browsers in Hue for HBase
![Page 131: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/131.jpg)
Using Data Browsers in Hue for HBase
![Page 132: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/132.jpg)
Using Data Browsers in Hue for HBase
![Page 133: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/133.jpg)
Recommendations for Further Study
![Page 134: Hadoop Workshop using Cloudera on Amazon EC2](https://reader034.fdocuments.in/reader034/viewer/2022042522/55a9f8291a28abab5d8b4659/html5/thumbnails/134.jpg)
Thank you
www.imcinstitute.com
www.facebook.com/imcinstitute