Working with Hadoop
Requirements
• Virtual machine software
  – VMware
  – VirtualBox
• Virtual machine images
  – Download from Cloudera (a company founded by leaders in the field; Hadoop creator Doug Cutting later joined it as chief architect)
Start the Virtual Machine
Inside the Virtual Machine
• CentOS 6.4
• JDK
• Hadoop 2.5.0
• Eclipse 4.2.6 (Juno)
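Basics of HDFS (routine commands)
• From a terminal
  – hadoop: prints the list of available subcommands
  – hadoop version: prints the installed Hadoop version
  – hadoop jar: runs a MapReduce program packaged as a JAR
  – hadoop fs …: file-system commands for HDFS
  – hadoop fs -ls: list the files in an HDFS directory
  – hadoop fs -put / -get / -mkdir / -rmdir ...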
Copy Files from Windows to VM
• WinSCP (see the demo in bin\scp_ssh\winscp575)
  – Protocol: SCP
  – Hostname: the VM's IP address (run ifconfig in a terminal)
  – Username/password: cloudera/cloudera
Copy Files from VM (CentOS) to HDFS
• hadoop fs -put localfiles /user/cloudera
Copy Files from Windows to HDFS
• Via HUE services
Using the HUE web server on port 8888 (file manager)
Hadoop Administration
• NameNode web UI: http://hostname:50070/dfshealth.html#tab-overview
WordCount Example in Hadoop
• #1: Following the guidelines on the Cloudera website
• #2: Directly in Eclipse (preferred)
WordCount in Cloudera Website
• Tutorial: http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1.html
• Source code can be downloaded from http://tiny.cloudera.com/hadoopTutorialSample
• Source code details and explanations: http://www.cloudera.com/content/cloudera/en/documentation/hadoop-tutorial/CDH5/Hadoop-Tutorial/ht_wordcount1_source.html
WordCount in Cloudera Website
• Create the directories in HDFS
  – $ hadoop fs -mkdir /user/cloudera
  – $ hadoop fs -chown cloudera /user/cloudera
  – $ hadoop fs -mkdir /user/cloudera/wordcount /user/cloudera/wordcount/input
  – Note: only the HDFS superuser can change ownership with -chown, so the first two commands may need to be run as the hdfs user (e.g. after sudo su hdfs)
• Create the sample text
  – Option 1: directly in CentOS
    $ echo "Hadoop is an elephant" > file0
    $ echo "Hadoop is as yellow as can be" > file1
    $ echo "Oh what a yellow fellow is Hadoop" > file2
    Then copy the files to HDFS:
    $ hadoop fs -put file* /user/cloudera/wordcount/input
  – Option 2: create the files in Windows and copy them to HDFS via HUE
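The following plain-Java sketch shows what the job computes on these three files. It imitates the map, shuffle and reduce phases in memory and does not use the Hadoop API. The class and method names (`WordCountSketch`, `map`, `shuffle`, `reduce`, `countWords`) are hypothetical. Splitting on whitespace with `StringTokenizer` is an assumption modeled on the classic WordCount mapper.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.StringTokenizer;
import java.util.TreeMap;

// Hypothetical, Hadoop-free illustration of the WordCount data flow.
final class WordCountSketch {

    // Map phase: split one input line on whitespace and emit (word, 1) per token.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        StringTokenizer tokens = new StringTokenizer(line);
        while (tokens.hasMoreTokens()) {
            out.add(new AbstractMap.SimpleEntry<>(tokens.nextToken(), 1));
        }
        return out;
    }

    // Shuffle phase: group emitted values by key, with keys kept in sorted order
    // (the framework also sorts keys before handing them to the reducer).
    static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            grouped.computeIfAbsent(p.getKey(), k -> new ArrayList<>()).add(p.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the list of ones collected for each word.
    static Map<String, Integer> reduce(Map<String, List<Integer>> grouped) {
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> e : grouped.entrySet()) {
            int sum = 0;
            for (int v : e.getValue()) {
                sum += v;
            }
            result.put(e.getKey(), sum);
        }
        return result;
    }

    // Runs all three phases over the given input lines.
    static Map<String, Integer> countWords(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        return reduce(shuffle(pairs));
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "Hadoop is an elephant",
                "Hadoop is as yellow as can be",
                "Oh what a yellow fellow is Hadoop");
        // Print in the same "word<TAB>count" layout as the job's output file.
        for (Map.Entry<String, Integer> e : countWords(lines).entrySet()) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

For this input the sketch finds 12 distinct words. `Hadoop` and `is` each appear 3 times, and `as` and `yellow` each appear twice. The real job's output for the same files should show the same counts.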
WordCount in Cloudera Website
• Following this route ended in a compilation error
WordCount in Eclipse environment
• http://kishorer.in/2014/10/22/running-a-wordcount-mapreduce-example-in-hadoop-2-4-1-single-node-cluster-in-ubuntu-14-04-64-bit/
• https://www.youtube.com/watch?v=hJsaChh2Yhk (some steps differ on the Cloudera VM)
Update the source code (copied from the website)
Adding the JAR Files to the Project
• JARs from: /usr/lib/hadoop; /usr/lib/hadoop/lib; /usr/lib/hadoop-mapreduce; /usr/lib/hadoop-mapreduce/lib
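To compile outside Eclipse, the same JARs must be on the classpath. On the command line, `hadoop classpath` prints a ready-made one. The hypothetical `HadoopClasspath` helper below sketches the same idea in plain Java. It lists the `*.jar` files directly inside the four directories above, which are assumed to be absolute paths under `/usr/lib`, and joins them with the platform path separator.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper: gathers Hadoop JARs and joins them into a classpath string.
final class HadoopClasspath {

    // Lists *.jar files directly inside each directory (not recursive),
    // silently skipping directories that do not exist.
    static List<Path> findJars(List<Path> dirs) {
        List<Path> jars = new ArrayList<>();
        for (Path dir : dirs) {
            if (!Files.isDirectory(dir)) {
                continue;
            }
            try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "*.jar")) {
                for (Path jar : stream) {
                    jars.add(jar);
                }
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        Collections.sort(jars); // deterministic order across runs
        return jars;
    }

    // Joins paths with the platform separator (':' on Linux, ';' on Windows).
    static String toClasspath(List<Path> jars) {
        StringJoiner joiner = new StringJoiner(File.pathSeparator);
        for (Path jar : jars) {
            joiner.add(jar.toString());
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // The four directories from the slide, assumed to live under /usr/lib.
        List<Path> dirs = Arrays.asList(
                Paths.get("/usr/lib/hadoop"),
                Paths.get("/usr/lib/hadoop/lib"),
                Paths.get("/usr/lib/hadoop-mapreduce"),
                Paths.get("/usr/lib/hadoop-mapreduce/lib"));
        System.out.println(toClasspath(findJars(dirs)));
    }
}
```

The directory listing is deliberately non-recursive because the `lib` subdirectories are listed separately on the slide.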
Run Configuration (Run > Run Configurations)
Export the JAR (File > Export)
Update the properties of the JAR file
Prepare for the run
• Create the HDFS directory
Copy sample input to HDFS (via HUE)
Run the example from the folder containing the .jar file (make sure to remove the output folder before each run, because the job fails if its output directory already exists)
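On HDFS the old output folder can be removed with `hadoop fs -rm -r <output dir>`. If the job is run from Eclipse in local mode, the output folder is on the local disk instead. That depends on whether the cluster configuration files are on the project classpath, which is an assumption here. In that case, a helper like the hypothetical `OutputDirCleaner` below can delete the folder before each run. It is a plain-Java sketch and does not touch HDFS.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Comparator;
import java.util.stream.Stream;

// Hypothetical helper for local-mode runs: removes the job's output folder
// (and everything inside it) so the next run can recreate it.
final class OutputDirCleaner {

    // Returns true if the directory existed and was deleted, false if it was absent.
    static boolean deleteRecursively(Path dir) {
        if (!Files.exists(dir)) {
            return false;
        }
        try (Stream<Path> walk = Files.walk(dir)) {
            // Reverse path order visits children before their parent directory,
            // so every directory is already empty when it is deleted.
            walk.sorted(Comparator.reverseOrder()).forEach(p -> {
                try {
                    Files.delete(p);
                } catch (IOException e) {
                    throw new UncheckedIOException(e);
                }
            });
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical default: a local folder named "output" (the job's second argument).
        Path output = Paths.get(args.length > 0 ? args[0] : "output");
        boolean deleted = deleteRecursively(output);
        System.out.println(deleted ? "Deleted " + output : "Nothing to delete at " + output);
    }
}
```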
View the result
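The job writes plain text, one `word<TAB>count` line per word, into files typically named like `part-r-00000` (the usual reducer-output name with the newer `mapreduce` API). `hadoop fs -cat` prints such a file directly. The hypothetical `ResultParser` below is a small plain-Java sketch for loading a local copy into a sorted map, for example a copy fetched with `hadoop fs -get`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical reader for WordCount output lines of the form "word<TAB>count".
final class ResultParser {

    static Map<String, Integer> parse(List<String> lines) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String line : lines) {
            if (line.isEmpty()) {
                continue; // tolerate a trailing blank line
            }
            // The default text output separates key and value with a tab.
            int tab = line.lastIndexOf('\t');
            if (tab < 0) {
                throw new IllegalArgumentException("No tab separator in line: " + line);
            }
            String word = line.substring(0, tab);
            int count = Integer.parseInt(line.substring(tab + 1).trim());
            counts.put(word, count);
        }
        return counts;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical local copy of the reducer output file.
        Path file = Paths.get(args.length > 0 ? args[0] : "part-r-00000");
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        for (Map.Entry<String, Integer> e : parse(lines).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```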
Other sources
• A very good example: https://hadoop.apache.org/docs/r1.2.1/mapred_tutorial.html (the Apache MapReduce tutorial for Hadoop 1.2.1)