Automated Hadoop Cluster Construction on EC2
Automated Hadoop Clusters on EC2
Mark Kerzner, SHMsoft
What is Hadoop? :) :) :)
Everybody knows that ... What is your definition?
What is a cloud?
Everybody knows that, but:
1. Elastic resources
2. Internet delivery
3. SaaS
4. Virtualization
5. Device-enabled
6. Only (1), or all of the above
You are the Hadoop programmer
... and you need tools. What are your alternatives?
● IDE
● Local "cluster"
● Pseudo-distributed cluster
● EC2
You are the Hadoop programmer
... and you need tools. What are your alternatives?
● IDE: compile and run the code
● Local "cluster": runs against the local file system
● Pseudo-distributed cluster: test outside the IDE
● EC2: test on the cluster, test for scale
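The local and pseudo-distributed options differ mainly in a few configuration properties. Local (standalone) mode is what you get with them unset: in Hadoop 1.x, `fs.default.name` defaults to `file:///` and `mapred.job.tracker` to `local`. The sketch below writes a minimal pseudo-distributed configuration, assuming Hadoop 0.20/1.x property names; the ports 9000/9001 are conventional choices, and the helper names `prop`, `conf_file` and `write_pseudo_conf` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: minimal pseudo-distributed configuration for Hadoop 0.20/1.x.
# Assumptions: Hadoop 1.x property names; ports 9000/9001; helper names hypothetical.
set -euo pipefail

# Emit one <property> element.
prop() {
  printf '  <property>\n    <name>%s</name>\n    <value>%s</value>\n  </property>\n' "$1" "$2"
}

# Write a <configuration> document.
# Usage: conf_file FILE NAME VALUE [NAME VALUE ...]
conf_file() {
  local file=$1
  shift
  {
    echo '<?xml version="1.0"?>'
    echo '<configuration>'
    while [ "$#" -ge 2 ]; do
      prop "$1" "$2"
      shift 2
    done
    echo '</configuration>'
  } > "$file"
}

# All daemons on localhost; replication 1 because there is only one datanode.
write_pseudo_conf() {
  local dir=$1
  mkdir -p "$dir"
  conf_file "$dir/core-site.xml"   fs.default.name    hdfs://localhost:9000
  conf_file "$dir/hdfs-site.xml"   dfs.replication    1
  conf_file "$dir/mapred-site.xml" mapred.job.tracker localhost:9001
}
```

Typical use would be `write_pseudo_conf "$HADOOP_HOME/conf"`, then `bin/hadoop namenode -format` once and `bin/start-all.sh`.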
What are your resources?
● Tom White, "Hadoop: The Definitive Guide"
● www.hadoopilluminated.com
For real play, you need a cluster
Hadoop+ (oh, by the way...)
HBase, Cassandra, MongoDB, NoSQL, Dynamo, BigTable, Dryad (MS), Azure (MS), MapReduce, MapR (EMC), Cloudera distribution, EMC distribution, IBM distribution...
Whirr
Setup:
    export AWS_ACCESS_KEY_ID=...
    export AWS_SECRET_ACCESS_KEY=...
Install:
    curl -O http://www.apache.org/dist/whirr/whirr-0.7.1/whirr-0.7.1.tar.gz
    tar zxf whirr-0.7.1.tar.gz; cd whirr-0.7.1
Generate keys:
    ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa_whirr
Run:
    bin/whirr launch-cluster --config recipes/zookeeper-ec2.properties --private-key-file ~/.ssh/id_rsa_whirr
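The `zookeeper-ec2.properties` recipe starts a ZooKeeper ensemble; a Hadoop cluster needs a recipe with Hadoop roles. The sketch below writes such a recipe, assuming the Whirr 0.7.x property names (`whirr.cluster-name`, `whirr.instance-templates`, `whirr.provider`, `whirr.identity`, `whirr.credential`) and role names (`hadoop-namenode`, `hadoop-jobtracker`, `hadoop-datanode`, `hadoop-tasktracker`) used by the Hadoop recipe shipped with Whirr. The helper name and the file name `my-hadoop-ec2.properties` are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch: write a Whirr recipe for a small Hadoop cluster on EC2.
# Assumptions: Whirr 0.7.x property and role names; helper name is hypothetical.
set -euo pipefail

# Usage: write_whirr_hadoop_recipe FILE CLUSTER_NAME WORKER_COUNT
write_whirr_hadoop_recipe() {
  local file=$1 name=$2 workers=$3
  # Unquoted heredoc: $name and $workers expand here, while the escaped
  # \${env:...} references are written literally for Whirr to resolve.
  cat > "$file" <<EOF
whirr.cluster-name=$name
whirr.instance-templates=1 hadoop-namenode+hadoop-jobtracker,$workers hadoop-datanode+hadoop-tasktracker
whirr.provider=aws-ec2
whirr.identity=\${env:AWS_ACCESS_KEY_ID}
whirr.credential=\${env:AWS_SECRET_ACCESS_KEY}
EOF
}
```

After `write_whirr_hadoop_recipe my-hadoop-ec2.properties myhadoop 3`, the cluster would be launched with `bin/whirr launch-cluster --config my-hadoop-ec2.properties --private-key-file ~/.ssh/id_rsa_whirr` and torn down with `bin/whirr destroy-cluster --config my-hadoop-ec2.properties`, so idle instances stop billing.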
Whirr limitations
● No EBS
● All or nothing
● Generates configuration artifacts
● Takes over your computer: no more local development, uses a proxy
● Hard to customize
Amazon EMR
EMR limitations
● No choice of image
● Fixed architecture
● Hard to debug
● Hard to customize
You do it
Repeat the manual procedure, only automate it.
Prepare: AMI, Java, Hadoop
On the fly: start the AMI, log in, configure, start services, verify, run test jobs
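Of the on-the-fly steps, "configure" is the one most worth scripting, because every node needs the same files pointing at the master. The sketch below generates them, assuming Hadoop 0.20/1.x conventions (`core-site.xml`, `hdfs-site.xml`, `mapred-site.xml`, `conf/masters`, `conf/slaves`) and that the hostnames come from whatever started the instances. The helper name `write_cluster_conf`, the ports 9000/9001 and the example hostnames are hypothetical.

```bash
#!/usr/bin/env bash
# Sketch of the "configure" step for a Hadoop 0.20/1.x cluster.
# Assumptions: Hadoop 1.x file and property names; ports 9000/9001 are
# conventional choices; the helper name write_cluster_conf is hypothetical.
set -euo pipefail

# Usage: write_cluster_conf DIR MASTER_HOST WORKER_HOST...
write_cluster_conf() {
  local dir=$1 master=$2
  shift 2
  if [ "$#" -eq 0 ]; then
    echo "write_cluster_conf: need at least one worker host" >&2
    return 1
  fi
  # Never ask HDFS for more replicas than there are datanodes (default is 3).
  local replication=$(( $# < 3 ? $# : 3 ))
  mkdir -p "$dir"

  # Every node reaches HDFS through the namenode on the master.
  cat > "$dir/core-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>fs.default.name</name><value>hdfs://$master:9000</value></property>
</configuration>
EOF

  cat > "$dir/hdfs-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>dfs.replication</name><value>$replication</value></property>
</configuration>
EOF

  # Every tasktracker reports to the jobtracker on the master.
  cat > "$dir/mapred-site.xml" <<EOF
<?xml version="1.0"?>
<configuration>
  <property><name>mapred.job.tracker</name><value>$master:9001</value></property>
</configuration>
EOF

  # conf/slaves: datanode/tasktracker hosts, read by start-dfs.sh and start-mapred.sh.
  printf '%s\n' "$@" > "$dir/slaves"
  # conf/masters names the secondary namenode host (not the namenode); here, the master.
  printf '%s\n' "$master" > "$dir/masters"
}
```

The generated directory would then be copied into `conf/` on every node (for example with `scp`). On the master, `bin/hadoop namenode -format` runs once, `bin/start-dfs.sh` and `bin/start-mapred.sh` start the services, and `bin/hadoop dfsadmin -report` is a quick check that the datanodes have joined before running test jobs.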
You do it - advanced
On startup: under-provision, over-provision, progress
On the fly: monitor, run test jobs, watch for cluster deterioration
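One reading of the startup item: over-provision by asking EC2 for a few more instances than the cluster needs, since some may fail to boot or start slowly, keep the first N that are running, terminate the rest, and treat fewer than N as under-provisioning. The sketch below implements only that decision. It assumes the launcher can produce `instance-id state` lines (for example, derived from `aws ec2 describe-instances`); the function name `select_nodes`, the input format and the `running` filter are assumptions.

```bash
#!/usr/bin/env bash
# Sketch: after over-provisioning, decide which instances to keep.
# Input on stdin: one "instance-id state" pair per line (format is an assumption).
# Output on stdout: "keep <id>" for the first REQUIRED running instances and
# "drop <id>" for everything else (extra running nodes, pending or failed ones).
# A progress line goes to stderr; exit status 1 means under-provisioned.
set -euo pipefail

# Usage: select_nodes REQUIRED < instance-states
select_nodes() {
  local required=$1
  awk -v need="$required" '
    $2 == "running" && kept < need { print "keep " $1; kept++; next }
    { print "drop " $1 }
    END {
      printf "progress: %d of %d required nodes running\n", kept, need > "/dev/stderr"
      exit (kept < need)
    }'
}
```

The `drop` ids could then be passed to `aws ec2 terminate-instances --instance-ids ...`; on a nonzero status the script can wait and re-check, request more instances, or give up.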
Cloudera Manager
MapR Manager
On the large scale
Hadoop 0.20: up to 4,000 nodes
Hadoop 0.23: up to 20,000 nodes
GridGain: hundreds of thousands of nodes
Thank you
Questions?