Big Data in the Clouds

An example using Ansible, R, RHadoop, and AppScale to deploy a big data environment on AWS/Eucalyptus

Big Data Environment

● Why? Why R? Why AppScale? Why AWS/Eucalyptus?
● Environments that need to process “big data” are in high demand
● Flexibility in deploying big data environments - AWS has Elastic MapReduce; Eucalyptus has ?

Goals

● Deploy an open source big data environment on IaaS
● The same deployment method can be used on both public and private IaaS (hybrid?)

The Architecture

Ansible

● http://www.ansibleworks.com/
● Open source configuration management using SSH
● Flexible, powerful, efficient, secure
● http://ansible.cc/docs/

R and RHadoop

● http://www.r-project.org/
    ● Open source statistics software; very flexible and powerful
● http://www.revolutionanalytics.com/
    ● Provides enterprise analytics software using R
● https://github.com/RevolutionAnalytics/RHadoop/wiki

AppScale

● http://www.appscale.com
● PaaS that implements the Google App Engine APIs on different public/private IaaS and virtual environments
● http://www.slideshare.net/shatteredNirvana/intro-to-app-engine-and-appscale
● Ships with Cloudera for back-end support of the Google App Engine MapReduce API implementation

AWS EC2/Eucalyptus

● http://aws.amazon.com
● Cloud API that has become a de facto standard
● http://www.eucalyptus.com
● Closely follows AWS APIs for EC2, S3, IAM (soon ELB, CloudWatch, and AutoScaling)

Deployment

AWS/Eucalyptus
● Account/User Credentials
    ● EC2_ACCESS_KEY
    ● EC2_SECRET_KEY
    ● EC2_URL
● IAM policy with EC2 permissions to launch instances, create security groups, authorize ports, and manage images (bundle, upload, and register)
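Before launching anything, it can help to confirm that the three credential variables listed above are actually set in the shell. The following is a minimal sketch, not part of the original deck: the helper name check_ec2_env is hypothetical, and it only checks that each variable is non-empty, not that the values are valid for AWS or Eucalyptus.

```shell
# check_ec2_env is a hypothetical helper (not from the slides).
# It reports each of the three variables named on the slide that is
# unset or empty, and returns non-zero if any of them is missing.
check_ec2_env() {
    _missing=0
    for _name in EC2_ACCESS_KEY EC2_SECRET_KEY EC2_URL; do
        # Look up the variable whose name is stored in $_name.
        eval "_value=\${$_name:-}"
        if [ -z "$_value" ]; then
            echo "missing: $_name" >&2
            _missing=1
        fi
    done
    return "$_missing"
}
```

A typical use would be to export the three variables with the values issued for the account, then run check_ec2_env and continue only if it succeeds.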

AppScale
● Pre-built AppScale images
    ● AWS - ami-4e472227
    ● Eucalyptus - AppScale image found @ http://emis-catalog.s3.amazonaws.com/index.html
● appscale-tools - https://github.com/AppScale/appscale-tools
    ● appscale init cloud
    ● edit AppScalefile
    ● appscale up
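The three appscale-tools steps can be strung together as a small script. This is a sketch under assumptions: the run wrapper, the DRY_RUN variable, and the deploy_appscale function are hypothetical names added for illustration, and the editor is taken from EDITOR with vi as a fallback. Only the appscale init cloud and appscale up commands come from the slide.

```shell
# run is a hypothetical wrapper: with DRY_RUN=1 it prints the command
# instead of executing it, so the sequence can be reviewed first.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# deploy_appscale is a hypothetical name for the three slide steps.
deploy_appscale() {
    # Step 1: generate a template AppScalefile in the current directory.
    run appscale init cloud || return 1
    # Step 2: edit the AppScalefile (image ID, credentials, node counts, ...).
    run "${EDITOR:-vi}" AppScalefile || return 1
    # Step 3: start the deployment described by the AppScalefile.
    run appscale up
}
```

Running the function with DRY_RUN=1 set only prints the three commands, which is a cheap way to check the sequence before anything is launched.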

● Test the deployment using a wordcount program written in R - wordcount.R
● SSH into the head node and extract the wordcount.R file - tar zxf rmr2_2.0.2.tar.gz rmr2/tests/wordcount.R
● Execute it - Rscript rmr2/tests/wordcount.R
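The test steps above can be sketched as two small shell functions to run on the head node. The tar and Rscript commands are the ones from the slide; the function names and the RMR2_TARBALL variable are hypothetical, and the sketch assumes the rmr2 tarball is in the current directory unless RMR2_TARBALL points elsewhere.

```shell
# RMR2_TARBALL is an assumed variable; the default is the file name
# given on the slide, looked up in the current directory.
RMR2_TARBALL="${RMR2_TARBALL:-rmr2_2.0.2.tar.gz}"

# Extract only the wordcount test script from the rmr2 source tarball.
extract_wordcount() {
    tar zxf "$RMR2_TARBALL" rmr2/tests/wordcount.R
}

# Extract the script, then run it with Rscript.
run_wordcount() {
    extract_wordcount || return 1
    Rscript rmr2/tests/wordcount.R
}
```

Naming the member path on the tar command line means nothing else from the package is unpacked; only rmr2/tests/wordcount.R is written to disk.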

Test - Wordcount.R

Results

Contact Info

AppScale - [email protected]

Eucalyptus - [email protected]

Questions?

Demo