Post on 20-Aug-2015
Amazon Elastic MapReduce
Peter Sirota
Amazon Elastic MapReduce
• Enables customers to easily and cost-effectively process vast amounts of data.
• Utilizes a hosted Hadoop framework running on the web-scale infrastructure of Amazon.
• Launched in the US in April and in the EU in July of 2009.
Amazon Elastic MapReduce
• Large-scale data processing has a lot of MUCK, and we want to remove it for our customers
• Hard to manage compute clusters
• Hard to tune Hadoop
• Hadoop issues preventing smooth operation in the cloud
Amazon.com Confidential
Hadoop made simple and easy
[Diagram. Labels: Deploy Application (Web Console, Command line tools); Input Data; Input dataset in the Input S3 bucket (Amazon S3); Amazon Elastic MapReduce running Hadoop on Amazon EC2 Instances; Output results in the Output S3 bucket (Amazon S3); Notify; Get Results; End]
Amazon Elastic MapReduce Benefits
Elastic: Uses as many or as few EC2 instances as needed. Spin up large or small job flows in minutes.
Easy to use: Get up and running quickly with an easy-to-use web console, robust command line clients and sample jobs. No configuration necessary.
Reliable: Fault-tolerant service built on top of battle-tested AWS infrastructure. Automatically retries failed tasks.
Cost effective: We monitor the progress of your jobs and turn off resources when the job flow is done.
Problems customers solve with Elastic MapReduce
• Data mining (log processing, click stream analysis, similarities, etc.)
• Bio-informatics (genome analysis)
• Financial simulation (Monte Carlo simulation)
• File processing (resize JPEGs)
• Web indexing
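Monte Carlo simulation fits this model because trials are independent: each map task can run its own batch, and a reduce step only has to combine the counts. The sketch below is a minimal illustration in plain Python, not Elastic MapReduce code; it estimates pi as a stand-in for a financial model, and the names `map_trials` and `reduce_counts` are hypothetical.

```python
import random


def map_trials(seed, trials):
    """Map step: run one independent batch of trials, return (hits, trials)."""
    rng = random.Random(seed)  # a per-task seed keeps batches independent and repeatable
    hits = 0
    for _ in range(trials):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:  # the point falls inside the quarter circle
            hits += 1
    return hits, trials


def reduce_counts(partials):
    """Reduce step: sum the partial counts and turn them into an estimate of pi."""
    total_hits = sum(hits for hits, _ in partials)
    total_trials = sum(trials for _, trials in partials)
    return 4.0 * total_hits / total_trials


if __name__ == "__main__":
    # Eight "map tasks"; on a cluster each would run on a different node.
    partials = [map_trials(seed, 100_000) for seed in range(8)]
    print(reduce_counts(partials))
```

Adding more map tasks increases the number of trials without changing the reduce step, which is why this kind of workload scales with the number of instances.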
Customer Feedback
• Pros:
  • Amazon Elastic MapReduce makes it easy to run Hadoop applications.
  • Reliable platform for production data processing
• Challenges:
  • Simple tasks such as log processing require fluency in MapReduce
  • Hadoop applications are difficult to develop
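To make the "fluency in MapReduce" point concrete, here is a minimal sketch of a simple log-processing task, counting requests per HTTP status code, written as a map step, a sort (standing in for Hadoop's shuffle) and a reduce step. It runs locally in plain Python and is not taken from the talk; the log layout (status code as the second-to-last field) and the function names are assumptions.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit (status_code, 1) for every parseable log line."""
    for line in lines:
        parts = line.split()
        # Assumed layout: "... <status> <bytes>", so the status is second to last.
        if len(parts) >= 2 and parts[-2].isdigit():
            yield parts[-2], 1


def reducer(pairs):
    """Reduce step: sum the counts per key. Input must already be sorted by key."""
    for key, group in groupby(pairs, key=lambda kv: kv[0]):
        yield key, sum(count for _, count in group)


def run(lines):
    """Local stand-in for a job: map, sort (the shuffle), then reduce."""
    shuffled = sorted(mapper(lines))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = [
        '1.2.3.4 - - [10/Oct/2009:13:55:36 -0700] "GET /a HTTP/1.0" 200 2326',
        '1.2.3.5 - - [10/Oct/2009:13:55:37 -0700] "GET /b HTTP/1.0" 404 512',
    ]
    print(run(sample))
```

Even for a one-line question ("how many requests per status code?"), the author has to think in terms of keys, sorting and grouping.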
New Features
• Support for Apache Pig – August 2009
  • Batch and interactive mode
  • Concurrent access to multiple file systems
  • Loading resources from Amazon S3
  • Additional Piggybank functions
  • Integration with Elastic MapReduce Client and Web Console
New Features
• Support for Apache Hive 0.4 – Today
  • Batch and interactive mode
  • Integration with Elastic MapReduce Client and Web Console
• Additions to Hive
  • Load table partitions automatically from Amazon S3
  • Specify an off-instance metadata store
  • Optimized data writes to Amazon S3
  • Reference resources on Amazon S3
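Hive stores each table partition in a directory named `column=value`, so the partitions of a table kept in Amazon S3 can in principle be read off the object keys. The sketch below illustrates that naming convention only; it is not how the Elastic MapReduce Hive addition is implemented, and the function names and example keys are hypothetical.

```python
def partition_spec(key, table_prefix):
    """Return the partition columns encoded in an object key, or None."""
    if not key.startswith(table_prefix):
        return None
    relative = key[len(table_prefix):].strip("/")
    spec = {}
    # Every path segment except the last (the file name) should be "column=value".
    for segment in relative.split("/")[:-1]:
        if "=" not in segment:
            return None
        name, value = segment.split("=", 1)
        spec[name] = value
    return spec or None  # files directly under the table prefix have no partition


def discover_partitions(keys, table_prefix):
    """Collect the distinct partition specs found under a table prefix, in order."""
    seen = []
    for key in keys:
        spec = partition_spec(key, table_prefix)
        if spec is not None and spec not in seen:
            seen.append(spec)
    return seen


if __name__ == "__main__":
    keys = [
        "tables/impressions/dt=2009-10-01/hour=00/part-00000",
        "tables/impressions/dt=2009-10-02/hour=13/part-00000",
    ]
    print(discover_partitions(keys, "tables/impressions/"))
```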
Amazon Elastic MapReduce Ecosystem
• Karmasphere Studio for Hadoop – NetBeans IDE for development, debugging, deployment and management of Hadoop jobs
  • Deploy Hadoop jobs to Elastic MapReduce
  • Monitor progress of Elastic MapReduce job flows
  • Amazon S3 file browser
  • Elastic MapReduce HDFS browser
Amazon Elastic MapReduce Ecosystem
• Support for Cloudera’s Hadoop distribution (private beta)
  • Optionally use Cloudera’s Hadoop while executing Elastic MapReduce job flows
  • Get support from Cloudera for the Elastic MapReduce job flows
Q&A