Instance Types - University of Florida. Amazon Web Services... · Instance Types •Standard...
Transcript of Instance Types - University of Florida. Amazon Web Services... · Instance Types •Standard...
Instance Types
• Standard Instances:
– Small: 1.7GBmem, 1EC2Compute Unit (EC2CU), 160GB local instance storage(lis), 32/64bits.
– Medium: 3.75 GBmem, 2EC2CU, 410GBlis, 32/64bits.
– Large: 7.5GBmem, 4EC2CU, 850GBlis, 64bits
– Extra Large: 15GBmem, 8EC2CU, 1690GBlis, 64bits.
• Micro Instances: 613MBmem, 2ECUs, EBS
• High-Memory Instances: 17.1, 34.2, 68.4GBs.
• High-CPU Instances (5EC2CU or 20EC2CU)
• Cluster GPU Instances (22GBmem, 33.5EC2CU, 2xNVIDIA Tesla “Fermi” M2050 GPUs, 1690GBlis, 10GEthernet. 21
1EC2CU: equivalent of 1.0-1.2GHz
2007 AMD Opteron or 2007 Intel Xeon
processor
Instance vs. VM
• Instance = VM + hardware (instance type)
• AMI (Amazon Machine Image) = VM “image”
• VM “image”= OS + software
• Users specify the type of VM and hardware (i.e., instance type) when setting up an instance
22
OS and Software
• Amazon Machine Images (AMIs) are preconfigured with an ever-
growing list of operating systems (win2008OS including in price!!)
23
24
Pricing: On-Demand Instance
Data Transfer Charge
chow 25
AWS’s Free Usage Tier
26
Amazon S3 (Simple Storage Service) Basics
• Data stored as objects (files) in buckets
– “key” to file is path
– identified by <bucket> + <path>
– No real directories, just path segments
• Great as persistent storage for data
– Reliable – up to 99.999999999%
– Scalable – up to petabytes of data
– Fast – highly parallel requests
S3 Access
• Via your web browser
• Various command line tools
– s3cmd
• Or via HTTP REST interface
– Create (PUT/POST), Read (GET), Delete (DELETE)
S3 Limitations
• Can’t be modified (no random write or append)
• Max size of 5TB (5GB per upload request)
S3 Pricing
• Varies by region
• Data in is (currently) free
– Data out is also free within same region
– Otherwise starts at $0.12/GB
• Storage cost is per GB-month
– Starts at $0.140/GB, drops w/volume
S3 Access Control List (ACL)
• Read/Write permissions on per-bucket basis
– Read == listing objects in bucket
– Write == create/overwrite/delete objects in bucket
• Read/Write permissions on per-object (file) basis
– Read == read object data & metadata
S3
• Amazon web services S3 API support the ability to: – Find buckets and objects (jar file, data file,
etc.)
– Discover their meta data
– Create new buckets
– Upload new objects
– Delete existing buckets and objects
– Distcp/s3distcp from S3 to HDFS for computation
Amazon EMR
• A web service that allow cost-effective large data processing
• Hadoop (HDFS + Map-Reduce) over EC2 and S3
• EMR is mostly used for data intensive tasks
– Examples: web indexing, data mining, log analysis, data warehousing, machine learning, financial analysis, scientific simulation, bioinformatics
33
Apache Hadoop Stack for Data analytics
34
HDFS
HBase
Pig, Hive, Mahout
Map Reduce
Sqoop Flume
Resource
Management & Workflow
Yarn
Zookeeper
Why Use Elastic MapReduce?
• Reduce hardware & IT personnel costs
– Pay for what you actually use
– Don’t pay for people you don’t need
– Don’t pay for capacity you don’t need
• More agility, less wait time for hardware
– Don’t waste time buying/racking/configuring servers
– Many server classes to choose from (micro to massive)
• Less time doing Hadoop deployment & version mgmt
– Optimized Hadoop is pre-installed
Amazon Mechanical Turk
• A web service that exposes an on-demand global workforce ready to complete small tasks in exchange for micro-payments
• Frictionless. Outsourcing per-se is irrelevant.
• A web services API
• Examples?
Identify Road Markings
How It Works
38
www.mturk.com
Workers
Artificial, Artificially
Intelligent Software
Requester (Developer)
Human Intelligence Tasks (HITs)
Completed HITs
Worker Qualifications
Example Application: Podcast transcription service provider, which transcribes audio into high-quality text
• Amazon Simple Storage: Stores the podcasts and related files
• Amazon Mechanical Turk + EMR: voice recognition algorithms transcribe podcasts
• Amazon EMR: index text within search engine
Learn More About AWS
• AWS: http://aws.amazon.com
• EC2 Resources:
– http://docs.amazonwebservices.com/AWSEC2/latest/UserGuide/
• Amazon EMR: http://aws.amazon.com/elasticmapreduce/
Homework (Last Friday)
• Setup AWS account
• Watch Video on AWS EMR
– Getting Started (11:04)
– Signing up for an AWS account, generating a key-pair, and setting up an S3 bucket. Running Jobs (14:47)
– Creating, monitoring, and getting results from you EMR Job Flow. Clusters of Servers (10:50)
– EC2 instance types, pricing, and Hadoop cluster configuration. Dealing with Data (18:54)
– S3 architectures, pricing, and access control.
41
Homework (Cont.)
• AWS Hands-on Lab 0
• Follow the instructions from the tutorial and repeat the tasks including: create an account, working with S3, create cluster, and run a job, setup instances.
• Compile jar file using the source code posted on course website
42
Summary
• Cloud Computing
• AWS
• EC2 and S3
• EMR and AMT
• Hands-on Lab 0 – warming up
43