Instance Types - University of Florida. Amazon Web Services...

Posted on 28-Apr-2018



Instance Types

• Standard Instances:

– Small: 1.7 GB mem, 1 EC2 Compute Unit (EC2CU), 160 GB local instance storage (lis), 32/64-bit.

– Medium: 3.75 GB mem, 2 EC2CU, 410 GB lis, 32/64-bit.

– Large: 7.5 GB mem, 4 EC2CU, 850 GB lis, 64-bit.

– Extra Large: 15 GB mem, 8 EC2CU, 1690 GB lis, 64-bit.

• Micro Instances: 613 MB mem, up to 2 EC2CU (short bursts), EBS storage only

• High-Memory Instances: 17.1, 34.2, or 68.4 GB mem.

• High-CPU Instances: 5 or 20 EC2CU.

• Cluster GPU Instances: 22 GB mem, 33.5 EC2CU, 2 x NVIDIA Tesla “Fermi” M2050 GPUs, 1690 GB lis, 10 Gigabit Ethernet.

1 EC2CU: equivalent of a 1.0-1.2 GHz 2007 AMD Opteron or 2007 Intel Xeon processor
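The EC2CU definition lends itself to quick arithmetic. The Python sketch below is a hypothetical helper, not an AWS tool: it stores the Standard specs from this slide (first-generation types whose numbers may since have changed), converts EC2CU into a rough 2007-era GHz range, and picks the smallest Standard type that meets a memory and EC2CU minimum.

```python
from __future__ import annotations

# Standard instance specs as listed on the slide (historical values).
STANDARD = {
    "Small": {"mem_gb": 1.7, "ecu": 1, "lis_gb": 160},
    "Medium": {"mem_gb": 3.75, "ecu": 2, "lis_gb": 410},
    "Large": {"mem_gb": 7.5, "ecu": 4, "lis_gb": 850},
    "Extra Large": {"mem_gb": 15.0, "ecu": 8, "lis_gb": 1690},
}

# 1 EC2CU ~ a 1.0-1.2 GHz 2007 Opteron or Xeon.
ECU_GHZ_LOW, ECU_GHZ_HIGH = 1.0, 1.2


def ghz_equivalent(ecu: float) -> tuple[float, float]:
    """Rough aggregate 2007-era GHz range that `ecu` compute units correspond to."""
    return ecu * ECU_GHZ_LOW, ecu * ECU_GHZ_HIGH


def smallest_standard(min_mem_gb: float, min_ecu: float) -> str | None:
    """Return the smallest Standard type meeting both minimums, or None if none does."""
    # Dicts keep insertion order, and STANDARD is listed smallest to largest.
    for name, spec in STANDARD.items():
        if spec["mem_gb"] >= min_mem_gb and spec["ecu"] >= min_ecu:
            return name
    return None


print(ghz_equivalent(4))         # (4.0, 4.8)  -> a Large instance
print(smallest_standard(4, 2))   # Large (Medium has only 3.75 GB)
```

Treating EC2CU as additive GHz is only a capacity estimate for comparing types, not a benchmark.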

Instance vs. VM

• Instance = VM + hardware (instance type)

• AMI (Amazon Machine Image) = VM “image”

• VM “image” = OS + software

• Users specify the type of VM and hardware (i.e., instance type) when setting up an instance
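The instance = image + hardware relationship can be modeled as plain data. In this minimal sketch the class names, the `launch` function, and the image ID are hypothetical; the one rule it enforces comes from the instance-type slide, where Large and Extra Large are 64-bit only, so a 32-bit image cannot be paired with them.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class MachineImage:
    """VM image (AMI): an OS plus preinstalled software."""
    image_id: str            # hypothetical ID for illustration
    os: str
    arch_bits: int           # 32 or 64
    software: tuple[str, ...] = ()


@dataclass(frozen=True)
class InstanceType:
    """Hardware profile chosen when the instance is set up."""
    name: str
    mem_gb: float
    ecu: int
    arch_bits: frozenset[int]   # architectures this type supports


@dataclass(frozen=True)
class Instance:
    """Instance = VM image + hardware (instance type)."""
    image: MachineImage
    itype: InstanceType


def launch(image: MachineImage, itype: InstanceType) -> Instance:
    # Reject image/hardware combinations whose architectures do not match.
    if image.arch_bits not in itype.arch_bits:
        raise ValueError(f"{image.arch_bits}-bit image cannot run on {itype.name}")
    return Instance(image, itype)


SMALL = InstanceType("Small", 1.7, 1, frozenset({32, 64}))
LARGE = InstanceType("Large", 7.5, 4, frozenset({64}))

img = MachineImage("ami-example", "Linux", 64, ("hadoop",))
print(launch(img, LARGE).itype.name)  # Large
```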


OS and Software

• Amazon Machine Images (AMIs) are preconfigured with an ever-growing list of operating systems (the Windows Server 2008 license is included in the price!)


Pricing: On-Demand Instance

Data Transfer Charge

chow 25

AWS’s Free Usage Tier


Amazon S3 (Simple Storage Service) Basics

• Data stored as objects (files) in buckets

– An object's “key” is its path

– identified by <bucket> + <path>

– No real directories, just path segments

• Great as persistent storage for data

– Reliable – designed for up to 99.999999999% (11 nines) durability

– Scalable – up to petabytes of data

– Fast – highly parallel requests
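A small, local illustration of “no real directories”: keys are flat strings, and folders only appear when a listing groups keys by a delimiter, which is how S3 listings with a prefix and delimiter report common prefixes. The functions below are hypothetical stand-ins that run on a plain list of keys, not calls to the S3 API; the bucket name is made up.

```python
from __future__ import annotations


def split_s3_uri(uri: str) -> tuple[str, str]:
    """Split 's3://bucket/path/to/key' into (bucket, key)."""
    if not uri.startswith("s3://"):
        raise ValueError(f"not an s3:// URI: {uri}")
    bucket, _, key = uri[len("s3://"):].partition("/")
    return bucket, key


def list_level(keys: list[str], prefix: str = "", delimiter: str = "/") -> tuple[list[str], list[str]]:
    """Return (objects, common_prefixes) found one level below `prefix`."""
    objects: list[str] = []
    prefixes: set[str] = set()
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        head, sep, _ = key[len(prefix):].partition(delimiter)
        if sep:  # key continues past a delimiter, so it looks like a "folder"
            prefixes.add(prefix + head + delimiter)
        else:
            objects.append(key)
    return objects, sorted(prefixes)


keys = ["logs/2018/04/a.log", "logs/2018/05/b.log", "logs/readme.txt", "jobs/wordcount.jar"]
print(split_s3_uri("s3://my-course-bucket/logs/readme.txt"))  # ('my-course-bucket', 'logs/readme.txt')
print(list_level(keys))           # ([], ['jobs/', 'logs/'])
print(list_level(keys, "logs/"))  # (['logs/readme.txt'], ['logs/2018/'])
```

Nothing named `logs/2018/` is ever stored; it is derived from the keys on each listing.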

S3 Access

• Via your web browser

• Various command line tools

– s3cmd

• Or via the HTTP REST interface

– Create (PUT/POST), Read (GET), Delete (DELETE)
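With the REST interface, each operation is an HTTP request against a bucket/key URL. This sketch only builds the requests with Python's standard `urllib.request.Request` and never sends them; it assumes the virtual-hosted URL style `https://<bucket>.s3.amazonaws.com/<key>` and a hypothetical bucket name. Real requests also need an AWS Signature Version 4 `Authorization` header (unless the object is public), which tools such as s3cmd compute for you.

```python
from __future__ import annotations

from urllib.parse import quote
from urllib.request import Request

# CRUD operations from the slide mapped to HTTP verbs
# (POST is the browser form-upload variant of create).
CRUD_TO_HTTP = {"create": "PUT", "read": "GET", "delete": "DELETE"}


def s3_rest_request(op: str, bucket: str, key: str, body: bytes | None = None) -> Request:
    """Build, but do not send, an unsigned virtual-hosted-style S3 REST request."""
    url = f"https://{bucket}.s3.amazonaws.com/{quote(key)}"  # quote() leaves '/' unescaped
    return Request(url, data=body, method=CRUD_TO_HTTP[op])


req = s3_rest_request("create", "my-course-bucket", "input/my file.txt", b"hello")
print(req.get_method(), req.full_url)
# PUT https://my-course-bucket.s3.amazonaws.com/input/my%20file.txt
```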

S3 Limitations

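• Objects can’t be modified in place (no random writes or appends); changing an object means uploading it again

• Max object size of 5 TB (at most 5 GB per single PUT request; larger objects need multipart upload)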
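The 5 GB per-request cap is why large objects are uploaded in parts (S3 multipart upload). A minimal planning sketch, assuming the documented limits: at most 10,000 parts, each 5 MiB to 5 GiB (the last part may be smaller), and objects up to 5 TiB. The slide's GB/TB are read here as binary units, and the 100 MiB default part size is an arbitrary choice.

```python
from __future__ import annotations

MiB = 1024 ** 2
GiB = 1024 ** 3
MAX_OBJECT = 5 * 1024 ** 4             # 5 TiB maximum object size
MAX_SINGLE_PUT = 5 * GiB               # largest object a single PUT can upload
MIN_PART, MAX_PART = 5 * MiB, 5 * GiB  # per-part bounds (the last part may be smaller)
MAX_PARTS = 10_000


def needs_multipart(size: int) -> bool:
    """True if an object of `size` bytes is too big for one PUT request."""
    if size > MAX_OBJECT:
        raise ValueError("object exceeds the 5 TiB S3 object size limit")
    return size > MAX_SINGLE_PUT


def plan_parts(size: int, part_size: int = 100 * MiB) -> tuple[int, int]:
    """Return (part_size, number_of_parts), growing part_size if 10,000 parts would not suffice."""
    if size > MAX_OBJECT:
        raise ValueError("object exceeds the 5 TiB S3 object size limit")
    if not MIN_PART <= part_size <= MAX_PART:
        raise ValueError("part size must be between 5 MiB and 5 GiB")
    part_size = max(part_size, -(-size // MAX_PARTS))  # integer ceiling division
    return part_size, max(1, -(-size // part_size))


print(needs_multipart(6 * GiB))  # True
print(plan_parts(6 * GiB))       # (104857600, 62)
```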

S3 Pricing

• Varies by region

• Data in is (currently) free

– Data out to EC2 in the same region is also free

– Otherwise starts at $0.12/GB

• Storage cost is per GB-month

– Starts at $0.140/GB-month, drops with volume
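A worked estimate helps with the GB-month unit: keeping 200 GB for half a month is 100 GB-months. This sketch plugs in the slide's starting rates as hypothetical constants and ignores volume discounts, regional differences, and per-request charges, so it shows the arithmetic rather than a real bill. `Decimal` avoids binary floating-point rounding on money.

```python
from decimal import Decimal

# Starting rates quoted on the slide (historical, first tier, one region).
STORAGE_PER_GB_MONTH = Decimal("0.140")
OUT_PER_GB = Decimal("0.12")  # data out of the region; data in is free


def monthly_s3_cost(gb_months, gb_out_of_region) -> Decimal:
    """Storage plus outbound-transfer cost in USD under the simplifications above."""
    storage = Decimal(str(gb_months)) * STORAGE_PER_GB_MONTH
    transfer = Decimal(str(gb_out_of_region)) * OUT_PER_GB
    return storage + transfer


# 100 GB kept for the whole month, 50 GB downloaded from outside the region.
print(monthly_s3_cost(100, 50))  # 20.000
```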

S3 Access Control List (ACL)

• Read/Write permissions on a per-bucket basis

– Read == listing objects in bucket

– Write == create/overwrite/delete objects in bucket

• Read/Write permissions on a per-object (file) basis

– Read == read object data & metadata
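The two ACL levels are checked independently. The sketch below models only the Read/Write rules on this slide (real S3 ACLs also have READ_ACP, WRITE_ACP, and FULL_CONTROL grants); the operation names and user names are hypothetical.

```python
from __future__ import annotations

from enum import Enum


class Perm(Enum):
    READ = "READ"
    WRITE = "WRITE"


# Which ACL governs each operation, following the slide's definitions.
BUCKET_LEVEL = {"list_objects": Perm.READ, "put_object": Perm.WRITE, "delete_object": Perm.WRITE}
OBJECT_LEVEL = {"get_object": Perm.READ}  # read object data & metadata


def is_allowed(op: str, user: str, bucket_acl: dict, object_acl: dict | None = None) -> bool:
    """Check `op` for `user` against the bucket ACL or the object ACL, whichever governs it."""
    if op in BUCKET_LEVEL:
        return BUCKET_LEVEL[op] in bucket_acl.get(user, set())
    if op in OBJECT_LEVEL:
        return object_acl is not None and OBJECT_LEVEL[op] in object_acl.get(user, set())
    raise ValueError(f"unknown operation: {op}")


bucket_acl = {"alice": {Perm.READ, Perm.WRITE}, "bob": {Perm.READ}}
object_acl = {"bob": {Perm.READ}}  # ACL of one object in the bucket
print(is_allowed("list_objects", "alice", bucket_acl))            # True
print(is_allowed("get_object", "alice", bucket_acl, object_acl))  # False
print(is_allowed("delete_object", "bob", bucket_acl))             # False
```

Bucket Read lets alice list the key, but without Read on the object ACL she cannot fetch its data.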

S3

• The Amazon Web Services S3 API supports the ability to:

– Find buckets and objects (jar files, data files, etc.)

– Discover their metadata

– Create new buckets

– Upload new objects

– Delete existing buckets and objects

– Copy data from S3 to HDFS for computation (DistCp/S3DistCp)
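To make the listed operations concrete without an AWS account, here is an in-memory stand-in. `ToyS3` and its method names are hypothetical and deliberately simplified (no authentication, regions, or copying to HDFS); it mirrors the whole-object writes from the limitations slide and S3's rule that a bucket must be empty before it can be deleted.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StoredObject:
    data: bytes
    metadata: dict[str, str]


class ToyS3:
    """In-memory stand-in for the S3 operations listed above; not the real API."""

    def __init__(self) -> None:
        self._buckets: dict[str, dict[str, StoredObject]] = {}

    def create_bucket(self, bucket: str) -> None:
        if bucket in self._buckets:
            raise ValueError(f"bucket already exists: {bucket}")
        self._buckets[bucket] = {}

    def list_buckets(self) -> list[str]:
        return sorted(self._buckets)

    def put_object(self, bucket: str, key: str, data: bytes, metadata: dict[str, str] | None = None) -> None:
        # Whole-object write: an existing key is replaced, never patched or appended to.
        self._buckets[bucket][key] = StoredObject(bytes(data), dict(metadata or {}))

    def list_objects(self, bucket: str, prefix: str = "") -> list[str]:
        return sorted(k for k in self._buckets[bucket] if k.startswith(prefix))

    def head_object(self, bucket: str, key: str) -> dict[str, str]:
        # Metadata only; the object's data is not returned.
        obj = self._buckets[bucket][key]
        return {"content-length": str(len(obj.data)), **obj.metadata}

    def delete_object(self, bucket: str, key: str) -> None:
        del self._buckets[bucket][key]

    def delete_bucket(self, bucket: str) -> None:
        # Refuse to delete a bucket that still holds objects.
        if self._buckets[bucket]:
            raise ValueError(f"bucket not empty: {bucket}")
        del self._buckets[bucket]


s3 = ToyS3()
s3.create_bucket("course-bucket")
s3.put_object("course-bucket", "jobs/wordcount.jar", b"PK", {"content-type": "application/java-archive"})
print(s3.list_objects("course-bucket", "jobs/"))  # ['jobs/wordcount.jar']
print(s3.head_object("course-bucket", "jobs/wordcount.jar")["content-length"])  # 2
```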

Amazon EMR

• A web service that allows cost-effective large-scale data processing

• Hadoop (HDFS + MapReduce) over EC2 and S3

• EMR is mostly used for data-intensive tasks

– Examples: web indexing, data mining, log analysis, data warehousing, machine learning, financial analysis, scientific simulation, bioinformatics
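Log analysis is a typical EMR job. The single-process Python sketch below only mimics the MapReduce data flow (map, shuffle by key, reduce) that Hadoop distributes across the cluster; the log lines are made up and assumed to be in Common Log Format, whose last two fields are the HTTP status and the response size.

```python
from __future__ import annotations

from collections import defaultdict
from collections.abc import Iterable, Iterator


def mapper(line: str) -> Iterator[tuple[str, int]]:
    """Emit (HTTP status, 1) for one log line."""
    fields = line.split()
    if len(fields) >= 2:
        yield fields[-2], 1  # Common Log Format ends with: ... status bytes


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group values by key, as Hadoop does between the map and reduce phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key: str, values: list[int]) -> tuple[str, int]:
    return key, sum(values)


def run_job(lines: Iterable[str]) -> dict[str, int]:
    pairs = (pair for line in lines for pair in mapper(line))
    return dict(reducer(k, vs) for k, vs in sorted(shuffle(pairs).items()))


logs = [
    '10.0.0.1 - - [28/Apr/2018:10:00:00 -0400] "GET /index.html HTTP/1.1" 200 512',
    '10.0.0.2 - - [28/Apr/2018:10:00:01 -0400] "GET /missing HTTP/1.1" 404 128',
    '10.0.0.1 - - [28/Apr/2018:10:00:02 -0400] "GET /about.html HTTP/1.1" 200 256',
]
print(run_job(logs))  # {'200': 2, '404': 1}
```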


Apache Hadoop Stack for Data Analytics

[Figure: Apache Hadoop stack. Components shown: HDFS, HBase, MapReduce, Pig, Hive, Mahout, Sqoop, Flume, YARN, ZooKeeper, plus a “Resource Management & Workflow” label.]

Why Use Elastic MapReduce?

• Reduce hardware & IT personnel costs

– Pay for what you actually use

– Don’t pay for people you don’t need

– Don’t pay for capacity you don’t need

• More agility, less wait time for hardware

– Don’t waste time buying/racking/configuring servers

– Many server classes to choose from (micro to massive)

• Less time doing Hadoop deployment & version mgmt

– Optimized Hadoop is pre-installed

Amazon Mechanical Turk

• A web service that exposes an on-demand global workforce ready to complete small tasks in exchange for micro-payments

• Frictionless: outsourcing per se is irrelevant.

• A web services API

• Examples?

Identify Road Markings

How It Works

[Figure (www.mturk.com): labels shown are Requester (Developer), “Artificial, Artificially Intelligent Software”, Human Intelligence Tasks (HITs), Workers, Worker Qualifications, and Completed HITs.]
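The concepts on the How It Works slide (HITs, worker qualifications, micro-payments) can be tied together in a toy model. This is not the Mechanical Turk API: `HIT`, `dispatch`, and the assignment rule (the first qualified worker takes the HIT) are hypothetical simplifications, whereas on the real marketplace workers choose which HITs to accept.

```python
from __future__ import annotations

from collections.abc import Callable
from dataclasses import dataclass


@dataclass(frozen=True)
class HIT:
    """A Human Intelligence Task posted by a requester."""
    hit_id: str
    question: str
    reward_cents: int
    required_qualification: str | None = None


def dispatch(
    hits: list[HIT],
    workers: dict[str, set[str]],
    answer: Callable[[str, HIT], str],
) -> tuple[list[tuple[str, str, str]], dict[str, int]]:
    """Give each HIT to the first qualified worker, collect the answer, and pay the reward."""
    completed: list[tuple[str, str, str]] = []
    earnings = {worker: 0 for worker in workers}
    for hit in hits:
        for worker, quals in workers.items():
            if hit.required_qualification is None or hit.required_qualification in quals:
                completed.append((hit.hit_id, worker, answer(worker, hit)))
                earnings[worker] += hit.reward_cents
                break  # one assignment per HIT; HITs nobody qualifies for stay open
    return completed, earnings


hits = [
    HIT("h1", "Does this photo show a road marking?", 5),
    HIT("h2", "Transcribe this 30-second audio clip", 25, "transcription"),
]
workers = {"w1": set(), "w2": {"transcription"}}
done, paid = dispatch(hits, workers, lambda worker, hit: f"answer from {worker}")
print(paid)  # {'w1': 5, 'w2': 25}
```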

Example Application: Podcast transcription service provider, which transcribes audio into high-quality text

• Amazon Simple Storage Service (S3): stores the podcasts and related files

• Amazon Mechanical Turk + EMR: voice-recognition algorithms transcribe the podcasts

• Amazon EMR: indexes the text within the search engine

Homework (Last Friday)

• Setup AWS account

• Watch the videos on AWS EMR

– Getting Started (11:04): signing up for an AWS account, generating a key pair, and setting up an S3 bucket.

– Running Jobs (14:47): creating, monitoring, and getting results from your EMR job flow.

– Clusters of Servers (10:50): EC2 instance types, pricing, and Hadoop cluster configuration.

– Dealing with Data (18:54): S3 architectures, pricing, and access control.


Homework (Cont.)

• AWS Hands-on Lab 0

• Follow the instructions in the tutorial and repeat its tasks: creating an account, working with S3, creating a cluster and running a job, and setting up instances.

• Compile the jar file from the source code posted on the course website


Summary

• Cloud Computing

• AWS

• EC2 and S3

• EMR and AMT

• Hands-on Lab 0 – warming up
