Cloud Computing


Introduction to Cloud Computing for Developers

Key concepts, the players and their offerings

Eeraj Jan Qaisar
Equity Capital Markets
Major Financial Institution
New York, U.S.A.


Abstract—This paper provides a basic introduction to cloud computing for software developers and covers key concepts, major players and their offerings in the market today.

I. INTRODUCTION

The term cloud computing conjures up images of a mysterious computing paradigm accessible to and understood by only a select group of experts. Barring a few outliers, the mainstream developer does not feel the need to learn about the cloud, or cannot really leverage it, simply because their current work gives them no opportunity to learn about it.

This paper attempts to clear up the mist by introducing cloud computing in plain English, illustrating its benefits, its impact and the key players, and offering ideas on how developers can learn and experiment with the key technologies associated with it.

II. TRADITIONAL IT / COMPUTING

Before we delve into the ways of the cloud, let us quickly cover the current traditional computing landscape. Traditional computing, also called on-premise computing, is the way things have been done so far. The term on-premise implies that the computing infrastructure is owned and managed by the organization that uses it.

On-premise computing is also typified by the need to plan storage and compute capacity well in advance, or pay the price later by dealing with the complexity of physically acquiring and setting up the required servers, storage, databases and other compute resources.

No matter how well you plan ahead, one of two outcomes is certain with the traditional model: either constrained capacity, limited by the infrastructure you acquired, or excess capacity, which means wasted capital that could have been deployed for other strategic initiatives.
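The two outcomes can be made concrete with a little arithmetic. The sketch below assumes a hypothetical hourly demand profile (in servers) and a fixed number of servers bought upfront; for each hour it counts either the demand that could not be served or the servers that sat idle. The figures are invented for illustration only.

```python
def fixed_capacity_outcome(demand, capacity):
    """Return (unmet, idle) server-hours for a fixed capacity.

    demand: servers needed in each hour (hypothetical figures)
    capacity: servers owned, which stays the same every hour
    """
    unmet = sum(max(d - capacity, 0) for d in demand)  # constrained capacity
    idle = sum(max(capacity - d, 0) for d in demand)   # excess (wasted) capacity
    return unmet, idle


# Hypothetical profile: quiet hours around a business-hours peak.
demand = [2, 2, 2, 3, 6, 10, 12, 10, 6, 3, 2, 2]

for capacity in (4, 8, 12):
    unmet, idle = fixed_capacity_outcome(demand, capacity)
    print(f"capacity={capacity:2d}  unmet={unmet:2d}  idle={idle:2d}")
```

With this profile, only a capacity of 12 serves every hour, at the cost of 84 idle server-hours out of the 144 paid for; any smaller fleet leaves demand unserved.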

Apart from high capital and fixed costs, the complexity and effort of managing, upgrading and patching internal infrastructure is not trivial and is often a stumbling block that hampers the organization’s agility.

To put it simply, for most firms, the main objective and core competence is not to run and manage huge data centers and associated infrastructure. It would be nice to offload at least some of this work to another entity and instead focus on the core objectives of the organization.

Think about it: How long does it take you to acquire a server or new technology, build it and have it ready in production in your current infrastructure? What if you then reach capacity and have to do it all over again?

III. CLOUD COMPUTING

In plain English, cloud computing can be defined as the ability to use computing resources (applications, storage and processing power) over the internet. These computing resources are hosted and managed by “someone else” (the cloud provider).

If you like long definitions: “Cloud computing is a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., networks, servers, storage, applications and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction.” [The National Institute of Standards and Technology 16th definition of cloud computing].

From a business perspective, cloud computing may mean new business models that enable new lines of business which either would not have been technologically possible without the cloud or would have been prohibitively costly or time-consuming to implement. An example of the former is Apple’s Siri, a cloud-based, natural-language intelligent assistant. For the latter, one just needs to look at the many start-ups that took an idea and grew rapidly without a massive upfront investment by leveraging the cloud to host their products. Pinterest is one such example. This online picture pinboard startup serves up about 17 million views a month and has massive storage capacity, all hosted on Amazon’s cloud platform (Amazon Web Services).

1 978-1-4673-1645-3/12/$31.00 ©2012 IEEE


TABLE I. KEY ATTRIBUTES OF CLOUD COMPUTING

Attribute | Notes
--- | ---
Elastic | Elasticity refers to the ability to expand and reduce computing power as needed.
Scalable | Ability to handle increased demands for CPU, storage and bandwidth as needed. Note that traditional computing infrastructure can be scalable but is not elastic.
Shared | Sharing, or multi-tenancy, is another key attribute; it refers to the fact that a given server or resource in the cloud can be used by more than one consumer (the tenant). Multi-tenancy may be undesirable in some cases; if needed, most of the major cloud providers offer the ability to get dedicated (i.e. not shared) resources. Multi-tenancy is an important factor in making cloud computing economically viable for commercially hosted public cloud providers.
Metered | A key benefit of the cloud computing model is the metered, or pay-as-you-go, model (storage, processing, subscription and bandwidth): you pay for the resources you consume. Elasticity would not be as attractive without the ability to pay for actual consumption. Contrast this with the traditional model, where the price paid is generally fixed by the servers and storage you provision for your needs.
Economics shift | The economics shift from Capital Expenditure (CapEx) to Operational Expenditure (OpEx). This not only means less initial cash outlay, but could also result in tax advantages, as OpEx is generally fully deductible in the year of expenditure.
Uses the internet | Needless to say, the various cloud-based computing resources are accessed and delivered via the internet.
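To see why the Elastic and Metered attributes go together, compare paying for a fixed fleet sized for the peak with paying only for the server-hours actually used. The hourly rate and demand figures below are hypothetical and chosen only to show the arithmetic; real provider pricing has more components (storage, requests, bandwidth).

```python
HOURLY_RATE = 0.10  # hypothetical price per server-hour, in dollars


def fixed_cost(demand, rate=HOURLY_RATE):
    """Traditional model: pay for enough servers to cover the peak, every hour."""
    return max(demand) * len(demand) * rate


def metered_cost(demand, rate=HOURLY_RATE):
    """Elastic + metered model: pay only for the servers running in each hour."""
    return sum(demand) * rate


demand = [2, 2, 2, 3, 6, 10, 12, 10, 6, 3, 2, 2]  # servers needed per hour
print(round(fixed_cost(demand), 2))    # peak of 12 servers * 12 hours * rate
print(round(metered_cost(demand), 2))  # 60 server-hours actually used * rate
```

Sized for its peak, the fixed fleet costs $14.40 for these twelve hours, while paying only for the 60 server-hours used costs $6.00.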

IV. CLOUD COMPUTING SERVICE MODELS

Cloud computing service models, or offerings, can be classified into three segments, ranging from IaaS (“closest to the metal”) to SaaS (“nirvana state, totally hands-off”). It is important to understand the three terms below.

• IaaS – Infrastructure as a Service. Includes servers, storage, virtual machines, load balancers and other components of the core infrastructure stack.

• PaaS – Platform as a Service. Adds development and programming models to IaaS. Includes databases, execution frameworks/runtimes, web servers and development tools.

• SaaS – Software as a Service. Complete application offering in the cloud. Salesforce CRM, Google Apps/Gmail, Microsoft “Live”, Taleo and a lot more.

IaaS offers the most control, with more involved management and provisioning of infrastructure resources. SaaS, at the other end, involves virtually no infrastructure management effort. PaaS sits between the two as far as control and management are concerned.
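One way to picture this control trade-off is as a list of the layers you still manage yourself under each model. The layer names and the exact split below are a simplified, illustrative assumption; the precise boundary varies by provider and by service.

```python
# Simplified stack, from "closest to the metal" upward (illustrative only).
LAYERS = ["networking", "storage", "servers", "virtualization",
          "operating system", "runtime", "application"]

# Index of the first layer the consumer manages under each model;
# every layer below that index is managed by the cloud provider.
FIRST_CONSUMER_LAYER = {
    "IaaS": LAYERS.index("operating system"),
    "PaaS": LAYERS.index("application"),
    "SaaS": len(LAYERS),  # "totally hands-off": nothing left to manage
}


def managed_by_consumer(model):
    """Layers the consumer still has to manage under the given model."""
    return LAYERS[FIRST_CONSUMER_LAYER[model]:]


for model in ("IaaS", "PaaS", "SaaS"):
    print(model, managed_by_consumer(model))
```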

Note: Cloud computing models can also be distinguished by whether they are private, public or hybrid clouds. This paper covers the major offerings in the public clouds in the market today. Hybrid clouds, as the name implies, refer to a model where a portion of the application is hosted in the public cloud and the rest resides in the organization’s own data center. All of the public cloud offerings discussed in this paper allow for the creation of a hybrid model as well.

V. IMPACT ON DEVELOPERS

Opinions on the impact that cloud computing will have on developers range from dire predictions about a decimated job market to unbridled optimism proclaiming thousands of new jobs that will magically materialize.

“Cloud computing will destroy jobs” – Information Age article.

“Cloud Computing’s Role in Job Creation” – IDC study sponsored by Microsoft, claims tens of thousands of jobs will be created.

“Cloud’s Impact on IT Jobs – Evolution, not wholesale elimination” – ItbusinessEdge.com

As usual, the truth is somewhere in the middle. What is certain is that new roles will evolve and now is the time to prepare. The good news is that there are not many cloud experts out there anyway, so there is time to ramp up.

Your core programming skills, including Java and C#, still apply. In addition, the following areas are worth looking into to broaden your skill set:

• Python, PHP

• Disconnected systems and messaging

• REST, HTTP, NoSQL, Caching

• Integration, workflow, resource usage, data transfer

• Architecture and design become even more important

• Think of yourself as an integrator of services

• Understand vendor SLAs, hosting strategies, and security.

One decision that may be crucial for developers is to choose a cloud platform to learn and develop skills for. Given the variety of providers, it can be a confusing proposition at times. I suggest a simple solution – pick one of the major players from these three – Amazon (AWS), Microsoft (Windows Azure) or Salesforce (FORCE.COM) and start with that platform. The design and architecture skills you gain with one cloud platform will be useful with others as well.

VI. KEY PLAYERS AND THEIR OFFERINGS IN CLOUD COMPUTING

Amazon - Amazon Web Services (AWS). By some measures, the biggest IaaS player offering complete IT infrastructure in the cloud.

Microsoft - Both PaaS and SaaS offerings. Windows Azure and Office Live.


Salesforce - Encompasses both PaaS and SaaS models. PaaS products include Force.com and Heroku. Salesforce CRM is probably the largest SaaS installation globally.

Google - Google App Engine and Google Docs

Apple - iCloud

There are several other providers, including Rackspace and Joyent, as well as private cloud providers including HP and IBM. These providers are mentioned here only in case the reader is interested in researching them further.

The rest of this paper will present an overview of the offerings from the first three providers - Amazon, Microsoft and Salesforce.

VII. AMAZON WEB SERVICES

Amazon Web Services (AWS) is the most complete publicly available offering in the IaaS space today. Having started with the Simple Storage Service (S3), a cloud storage service launched a few years back, AWS is now the 800-pound IaaS gorilla. Some of the current core AWS components include EC2, SQS, SimpleDB, HPC (High Performance Computing), Big Data, a caching service and several others.

As a developer, it is easy to get started learning about AWS. AWS currently offers a free usage tier for one year that includes 750 hours per month of a micro EC2 instance and 15 GB of outbound data transfer per month. Simply sign up at http://aws.amazon.com and you are ready to go.

The basic building blocks of the AWS platform are covered below.

A. Simple Storage Service (S3)

As the name implies, this is a storage service hosted on the AWS platform. It uses a key/value mechanism to identify stored data.

Some of the key features include:

• Write, read, and delete objects from 1 byte to 5 terabytes in size. An unlimited number of objects can be stored.

• Each object is stored in a bucket and retrieved via a unique, developer-assigned key.

• Authentication mechanisms are provided, and access rights can be granted on buckets and objects.

• Designed to provide 99.999999999% durability and 99.99% availability of objects over a given year.

• HTTP and BitTorrent support

• Cost to use: Storage + Request + Data Transfer Out
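The cost formula in the last bullet is simple addition once the rates are known. The sketch below uses made-up placeholder rates, not Amazon's published prices (which vary by region and volume tier); only the structure, storage plus requests plus data transferred out, comes from the list above.

```python
# Placeholder rates for illustration only; check the provider's price list.
STORAGE_PER_GB_MONTH = 0.10   # $ per GB stored for a month
PER_1000_REQUESTS = 0.01      # $ per 1,000 requests
TRANSFER_OUT_PER_GB = 0.12    # $ per GB transferred out


def s3_monthly_cost(stored_gb, requests, transfer_out_gb):
    """Storage + Request + Data Transfer Out, using the placeholder rates."""
    storage = stored_gb * STORAGE_PER_GB_MONTH
    request = requests / 1000 * PER_1000_REQUESTS
    transfer = transfer_out_gb * TRANSFER_OUT_PER_GB
    return storage + request + transfer


# e.g. 50 GB stored, 200,000 requests, 20 GB served to users in a month
print(round(s3_monthly_cost(50, 200_000, 20), 2))
```

With these placeholder rates the month comes to $5.00 of storage, $2.00 of requests and $2.40 of outbound transfer, or $9.40 in total.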

Using S3 is quite straightforward:

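// Using the Amazon SDK for .NET
// Requires: using System.IO; using Amazon.S3.Model;
// Assumes client is an Amazon S3 client already created with your AWS
// credentials, and BUCKET_NAME / S3_KEY hold the bucket name and object key.

// Store data
PutObjectRequest putRequest = new PutObjectRequest();
putRequest.WithBucketName(BUCKET_NAME);
putRequest.WithKey(S3_KEY);
putRequest.WithContentBody("This is body of S3 object.");
//putRequest.WithFilePath(pathToFile); // alternatively, upload a local file
client.PutObject(putRequest);

// Read data
GetObjectRequest getRequest = new GetObjectRequest();
getRequest.WithBucketName(BUCKET_NAME);
getRequest.WithKey(S3_KEY);
GetObjectResponse response = client.GetObject(getRequest);
String content;
using (StreamReader reader = new StreamReader(response.ResponseStream))
{
    content = reader.ReadToEnd(); // disposing the reader also closes the stream
}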

The stored objects are also HTTP addressable (subject to their access permissions):

http://mybucket.s3.amazonaws.com/homepage.html

While this code sample used C#/.NET, S3 can be used from Java, Python and other languages as well.

B. Elastic Compute Cloud (EC2)

EC2 provides a virtual computing environment (virtual machines) in the cloud. There are quite a few options for provisioning EC2:

On-demand instances: Provision the virtual machine when needed, terminate when done.

Reserved Instances: Capacity reservation to ensure that the required number of instances can be launched when needed. On-demand instances do not necessarily guarantee that the required number of instances will be available when needed.

Spot Instances: This is an interesting twist that introduces an eBay-like model, where you bid the price you are willing to pay for an EC2 instance. The instance becomes available when the spot price is at or below your bid. This can be a good model for processes that can tolerate machines being taken down without any prior warning.
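The spot model boils down to comparing your bid with a fluctuating market price. The simulation below is a toy: the price series and bid are invented, and it ignores how the real spot market sets prices. It only shows the consequence described above: the instance runs while the price is at or below the bid and is taken away, without warning, as soon as the price rises above it.

```python
def spot_schedule(prices, bid):
    """For each period, report whether a spot instance with this bid runs.

    prices: hypothetical spot price per period
    bid: the maximum price you are willing to pay
    Returns a list of (price, running) pairs and the number of interruptions.
    """
    schedule, interruptions, running = [], 0, False
    for price in prices:
        now_running = price <= bid
        if running and not now_running:
            interruptions += 1  # instance is reclaimed with no prior warning
        running = now_running
        schedule.append((price, running))
    return schedule, interruptions


prices = [0.03, 0.04, 0.09, 0.05, 0.03, 0.12, 0.02]
schedule, interruptions = spot_schedule(prices, bid=0.05)
for price, running in schedule:
    print(f"${price:.2f}  {'running' if running else 'not running'}")
print("interruptions:", interruptions)
```

A batch job that checkpoints its progress can simply resume in the next running period, which is why this model suits interruption-tolerant work.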

A few other key points about EC2:

• Computing power ranges from the Micro Instance (613 MB RAM, 1 or 2 EC2 Compute Units, or ECU) to Cluster Compute Eight Extra Large (60.5 GB RAM, 88 ECU, 3370 GB of local instance storage, 64-bit platform, 10 Gigabit Ethernet) and Cluster GPU Instances.

• You can choose a pre-configured, templated image or create an Amazon Machine Image (AMI) containing your applications, libraries, data and associated configuration settings.


• Ability to select instance type(s) and operating system desired.

• Start, terminate, and monitor as many instances as needed, using the web service APIs or management console.

• Options: run in multiple locations, static IP endpoints, persistent block storage.

• Pay for instance-hours and data transfer.

Sample costs: Small Instance $0.080/hr (Linux), $0.115/hr (Windows); Eight Extra Large $2.400/hr (Linux), $2.970/hr (Windows).
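The sample rates above translate into estimates with simple arithmetic. The sketch assumes that a partial hour is billed as a full instance-hour (EC2's hourly billing model at the time of writing) and uses a 730-hour month; the dictionary keys are labels made up for this example, and the result is an estimate, not a bill.

```python
import math

# Sample on-demand rates quoted above ($ per instance-hour).
RATES = {
    ("small", "linux"): 0.080,
    ("small", "windows"): 0.115,
    ("eight_extra_large", "linux"): 2.400,
    ("eight_extra_large", "windows"): 2.970,
}


def instance_cost(size, os_name, hours, count=1):
    """Cost of running `count` instances for `hours` each.

    Partial hours are rounded up to a full instance-hour (assumption).
    """
    billed_hours = math.ceil(hours)
    return RATES[(size, os_name)] * billed_hours * count


# One small Linux instance left running for a 730-hour month
print(round(instance_cost("small", "linux", 730), 2))
# Ten Eight Extra Large Windows instances used for 2.5 hours
print(round(instance_cost("eight_extra_large", "windows", 2.5, count=10), 2))
```

The two estimates come to $58.40 and $89.10; in the second, rounding the half hour up to a full hour adds a tenth of the bill for each instance.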

Starting an EC2 instance using boto, the Python library for AWS:

import time
from boto.ec2.connection import EC2Connection

AWS_ACCESS_KEY_ID = 'yourkey'
AWS_SECRET_ACCESS_KEY = 'yoursecret'

conn = EC2Connection(AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY)

# Launch one m1.small instance from the given AMI, using an existing key pair
reservation = conn.run_instances('ami-5647a33f',
                                 instance_type='m1.small',
                                 key_name='mykey')
instance = reservation.instances[0]

# Poll every 5 seconds until the instance reports the 'running' state
while instance.update() != 'running':
    time.sleep(5)

Python boto library:

http://boto.cloudhackers.com

http://groups.google.com/group/boto-users

As the above example illustrates, AWS APIs are accessible from a variety of programming tools and languages.

C. Simple Queue Service (SQS)

SQS is a hosted queue for storing messages as they travel between applications, and can be used to move data between distributed components asynchronously. Each message can hold up to 64 KB of text, and messages can be retained in a queue for up to 14 days.

SQS APIs are straightforward to use: CreateQueue, DeleteQueue, SendMessage, DeleteMessage. Batch operations are supported.

1. Connect to AWS

AmazonSQS sqs = new AmazonSQSClient(new BasicAWSCredentials(AWS_KEY, AWS_SECRET));

2. Obtain a handle (the queue URL) to your queue

String url = sqs.createQueue(new CreateQueueRequest("my_queue")).getQueueUrl();

3. Store a message in the queue

sqs.sendMessage(new SendMessageRequest(url, "here is my message"));
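Sending is only half of the pattern. A consumer typically receives a message, processes it, and then deletes it; a received message is hidden from other consumers for a visibility timeout and reappears if it is not deleted in time, so a crashed consumer does not lose work. The class below is a hypothetical in-memory model of that behavior, written in Python to keep it self-contained; it is not the SQS API, and it takes an explicit clock value instead of using real time.

```python
import itertools


class ToyQueue:
    """In-memory model of receive / process / delete with a visibility timeout."""

    def __init__(self, visibility_timeout):
        self.visibility_timeout = visibility_timeout
        self._messages = {}              # id -> (body, time it becomes visible)
        self._ids = itertools.count()

    def send(self, body, now=0):
        self._messages[next(self._ids)] = (body, now)

    def receive(self, now):
        """Return (id, body) of one visible message and hide it, or None."""
        for msg_id, (body, visible_at) in sorted(self._messages.items()):
            if visible_at <= now:
                self._messages[msg_id] = (body, now + self.visibility_timeout)
                return msg_id, body
        return None

    def delete(self, msg_id):
        """Acknowledge successful processing; the message is gone for good."""
        self._messages.pop(msg_id, None)


q = ToyQueue(visibility_timeout=30)
q.send("here is my message")

first = q.receive(now=0)     # consumer A gets the message, then crashes
hidden = q.receive(now=10)   # still hidden from other consumers
retry = q.receive(now=31)    # timeout expired: consumer B gets it again
q.delete(retry[0])           # B finishes processing and deletes it
print(first, hidden, retry)
```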

VIII. MICROSOFT WINDOWS AZURE

Microsoft’s Windows Azure is an IaaS and PaaS offering. While primarily targeted at Windows-based applications and technologies (.NET), it also supports PHP, Java and other non-Microsoft platforms.

Services offered include Compute, Storage, Service Bus, HPC (High Performance Computing), SQL Azure, CDN (Content Distribution) and Caching.

A. Windows Azure Compute

Windows Azure Compute provides the processing power and is divided into containers called roles. Each role is designed and optimized for a different computing scenario:

• Web Role – dedicated IIS resource for front end web applications.

• Worker Role – background processes, long running operations.

• VM Role – Custom Windows Server 2008 image.

When designing Windows Azure based applications, separating web and worker roles is critical and determines scalability. You would not want to host a long-running process in a web role container, as it would likely bog down the resources required by the IIS web server.
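The separation can be illustrated without any Azure-specific code. In the sketch below, plain Python threads stand in for the two roles (an assumption made to keep it runnable anywhere): the "web role" handler only enqueues the work and returns immediately, while a "worker role" loop does the slow processing in the background. In a real deployment the hand-off would go through a durable queue rather than an in-process one; the function names are hypothetical.

```python
import queue
import threading
import time

work_items = queue.Queue()   # stands in for a durable queue between the roles
results = {}


def web_role_handle_request(order_id):
    """Front end: accept the request, hand off the slow part, reply at once."""
    work_items.put(order_id)
    return f"order {order_id} accepted"


def worker_role_loop():
    """Background worker: performs the long-running operations."""
    while True:
        order_id = work_items.get()
        if order_id is None:          # sentinel used to stop the worker
            break
        time.sleep(0.1)               # simulate slow processing
        results[order_id] = "processed"
        work_items.task_done()


worker = threading.Thread(target=worker_role_loop)
worker.start()

replies = [web_role_handle_request(i) for i in range(3)]  # return immediately
work_items.join()       # wait until the worker has drained the queue
work_items.put(None)    # tell the worker to stop
worker.join()
print(replies, results)
```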

B. Windows Azure Service Bus

Like Amazon’s SQS service, Windows Azure Service Bus is a messaging and relay service that facilitates the creation of disconnected/asynchronous applications. Based on a publish/subscribe (pub/sub) mechanism, it supports multiple transport protocols.

Service Bus provides communication options for Windows Communication Foundation (WCF) as well as REST-based endpoints.
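The key difference between a plain queue and publish/subscribe is who receives each message: with a queue, each message goes to one consumer; with a topic, every subscription gets its own copy. The toy class below is a hypothetical in-memory illustration of that idea, not the Service Bus API.

```python
from collections import deque


class Topic:
    """Toy topic: each named subscription gets its own copy of every message."""

    def __init__(self):
        self.subscriptions = {}

    def subscribe(self, name):
        self.subscriptions[name] = deque()

    def publish(self, message):
        for inbox in self.subscriptions.values():
            inbox.append(message)   # fan out one copy per subscription

    def receive(self, name):
        inbox = self.subscriptions[name]
        return inbox.popleft() if inbox else None


orders = Topic()
orders.subscribe("billing")
orders.subscribe("shipping")
orders.publish("order 42 placed")

# Both subscribers see the same message independently.
print(orders.receive("billing"), "|", orders.receive("shipping"))
```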

C. SQL Azure

SQL Azure is a relational database hosted on Windows Azure. While it is based on SQL Server technologies, its feature set and capabilities differ from those of SQL Server itself, so careful thought must be given to the design and architecture of any application that uses SQL Azure.

SQL Azure’s current database size limit of 150 GB may give pause in certain scenarios. Horizontal partitioning, or sharding, can be used to overcome the size limitation in many cases. This technique involves spreading the data across multiple databases (shards). The SQL Azure engine is intelligent enough to take care of querying the correct data portions to give the right results.
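A common way to decide which shard holds a row is to derive it from a key, such as a customer ID, so that all rows for that key land in the same database. The sketch below hashes the key to pick one of a fixed list of databases; the shard names are hypothetical, and it shows application-side routing only, not how SQL Azure itself locates data.

```python
import hashlib

# Hypothetical shard databases; each holds a slice of the data.
SHARDS = ["customers_db_0", "customers_db_1", "customers_db_2"]


def shard_for(customer_id):
    """Map a customer ID to a shard with a stable hash.

    Python's built-in hash() is randomized per process for strings, so a
    hashlib digest is used to keep the mapping identical across runs.
    """
    digest = hashlib.sha256(str(customer_id).encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


for cid in (101, 102, 103, 101):
    print(cid, "->", shard_for(cid))
```

One drawback of this modulo scheme is that changing the number of shards remaps most keys, so the shard count is usually planned up front, or a lookup table is used instead.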

An interesting capability of SQL Azure is the SQL Azure Data Sync service, which allows one-way or bi-directional data synchronization between a SQL Azure database instance and an on-premise SQL Server instance.


D. Caching

In general, caching is used to improve application performance by providing rapid access to frequently used data. Retrieving this data from the cache, instead of fetching it from storage or a database, results in greatly improved performance.

The caching service on Windows Azure is a distributed, in-memory application cache. Retrieving data from memory is much faster than any disk-based access. Cache sizes of up to 4 GB are supported on the Windows Azure platform.
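A typical way for an application to use such a cache is the cache-aside pattern: look in the cache first, and on a miss load the value from the database and store it with an expiry time. The sketch below uses a plain dictionary as the cache and a hypothetical lookup function as the database; it illustrates the pattern, not the Windows Azure Caching API.

```python
import time


class CacheAside:
    """Cache-aside reads with a time-to-live (TTL) on each entry."""

    def __init__(self, load_from_db, ttl_seconds, clock=time.monotonic):
        self.load_from_db = load_from_db
        self.ttl = ttl_seconds
        self.clock = clock
        self._entries = {}              # key -> (value, expires_at)
        self.db_reads = 0

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]             # cache hit: no database access
        value = self.load_from_db(key)  # cache miss or expired entry
        self.db_reads += 1
        self._entries[key] = (value, self.clock() + self.ttl)
        return value


def slow_product_lookup(product_id):
    """Hypothetical stand-in for a database query."""
    return {"id": product_id, "name": f"product {product_id}"}


cache = CacheAside(slow_product_lookup, ttl_seconds=60)
cache.get(7)
cache.get(7)   # served from the cache
print("database reads:", cache.db_reads)
```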

To download the full SDKs and find information on a free trial of Windows Azure, refer to http://www.windowsazure.com/en-us/develop/downloads/

IX. SALESFORCE’S CLOUD OFFERINGS

Salesforce is the de facto CRM (Customer Relationship Management) software provider, used by virtually all major corporations. This CRM software is provided as SaaS, i.e. a fully hosted solution in the cloud.

While the general perception of Salesforce is that of a SaaS CRM provider, it is often overlooked that it is also a major player in the PaaS space. One aspect that differentiates Salesforce’s PaaS offering from those of other vendors is its subscription-based (i.e. per-user) pricing model, as opposed to a model based on resource consumption.

FORCE.COM is Salesforce’s PaaS product that allows creation of enterprise business applications using Salesforce’s tools and technologies. APEX, a strongly typed programming language that Salesforce created for execution on the FORCE.COM platform, is quite similar to Java and C# and can be easily learnt by someone with a programming background.

FORCE.COM differs from other PaaS offerings in the sense that it is designed from the ground up for the cloud and offers declarative constructs around business scenarios like Workflow and Approvals that would otherwise require complex custom software code. The platform also offers support for Formulas that can be used as an expression language.

For creating user interfaces for the cloud applications hosted on FORCE.COM, Salesforce offers Visualforce, a component-based user interface framework. It uses a tag-based markup language, similar to ASP.NET tags. There are several pre-built components, as well as the ability to create your own custom components.

Visualforce employs the model-view-controller (MVC) pattern and can automatically generate controllers for use with Salesforce’s FORCE.COM database.

Another interesting PaaS offering from Salesforce that is worth exploring is Heroku, which is designed to host Ruby-based applications in the cloud. Interested readers may refer to http://www.heroku.com/

X. OTHER CLOUD PROVIDERS

While several other vendors provide some incarnation of “cloud” today, two others, Google and Apple, are covered briefly.

A. Google Docs and Google App Engine

Google Docs, a SaaS offering, is Google’s answer to Microsoft Office and provides office productivity applications in the cloud. Apart from signing up, no programming expertise is required to use Google Docs. Despite its simplicity, Google Docs supports features such as real-time collaboration on documents. Refer to https://docs.google.com/ for in-depth information.

Google App Engine is a PaaS offering that supports the Java, Python and Go programming languages for creating applications that run on Google’s infrastructure. Applications are developed locally using the App Engine SDK, which emulates the App Engine environment on the development machine. After development, the application is uploaded to the Google App Engine infrastructure using the App Engine SDK, where it can be accessed via the internet. For storage, App Engine offers both NoSQL (App Engine DataStore) and Cloud SQL (RDBMS) models.

Getting started is easy and a free account is offered at: https://developers.google.com/appengine/

B. Apple iCloud

iCloud, introduced a few months back, works in conjunction with Apple devices such as the iPhone and iPad to store and synchronize documents, music, photos and other data.

Using iCloud Storage APIs, custom applications can be created to store documents and key value data in iCloud. iCloud wirelessly pushes documents to a user's device automatically.

A very interesting and powerful feature is iCloud’s ability to synchronize all content automatically to all of the user’s devices when changes are made on any one device. An intriguing model indeed, and one waiting to be used by creative developers.

Refer to http://developer.apple.com/icloud to get started.

XI. THE CLOUD – SOME CLOSING THOUGHTS

As with any new technology or process, separating ground reality from hype is an exercise that must be performed for any serious cloud initiative.

To start with, designing and architecting applications for the cloud is not an exercise to be taken lightly, and the role of the application architect becomes even more important. Experience with disconnected applications and asynchronous processes, deciding how to distribute an application across different data centers to cope with a provider’s data center outage, and the ability to deal with the non-deterministic nature of the resources available on multi-tenant hosts are all key skills needed to architect a successful cloud application.

Understanding the pricing models for IaaS and PaaS can be a challenge too. On the surface, these models are simple, but if care is not exercised, one could end up paying enormous fees because a developer did not shut down an EC2 instance that was no longer needed, or because someone did not think about the bandwidth charges when transferring mammoth amounts of data into or out of the cloud application.

Needless to say, the network can make or break the cloud. For an organization relying completely on a SaaS product, for example, a network outage could put it out of business. Hopefully, for most businesses, a reliable and fast network is already a must-have criterion in order to be viable.

Provider lock-in and data lock-in are two areas that need to be evaluated carefully. Does your cloud provider offer an easy way to get your data out of the cloud? What if you are unhappy with the services provided and want to move a complete cloud application from one provider to another? Related to these two concerns is the cloud provider’s long-term staying power. What happens if it closes shop?

For any business dealing with customer or other sensitive data, data protection is a major concern. How secure is your data in the cloud? Can your cloud provider supply an independent audit confirming that it meets the standards for data protection?

Cloud outages have happened and will happen. Applications need to be designed for failure.
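One common building block of designing for failure is retrying transient errors with exponential backoff and, if a primary endpoint keeps failing, falling back to a secondary one, for example a deployment in another data center. The sketch below is generic: the endpoint functions are hypothetical placeholders that simulate an outage, and the delays are kept tiny so it runs quickly.

```python
import time


def call_with_failover(endpoints, request, retries=3, base_delay=0.01):
    """Try each endpoint in order, retrying each with exponential backoff.

    endpoints: callables that take `request` and may raise ConnectionError
    Returns the first successful response; raises if every attempt fails.
    """
    last_error = None
    for endpoint in endpoints:
        for attempt in range(retries):
            try:
                return endpoint(request)
            except ConnectionError as err:
                last_error = err
                time.sleep(base_delay * 2 ** attempt)  # 1x, 2x, 4x ... delay
    raise RuntimeError("all endpoints failed") from last_error


def primary_datacenter(request):
    raise ConnectionError("primary data center unavailable")  # simulated outage


def secondary_datacenter(request):
    return f"handled {request!r} in secondary data center"


print(call_with_failover([primary_datacenter, secondary_datacenter], "GET /orders"))
```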

The march to the cloud may be bumpy but is inexorable. Let us get ready for the brave new world of the cloud.