Cloud computing

Cloud Computing Considerations

Malisa Ncube

Developer Evangelist – Microsoft

be.com @malisancube

[email protected]

mailto:[email protected]

When it comes to creating an app, the backend is one of the biggest hurdles in getting the app out.

OSNetwork

StorageScale

What does it take to run an app?

???

my app

Inspired by Steve Marxhttp://blog.smarx.com/posts/what-is-windows-azure-a-hand-drawn-video

OSNetwork

StorageScale


???

my app


my app

peak

unused

scale

What does it take to run an app?peak

my appsave$$$

scale

Scalability

• Measured by the number of users that the application can support effectively at the same time.

• Relates to hardware resources needed (CPU,Memory, Disk and network bandwidth.• Application logic runs on compute nodes and data on data nodes.• Vertical scaling is achieved by increasing resources within existing nodes. This is limited by

hardware.• Horizontal scaling is achieved by adding more nodes. It is more efficient with homogeneous

nodes. • A scale unit is a combination of resources that needs to be scaled together in horizontal

homogeneous nodes.• Resource contention limits scalability. • Scalability is business concern. Google noticed a 20% reduction in traffic after introducing

500ms to page response time. Amazon 100ms caused 1% decrease in revenue.

The cloud

• Gives (illusion) infinite resources and limited by capacity of individual virtual machines.• Enabled by short term resource rental model• Enabled by metered pay-for-use model. Usage costs are transparent.• Enabled by self-service, on-demand, programmatic provisioning and

releasing of resources, scaling is automatable.• Gives an ecosystem of managed platform services such as VMs, data

storage, networking, messaging and caching.• Gives a simplified application development model.

A cloud native application

• Lets the platform do the hard stuff by leveraging the application services.• Uses non-blocking asynchronous communication in a loosely coupled architecture.• Scales horizontally in an elastic mechanism.• Does not waste resources• Handles scaling events, node failures, transient failures without downtime or

performance degradation.• Uses geographic distribution to minimize network latency.• Upgrades without downtime.• Scales automatically using proactive and reactive actions.• Monitors and manages application logs as nodes come and go.

The cloud approaches

Horizontal scaling compute Pattern

• Horizontal scaling is reversible.• Supports scaling out and scaling in• Stateful nodes• They keep user session information• They have single point of failure

• Stateless nodes• Store session information externally from the nodes.

Queue-Centric Workflow Pattern

• Used in web applications to decouple communication between web-tier and service tier by focusing on the flow of commands.

• A service tier that is unreliable or slow can affect the web tier negatively.• All communication is asynchronous as message over a queue• The sender and receiver are loosely coupled. Neither one knows about the implementation

of the other.• There is some edge cases where the risk of invisibility windows occurs when processing

takes longer than allowed. • Idempotency concerns. Database transactions, compensating transaction.• Poison messages placed in dead letter queue.• QCW is not full CQRS as it does not articulate the read model.

Autoscaling Pattern

• Assumes horizontal scaling architecture• Concerns are cost optimization and scalability• Auto-scaling solutions enable scheduled (proactive and reactive) rules

that enable the provisioning of resources as needed.• Throttling by selectively enabling or disabling features or functionality

based on environmental signals.

Eventual Consistency

• Simultaneous requests for the same data may result in different values.• Leads to better performance and lower cost.• Uses Brewer’s CAP theorem (Consistency Availability and Partition

tolerance). 3 Guarantees and application an pick only 2.• Consistency. Everyone get the same answer.• Availability. Clients have ongoing access (even if there is a partial system failure)• Partition tolerance. Means correct operation even if some nodes are cut of from

the network.

• DNS updates and NoSQL are examples of eventually consistent services.

MapReduce Pattern

• Data processing approach for processing highly parallelized datasets.• Require a mapper and reducer functions. Accepting data and producing output with

subsets of data and output of the mapper aggregated and sent to the reducer.• Used to process documents, server logs, social graphs.• Hadoop implements MR as a batch processing system, optimized for large amounts

of data than response time.• Created by Google Inc.• Most effective to bring compute function to data• Commonly refered to as BigData.• Hadoop has abstractions on top that create functions e.g. (Mahout - ML, Hive – SQL

like, Pig – dataflow, Sqoop – RDBMs connector)

Database sharding/Federation Pattern• A database divided into several shards, where each database row

exists only on one shard.• Shards do not reference other shards.• Slave shard nodes a typically eventually consistent and readonly.• Programming model is simplified by maintaining a single logical

database with horizontal scaling.• Fan-Out queries used to make updates to dependent federation

members. Similar to Windows Azure SQL Data Sync and MapReduce.

Multitenancy and commodity Hardware • Multitenancy – multi companies using the system, usually a software

system with an illusion that they are the only tenant.• Multitenancy in the cloud are standard: DNS Services, Hardware for

VMs, Load balancers, Identity management among others.• Commonly used in SaaS environments where each tenant runs in a

secure sandbox (HyperV, RDBMS).• Perfomance managed by using quotas, running resource hungry

service with those less intensive.• Commodity hardware fails occasionally. Plan on it happening on your

compute nodes and plan on handling it.

Busy Signal Pattern

• Applies to services or resources accessed over a network where a signal response is busy.• These may include management, data services and more, and

periodic transient failure should be expected. E.g. Busy signal on telephones.• A good application should be able to handle retries and properly

handle failures.• On HTTP. Response 503 Service Unavailable.• Clearly identify Busy Signal and Errors and retry on Busy state after an

interval. Log them for further analysis of patterns.

Node Failure Pattern

• Concerns availability and graceful handling of unexpected application/hardware failures, reboots or node shutdown.• Application state should be in reliable storage, not on local disk or

individual node.• Avoid single point of failure by using the N+1 rule.• AWS & Azure send signals from nodes indicating shutdown and traffic is

routed to different tenants.• An approach would include having the UI code to retry on failures,

throttling some of the features while the recovery is taking place.• Azure runs in two fault domains

Network latency problem

• Network latency is a function of distance and bandwith• Consider Data Compression, Background processing, Predictive Fetching.• Move applications closer to users• Move application data closer to users• Ensure nodes within your application are closer together (Colocation)• WA uses Affinity Groups • Consider Valet/Key Pattern for public or temporary access. (Blob storage)

protected through hashing.• Consider Content Delivery Network (CDN) – global distributed cache

effective for frequently accessed content. Can be inconsistent.

Feedback, materials and contacts

Sign-up for your free 90-day trialhttp://aka.ms/azure4afrikatrial

Contacts on twitter

Bloghttp://malisancube.com

@malisancube

Cloud computing

Software

Transcript of Cloud computing