Cloud architecture
Cloud Architecture Basic Patterns
Mahmoud Moussa
MEA Technology Evangelist: Azure
Agenda
• Problem Areas in the Cloud
Problem Areas in the Cloud
Availability defines the proportion of time that the system is up and running. It is affected
by system errors, infrastructure problems, malicious attacks, and system load, and is
usually measured as a percentage of uptime. Cloud applications typically provide
users with a service level agreement (SLA), which means that applications must be
designed and implemented in a way that maximizes availability.
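As a rough illustration of availability measured as a percentage of uptime, the sketch below turns recorded downtime into an availability figure and, in the other direction, turns an SLA target into a downtime budget. The function names and the 30-day period are assumptions made for this example.

```python
def availability_percent(total_minutes: float, downtime_minutes: float) -> float:
    """Percentage of the period during which the system was up."""
    if total_minutes <= 0:
        raise ValueError("total_minutes must be positive")
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


def allowed_downtime_minutes(sla_percent: float, total_minutes: float) -> float:
    """Downtime budget implied by an SLA target over the same period."""
    return total_minutes * (1.0 - sla_percent / 100.0)


# Assumed measurement window for the example: 30 days.
MINUTES_IN_30_DAYS = 30 * 24 * 60

if __name__ == "__main__":
    print(availability_percent(MINUTES_IN_30_DAYS, 43.2))
    print(allowed_downtime_minutes(99.9, MINUTES_IN_30_DAYS))
```

Under these assumptions, a 99.9% target over 30 days leaves a budget of roughly 43 minutes of downtime.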
Data management is the key element of cloud applications, and influences most of the quality attributes.
Data is typically hosted in different locations and across multiple servers for
reasons such as performance, scalability or availability, and this can present a range
of challenges. For example, data consistency must be maintained, and data will
typically need to be synchronized across different locations.
Good design encompasses consistency and coherence in component design and deployment, maintainability
to simplify administration and development, and reusability to allow components
and subsystems to be used in other applications and in other scenarios. Decisions
made during the design and implementation phase have a huge impact on the
quality and the total cost of ownership of cloud-hosted applications and services.
The distributed nature of cloud applications requires a messaging infrastructure
that connects the components and services, ideally in a loosely coupled manner in
order to maximize scalability. Asynchronous messaging is widely used, and
provides many benefits, but also brings challenges such as the ordering of
messages, poison message management, idempotency, and more.
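Two of the challenges named above, idempotency and poison message management, can be sketched in a few lines. This is a minimal in-memory illustration, not a real messaging client: `Message`, `Consumer`, `max_deliveries`, and the dead-letter list are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from typing import Callable, List, Set


@dataclass
class Message:
    message_id: str
    body: str
    delivery_count: int = 0


class Consumer:
    def __init__(self, handler: Callable[[str], None], max_deliveries: int = 3):
        self.handler = handler
        self.max_deliveries = max_deliveries
        self.processed_ids: Set[str] = set()   # remembers handled messages
        self.dead_letters: List[Message] = []  # poison messages set aside

    def receive(self, message: Message) -> bool:
        """Return True when the message needs no further delivery."""
        if message.message_id in self.processed_ids:
            return True  # duplicate delivery: already handled, so do nothing
        message.delivery_count += 1
        try:
            self.handler(message.body)
        except Exception:
            if message.delivery_count >= self.max_deliveries:
                # The message keeps failing: stop redelivering it.
                self.dead_letters.append(message)
                return True
            return False  # ask the queue to deliver it again later
        self.processed_ids.add(message.message_id)
        return True
```

Tracking processed IDs makes a repeated delivery harmless, and capping the delivery count keeps one bad message from blocking the rest.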
Cloud applications run in a remote datacenter where you do not have full control of the
infrastructure. This can make management and monitoring more difficult than an on-premises
deployment. Applications must expose runtime information that administrators and
operators can use to manage and monitor the system, and must support
changing business requirements and customization without requiring the
application to be stopped or redeployed.
Cloud applications typically encounter variable workloads and peaks in activity.
Predicting these, especially in a multi-tenant scenario, is almost impossible. Instead,
applications should be able to scale out within limits to meet peaks in demand, and
scale in when demand decreases. Scalability concerns not just compute instances,
but other elements such as data storage, messaging infrastructure, and more.
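Scaling out "within limits" can be pictured as a simple clamped calculation. The sketch below is only an illustration of that idea; the load metric (pending requests), the per-instance capacity, and the default limits are assumptions, not values from the presentation.

```python
import math


def desired_instances(pending_requests: int,
                      requests_per_instance: int,
                      min_instances: int = 1,
                      max_instances: int = 10) -> int:
    """Instance count needed for the current load, clamped to the limits."""
    if requests_per_instance <= 0:
        raise ValueError("requests_per_instance must be positive")
    needed = math.ceil(pending_requests / requests_per_instance)
    # Scale out to meet demand, but never beyond max_instances;
    # scale in when demand drops, but never below min_instances.
    return max(min_instances, min(max_instances, needed))
```

The same clamping applies to other scaled resources, such as storage partitions or queue consumers, with a different load metric.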
Security is the capability of a system to prevent malicious or accidental actions outside of the
designed usage, and to prevent disclosure or loss of information. Cloud
applications are exposed on the Internet outside trusted on-premises boundaries,
are often open to the public, and may serve untrusted users.
Resiliency is the ability of a system to gracefully handle and recover from failures.
The nature of cloud hosting, where applications are often multi-tenant, use shared
platform services, compete for resources and bandwidth, communicate over the
Internet, and run on commodity hardware means there is an increased likelihood
that both transient and more permanent faults will arise. Detecting failures, and
recovering quickly and efficiently, is necessary to maintain resiliency.
Cloud Design Patterns
Solve Problems the Right Way
Design Patterns
Queue-based Load Leveling Pattern
A Microsoft Azure web role stores data by using a separate storage service. If a large number of
instances of the web role run concurrently, it is possible that the storage service could be
overwhelmed and be unable to respond to requests quickly enough to prevent these requests
from timing out or failing.
• Many solutions in the cloud involve running tasks that invoke services. In this environment, if a service is
subjected to intermittent heavy loads, it can cause performance or reliability issues.
• A service could be a component that is part of the same solution as the tasks that utilize it, or it could be a
third-party service providing access to frequently used resources such as a cache or a storage service. If
the same service is utilized by a number of tasks running concurrently, it can be difficult to predict the
volume of requests to which the service might be subjected at any given point in time.
• It is possible that a service might experience peaks in demand that cause it to become overloaded and
unable to respond to requests in a timely manner. Flooding a service with a large number of concurrent
requests may also result in the service failing if it is unable to handle the contention that these requests
could cause.
Refactor the solution and introduce a queue between the task and the service. The task and the service run
asynchronously. The task posts a message containing the data required by the service to a queue. The queue
acts as a buffer, storing the message until it is retrieved by the service. The service retrieves the messages from
the queue and processes them. Requests from a number of tasks, which can be generated at a highly variable
rate, can be passed to the service through the same message queue. Figure 1 shows this structure.
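The structure described above can be sketched with Python's standard `queue` and `threading` modules standing in for a cloud queue and for separately hosted tasks and services. The function name, the `None` shutdown marker, and the in-memory list used as "work done" are assumptions made for this example.

```python
import queue
import threading
from typing import List, Tuple


def run_load_leveling(task_count: int, messages_per_task: int) -> List[Tuple[int, int]]:
    message_queue: "queue.Queue" = queue.Queue()  # the buffer between tasks and service
    processed: List[Tuple[int, int]] = []

    def task(task_id: int) -> None:
        # Tasks post messages at their own (possibly bursty) rate and move on.
        for n in range(messages_per_task):
            message_queue.put((task_id, n))

    def service() -> None:
        # The service drains the queue at its own pace, one message at a time.
        while True:
            message = message_queue.get()
            if message is None:  # shutdown marker used only by this sketch
                break
            processed.append(message)  # stand-in for the real work

    consumer = threading.Thread(target=service)
    consumer.start()
    producers = [threading.Thread(target=task, args=(i,)) for i in range(task_count)]
    for producer in producers:
        producer.start()
    for producer in producers:
        producer.join()
    message_queue.put(None)  # all tasks are done: tell the service to stop
    consumer.join()
    return processed
```

However many tasks post at once, the service only ever sees one message at a time, which is the leveling effect the pattern is after.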
Design Patterns
Retry Pattern
An application that communicates with elements running in the cloud must be sensitive to the transient faults
that can occur in this environment. Such faults include the momentary loss of network connectivity to
components and services, the temporary unavailability of a service, or timeouts that arise when a service is
busy.
These faults are typically self-correcting, and if the action that triggered a fault is repeated after a suitable
delay it is likely to be successful. For example, a database service that is processing a large number of
concurrent requests may implement a throttling strategy that temporarily rejects any further requests until its
workload has eased. An application attempting to access the database may fail to connect, but if it tries again
after a suitable delay it may succeed.
In the cloud, transient faults are not uncommon and an application should be designed to handle them
elegantly and transparently, minimizing the effects that such faults might have on the business tasks that the
application is performing.
If an application detects a failure when it attempts to send a request to a remote service, it can handle
the failure by using the following strategies:
• If the fault indicates that the failure is not transient or is unlikely to be successful if repeated (for
example, an authentication failure caused by providing invalid credentials is unlikely to succeed no
matter how many times it is attempted), the application should abort the operation and report a
suitable exception.
• If the specific fault reported is unusual or rare, it may have been caused by freak circumstances such
as a network packet becoming corrupted while it was being transmitted. In this case, the application
could retry the failing request immediately because the same failure is unlikely to be repeated
and the request will probably be successful.
• If the fault is caused by one of the more commonplace connectivity or “busy” failures, the network or
service may require a short period while the connectivity issues are rectified or the backlog of work is
cleared. The application should wait for a suitable time before retrying the request.
Code Sample
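The three strategies can be sketched as follows. This is a minimal illustration and not the presenter's sample: the exception classes `TransientError` and `RareTransientError`, the function `call_with_retry`, and the default attempt count and delay are all hypothetical names and values chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical marker for 'busy' or connectivity faults worth retrying."""


class RareTransientError(TransientError):
    """Hypothetical marker for freak faults, such as a corrupted packet."""


def call_with_retry(operation: Callable[[], T],
                    max_attempts: int = 3,
                    delay_seconds: float = 0.5,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RareTransientError:
            if attempt == max_attempts:
                raise
            # Unusual fault: retry immediately, it is unlikely to repeat.
        except TransientError:
            if attempt == max_attempts:
                raise
            # Commonplace fault: give the service time to recover first.
            sleep(delay_seconds)
        # Any other exception is treated as non-transient and propagates
        # to the caller at once, which aborts the operation.
    raise ValueError("max_attempts must be at least 1")
```

Passing `sleep` as a parameter is a design choice for the sketch: it lets the waiting behavior be replaced in tests without actually pausing.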
Design Patterns
Static Content Hosting Pattern
Web applications typically include some elements of static content. This static content may include HTML
pages and other resources such as images and documents that are available to the client, either as part of an
HTML page (such as inline images, style sheets, and client-side JavaScript files) or as separate downloads (such
as PDF documents).
Although web servers are well tuned to optimize requests through efficient dynamic page code execution and
output caching, they must still handle requests to download static content. This absorbs processing cycles that
could often be put to better use.
In most cloud hosting environments it is possible to minimize the requirement for compute instances (for
example, to use a smaller instance or fewer instances), by locating some of an application’s resources and
static pages in a storage service. The cost for cloud-hosted storage is typically much less than for compute
instances.
When hosting some parts of an application in a storage service, the main considerations are related to
deployment of the application and to securing resources that are not intended to be available to anonymous
users.
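One way to picture the deployment side of this pattern is a helper that decides, per resource, whether a link should point at the storage service or at the compute-hosted application. The sketch below assumes a hypothetical storage endpoint (`STATIC_BASE_URL`) and an illustrative set of file extensions; neither comes from the presentation.

```python
import posixpath

# Hypothetical public endpoint of the storage service holding static files.
STATIC_BASE_URL = "https://static.example.com/assets"

# Illustrative set of extensions treated as static content.
STATIC_EXTENSIONS = {".html", ".css", ".js", ".png", ".jpg", ".pdf"}


def resolve_url(path: str) -> str:
    """Point static resources at storage; leave everything else on the app."""
    extension = posixpath.splitext(path)[1].lower()
    relative = path.lstrip("/")
    if extension in STATIC_EXTENSIONS:
        return STATIC_BASE_URL + "/" + relative
    return "/" + relative
```

Resources that must not be available to anonymous users would need an additional access check, which this sketch leaves out.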
Design Patterns
A Picture Can Say a Million Words
Questions