Dependable Cloud Architecture - Cloud Develop Edition

Image: xkcd.com

Dependable Cloud Architecture

@mikewo

Mike Wood

http://mvwood.com

“Failure is always an option.”

Image: Discovery Channel, Fair Use

Protection From:

What are we looking for?

Check out: http://bit.ly/wazbizcont

Images: Office ClipArt & Godzilla Releasing Corp (Fair Use)

• Hardware Failure
• Data Corruption
• Network Failure
• Loss of Facilities

Image: FOX, Fair Use

Human Error

What we’re trying to achieve

1. Monitoring
2. Resilient Solutions

Image: Office ClipArt

Cost vs Risk

99.999% $1, … ,000.00

To get more 9’s here add more 0’s here.

Image: NASA

Monitoring

Functional Transparency

Image: Office ClipArt

Logging Messages

Hardware Health

Dependent Services Health

Telemetry

Image: NASA

Analyze your Data

Resilience

Remember: Failure is always an option.

Common Points of Failure
• Machine\application crashes
• Throttling (exceeding capacity)
• Connectivity\Network
• External service dependencies

Focus less on the uptime of hardware and more on how the solution handles it WHEN something fails!

Try/catch != Resilient

Image: Michael Wood

Decompose your system…

Request buffering

Retry Policies
• Wait and try again
• Queue until available
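The "wait and try again" policy can be sketched roughly as follows. This is a minimal illustration, not the deck's own code: `TransientError`, `call_with_retry`, and the delay values are hypothetical names and numbers chosen for the example, and the backoff-with-jitter schedule is one common choice among several.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for failures worth retrying (timeouts, throttling)."""


def call_with_retry(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Run operation(); on TransientError wait and try again, up to max_attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller decide (e.g. queue it)
            # Exponential backoff plus jitter so clients do not retry in lockstep.
            delay = base_delay * (2 ** (attempt - 1))
            sleep(delay + random.uniform(0, delay / 2))
```

Only errors marked as transient are retried here; anything else propagates immediately, which keeps the retry loop from hiding real bugs.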

Queuing Enables
• Asynchronous workloads
• Temporal Decoupling
• Load Levelling

Check out: http://bit.ly/wazrequestbuffer
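A rough, in-process sketch of how a queue gives you temporal decoupling and load levelling: the producer only enqueues, and a worker drains the buffer later at its own pace. `process_burst` and `handle` are hypothetical names, and the standard-library `queue.Queue` stands in for a durable cloud queue, which a real system would need in order to survive a crash.

```python
import queue
import threading


def process_burst(items, handle):
    """Accept every item immediately, then let a single worker drain the queue."""
    buffer = queue.Queue()
    results = []

    def worker():
        while True:
            item = buffer.get()
            if item is None:  # sentinel: no more work
                break
            results.append(handle(item))

    # The producer side only enqueues, so a spike never blocks on the worker.
    for item in items:
        buffer.put(item)
    buffer.put(None)

    # The worker starts later (temporal decoupling) and works at its own pace.
    thread = threading.Thread(target=worker)
    thread.start()
    thread.join()
    return results
```

For example, `process_burst(["a", "b", "c"], str.upper)` accepts all three requests before any of them is handled.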

Capacity Buffering

Content Delivery Networks (CDNs)

Distributed Application Cache

Local Content Cache

Enables recovery during outages or spikes in load
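One way a local cache can help during an outage is to keep serving the last good copy when the origin cannot be reached. The sketch below assumes a hypothetical `CachedFetcher` wrapper around any `fetch(key)` callable; the TTL value and the choice to serve stale data are illustrative assumptions, not something the slides prescribe.

```python
import time


class CachedFetcher:
    """Serve fresh data when the origin is up and the last good copy when it is not."""

    def __init__(self, fetch, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch
        self._ttl = ttl_seconds
        self._clock = clock
        self._cache = {}  # key -> (value, stored_at)

    def get(self, key):
        entry = self._cache.get(key)
        if entry is not None and self._clock() - entry[1] < self._ttl:
            return entry[0]  # fresh hit: no load on the origin at all
        try:
            value = self._fetch(key)
        except Exception:
            if entry is not None:
                return entry[0]  # origin is down: degrade to stale data
            raise  # nothing cached, so the failure has to surface
        self._cache[key] = (value, self._clock())
        return value
```

Fresh hits absorb spikes in load, and stale hits cover outages. Whether stale data is acceptable depends on the content being cached.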

Dynamic Addressing & Configuration

Dept. of Redundancy Dept.

Have a backup, somewhere else

More than one? Cost to benefit ratio?

Ready State
Hot = full capacity
Warm = scaled down, but ready to grow
Cold = mothballed, starts from zero

Image: Mr. White

Redundancy - It’s about probability

Four boxes, each with 95% uptime

1 box : 5% downtime or 438hrs per year (that’s about 18 ¼ days!)

2 boxes : 5/100 * 5/100 = 25/10,000 = 0.25% downtime or 22hrs per year

4 boxes : 5/100 * 5/100 * 5/100 * 5/100 = 625/100,000,000 = 0.000625% downtime or 3.285 MINUTES per year
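The arithmetic above can be checked with a few lines of Python. This sketch assumes, as the slide's multiplication implicitly does, that the boxes fail independently; correlated failures (shared power, shared network) make the real numbers worse. The function names are illustrative.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760


def combined_downtime(per_box_downtime, boxes):
    """Fraction of time all boxes are down at once, assuming independent failures."""
    return per_box_downtime ** boxes


def downtime_hours_per_year(per_box_downtime, boxes):
    """Expected hours per year with every box down simultaneously."""
    return combined_downtime(per_box_downtime, boxes) * HOURS_PER_YEAR


if __name__ == "__main__":
    for n in (1, 2, 4):
        hours = downtime_hours_per_year(0.05, n)
        print(f"{n} box(es): {hours:.4f} hours/year ({hours * 60:.3f} minutes)")
```

Each extra box multiplies the downtime by another 5/100, which is why the figure drops from days to hours to minutes.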

Always carry a spare

75% Capacity, half of our load
75% Capacity, half of our load

50% more capacity than needed
• Can absorb temporary spikes
• Time to react if we need to add capacity

100% of load, 150% Capacity

0% Capacity, redirect all load

Over allocated, but still functioning
• Degrade, but don’t fail

SYSTEM FAILURE!!!

Accessible vs. Available

Image: Twitter, Fair Use

Availability via Degradation

Image: Michael Wood

Total Outage duration =

Time to Detect
+ Time to Diagnose
+ Time to Decide
+ Time to Act

Image: Office ClipArt

Images: Gizmodo

Virtualization and Automation

Images: Orion Pictures owns Terminator Franchise

The “HI” Point

Check out: http://bit.ly/wazinternals

Image: NASA

“Don't be too proud of this technological terror you've constructed…”

ADMIT:
• Your Solution WILL fail at some point
• You can learn from others just as well as from yourself

DO:
• Root cause analysis
• Read others’ root cause analyses

DON’T:
• Get cocky
• Stick your head in the sand

Questions

@mikewo

Mike Wood

http://mvwood.com

http://bit.ly/CloudFailSafe
