Boston Code Camp 18 20-October-2012 (1:30 – 2:40)


Page 1

Architecture Patterns for Building Cloud-Native Applications

Align your application’s architecture with the architecture of the cloud…

Or... going with the flow: using icebergs to max advantage

Boston Code Camp 18, 20-October-2012 (1:30 – 2:40)

Boston Azure User Group http://www.bostonazure.org @bostonazure

Bill Wilder http://blog.codingoutloud.com @codingoutloud

HELLO my name is Bill Wilder

Page 2

My name is Bill Wilder

HELLO my name is

Bill Wilder [email protected] @codingoutloud

Page 3

Who is Bill Wilder?

www.devpartners.com

www.bostonazure.org

www.cloudarchitecturepatterns.com

Page 4

I will ass-u-me…

1. You know what “the cloud” is
2. You have an inkling about the Amazon Web Services and Windows Azure cloud platforms
3. You understand that such cloud platforms include compute services [like hosted virtual machines (VMs), in both IaaS and PaaS modes], SQL and NoSQL database services, file storage services, messaging, DNS, management, etc.
4. You are interested in understanding cloud-native applications

Page 5

Roadmap for the rest of the talk…

1. Give context and a definition for cloud-native
2. Cover three specific patterns for building cloud-native applications
3. Mention several other patterns

• Q&A during the talk is okay (time permitting)
• Q&A at the end with any remaining time
• Also feel free to join me for lunch to talk cloud

Page 6

Cloud Platform Characteristics
• Scaling (or “resource allocation”) is horizontal
  – and ∞ (“illusion of infinite resources”)
• Resources are easily added or released
  – self-service portal or API; cloud scaling is automatable
• Pay only for currently allocated resources
  – costs are operational, granular, controllable, and transparent
• Optimized for cost-efficiency
  – cloud services are multitenant (MT), hardware is commodity
  – MTTR (mean time to recovery) over MTTF (mean time to failure)
• Rich, robust functionality is simply accessible
  – like an iceberg
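To make “MTTR over MTTF” concrete, here is a small worked calculation using the standard steady-state availability formula, availability = MTTF / (MTTF + MTTR). The hour figures are illustrative assumptions, not measurements of any cloud platform; they show how a node that fails ten times as often can still come out ahead if it recovers in minutes.

```python
def availability(mttf_hours: float, mttr_hours: float) -> float:
    """Steady-state availability: the fraction of time a component is up."""
    return mttf_hours / (mttf_hours + mttr_hours)


# Hypothetical "premium" server: rarely fails, but a repair takes a day.
premium = availability(mttf_hours=10_000, mttr_hours=24)

# Hypothetical commodity node: fails 10x as often, but is replaced in 6 minutes.
commodity = availability(mttf_hours=1_000, mttr_hours=0.1)

print(f"premium:   {premium:.5f}")
print(f"commodity: {commodity:.5f}")
```

Driving recovery time down (automated replacement of failed nodes) is something the platform and the application can do together, whereas driving failure rates down usually means buying pricier hardware.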

Page 7

www.pageofphotos.com
• Simple idea, simple app
• Two tiers: web tier + database
• What’s the problem?
• We’ll reexamine it one tier at a time:
1. Scaling compute
2. Scaling data
3. Scaling geographically
4. Handling failure
… and all while maintaining User Experience (UX)

Page 8

1/9th above water

Page 9

Cloud-Native Application Characteristics
• Application architecture is aligned with the cloud platform architecture
  – uses the platform in the most natural way
  – lets the platform do the heavy lifting
• Are loosely coupled
  – for scalability, reliability, and flexibility
• Scale horizontally, automatically, bidirectionally
  – maintaining UX and cost-optimizing
  – scale operationally along with capacity
• Handle busy signals and node failures
  – without unnecessary UX degradation
• Use geo-distribution services
  – minimize network latency

Page 10

Know the rules

“If I had asked people what they wanted, they would have said faster horses.”

- Henry Ford

Page 11

Know the rules

“If I had asked IT departments what they wanted, they would have said IaaS.”

- Henry Cloud

Page 12

Use the right tool for the job…

Better on water than on land… sorta “unreliable” when used on land.

Page 13

Modern Application Challenges

1. Scaling compute
2. Scaling data
3. Scaling geographically
4. Handling failure
… and all while maintaining User Experience (UX)

• Example patterns we will review:
a. Horizontal Scaling
b. Queue-Centric Workflow
c. Database Sharding
d. Other patterns briefly as time permits

Page 14

Pre-Cloud vs. Cloud-Native

Old-School vs. Cloud-Native (architectural concerns)
• Control vs. Efficiency
• Stable/Static Hardware vs. Dynamic/∞ Resources
• Fixed/CapEx vs. Variable/OpEx
• Vertical Scaling vs. Horizontal Resourcing
• Maximize MTBF vs. Minimize MTTR
• Data Storage = RDBMS vs. Scenario-specific Storage
• Manage Infrastructure vs. Managed Infrastructure

Page 15

Horizontal Scaling Compute Pattern

pattern 1 of 3

Page 16

What’s the difference between performance and scale??

Page 17

Scale Up (and Scale Down??) vs. Horizontal Resourcing

Common Terminology:
• Scaling Up/Down: Vertical Scaling
• Scaling Out/In: Horizontal “Scaling”, but really is Horizontal Resource Allocation
• Architectural Decision
  – Big decision… hard to change

Page 18

Vertical Scaling (“Scaling Up”)

Resources that can be “Scaled Up”
• Memory: speed, amount
• CPU: speed, number of CPUs
• Disk: speed, size, multiple controllers
• Bandwidth: higher capacity pipe
• … and it sure is EASY

Downsides of Scaling Up
• Hard Upper Limit
• HIGH END HARDWARE, HIGH END CO$T
• Lower value than “commodity hardware”
• May have no other choice (architectural)

Page 19

Scaling Horizontally: Adding Boxes

Autonomous nodes for scalability (stateless web servers, shared-nothing DBs, your custom code in QCW)

Page 20

Example: Web Tier www.pageofphotos.com
• Load Balancer (Cloud Service)
• Managed VMs (Cloud Service)
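The web tier scales out because any node can serve any request. Below is a minimal sketch of that idea: round-robin load balancing over stateless nodes whose session data lives in a shared store. `SharedSessionStore` is a hypothetical stand-in for a real external store (such as a cache or a storage table), and a real cloud load balancer's distribution algorithm may differ.

```python
import itertools
from typing import Dict, List


class SharedSessionStore:
    """Hypothetical stand-in for an external session store shared by all nodes."""

    def __init__(self) -> None:
        self._data: Dict[str, int] = {}

    def increment(self, session_id: str) -> int:
        self._data[session_id] = self._data.get(session_id, 0) + 1
        return self._data[session_id]


class WebNode:
    """Stateless web node: keeps no per-user state in its own memory."""

    def __init__(self, name: str, store: SharedSessionStore) -> None:
        self.name = name
        self.store = store

    def handle(self, session_id: str) -> str:
        views = self.store.increment(session_id)
        return f"{self.name}: page view #{views} for {session_id}"


class RoundRobinBalancer:
    """Sends requests to nodes in turn; no sticky sessions are needed."""

    def __init__(self, nodes: List[WebNode]) -> None:
        self._cycle = itertools.cycle(nodes)

    def route(self, session_id: str) -> str:
        return next(self._cycle).handle(session_id)


store = SharedSessionStore()
lb = RoundRobinBalancer([WebNode(f"web{i}", store) for i in range(3)])
for _ in range(4):
    print(lb.route("alice"))  # alice's view count keeps growing across nodes
```

Because no node owns a user's state, losing or adding a node does not lose sessions, which is what makes the nodes interchangeable (“like a taxi”).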

Page 21

Horizontal Scaling Considerations

1. Auto-Scale
• Bidirectional
2. Nodes can fail
• Auto-Scale is only one cause
• Handle shutdown signals
• Stateless (“like a taxi”) vs. Sticky Sessions
• Stateless nodes vs. Stateless apps
• N+1 rule vs. occasional downtime (UX)
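A sketch of how the auto-scale and N+1 considerations can combine: compute the instances needed for the current load, add one spare so a single node failure (or a routine restart) does not degrade UX, and scale in as readily as out. The per-instance capacity and the minimum of two instances are assumptions for illustration; real auto-scaling would also smooth the load signal and rate-limit changes.

```python
import math


def target_instances(requests_per_sec: float, capacity_per_instance: float,
                     spare: int = 1, minimum: int = 2) -> int:
    """Instances needed for the current load, plus N+1 spare capacity."""
    needed = math.ceil(requests_per_sec / capacity_per_instance)
    return max(minimum, needed + spare)


def scaling_action(current: int, target: int) -> str:
    """Auto-scaling is bidirectional: release nodes as well as add them."""
    if target > current:
        return f"scale out by {target - current}"
    if target < current:
        return f"scale in by {current - target}"
    return "no change"


# Hypothetical numbers: each node handles 100 requests/sec.
print(scaling_action(current=3, target=target_instances(450, 100)))  # scale out by 3
print(scaling_action(current=6, target=target_instances(120, 100)))  # scale in by 3
```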

Page 22

How many users does your cloud-native application need before it needs to be able to horizontally scale??

Page 23

Queue-Centric Workflow Pattern

(QCW for short)

pattern 2 of 3

Page 24

Extend www.pageofphotos.com example into Service Tier

• QCW enables applications where the UI and back-end services are Loosely Coupled

• (Compare to CQRS at the end)

Page 25

QCW Example: User Uploads Photo www.pageofphotos.com

Web Server

Compute Service

Reliable Queue

Reliable Storage

Page 26

QCW

WE NEED:
• Compute (VM) resources to run our code
• Reliable Queue to communicate
• Durable/Persistent Storage

Page 27

Where does Windows Azure fit?

Page 28

QCW [on Windows Azure]

WE NEED:
• Compute (VM) resources to run our code
  – Web Roles (IIS) and Worker Roles (w/o IIS)
• Reliable Queue to communicate
  – Azure Storage Queues
• Durable/Persistent Storage
  – Azure Storage Blobs & Tables; WASD (Windows Azure SQL Database)

Page 29

QCW on Azure: User Uploads a Photo (www.pageofphotos.com)

WebRole (IIS) pushes to the Azure Queue; WorkerRole pulls from it; both use Azure Blob storage

UX implications: user does not wait for thumbnail (architecture!)

Page 30

QCW enables Responsive UX

• Response to interactive users is as fast as a work request can be persisted
• Time-consuming work done asynchronously
• Comparable total resource consumption, arguably better subjective UX
• UX challenge: how to express Async to users?
  – Communicate Progress
  – Display Final results
  – Long Polling/Web Sockets (e.g., SignalR or Socket.IO)

Page 31

QCW enables Scalable App

• Decoupled front/back provides insulation
  – Blocking is Bane of Scalability
  – Order processing partner doing maintenance
  – Twitter down
  – Email server unreachable
  – Internet connectivity interruption
• Loosely coupled, concern-independent scaling
  – (see next slide)
  – Get Scale Units right

Page 32

General Case: Many Roles, Many Queues

[Diagram: WebRole (IIS) instances, Public and Admin; Queue Types 1, 2, and 3; multiple instances of WorkerRole Type 1 and WorkerRole Type 2]

• Scaling best when Investment ∝ Benefit
• Optimize for CO$T EFFICIENCY
• Logical vs. Physical Architecture

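Page 33

Reliable Queue & 2-step Delete

WebRole (IIS) → Queue → WorkerRole

// WebRole (IIS): enqueue a reference to the uploaded photo blob
var url = "http://pageofphotos.blob.core.windows.net/up/<guid>.png";
queue.AddMessage( new CloudQueueMessage( url ) );

// WorkerRole, step 1: get a message; it stays invisible to other consumers for 10 seconds
var invisibilityWindow = TimeSpan.FromSeconds( 10 );
CloudQueueMessage msg = queue.GetMessage( invisibilityWindow );
if ( msg != null ) // null when no message is visible
{
    // … do some processing then …
    queue.DeleteMessage( msg ); // step 2: delete only after processing succeeds
}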
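The two-step delete is what makes the queue reliable: getting a message only hides it for the invisibility window, and the message comes back if the worker crashes before deleting it. The toy in-memory queue below models that behavior so it can run without a storage account. It is an illustrative model with a caller-supplied clock, not the Azure Storage client library; its method names only loosely mirror `AddMessage`, `GetMessage`, and `DeleteMessage`.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(eq=False)  # identity comparison, so remove() deletes this exact message
class Message:
    body: str
    visible_at: float = 0.0   # hidden from get_message until this time
    dequeue_count: int = 0


class ToyQueue:
    """In-memory model of a queue with an invisibility window and 2-step delete."""

    def __init__(self) -> None:
        self._messages: List[Message] = []

    def add_message(self, body: str) -> None:
        self._messages.append(Message(body))

    def get_message(self, invisibility: float, now: float) -> Optional[Message]:
        """Step 1: hand out a visible message and hide it until now + invisibility."""
        for msg in self._messages:
            if msg.visible_at <= now:
                msg.visible_at = now + invisibility
                msg.dequeue_count += 1
                return msg
        return None

    def delete_message(self, msg: Message) -> None:
        """Step 2: remove the message for good once processing succeeded."""
        self._messages.remove(msg)


q = ToyQueue()
q.add_message("http://pageofphotos.blob.core.windows.net/up/<guid>.png")

first = q.get_message(invisibility=10, now=0)    # worker A takes it, then crashes
print(q.get_message(invisibility=10, now=5))     # None: still invisible
retry = q.get_message(invisibility=10, now=11)   # reappears for worker B
print(retry is first, retry.dequeue_count)       # True 2
q.delete_message(retry)                          # processed: delete for good
print(q.get_message(invisibility=10, now=30))    # None: queue is empty
```

Note that a message can be handed out more than once, which is why the processing code has to be idempotent.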

Page 34

QCW requires Idempotent Processing

• Perform an idempotent operation more than once, and the end result is the same as if we did it once
• Example with Thumbnailing (easy case)
• App-specific concerns dictate approaches
  – Compensating action, Last write wins, etc.
• PARTNERSHIP: division of responsibility between cloud platform & app
  – Far cry from database transaction
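Thumbnailing is the easy case because the work can be made naturally idempotent: derive the output name deterministically from the input and overwrite rather than append. A minimal sketch, assuming a dictionary stands in for blob storage and `thumbnail_name` is a hypothetical naming convention. Running the handler twice (as happens when a message reappears after a crash) leaves the same end state as running it once.

```python
from typing import Dict


def thumbnail_name(photo_url: str) -> str:
    """Hypothetical convention: the thumbnail's name is derived from the photo's."""
    return photo_url.replace("/up/", "/thumbs/")


def process_upload(photo_url: str, store: Dict[str, bytes]) -> None:
    """Idempotent handler: a repeat run overwrites identical output, never duplicates."""
    thumb = store[photo_url][:4]              # placeholder for real resizing
    store[thumbnail_name(photo_url)] = thumb  # put-by-name: last write wins


store = {"/up/cat.png": b"CATPICTURE"}
process_upload("/up/cat.png", store)
once = dict(store)
process_upload("/up/cat.png", store)          # message delivered a second time
print(store == once)                          # True
```

By contrast, a handler that incremented a “thumbnails generated” counter would not be idempotent, and would need an app-specific approach such as recording which message IDs were already applied.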

Page 35

QCW expects Poison Messages

• A Poison Message cannot be processed
  – Error condition for non-transient reason
  – Use dequeue count property
• Be proactive
  – Falling off the queue may kill your system
• Determine a Max Retry policy per queue
  – Delete, put on “bad” queue, alert human, …
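A max-retry policy driven by the dequeue count can be sketched as follows. The `MAX_DEQUEUE_COUNT` threshold of 5, the `bad_queue` list, and the `alert` callback are assumptions for illustration. The point is that a message which keeps failing is moved aside instead of being retried forever.

```python
from dataclasses import dataclass
from typing import Callable, List

MAX_DEQUEUE_COUNT = 5  # hypothetical per-queue max retry policy


@dataclass
class Message:
    body: str
    dequeue_count: int  # how many times the message has been handed out


def handle(msg: Message, process: Callable[[str], None], bad_queue: List[Message],
           alert: Callable[[str], None]) -> str:
    """Return what happened to msg: 'done', 'poison', or 'retry'."""
    if msg.dequeue_count > MAX_DEQUEUE_COUNT:
        bad_queue.append(msg)          # park it for later inspection
        alert(f"poison message: {msg.body!r}")
        return "poison"                # caller then deletes it from the work queue
    try:
        process(msg.body)
        return "done"                  # caller then deletes it from the work queue
    except Exception:                  # broad on purpose for this sketch
        return "retry"                 # not deleted: it reappears after the window


def broken(body: str) -> None:
    raise ValueError("cannot parse " + body)  # non-transient failure


bad: List[Message] = []
alerts: List[str] = []
print(handle(Message("corrupt.png", 2), broken, bad, alerts.append))  # retry
print(handle(Message("corrupt.png", 6), broken, bad, alerts.append))  # poison
```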

Page 36

QCW requires “Plan for Failure”

• VM restarts will happen
  – Hardware failure, O/S patching, crash (bug)
• Bake in handling of restarts into our apps
  – Restarts are routine: system “just keeps working”
  – Idempotency support is important
  – Event Sourcing (commonly seen with CQRS) may help
• Not an exception case! Expect it!
• Consider N+1 Rule

Page 37

What’s Up? Reliability as EMERGENT PROPERTY

Compared for a Typical Site, Any 1 Role Instance, and the Overall System:
• Operating System Upgrade
• Application Code Update
• Scale Up, Down, or In
• Hardware Failure
• Software Failure (Bug)
• Security Patch

Page 38

Aside: Is QCW the same as CQRS?

• Short answer: “no”
• CQRS
  – Command Query Responsibility Segregation
  – Commands change state
  – Queries ask for current state
  – Any operation is one or the other
  – Sometimes includes Event Sourcing
  – Sometimes modeled using Domain Driven Design (DDD)
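To show the CQRS distinction, here is a toy in which every operation is either a command (changes state by appending an event) or a query (reads current state and changes nothing). The class and event names are hypothetical. A real CQRS system would usually keep a separate, possibly eventually consistent read model; this sketch replays the event log on each query for brevity.

```python
from typing import List, Tuple

Event = Tuple[str, str]  # (event type, photo id)


class PhotoAlbum:
    """Hypothetical CQRS-style aggregate backed by an event log (Event Sourcing)."""

    def __init__(self) -> None:
        self._events: List[Event] = []

    # Commands: change state, return nothing.
    def upload_photo(self, photo_id: str) -> None:
        self._events.append(("PhotoUploaded", photo_id))

    def delete_photo(self, photo_id: str) -> None:
        self._events.append(("PhotoDeleted", photo_id))

    # Query: reports current state, changes nothing.
    def photo_ids(self) -> List[str]:
        current: List[str] = []
        for kind, photo_id in self._events:  # replay events to rebuild state
            if kind == "PhotoUploaded":
                current.append(photo_id)
            elif kind == "PhotoDeleted" and photo_id in current:
                current.remove(photo_id)
        return current


album = PhotoAlbum()
album.upload_photo("p1")
album.upload_photo("p2")
album.delete_photo("p1")
print(album.photo_ids())  # ['p2']
```

Nothing here involves a queue, which is the point of the slide: QCW is about decoupling work through a queue, while CQRS is about separating state-changing operations from state-reading ones.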

Page 39

What about the DATA?

• You: Azure Web Roles and Azure Worker Roles
  – Taking user input, dispatching work, doing work
  – Follow a decoupled queue-in-the-middle pattern
  – Stateless compute nodes
• Cloud: “Hard Part”: persistent, scalable data
  – Azure Queue & Blob Services
  – Three copies of each byte
  – Blobs are geo-replicated
  – Busy Signal Pattern

Page 40

Database Sharding Pattern

pattern 3 of 3

Page 41

Extend www.pageofphotos.com example into Data Tier

• What happens when demands on data tier grow?

• The Database Sharding Pattern: a little about reliability, a lot about scale and performance

Page 42

Foursquare is a Social Network

Page 43

Foursquare #Fail

• October 4, 2010 – trouble begins…
• After 17 hours of downtime over two days…

“Oct. 5 10:28 p.m.: Running on pizza and Red Bull. Another long night.”

WHAT WENT WRONG?

Page 44

What is Sharding?

• Problem: one database can’t handle all the data
  – Too big, not performant, needs geo distribution, …
• Solution: split data across multiple databases
  – One Logical Database, multiple Physical Databases
• Each Physical Database Node is a Shard
• Most scalable is Shared Nothing design
  – May require some denormalization (duplication)
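One logical database spread over several physical databases needs a rule that maps each record's shard key to a shard. A minimal sketch using a stable hash of the key; the shard names and the choice of hashing (rather than a range or lookup scheme) are assumptions for illustration.

```python
import hashlib
from typing import List

SHARDS: List[str] = ["photos_db_0", "photos_db_1", "photos_db_2"]  # hypothetical


def shard_for(user_id: str, shards: List[str] = SHARDS) -> str:
    """Route a shard key to one physical database using a stable hash."""
    # Python's built-in hash() is randomized per process, so use a real digest.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


# All of one user's rows live together (shared nothing), so a per-user
# query touches exactly one shard.
for user in ["alice", "bob", "carol"]:
    print(user, "->", shard_for(user))
```

One caveat of plain `hash % N`: changing the number of shards moves most keys to a different shard, which is part of why rebalancing gets complex.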

Page 45

All shards have the same schema

SHARDS

Page 46

Sharding is Difficult

• What defines a shard? (Where to put stuff?)
  – Example: use country of origin: customer_us, customer_fr, customer_cn, customer_ie, …
  – Use same approach to find records (can use lookup)
• What happens if a shard gets too big?
  – Rebalancing shards can get complex (esp. roll-your-own)
  – Foursquare case study is interesting
• Query / join / transact across shards
• Cache coherence, connection pool management
  – Roll-your-own challenge
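The country-of-origin example with a lookup can be sketched as a small directory that maps a shard key to a physical database. Because routing goes through the lookup, a shard that grows too big can be split by moving some keys and updating the directory. The `customer_us`/`customer_fr`/`customer_cn` names follow the slide; the combined `customer_eu` shard, the `ShardDirectory` class, and its `move` method are hypothetical, and the data copy a real rebalance needs is only indicated by a comment.

```python
from typing import Dict, List


class ShardDirectory:
    """Hypothetical lookup table: shard key (country) -> physical database."""

    def __init__(self, mapping: Dict[str, str]) -> None:
        self._mapping = dict(mapping)

    def shard_for(self, country: str) -> str:
        return self._mapping[country]

    def move(self, country: str, new_shard: str) -> None:
        # A real rebalance must first copy the rows and handle in-flight
        # writes; only after that is the routing entry switched.
        self._mapping[country] = new_shard

    def all_shards(self) -> List[str]:
        return sorted(set(self._mapping.values()))


directory = ShardDirectory({
    "us": "customer_us", "fr": "customer_eu", "ie": "customer_eu",
    "cn": "customer_cn",
})
print(directory.shard_for("fr"))                              # customer_eu
directory.move("fr", "customer_fr")                           # split the EU shard
print(directory.shard_for("fr"), directory.shard_for("ie"))   # customer_fr customer_eu
```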

Page 47

Where does Windows Azure fit?

Page 48

Windows Azure SQL Database (WASD) is SQL Server Except…

• Common to both: “Just change the connection string…”
• SQL Server Specific (for now)
  – Full Text Search
  – Native Encryption
  – Many more…
• WASD Specific
  – Limitations: 150 GB size limit; Busy Signal Pattern; Colocation Pattern
  – New Capabilities: Managed Service; Highly Available; Rental model; Federations

Additional information on Differences: http://msdn.microsoft.com/en-us/library/ff394115.aspx

Page 49

Windows Azure SQL Database Federations for Sharding

• Single “master” database
  – “Query Fanout” makes partitions transparent
  – Instead of customer_us, customer_fr, etc… we are back to customer database
• Handles redistributing shards
• Handles cache coherence
• Simplifies connection pooling
• No MERGE, only SPLIT currently
• http://blogs.msdn.com/b/cbiyikoglu/archive/2011/01/18/sql-azure-federations-robust-connectivity-model-for-federated-data.aspx
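“Query Fanout” means sending the same query to every shard and merging the partial results, so the caller sees one logical database again. The sketch below does this with a thread pool over in-memory lists. It illustrates the general technique under assumed data, not the Federations implementation or its T-SQL syntax.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

# Hypothetical shards, each holding part of one logical "customer" table.
SHARDS: Dict[str, List[dict]] = {
    "customer_us": [{"name": "Ann", "orders": 3}, {"name": "Bo", "orders": 0}],
    "customer_fr": [{"name": "Chloé", "orders": 5}],
    "customer_ie": [{"name": "Dara", "orders": 1}],
}


def query_shard(rows: List[dict]) -> List[dict]:
    """Per-shard query: customers with at least one order."""
    return [r for r in rows if r["orders"] > 0]


def fan_out() -> List[dict]:
    """Run the query on every shard in parallel, then merge and sort."""
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(query_shard, SHARDS.values()))
    merged = [row for part in partials for row in part]
    return sorted(merged, key=lambda r: r["orders"], reverse=True)


print([r["name"] for r in fan_out()])  # ['Chloé', 'Ann', 'Dara']
```

The merge step is where cross-shard work gets harder: ordering, limits, and joins all have to be redone over the combined partial results.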

Page 50

Foursquare #Fail

Foursquare was implementing database sharding in the application layer. WASD Federations makes this unnecessary.

WHAT WENT WRONG?

Page 51

My database instance is limited to 150 GB.

∞ ∞ ∞ Does that mean the cloud doesn’t really offer the illusion of infinite resources??

Page 52

Pre-Cloud vs. Cloud-Native

Lessons: being Cloud-Native
• Efficiency: 1:15,000
• Dynamic/∞ Resources: Auto-Scaling via API
• Variable/OpEx: Pay-As-You-Go
• Horizontal Resourcing: Stateless, Autonomous
• Minimize MTTR: N+1, Idempotent
• Scenario-specific Storage: SQL, NoSQL, Blob
• Managed Infrastructure: VM, Storage, LB, DR

Page 53

Know the rules

“Know the rules well, so you can break them effectively.”

- Dalai Lama XIV

Page 54

Cloud Architecture Patterns book: Primer Chapters

1. Scalability
2. Eventual Consistency
3. Multitenancy and Commodity Hardware
4. Network Latency

Page 55

Cloud Architecture Patterns book: Pattern Chapters

1. Horizontally Scaling Compute Pattern
2. Queue-Centric Workflow Pattern
3. Auto-Scaling Pattern
4. MapReduce Pattern
5. Database Sharding Pattern
6. Busy Signal Pattern
7. Node Failure Pattern
8. Colocate Pattern
9. Valet Key Pattern
10. CDN Pattern
11. Multisite Deployment Pattern

Page 56

Questions? Comments?

More information?

Page 57

Business Card

Page 58

BostonAzure.org

• Boston Azure cloud user group
• Focused on Microsoft’s PaaS cloud platform
• Monthly, 6:00-8:30 PM in Boston area
  – Food; wifi; free; great topics; growing community
• Follow on Twitter: @bostonazure
• More info or to join our Meetup.com group: http://www.bostonazure.org

Page 59

Contact Me

Looking for …
• consulting help with Windows Azure Platform?
• someone to bounce Azure or cloud questions off?
• a speaker for your user group or company technology event?

Just Ask!

Bill Wilder
@codingoutloud
http://blog.codingoutloud.com
community inquiries: [email protected]
consulting inquiries: www.devpartners.com

Page 60

Page 61

DONE

Page 62