High Availability and Scalability: Too Expensive! Architectures for Future Enterprise Systems

Posted on 14-Jan-2015

High availability and scalability used to be solved in hardware, but that is quite expensive. This presentation shows how modern technologies such as virtualization, cloud, NoSQL and new software architectures provide new and cheaper solutions that are probably even better than the traditional approaches.

Eberhard Wolff - @ewolff

High Availability and Scalability: Too Expensive!

Architectures for Future Enterprise Systems

Eberhard Wolff, Freelance Consultant / Trainer

Head of Technology Advisory Board, adesso AG

The Dream

Photo: http://www.vaxman.de/

Where Are We?

Non-functional Requirements

Availability

Performance

Performance

Availability

Availability: Traditional Approach

•  Buy highly reliable hardware
•  Build a small cluster
   •  2 machines
•  Maybe add a stand-by data center

•  Eventually the system will fail
•  …and you are in real trouble

True Story
•  “Machine rebooted over night.”
•  “Several times.”
•  “No idea how often.”
•  “No idea why…”

Let’s look at an example

•  Server fails
•  Application fails
•  No service to the customer
•  Can we do better?

What You Have Just Seen

•  Failing systems do not impact the user
•  Failing systems are just restarted
•  Restarts happen automatically
•  Systems run in different data centers
   •  e.g. eu-west-1a / b / c

[Diagram: an Elastic Load Balancer in front of System EU West 1a, System EU West 1b and System EU West 1c]

What It Takes…
•  Virtualization
•  + API to start new servers
•  Watchdog to detect failed servers
•  Redundant data centers if needed
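The interplay of these ingredients can be sketched in a few lines. This is only an illustration, not the setup used in the demo: `FakeCloud`, `Server` and `watchdog_pass` are hypothetical names, and the in-memory `FakeCloud` stands in for whatever API a virtualization platform or cloud offers to start and stop servers.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Server:
    """A virtual server as seen by the watchdog (hypothetical model)."""
    server_id: str
    zone: str
    healthy: bool = True


@dataclass
class FakeCloud:
    """In-memory stand-in for an API that can start and stop servers."""
    servers: Dict[str, Server] = field(default_factory=dict)
    _counter: int = 0

    def start_server(self, zone: str) -> Server:
        self._counter += 1
        server = Server(server_id=f"srv-{self._counter}", zone=zone)
        self.servers[server.server_id] = server
        return server

    def terminate_server(self, server_id: str) -> None:
        del self.servers[server_id]


def watchdog_pass(cloud: FakeCloud) -> List[str]:
    """Replace every unhealthy server with a fresh one in the same zone.

    Returns the ids of the servers that were replaced.
    """
    failed = [s for s in cloud.servers.values() if not s.healthy]
    for server in failed:
        cloud.terminate_server(server.server_id)
        cloud.start_server(server.zone)
    return [s.server_id for s in failed]


cloud = FakeCloud()
for zone in ("eu-west-1a", "eu-west-1b", "eu-west-1c"):
    cloud.start_server(zone)

cloud.servers["srv-2"].healthy = False  # simulate a failed server
replaced = watchdog_pass(cloud)         # the watchdog restores three servers
```

A real watchdog would run such a pass periodically and would decide on health through something like an HTTP check rather than a flag.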

Can be implemented in your data center!

I have none.

So I used the Amazon Cloud

Alternatives

Hardware
•  As cheap as it gets
•  Not highly available
•  Availability in software

Traditional Servers

Highly customized

Hard to reproduce

•  Depends on details
•  True story:
   •  The order of patch installations matters

Stateful

Redundancy in Hardware

Traditional Servers

Phoenix Servers

Easy to create a new server

Reliably reproducible

Stateless
•  No data is lost
•  New server can take load immediately

Redundancy in Software

Implementations
•  Might use a VM image
•  …or a PaaS
•  …or provisioning tools

Provisioning Tools

•  Easy to create test environments
•  …with other software versions
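A rough sketch of why this is easy once a server is described declaratively: the build steps are derived from a specification alone, never from the state of an existing machine. `ServerSpec`, its fields and `provisioning_steps` are hypothetical names for illustration and do not correspond to any particular provisioning tool.

```python
from dataclasses import dataclass, replace
from typing import List, Tuple


@dataclass(frozen=True)
class ServerSpec:
    """Everything needed to build a server from scratch (hypothetical fields)."""
    base_image: str
    packages: Tuple[str, ...]
    app_version: str


def provisioning_steps(spec: ServerSpec) -> List[str]:
    """Derive the build steps from the spec alone.

    Packages are installed in a fixed (sorted) order, so the same spec
    always yields the same sequence of steps.
    """
    steps = [f"boot {spec.base_image}"]
    steps += [f"install {pkg}" for pkg in sorted(spec.packages)]
    steps.append(f"deploy app {spec.app_version}")
    return steps


production = ServerSpec("linux-base", ("nginx", "jdk"), "1.4.0")
# A test environment is the same spec with one field changed.
test_env = replace(production, app_version="1.5.0-rc1")

production_steps = provisioning_steps(production)
test_steps = provisioning_steps(test_env)
```

The two step lists differ only in the final deploy step, which is what makes a test environment with another software version cheap to create.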

Chaos Monkey

•  Tool by Netflix
   •  Video streaming
   •  #1 in Internet usage in the US
•  Kills random machines
   •  To ensure the system survives hardware failures
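The idea can be illustrated with a small simulation. This is not Netflix's tool, just a sketch under simple assumptions: `Cluster` and `chaos_round` are hypothetical names, and the instances are stateless, so any survivor can answer a request.

```python
import random
from typing import List


class Cluster:
    """Stateless instances behind a load balancer (simulated)."""

    def __init__(self, size: int) -> None:
        self.instances: List[str] = [f"instance-{i}" for i in range(size)]

    def kill(self, name: str) -> None:
        self.instances.remove(name)

    def handle_request(self) -> str:
        # Any surviving instance can answer because no instance holds state.
        if not self.instances:
            raise RuntimeError("no service to the customer")
        return f"served by {self.instances[0]}"


def chaos_round(cluster: Cluster, rng: random.Random) -> str:
    """Kill one randomly chosen instance and return its name."""
    victim = rng.choice(cluster.instances)
    cluster.kill(victim)
    return victim


cluster = Cluster(size=3)
victim = chaos_round(cluster, random.Random(42))
response = cluster.handle_request()  # still answered by a survivor
```

Running such rounds regularly turns "the system survives a machine failure" from an assumption into something that is tested continuously.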

Would you rather rely on…

…highly available hardware

…or a Chaos Monkey tested system?

Resilience

Performance

Availability

Availability

Performance

Performance: Traditional Approach

•  Estimate
   •  #Users
   •  Use cases
   •  Data volume
   •  Etc.
•  Add a little bit
•  Order servers

Performance: Problems

Problem: Estimate & Scaling
•  Performance is hard to estimate
•  Coarse grained scaling
•  Backfires

True Story
•  Initial estimate wrong
•  Just need a little more
•  Cluster: two servers
•  Add one
•  About 50% higher costs
•  Ordering / installing a server takes time
•  Bad performance until the server is delivered
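The 50% figure follows directly from the cluster size. A tiny worked example, assuming all servers cost the same (the function name is made up for illustration):

```python
def relative_cost_increase(current_servers: int, added_servers: int = 1) -> float:
    """Fractional cost increase when growing a cluster of identical servers."""
    if current_servers <= 0:
        raise ValueError("cluster must have at least one server")
    return added_servers / current_servers


two_node_step = relative_cost_increase(2)      # 0.5: one more server costs 50% more
twenty_node_step = relative_cost_increase(20)  # 0.05: the same step is only 5%
```

Under the same assumption, a cluster of many small servers grows in much finer steps than a cluster of two large ones.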

Problem: Load Peak
•  Business has load peaks
   •  e.g. events that people register for
•  Need to have enough hardware for load peaks
•  Costly

Problem: Testing
•  Testing
   •  Need production-like infrastructure
•  Prohibitive costs
   •  Only needed during tests

[Diagram: an Elastic Load Balancer in front of four system instances: System EU West 1b, System EU West 1c, System EU West 1c and System EU West 1c]

What You Have Just Seen
•  System tunes itself depending on load
•  Same approach as for availability
•  + Watchdog for load
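A load watchdog of this kind boils down to a small decision rule. The following is a minimal sketch, not the configuration used in the demo: the function names, the per-instance capacity and the instance bounds are all assumptions chosen for illustration.

```python
import math


def desired_instances(requests_per_second: float,
                      capacity_per_instance: float,
                      min_instances: int = 2,
                      max_instances: int = 10) -> int:
    """Number of instances needed for the current load, within fixed bounds."""
    needed = math.ceil(requests_per_second / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


def scaling_action(running: int, desired: int) -> str:
    """Describe what the load watchdog should do next."""
    if desired > running:
        return f"start {desired - running} server(s)"
    if desired < running:
        return f"stop {running - desired} server(s)"
    return "no change"


# Load rises to 350 requests/s while two instances (100 requests/s each) run.
action = scaling_action(running=2, desired=desired_instances(350, 100))
```

The lower bound keeps the redundancy needed for availability even when load is low; the upper bound caps costs.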

Easy to create a new server

Reliably reproducible

Redundancy in Software

Stateless?

Stateless
•  Stateless web servers: best practice
•  Some Java frameworks don’t follow this approach
•  Can store HTTP session externally
   •  e.g. RDBMS, NoSQL, cache
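Storing the session externally can be sketched as follows. `SessionStore` is a hypothetical in-memory stand-in for an RDBMS, NoSQL database or cache, and `WebServer` is a simplified model rather than a real web framework; the point is only that no server keeps session data between requests.

```python
import uuid
from typing import Dict, Optional


class SessionStore:
    """Stand-in for an external store (RDBMS, NoSQL database or cache)."""

    def __init__(self) -> None:
        self._data: Dict[str, Dict[str, str]] = {}

    def load(self, session_id: str) -> Dict[str, str]:
        return dict(self._data.get(session_id, {}))

    def save(self, session_id: str, session: Dict[str, str]) -> None:
        self._data[session_id] = dict(session)


class WebServer:
    """Keeps no session data between requests; everything lives in the store."""

    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def add_to_cart(self, session_id: Optional[str], item: str) -> str:
        session_id = session_id or uuid.uuid4().hex
        session = self.store.load(session_id)
        session["cart"] = ",".join(filter(None, [session.get("cart", ""), item]))
        self.store.save(session_id, session)
        return session_id

    def cart(self, session_id: str) -> str:
        return self.store.load(session_id).get("cart", "")


store = SessionStore()
sid = WebServer("server-a", store).add_to_cart(None, "book")
# server-a may now fail; a freshly started server-b sees the same session.
sid = WebServer("server-b", store).add_to_cart(sid, "pen")
cart_seen_by_c = WebServer("server-c", store).cart(sid)
```

Because every request can be handled by any server, a replacement server can take load immediately and the load balancer needs no session affinity.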

What about Databases?

Databases
•  Often assumed to be just “fast and scalable”
•  Large scale doable, e.g. Data Warehouse
•  Often use traditional approach
   •  Cluster with two nodes
   •  Highly available hardware

Database: Problems
•  Availability
   •  Highly available hardware
•  Performance
   •  Limited scaling
•  Costly

Databases
•  New approaches
•  Used by NoSQL databases
•  But also e.g. MySQL
•  …or in system architecture

Databases
•  Replication
   •  Read performance
   •  Availability
•  Sharding
   •  Spread data across servers
   •  Write performance
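How the two techniques combine can be sketched with a toy key-value store. `ShardedStore` is a hypothetical class, and it simplifies heavily: every write is copied synchronously to all replicas of a shard, whereas real databases typically replicate from a primary and may lag behind.

```python
import hashlib
from typing import Dict, List, Optional


class ShardedStore:
    """Keys are spread across shards; each shard keeps several replicas."""

    def __init__(self, shard_count: int, replicas_per_shard: int) -> None:
        self.shards: List[List[Dict[str, str]]] = [
            [{} for _ in range(replicas_per_shard)] for _ in range(shard_count)
        ]

    def _shard_for(self, key: str) -> int:
        # Stable hash so that every client picks the same shard for a key.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def write(self, key: str, value: str) -> None:
        # Sharding: a write only touches the replicas of one shard.
        for replica in self.shards[self._shard_for(key)]:
            replica[key] = value

    def read(self, key: str, replica_index: int = 0) -> Optional[str]:
        # Replication: any replica of the shard can answer the read.
        replicas = self.shards[self._shard_for(key)]
        return replicas[replica_index % len(replicas)].get(key)

    def fail_replica(self, shard: int, replica_index: int) -> None:
        """Simulate losing one replica by removing it from the shard."""
        del self.shards[shard][replica_index]


db = ShardedStore(shard_count=2, replicas_per_shard=3)
db.write("user:1", "Alice")
reads = [db.read("user:1", replica_index=i) for i in range(3)]
```

Reads can be spread over the replicas of a shard, writes are spread over the shards, and a shard stays available as long as one of its replicas survives.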

Scaling MongoDB

[Diagram: two shards, Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3]

Availability

[Diagram: two shards, Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3]

Scaling MongoDB

[Diagram: three shards, Shard 1, Shard 2 and Shard 3, each with Replica 1, Replica 2 and Replica 3]

Scaling MongoDB

[Diagram: two shards, Shard 1 and Shard 2, each with Replica 1, Replica 2 and Replica 3, followed by a question mark]

Replicas & Shards
•  Easy to understand
•  But: coarse grained scaling
•  Adding another shard means
   •  Moving lots of data
   •  Adding quite some servers

Amazon Dynamo Model

[Diagram: four servers, each holding three of the four shards]
Server A: Shard1, Shard3, Shard4
Server B: Shard2, Shard1, Shard4
Server C: Shard3, Shard2, Shard1
Server D: Shard4, Shard2, Shard3

Amazon Dynamo Model

[Diagram: the four servers with their shards, plus a new server]
Server A: Shard1, Shard3, Shard4
Server B: Shard2, Shard1, Shard4
Server C: Shard3, Shard2, Shard1
Server D: Shard4, Shard2, Shard3
New Server

Amazon Dynamo Model
•  Published in the Dynamo paper
•  Implementations: Riak, Cassandra etc.
•  Fine grained scaling
•  Can immediately write to new node
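The fine grained scaling rests on consistent hashing with virtual nodes, as described in the Dynamo paper. The sketch below shows only that partitioning idea; `HashRing`, the number of virtual nodes and the key names are assumptions for illustration, and replicating each shard to several servers is left out.

```python
import bisect
import hashlib
from typing import Dict, List


def _hash(value: str) -> int:
    """Map a string to a position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode("utf-8")).digest()[:8], "big")


class HashRing:
    """Consistent hash ring with several virtual nodes per server."""

    def __init__(self, vnodes_per_server: int = 64) -> None:
        self.vnodes_per_server = vnodes_per_server
        self._positions: List[int] = []
        self._owner: Dict[int, str] = {}

    def add_server(self, server: str) -> None:
        for i in range(self.vnodes_per_server):
            position = _hash(f"{server}#{i}")
            bisect.insort(self._positions, position)
            self._owner[position] = server

    def server_for(self, key: str) -> str:
        # The first virtual node clockwise from the key's position owns the key.
        index = bisect.bisect_right(self._positions, _hash(key)) % len(self._positions)
        return self._owner[self._positions[index]]


ring = HashRing()
for name in ("A", "B", "C", "D"):
    ring.add_server(name)

keys = [f"key-{i}" for i in range(10_000)]
before = {k: ring.server_for(k) for k in keys}
ring.add_server("New")
after = {k: ring.server_for(k) for k in keys}
moved = [k for k in keys if before[k] != after[k]]
```

Only the keys that now belong to the new server change owner; all other keys stay where they were, so a single server can be added without reshuffling the whole data set.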

Hardware
•  Not highly reliable
•  Scales by distributing load across servers
•  No NAS, SAN, RAID…
•  As cheap as it gets

Sum Up
•  Virtualization
•  + Phoenix server
•  = Better availability
•  = Better performance
•  = Lower costs
•  Stateless servers
•  NoSQL

Thank You!