Eugene Ciurana [email protected] - Amazon Web...

Let’s move the Java world!

High-Availability, Fault Tolerance, and Resource-Oriented Computing

Eugene [email protected] - pr3d4t0r ##java, irc.freenode.net

This presentation is available from:

http://ciurana.eu/GeeCON-2010


About Eugene...

• 15+ years building mission-critical, high-availability systems

• 14+ years of Java work

• Open source evangelist

• Official adoption of open source/Linux at Walmart worldwide

• State-of-the-art, main line-of-business systems at the largest companies in the world (not a web guy!)


What You’ll Learn...

• Decoupled, event-driven, resource-oriented systems are more flexible

• Avoid tight, point-to-point integration

• Enhance JVM-based apps with better domain-specific languages

• How to move away from monolithic app servers and architectures

• How to implement event-driven systems by leveraging existing infrastructure and SOA investments

• Treat computational resources as addressable entities

• Balance open source vs. commercial products


Very Important!

Please Ask Questions! (don’t be shy)


What is Scalability?

• Scalability is the property of a system to:

• handle bigger amounts of work; or

• be easily expanded in response to increased demand

• network, processing, database, file resources

• Types of scalability

• Horizontal (out): add more nodes with identical functionality as existing ones and redistribute the load

• Vertical (up): expand by adding more cores, main memory, storage, or network interfaces


Horizontal Scalability

[Figure: a load balancer in front of three identical nodes, then in front of four; the system scales out by adding nodes. Clustering!]


Vertical Scalability

[Figure: a dual-core, single-processor host with 16 MB RAM runs virtual nodes 0 to 2; a dual-core, dual-processor host with 32 MB RAM runs virtual nodes 0 to 3. The system scales up.]


What is Availability?

• How well a system provides useful resources over a set period of time

• High availability guarantees an absolute degree of functional continuity within a time window

• Expressed as a relationship between uptime and unplanned downtime

• A = 100 - (100*D/U), where D is unplanned downtime and U is the total length of the measurement period, both expressed in minutes

• Beware: uptime != available
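Here is a minimal sketch of the availability formula above. It treats U as the total length of the measurement period, which is how the Nines table uses it: 52,560 minutes of downtime in a 525,600-minute year gives 90%. The class and method names are illustrative only.

```java
// Sketch of A = 100 - (100 * D / U).
// Assumption: D = unplanned downtime in minutes, U = total measurement period in minutes.
public class Availability {

    /** Availability in percent for D minutes of unplanned downtime in a U-minute period. */
    static double percent(double downtimeMinutes, double periodMinutes) {
        if (periodMinutes <= 0) {
            throw new IllegalArgumentException("period must be positive");
        }
        return 100.0 - (100.0 * downtimeMinutes / periodMinutes);
    }

    public static void main(String[] args) {
        double monthMinutes = 30 * 24 * 60; // a 30-day month: 43,200 minutes
        // 43.2 minutes of unplanned downtime in that month is "three nines"
        System.out.printf("%.3f%%%n", percent(43.2, monthMinutes)); // 99.900%
    }
}
```

The same formula works for any window: a month, a quarter, or whatever period the SLA is measured over.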


The Nines Game

Availability %   Downtime (minutes/year)   Downtime/year   Vendor jargon
90               52560.00                  36.5 days       one nine
99               5256.00                   3.7 days        two nines
99.9             525.60                    8.8 hours       three nines
99.99            52.56                     53 minutes      four nines
99.999           5.26                      5.3 minutes     five nines
99.9999          0.53                      32 seconds      six nines
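Each row of the table follows from the availability formula solved for downtime. The sketch below assumes a 365-day year of 525,600 minutes, as the table does. `NinesTable` and `downtimeMinutesPerYear` are illustrative names.

```java
// Sketch: downtime budget per year for an availability target.
// Assumption: a 365-day year (525,600 minutes), matching the Nines table.
public class NinesTable {
    static final double MINUTES_PER_YEAR = 365 * 24 * 60;

    /** Minutes of unplanned downtime per year allowed by an availability percentage. */
    static double downtimeMinutesPerYear(double availabilityPercent) {
        return MINUTES_PER_YEAR * (100.0 - availabilityPercent) / 100.0;
    }

    public static void main(String[] args) {
        double[] targets = {90, 99, 99.9, 99.99, 99.999, 99.9999};
        for (double target : targets) {
            double minutes = downtimeMinutesPerYear(target);
            System.out.printf("%-8s %10.2f min/year (%.2f hours)%n", target, minutes, minutes / 60);
        }
    }
}
```

Each extra nine cuts the downtime budget by a factor of ten. That is why the last rows are so expensive to guarantee.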


Service Level Agreements

• SLAs are negotiated terms that outline the obligations of the two parties delivering and using a system

• System type - not all systems require the same SLA

• Levels of availability

• Minimum

• Target

• Uptime

• Network

• Power

• Maintenance windows

• Serviceability

• Performance and metrics

• Billing

SLAs help determine whether you scale up or out


Load Balancers

• They work by spreading requests among two or more resources

• Implemented in hardware or in software

• Multiple machines

• Multiple processes

• Multiple threads

• Resources appear as a single device to consumers

• Can be stateless (web services), or stateful (applications that require session management)

• Algorithms determine the distribution

• 1/n: all systems are equally likely to service a request

• Special requests (e.g. a music store): some servers get hit more than others
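Plain round-robin produces the 1/n distribution above. Below is a minimal in-process Java sketch, assuming nodes are identified by address strings. A weighted variant is included as one common way to give some nodes a deliberately larger share. The class names are illustrative, and the sketch does not model any specific hardware or software balancer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class Balancers {

    /** Round-robin: with n nodes, each node serves 1/n of the requests. */
    static final class RoundRobin {
        private final List<String> nodes;
        private final AtomicInteger next = new AtomicInteger();

        RoundRobin(List<String> nodes) {
            this.nodes = List.copyOf(nodes);
        }

        String pick() {
            // floorMod keeps the index valid even after the counter overflows
            return nodes.get(Math.floorMod(next.getAndIncrement(), nodes.size()));
        }
    }

    /** Weighted round-robin: a node with weight w appears w times in each cycle. */
    static final class WeightedRoundRobin {
        private final List<String> schedule = new ArrayList<>();
        private final AtomicInteger next = new AtomicInteger();

        WeightedRoundRobin(List<String> nodes, List<Integer> weights) {
            for (int i = 0; i < nodes.size(); i++) {
                for (int w = 0; w < weights.get(i); w++) {
                    schedule.add(nodes.get(i));
                }
            }
        }

        String pick() {
            return schedule.get(Math.floorMod(next.getAndIncrement(), schedule.size()));
        }
    }

    public static void main(String[] args) {
        RoundRobin balancer = new RoundRobin(List.of(
                "192.168.202.55", "192.168.202.66", "192.168.202.67", "192.168.202.69"));
        for (int n = 1; n <= 5; n++) {
            System.out.println("R" + n + " -> " + balancer.pick()); // R5 wraps back to .55
        }
    }
}
```

The `AtomicInteger` counter lets many request threads share one balancer without locking.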


Load Balancers

[Figure: a consumer sends requests R1, R2, and R3 to a load balancer at 74.0.125.28, which spreads them across nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69. Rn = request, n = sequence number.]


Persistent Load Balancers

[Figure: three consumers connect through a sticky load balancer at 74.0.125.28 to nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69; each consumer stays with one node.]
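A persistent (sticky) balancer has to send the same consumer to the same node every time. Many real balancers use a cookie or source IP for this. The sketch below assumes hashing a session ID instead, which is simple but remaps most sessions whenever the node list changes. The class and method names are hypothetical.

```java
import java.util.List;

public class StickyBalancer {
    private final List<String> nodes;

    StickyBalancer(List<String> nodes) {
        this.nodes = List.copyOf(nodes);
    }

    /** The same session ID maps to the same node as long as the node list is unchanged. */
    String nodeFor(String sessionId) {
        return nodes.get(Math.floorMod(sessionId.hashCode(), nodes.size()));
    }

    public static void main(String[] args) {
        StickyBalancer balancer = new StickyBalancer(List.of(
                "192.168.202.55", "192.168.202.66", "192.168.202.67", "192.168.202.69"));
        for (String session : List.of("alice-42", "bob-7", "alice-42")) {
            System.out.println(session + " -> " + balancer.nodeFor(session)); // alice-42 repeats its node
        }
    }
}
```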


Load Balancing and Databases

[Figure: a consumer's requests are spread across nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69, which share session data.]


Caching Strategies

• Stateful load balancing requires data sharing

• Caching distributes popular, shared read-only data

• Think of a cache as a giant hash map

• If the data isn’t in the cache, fetch it from the database

• Write policies:

• write-through: write to the cache AND database

• write-behind: writes go to the cache, which marks the datum “dirty”; the database is updated later (e.g. asynchronously or when the datum is evicted)

• no-write allocation: only read requests are cached; assumes data never changes
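Here is a minimal sketch of the two write policies. Plain `HashMap`s stand in for the cache and the database, which is an assumption for illustration. The write-behind class uses the common definition: dirty entries are flushed to the database later, here by an explicit `flush()` call that a real cache would run on a timer or on eviction. All names are hypothetical.

```java
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

public class WritePolicies {

    /** Write-through: every write updates the database and the cache together. */
    static final class WriteThroughCache {
        final Map<String, String> cache = new HashMap<>();
        final Map<String, String> database;

        WriteThroughCache(Map<String, String> database) { this.database = database; }

        void put(String key, String value) {
            database.put(key, value); // persist first...
            cache.put(key, value);    // ...then cache
        }
    }

    /** Write-behind: writes land in the cache, are marked dirty, and reach the database later. */
    static final class WriteBehindCache {
        final Map<String, String> cache = new HashMap<>();
        final Set<String> dirty = new LinkedHashSet<>();
        final Map<String, String> database;

        WriteBehindCache(Map<String, String> database) { this.database = database; }

        void put(String key, String value) {
            cache.put(key, value);
            dirty.add(key);
        }

        /** Normally run on a timer or on eviction; called explicitly in this sketch. */
        void flush() {
            for (String key : dirty) {
                database.put(key, cache.get(key));
            }
            dirty.clear();
        }
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>();
        WriteBehindCache cache = new WriteBehindCache(db);
        cache.put("sku-1", "in stock");
        System.out.println("before flush: " + db.get("sku-1")); // null
        cache.flush();
        System.out.println("after flush:  " + db.get("sku-1")); // in stock
    }
}
```

Write-through keeps the database always current at the cost of slower writes. Write-behind makes writes fast but can lose dirty data if a node dies before flushing.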


Caching Usage Pattern

• Application caching

• Little or no programmer participation (e.g. Terracotta)

• Explicit API calls (memcached, Coherence, etc.)

• Web caching - stores full documents or fragments (‘particles’) on the server or client; invisible to the client

• Web accelerators - distribute the load (e.g. CDNs like S3 or Akamai)

• Proxy caches - distribute requests for the same resources and may provide filtering/query (e.g. Squid, Apache, ISA servers)


Caching Usage Pattern

[Flowchart] Begin: is the operation a query or an update?

• Query: fetch datum from cache. If datum is None, query datum from database and add datum to cache. Use datum in app. End.

• Update: update datum in database, invalidate cache, then add or update datum to cache. End.
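The flowchart above is the cache-aside pattern. Here is a minimal Java sketch of both branches, assuming `HashMap`s stand in for the cache and the database. The class name and the `databaseReads` counter are hypothetical and exist only to make cache misses visible.

```java
import java.util.HashMap;
import java.util.Map;

public class CacheAside {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database;
    int databaseReads = 0; // counts cache misses, for illustration only

    CacheAside(Map<String, String> database) {
        this.database = database;
    }

    /** Query branch: fetch from cache; on a miss, query the database and add the datum to cache. */
    String query(String key) {
        String datum = cache.get(key);
        if (datum == null) {               // "datum is None?" -> yes
            datum = database.get(key);
            databaseReads++;
            if (datum != null) {
                cache.put(key, datum);
            }
        }
        return datum;                      // "use datum in app"
    }

    /** Update branch: update the database, invalidate the cache, then add the fresh datum. */
    void update(String key, String value) {
        database.put(key, value);
        cache.remove(key);                 // invalidate; matters when other nodes hold copies
        cache.put(key, value);
    }

    public static void main(String[] args) {
        Map<String, String> db = new HashMap<>(Map.of("sku-1", "in stock"));
        CacheAside app = new CacheAside(db);
        app.query("sku-1");                // miss: read from database
        app.query("sku-1");                // hit: served from cache
        System.out.println("database reads: " + app.databaseReads); // 1
    }
}
```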


Distributed Caching

[Figure: a consumer reaches nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69 through a load-balanced configuration or datagram; the nodes use caches Cache 0 to Cache 3, backed by a database.]
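The diagram doesn't say how a datum is assigned to one of the caches. One common approach is consistent hashing, sketched below as an assumption rather than as what the diagram depicts. Each cache gets many points on a hash ring, and a key belongs to the first cache point clockwise from it. Removing a cache then moves only the keys it held. The names and the MD5-based hash are illustrative choices.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CacheRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();
    private final int pointsPerCache;

    CacheRing(List<String> caches, int pointsPerCache) {
        this.pointsPerCache = pointsPerCache;
        for (String cache : caches) {
            add(cache);
        }
    }

    void add(String cache) {
        for (int i = 0; i < pointsPerCache; i++) {
            ring.put(hash(cache + "#" + i), cache);
        }
    }

    /** Drops a cache's points; only keys that lived on it move to other caches. */
    void remove(String cache) {
        for (int i = 0; i < pointsPerCache; i++) {
            ring.remove(hash(cache + "#" + i));
        }
    }

    /** First cache clockwise from the key's position, wrapping around the ring. */
    String cacheFor(String key) {
        Map.Entry<Long, String> entry = ring.ceilingEntry(hash(key));
        return (entry != null ? entry : ring.firstEntry()).getValue();
    }

    private static long hash(String s) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) {
                h = (h << 8) | (d[i] & 0xFF); // first 8 digest bytes as a long
            }
            return h;
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // every Java platform must provide MD5
        }
    }

    public static void main(String[] args) {
        CacheRing ring = new CacheRing(List.of("Cache 0", "Cache 1", "Cache 2", "Cache 3"), 100);
        for (String key : List.of("user:42", "sku:1001", "cart:7")) {
            System.out.println(key + " -> " + ring.cacheFor(key));
        }
    }
}
```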


Clustering

• Cluster - two or more systems that appear to users as a single system

• A clustered (horizontally scalable) system is more cost-effective than a monolithic single system (vertically scalable) with the same performance characteristics

• Systems are connected in the cluster over high-speed LANs like Gb Ethernet, FDDI, Infiniband, Myrinet, etc.


A/A Clustering

• A/A == Active/Active

• Distribute the load evenly among multiple nodes

• All nodes offer the same capabilities

• All nodes are active at the same time

[Figure: a consumer's requests are spread across active nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69.]


High-Availability A/P Cluster

• A/P == Active/Passive

• Provides uninterrupted service through redundant nodes

• Eliminates single points of failure

• Two nodes minimum, and “heartbeat” detection

• Automatic traffic switch for fail-over

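[Figure: a consumer reaches the active node 192.168.202.55 through a router at 74.0.125.28; the failover node 192.168.202.69 is linked to it by a heartbeat. Both use state data/cache and a database with a failover database, kept in sync by replication or a clustered database.]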
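Heartbeat detection can be sketched as a timeout on the last heartbeat seen from the active node. This is a minimal sketch under stated assumptions. Time is passed in explicitly so the logic is testable. The 3-second timeout in `main` is an arbitrary example value. Returning the failover address stands in for "tell the router to switch traffic". The addresses come from the diagram; the class and method names are hypothetical.

```java
public class HeartbeatMonitor {
    static final String PRIMARY = "192.168.202.55";
    static final String FAILOVER = "192.168.202.69";

    private final long timeoutMillis;
    private long lastBeatMillis;
    private String active = PRIMARY;

    HeartbeatMonitor(long timeoutMillis, long startMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastBeatMillis = startMillis;
    }

    /** Called whenever a heartbeat arrives from the active node. */
    synchronized void onHeartbeat(long nowMillis) {
        lastBeatMillis = nowMillis;
    }

    /** Called periodically; fails over once the active node has been silent too long. */
    synchronized String activeNode(long nowMillis) {
        if (active.equals(PRIMARY) && nowMillis - lastBeatMillis > timeoutMillis) {
            active = FAILOVER; // a real system would reconfigure the router here
        }
        return active;
    }

    public static void main(String[] args) {
        HeartbeatMonitor monitor = new HeartbeatMonitor(3_000, 0);
        monitor.onHeartbeat(1_000);
        System.out.println(monitor.activeNode(2_000)); // 192.168.202.55
        System.out.println(monitor.activeNode(5_000)); // 192.168.202.69 (silent for 4 s)
    }
}
```

Picking the timeout is a trade-off: too short and a brief network hiccup triggers a needless failover, too long and the outage lasts longer than the SLA allows.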


Grid

• Process loads as independent jobs

• Nodes don’t require data sharing

• Storage, network may be shared by all nodes

• Intermediate results have no bearing on other jobs’ progress

• Each node is independent

• Map/Reduce (Hadoop)

[Figure: a consumer and a master in front of two load-balanced groups of four nodes each.]
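The grid model (independent jobs, no data sharing, a master combining results) can be shown on one JVM with `java.util.concurrent`. This is not Hadoop code. It is a sketch of the same map/reduce shape, assuming a thread pool stands in for grid nodes and word counting stands in for the job. The class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class GridWordCount {

    /** Map step: each job counts words in its own chunk and shares nothing. */
    static Map<String, Integer> countWords(String chunk) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : chunk.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                counts.merge(word, 1, Integer::sum);
            }
        }
        return counts;
    }

    /** Master: submits one independent job per chunk, then reduces the partial results. */
    static Map<String, Integer> run(List<String> chunks, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Map<String, Integer>>> jobs = new ArrayList<>();
            for (String chunk : chunks) {
                jobs.add(pool.submit(() -> countWords(chunk)));
            }
            Map<String, Integer> total = new HashMap<>();
            for (Future<Map<String, Integer>> job : jobs) {
                job.get().forEach((word, n) -> total.merge(word, n, Integer::sum)); // reduce step
            }
            return total;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for jobs", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a grid job failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = run(List.of(
                "the grid scales out", "the nodes are independent", "the end"), 3);
        System.out.println(counts.get("the")); // 3
    }
}
```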


Computational Cluster

• Used for operations that require raw computational power

• Not good for transactional operations (web, database)

• Tightly coupled nodes, homogeneous, close proximity

• Meant to replace supercomputers

[Figure: a consumer and a master in front of eight tightly coupled nodes.]


Redundancy and Fault Tolerance

• Redundancy - the expectation that any system component failure is independent of failure in other components

• Fault tolerance - the system continues to operate in the event of component failure

• May have decreased throughput

Fault tolerance results from SLAs


Fault Tolerance SLA Requirements

• No single point of failure - redundant components ensure continuous operation

• Allow repairs without disruption of service

• Fault isolation - problem detection must pinpoint the specific faulty component

• Fault propagation containment - problems in one component must not cascade to others

• Reversion mode - the system can be set back to a known state on command
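One widely used way to meet the fault propagation containment requirement is a circuit breaker. The slide does not name it, so treat it as an illustrative choice. After several consecutive failures, callers stop hitting the broken component and get a fallback immediately. After a cool-down period, one trial call is let through. The sketch passes time in explicitly for testability, and all names are hypothetical.

```java
import java.util.function.Supplier;

public class CircuitBreaker {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1; // -1 means the circuit is closed

    CircuitBreaker(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Calls the dependency unless the circuit is open, in which case it fails fast to the fallback. */
    synchronized <T> T call(Supplier<T> dependency, Supplier<T> fallback, long nowMillis) {
        if (openedAt >= 0) {
            if (nowMillis - openedAt < openMillis) {
                return fallback.get();          // open: don't touch the failing component
            }
            openedAt = -1;                      // half-open: allow a trial call
        }
        try {
            T result = dependency.get();
            consecutiveFailures = 0;            // success closes the circuit
            return result;
        } catch (RuntimeException e) {
            if (++consecutiveFailures >= failureThreshold) {
                openedAt = nowMillis;           // trip (or re-trip after a failed trial)
            }
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        CircuitBreaker breaker = new CircuitBreaker(3, 5_000);
        Supplier<String> flakyService = () -> { throw new RuntimeException("connection refused"); };
        for (long t = 0; t < 5; t++) {
            System.out.println(breaker.call(flakyService, () -> "fallback", t));
        }
    }
}
```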


A/A Cluster Fault Tolerance

• Uninterruptible, scalable service (stateless, web services)

• Failure transparency, though service may be degraded

• Ideal for event-based web services (SOAP, REST, JMS, etc.)

• No dependencies between nodes

[Figure: a consumer's requests are spread across nodes 192.168.202.55, 192.168.202.66, 192.168.202.67, and 192.168.202.69; a replacement node at 192.168.202.53 can take a failed node's place.]


A/P Cluster Fault Tolerance

• High availability through redundancy and failure detection

• Higher cost - used for stateful systems

• May require active sys- or netadmin participation

• More moving parts - more things to coordinate

[Figure: a consumer reaches node 192.168.202.55 through a router at 74.0.125.28; the failover node 192.168.202.69 is linked to it by a heartbeat. Both use state data/cache and a database with a failover database.]


Putting It All Together


ROC Architecture

• ROC = Resource-Oriented Computing

• Everything is a resource (computational, data, other)

[Figure: Mule ESB at the center, with transformers between it and the resources it connects: a service provider (UPS, FedEx) on the Internet; single sign-on backed by Active Directory and a legacy auth system (mainframe/RACF); CRM; the product catalogue; product support pages; Remedy; and a service object holding business logic. Consumers include a web app, a web browser, and a GUI app with a dedicated API. Protocols shown: LDAP, SOAP, HTTP, XML, JDBC, TCP pass-through, JMS, etc.]


SOA and Computational Network


Real-Life Example - LeapFrog

[Figure: an end-user system (Mac, Windows) runs LeapFrog Connect (USB) and a web browser. Over the Internet it reaches www.leapfrog.com connected products, LearningPath, an S3 content repository, and a third-party partner site, and, through a firewall, three Mule ESB tiers:]

• Mule ESB backbone: HTTP, SOAP (CXF), REST, etc. routing, filtering, and dispatching; ActiveMQ JMS broker; dedicated LeapFrog services

• Mule ESB tailbone: connected-products SOAP and REST web services

• Mule ESB funnybone: device log upload, processing, servlet container

[Also shown: a content management system (REST, JCR); data stores for device logs, Crowd SSO, customer data, game play data, content authoring, and user credentials; servlets and app logic.]


Real-Life Example - LeapFrog

Backbone - message filtering, routing, dispatching, queuing, events

[Figure: Internet traffic enters through a load balancer to two Tomcat 6 application servers and a services proxy. Separate load balancers front each tier: Backbone (four Mule ESB 1.6.2 instances), Tailbone (two Mule ESB instances serving SOAP and REST, with a database), Funnybone (two Mule ESB instances serving servlets and MTOM, with an NFS share), and Message Broker (two ActiveMQ instances, with an NFS share).]


Mule SOA Applied Clustering

[Figure: external applications call http://server.mycompany.com/service_call; a load balancer forwards each call to http://mule_server_1/service_call or http://mule_server_2/service_call, each a Mule ESB application container hosting Service 1, Service 2, and Service 3.]

* Two or more Mule instances can provide services, for scalability under high demand

* A load-balanced configuration has built-in fail-over

* External apps see a single point of entry: the service endpoint name

* The load balancer or proxy sends the request to any available Mule server

* Increased demand: add another Mule server without interrupting the existing ones

* Decreased demand: remove Mule servers without interrupting other servers

* This is an active/active configuration: any server can handle a request at any time

* Assumes that the service application components are stateless


Mule SOA - ESB App Failover





[Figure: external applications call http://server.mycompany.com/service_call; a load balancer forwards each call to http://mule_server_1/service_call or http://mule_server_2/service_call.]

* The A/A configuration uses the load balancer to dispatch service calls

* The load balancer takes a failing service out of rotation automatically

* Failure reason no. 1: network connectivity

* Failure reason no. 2: Mule container

* Failure reason no. 3: service application bug
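Taking a failing service out of rotation comes down to a periodic health check plus round-robin over the servers that passed. In the sketch below, the health probe is a hypothetical `Predicate` (in practice, for example, an HTTP request to each `service_call` endpoint), and the class name is made up for illustration. A recovered server rejoins the rotation on the next check.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Predicate;

public class HealthCheckedPool {
    private final List<String> servers;
    private final Set<String> outOfRotation = ConcurrentHashMap.newKeySet();
    private final AtomicInteger next = new AtomicInteger();

    HealthCheckedPool(List<String> servers) {
        this.servers = List.copyOf(servers);
    }

    /** Probes every server: failing ones leave the rotation, recovered ones rejoin it. */
    void runHealthChecks(Predicate<String> isHealthy) {
        for (String server : servers) {
            if (isHealthy.test(server)) {
                outOfRotation.remove(server);
            } else {
                outOfRotation.add(server);
            }
        }
    }

    /** Round-robin over the servers currently in rotation. */
    String pick() {
        List<String> live = new ArrayList<>();
        for (String server : servers) {
            if (!outOfRotation.contains(server)) {
                live.add(server);
            }
        }
        if (live.isEmpty()) {
            throw new IllegalStateException("no healthy servers");
        }
        return live.get(Math.floorMod(next.getAndIncrement(), live.size()));
    }

    public static void main(String[] args) {
        HealthCheckedPool pool = new HealthCheckedPool(List.of("mule_server_1", "mule_server_2"));
        // Hypothetical probe result: mule_server_2 stopped answering
        pool.runHealthChecks(server -> !server.equals("mule_server_2"));
        for (int i = 0; i < 3; i++) {
            System.out.println(pool.pick()); // mule_server_1 each time
        }
    }
}
```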


Uninterrupted Application Updates

[Figure: four successive load balancer configurations over time, in front of two Mule ESB application instances (version 1.4).]

* Allow stopping instances and deploying new application functionality without stopping services

* Allow upgrades to one country's configuration without affecting other countries or stopping services


Database Replication

[Figure: a primary cluster (Node 0, Node 1) with two replicated partitions, Partition 0 (DB 0, DB 0b) and Partition 1 (DB 1, DB 1b); the ESB acts as the app services provider.]
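Here is a sketch of the partition-plus-replica layout in the diagram, with each partition holding a primary ("DB n") and a replica ("DB nb"). It rests on assumptions the slide doesn't state: keys are assigned to partitions by hash, and replication is synchronous (a write returns only after both copies are updated). `HashMap`s stand in for databases, and all names are hypothetical.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PartitionedStore {

    /** One partition: a primary database and its replica ("DB n" and "DB nb"). */
    static final class Partition {
        final Map<String, String> primary = new HashMap<>();
        final Map<String, String> replica = new HashMap<>();
    }

    private final List<Partition> partitions;

    PartitionedStore(int count) {
        Partition[] created = new Partition[count];
        for (int i = 0; i < count; i++) {
            created[i] = new Partition();
        }
        partitions = List.of(created);
    }

    int partitionOf(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Synchronous replication: the write completes after both primary and replica are updated. */
    void put(String key, String value) {
        Partition p = partitions.get(partitionOf(key));
        p.primary.put(key, value);
        p.replica.put(key, value);
    }

    /** Reads go to the primary; if it is down, the replica serves the same data. */
    String get(String key, boolean primaryDown) {
        Partition p = partitions.get(partitionOf(key));
        return (primaryDown ? p.replica : p.primary).get(key);
    }

    public static void main(String[] args) {
        PartitionedStore store = new PartitionedStore(2); // Partition 0 and Partition 1
        store.put("claim:1001", "open");
        System.out.println("partition " + store.partitionOf("claim:1001")
                + ", replica read: " + store.get("claim:1001", true)); // replica read: open
    }
}
```

Asynchronous replication would make writes faster but lets the replica lag, so a failover could lose the most recent writes.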


Application Deployment

[Figure: Mule 1 to Mule 4 behind two load balancers, with Mule 5 as failover and two active JMS queuing instances.]

[Figure: three virtual-machine stacks on multi-core Intel or AMD processors, each running Linux and Java 6: JBoss hosting Application 1 and Application 2; a Mule ESB container hosting Web Service 1 and Web Service 2; and MQ.]

Simplify the architecture by having a common platform for all systems. This platform can be replicated across multiple data centers.

* Virtual Machine: VMware or Xen hosted on Windows; consider Amazon EC2 as a viable, low-cost alternative

* Linux: Ubuntu Server

* PowerBuilder applications (end-user) migrate to JBoss + Wicket or a similar configuration

* All web services are hosted by Mule ESB

* The Mule ESB and JBoss servers are separate from one another

* MQ clusters have a similar architecture; JBoss Messaging and WebSphere MQ

* Java 6 as a minimum

This architecture has a lower cost of operation, reduces power consumption, and simplifies administration.



[Figure: two virtual hosts (Intel, AMD) sharing SAN disk storage, behind an app balancer and a services balancer facing the Internet. Each host runs active web services, an active application, an MQ instance (master on one host, slave on the other), and a distributed cache.]

Each data center will have a cluster of two or more physical systems.

Each system will virtually host two or more applications/environments deployed as described in the previous diagram.

The system is designed for horizontal scalability (more traffic, more virtual or physical servers).

The system has inherent fail-over built in.

App and service requests may come from the open Internet.

Use physical load balancers; these can be Linux systems or dedicated F5 balancers, separate from the cluster.


Application Deployment

[Figure: data centers in Europe, the US, and Japan, each with two app clusters, connected over the Internet; also shown: Claims Mgmt (two instances), Informix, Expert, and three legacy systems.]

Each data center has an application cluster.

The app clusters have identical configurations; only the app itself may vary by locale.

A designated data center also functions as the global services processing hub; all applications talk to this cluster (e.g. Claims Management) regardless of where the calling app is located.

The global services clusters are physically and logically separate from the application clusters, which may include locale-specific web services and data stores.



[Figure: primary and secondary clusters, each with Node 0 and Node 1 and two replicated partitions (Partition 0: DB 0, DB 0b; Partition 1: DB 1, DB 1b), connected by an Enterprise Service Bus (routing, queuing, transformation, transactions, dispatching) with a queue.]


Q&A

Comments?

Anything else?

Eugene [email protected] - pr3d4t0r ##java, irc.freenode.net

http://ciurana.eu/scalablesystems

This presentation is available from: http://ciurana.eu/GeeCON-2010

Twitter: ciurana
