HA & Scalable Performance

media.techtarget.com/tss/static/articles/content/J...

Building High Availability and Scalable Performance into your J2EE Applications

About: Overview

J2EE applications are expected to provide High Availability (HA), predictable scalability, and extreme performance, and to do it all on a tight budget and schedule. Achieving any one of these goals can be difficult, and achieving all three on time and under budget can seem near impossible, but there are simple concepts and proven design patterns that can be utilized to implement even the most extreme scale applications, and to do it in such a way that Single Points Of Failure (SPOFs) are completely eliminated.

About: Overview

With large-scale commercial web sites as case studies, real-world design patterns are examined for building J2EE-based web applications and web services that can achieve near-100% uptime and linear scalability on commodity hardware, all while exhibiting extreme performance.

Within the context of a scalable application infrastructure, various approaches are analyzed for achieving HA, fault tolerance, localization of failure, and transparent failover.

Different approaches to configuring an operational infrastructure are examined, with an emphasis on reliability, manageability, capacity on demand, and continuancy planning.

About: Cameron Purdy and Tangosol

Cameron Purdy is President of Tangosol, and is a contributor to Java and XML specifications.

Tangosol is the JCache (JSR 107) specification lead and a member of the Work Manager (JSR 237) expert group.

Tangosol Coherence is the leading clustered caching and data grid product for Java and J2EE environments. Coherence enables in-memory data management for clustered J2EE applications and application servers, and makes sharing, managing and caching data in a cluster as simple as on a single server.

Goals

Understanding application availability

  Measuring availability

  How to build availability into an architecture

Understanding Scalable Performance

  What is it? How does it differ from "faster"?

  What are the necessary ingredients?

Achieving Availability and Scalable Performance

Reliability and Availability

Terms: RAS & RASP

RAS

Reliability

Availability

Serviceability

RASP

Reliability

Availability

Scalability

Performance

Terms: RAS vs. RASP

RAS terms center entirely around uptime

  Originally used by mainframe vendors

RASP terms describe both uptime and scalable performance

Terms: Reliability

Defined as the probability of a failure within a given time period (or, how frequently a failure occurs)

Failure rate is often notated as λ (lambda)

Typically measured as

  MTBF: Mean Time Between Failure

  FIT Rate: Failures In Time (failures per 1,000,000,000 hours)

Terms: Availability

Defined as the percentage of time that an application is processing requests

Factors

  MTBF: Mean Time Between Failure

  MTTR: Mean Time To Recovery

Measured in terms of uptime

  typically "nines"

  99.999% is "five nines"
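The two factors above are commonly combined as availability = MTBF / (MTBF + MTTR). The Java sketch below applies that standard steady-state formula; the class name, method names and sample figures are illustrative assumptions and do not come from the slides.

```java
// Sketch: steady-state availability from MTBF and MTTR (all names are hypothetical).
final class AvailabilityEstimate {

    /** Fraction of time the system is up: MTBF / (MTBF + MTTR). */
    static double availability(double mtbfHours, double mttrHours) {
        if (mtbfHours <= 0 || mttrHours < 0) {
            throw new IllegalArgumentException("MTBF must be > 0 and MTTR >= 0");
        }
        return mtbfHours / (mtbfHours + mttrHours);
    }

    /** Expected downtime per (365-day) year, in hours, for a given availability. */
    static double annualDowntimeHours(double availability) {
        return (1.0 - availability) * 365.0 * 24.0;
    }

    public static void main(String[] args) {
        // Assumed sample figures: one failure every 1,000 hours, 1 hour to recover.
        double a = availability(1000.0, 1.0);
        System.out.printf("availability = %.5f%n", a);
        System.out.printf("annual downtime = %.2f hours%n", annualDowntimeHours(a));
    }
}
```

Under this model, shortening recovery time (MTTR) raises availability just as effectively as lengthening the time between failures.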

Terms: Reliability vs. Availability

Good reliability, bad availability

Infrequent, but potentially major, downtimes

e.g. electrical power grid

Bad reliability, good availability

Frequent, but minor, failures

e.g. mobile communications

Reliability generally has the greater impact on end-user perception: Frequent failures are irritating

Availability generally has the greater impact on operations: Extended downtime can cripple a business

If the application is fault-tolerant, and failover is instantaneous, then Reliability and Availability can generally be treated as a single objective

  The application is reliable as long as there is a server to fail over to

  The application is available as long as there is at least one server up

High Availability

High Availability

Measuring HA: What percentage of the time is the application usable?

For HA, it's measured as the number of "nines"; e.g. 99.999% is "five nines"

Calculated using simple probability

HA is the product of redundancy: "or" the nodes in each tier

HA is sacrificed by the weakness of each link: "and" the tiers

High Availability

If a tier is composed of machines that have an individual up-time of 99%, what's the tier's up-time for one node? Two nodes?

(Measure probability as being "down")

p1 = .01 (99% uptime = "two nines")

p2 = .01 * .01 (99.99% uptime = "four nines")
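The same "probability of being down" arithmetic generalizes to any number of redundant nodes. The Java sketch below assumes node failures are fully independent, which is an idealization, and its class and method names are hypothetical.

```java
// Sketch: uptime of a tier of redundant, independent nodes (names are hypothetical).
final class TierAvailability {

    /** Probability that all nodes are down at once: (1 - uptime)^nodes. */
    static double probabilityAllDown(double nodeUptime, int nodes) {
        return Math.pow(1.0 - nodeUptime, nodes);
    }

    /** The tier is up as long as at least one node is up. */
    static double tierUptime(double nodeUptime, int nodes) {
        return 1.0 - probabilityAllDown(nodeUptime, nodes);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 3; n++) {
            System.out.printf("%d node(s): %.6f%% uptime%n", n, 100.0 * tierUptime(0.99, n));
        }
    }
}
```

With 99% nodes, each added node contributes roughly two more nines, provided the nodes really do fail independently.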

High Availability - Quiz

Given:

Linear scale for both SMP and Clustering

Cluster A: Two 8-CPU servers, 99% Uptime each

Cluster B: Eight 2-CPU servers, 99% Uptime each

What is the availability of each configuration if a total of 8 CPUs are required to service user requests within SLA requirements?

Cluster A: If a total of 8 CPUs are required, this implies one machine is sufficient to service application requests.

For the application to fail, both servers must fail simultaneously: 0.01 * 0.01 = 0.0001

Predicted application availability is 99.99%

(annual downtime of ~1 hour)



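Cluster B: If a total of 8 CPUs are required, this implies four servers are sufficient to service application requests.

For the application to fail, five of the eight servers must fail simultaneously: 0.01 ^ 5 = 1e-10 for any one particular set of five servers

Predicted application availability is 99.99999999%

(annual downtime of ~3 ms)

Counting all 56 ways to choose which five servers fail, the binomial estimate of the failure probability is closer to 5.5e-9 (annual downtime of ~0.17 s), still several orders of magnitude better than Cluster A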
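The quiz can be worked for any "k of n servers required" configuration with a binomial sum. The Java sketch below is illustrative only: the names are hypothetical and it assumes independent server failures. It counts every combination of failed servers, so it gives a more conservative result for Cluster B than the single-term 0.01 ^ 5 estimate.

```java
// Sketch: availability when at least `required` of `total` independent servers must be up.
final class KOfNAvailability {

    /** Binomial coefficient C(n, k), computed iteratively. */
    static double choose(int n, int k) {
        double c = 1.0;
        for (int i = 1; i <= k; i++) {
            c = c * (n - k + i) / i;
        }
        return c;
    }

    /** Sum of the binomial probabilities for required..total servers being up. */
    static double availability(int total, int required, double serverUptime) {
        double p = 0.0;
        for (int up = required; up <= total; up++) {
            p += choose(total, up)
                    * Math.pow(serverUptime, up)
                    * Math.pow(1.0 - serverUptime, total - up);
        }
        return p;
    }

    public static void main(String[] args) {
        // Cluster A: two 8-CPU servers, one is enough. Cluster B: eight 2-CPU servers, four needed.
        System.out.printf("Cluster A unavailability: %.3e%n", 1.0 - availability(2, 1, 0.99));
        System.out.printf("Cluster B unavailability: %.3e%n", 1.0 - availability(8, 4, 0.99));
    }
}
```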

But

High Availability

Application availability is more complicated than it appears at first glance.

Human error may be the biggest contributor to application downtime

Servers are rarely truly independent

A server failure may increase load on the remaining servers, triggering a cascade effect

Errors in shared components (network switches, clustering, power systems) can impact multiple servers


Given:

Linear scale for both SMP and Clustering



What is the performance impact of a server failure in each scenario?

Cluster A: Two 8-CPU servers

  Loss of a server doubles the load on the remaining server

Cluster B: Eight 2-CPU servers

  Loss of a server increases load on the remaining servers by only 14%

High Availability

If an application has three tiers with 99% uptime each, what's the up-time for the application?

(Measure probability as being "up")

p3 = .99 * .99 * .99

(97% uptime = not even "two nines"!)

The application availability is not even as good as the weakest link: adding a tier, even if it is more reliable than the other tiers, will always reduce application availability
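The "and the tiers" rule is a product of the individual tier uptimes. The Java sketch below uses a hypothetical class name and assumes every request needs every tier. It shows why adding even a very reliable tier can only lower the total.

```java
// Sketch: uptime of an application whose requests must pass through every tier.
final class ApplicationAvailability {

    /** All tiers must be up at once, so the uptimes multiply. */
    static double serial(double... tierUptimes) {
        double up = 1.0;
        for (double tier : tierUptimes) {
            up *= tier;
        }
        return up;
    }

    public static void main(String[] args) {
        System.out.printf("three 99%% tiers: %.4f%n", serial(0.99, 0.99, 0.99));
        // Adding a fourth tier, even at 99.99%, still reduces the product.
        System.out.printf("plus a 99.99%% tier: %.4f%n", serial(0.99, 0.99, 0.99, 0.9999));
    }
}
```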


Given:

Two-tier application; Application and Database tiers each have 99% up-time

What is the expected up-time for an application that can service all requests from the Application tier? For an application that depends on the Database tier?

Two-tier application; Application and Database tiers each have 99% uptime

Requests which require only the Application tier will have 99% uptime

Requests which require both tiers will have 98% uptime (0.99 * 0.99)

Eliminating SPOFs

SPOF = Single Point Of Failure

  Whenever a single server can die and take down the application (or part of an application), that server is a SPOF

Eliminating SPOFs increases application availability

When a working system can take over for a failed system, that is called failover

A system that can fail over is not a SPOF

Eliminating SPOFs: Load Balancers

This working load-balancer is a SPOF


Bart Simpson: I didn't do it!

The second one cost a lot but it doesn't do anything

until the first one dies!


Local HA Load Balancers

  Typically a master/slave configuration

  Both Load Balancers receive all the traffic

  The Load Balancers communicate directly over a dedicated cable

  When the slave detects failure of the master, it assumes all responsibility for the current connections

  May even be able to fail over stateful connections, including HTTPS


Global Load Balancers

  Used to direct traffic to a particular data center

  Use an Authoritative Name Server, e.g. to resolve "www" to a particular data center

  For disaster recovery, the "www" resolves to the primary data center unless it is down, in which case it resolves to the backup

  For regional load-balancing, the "www" is resolved to the geographically closest data center

  Modern Global Load Balancers do both Global and Local balancing

Eliminating SPOFs: HA Databases

The second database doesn't do anything either

until the first one dies.


HA Databases do not normally require additional programming in the application tier

  Often implemented in the JDBC driver level or below

  Failover may cause current pending transactions to roll back, but with a real HA database, no previously committed transactions are lost

The most reliable HA Database config is master/slave

  The slave server is always ready for the master to die

  One-way replication may even work across datacenters

Eliminating SPOFs: The JEE Tier

Java application tiers can be stateless

  Stateless tiers (e.g. web servers) are HA using simple redundancy

  Only problem is that statelessness in one tier usually just passes the buck to the next tier, which is almost always more expensive

Java application tiers are almost always stateful

  Only two things can be lost: State and in-flight requests

  To achieve HA, the Java tier must either manage its state resiliently (e.g. in a clustered coherent cache) or back it up to a central store

  Idempotent actions can be replayed by the web tier when a server fails

Eliminating SPOFs: HA JEE

Normal flow through a multi-tier enterprise app


if a server dies, its current requests can be lost.

Web Tier to App Tier Interconnects

The previous pictures may look like a mess, but there's a good reason:

  Load balancers slow way down if the load balancing is "sticky"

  Best approach is for the load balancer to round-robin or randomize its load-balancing across all available web servers

  Web servers (e.g. Apache, IIS, JES) can handle lots of concurrent connections, serve static content, and route requests to app servers

  The web server plug-in for routing to the app server can do the sticky load balancing, guaranteeing that HTTP Sessions "stick"!

Eliminating SPOFs: Idempotency in JEE

Idempotency is potent!

  Under normal conditions, requests are processed exactly once.

  Idempotency allows the same request to be processed more than once, with the possibility that those requests were partially processed, and without any side-effects from being run more than once.

  Allows blind retries of any request without knowing for certain the outcome of previous attempts to process that request

  Requires great forethought: Every potentially state-mutating request must have a plan for how it can be run 1 time, 1.5 times, 2 times or 200 times without corrupting the application state


Normal request/response before a server dies


and with idempotency, requests can re-route!


Idempotency by predicate

  All actions must have one-way non-destructive state transitions

  Conceptually similar to optimistic concurrency with a database

  e.g. "Perform this account transfer of $100 from account 123 to account 456, but only if account 123 contains exactly $1000 and account 456 contains exactly $500."
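The account-transfer example can be sketched in Java as a conditional update. This is a minimal single-JVM illustration. The `PredicateTransfer` class, its method names and the in-memory map are assumptions made for the example, and a real system would enforce the predicate in the database or clustered store.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch only: an in-memory ledger guarded by a predicate (all names are hypothetical).
final class PredicateTransfer {
    private final Map<String, Long> balances = new HashMap<>();

    PredicateTransfer(Map<String, Long> initialBalances) {
        balances.putAll(initialBalances);
    }

    /**
     * Moves `amount` between accounts only if both balances still equal the
     * values the caller observed. Returns true if the transfer was applied.
     */
    synchronized boolean transferIf(String from, long expectedFrom,
                                    String to, long expectedTo, long amount) {
        Long fromBalance = balances.get(from);
        Long toBalance = balances.get(to);
        boolean predicateHolds = fromBalance != null && toBalance != null
                && fromBalance == expectedFrom && toBalance == expectedTo;
        if (!predicateHolds) {
            return false; // already applied, or the state changed in the meantime
        }
        balances.put(from, fromBalance - amount);
        balances.put(to, toBalance + amount);
        return true;
    }

    synchronized long balance(String account) {
        return balances.getOrDefault(account, 0L);
    }

    public static void main(String[] args) {
        PredicateTransfer bank = new PredicateTransfer(Map.of("123", 1000L, "456", 500L));
        System.out.println(bank.transferIf("123", 1000L, "456", 500L, 100L)); // first attempt
        System.out.println(bank.transferIf("123", 1000L, "456", 500L, 100L)); // blind retry
        System.out.println(bank.balance("123") + " / " + bank.balance("456"));
    }
}
```

A blind retry of the same request finds the predicate no longer true and changes nothing. The predicate alone does not tell the caller whether its own earlier attempt or some other request changed the balances.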


Idempotency by identity

  Easy pattern: Uniquely identify each possible action before it occurs

  It's like the command pattern, but every command instance has a UID

  e.g. Throughout the order verification and payment process, include a hidden field with a UID identifying the ecommerce order

  e.g. "Place this order for these goods, but only if order UID 1234567890 has not previously been submitted."

  Bonus: Allows the user to click submit twice without their credit card getting charged twice!
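A minimal Java sketch of the identity pattern follows. The `OrderProcessor` class and its in-memory map are assumptions for illustration. In a clustered deployment the set of submitted UIDs would need to live in a shared store, and the record-then-charge step would need to be made atomic or recoverable.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Sketch only: de-duplicating order submissions by UID (all names are hypothetical).
final class OrderProcessor {
    private final ConcurrentMap<String, String> submittedOrders = new ConcurrentHashMap<>();
    private final AtomicInteger charges = new AtomicInteger();

    /** Returns true if this call placed the order, false if the UID was seen before. */
    boolean submit(String orderUid, String orderDetails) {
        // putIfAbsent is atomic: only the first caller for a given UID gets null back.
        if (submittedOrders.putIfAbsent(orderUid, orderDetails) != null) {
            return false;
        }
        charges.incrementAndGet(); // stand-in for charging the credit card
        return true;
    }

    int chargeCount() {
        return charges.get();
    }

    public static void main(String[] args) {
        OrderProcessor processor = new OrderProcessor();
        System.out.println(processor.submit("1234567890", "2 books")); // first click
        System.out.println(processor.submit("1234567890", "2 books")); // double click or retry
        System.out.println("charges: " + processor.chargeCount());
    }
}
```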

Eliminating SPOFs: Miscellaneous

Almost everything supports redundancy:

  Power systems, server power supplies and air conditioning

  Networks: Firewalls, routers, switches and cabling

  CPUs, RAM, hard drives, NICs

  Data Centers (using global load balancers)

Goal is to understand and mitigate the potential for failures, based on business requirements

Risk is never entirely eliminated; always have a plan!

High Availability

Summary

  Use redundancy with failover to increase availability by eliminating Single Points Of Failure (SPOFs)

  Decouple tiers (e.g. JMS) and as much as possible make each tier self-sufficient (or at least fail gracefully)

Common Sense

  Not all applications need HA; it's expensive!

  There is still room for human error and unavoidable downtime (e.g. certain upgrades)

Terms: Scalability

Defined in terms of the impact on throughput as additional hardware resources are added

  Adding CPUs/RAM to a server: Scaling Up

  Adding servers to a cluster: Scaling Out

Terms: Scaling Factor, Linear Scalability

Scalability measured with Scaling Factor

  The ratio of new capacity to old capacity as resources are increased

  If doubling CPUs results in 1.9x throughput, then SF is 1.9, and the adjusted SF/CPU is 0.95

  The ideal SF/CPU is 1.0, a.k.a. linear scalability
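The doubling example can be checked with a few lines of Java. The class and method names below are hypothetical, and the throughput figures (10 and 19 requests per second) are sample values chosen to reproduce the 1.9x case.

```java
// Sketch: Scaling Factor (SF) and SF adjusted per unit of added resource.
final class ScalingFactor {

    /** Ratio of new capacity to old capacity. */
    static double scalingFactor(double oldThroughput, double newThroughput) {
        return newThroughput / oldThroughput;
    }

    /** SF divided by the resource growth ratio; 1.0 means linear scalability. */
    static double adjusted(double oldThroughput, double newThroughput,
                           double oldResources, double newResources) {
        return scalingFactor(oldThroughput, newThroughput) / (newResources / oldResources);
    }

    public static void main(String[] args) {
        // Assumed sample: 2 CPUs handle 10 req/s, 4 CPUs handle 19 req/s.
        System.out.printf("SF = %.2f%n", scalingFactor(10, 19));
        System.out.printf("SF/CPU = %.3f%n", adjusted(10, 19, 2, 4));
    }
}
```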

Terms: Super-Linear Scalability

It is possible to exceed an adjusted SF (SF/CPU) of 1.0

Examples

  With two disks, reduced head contention can increase the throughput of sequential I/O by even 100x. Similar effects can occur with CPU caches and context switches.

  Large cluster-aggregated data caches can offer super-linear scale by significantly increasing the hit rate, reducing the average data access cost

Can be explained as a super-linear slowdown as resources are reduced (i.e. the converse)

Example: Super-Linear Scalability

Assumptions

  Tiered data sources (local cache, cluster cache, db)

  Cache Limits: Local=500MB, Cluster=1GB per member

  Latencies: Local=0ms, cluster=1ms, db=10ms

  Has an active dataset of 4GB with random access

Results: 10x Performance with 4x Hardware

  1 Server: .125 local, .125 clustered, .750 db = ~8.5ms

  2 Server: .125 local, .375 clustered, .500 db = ~6ms

  3 Server: .125 local, .625 clustered, .250 db = ~3.5ms

  4 Server: .125 local, .875 clustered, 0.0 db = 0.875ms
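The hit fractions and average latencies above can be reproduced approximately with the Java sketch below. Two modeling choices are assumptions of this sketch, picked because they match the listed fractions and rounded totals: the local cache holds a subset of what the cluster cache holds, and a database read pays the 1 ms cluster-cache miss on top of the 10 ms database latency. The class and constant names are hypothetical.

```java
// Sketch: average data access latency for the tiered-cache example (assumed model).
final class TieredCacheLatency {
    static final double LOCAL_GB = 0.5;
    static final double CLUSTER_GB_PER_MEMBER = 1.0;
    static final double DATASET_GB = 4.0;
    static final double LOCAL_MS = 0.0;
    static final double CLUSTER_MS = 1.0;
    static final double DB_MS = 10.0;

    static double averageLatencyMs(int servers) {
        double localHit = Math.min(1.0, LOCAL_GB / DATASET_GB);
        double clusterCoverage = Math.min(1.0, servers * CLUSTER_GB_PER_MEMBER / DATASET_GB);
        // Assumption: locally cached data is also in the cluster cache.
        double clusterHit = Math.max(0.0, clusterCoverage - localHit);
        double dbHit = 1.0 - localHit - clusterHit;
        // Assumption: a database read first misses the cluster cache.
        return localHit * LOCAL_MS + clusterHit * CLUSTER_MS + dbHit * (CLUSTER_MS + DB_MS);
    }

    public static void main(String[] args) {
        for (int n = 1; n <= 4; n++) {
            System.out.printf("%d server(s): %.3f ms%n", n, averageLatencyMs(n));
        }
    }
}
```

Under these assumptions the one-server and four-server averages differ by a little under 10x, in line with the "10x Performance with 4x Hardware" headline.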

Terms: Performance

Defined as how fast operations complete

Typically measured as time ("wall clock") elapsed between request and response

Elapsed time also known as latency

Web apps often measured on the server side as time to last byte (TTLB)

Terms: Scalability vs. Performance

Sending data across the country

Great scalability, poor performance

  Truck-loads of disks

Great Performance, poor scalability

  56Kbps dial-up modem

Scalable Performance

  A bundle of fiber optic cables

Terms: Scalability vs. Performance

Users are affected by poor performance

Poor performance is usually a result of poor scalability

Operating costs and capacity limitations are caused by poor scalability

Designing for scalability often has a negative impact on single-user performance

Scalable Performance


Defining Scalable Performance

  Scalable Performance refers to overall response times for an application that are within defined tolerances for normal use, remain within those tolerances up to the expected peak user load, and for which a clear understanding exists as to the resources that would be required to support additional load without exceeding those tolerances

Defining Scalable Performance

  Scalable Performance is NOT focused on making an application faster; rather, it is focused on ensuring that the application performance does not degrade beyond defined boundaries as the application gains additional users, how resources must grow to ensure that, and how one can be certain that additional resources will solve the problem

Scalable Performance - Quiz

Which application exhibits better scalable performance:

  a) Performs total of 10 requests per second on one 2-CPU server and 19 requests per second on two 2-CPU servers

  b) Performs total of 15 requests per second on one 2-CPU




Answer: (a) exhibits better scalable performance

The second app (b) did exhibit better performance

Adding a second server showed that the first app (a) scaled at 90%, while the second app (b) scaled at only 67%

Not enough data points to know for sure, though; there could be a natural limit caused by an external resource!

Performance vs. Scalable Performance

Performance

  Measures latency with a single-threaded, single-client test

Scalability

  Measures the raw delta in maximum sustainable request throughput (e.g. successful requests per second) under heavy concurrent load, given a delta in hardware resources

Scalable Performance

  Measures scalability subject to performance requirements

Performance vs. Scalable Performance

Performance

  Predicts end-user experience with a single-user load level

Scalability

  Predicts the amount of hardware required to achieve a certain level of throughput

Scalable Performance

  Predicts the amount of hardware required to achieve a certain level of throughput with performance SLAs

Impact of Scalability on Performance

Designing for scalability can negatively impact single-user performance

Building in the ability to scale out has overhead

But single-user performance doesn't often matter!

Once the maximum sustainable request rate is exceeded, performance will degrade

End user apps will degrade in a linear fashion as the request queue backs up

Automated applications will degrade exponentially

Designing Non-Scalable Applications

Evil Rule 1: Create SPOBs

  A Single Point Of Bottleneck (SPOB) is any server, service, etc. that all (or many) requests have to go through, and that has any load-associated latency

  The simplest way to create a SPOB is to read data from a shared database as part of request processing

  Web services, mainframes, enterprise applications such as PeopleSoft and SAP, singleton distributed services, etc. all provide great opportunities for introducing SPOBs


Evil Rule 2: Introduce concurrency control bottlenecks

  Default to pessimistic concurrency, and hold locks on the database whenever possible

  Default to serializable transactions

  In a multi-threaded application, make sure to synchronize on some shared object before making a database call or a web service invocation

  Use synchronous logging


Evil Rule 3: Build in extra tiers and remote invocations whenever possible

  Never miss an opportunity to split something out as a remote web service

  Make sure that the JSPs and Servlets make remote calls to the EJB tier

  Treat remote objects as if they were local

  Never do in a single SQL statement what you could spread across a whole bunch of individual statements


Evil Rule 4: Push more work onto the expensive parts of the infrastructure

  Make sure that the database is a SPOB, since it has a typical SF between 0.70 and 0.90 and an exponentially increasing cost factor for CPU scaling

  Mainframes are big and fast, so don't worry about calling the same service more than once with the same request parameters

Amdahl's Law

The increase in performance of a parallel algorithm is limited by the number of operations which must be performed sequentially (in serial)

The equivalent concept, in large-scale systems servicing concurrent requests: Scalable performance is limited by the number of operations which must utilize any shared resource that does not exhibit linear scalability:

Databases, Messaging Systems

Replicated data structures (Full Replication or Hub & Spoke)

Shared services, Web Services, mainframe services
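In its usual form, Amdahl's Law gives the speedup on n workers as 1 / (s + (1 - s) / n), where s is the fraction of the work that must run serially. The Java sketch below evaluates that formula. The class name is hypothetical, and treating the time spent on a shared, non-scaling resource as the "serial fraction" is this sketch's reading of the analogy above.

```java
// Sketch: Amdahl's Law speedup for a given serial fraction and worker count.
final class Amdahl {

    /** Speedup = 1 / (serial + (1 - serial) / workers). */
    static double speedup(double serialFraction, int workers) {
        if (serialFraction < 0.0 || serialFraction > 1.0 || workers < 1) {
            throw new IllegalArgumentException("need 0 <= serialFraction <= 1 and workers >= 1");
        }
        return 1.0 / (serialFraction + (1.0 - serialFraction) / workers);
    }

    /** Upper bound on speedup as workers grow without limit (infinite when nothing is serial). */
    static double limit(double serialFraction) {
        return 1.0 / serialFraction;
    }

    public static void main(String[] args) {
        // Assumed sample: 10% of each request is spent on a shared, non-scaling resource.
        for (int n : new int[] {1, 2, 4, 8, 16, 64}) {
            System.out.printf("%2d servers: %.2fx%n", n, speedup(0.10, n));
        }
        System.out.printf("limit: %.1fx%n", limit(0.10));
    }
}
```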

Obstacles to Scalable Performance

Database Tier

  Applications that go to the database for each request likely will have scalability problems

  The Database tier is difficult and expensive to scale; it is difficult to scale a database server to more than a single host, and it becomes exponentially more expensive to add CPUs

  Database servers scale sub-linearly at best with additional CPUs, and there is a CPU limit

Golden Rules of Scalable Performance

Answer every request in the current tier whenever possible (caching, collocation of tiers)

Always assume that invocations of later tiers are expensive operations (serialization, network, etc.)

Treat every single use of a potential SPOB as if you were paying for it with your own money, and look for ways to unload work from the SPOBs (caching, db replication, federation, partitioning)

Eliminate concurrency bottlenecks by design, by using thread-locality and by queuing sequential operations to be performed asynchronously if possible

Remember that each tier is more expensive to scale than the one in front of it, and the cost of vertical scale increases in an exponential fashion and quickly becomes prohibitively expensive ($10MM DBs)

Predicting Cost of Scalable Performance

Predicting costs is a major goal of Scalable Performance

Linear scalability implies a constant cost per additional user

The lower the application SF, the faster the cost per additional user goes up

The goal of measuring scalable performance is to build a cost model that quantifies the resource usage of a user, and the SF for each resource as more users are added

Horizontal Scale-Out to Reduce Cost

Use commodity hardware at commodity prices

Q4/05: 2-core/4GB=$3k, 4-core/16GB=$8k, 8-core/32GB=$22k

Cost per GHz and cost per GB are constant

As a bonus, the commodity CPUs are also the fastest

The bigger the server, the slower and more expensive the CPUs

Impact of server failure is minimized

Each server handles a smaller overall portion of the load

Supports a cost-effective n+1 architecture

Horizontal scale-out requires the elimination of SPOBs


Summary

  Architect so that the application is CPU- or memory-bound, and that the bottleneck is in the application tier at the latest

  For high-scale applications, make sure that the bottleneck will never be the database

  Benefit: You can use server farms and server clusters to scale an application almost linearly and with a predictable cost per user

Clustering

Terms: Clustering

Clustering enables multiple servers or server processes to work together

Clustering can be used to horizontally scale a tier, i.e. scale by adding servers

Clustering usually costs much less than buying a bigger server (vertical scaling)

Clustering also typically provides failover and other reliability benefits

Terms: Clustering

Primary benefits of Clustering:

If the application has been built correctly, it supports a predictable scaling model

Clustering allows relatively inexpensive CPU and memory resources to be added to a production application in order to handle more concurrent users and/or more data

Provides redundancy

Simple (n+1) model

Scalability of Clustering

The Potential for Negative Scale

Single server model allows unrestricted caching

Clustering may require the disabling of caching

Two servers often slower than one!

Data Integrity Challenges in a Cluster

How to maintain the data in sync among servers

How to keep in sync with the data tier

How to failover and failback servers without impact

Clustering Concepts

The less communication required, the better

Always better to be stateless in a tier if it does not cause a bottleneck in the next tier

Server farms: a stateless clustering model

The less coordination required, the better

Independence: Don't go to the committee

Concurrency Control: Reduces scalability, so use only as necessary


Concurrency Control Optimizations

Optimistic concurrency means that all nodes can work independently, and only need to use concurrency control when they try to commit their work

By deferring concurrency controls (locks, transactional consistency checks) servers can process blindly ("empowered workers")

Application must be designed to handle the inevitable rejection of the work
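The optimistic pattern can be sketched in plain Java with a versioned value and a compare-and-set at commit time. This is a single-JVM illustration with hypothetical names (`OptimisticCell`, `Versioned`), not the clustered mechanism itself. The retry loop in `update` is one way an application might handle a rejected commit.

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Sketch only: optimistic concurrency on one value (all names are hypothetical).
final class OptimisticCell<T> {

    /** Immutable snapshot: a value plus the version at which it was committed. */
    static final class Versioned<T> {
        final T value;
        final long version;

        Versioned(T value, long version) {
            this.value = value;
            this.version = version;
        }
    }

    private final AtomicReference<Versioned<T>> current;

    OptimisticCell(T initial) {
        current = new AtomicReference<>(new Versioned<>(initial, 0L));
    }

    Versioned<T> read() {
        return current.get();
    }

    /** The commit succeeds only if nothing was committed since `seen` was read. */
    boolean tryCommit(Versioned<T> seen, T newValue) {
        return current.compareAndSet(seen, new Versioned<>(newValue, seen.version + 1));
    }

    /** Works without holding a lock, and redoes the work if the commit is rejected. */
    T update(UnaryOperator<T> work) {
        while (true) {
            Versioned<T> seen = read();
            T result = work.apply(seen.value);
            if (tryCommit(seen, result)) {
                return result;
            }
        }
    }

    public static void main(String[] args) {
        OptimisticCell<Integer> stock = new OptimisticCell<Integer>(10);
        Versioned<Integer> seen = stock.read();
        System.out.println(stock.tryCommit(seen, 9)); // first committer wins
        System.out.println(stock.tryCommit(seen, 8)); // stale snapshot is rejected
        System.out.println(stock.update(v -> v - 1)); // re-read, redo, commit
    }
}
```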


Cache Coherency

Caches of read-only data are automatically coherent!

The choice for clustered caching of read/write data:

Accept a certain amount of data staleness

Maintain cache data coherency across the cluster

Clustered data coherency implies a means to "synchronize":

Clustered concurrency control (like Java "synchronized")

Partitioned Transactional Caching

Interposing the data caches between the application logic and the data source prevents loss of consistency


Clustering Categories

Stateless: For scalability, e.g. web server farms

Master/Slave: For availability

Represents a SPOB

Centralized: Single server for coordination

Represents both a SPOF and SPOB

Hierarchical: Multi-tiered centralized model

Peer-to-Peer: Servers work independently, but have knowledge of and direct access to the entire cluster (cooperative worker model)


Protocols

TCP/IP provides reliable point-to-point

Datagram protocols (UDP/IP) include point-to-point (unicast) and group (multicast); not reliable without a higher level protocol

J2EE app servers often use multicast

Tangosol Coherence uses TCMP (dynamically switched uni- & multi-cast UDP, TCP/IP, IB)

JMS is an API, not a protocol; different implementations use different protocols


Tangosol Coherence

Transactional coherent caching with concurrency control, read-through, write-through, read-ahead and write-behind caches

TCMP clustering:

Completely peer-to-peer: no SPOFs, no SPOBs

Uses switched uni/multicast datagrams with reliable in-order delivery

Supports WAN failover

TCMP v2 uses TCP/IP for death detection

TCMP v3 adds TCP/IP and IB for data

Clustering for Scale

Common uses for Clustered Caching

HTTP Session Caching

Web Page and Segment Caching

Application Data Caching

Load Balancing of Data Operations

Web Services result caching

Dramatically reduce database load by using read-through/write-behind caching

There is no better way to increase scalability than to use caching to unload later tiers!

Case Studies

Case Study: Travel Sites, Web Services

www.random-travel-site.com

  If a customer searches on a flight arriving in a city on Wednesday, automatically check hotel availability for Tuesday, Wednesday and Thursday to display with the flight search results

www.random-hospitality-chain.com

  Exposes availability as a web service, with date and location as parameters

  Load was several times (e.g. 3x) higher than expected!

Case Study: Travel Sites, Web Services

Problem: The hospitality web service doesn't support multiple dates as input, which results in more load than predicted due to some random travel site

Solution: Caching web service results dramatically reduces load on back end services that provide availability data

Implementation: Store each web service result in a cache, identified by the request and its parameters. Use an auto-expiry cache and provide the ability for the back-end system to evict data from the cache, ensuring freshness of data.

Result: Significantly faster web services with significantly less load on expensive back-end systems.
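The implementation described above can be sketched as a small in-memory cache keyed by the request parameters. The Java below is an illustration under stated assumptions rather than the product's API. `ExpiringResultCache`, the injected clock and the key format are all hypothetical, and concurrent misses for the same key may each call the service.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;
import java.util.function.LongSupplier;

// Sketch only: auto-expiring cache of web service results (all names are hypothetical).
final class ExpiringResultCache<K, V> {

    private static final class Entry<V> {
        final V value;
        final long expiresAtMillis;

        Entry(V value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<K, Entry<V>> entries = new ConcurrentHashMap<>();
    private final long ttlMillis;
    private final LongSupplier clock;

    ExpiringResultCache(long ttlMillis, LongSupplier clock) {
        this.ttlMillis = ttlMillis;
        this.clock = clock;
    }

    /** Returns the cached result, calling the service only on a miss or after expiry. */
    V get(K requestKey, Function<K, V> service) {
        long now = clock.getAsLong();
        Entry<V> entry = entries.get(requestKey);
        if (entry == null || entry.expiresAtMillis <= now) {
            entry = new Entry<>(service.apply(requestKey), now + ttlMillis);
            entries.put(requestKey, entry);
        }
        return entry.value;
    }

    /** Lets the back-end system push out stale data before it expires. */
    void evict(K requestKey) {
        entries.remove(requestKey);
    }

    public static void main(String[] args) {
        ExpiringResultCache<String, String> cache =
                new ExpiringResultCache<String, String>(60_000L, System::currentTimeMillis);
        Function<String, String> availabilityService = key -> "rooms for " + key; // stand-in
        System.out.println(cache.get("Wednesday|Boston", availabilityService));
        System.out.println(cache.get("Wednesday|Boston", availabilityService)); // served from cache
    }
}
```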

Case Study: MPRPG (Online Gaming)

www.brand-name-company.com

  Uses massively parallel role playing games to build brand with young audiences

  Games are tied tightly to other parts of the business, including various points and reward programs

  Security is very tight, and all integration with other business units is through services

  Impressive amounts of concurrent load: Number of players online, amount of player activity

Case Study: MPRPG (Online Gaming)

Problem: Vertical scale impossible due to socket-based architecture and HA requirements. Online gamers can interact, and the interactions are expected to be occurring in real time.

Solution: Game state and player state are managed in-memory across the cluster, even though the system-of-record is managed via a service elsewhere within the company.

Implementation: Use partitioning of areas within the game, and partitioning of data and responsibilities within the cluster. Data services can be invoked and their results can be shared across all servers.

Result: A successful real-time game engine that scales out on commodity hardware and provides HA.

Case Study: Financial Analysis

http://some-intranet-site/analysis

  A large financial services firm uses an internal analysis application to display equities positions, price changes, trends, and a large amount of other information.

  The information is available from a shared database, but cannot fit into memory.

  Huge amounts of data from many database tables are required to assemble even a single web page.

Case Study: Financial Analysis

Problem: Even with static data cached, page times are over 15 seconds! The data set is so large that it cannot fit into memory, and other work occurring against the database renders the entire application unusable at certain times throughout the day.

Solution: Pre-load the entire data set, partitioning it across multiple servers, thus entirely removing the database from its involvement in page generation.

Implementation: Create a data model that closely reflects the needs of the web application. Create a bulk loading process that loads all data into that model, storing the resulting objects in a large-scale partitioned cache that is load-balanced across all servers.

Result: Page times dropped to sub-second, showing up to 1000x improvement in TTLB. Additionally, all SPOFs are eliminated.

Case Study: Online Book-Making

www.random-gambling.com

  Customers view the status of various bets and can place their own bets, but when things get busy, half of the load on the entire site is related to a single item to bet on.

  Various guarantees are provided related to the freshness of data and the ordering of bet matching.

Case Study: Online Book-Making

Problem: The site's popularity can result in tens of thousands of threads looking for the same data at the same time. This effect is caused by in-game betting, which causes half the load on the entire site to be directed against a single bet.

Solution: Delegate database accesses to occur only once within the entire application cluster.

Implementation: When a server requires a piece of data that is missing from its near cache, it directs that request to the server that owns that particular piece of data. If that server does not contain that data in its L2 cache, it queries the database.

Result: With hundreds of threads on each server asking for the same data, only one request goes to the data owner, and if several hundred servers send a request to that server, only one request is made to the database. Only one database query is made, yet tens of thousands of concurrent requests can be satisfied from that one query.
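Within a single JVM, the "only one query for many concurrent requests" effect can be illustrated with `ConcurrentHashMap.computeIfAbsent`, which applies the loading function at most once per key while other callers for that key wait. The sketch below is an assumption-laden stand-in for the data owner's L2 cache. The class name is hypothetical, there is no expiry, and the cross-server routing described above is not shown.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;
import java.util.stream.IntStream;

// Sketch only: many concurrent readers, one database query per key (names are hypothetical).
final class SingleQueryCache<K, V> {
    private final ConcurrentMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> database;

    SingleQueryCache(Function<K, V> database) {
        this.database = database;
    }

    /** The first caller for a key runs the query; concurrent callers wait and share the result. */
    V get(K key) {
        return cache.computeIfAbsent(key, database);
    }

    public static void main(String[] args) {
        AtomicInteger queries = new AtomicInteger();
        SingleQueryCache<String, String> owner = new SingleQueryCache<String, String>(key -> {
            queries.incrementAndGet(); // stand-in for the real database query
            return "odds for " + key;
        });
        // Simulate a burst of concurrent requests for the same bet.
        IntStream.range(0, 10_000).parallel().forEach(i -> owner.get("bet-42"));
        System.out.println("database queries: " + queries.get());
    }
}
```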

Case Study: Online Broker

www.random-brokerage.com

  Application built on a relatively scalable architecture

  Customer growth has pushed the application past the limits of what can be accomplished with vertical scale

  User transaction latencies are increasing as a result (requests are backing up, or transactions are queuing)

Case Study: Online Broker

Problem: Trade volume is high enough that the database is saturated by too many individual transactions attempting to concurrently write data; this causes SLAs to be broken, which results in penalties and lost revenue

Solution: Batch writes together by using write-behind caching

Implementation: Data is written to a cluster-durable write-behind cache, which then asynchronously writes batches of data to the underlying database.

Result: The latency of the cache write is in the low milliseconds, allowing the application to achieve the required SLAs. As a bonus, database load is significantly reduced.
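The write-behind idea can be sketched in a few lines of Java. This single-JVM version is only an illustration. It is not cluster-durable, the class and method names are hypothetical, and a background scheduler is assumed to call `flush()` periodically.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Consumer;

// Sketch only: in-memory write-behind cache that batches database writes (names are hypothetical).
final class WriteBehindCache<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final Map<K, V> dirty = new LinkedHashMap<>(); // pending writes, coalesced per key
    private final Consumer<Map<K, V>> batchWriter;

    WriteBehindCache(Consumer<Map<K, V>> batchWriter) {
        this.batchWriter = batchWriter;
    }

    /** Fast path: update memory and remember that the key needs writing. */
    synchronized void put(K key, V value) {
        cache.put(key, value);
        dirty.put(key, value);
    }

    synchronized V get(K key) {
        return cache.get(key);
    }

    /** Writes all pending changes to the database as one batch. */
    void flush() {
        Map<K, V> batch;
        synchronized (this) {
            if (dirty.isEmpty()) {
                return;
            }
            batch = new LinkedHashMap<>(dirty);
            dirty.clear();
        }
        batchWriter.accept(batch); // the slow write happens outside the lock
    }

    public static void main(String[] args) {
        List<Map<String, Integer>> databaseBatches = new ArrayList<>();
        WriteBehindCache<String, Integer> positions =
                new WriteBehindCache<String, Integer>(databaseBatches::add);
        positions.put("ACME", 100);
        positions.put("ACME", 105); // overwrites the pending write for the same key
        positions.put("INIT", 7);
        positions.flush();
        System.out.println("batches written: " + databaseBatches.size());
        System.out.println("rows in first batch: " + databaseBatches.get(0).size());
    }
}
```

Because repeated writes to the same key are coalesced before the flush, the database sees fewer and larger writes than the application issues.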

Audience Response

Question?
