
Advanced Distributed Software Architectures and Technology group (ADSaT)

Scalability & Availability

Paul Greenfield, CSIRO

Building Real Systems

• Scalable
  – Fast enough to handle the expected load
  – Grows easily when the load grows

• Available
  – Available enough of the time

• Performance and availability both have a cost
  – Aim for ‘enough’ of each, but not more

Scalable

• Scale-up
  – Bigger and faster systems

• Scale-out
  – Multiple systems working together to handle the load
  – Server farms
  – Clusters

• Implications for application design

Available

• Goal is 100% availability
  – 24x7 operations

• Redundancy is the key
  – No single points of failure
  – Spare everything
    • Disks, disk channels, processors, power supplies, fans, memory, …

• Automated fail-over and recovery
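Why redundancy is the key can be made concrete with the standard availability formulas. A minimal sketch, assuming component failures are independent (shared power, cooling or a common software bug can break that assumption); the function names and the 99% figure are hypothetical, not from the slides.

```python
from math import prod

def series_availability(parts):
    """Every component must be up: A = A1 * A2 * ... * An."""
    return prod(parts)

def redundant_availability(a, copies):
    """At least one of `copies` independent, identical spares must be up:
    A = 1 - (1 - a) ** copies."""
    return 1 - (1 - a) ** copies

# Hypothetical component availability, for illustration only
disk = 0.99
single = redundant_availability(disk, 1)    # no spare
mirrored = redundant_availability(disk, 2)  # one spare
# A chain of unspared components is weaker than any one of them
chain = series_availability([0.99, 0.99, 0.99])
print(f"single {single:.4f}, mirrored {mirrored:.4f}, chain of three {chain:.4f}")
```

Adding one spare takes a 99% component to about 99.99%, while chaining three unspared 99% components drops the whole path to about 97%.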

Performance

• How fast is this system?
  – Not the same as scalability, but related
    • Scalability is concerned with the limits to possible performance
  – Measured by response time and throughput
  – Aim for enough performance
    • Have a performance target
    • Tune and add hardware until the target is hit
    • Then worry about tomorrow…

Performance Measures

• Response time
  – What delay does the user see?
  – Instantaneous is good, but 95% under 2 seconds is acceptable
  – Response time varies with the ‘heaviness’ of transactions
    • Fast read-only transactions
    • Slower update transactions
    • Effects of database contention
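Checking a "95% under 2 seconds" criterion means looking at a percentile of the measured response times rather than their mean. A sketch using the nearest-rank percentile definition (one of several common definitions; the slides do not say which is intended). The function names and sample timings are hypothetical.

```python
import math

def percentile(samples_ms, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p% of all samples are less than or equal to it."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(p * len(ordered) / 100))  # 1-based rank
    return ordered[rank - 1]

def meets_target(samples_ms, limit_ms=2000, p=95):
    """True if p% of responses were at or under limit_ms."""
    return percentile(samples_ms, p) <= limit_ms

# Hypothetical timings: 96 fast responses and 4 slow update transactions
timings = [400] * 96 + [3500] * 4
print(percentile(timings, 95), meets_target(timings))  # 400 True
```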

Response Times

[Chart: Keytable performance. Response time (ms) against number of clients (1 to 1000) for the Buy, Create, Get HS, Query C, Query ID, Sell and Update transactions.]

Response Times

[Chart: Identity performance. Response time (ms) against number of clients (1 to 1000) for the Buy, Create, Get HS, Query C, Query ID, Sell and Update transactions.]

Response Times

[Chart: C++ response times, remote db, identity & keytable. Response time (ms) against number of clients (0 to 1200); series: Read ident, Update ident, Average ident, Read key, Update key, Average key.]

Throughput

• How many transactions can be handled in some period of time
  – Transactions per second (tps), or per minute, hour or day (tpm, tph, tpd)
  – A measure of overall capacity

• Transaction Processing Performance Council (TPC)
  – Standard benchmarks for TP systems
  – TPC-C for a typical transaction system
  – www.tpc.org
  – Current record (at the time of writing) is 227,000 tpmC
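Throughput figures quoted in different units are easier to compare once normalised to a common time base. A small conversion sketch; the helper name is hypothetical.

```python
SECONDS_PER_UNIT = {"tps": 1, "tpm": 60, "tph": 3_600, "tpd": 86_400}

def convert_rate(value, from_unit, to_unit):
    """Convert a transaction rate, e.g. from per-minute to per-second."""
    per_second = value / SECONDS_PER_UNIT[from_unit]
    return per_second * SECONDS_PER_UNIT[to_unit]

print(round(convert_rate(227_000, "tpm", "tps")))  # 3783
```

The 227,000 tpmC record corresponds to roughly 3,783 transactions per second. Strictly, tpmC counts only TPC-C New-Order transactions, with the benchmark's other transaction types running alongside, so the total work done per second is higher than this figure.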

Throughput

• Throughput increases until some resource limit is hit
  – Adding more clients then just increases the response time
  – Run out of processor, disk bandwidth or network bandwidth
  – Some resources overload badly
    • Ethernet network performance degrades sharply under heavy load
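This behaviour follows from the interactive response time law of operational analysis, R = N / X - Z, where N is the number of clients, X the throughput and Z the client think time. Once X is pinned at the bottleneck's capacity, every extra client adds to R. A minimal sketch of the resulting optimistic bounds, assuming a single bottleneck with capacity x_max and a no-contention response time r0; all parameter values are hypothetical, and real measured curves bend more gradually than these bounds.

```python
def asymptotic_bounds(n_clients, x_max, r0, think_time):
    """Optimistic throughput (tps) and response time (s) for a closed
    system of n_clients, using R = N / X - Z."""
    # Below saturation each client cycles once every r0 + think_time seconds;
    # throughput can never exceed the bottleneck's capacity x_max.
    x = min(n_clients / (r0 + think_time), x_max)
    r = n_clients / x - think_time
    return x, r

for n in (10, 500, 1000, 2000):
    x, r = asymptotic_bounds(n, x_max=100, r0=0.5, think_time=4.5)
    print(f"{n:5d} clients: {x:6.1f} tps, {r:5.1f} s")
```

With these hypothetical numbers throughput stops growing at 500 clients; doubling to 1000 clients leaves throughput at 100 tps but raises the response time from 0.5 s to 5.5 s.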

Throughput

[Chart: C++ transaction rates. Transactions per second (TPS) against client threads (0 to 1200); series: Local keytable, Local Identity, Remote identity 10M, Remote identity 100M, Remote keytable 100M.]

System Capacity

• How many clients can you support?
  – Name an acceptable response time
  – A common target: 95% of responses under 2 secs, on average

• And what is ‘average’?
  – Plot response time vs # of clients
    • Great if you can run benchmarks
  – Reason for prototyping and proving proposed architectures before leaping into full-scale implementation
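Reading capacity off such a plot amounts to finding the highest tested load that still meets the target. A sketch, assuming benchmark output has already been reduced to one 95th-percentile response time per client count; the function and the sample numbers are hypothetical, not taken from the charts.

```python
def max_supported_clients(results, limit_ms=2000):
    """results: (clients, p95_response_ms) pairs from benchmark runs.
    Returns the largest tested client count whose 95th-percentile
    response time is within limit_ms, or None if no run qualifies.
    Assumes response time only rises with load, so it stops at the
    first run that misses the target."""
    best = None
    for clients, p95_ms in sorted(results):
        if p95_ms > limit_ms:
            break
        best = clients
    return best

# Hypothetical benchmark results
runs = [(100, 300), (200, 650), (400, 1800), (600, 3400)]
print(max_supported_clients(runs))  # 400
```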

System Capacity

[Chart: C++ average response times. Response time (ms) against client threads (0 to 1200); series: Local keytable, Remote keytable, Local identity, Remote identity.]

Load Balancing I

• A few different but related meanings
• 1. Balancing across server processes
  – CORBA-style, where clients use objects that live inside server processes
  – Want all server processes to be busy
  – Client calls have to go to the process containing their object, even if this process is busy and others are idle

Load Balancing I

[Chart: Simple load balancing. % load against servers (0 to 30); series: No Load Balancing, Load Balanced.]

Load Balancing I

• Client calls the name server to find the location of a suitable server

• Name server can spread client objects across multiple servers
  – Often ‘round robin’

• Client is bound to a server and stays bound forever
  – Can lead to performance problems

Load Balancing I 

Initial allocation

Server object   Client numbers                     Clients per server object
1               1-100                              100
2               101-200                            100
3               201-300                            100
4               301-400                            100
5               401-500                            100

Later allocation

Server object   Client numbers                     Clients per server object
1               1-100, 201, 206, 211, …, 496       160
2               101-200, 202, 207, 212, …, 497     160
3               203, 208, 213, …, 498              60
4               204, 209, 214, …, 499              60
5               205, 210, 215, …, 500              60
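The "Later" allocation can be reproduced with a few lines of simulation. This is a reading of the table rather than something the slide states: it assumes clients 1-200 stay bound to servers 1 and 2, while clients 201-500 are handed out round robin starting at server 1.

```python
from collections import Counter
from itertools import cycle

servers = [1, 2, 3, 4, 5]

# Clients 1-200 are long-lived and keep their original bindings
binding = {c: 1 if c <= 100 else 2 for c in range(1, 201)}

# Clients 201-500 connect (or reconnect) later; the name server hands
# out references round robin, starting again at server 1
next_server = cycle(servers)
for client in range(201, 501):
    binding[client] = next(next_server)

load = Counter(binding.values())
print([load[s] for s in servers])  # [160, 160, 60, 60, 60]
```

Servers 1 and 2 end up with more than twice the load of the others, even though every individual allocation was round robin.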

Load Balancing I

• The solution to the static allocation problem is for clients to throw away their server objects and get new ones every now and again

• Application coding problem
  – And can the objects be discarded?
  – What kind of ‘objects’ are they if they can be discarded?
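Periodic rebinding can be hidden inside a small client-side wrapper. A hypothetical sketch: `lookup` stands for whatever call returns a fresh server reference, and the rebinding interval is an arbitrary parameter. It only works if the server object holds no state the client still needs, which is exactly the coding problem raised above.

```python
from itertools import cycle

class RebindingClient:
    """Discards its server reference every `calls_per_binding` calls and
    asks for a new one. `lookup` is any zero-argument callable that
    returns a server reference (for example, a name server query)."""

    def __init__(self, lookup, calls_per_binding):
        self._lookup = lookup
        self._limit = calls_per_binding
        self._calls = 0
        self.server = lookup()

    def call(self):
        if self._calls == self._limit:
            self.server = self._lookup()  # throw away the old reference
            self._calls = 0
        self._calls += 1
        return self.server  # stands in for invoking a method on the server object

# Hypothetical round-robin lookup over two servers
servers = cycle(["A", "B"])
client = RebindingClient(lambda: next(servers), calls_per_binding=2)
print([client.call() for _ in range(5)])  # ['A', 'A', 'B', 'B', 'A']
```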

Name Servers

• Server processes call the name server when they come up
  – Advertising their services

• Clients call the name server to find the location of a server process
  – Up to the name server to match clients to servers
    • Client then calls the server process to create objects
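The advertise-and-lookup protocol can be sketched as a small in-memory registry. This is a hypothetical illustration, not the CORBA Naming Service API: server processes register a location when they start, and each client lookup is answered round robin across the processes registered for that service.

```python
from collections import defaultdict

class NameServer:
    """Hypothetical name server: servers advertise, clients look up."""

    def __init__(self):
        self._servers = defaultdict(list)  # service name -> server locations
        self._next = defaultdict(int)      # service name -> round-robin position

    def advertise(self, service, location):
        """Called by a server process when it comes up."""
        self._servers[service].append(location)

    def lookup(self, service):
        """Called by a client; returns the location of a server process."""
        locations = self._servers.get(service, [])
        if not locations:
            raise LookupError(f"no server advertises {service!r}")
        i = self._next[service] % len(locations)
        self._next[service] = i + 1
        return locations[i]

ns = NameServer()
ns.advertise("orders", "hostA:9000")
ns.advertise("orders", "hostB:9000")
print([ns.lookup("orders") for _ in range(3)])
```

The client would then call the returned server process directly to create its objects; the name server plays no part in those calls.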

Load Balancing I

[Diagram: Load balancing across processes within a server. Clients, a name server and two server processes, with arrows labelled Advertise service, Request server reference, Return server reference, Get server object reference and Call server object’s methods.]

Load Balancing II

• What happens when our single system is full?
  – Use faster systems
    • Scale-up
  – Use additional systems
    • Scale-out
    • Now load balancing is used to spread load across systems

Load Balancing II

• CORBA world…
  – Name server can distribute across server processes running on different systems
  – Scales well…
    • Name server is only involved when handing out a reference to a server, not on every method call

Load Balancing II

[Diagram: Load balancing across multiple systems. Clients, a name server and server processes on different systems, with arrows labelled Advertise service, Request server reference, Return server reference, Get server object reference and Call server object’s methods.]

Load Balancing II

• COM+ world…
  – No need for load balancing within a system
    • Multithreaded server process
    • All objects live in a single process space
  – Component load balancing across systems
    • Client calls router when creating an object
    • Router returns a reference to an object in a COM+ server process
    • Load balanced at the time of object creation

Load Balancing II

[Diagram: COM+/MTS using thread pools rather than load balancing within a single system. Clients; DCOM/MTS; an MTS process containing a thread pool, a shared object space and the application code (App DLL).]

COM+ Component Load Balancing

[Diagram: COM+ CLB balancing load across multiple systems. Clients, a router with a response time tracker, and server systems, with arrows labelled Create object, Pass request to server, Create object and pass back reference, and Call object’s methods.]

Load Balancing II

• COM+ scales well…
  – Router only involved when an object is created
    • May change in a later release to support dynamic re-balancing as server load changes
  – Method calls go direct from client to server
  – Allocation based on response time rather than round robin
    • Allocate to the least-loaded server
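Allocation by response time rather than round robin might look like the sketch below. It is hypothetical: the slides do not describe how COM+ tracks or weighs response times, so an exponentially weighted moving average is used here purely as an example. The router is consulted only when an object is created; method calls then go straight to the chosen server.

```python
class ResponseTimeRouter:
    """Hypothetical router with a response time tracker.

    Keeps an exponentially weighted moving average (EWMA) of observed
    response times per server and sends each object creation to the
    server that currently looks fastest."""

    def __init__(self, servers, alpha=0.2):
        self.alpha = alpha  # weight given to the newest sample
        # Unmeasured servers start at 0 ms, so new servers get tried first
        self.avg_ms = {server: 0.0 for server in servers}

    def record(self, server, response_ms):
        old = self.avg_ms[server]
        self.avg_ms[server] = (1 - self.alpha) * old + self.alpha * response_ms

    def choose(self):
        """Pick a server at object-creation time: lowest average wins."""
        return min(self.avg_ms, key=self.avg_ms.get)

router = ResponseTimeRouter(["app1", "app2", "app3"])
router.record("app1", 120)
router.record("app2", 40)
router.record("app3", 300)
print(router.choose())  # app2
```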

Load Balancing II

• No name server in the COM world?
  – COM/MTS clients ‘know’ the name of the server
    • Set at client installation time
    • Can change using GUI tools
    • Admin problem if the server app is moved
  – COM+ uses Active Directory to find services

Load Balancing II

• Some systems involve the router in every method call/request
  – Request goes to a router process, which then passes it on to a server process
  – Scales poorly, as the router can be a major bottleneck
  – Some availability concerns as well
    • What happens if the router fails?

Load Balancing II

[Diagram: Load balancing with router in main call path. Clients, a router and two server processes.]

Scale-up

• No need for load balancing across systems

• Just use a bigger box
  – Add processors, memory, …
  – SMP (symmetric multiprocessing)

• Runs into limits eventually

• Could be less available, as one big box is a single point of failure

Scale-up

• Example from the Web
  – Large auction site
  – Server farm of NT boxes (scale-out)
  – Single database server (scale-up)
    • 64-processor SUN box
  – More capacity needed?
    • Add more NT boxes easily
    • SUN box is full, so some databases have to be shifted to another box

Clusters

• A group of independent computers acting like a single system
  – Shared disks
  – Single IP address
  – Single set of services
  – Fail-over to other members of the cluster
  – Load sharing within the cluster
  – DEC, IBM, MS, …

Clusters

[Diagram: two-server cluster. Labels: Client PCs, Server A, Server B, Disk cabinet A, Disk cabinet B, Heartbeat, Cluster management.]

Clusters

• Addresses scalability
  – Add more boxes to the cluster

• Addresses availability
  – Fail-over
  – Add & remove boxes from the cluster for upgrades and maintenance

• Can be used as one element of a highly available system
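Fail-over depends on noticing that a partner node has stopped, commonly through a heartbeat exchanged between cluster members. A minimal sketch of a timeout-based detector; the class, the timeout value and the injected clock are assumptions for illustration and do not describe how any particular cluster product works.

```python
import time

class HeartbeatMonitor:
    """Declares the peer failed if no heartbeat arrives within `timeout` seconds."""

    def __init__(self, timeout, clock=time.monotonic):
        self.timeout = timeout
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = clock()

    def heartbeat(self):
        """Call whenever a heartbeat message arrives from the peer."""
        self.last_seen = self.clock()

    def peer_failed(self):
        return self.clock() - self.last_seen > self.timeout

    def check(self, on_failover):
        """Run periodically; calls on_failover while the peer is presumed dead."""
        if self.peer_failed():
            on_failover()
            return True
        return False

# Simulated clock, in seconds
now = [0.0]
monitor = HeartbeatMonitor(timeout=3.0, clock=lambda: now[0])
now[0] = 2.0
monitor.heartbeat()
now[0] = 6.0
monitor.check(lambda: print("taking over the peer's services and disks"))
```

Choosing the timeout is a trade-off: too short and a busy but healthy node is declared dead (and both nodes may try to own the shared disks), too long and fail-over is slow.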

Web Server Farms

• Web servers are highly scalable
  – Web applications are normally stateless
    • Next request can go to any Web server
    • State comes from client or database
  – Just need to spread incoming requests
    • IP sprayers (hardware, software)
    • >1 Web server looking at the same IP address, with some coordination (see MS WLB docs)
  – Same technique for other network apps
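The "more than one Web server looking at the same IP address" approach can be illustrated with a deterministic hash: every server sees each incoming request and keeps only those whose client address hashes to its own index. This is a simplified sketch of the general idea, not a description of the Microsoft product's actual filtering algorithm; the function names are hypothetical. It works because the requests are stateless, so it does not matter which server a given client lands on.

```python
import zlib

def owner(client_ip: str, n_servers: int) -> int:
    """Map a client address to a server index. Every server computes the
    same answer, so no per-request coordination is needed."""
    return zlib.crc32(client_ip.encode("ascii")) % n_servers

def should_accept(my_index: int, client_ip: str, n_servers: int) -> bool:
    """Each server sees every request for the shared IP address and keeps
    only the ones that hash to it."""
    return owner(client_ip, n_servers) == my_index

# Exactly one of four servers accepts each (hypothetical) client
for ip in ("10.0.0.1", "10.0.0.2", "192.168.1.7"):
    accepting = [i for i in range(4) if should_accept(i, ip, 4)]
    print(ip, "->", accepting)
```

Because the mapping depends on the number of servers, adding or removing a server moves most clients to a different server; with stateless Web applications that is harmless.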

Available System

[Diagram: Web clients; Web servers load balanced using Convoy; app servers using COM+ LB, with a COM+ LBS router node; database installed on a Wolfpack cluster for high availability.]

Availability

• How much downtime?
  – 99%: 87.6 hours a year
  – 99.9%: 8.76 hours a year
  – 99.99%: 0.876 hours a year

• Need to consider operations as well
  – Maintenance, software upgrades, backups, application changes
  – Not just faults and recovery time
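The downtime figures follow directly from the 8,760 hours in a non-leap year. A small sketch of the arithmetic; the function name is hypothetical.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; ignores leap years

def downtime_hours_per_year(availability_pct):
    """Hours per year a system may be unavailable at a given availability."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR

for pct in (99, 99.9, 99.99):
    hours = downtime_hours_per_year(pct)
    print(f"{pct}%: {hours:.3f} hours ({hours * 60:.1f} minutes) a year")
```

The budget covers all downtime, planned or not, so at 99.99% the roughly 53 minutes a year must also absorb maintenance, upgrades and backups.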

Availability and Scalability

• Often a question of application design
  – Stateful vs stateless
    • What happens if a server fails?
    • Can requests go to any server?
  – Which language and database API?
    • Balance cost vs speed: VB/C++, ODBC/ADO
  – Synchronous method calls or asynchronous messaging?
    • Reduce dependency between components
    • Failure-tolerant designs

Next Week

• Distributed application architectures
  – How to design systems that will work, scale and be available
  – Web-based systems
  – Web technology