© 2009 IBM Corporation Clustering for Disaster Recovery Susan Bulloch IBM Monday, June 18, 12


© 2009 IBM Corporation

Clustering for Disaster Recovery
Susan Bulloch
IBM

Monday, June 18, 12


In This Session …

Clustering is one of the best Domino features
– Introduced with Release 4.5 in December of 1996
• We have seen an increased adoption in clustering Domino servers over the past two years
• Much of that was for disaster recovery

Some enterprises have been using Domino clustering for disaster recovery for years
– This session shares as much information as possible about the topic based on what I’ve seen, heard about, and discovered creating Domino-based disaster recovery solutions
• And most importantly, the session speaks to the all-important issues about managing failovers


Setting a Baseline for Concepts

First, let’s make sure that everyone knows when you see the letters DR, we’re talking about disaster recovery

Disaster recovery is part of a larger concept referred to as business continuity
– That’s what we plan to do to keep our businesses running while disruptive events occur
• These events could be extremely local, like a power outage in a building
• Or they could be regional, like an earthquake or tornado, or a disaster caused by humans doing their thing

Disaster recovery has its focus on the technology that supports business operations


Why Use Domino for Disaster Recovery?

Domino clustering has been accepted by many organizations as an important part of their DR infrastructure
– Although that’s not a total endorsement, as corporations do wacky things

Clustering works with just about all things Lotus Notes/Domino, such as:
– Email, calendaring, Traveler, BlackBerry, and roaming user

Clustering is included as part of your enterprise server licenses
– Clusters clearly should be exploited in every enterprise
• And they should have a solid role in any DR solution for messaging and collaboration


Important Facts to Help You Sell Clustering for DR

Here are some facts to back you up when you start making plans to use Domino clusters for DR
– Clustering is a shrink-wrapped solution
• It’s not new. As a matter of fact, it’s been burned in for years.
– It works really well
• It’s automatic
• It’s easy to set up and maintain
• It’s easy to test

The biggest drawback for most companies using clusters is the increase in storage requirements, since their data size is doubled
– Domino Attachment and Object Services (DAOS) can reduce that size by 30% to 40%
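As a rough illustration of the storage arithmetic on this slide, the sketch below estimates disk needs for a cluster. It assumes each database exists once per replica and that the DAOS saving applies evenly to the whole data set; the function name and the 1,000 GB figure are made up for the example.

```python
def clustered_storage_gb(primary_gb, replicas=2, daos_savings_pct=0):
    """Estimate total disk (GB) when each database exists `replicas` times.

    Assumption: the DAOS saving (the slide quotes 30% to 40%) applies
    evenly to all data, which real mail files will not match exactly.
    """
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    if not 0 <= daos_savings_pct < 100:
        raise ValueError("daos_savings_pct must be in [0, 100)")
    return primary_gb * replicas * (100 - daos_savings_pct) / 100


# Hypothetical 1,000 GB of mail on the primary server
without_daos = clustered_storage_gb(1000)
with_daos_low = clustered_storage_gb(1000, daos_savings_pct=30)
with_daos_high = clustered_storage_gb(1000, daos_savings_pct=40)
```

Under these assumptions a two-server cluster needs somewhere between 1,200 and 1,400 GB with DAOS instead of 2,000 GB without it. Measure your own attachment mix before relying on numbers like these.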


Requirements and Architecture


Basic Clustering Requirements

All servers in the cluster must:
– Run on Domino Enterprise or Domino Utility server
– Be connected using a high-speed local area network (LAN) or a high-speed wide area network (WAN)
– Use TCP/IP and be on the same Notes named network
– Be in the same Domino domain
– Share a common Domino directory
– Have plenty of CPU power and memory
• It’s safe to say that clustered servers need more power and more disk resources than unclustered servers

A server can be a member of only one cluster at a time


Technical Considerations for Disaster Recovery Clusters

A few guidelines regarding configuring clusters, especially for clusters used for disaster recovery
– Servers in a cluster should not use the same power source
– Nor should they use the same network subnet or routers
– They should not use the same disk storage array
– They should not be in the same building

Your clustered DR solution should ideally be in different cities
– Never in the same room


Keep Your Distance

Good DR cluster designs should take into account both local and regional problems

Consider a financial company that had clustered servers in two separate buildings across the street from each other in Manhattan
– This firm now has primary servers in offices in New York City
• And failover servers are thousands of miles away

Another firm has primary servers in Chicago
– With a failover server in the UK

A college has primary and failover servers separated by 200 miles

Another company we know is just starting out with DR and has servers 25 miles from each other
– A good start, but they really want more distance


Servers Accessed Over a Wide Area Network

If servers are as far apart as they are supposed to be, there might be some latency in the synchronization of the data
– If this is the case, users might find that the failover server is not up to date with their primary server during a failover

Everyone must be aware that this is a possibility
– Expectations must be set
• Or management needs to provide budgets for better networking

Work out all of these details in advance with management
– Get them written down and approved so there are no surprises


Most Common DR Cluster Configuration

The most common DR cluster configuration is active/passive
– Servers on site are active
– Servers at the failover site are passive, waiting for failover
• Sometimes domains use this failover server as a place to do backups or journaling
• The number of servers in the cluster usually varies: there could be 1 active and 1 passive, or 2 or 3 active and 1 passive


The Active/Passive Clustered Servers

Active/Passive servers
– Cluster has two servers: one active, the other not generally used or used for backing up
• Very common disaster recovery setup

Each server holds replicas of all the files from the other server
– During failover, all users flip to the other server


The 3 Server Cluster for DR

Three or more servers in the cluster
– There are two replicas for each cluster-supported application or mail file

Each primary server holds the mail files of the users assigned to that server
– Replicas of mail files from both primary servers are on the failover server


3 Server Cluster with One Primary Server Down

If a primary server goes down, users from that server go to the failover server
– Easy to understand, and you save yourself a server
• You’ll still need twice the total disk space of Mail1 and Mail2
• What happens when both Mail1 and Mail2 are unavailable?


Both Primary Servers Are Down

If both primary servers are down, the last server in the cluster has to support everyone
– Remember that generally speaking, only about 60% to 70% of users assigned to a server are on there concurrently
• Still, that has to be a pretty strong server with fast disks

Some sites have remote servers as primary and failover happens at the home office data center
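To put a number on "support everyone", a back-of-the-envelope sketch like the one below can help when sizing the failover server. The user counts are hypothetical, and the 60% to 70% concurrency range is the rule of thumb quoted on this slide, not a measured value for any real site.

```python
import math


def peak_failover_sessions(assigned_users, concurrency_pct=70):
    """Estimate concurrent sessions the last surviving server must carry.

    assigned_users maps a primary server name to its number of assigned users.
    """
    if not 0 < concurrency_pct <= 100:
        raise ValueError("concurrency_pct must be in (0, 100]")
    total = sum(assigned_users.values())
    return math.ceil(total * concurrency_pct / 100)


# Hypothetical user counts for the two primaries used in the slides
assigned = {"Mail1": 1500, "Mail2": 1000}
low_estimate = peak_failover_sessions(assigned, 60)
high_estimate = peak_failover_sessions(assigned, 70)
```

With these made-up counts the failover server would need to handle roughly 1,500 to 1,750 concurrent users if both primaries were lost at a busy time.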


Mastering Cluster Replication


Understanding Cluster Replication

Cluster replication is event-driven
– It doesn’t run on a schedule
• The cluster replicator detects a change in a database and immediately pushes the change to other replicas in the cluster

If a server is down or there is significant network latency, the cluster replicator stores changes in memory so it can push them out when it can
– If a change to the same application happens before a previous change has been sent, the cluster replicator (CLREPL) gathers them and sends them all together
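The gathering behavior described above can be pictured with a small toy model. This is only a conceptual sketch of an event-driven queue that batches changes per database; the class and method names are invented for illustration and do not reflect how CLREPL is actually implemented.

```python
from collections import OrderedDict


class CoalescingQueue:
    """Toy model: pending changes are held in memory, grouped per database."""

    def __init__(self):
        # database path -> list of change identifiers, in arrival order
        self._pending = OrderedDict()

    def record_change(self, database, change_id):
        """Called whenever a database changes (event-driven, no schedule)."""
        self._pending.setdefault(database, []).append(change_id)

    def flush(self, target_reachable):
        """Push everything out as one batch per database, if the target is up."""
        if not target_reachable:
            return []  # keep changes in memory until the target can be reached
        batches = list(self._pending.items())
        self._pending.clear()
        return batches


queue = CoalescingQueue()
queue.record_change("mail/user1.nsf", "change-1")
queue.record_change("mail/user2.nsf", "change-2")
queue.record_change("mail/user1.nsf", "change-3")  # same database, not yet sent
held = queue.flush(target_reachable=False)
sent = queue.flush(target_reachable=True)
```

While the target is unreachable nothing is sent; once it is reachable, both changes to the first database travel together in a single batch.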


Streaming Cluster Replication

Domino 8 introduced Streaming Cluster Replication (SCR)
– This newer functionality reduces replicator overhead
• Provides reduction in cluster replicator latency

As changes occur to databases, they are captured and immediately queued to other replicas in the same cluster
– This makes cluster replication more efficient


SCR Only Works with Domino 8 and newer

If one server in the cluster is Domino 8.x and another is not, Domino will attempt SCR first
– When that doesn’t work, it will fall back on standard cluster replication

If you’d like to turn off SCR entirely to ensure compatibility, use this parameter
– DEBUG_SCR_DISABLED=1
• This must be used on all cluster mates


Only One Cluster Replicator by Default

When a cluster is created, each server has only a single cluster replicator instance
– If there have been a significant number of changes to many applications, a single cluster replicator can fall behind
• Database synchronization won’t be up to date

If a server fails when database sync has fallen behind, users will think their mail file or app is “missing data”
– They won’t understand why all the meetings they made this morning are not there
• They think their information is gone forever!
• Users need their cluster insurance!


Condition Is Completely Manageable

Adding a cluster replicator will help fix this problem

You can load cluster replicators manually using the following console command
– Load CLREPL
• Note that a manually loaded cluster replicator will not persist after the server is restarted

Add cluster replicators permanently to a server
– Use this parameter in the NOTES.INI
• CLUSTER_REPLICATORS=#

I always use at least two cluster replicators


When to Add Cluster Replicators

But how do you tell if there’s a potential problem?
– Do you let it fail and then wait for the phone to ring?
• No!

You look at the cluster stats and get the data you need to make an intelligent decision
– Adding too many cluster replicators will have a negative effect on server performance

Here are some important statistics to watch


Key Stats for Vital Information About Cluster Replication

Statistic | What It Tells You | Acceptable Values
Replica.Cluster.SecondsOnQueue | Total seconds that last DB replicated spent on work queue | < 15 sec (light load); < 30 sec (heavy load)
Replica.Cluster.SecondsOnQueue.Avg | Average seconds a DB spent on work queue | Use for trending
Replica.Cluster.SecondsOnQueue.Max | Maximum seconds a DB spent on work queue | Use for trending
Replica.Cluster.WorkQueueDepth | Current number of databases awaiting cluster replication | Usually zero
Replica.Cluster.WorkQueueDepth.Avg | Average work queue depth since the server started | Use for trending
Replica.Cluster.WorkQueueDepth.Max | Maximum work queue depth since the server started | Use for trending


What to Do About Stats Over the Limit

Acceptable Replica.Cluster.SecondsOnQueue
– Queue is checked every 15 seconds, so under light load, should be less than 15
• Under heavy load, if the number is larger than 30, another cluster replicator should be added

If the above statistic is low and Replica.Cluster.WorkQueueDepth is constantly higher than 10 …
– Perhaps your network bandwidth is too low
• Consider setting up a private LAN for cluster replication traffic
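The two rules on this slide can be written down as a small decision helper. The sketch below is one possible reading: it treats "low" as 30 seconds or less and "constantly higher than 10" as every sample in a series exceeding 10. Both interpretations, and the function name, are assumptions made for the example.

```python
def replication_advice(seconds_on_queue, queue_depth_samples):
    """Suggest an action from Replica.Cluster.* statistics.

    seconds_on_queue: latest Replica.Cluster.SecondsOnQueue value
    queue_depth_samples: recent Replica.Cluster.WorkQueueDepth readings
    """
    if seconds_on_queue > 30:
        return "add a cluster replicator"
    # Assumption: "constantly higher than 10" means every sample exceeds 10
    if queue_depth_samples and all(depth > 10 for depth in queue_depth_samples):
        return "check network bandwidth; consider a private LAN for cluster traffic"
    return "no action"


busy_replicator = replication_advice(45, [0, 0, 1])
slow_network = replication_advice(5, [12, 15, 11])
healthy = replication_advice(5, [0, 12, 0])
```

How the samples are collected (for example, from Statrep documents over a working day) is left open; the helper only encodes the thresholds from the slide.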


Stats That Have Meaning but Have Gone Missing

There aren’t any views in the Lotus version of Statrep that let you see these important statistics
– As a matter of fact, the Cluster view is pretty worthless


The Documents Have More Information

The cluster documents have much better information
– You can actually use the data in the docs
• But they still lack key stats, though they’re in each doc


Stats That Have Meaning but Have Gone Missing

But there is a view like that in the Technotics R8.5 Statrep.NTF
– It shows the key stats you need
• To help track and adjust your clusters
• Link is included for this conference


Technotics Column Additions to Statrep

Column Title | Formula | Formatting
Min on Q | Replica.Cluster.SecondsOnQueue / 60 | Fixed (One Decimal Place)
Min/Q Av | Replica.Cluster.SecondsOnQueue.Avg / 60 | Fixed (One Decimal Place)
Min/Q Mx | Replica.Cluster.SecondsOnQueue.Max / 60 | Fixed (One Decimal Place)
WkrDpth | Replica.Cluster.WorkQueueDepth | General
WD Av | Replica.Cluster.WorkQueueDepth.Avg | General
WD Mx | Replica.Cluster.WorkQueueDepth.Max | General
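If you export the statistics rather than view them in Statrep, the same column arithmetic is easy to reproduce. The sketch below applies the formulas from the table to a dictionary of statistic values; the sample numbers and the function name are invented, and "General" formatting is treated here as passing the value through unchanged.

```python
def statrep_columns(stats):
    """Apply the column formulas from the table to raw statistic values."""
    def to_minutes(seconds):
        # Seconds divided by 60, shown with one decimal place
        return round(seconds / 60, 1)

    return {
        "Min on Q": to_minutes(stats["Replica.Cluster.SecondsOnQueue"]),
        "Min/Q Av": to_minutes(stats["Replica.Cluster.SecondsOnQueue.Avg"]),
        "Min/Q Mx": to_minutes(stats["Replica.Cluster.SecondsOnQueue.Max"]),
        "WkrDpth": stats["Replica.Cluster.WorkQueueDepth"],
        "WD Av": stats["Replica.Cluster.WorkQueueDepth.Avg"],
        "WD Mx": stats["Replica.Cluster.WorkQueueDepth.Max"],
    }


# Made-up sample values, for illustration only
sample = {
    "Replica.Cluster.SecondsOnQueue": 12,
    "Replica.Cluster.SecondsOnQueue.Avg": 90,
    "Replica.Cluster.SecondsOnQueue.Max": 200,
    "Replica.Cluster.WorkQueueDepth": 0,
    "Replica.Cluster.WorkQueueDepth.Avg": 2,
    "Replica.Cluster.WorkQueueDepth.Max": 14,
}
columns = statrep_columns(sample)
```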


Use a Scheduled Connection Document Also

Back up your cluster replication with a scheduled connection document between servers
– Have it replicate at least once per hour
• You’ll always be assured to have your servers in sync even if one has been down for a few days
• And it replicates deletion stubs too!


Setting up a Private LAN for Clustering


Busy Clusters Might Require a Private LAN

A private LAN separates the network traffic the cluster creates for replication and server probes
– And will probably leave more room on your primary LAN

Start by installing an additional network interface card (NIC) for each server in the cluster
– Connect the NICs through a private hub or switch


Setting Up the Private LAN

Assign a second IP address to the additional NIC

Assign host names to the addresses in the local HOSTS file on each server
– Using DNS is a best practice
• 10.200.100.1 mail1_clu.domlab.com
• 10.200.100.2 mail2_clu.domlab.com

Test by pinging the new hosts from each server


Modify Server Document

For each server in the cluster, edit the server document to enable the new port


Set Parameters so Servers Use Private LAN

Make your clusters use the private LAN for cluster traffic by establishing the ports in the server NOTES.INI with these parameters
– CLUSTER=TCP,0,15,0
– PORTS=TCPIP,CLUSTER
– CLUSTER_TCPIPADDRESS=0,10.200.100.2:1352
• You will use the address of your own NIC


Parameters to Make the Cluster Use the Port

Use the following parameter to ensure Domino uses the port for cluster traffic
– SERVER_CLUSTER_DEFAULT_PORT=CLUSTER

Use this parameter just in case the CLUSTER port you’ve configured isn’t available
– SERVER_CLUSTER_AUXILIARY_PORTS=*
• This allows clustering to use any port if the one you’ve defined isn’t available
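Because every cluster mate needs the same set of lines with only the private address changed, it can be handy to generate them. The helper below is a hypothetical convenience, not a Domino tool; it simply assembles the five NOTES.INI lines shown on this slide and the previous one for a given private-LAN address.

```python
import ipaddress


def private_lan_ini(cluster_ip, port=1352):
    """Return the NOTES.INI lines from the slides for one server's private NIC."""
    ipaddress.ip_address(cluster_ip)  # raises ValueError if the address is malformed
    return [
        "CLUSTER=TCP,0,15,0",
        "PORTS=TCPIP,CLUSTER",
        f"CLUSTER_TCPIPADDRESS=0,{cluster_ip}:{port}",
        "SERVER_CLUSTER_DEFAULT_PORT=CLUSTER",
        "SERVER_CLUSTER_AUXILIARY_PORTS=*",
    ]


# Addresses taken from the HOSTS example earlier in the deck
mail1_lines = private_lan_ini("10.200.100.1")
mail2_lines = private_lan_ini("10.200.100.2")
```

An existing NOTES.INI will normally already contain a PORTS line, so in practice that entry would be merged with what is there instead of pasted in as a second copy.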


Keep Users Off the Private LAN

To keep users from grabbing on to the private LAN port, take the following steps
– Create a group called ClusterServers
• Add the servers in the cluster to this group

Add the following parameter to the NOTES.INI of both servers
– It will keep users from connecting through the CLUSTER port
• Allow_Access_Cluster=ClusterServers


Managing Cluster Failover and Load Balancing


Respect the Users

Clustering provides outstanding service levels for users
– But the process of failing over is sometimes hard on users
• Failover is actually the most difficult moment for users

And sometimes errors in network configuration might prevent successful failover
– For example, the common name of the server should be listed as an alias in DNS to ensure users can easily open their application on the servers
• If the server is not in DNS, the clients won’t know how to get to the failover servers


Best Practice for Cluster Management

Best Practice:
– Don’t take a clustered server down during working hours unless it is absolutely necessary
• A non-planned server outage, such as a crash or power failure, is a legitimate reason to fail over

Resist the urge to take a server down because you know it’s clustered
– You could probably do it, but a hard failover will probably cause unwanted help desk calls


Easiest Cluster Configuration to Manage

The Active/Passive model of clustering is by far the easiest to manage

Use parameters in the NOTES.INI file on the servers in the cluster that allow users on the primary server
– But don’t allow them on the failover server


Check for Server Availability

The parameters we use are thresholds that check the cluster statistic Server.AvailabilityIndex (SAI)
– This statistic shows how busy the server is
• 100 means it’s not busy at all
• 0 means it’s crazy busy


Adjusting the Threshold

Setting the parameter Server_Availability_Threshold controls whether users can access the server
– 50 means if the SAI drops below 50, then fail over users to another server
• A setting like this can be used for load balancing
– 100 means the SAI must be 100, which means the server must be 100% available
• This translates into “nobody is allowed on the server”
– 0 means that load balancing and checking the SAI is turned off

These thresholds can come in handy
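To make the three cases concrete, here is a small sketch of the decision as the slides describe it. It follows the load-balancing reading used later in the deck (users fail over when the SAI goes below the threshold) and treats 0 and 100 as the special cases listed above; the function name is made up and this is not Domino's actual code.

```python
def server_is_busy(sai, threshold):
    """Return True if new users should be redirected to a cluster mate.

    sai: current Server.AvailabilityIndex (100 = idle, 0 = very busy)
    threshold: Server_Availability_Threshold setting
    """
    if not 0 <= sai <= 100 or not 0 <= threshold <= 100:
        raise ValueError("sai and threshold must be between 0 and 100")
    if threshold == 0:
        return False  # load balancing and SAI checking are turned off
    if threshold == 100:
        return True   # "nobody is allowed on the server"
    return sai < threshold


load_balanced_ok = server_is_busy(sai=72, threshold=50)
load_balanced_busy = server_is_busy(sai=41, threshold=50)
passive_server = server_is_busy(sai=99, threshold=100)
checking_off = server_is_busy(sai=3, threshold=0)
```

Keep in mind that a busy state only redirects users while another cluster mate is available; as a later slide notes, clients are still let in when there is nowhere else to go.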


Setting Up Active/Passive Servers in a Cluster

Let’s look at the following scenario
– Mail1 is the active primary; Mail2 is the passive failover

To allow users to access their primary server, use this parameter in the NOTES.INI of Mail1
– Server_Availability_Threshold=0
• Use this console command: Set config Server_Availability_Threshold=0

Prevent users from accessing failover server Mail2; use this parameter
– Server_Availability_Threshold=100
• Use this console command: Set config Server_Availability_Threshold=100

Administrators will still be able to access this server


Mail1 Is Crashing

If Mail1 crashes, the Notes client will disregard our setting of 100 on Mail2 and users will be permitted

To help stabilize the system use this parameter on Mail2
– Server_Availability_Threshold=0
• Let all users aboard

While Mail1 is down, enter this parameter into its NOTES.INI to prevent users from connecting when it comes back up:
– Server_Restricted=2
• 2 will keep the setting after a server restart
• Setting it to 1 also keeps them off, but the setting will be reset to 0 after restarting the server


Recovering After a Crash

When Mail1 is brought back up after the crash, no one will be permitted to access it except administrators
– That’s because of the Server_Restricted=2 setting
• Leave it that way until the end of the day

The ugliest part about failing over is the client part
– Clients are working just fine on Mail2
• By the way, IMAP and POP3 users still have access to Mail1

At the day’s end, switch the Server_Availability_Threshold back to 100 on Mail2 and 0 on Mail1

Issue this console command on Mail2
– Drop all
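The whole Mail1/Mail2 sequence is easier to follow, and harder to get wrong under pressure, when it is written down as a checklist. The sketch below records the steps from the last three slides as plain data. The stage names and the structure are invented for illustration, and the step that clears Server_Restricted on Mail1 comes from the reminder on the voluntary-shutdown slide rather than from this one.

```python
# Each step: (server, where it is applied, setting or command)
FAILOVER_RUNBOOK = {
    "normal_operation": [
        ("Mail1", "NOTES.INI", "Server_Availability_Threshold=0"),
        ("Mail2", "NOTES.INI", "Server_Availability_Threshold=100"),
    ],
    "mail1_crashed": [
        ("Mail2", "NOTES.INI", "Server_Availability_Threshold=0"),
        ("Mail1", "NOTES.INI", "Server_Restricted=2"),
    ],
    "end_of_day": [
        ("Mail2", "NOTES.INI", "Server_Availability_Threshold=100"),
        ("Mail1", "NOTES.INI", "Server_Availability_Threshold=0"),
        # From the voluntary-shutdown slide: let users back on to Mail1
        ("Mail1", "NOTES.INI", "Server_Restricted=0"),
        ("Mail2", "console", "Drop all"),
    ],
}


def steps_for(stage):
    """Return printable checklist lines for one stage of the runbook."""
    return [
        f"{server} ({where}): {action}"
        for server, where, action in FAILOVER_RUNBOOK[stage]
    ]


crash_checklist = steps_for("mail1_crashed")
```

Keeping the steps as data means the same list can be printed for the operations binder or reviewed with management ahead of time.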


Taking a Clustered Server Down Voluntarily

If you must take clustered Mail1 server down, set Server_Restricted=2, then drop all at the console
– Remember that POP3 and IMAP users won’t like you very much

Set Server_Availability_Threshold to 0 on Mail2

Don’t forget to set Server_Restricted back to 0 when the crisis has passed
– I know someone who forgot to do this a couple of times
• This person could access the server and work because he was in an administrator’s group
• However, nobody else could get on the server and he made all the users very angry


Triggering Failover

You can set the maximum number of concurrent NRPC users allowed to connect to a server
– Server_MaxUsers NOTES.INI variable
• Set variable to a number determined in planning stage

Set variable using console command
– Or use NOTES.INI tab in server configuration document
– Set config Server_MaxUsers=desired maximum number of active concurrent users
• Additional users will fail over to the other members of the cluster


Load Balancing

If you’d like to load balance your servers, determine what your comfort range is for how busy your servers are and set the Server_Availability_Threshold accordingly
– Perhaps start with a value of 60
• Users should fail over when the SAI goes below 60

Pay close attention to the SAI in the STATREP.NSF, which is listed under the Av Inx column
– Some hardware can produce inaccurate SAI readings and cause users to fail over when it’s not necessary


SAI Is Unreliable in Some Cases

Note how in this case, the Server Availability Index never seems to get much above 50 consistently
– Users would be failing over constantly
• And if both servers had the issue, users would be bouncing back and forth between the clustered servers


The Expansion Factor

Cluster members determine their workload based on the expansion factor
– This is calculated based on response times for recent requests
– Server compares recent response time to minimum response time that the server has completed
• Example: Server currently averages 12ms for DBOpen requests; minimum time was 4ms
• Expansion factor = 3 (current time/fastest time)
– This is averaged over different types of transactions
• Fastest time is stored in memory and in LOADMON.NCF
• LOADMON.NCF is read each time server starts
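The DBOpen example can be worked through in a few lines. The sketch below computes the expansion factor exactly as the slide defines it (current time divided by fastest time) and then averages it across transaction types. The plain unweighted average and the second transaction type are assumptions for illustration; the slides do not say how Domino weights different transactions.

```python
def expansion_factor(current_ms, fastest_ms):
    """Current response time divided by the fastest time ever recorded."""
    if fastest_ms <= 0:
        raise ValueError("fastest_ms must be positive")
    return current_ms / fastest_ms


def average_expansion_factor(timings):
    """Unweighted mean over transaction types (an assumed simplification).

    timings maps a transaction name to (current_ms, fastest_ms).
    """
    factors = [expansion_factor(cur, best) for cur, best in timings.values()]
    return sum(factors) / len(factors)


dbopen_factor = expansion_factor(current_ms=12, fastest_ms=4)

overall = average_expansion_factor({
    "DBOpen": (12, 4),             # figures from the slide
    "HypotheticalRead": (10, 10),  # made-up second transaction type
})
```

A server that has once recorded an unusually fast minimum will show a large expansion factor even when it is barely loaded, which is the effect the next slides describe.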


The Expansion Factor (cont.)

But sometimes Domino has a difficult time calculating the expansion factor
– The result is that the Server.AvailabilityIndex statistic is not a reliable measure of how busy the server is
• This can happen with extremely high performing servers

If you see a very low Server.AvailabilityIndex at a time you know servers are supposed to be idle and you are trying to load balance, there is something you can do to correct it
– And Domino can help!


Changing Expansion Factor Calculation

Use this parameter to change how the Expansion Factor is calculated
– SERVER_TRANSINFO_RANGE=n

To determine the optimal value for this variable:
– After the server has experienced heavy usage, use this console command:
• Show AI
• This means, show the availability index calculation


An Easy Way to Find the Parameter Value

Show AI is a console command that has been around since Domino Release 6
– It runs some computations on the server
• And suggests a SERVER_TRANSINFO_RANGE for you


Events to Monitor When Using Clusters

Use event monitoring to look for certain problems that could ruin your clustering by preventing replication
– Look for the following phrases
• “cannot replicate database”
• “folder is larger than supported”
They both mean that users are going to hate you in the event of a failover
» Because their databases will not be in sync
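Domino's own event monitoring is the right place to watch for these phrases. As a rough illustration of the same idea, the Python sketch below scans exported log text for them. The function name and the sample log lines are hypothetical; real console or log.nsf output will be worded and formatted differently.

```python
# Phrases from the slide that indicate cluster replication is not keeping up.
REPLICATION_WARNINGS = (
    "cannot replicate database",
    "folder is larger than supported",
)


def find_replication_problems(lines):
    """Return (line_number, phrase, text) for every line containing a warning phrase."""
    hits = []
    for number, line in enumerate(lines, start=1):
        lowered = line.lower()  # match regardless of capitalization
        for phrase in REPLICATION_WARNINGS:
            if phrase in lowered:
                hits.append((number, phrase, line.strip()))
    return hits


# Hypothetical log excerpt, for illustration only.
sample_log = [
    "Cluster Replicator started",
    "Cluster Replicator: Cannot replicate database mail/jdoe.nsf",
    "Router: message delivered",
    "Folder is larger than supported in mail/asmith.nsf",
]

for number, phrase, text in find_replication_problems(sample_log):
    print(f"line {number}: {phrase!r} -> {text}")
```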


Disaster Recovery Is a Special Case

Many enterprises configure their clusters for manual failover
– They don’t want the user to fail over unless they permit it

To keep users off of the failover servers, use the following parameters in the NOTES.INI of the server
– Server_Restricted=2
• This will keep all users off until you set it to zero
– Server_Availability_Threshold=100
• The server looks busy and keeps everyone on the primary
• Set it to zero to allow users on the server

Keep in mind, if the primary servers are down, the users won’t have a server to work on unless you manually reset these parameters
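Because someone has to flip these settings by hand during a disaster, it can help to keep the two states written down in a runbook. The Python sketch below is one hypothetical way to do that: it holds the "standby" and "open" values from this slide and prints them as Domino `set config` console commands. The dictionary and function names are made up for illustration, and whether each change takes effect without a server restart should be verified in your own environment.

```python
# Values from the slide: the failover server stays closed to users ...
STANDBY = {"Server_Restricted": 2, "Server_Availability_Threshold": 100}
# ... until an administrator opens it during a declared failover.
OPEN = {"Server_Restricted": 0, "Server_Availability_Threshold": 0}


def console_commands(settings):
    """Render NOTES.INI settings as 'set config' console commands."""
    return [f"set config {name}={value}" for name, value in settings.items()]


print("To open the failover server to users:")
for command in console_commands(OPEN):
    print("  " + command)

print("To close it again after failing back:")
for command in console_commands(STANDBY):
    print("  " + command)
```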


The role of iNotes in DR


Things That Do Not Cluster with Domino Clustering

Only traffic over NRPC port 1352 will fail over

Other TCP/IP services will not fail over, including:
– IMAP
– POP3
– HTTP

This means iNotes users will not fail over
– But that doesn’t mean you can’t make it work

• In the aftermath of Katrina, some companies found that browser-based email was a savior


High Availability Is Not Disaster Recovery

IBM/Lotus has several recommendations for achieving high availability with iNotes

For example, Internet Cluster Manager
– This can redirect the initial URL for an application to one of several back-end mail servers
• However, if the mail server becomes unavailable after a session is started, there is a way to recover and switch to another server that has a replica of the mail file

IBM/Lotus also recommends an HTTP load balancer to help with high availability, and with DR to a certain extent
– The issue is, what happens when the load balancer is unavailable?


iNotes Users Accessing a Hub Server

The IWAREDIR.NSF is an application that helps direct browser requests to a user’s mail server
– Any user on either server Mail1 or Mail2 connects to either iNotes server that is deployed using the IWAREDIR.NSF
• The IWAREDIR.NSF looks up their name in the Domino directory and “redirects” them to their mail server
But it is not designed to take them to a failover server

However, it is possible to have a second version of this file with code changes that will take a user to their failover mail server


Open the IWAREDIR.NSF Using the Designer Client

Open the AutoLogin form using the Notes Designer Client
– At the very bottom, you’ll find a field called $$HTMLHead
• This contains the code that discovers the user’s mail server and mail file name


Section of Code That Can Be Modified

In this field is code that can be modified to point the user to the failover server rather than to their primary mail server
– There’s a link at the end of this presentation to some extreme programming you can do with this field


Code to Point Users to Failover Server

If the address book says the user’s mail is on Mail1, but you want to take them to server Mail1F, the failover box, the code can be changed to something like this
– This is an @If statement that essentially says if the directory states that the MailServer for the user is Mail1, take them to Mail1F, the failover server
• This failover version of the IWAREDIR.NSF could be called IWAREDIRF.NSF
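The @If formula from the slide is not reproduced in this transcript. The Python sketch below only illustrates the lookup logic that the slide describes; it is not the formula stored in $$HTMLHead. The mapping table and function name are hypothetical, and only the Mail1 to Mail1F pair comes from the slide.

```python
# Primary mail server -> failover server. Only Mail1/Mail1F is from the slide;
# add one entry per primary server that has a failover partner.
FAILOVER_SERVERS = {
    "Mail1": "Mail1F",
}


def redirect_target(mail_server: str) -> str:
    """Return the server a browser user should be sent to during a failover.

    Servers without a failover entry are returned unchanged, which mirrors
    an @If that falls through to the directory's MailServer value.
    """
    return FAILOVER_SERVERS.get(mail_server, mail_server)


print(redirect_target("Mail1"))  # Mail1F
print(redirect_target("Mail2"))  # Mail2 (no failover mapping defined here)
```

Keeping the primary-to-failover pairs in one table makes it easier to check that every primary mail server has an entry before the redirector is needed.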


During Normal Operation

During a normal day with no need for failover, the IWAREDIR.NSF would be specified as the home URL for the server
– Users would be taken to their home servers and their mail files automatically


During a Failover Situation

When required, this configuration can be changed to use the failover version, the IWAREDIRF.NSF
– After this change is made, HTTP must be restarted


Restarting the HTTP Task

• Although the HTTP task can be restarted, the most reliable method is to shut down HTTP and load it again

• From the Domino console, use these commands
• Tell HTTP quit
• Load HTTP


iNotes Is There When You Need It

When there are no fat Notes clients, iNotes can have extreme value and maintain communications in your organization
– It’s worth planning it out so that when users call the help desk during an emergency, the help desk can point them to any DR server that is configured to use the IWAREDIRF.NSF
• That’s the failover version of your redirector


7 Rules for Clustering Domino Servers for Disaster Recovery

1. Get agreement from all parties before you start configuring

2. Be sure everyone agrees on what failover means and what kind of disaster you are preparing for, such as local or regional outages

3. Keep primary and failover servers as far apart as possible

4. Actually test the failover scenarios so that there is no doubt that the configuration works

5. Consider manual failover to prevent users from accessing servers over a wide area network or slow connections

6. Review cluster statistics regularly to ensure there are enough cluster replicators

7. Review the CLDBDIR.NSF to make sure there is a failover replica for each database on the primary server
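Rule 7 is easy to skip when done by eye. As a sketch of the check, the Python below takes a simplified list of (server, replica ID, path) rows and reports which databases on the primary have no matching replica on the failover server. The row layout, server names, and replica IDs are hypothetical and are not the actual CLDBDIR.NSF schema; the rows would have to be exported from the cluster database directory by some means of your own.

```python
def missing_failover_replicas(rows, primary, failover):
    """Return paths of primary databases whose replica ID is absent on the failover."""
    on_primary = {}      # replica ID -> database path on the primary
    on_failover = set()  # replica IDs present on the failover server
    for server, replica_id, path in rows:
        if server == primary:
            on_primary[replica_id] = path
        elif server == failover:
            on_failover.add(replica_id)
    return sorted(
        path for replica_id, path in on_primary.items()
        if replica_id not in on_failover
    )


# Hypothetical export of the cluster database directory.
rows = [
    ("Mail1", "REPLICA-A", "mail/jdoe.nsf"),
    ("Mail1F", "REPLICA-A", "mail/jdoe.nsf"),
    ("Mail1", "REPLICA-B", "mail/asmith.nsf"),  # no replica on Mail1F
]

print(missing_failover_replicas(rows, primary="Mail1", failover="Mail1F"))
```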


Resources

Achieving high availability with IBM Lotus iNotes
– www-10.lotus.com/ldd/dominowiki.nsf/dx/Achieving_high_availability_with_IBM_Lotus_iNotes

ServersLookup Form for IWAReder to return User replicas for Load Balancers and Reverse Proxies to use
– www-10.lotus.com/ldd/dominowiki.nsf/dx/ServersLookup_Form_for_IWAReder_to_return_User_replicas_for_Load_Balancers_and_Reverse_Proxies_to_use

Understanding IBM Lotus Domino server clustering
– www.ibm.com/developerworks/lotus/documentation/d-ls-dominoclustering/index.html

How to test cluster failover on one Domino database (application)
– www-304.ibm.com/support/docview.wss?rs=899&uid=swg21280021


Resources

Link to Andy’s statrep database
– http://www.andypedisich.com/blogs/andysblog.nsf/dx/statrep85Technotics.ntf/$file/statrep85Technotics.ntf
– http://www.andypedisich.com/blogs/andysblog.nsf/dx/resources.htm

How to build a cluster
– http://www.turtleweb.com/turtleblog.nsf/dx/Build%20A%20Domino%20Cluster.pdf/$file/Build%20A%20Domino%20Cluster.pdf


Summarizing

Disaster recovery is part of the bigger concept known as business continuity

The biggest drawback when using DR clusters is disk space consumption, which can be decreased by implementing Domino Attachment and Object Service

A three-server cluster, with two primary servers and one failover they share, is an excellent way to conserve server resources and licensing costs

Make sure your clustered servers are configured with scheduled connection documents because deletion stubs don’t replicate via cluster replication

Use the Allow_Access_Portname parameter to prevent users from accidentally using the private LAN between clustered servers

Failing over is the hardest part for users

iNotes is an excellent way to keep users connected during a disaster


Questions?
Contact me at [email protected] (notesgoddess on Twitter)
