© Copyright IBM Corporation 2003

Advanced Clustering Techniques for Maximizing Web Site Availability with WebSphere® Application Server, Version 5

Authors: High-Volume Web Site Team

Web address: ibm.com/websphere/developer/zones/hvws
Management contact: Ursula Richter [email protected]
Technical contact: Luis Ostdiek [email protected]
Date: July 7, 2003
Status: Version 1.0

Abstract: Server clustering is critical to the on-demand operating environment and, in particular, to the Web infrastructure. Server clustering can be used to help achieve continuous availability of Web sites in the always-on global marketplace. This paper discusses advanced techniques for Web and application server clustering using IBM WebSphere Application Server, Version 5. The information comes from our experiences with High-Volume Web Sites (HVWS) customer projects of the last few years.

Executive summary

The always-on global marketplace makes continuous availability a requirement for an increasing number of Web sites. Over the last few years, IBM’s High-Volume Web Sites (HVWS) team has worked on many Web infrastructure projects that required continuous availability in addition to groundbreaking scalability. To address the availability requirement, the HVWS team developed several advanced techniques for clustering Web and application servers.

Many techniques for server clustering provide continuous availability as long as the consideration of other requirements is limited or removed. If scalability is not considered properly, performance bottlenecks may develop. Maintaining availability during system maintenance tasks, such as upgrades to hardware and applications, remains a challenge. With the proper techniques and the right Java™ 2 Enterprise Edition (J2EE) application server it is possible to achieve continuous availability and address the other requirements.

This paper discusses advanced techniques for clustering Web and application servers to achieve continuous availability using IBM WebSphere Application Server, Version 5. Several clustering scenarios are used to illustrate the techniques. The scenarios are approached from the perspective of demonstrating continuous availability for highly scalable Web sites, including addressing system maintenance techniques. Sample scripts that use the new WebSphere Application Server administration scripting language are provided.


Contents

Executive summary
Contents
Introduction
The Web server tier
    Clustering the Web server tier
    Maintenance
The application server tier
    Clustering the application server tier
        Techniques common to all scenarios
        Clustering using the Web server plug-in
        Clustering using an external load balancer
        Clustering using the Dispatcher and Content Based Routing components
        Super scalable clustering scenarios
    Maintenance and failover scenarios
        Adding a new application server process
        Removing an application server in a stateful environment
        Application upgrade and rollback
        Application server, network, or hardware failure
Summary of best practices
Conclusion and future directions
Appendix A. Sample scripts
References
    WebSphere InfoCenters
    White papers
Acknowledgements
Notices


Introduction

Many factors are important to achieving continuous availability in the Web infrastructure. General availability concepts and practices are covered in the HVWS paper Maximize Web Site Availability. This paper discusses design scenarios for clustering Web and application servers to achieve continuous availability. The scenarios include many of the new and enhanced features of IBM WebSphere Application Server Network Deployment, Version 5. The clustering scenarios cover the features in the WebSphere Web server plug-in and the WebSphere Edge components (formerly Edge Server).

Much is written about clustering Web and application servers. This paper presents design scenarios from the perspective of demonstrating continuous availability, not just during normal operations, but also during system maintenance. This means maintaining continuous, or near continuous, availability while hardware is replaced and applications are upgraded. Typical clustering scenarios work well during steady state conditions. The real challenge comes in achieving near continuous availability during system maintenance, especially for high-volume Web sites that do not have a true off-peak time, such as global sites.

A consideration in architecting the Web infrastructure is the physical placement of the Web server, the J2EE Web container, and J2EE EJB container processes. It is possible to have each process run on its own physical machine or node, or grouped on one or two nodes. The scenarios described in this paper follow the practice of placing the Web server process on its own node and the Web container and the EJB container on a second node. When the J2EE containers are placed on the same node, it is also possible to deploy them in the same Java Virtual Machine (JVM) process. A grouping of nodes performing a similar function is referred to as a tier and hence there are two tiers in the scenario depicted in Figure 1, the Web tier and the application server tier. For the purposes of this document the Web tier generally serves static content while the application server tier serves dynamic content.

Figure 1. Physical separation of Web and Application server tiers (diagram, WebSphere clustering functions: Internet traffic flows to the Web tier of Web servers, cache/proxy servers, and load balancers, which serves static content; then to the application server tier, which serves dynamic content; then to the back-end systems, a data or other transaction processing system)


This architecture is optimal and popular for Web infrastructures for several reasons. Two of the main reasons are:

Placing the Web server process on a physically separate machine provides the option of deploying the Web server in a demilitarized zone (DMZ), while protecting the application server processes behind a firewall.

Running the Web container and the EJB container in the same process provides optimal performance by removing the overhead of serialization and deserialization of parameters and results that are exchanged between the Web container and the EJB container. Complex and large object types can result in significant performance costs when these containers run in separate processes.

There are exceptions that require a different configuration than Figure 1. For example, EJB clients other than Web components are not typically deployed in the same process as the EJB modules. Discussion of these exceptions is beyond the scope of this paper.

The Web server tier

Clustering the Web server tier

The Web server tier provides several functions including serving static content, caching, and load balancing for the application server tier. This section describes how to use clustering to achieve continuous availability in the Web tier.

The key element to achieving continuous availability during application upgrades is virtualization of the tiers. Here virtualization means that individual servers are isolated from the requesters of service provided by this tier. The Web server tier is virtualized by allowing access only through a load balancer. For the Web server tier, the WebSphere Application Server Edge component Dispatcher acts as the load balancer as depicted in Figure 2. Dispatcher distributes all Web server requests to a cluster of Web servers. The Web site is known by the cluster IP address of Dispatcher.

Figure 2. Using Dispatcher to cluster the Web server tier (diagram: Internet traffic reaches the primary and backup Dispatcher nodes, which distribute it to Web Server Node 1 and Web Server Node 2; each Web server node forwards to the application server tier)


To avoid a single point of failure, the cluster must have at least two Web server nodes, where the term node denotes a physical machine. This provides hardware redundancy and process redundancy. The load balancer also provides the following functionality essential to continuous availability at the Web server tier.

Detects if the HTTP server process fails to respond. If the process fails to respond to a request, the load balancer marks the server down and optionally retries the request with another server. Failure of a process is transparent to the requester.

Enables a system administrator to dynamically increase or decrease capacity by adding or removing Web server nodes, transparently to users.

Because the load balancer, or Dispatcher in this case, can also be a single point of failure, this component needs to be deployed in pairs. The backup dispatcher can be configured in standby mode, constantly polling the “heartbeat” of the primary dispatcher. If there is no response to a poll, the backup dispatcher takes over. There is also a mutual mode where each dispatcher node load balances for a different cluster. If either dispatcher process fails, the available dispatcher takes over load balancing for its respective cluster. This is especially useful for larger Web sites with multiple clusters.
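The failure detection described above for the load balancer (mark an unresponsive server down, then retry the request with another server) can be illustrated with a minimal Python sketch. This is not Dispatcher code. The names dispatch, fake_send, web1, and web2 are hypothetical, and a real load balancer would also return a marked-down server to service after a retry interval.

```python
def dispatch(request, servers, marked_down, send):
    """Try each server that is not marked down; mark failures down and retry."""
    for server in servers:
        if server in marked_down:
            continue
        try:
            return send(server, request)
        except ConnectionError:
            marked_down.add(server)  # later requests skip this server
    raise RuntimeError("no Web server available")


def fake_send(server, request):
    # Stand-in for a real HTTP call: web1 is down, every other server answers.
    if server == "web1":
        raise ConnectionError(server)
    return f"{server} served {request}"


down = set()
print(dispatch("/index.html", ["web1", "web2"], down, fake_send))
```

The requester sees only the successful response from the second server, which is what makes the failure transparent.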

WebSphere Application Server Network Deployment, Version 5 includes enhanced Edge load balancing components, which are also available in the WebSphere Edge components. In addition to the scenario in Figure 2, Dispatcher can be deployed with the Content Based Routing (CBR) and Site Selector components for more sophisticated load-balancing scenarios.

The load balancing algorithms available using these components are:

Random and round robin, as provided by Dispatcher. This load-balancing algorithm gives equal weight to all servers in a cluster.

Intelligent rules based algorithms based on weighting of active connections, CPU for each load balanced server, new connections, memory, port-specific (input from advisors listening on the port) and/or system metric (using the metric server component running on each load balanced server).

Content based routing based on the content of the incoming user request.

For more information on Edge components shipped with WebSphere, refer to www.ibm.com/software/webservers/appserv/doc/v50/ec/infocenter/index.html.

Maintenance

To keep maintenance simple, static content can be deployed with the application enterprise archive (EAR) files and cached using the edge side caching (ESI) features in the Web server, in the Edge component Caching Proxy, or at an external service provider, such as Akamai. Maintenance is simple because static content is deployed in the same step as the application and the association between static and dynamic content is preserved.

To deploy static content on the Web servers, either:

1. Keep it simple by copying the new content to the document directory of the Web server replacing the old content.


2. When both the old content and new content must be available concurrently, and each version is referred to by a different version of the dynamic content in the application server tier, the new content must be deployed with different names than the old files being replaced. One way to achieve this is to have a new subdirectory structure. References in both the static and dynamic content must be updated to reflect the new location. For the dynamic content either:

Use a script to update the JSPs or XSLT templates during build.

Use property files to map resources in JSPs or XSLT templates to the appropriate version or location of the static content.
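The property-file approach can be pictured with a short sketch. It is written in Python purely for illustration (the paper's dynamic content is JSP or XSLT), and the mapping STATIC_VERSIONS, the /static base path, and the file names are all hypothetical.

```python
import posixpath

# Hypothetical property mapping: logical resource name -> versioned subdirectory.
STATIC_VERSIONS = {"logo.gif": "v2", "site.css": "v1"}


def static_url(resource, base="/static"):
    """Resolve a logical resource name to its versioned location."""
    return posixpath.join(base, STATIC_VERSIONS[resource], resource)


print(static_url("logo.gif"))  # /static/v2/logo.gif
```

Because each application release carries its own mapping, the old and new releases can refer to different versions of the same file while both are deployed.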

The application server tier

This section discusses clustering and maintenance scenarios for the application server tier. Because session state may need to be kept, the application server tier presents challenges that the Web tier does not. In discussing the application server tier, the terms stateless and stateful refer to whether the application server tier is used to store some or all user conversation state for interaction between transactions.

In designing the application server tier, consider keeping the application server tier stateless. The advantages of stateless over stateful are:

Better horizontal scalability

Higher availability in maintenance and failover scenarios

Overall ease of system management

Clustering the application server tier

This section describes several clustering scenarios. The techniques common to all scenarios are covered before the specific scenarios and techniques.

Techniques common to all scenarios

Like the Web server tier, configure the application server tier with at least two physical machines, or nodes. Each node runs at least one application server instance with the same installed application. With the minimal requirement of two nodes, each node must be capable of handling the site’s peak load in the event of a failure of a node.

Pay special attention to the application server process. For Web sites with dynamic content, most of the processing required to respond to a Web request typically occurs in the application server. Consequently, this process has the greatest probability of a failure. Within the application server process, the execution of a transaction is handled mostly by the application code as opposed to application server code. Compared to the code in the operating system and the application server, the code in the application is typically the least tested and most prone to error, increasing the risk of failure. To mitigate the risk of failure, deploy two identical application server processes on each node. Deploying two redundant application server processes per node ensures the availability of the greatest capacity of the hardware in the event a process fails.

There are other benefits to having at least two server processes. Multiple processes ensure the highest availability during application upgrades. When the application server nodes are 4-way or greater, having multiple JVMs may provide higher throughput per node. This depends heavily on the nature of the application and the operating system.
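The two-node sizing rule stated earlier in this section follows from simple arithmetic: the nodes that survive a failure must together carry the peak load. The Python sketch below generalizes that arithmetic to more nodes as an illustration. The function name and the figure of 1000 requests per second are assumptions, not values from HVWS projects.

```python
def required_node_capacity(peak_load, nodes, tolerated_failures=1):
    """Capacity each node needs so the surviving nodes can carry the peak load."""
    survivors = nodes - tolerated_failures
    if survivors < 1:
        raise ValueError("at least one node must survive")
    return peak_load / survivors


# Two nodes: each must carry the whole (hypothetical) peak of 1000 requests/s.
print(required_node_capacity(1000, nodes=2))  # 1000.0
# Five nodes: each needs a quarter of the peak to tolerate one failure.
print(required_node_capacity(1000, nodes=5))  # 250.0
```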


Clustering using the Web server plug-in

This section discusses clustering using the functions available in the WebSphere Application Server, Version 5 Web server plug-in. Figure 3 depicts the typical deployment topology for a clustering scenario using the Web server plug-in. The plug-in provides weighted load balancing using a round robin or random request dispatching policy. The plug-in provides failover functionality through the detection of a failure in the application server process. The load balancing and failover functions for the application server tier are controlled through a plug-in XML file called plugin-cfg.xml and located on the Web server nodes.

Figure 3. WebSphere plug-in as the load balancer for tier 2 (diagram: Internet traffic reaches the Dispatcher nodes, which distribute it to Web Server Node 1 and Web Server Node 2, each running a web server with the plug-in; the plug-ins distribute requests to app servers 1 to 4 on App Server Node 1 and App Server Node 2, which connect to the back-end systems)

Several elements and attributes of the plug-in file control failover and load balancing. Figure 4 shows an excerpt of the plugin-cfg.xml file to refer to during this discussion.


plugin-cfg.xml (on each Web server node)

    <Config RefreshInterval="240">
      <ServerCluster Name="Cluster1" LoadBalance="Random" RetryInterval="60">
        <Server Name="server1" LoadBalanceWeight="2" ConnectTimeout="5">
          <Transport Hostname="AppNode1" Port="9080" Protocol="http"/>
        </Server>
        <Server Name="server2" LoadBalanceWeight="2" ConnectTimeout="5">
          <Transport Hostname="AppNode1" Port="9081" Protocol="http"/>
        </Server>
        <Server Name="server3" LoadBalanceWeight="2" ConnectTimeout="5">
          <Transport Hostname="AppNode2" Port="9080" Protocol="http"/>
        </Server>
        <Server Name="server4" LoadBalanceWeight="2" ConnectTimeout="5">
          <Transport Hostname="AppNode2" Port="9081" Protocol="http"/>
        </Server>
      </ServerCluster>
    </Config>

Figure 4. Plugin-cfg.xml load balancing attributes

The ServerCluster element groups the redundant application processes as defined by the cluster and the cluster members in the system administration client. Each process, or cluster member, is represented by a Server element in the ServerCluster element. The attributes of the ServerCluster element that control load balancing and failover are LoadBalance and RetryInterval:

LoadBalance: Specifies the load balancing algorithm: can be either round-robin or random. The default algorithm is round-robin and works for most load balancing scenarios.

RetryInterval: Specifies the length of time that elapses from the time that a server process is marked down to the time that the plug-in retries a connection to that process. The default is 60 seconds.

The attributes of the Server element that control load balancing and failover are ConnectTimeout and LoadBalanceWeight:

ConnectTimeout: Allows the plug-in to perform nonblocking connections with the application server by specifying a value in seconds, after which if there is no response to a connection request, the application server process is marked down. Nonblocking connections are beneficial when the plug-in is unable to contact the destination to determine if the port is available. A value between 5 and 10 should be reasonable for this attribute.

By default no ConnectTimeout value is specified, and in this case the plug-in performs a blocking connect during which time the plug-in waits until an operating system timeout. After the operating system timeout, which can be several minutes, the application server is then marked down. A value of 0 also causes the plug-in to perform a blocking connect.

LoadBalanceWeight: The weight associated with this server when the plug-in does weighted round robin load balancing. The value is set when adding a member, or duplicate application server instance, to a cluster using an administration client. The default value in the administration Web client is 2.


As the plug-in performs load balancing it selects a server and decrements the selected server’s weight. When a server's weight reaches zero, no more requests are routed to that server until all servers in the cluster have a weight of zero. After all servers reach zero, the weights for all servers in the cluster are reset and the process repeats.

LoadBalanceWeight can be used to accommodate a mismatch in hardware capacity in the application server tier.
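The decrement-and-reset behavior described above can be sketched in a few lines of Python. This is a simplified model of the description in this section, not the plug-in's actual implementation. The class name and the server weights are hypothetical, and the sketch ignores marked-down servers and session affinity.

```python
class WeightedRoundRobin:
    """Simplified model: decrement on selection, reset when all weights hit zero."""

    def __init__(self, weights):
        self.configured = dict(weights)  # LoadBalanceWeight per server
        self.remaining = dict(weights)   # weights left in the current cycle
        self.order = list(weights)
        self.next_index = 0

    def pick(self):
        # After all servers reach zero, reset the weights and repeat.
        if all(weight == 0 for weight in self.remaining.values()):
            self.remaining = dict(self.configured)
        # Walk the ring, skipping servers whose weight is already zero.
        for _ in range(len(self.order)):
            name = self.order[self.next_index]
            self.next_index = (self.next_index + 1) % len(self.order)
            if self.remaining[name] > 0:
                self.remaining[name] -= 1
                return name
        raise RuntimeError("no server has a positive configured weight")


# A node with twice the capacity gets twice the weight.
balancer = WeightedRoundRobin({"server1": 2, "server2": 1})
picks = [balancer.pick() for _ in range(6)]
print(picks.count("server1"), picks.count("server2"))  # 4 2
```

Over each full cycle the larger server receives requests in proportion to its weight, which is how the attribute compensates for mismatched hardware.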

The elements and attributes of the plug-in XML file are set from an administration client, with some exceptions. When the plug-in file generation process is invoked, the file is placed in the WASROOT\config\cells directory by default. When the Web server is on a node that is physically separate from the application server, the file needs to be transferred from the deployment manager node to the Web node. See the appendix for an excerpt from a sample script for moving and modifying the file.

After the plug-in file is updated in place, the changes can take effect without restarting the Web server as the plug-in “refreshes” itself by rereading the plug-in XML file every sixty seconds by default. The refresh interval can be changed by manually modifying the RefreshInterval of the Config element.
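Before a regenerated file is copied to the Web nodes, a quick check of the attributes discussed in this section can catch omissions. The following Python sketch is not one of the paper's sample scripts. The function audit and the SAMPLE text are hypothetical, and the sketch assumes the element and attribute names shown in Figure 4. It reports the refresh interval and lists servers that would fall back to a blocking connect.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt; server2 deliberately has no ConnectTimeout.
SAMPLE = """<Config RefreshInterval="240">
  <ServerCluster Name="Cluster1" LoadBalance="Random" RetryInterval="60">
    <Server Name="server1" LoadBalanceWeight="2" ConnectTimeout="5">
      <Transport Hostname="AppNode1" Port="9080" Protocol="http"/>
    </Server>
    <Server Name="server2" LoadBalanceWeight="2">
      <Transport Hostname="AppNode1" Port="9081" Protocol="http"/>
    </Server>
  </ServerCluster>
</Config>"""


def audit(xml_text):
    """Report the refresh interval and servers that use a blocking connect."""
    root = ET.fromstring(xml_text)
    findings = {
        # The text above gives sixty seconds as the default refresh interval.
        "refresh_interval": int(root.get("RefreshInterval", "60")),
        "blocking_connect": [],
    }
    for server in root.iter("Server"):
        # A missing ConnectTimeout, or a value of 0, means a blocking connect.
        if int(server.get("ConnectTimeout", "0")) == 0:
            findings["blocking_connect"].append(server.get("Name"))
    return findings


print(audit(SAMPLE))
```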

The Web server plug-in can be used as a load balancer in these scenarios:

For both stateless and stateful application server tiers. For a stateless application server tier, a standalone load balancer between the Web and application tiers may provide easier system management. See “Clustering using an external load balancer.” Some load balancers also support stateful application server tiers using the active cookie affinity feature.

When the number of Web servers is four or fewer. Using scripts to update the plug-in file can become prone to human error when many Web servers are involved. One way to avoid this issue is to share the same file among the Web servers through a shared disk.

For dynamic content caching through the plug-in’s support for edge side fragment caching.

When it is desired to use the IBM HTTP Server as the Web server for static content caching.

Clustering using an external load balancer

There are two alternatives to using the Web server plug-in for clustering. Both alternatives use the Edge components that ship with WebSphere Application Server Network Deployment, Version 5.

The first option uses the Dispatcher Edge component, as depicted in Figure 5. Essentially Dispatcher virtualizes the application server tier in the same way it can be used to virtualize the Web site as a single server or point of presence. The external load balancer distributes the load across at least two ports on each physical application server. This option can also work with load balancers such as F5’s BigIP and Cisco’s Content Service Switch (CSS) devices.


Figure 5. External load balancer for application server load balancing (diagram: Web Server Node 1 and Web Server Node 2, each running a web server with the plug-in, send requests to the Dispatcher nodes on port 8080; Dispatcher distributes them to app server 1 on port 9080 and app server 2 on port 9081 on App Server Node 1 and App Server Node 2, which connect to the back-end systems)

Here are configuration options to consider for this scenario:

Configure Dispatcher to use the network address port translation (NAPT) capability due to multiple processes listening on unique ports for each node. For more information, refer to the Load Balancer Administration Guide at www.ibm.com/software/webservers/appserv/doc/v50/ec/infocenter/index.html.

For high-volume Web sites, consider gigabit Ethernet cards to prevent network bottlenecks at the dispatcher component.

Test to determine maximum capacity of not only the network interfaces but also the CPUs of the dispatcher nodes.

Set the ClusterAddress element of the plugin-cfg.xml file to the cluster IP address of the load balancer. The ClusterAddress element has the same attributes as the Server element, but only one ClusterAddress can be defined within a ServerCluster. When a ClusterAddress is defined, the plug-in does not perform load balancing itself.
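As an illustration of the last option, the Python sketch below adds a ClusterAddress to a ServerCluster and enforces the one-per-cluster rule. It assumes that ClusterAddress carries a Name attribute and a nested Transport in the same way the Server elements in Figure 4 do, which follows from the statement that it has the same attributes. The address 10.0.0.50 and the name dispatcher are hypothetical, and port 8080 is the Dispatcher port shown in Figure 5.

```python
import xml.etree.ElementTree as ET


def add_cluster_address(cluster, name, hostname, port):
    """Point a ServerCluster at the load balancer's cluster address."""
    if cluster.find("ClusterAddress") is not None:
        raise ValueError("a ServerCluster may define only one ClusterAddress")
    # Assumed layout: same shape as a Server element with its Transport.
    address = ET.SubElement(cluster, "ClusterAddress", Name=name)
    ET.SubElement(address, "Transport",
                  Hostname=hostname, Port=str(port), Protocol="http")
    return address


cluster = ET.Element("ServerCluster", Name="Cluster1")
add_cluster_address(cluster, "dispatcher", "10.0.0.50", 8080)
print(ET.tostring(cluster, encoding="unicode"))
```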

The advantages of using an external load balancer for clustering are:

Modifying the configuration of the dispatcher machine can mark down individual instances of the application server. There is less chance of plug-ins being out of synchronization.

Scripts are not needed. Instead, use the administration GUI or simple commands to mark down the application servers.

More sophisticated load balancing algorithms are available than with the Web server plug-in.

In combination with stateless applications, an external load balancer provides the most scalable and manageable option of all the scenarios.

The external load balancer’s monitoring facilities can be used to easily check the state (marked up or marked down) of an application server, as opposed to “looking” inside plugin-cfg.xml files.

Fine grain load balancing using custom advisors is an option.

The disadvantage of an external load balancer is the difficulty of supporting stateful application server tiers. There are external load balancers from F5 and Cisco that support session affinity using a technique known as active cookie affinity. With active cookie affinity, the load balancer sets a cookie in the request before sending the request to a server. The cookie contains the name of the server that the load balancer selects. Subsequent requests that have this cookie indicate where the load balancer should send the request. Because inspecting cookies increases the load on the load balancer, this type of configuration should be thoroughly tested to determine the performance limits of the load balancer.
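The routing decision behind active cookie affinity can be modeled in a few lines. The Python sketch below is a conceptual illustration only and does not represent any vendor's product. The cookie name lb_server and the server names are hypothetical, and cookies are modeled as a plain dictionary.

```python
import itertools


class CookieAffinityBalancer:
    """Conceptual model of active cookie affinity."""

    COOKIE = "lb_server"  # hypothetical cookie name

    def __init__(self, servers):
        self.servers = list(servers)
        self._ring = itertools.cycle(self.servers)

    def route(self, cookies):
        """Return (server, cookies) for one request."""
        server = cookies.get(self.COOKIE)
        if server not in self.servers:  # no cookie yet, or unknown server
            server = next(self._ring)   # plain round robin for new users
        return server, {**cookies, self.COOKIE: server}


balancer = CookieAffinityBalancer(["app1", "app2"])
first, cookies = balancer.route({})      # new user: a server is selected
again, _ = balancer.route(cookies)       # returning user: same server
print(first, again)  # app1 app1
```

Every request requires a cookie lookup, which is the extra per-request work that makes thorough performance testing of the load balancer advisable.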

Clustering using the Dispatcher and Content Based Routing components

This section covers the second alternative to using the Web server plug-in for clustering. It uses three Edge components: Dispatcher, Content Based Routing (CBR), and Caching Proxy (Figure 6) and supports both stateful and stateless architectures.

Figure 6. Clustering scenario using Dispatcher and Content Based Routing (diagram: Load Balancer Node 1 runs the primary Dispatcher with CBR and Load Balancer Node 2 runs the backup Dispatcher with CBR; Web Server Node 1 and Web Server Node 2 each run a web server; App Server Node 1 and App Server Node 2 connect to the back-end systems)

CBR and the required Caching Proxy component (not shown) serve as the Web server tier in this architecture. CBR uses the plugin-cfg.xml configuration file in the same format as required by the Web server plug-in. With this capability, this configuration functions well for both stateful and stateless middle tiers. CBR provides several commands for managing the plug-in XML configuration file.

The primary advantage of this scenario over the Web server plug-in is ease of administration. For a stateless application, however, a kernel based load balancer may provide better performance.

Here are configuration options to consider for this scenario:

Configure Caching Proxy the same on all of the load-balancing servers. To improve the overall accessibility to the Web pages on the back-end servers, set up Caching Proxy to do memory caching.

Define the cluster address and ports to be the same in CBR and Dispatcher.

Configure CBR the same across all load-balancing servers.

Configure one load balancer as the primary high availability machine for one dispatcher component and the other as the standby.

Super scalable clustering scenarios

This section reviews some high level approaches for scaling a Web site by using additional load balancers with the scenarios covered in the previous sections. The overriding strategy presented here is to segment Web site traffic by using special purpose load balancers in multi-tiered configurations. The special purpose load balancer at each tier can be the same product but configured differently to suit the function required.

Figure 7 is an aggregation of various scenarios. The selected configuration is for discussing different techniques and it’s not recommended that they be used together. The discussion follows the numbers shown in Figure 7. Each number represents a point in the logical network where load balancing can be used.

1. Several types of load balancing can be performed for multisite or content-based load balancing. For multisite load balancing, the load can be split based on client TCP/IP domain, or round robin for a stateless application. For content-based load balancing, the load can be logically split by function (URL). This means that http://hostname.mycompany.com/app1 goes to a different logical pool of servers than http://hostname.mycompany.com/app2. Dispatcher and Site Selector, working together with the ISP DNS, can provide this type of load balancing. Products from Cisco, F5, and Resonate also provide this function.

2. Load balancing provides a way to split the load for the same application, for example, app1, into two separate server pools. Typically app1 is an extremely high-volume application. Splitting the traffic may be necessary when load balancers at level 3 cannot handle the required throughput due to either network or CPU capacity. Another valid use is for major migrations of hardware, operating system, WebSphere, or application. The load balancing function at this level can be provided by a simple DNS round robin or by configuring Dispatcher in media access control (MAC) forwarding mode.

3. The load balancers provide the function described in the section Clustering using an external load balancer. Using multiple instances of this load balancer in combination with the load balancing at level 2 provides horizontal scalability and migration flexibility.

4. The load balancers are as described in the section Clustering using the Dispatcher and Content Based Routing components. The configuration using the WebSphere Application Server HTTP Server plug-in can also be substituted at this level.
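The split by function (URL) described for level 1 amounts to a lookup from URL path prefix to server pool. The Python sketch below illustrates that lookup only. The pool names and the select_pool function are hypothetical and do not represent how Dispatcher, Site Selector, or CBR rules are actually configured.

```python
from urllib.parse import urlsplit

# Hypothetical mapping of application path prefixes to logical server pools.
POOLS = {"/app1": "pool-1", "/app2": "pool-2"}


def select_pool(url, default="web-pool"):
    """Choose a logical server pool from the path of the requested URL."""
    path = urlsplit(url).path
    for prefix, pool in POOLS.items():
        # Match the prefix itself or anything beneath it, but not /app10.
        if path == prefix or path.startswith(prefix + "/"):
            return pool
    return default


print(select_pool("http://hostname.mycompany.com/app1"))       # pool-1
print(select_pool("http://hostname.mycompany.com/app2/cart"))  # pool-2
```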


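Figure 7. Multi-tier load balancing for scalability and availability (diagram: load balancers, labeled LB, sit at the numbered points 1 to 4; traffic for app1, app2, or both is directed to the Web Server Pool, to Web or Caching Proxy Servers, or to Web or CBR with Caching Proxy Servers, and from there to App Server Pool 1A and App Server Pool 1B for app1 and App Server Pool 2 for app2; a Disaster Recovery Site is also shown)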

Maintenance and failover scenarios

This section covers maintenance scenarios for achieving continuous availability. Sample scenarios highlight the WebSphere Application Server HTTP Server plug-in scenario and capability, but the same can be accomplished for the other scenarios and load balancers. The following scenarios are covered:

Adding a new application server process

Removing an application server in a stateful environment

Application upgrade and rollback

Application server, network, and hardware failure

Sample scripts are provided in the appendix.


Adding a new application server process

The discussion applies to adding a new application server process, but it extends to adding a node because the procedure is the same whether the process runs on an existing node or a new node.

To add an application server:

1. Add a Server element within the ServerCluster element of the plug-in XML file, either by regenerating the plug-in from an administration client, or by manually updating the XML file.

2. Redeploy the new plug-in XML file on the Web server nodes. The Web server plug-in refreshes its configuration from the file. When the configuration is refreshed the Web server starts distributing work to the new application server process.

This process can be automated using a script. See the sample script provided in the appendix.
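For the manual update in step 1, the edit to the XML file can be sketched as follows. This Python sketch is not the appendix script, which uses the WebSphere administration scripting language. The function add_server, the server name server5, and the host AppNode3 are hypothetical, and the element layout is assumed to match Figure 4.

```python
import xml.etree.ElementTree as ET


def add_server(xml_text, cluster_name, server_name, hostname, port,
               weight=2, connect_timeout=5):
    """Return plug-in XML with one more Server in the named ServerCluster."""
    root = ET.fromstring(xml_text)
    cluster = root.find(f"ServerCluster[@Name='{cluster_name}']")
    if cluster is None:
        raise ValueError(f"no ServerCluster named {cluster_name}")
    if cluster.find(f"Server[@Name='{server_name}']") is not None:
        raise ValueError(f"{server_name} is already in {cluster_name}")
    server = ET.SubElement(cluster, "Server", Name=server_name,
                           LoadBalanceWeight=str(weight),
                           ConnectTimeout=str(connect_timeout))
    ET.SubElement(server, "Transport",
                  Hostname=hostname, Port=str(port), Protocol="http")
    return ET.tostring(root, encoding="unicode")


BASE = '<Config><ServerCluster Name="Cluster1"></ServerCluster></Config>'
updated = add_server(BASE, "Cluster1", "server5", "AppNode3", 9080)
print(updated)
```

The returned text would then be written to the file on each Web server node, where the plug-in picks it up at its next refresh.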

Like most load balancers, Dispatcher and CBR have a feature to dynamically add a new application server instance to a cluster without restarting the load balancer.

Removing an application server in a stateful environment

One way to remove an application server process is to stop it using an administrative client or script. The plug-in then fails to make new connections to the process and marks the server down. Transactions in progress complete before the server stops. However, simply stopping the process causes the plug-in to keep checking the server, which can increase response time.

A better approach to stopping an application server is to drain it:

1. Mark the application server instance down by setting its LoadBalanceWeight to 0 in the Web server plugin-cfg.xml file. The Web server plug-in stops load balancing to this server but continues to send users with existing sessions to the server.

2. Monitor the number of active sessions using IBM Tivoli® Performance Viewer, included with WebSphere, until all active sessions have ended.

3. Stop the application server. A sample script is provided in the appendix.
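The weight change in the first drain step can be sketched in Python as follows. This is only an illustration of editing the LoadBalanceWeight attribute directly; the appendix scripts instead change the cluster member weight through wsadmin and regenerate the plug-in file. The sample XML and server names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed-down plug-in configuration used only for illustration.
SAMPLE_PLUGIN_CFG = """<Config>
  <ServerCluster Name="AppCluster">
    <Server Name="app1a" LoadBalanceWeight="2"/>
    <Server Name="app1b" LoadBalanceWeight="2"/>
  </ServerCluster>
</Config>"""


def set_weight(cfg_xml, server_name, weight):
    """Return the plug-in XML with LoadBalanceWeight changed for one server."""
    root = ET.fromstring(cfg_xml)
    changed = 0
    for cluster in root.iter("ServerCluster"):
        for server in cluster.findall("Server"):
            if server.get("Name") == server_name:
                server.set("LoadBalanceWeight", str(weight))
                changed += 1
    if changed == 0:
        raise ValueError("no Server named %r" % server_name)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Weight 0 takes app1b out of load balancing while it drains.
    print(set_weight(SAMPLE_PLUGIN_CFG, "app1b", 0))
```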

The same can be accomplished with Dispatcher by using the command dscontrol manager quiesce.

This procedure can be useful when there is a need to perform hardware maintenance, for example replacing or upgrading CPU/memory. This procedure should be followed for each application process running on the node.
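The waiting part of the drain procedure can be sketched in Python as a polling loop with a timeout, mirroring the structure of the drain script in the appendix. This is a sketch under stated assumptions: get_live_sessions and stop_server are hypothetical callables standing in for the session monitoring and administrative stop calls, and the server is assumed to have been marked down (weight 0) before drain is called.

```python
import time


def drain(get_live_sessions, stop_server, time_to_wait,
          poll_interval=10.0, clock=time.monotonic, sleep=time.sleep):
    """Wait until no sessions remain or time_to_wait seconds pass, then stop.

    Returns the session count observed just before the server was stopped,
    so the caller can tell a clean drain (0) from a timeout (> 0).
    """
    deadline = clock() + time_to_wait
    live = get_live_sessions()
    while live > 0 and clock() < deadline:
        sleep(poll_interval)
        live = get_live_sessions()
    stop_server()
    return live


if __name__ == "__main__":
    # Simulated session counts; a real caller would query the server.
    samples = iter([3, 1, 0])
    left = drain(lambda: next(samples), lambda: print("server stopped"),
                 time_to_wait=5, poll_interval=0.01)
    print("sessions remaining at stop:", left)
```

Returning the last observed count leaves the decision about a forced stop after a timeout to the caller, which may want to log or alert in that case.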

Application upgrade and rollback

In the experience of HVWS, application upgrade and rollback is the most frequent maintenance scenario. This section describes a generic scenario that can be customized as the environment and requirements vary. Note that if user session state is kept either in the back end (tier 3) or in the browser, and the application server tier is stateless, care must be used when implementing this procedure. In this case the application must provide backward compatibility between releases, because multiple requests from the same user could fall across two releases of the application during the switchover. The alternative is to have a small window, say about one hour, for application upgrades when all users are forced off the system.


Deploying applications and restarting processes puts a load on the hardware. Therefore, to reduce the risk of affecting response time, perform the procedure during off-peak hours.

One advantage of this procedure is that it can be performed while maintaining most of the system capacity. To keep full capacity of the system, the recommended deployment architecture for the application server should resemble Figure 8.

[Diagram not reproduced. Labels: Web Server Node (web server + plug-in); App Server Node 1; App Server Node 2; connections from each application server node to back-end systems.]

Figure 8. Detail configuration to support application upgrade and rollback

The configuration details are as follows:

- The application is deployed on a cluster running on at least two application server nodes
- There must be at least two cluster members defined on each application server node
- The memory capacity of the application server nodes must be capable of supporting two concurrently running instances of the application and its server

The manual steps for rolling out the new application are as follows, but many can be automated based on the scripts provided in the appendix:

1. Mark the even numbered servers down
2. Stop the even numbered cluster members
3. Deploy the new application to the cluster
4. Deploy static content if necessary on the Web server tier (separate process for this)
5. Disable the old application on the cluster
6. Start the even numbered cluster members
7. Test the application by going through an optional staging Web server
8. Mark the odd numbered servers down and the even numbered servers up
9. Drain the users from odd numbered application servers (see appendix for script)
10. Restart the odd numbered application servers after they’ve been drained
11. Mark the odd numbered application servers up
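The ordering of the rollout steps can be sketched in Python as a plan generator. This is an illustrative model only: the member names are hypothetical, and it assumes cluster members are numbered by position so that every node hosts both an odd and an even numbered member. Step 8 is shown as two entries, although the source applies both changes together in one update.

```python
def split_members(members):
    """Split cluster members into odd and even groups by 1-based position."""
    odd = members[0::2]    # positions 1, 3, 5, ...
    even = members[1::2]   # positions 2, 4, 6, ...
    return odd, even


def rollout_plan(members):
    """Return the rollout steps as (action, target members) tuples, in order."""
    odd, even = split_members(members)
    return [
        ("mark down", even),
        ("stop", even),
        ("deploy new application to cluster", []),
        ("deploy static content to Web server tier if necessary", []),
        ("disable old application on cluster", []),
        ("start", even),
        ("test through optional staging Web server", even),
        # Step 8: both changes are applied together in one update.
        ("mark up", even),
        ("mark down", odd),
        ("drain", odd),
        ("restart", odd),
        ("mark up", odd),
    ]


if __name__ == "__main__":
    for action, targets in rollout_plan(["m1", "m2", "m3", "m4"]):
        print(action, ", ".join(targets))
```

Walking through the plan shows that at least half of the cluster members stay in rotation at every step, which is why the procedure keeps most of the system capacity.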

If a roll back is necessary:

1. Enable the older version of the application
2. Disable the new version of the application
3. Mark the even numbered application servers down
4. Restart the even numbered application servers
5. Mark the even numbered application servers up and the odd numbered application servers down
6. Restart the odd numbered application servers
7. Mark the odd numbered application servers up

Application server, network, or hardware failure

If the plug-in fails to make a connection to an application server process, it marks the application server as unavailable and redirects the request to another application server in the cluster. The plug-in retries the unavailable server after the RetryInterval expires. The plug-in can fail to connect to the server due to a network, hardware, or process failure. For the CBR Edge component, the WLMServlet advisor should be used to provide server up or down status. A similar feature exists in most external HTTP load balancers.
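The mark-down and retry behavior described above can be modeled with a small Python sketch. This is a toy model, not the plug-in's actual code: the class and method names are hypothetical, plain round robin stands in for the plug-in's load balancing, and connect is a hypothetical callable that reports whether a connection succeeded.

```python
import time


class PluginRouter:
    """Toy model of marking servers unavailable and retrying them later."""

    def __init__(self, servers, retry_interval, clock=time.monotonic):
        self.servers = list(servers)
        self.retry_interval = retry_interval  # seconds before a retry
        self.clock = clock
        self.down_since = {}  # server -> time it was marked unavailable
        self._next = 0

    def _eligible(self, server):
        """A server is eligible if it is up or its retry interval has expired."""
        marked = self.down_since.get(server)
        return marked is None or self.clock() - marked >= self.retry_interval

    def send(self, connect):
        """Try each server at most once; return the one that accepted."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if not self._eligible(server):
                continue
            if connect(server):
                self.down_since.pop(server, None)  # server is healthy again
                return server
            self.down_since[server] = self.clock()  # mark unavailable
        raise RuntimeError("no application server available")


if __name__ == "__main__":
    router = PluginRouter(["app1a", "app1b"], retry_interval=60)
    # app1a refuses connections, so the request is redirected to app1b.
    print(router.send(lambda server: server != "app1a"))
```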

Summary of best practices

- In designing the application server tier, consider keeping it stateless, that is, with no state management by the application server tier. State can then be handled on the client or in a back-end database. Advantages of stateless over stateful are:
  o Best horizontal scalability
  o Higher availability in maintenance and failover scenarios
  o Overall ease of system management, for example adding and removing servers in a cluster

- Run multiple instances of an application per node. There are two possible benefits:
  o If one instance fails, the hardware capacity is not totally lost
  o For large SMP machines (four CPUs or more), higher transaction throughput may be realized

- Deploy critical applications in their own application server instance and, for small machines, keep applications on physically separate nodes. This configuration provides higher application reliability.

- For performance reasons, do not allow application servers to serve static content. However, to keep maintenance simple, deploy static content with the application EARs and use edge side caching (ESI) features either in the Web server, in the Caching Proxy Edge component, or in an external caching service provider to serve the content. This is appropriate when there is strong interdependence between the static and dynamic content for each application release or upgrade level. Maintenance is simple because static content is deployed in the same step as the application, keeping the static and dynamic content association.

- Create scripts to automate as many processes as possible to eliminate human error during critical procedures.

- To improve response time, serve static and dynamic content from the same Web servers when the content makes up the same Web page. This makes full use of keep alive by ensuring that the browser does not have to connect to a different cluster for static and dynamic content.

- Ensure that in a cluster (Web or application) consisting of two physical nodes, each node is capable of handling the site’s peak load if the other node fails.

- For better scalability and performance, configure load balancers for special purposes. For example, multiple tiers of load balancers can be used to improve scalability. Tier one does content-based routing to multiple pools of servers while multiple load balancers at tier two perform round robin to each individual pool of servers.
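The two-tier arrangement in the last practice can be sketched in Python as follows. This is a simplified model, not how Dispatcher or CBR is configured: the class names, URL prefixes, and server names are hypothetical, and tier one is reduced to path-prefix matching.

```python
from itertools import cycle


class Pool:
    """Tier two: round robin across the servers of one pool."""

    def __init__(self, servers):
        self._servers = cycle(servers)

    def pick(self):
        return next(self._servers)


class ContentRouter:
    """Tier one: choose a pool based on the request path."""

    def __init__(self, rules, default):
        self.rules = rules      # list of (path prefix, Pool) pairs
        self.default = default  # pool used when no rule matches

    def route(self, path):
        for prefix, pool in self.rules:
            if path.startswith(prefix):
                return pool.pick()
        return self.default.pick()


if __name__ == "__main__":
    app1 = Pool(["a1", "a2"])
    app2 = Pool(["b1"])
    router = ContentRouter([("/app1/", app1), ("/app2/", app2)], default=app1)
    for path in ["/app1/home", "/app2/cart", "/app1/search"]:
        print(path, "->", router.route(path))
```

Keeping the content rules in tier one and the simple rotation in tier two means a pool can grow or shrink without touching the routing rules.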

Conclusion and future directions

This paper covers what can be achieved with WebSphere Application Server Network Deployment Version 5. The scenarios presented demonstrate techniques that can be combined and enhanced to support extremely sophisticated configurations.

Maintaining continuous availability is still a fairly manual process, but the future is all about automation. One example of the future direction is the dynamic load balancing function provided by the IBM software group’s IBM Server Allocation for WebSphere Application Server. IBM Server Allocation for WebSphere technology will make it even easier to virtualize a WebSphere server environment by providing advanced automation and load balancing functions among WebSphere applications and compute intensive workloads. Version 1 supports a WebSphere application workload and a compute-intensive low priority workload. Future versions will load balance multiple WebSphere application server workloads.

IBM will continue to deliver new offerings and road maps to enable its customers to convert to an e-business on demand operating environment. IBM is ready to help enterprises transform to an e-business on demand environment regardless of where they are today. Additional information about this technology and other IBM e-business on demand offerings can be found at www.ibm.com/e-business.


Appendix A. Sample scripts

This section contains procedures that can be combined for different maintenance scenarios. The “drain application server procedure” is constructed from the following subprocedures:

Drain Application Server Procedure

- changeServerWeightOnly
- regeneratePluginConfig
  o getShellEnvVariable
- updateWebPlugin
- getLiveSessionsCount
  o parsePMIdataForLiveSessions

Drain Application Server Procedure

#--------------------------------------------------------------------
# Drain application server of users by manipulating server weight and
# regenerating WebSphere plug-in. Wait until live user sessions reaches 0 or
# timeToWait expires on the application server being drained and then stop
# the server.
#--------------------------------------------------------------------
changeServerWeightOnly $clusterServerName $newWeight
regeneratePluginConfig $pluginCfgName $srcPluginPath
updateWebPlugin $pluginCfgName $srcPluginPath $desPluginPath $webServerList
puts "Waiting for plugin to read new config file."
after 30000
set activeSessionCount [getLiveSessionsCount $clusterServerName]
puts "Currently there are $activeSessionCount sessions active on $clusterServerName"
# Poll every 10 seconds until the sessions end or timeToWait expires
set expireTime [clock seconds]
incr expireTime $timeToWait
puts "Current TimeToWait set to $timeToWait seconds"
while {$activeSessionCount > 0 && $expireTime > [clock seconds]} {
    after 10000
    set activeSessionCount [getLiveSessionsCount $clusterServerName]
    puts "Currently there are $activeSessionCount sessions active on $clusterServerName ([expr {$expireTime - [clock seconds]}]s left)"
}
# Stop the server only if it is registered and currently started
set server [$AdminControl completeObjectName cell=[$AdminControl getCell],node=$nodeName,name=$clusterServerName,type=Server,*]
if {[string compare $server ""] != 0} {
    set serverState [$AdminControl getAttribute $server state]
    puts "$clusterServerName current state is $serverState"
    if {[string compare $serverState "STARTED"] == 0} {
        $AdminControl stopServer $clusterServerName
    }
} else {
    puts "Server $clusterServerName is not running on node $nodeName"
}

changeServerWeightOnly

#--------------------------------------------------------------------
# Change the weight of the server so that plug-in will direct traffic
# accordingly
#--------------------------------------------------------------------
if {$newWeight >= 0} {
    set clusterServer [$AdminConfig getid /ClusterMember:$clusterServerName/]
    $AdminConfig modify $clusterServer [list [list weight $newWeight]]
}

regeneratePluginConfig

#--------------------------------------------------------------------
# Regenerate the Web Server plugin
#--------------------------------------------------------------------
lappend pluginOptionsList [getShellEnvVariable "WAS_ND_HOME"]
lappend pluginOptionsList [getShellEnvVariable "WAS_ND_CONFIG"]
lappend pluginOptionsList [$AdminControl getCell]
lappend pluginOptionsList null
lappend pluginOptionsList null
lappend pluginOptionsList $pluginCfgName
if {[catch {$AdminControl completeObjectName type=PluginCfgGenerator,*} pluginGenerator]} {
    # Appropriate error handling code
}
$AdminControl invoke $pluginGenerator generate $pluginOptionsList
# Replace the Network Deployment name with the base name in the generated file
set ndName [getShellEnvVariable "ND_NAME"]
set baseName [getShellEnvVariable "BASE_NAME"]
exec ex -c %s/$ndName/$baseName/ -c wq $srcPluginPath/$pluginCfgName

getShellEnvVariable

#------------------------------------------------------------------------
# Retrieve environment variables that were set in a calling shell environment
#------------------------------------------------------------------------
if {[catch {exec env} variables]} {
    puts "Error getting Environment Variables"
    puts "Error Message = $variables"
    return
}
# Check each NAME=value line in turn so that the match does not depend on
# the variable being first in the env output
set envValue ""
foreach line [split $variables "\n"] {
    if {[regexp -nocase -- "^$envName=(.*)" $line tempStr tempValue]} {
        regsub -all {\"} $tempValue {} envValue
        break
    }
}
return $envValue


updateWebPlugin

#----------------------------------------------------------------------
# Update Web server with the new plugin
#----------------------------------------------------------------------
set cfgFile $srcPluginPath
append cfgFile /$pluginCfgName
exec chmod a+r $cfgFile
foreach webServer $webServerList {
    exec rcp -p $cfgFile $webServer:$desPluginPath/$pluginCfgName
}
puts "-----> Done updating plugins"

getLiveSessionsCount

#----------------------------------------------------------------------
# Determine number of live sessions for a given server
#----------------------------------------------------------------------
set perfName [$AdminControl completeObjectName type=Perf,process=$clusterServerName,*]
set perfObjName [$AdminControl makeObjectName $perfName]
# Get the complete name of the server we are monitoring
set serverName [$AdminControl completeObjectName type=Server,process=$clusterServerName,*]
# set the parameters for the invoke JMX call
set params [java::new {java.lang.Object[]} 2]
$params set 0 [$AdminControl makeObjectName $serverName]
$params set 1 [java::new java.lang.Boolean true]
# set the signatures for the invoke JMX call
set sigs [java::new {java.lang.String[]} 2]
$sigs set 0 javax.management.ObjectName
$sigs set 1 java.lang.Boolean
# Invoke the call to get the PMI Stats Object
set object [$AdminControl invoke_jmx $perfObjName getStatsObject $params $sigs]
# Cast the object to a Stats object to be safe
set stats [java::cast com.ibm.websphere.pmi.stat.Stats $object]
# We have the Stats object of all the PMI data on the server. We
# need to just get the SessionsModule data
set sessionStats [$stats getStats "servletSessionsModule"]
regexp "id=7.*current=(\\d*)" [$sessionStats toString] tempStr liveSessionsCount
return $liveSessionsCount


References

WebSphere InfoCenters

- WebSphere InfoCenter for WebSphere Application Server Network Deployment 5.0 at publib.boulder.ibm.com/infocenter/wasinfo/index.jsp
- WebSphere InfoCenter for Edge Components at www.ibm.com/software/webservers/appserv/doc/v50/ec/infocenter/index.html

White papers

- See all the HVWS white papers at www.software.ibm.com/wsdd/zones/hvws/library.html. Of special interest:
  o Prepare Your Web Site for e-business on demand™
  o Maximize Web Site Availability
- System Administration for WebSphere Application Server V5 at www.software.ibm.com/wsdd/techjournal/0301_williamson/williamson.html
- Server Clusters For High Availability in WebSphere Application Server Network Deployment Edition 5.0, Hao Wang and Mark Bransford, an IBM white paper at www.ibm.com/support/docview.wss?uid=swg27002473


Acknowledgements

The major contributors to this document are David Draeger, Luis Ostdiek, John Reif, Ursula Richter, and Christopher Roach. Thanks also to Bill Hilf for valuable feedback and input.

Notices

Trademarks

The following are trademarks of International Business Machines Corporation in the United States, other countries, or both:

- e-business on demand
- IBM
- WebSphere
- Tivoli

Java™ and all Java-based trademarks and logos are trademarks or registered trademarks of Sun Microsystems, Inc. in the United States and other countries.

Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both.

Other company, product, and service names may be trademarks or service marks of others.

Special Notice

The information contained in this document has not been submitted to any formal IBM test and is distributed AS IS. The use of this information or the implementation of any of these techniques is a customer responsibility and depends on the customer’s ability to evaluate and integrate them into the customer’s operational environment. While IBM may have reviewed each item for accuracy in a specific situation, there is no guarantee that the same or similar results will be obtained elsewhere. Anyone attempting to adapt these techniques to their own environments does so at their own risk.
