Reducing Maintenance Downtime by 85%

Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

An Oracle Technical White Paper, May 2012




Contents

Abstract
New Business, Higher Volume, More Patching
Downtime before Improvements
Identifying Component Causes of Downtime
Reducing Downtime Component Causes
Results
Future Direction
Conclusion


Abstract

Oracle internally runs a variety of standard and custom applications. Over time, the maintenance required to keep these applications up to date had become both time-consuming and labor-intensive. Oracle's acquisition of Sun Microsystems exacerbated the problem with a 50% increase in system volume and additional requirements to support the hardware business. This additional load put pressure on a patching window that Oracle IT already regarded as unacceptably long.

Beginning in 2009, Oracle IT began an initiative to reduce downtime by automating regular system maintenance and software patching processes for both E-Business Suite and non-E-Business Suite applications. The changes made reduced downtime related to software patching by 85%. Most notably, Oracle was able to perform an upgrade of its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

This white paper describes Oracle's current internal methods for patch automation and patch optimization in a heterogeneous software environment, quantifies the resulting time savings, and describes the tools used to streamline the internal change control processes. It also recommends best practices that customers can use to improve their patching processes.

New Business, Higher Volume, More Patching

Before acquiring Sun Microsystems in 2009, Oracle was an enterprise software company with little experience in managing a hardware business. Buying Sun dropped Oracle head-first into the deep end of that pool. An influx of new users increased demand on Oracle's internal systems by over 50 percent, and an entirely new set of requirements arose out of the need to support the hardware line of business.

Sun came with over 1,000 internal legacy applications, many focused on manufacturing and the hardware supply chain, which could only be consolidated as Oracle's internal solutions were extended to take their place. New systems needed to be implemented within Oracle to support the distinct requirements of the hardware business, and systems already in place had to be upgraded to support the increased load. As a result, integrating Sun placed additional strain on a patching window that was already too long.

Downtime Before Improvements

Before Oracle IT began the effort to improve downtime, general maintenance took over 100 hours every quarter, as shown in Figure 1. Major upgrades, such as from one version of E-Business Suite to the next, could alone take more than 48 hours.

Although a significant amount of time and money was spent on patching, the financial impact on the rest of the business was much worse than this direct cost. Oracle is a 24/7 global company, and any downtime impeded business across the enterprise.


[Chart: quarterly downtime in hours, ranging up to roughly 200 hours, for Q3 2008 through Q2 2010]

Figure 1. Oracle Global Single Instance Downtimes per Quarter Prior to Downtime Reduction Initiative

This impact became much more material with the addition of the Sun hardware business. When systems went down, even for routine maintenance, manufacturing was severely impacted. Likewise, field service could not operate without visibility into the supply chain. Long patching windows predated Sun, but the additional load and increased consequences made it clear to Oracle IT that changes needed to be implemented quickly to reduce downtime. It was mandated that the system maintenance window be brought down from fifteen to three hours per week.

Identifying Component Causes of Downtime

Because maintenance downtime has multiple causes, Oracle IT began by identifying the factors that contributed the most to downtime and resource consumption at Oracle. Table 1 below summarizes these major contributing factors.

Factor | Description
Cold Patching | Environments had to be shut down to apply the overwhelming majority of patches.
Pre- and post-patching steps | Shutting down and starting up databases and mid tiers in preparation for patching took over 30 minutes of the maintenance window. Post-patching steps were performed sequentially.
Script performance | There were no official guidelines for patch developers on how quickly their patching scripts needed to execute. In addition, the E25K database server hardware in use at the time was inadequate for the required capacity.
Large, infrequent patch bundles | Patches were primarily applied in a quarterly bundle containing over 300 patches.
Custom patches | Custom patch application required custom scripts and manual steps.
Patch management | The approval process was complicated, and the patch management tool used to track patches was inefficient.

Table 1. Top Contributors to Patching Downtime at Oracle

Once the factors contributing to downtime had been identified, Oracle IT began process improvements to reduce the downtime caused by each factor. The following sections provide details.

Reducing Downtime Component Causes

Reducing Cold Patching

The largest single contributor to patching downtime was cold patching. Up until 2009, more than 99% of patches were applied only after shutting down the servers running and supporting the application being patched. But not all of the over 700 patches applied each quarter required cold patching. If more patches could be applied hot, while the affected systems continued to run, then this number could be brought down, directly reducing required downtime.

To help classify a patch as hot or cold, guidelines were developed based on the impact that patches had on the applications and on supporting systems. For example, a patch that was simply delivering new reports could be applied hot, while a patch that updated a database table structure or a critical PL/SQL package needed to go in cold.
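Guidelines of this kind lend themselves to a simple rules-based check. A minimal sketch, with illustrative object-type names (not Oracle's actual classification scheme):

```python
# Hypothetical sketch: classify a patch as "hot" or "cold" based on the
# kinds of objects it changes, in the spirit of the guidelines described
# above. The object-type names are illustrative only.

# Object types whose modification requires a cold (offline) patch.
COLD_OBJECT_TYPES = {"table_structure", "critical_plsql_package"}

def classify_patch(changed_object_types):
    """Return 'cold' if any changed object requires downtime, else 'hot'."""
    if COLD_OBJECT_TYPES & set(changed_object_types):
        return "cold"
    return "hot"

# A patch that only delivers new reports can go in hot...
print(classify_patch(["report"]))                     # hot
# ...while one touching a table structure must go in cold.
print(classify_patch(["report", "table_structure"]))  # cold
```

The key design point is that a single cold-class object forces the entire patch cold, which is why hot and cold changes were packaged separately.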

Based on these guidelines, the bulk of the hot patches could be packaged separately from the cold ones and applied on a weekly basis while systems continued to run. The percentage of cold patches dropped from over 99% in 2009 to less than 60% in 2010 and 2011. Since every patch applied hot rather than cold reduced downtime, downtime declined proportionately.

Figure 2. Percentage of Patches Applied Hot vs. Cold in 2009 and 2011

Speeding Up Pre- and Post-Patching Steps

During a patching window, supporting systems (the Concurrent Manager (CM), database instances, and application servers) had to be shut down before patching and then started up after, adding to downtime. These systems were being shut down in sequence, requiring each shutdown step to wait for the previous one to complete. For example, there are multiple database instances, and it could take as long as 10-15 minutes for each instance to shut down. This meant that 30 minutes or more of the maintenance window were eaten up just to shut down the database instances. The same issue occurred in reverse when starting up and readying supporting systems. In addition, other pre- and post-patching steps had to be performed, including disabling and re-enabling security triggers, removing temporary database tables, and recompiling invalid schema objects.

To speed up pre- and post-patching steps, the team identified steps that could either be shortened or performed in parallel instead of sequentially. For example, CM waited for all running processes to finish before it would shut down, so to speed shutdown, the team added a script that terminated all running processes after waiting for a few minutes. The 50-plus application servers were also shut down in parallel instead of sequentially, bringing total shutdown time for these servers to less than 10 minutes.
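The payoff of running shutdowns concurrently is that total wait time approaches the slowest single shutdown rather than the sum of all of them. A toy illustration in Python, with hypothetical server names and a simulated shutdown call standing in for real tooling:

```python
# Illustrative sketch of the sequential-vs-parallel shutdown tradeoff.
# shut_down_server is a stand-in for a real shutdown command; the server
# names are hypothetical, not Oracle IT's actual inventory.
from concurrent.futures import ThreadPoolExecutor
import time

def shut_down_server(name, seconds=0.01):
    time.sleep(seconds)          # simulate the time one shutdown takes
    return f"{name} down"

servers = [f"appserver{i:02d}" for i in range(50)]

# Sequential: total time is the sum of all 50 shutdowns.
# Parallel: total time approaches the single slowest shutdown.
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    results = list(pool.map(shut_down_server, servers))

print(len(results))  # all 50 servers shut down concurrently
```

The same pattern applies in reverse for parallel startups after patching.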

Oracle IT also began doing a 'shutdown abort' of database instances to speed up their shutdown process. Each instance was forced to shut down within one minute. Whereas in the past each instance was taken down only after the previous one had completed shutting down, multiple instances of the databases were now shut down in parallel, typically within 30 seconds of each other. Similarly, restarts of supporting systems and other steps needed to make these systems operational were shifted to being performed in parallel. As a result of these changes, the time taken to complete the pre- and post-patching steps dropped from over 20 hours a quarter in 2010 to less than 10 hours a quarter in 2011, a reduction that went directly to the bottom line of overall downtime. Table 2 provides a detailed breakdown of the components.

Area | Q2 2010 | Q3 2010 | Q4 2010 | Q1 2011 | Q2 2011 | Q3 2011 | Q4 2011
Pre-patching and shutdown steps (hrs) | * | * | * | 4 | 3 | 3 | 2
Post-patching and startup steps (hrs) | * | * | * | 12.5 | 6.5 | 7.5 | 7
Combined pre- and post-times (hrs) | 27 | 23 | 10 | 16.5 | 9.5 | 10.5 | 9

* Only combined pre- and post-patching statistics are available for these quarters.

Table 2. Downtime Caused by Pre- and Post-Patching Steps by Quarter

Increasing Patching Frequency

Patching efficiency was also improved by the rather counterintuitive process change of patching more often. The older maintenance process had centered on a quarterly release bundle, in which over 300 patches were applied during a single patching window each quarter.

Despite the apparent advantages of doing one large patching event per quarter, this process had serious unintended consequences. First, applying such a large bundle requires a large patching window, which can be more disruptive to business operations such as manufacturing and field services than several smaller ones. Second, a single quarterly window reduces business agility and does not allow for incremental changes. Third, patches would sometimes be rushed into the bundle before they were fully ready, to avoid missing these infrequent quarterly windows. If a rushed patch caused problems in the system, emergency patches would then have to be applied to correct the problems, causing additional downtime. Finally, applying so many patches simultaneously reduced accountability and made problems difficult to trace when they did occur.

Because the quarterly bundles had so many unintended consequences, the team found that they could actually achieve less downtime by patching more often. In the current process, patches are applied during regular weekly patching windows spread over the quarter. Figure 3 shows the more even spread of patches in 2011 when compared to 2010. To accommodate the increased patching frequency, the testing process had to be made more robust. This was achieved by introducing automated testing of critical application flows using the Oracle Application Testing Suite (OATS). OATS enables definition and management of the application testing process and validates application functionality.

Figure 3. Number of Patches Applied by Week, 2010-2011

Improving Patching Script Performance

Downtime also resulted from the poor performance of patch application scripts, which, in the absence of official tuning guidelines, often ran for over 30 minutes each. As part of the downtime reduction initiative, guidelines were put into place requiring patching scripts to be tuned so that every job within a submitted patch ran in under 10 minutes.
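A guideline like this can be enforced with a simple pre-release gate that flags any patch job whose last measured runtime exceeds the limit. A hypothetical sketch (the job names and timings are invented for illustration):

```python
# Sketch of a pre-release check for the 10-minute guideline: every job in
# a patch must have run under the limit in its last test execution.
# Job names and timing data are hypothetical.
MAX_JOB_SECONDS = 10 * 60

def jobs_over_limit(job_timings):
    """Return the jobs whose last measured runtime exceeds the limit."""
    return [job for job, secs in job_timings.items() if secs > MAX_JOB_SECONDS]

timings = {"update_gl_tables.sql": 540, "rebuild_index.sql": 1900}
print(jobs_over_limit(timings))  # ['rebuild_index.sql'] needs tuning
```

A gate like this shifts tuning effort to development time, before the patch ever reaches a production maintenance window.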

This mandate did consume some additional labor for script tuning. However, the team considered this labor a reasonable tradeoff, since it affected only direct IT costs, while downtime imposed much more significant costs across the company. With scripts tuned to run faster, the actual patching component of downtime was reduced. In addition, tuning of standard scripts benefitted customers who had to apply them later.

It should be noted that some of the improvement in script processing speed did not come from script tuning but rather from faster hardware. During the period of the downtime reduction initiative, the servers that run Oracle's Global Single Instance (GSI) of E-Business Suite were upgraded from a four-node Real Application Clusters (RAC) deployment running Sun Fire E25Ks to a three-node RAC deployment running Sun SPARC Enterprise M9000s. The new M9000 servers provided a significant performance boost compared to the previous E25K servers. The main drivers for this upgrade were the ability to handle increased load from the Sun acquisition and to improve GSI performance in normal operation. However, as a side benefit, the M9000s did indeed process patching scripts much faster.

Script tuning and faster hardware combined to dramatically reduce the time taken for the actual patch application steps. In 2010, patch application steps consumed over 50 hours per quarter. In 2011, this dropped to 4 hours per quarter.

Automating Custom Patching

Like most large enterprise software deployments, Oracle's own implementation of E-Business Suite contains custom code and application customizations. These in turn require custom patches, and a significant number of EBS patches applied at Oracle were custom. Furthermore, Oracle's internal footprint also includes a number of non-EBS applications such as Siebel, Agile, and Oracle Application Express (APEX). The manual application process for custom patches to EBS, and for any patches to non-EBS applications, was both time-consuming and labor-intensive.

Standard patches to EBS had always been applied using an automated tool called AutoPatch. AutoPatch applied all bug fixes in a patch, managed version checking, tracked changes, and allowed restart capability. But no such capabilities were in place for custom EBS or non-EBS patches, which had to be hand-executed or hand-placed by patching personnel into directories. Aside from using up resource hours, this added a layer of complexity and contributed to errors and quality issues.

The team started following the same process to build custom patches as was used for standard patches, so that custom patches could be applied using AutoPatch. They also developed functionality similar to AutoPatch into a shell script to automate application of custom non-EBS patches. These two tools allowed Oracle IT to apply custom patches in much the same way as standard ones.
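The key capabilities being recreated here (ordered execution, a persistent change history, and restart from the last completed step) can be sketched conceptually as follows. This is an illustration of the pattern only, not Oracle's actual script, and all names are hypothetical:

```python
# Conceptual sketch of AutoPatch-style behavior for non-EBS patches:
# run the steps of a patch in order, record progress after each step,
# and on a rerun skip whatever already completed (restart capability).
import json
import pathlib
import tempfile

def apply_patch(steps, state_file):
    """Run (name, action) patch steps in order, persisting completed
    step names so a failed run can be restarted without repeating work."""
    path = pathlib.Path(state_file)
    done = json.loads(path.read_text()) if path.exists() else []
    for name, action in steps:
        if name in done:
            continue                       # skip steps finished earlier
        action()                           # e.g. copy files, run a script
        done.append(name)
        path.write_text(json.dumps(done))  # persist history after each step
    return done

# Usage: two steps; rerunning after a failure would skip finished steps.
state = pathlib.Path(tempfile.mkdtemp()) / "patch_state.json"
log = []
steps = [
    ("copy_files", lambda: log.append("copied")),
    ("run_sql",    lambda: log.append("sql run")),
]
print(apply_patch(steps, state))  # ['copy_files', 'run_sql']
```

Rerunning `apply_patch` with the same state file performs no actions, which is what makes the tool safe to restart after an interrupted patching window.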

Automation of Patch Management

Oracle IT also improved the change control process that led up to patching. Oracle had used a tool called Automated Release Updates (ARU) for many years to automate the process of defining, building, packaging, and distributing patches. For patch management, a tool called the Common Patch Request Tool (CPRT) had been used prior to the downtime reduction initiative. CPRT offered limited functionality to track patches and record manual deployment instructions, and it included a cumbersome approval process for patches involving manual steps.

In addition, approvers were previously designated for each of the 200 applications supported by Oracle IT, and a patch containing updates to several applications required approvals from the designated approvers for each of the included applications. This process occurred before patch application and therefore did not contribute directly to downtime. However, it did consume time and resources, reduce accountability, and cause delays in rolling out fixes and new functionality to users.

To better manage patching-related processes, Oracle IT built a custom Enterprise Resource Planning (ERP) tool called the Patch Approval Submission System (PASS), which simplifies and automates patch tracking, approval, downtime management, and reporting. The switch from CPRT to PASS started in November 2010 and was completed in December 2011.

Patch type | Tools used, 2008 | Tools used, 2011
EBS custom patch | ARU -> CPRT -> Manual patching | ARU -> PASS -> AutoPatch
EBS standard patch | ARU -> CPRT -> AutoPatch | ARU -> PASS -> AutoPatch
Non-EBS custom patch | ARU -> CPRT -> Manual patching | ARU -> PASS -> Automated tool that mimics AutoPatch behavior
Non-EBS standard patch | ARU -> CPRT -> Manual patching | ARU -> PASS -> Zip file with instructions for patching team

Table 3. Types of Patches and Patching Tools Used

PASS automatically manages the workflow required to move a request to approval and to patching. It allows developers to request target environments and patching windows for each ARU patch. At every step of the patching process, from identification of the issue to approval to actual implementation, PASS provides accountability and tracks who was doing what and when. Table 3 shows the tools used to automate the patching processes for EBS and non-EBS patches.


Figure 4. Patching Workflow Steps Tracked in PASS: Create patch request -> Submit patch for approval -> Approval -> Patching Team picks up request -> Patching Team applies patch -> Requester tests patch
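The workflow in Figure 4 is essentially a linear state machine, which is what makes each hand-off trackable. A minimal sketch (the state names paraphrase the figure, and the API is invented):

```python
# Minimal sketch of the PASS patching workflow as a linear state machine.
# In a real tracking tool each transition would be timestamped and
# attributed to a user for accountability; this illustration only models
# the state order itself.
WORKFLOW = [
    "created",      # developer creates the patch request
    "submitted",    # designated submitter sends it for approval
    "approved",
    "picked_up",    # patching team picks up the request
    "applied",      # patching team applies the patch
    "tested",       # requester tests the patch
]

def advance(state):
    """Move a patch request to the next workflow state."""
    i = WORKFLOW.index(state)
    if i == len(WORKFLOW) - 1:
        raise ValueError("workflow already complete")
    return WORKFLOW[i + 1]

print(advance("approved"))  # picked_up
```

Because the states are strictly ordered, every patch has exactly one position in the pipeline at any time, which is what supports the "who was doing what and when" tracking described above.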

PASS has also streamlined the approval process. Previously, each of the 200 supported applications had a designated approver, and a patch containing files impacting different applications required approval from the designees for each of those applications, making the process cumbersome. With PASS, the number of approvers for any given patch is reduced to a handful.

This reduction in approvers is possible because of another process change: a radical reduction in the number of people authorized to submit patches. In the old process, developers who requested patches also submitted them. In the new PASS process, a separate layer of submitters has been designated to ensure patch quality and performance before submission. This adds a layer of accountability and eliminates the need for a large number of approvers. This smaller number of submitters is also required to provide more information when submitting patches. Table 4 below shows the questions submitters must respond to when entering a patch into PASS.

1. Describe the issues being addressed by this patch.
2. Identify risks associated with the patch application.
3. Indicate the tracking number of the bug being fixed.
4. Indicate a target date for patching of the production environment.
5. Enter the name of the developer or IT reviewer.
6. Identify the files that are changed or impacted.
7. Briefly describe the code changes for the files.
8. Confirm that the patch has been tested, either manually or using PASS.
9. Indicate when the patch was last tested in a test environment.
10. Note the patch execution times in each previous environment.

Table 4. Information Required at Patch Submission

The new ARU/PASS process also provides efficient merging of patches so that they can be applied in a single package. As part of its Multi-Language Support (MLS), EBS supports eleven languages, and patches often need to be built for each language. By merging these into one package, common steps such as maintaining file versions and updating history tables can be performed once rather than multiple times.

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Figure 5. Screen Shot of Oracle's Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of its long experience with them. PASS has been customized to work well with the internal patch generation and source control tools that are used by Oracle's Applications Product Development group and its extension, the Applications IT group. However, Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP), both included in Enterprise Manager. ACMP is an end-to-end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments. AMP has patch management functionality similar to PASS. It is a system management solution for centralized management of multiple environments, and it allows for proactive monitoring of requests and workflows, reporting on history and trends, and automation of repetitive Database Administration (DBA) tasks.

Figure 6. Screen Shot of Patch Manager in Application Change Management Pack (ACMP)


Results

Oracle IT's downtime reduction initiative reduced maintenance downtime by 85%, from 104.5 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters from Q2 2011 through Q4 2011. This reduction occurred although the number of patches remained essentially the same for most quarters, and even increased in Q2 2011. Figure 7 below shows the trend of downtime reduction along with the number of patches applied each quarter.

[Chart: number of patches applied (0-1,400) and downtime in hours (0-120) for Q2 2010 through Q4 2011, showing shutdown/startup time, patching time, total time, and patches applied]

Figure 7. GSI Downtimes by Quarter

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative. Therefore, a breakdown of hours into pre and post is not available for all quarters.

Area | Q1 2010 | Q2 2010 | Q3 2010 | Q4 2010 | Q1 2011 | Q2 2011 | Q3 2011 | Q4 2011
Patching (hrs) | - | 77.5 | 32.5 | 20 | 18 | 4.5 | 4 | 4
Pre-patching and shutdown steps (hrs) | - | * | * | * | 4 | 3 | 3 | 2
Post-patching and startup steps (hrs) | - | * | * | * | 12.5 | 6.5 | 7.5 | 7
Combined pre- and post-times (hrs) | - | 27 | 23 | 10 | 16.5 | 9.5 | 10.5 | 9
Total downtime (hrs) | 172.5 | 104.5 | 55.5 | 30 | 34.5 | 14 | 14.5 | 13

* Only combined pre- and post-patching statistics are available for these quarters.

Table 5. Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)
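The table's structure can be sanity-checked: each quarter's total downtime is the patching time plus the combined pre- and post-patching time, and the drop from the Q2 2010 peak works out to roughly the headline reduction. A quick check using two quarters from Table 5:

```python
# Sanity check of Table 5: per-quarter total downtime equals patching
# hours plus combined pre/post hours, and the reduction from the Q2 2010
# peak to Q4 2011 is in the neighborhood of the reported 85%.
patching = {"Q2 2010": 77.5, "Q4 2011": 4}
pre_post = {"Q2 2010": 27,   "Q4 2011": 9}

total = {q: patching[q] + pre_post[q] for q in patching}
print(total)  # {'Q2 2010': 104.5, 'Q4 2011': 13}

reduction = (total["Q2 2010"] - total["Q4 2011"]) / total["Q2 2010"]
# Q4 2011 alone shows about an 88% drop; averaged across the sub-15-hour
# quarters of 2011 the figure is close to the reported 85%.
print(f"{reduction:.0%}")
```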

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oracle's downtime reduction initiative. Reductions in factors such as hot patching and the pre- and post-patching steps can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors contributed another 15% decrease in downtime, but these are much more interdependent, and their benefits are harder to accurately allocate. For example, both script performance tuning and an upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be specifically attributed to script tuning vs. upgraded hardware. Since Oracle IT's primary mission is to support the business, sometimes multiple improvements must be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out. Doing so would require putting exact numbers to the downtime caused specifically by patches that were rushed in order to make the cutoff, and by the reduction in accountability caused by very large bundles, not a straightforward calculation.

Figure 8. Contribution of Factor Improvements to Downtime Reduction: Hot Patching 66%; Pre- and Post-Patching Steps 19%; Patch Tuning, Hardware Upgrades, and Other Factors 15%

Despite the difficulties of exactly allocating a portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Factor | Description
Cold Patching | The percentage of patches applied hot increased from less than 1% in 2009 to over 40% in 2011.
Pre- and post-patching steps | Systems are shut down and started back up in parallel. As a result, pre- and post-patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011.
Script performance | Patching scripts are now tuned to run in under 10 minutes. Database server hardware upgrades also helped speed up script execution times.
Patch frequency | Smaller patch sets are applied weekly, as opposed to large quarterly bundles. This has improved patch quality and brought down the number of follow-on patches required to correct bad patches.
Custom patches | Custom patching (EBS and non-EBS) has been automated to reduce resource requirements and inefficiencies.
Patch management | PASS provides improved patch management from initial request to approval to patching. ARU and PASS have allowed efficient merging of patches and more accountability in the patch approval process.

Table 6. Actions and Results of Oracle Downtime Reduction Initiative by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is also on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. So, for example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been successfully applied, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results from these new capabilities once they are adopted.
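The editioning model described above resembles a blue/green cutover: build and patch a copy while users keep working on the running edition, then switch. A highly simplified conceptual sketch, not the EBS 12.2 implementation:

```python
# Conceptual sketch of online patching via editioning: users stay on the
# running edition while a copy is patched, and downtime is limited to the
# brief cutover. The class and its API are invented for illustration.
class System:
    def __init__(self):
        self.editions = {"run": "v1"}   # the edition users are on

    def online_patch(self, new_version):
        self.editions["patch"] = new_version   # build the patchable copy
        # ...apply patches and validate the copy while "run" stays live...
        # cutover: only this switch needs a middle-tier restart
        self.editions["run"] = self.editions.pop("patch")

s = System()
s.online_patch("v2")
print(s.editions["run"])  # v2
```

The point of the pattern is that all slow work happens on the copy; the only user-visible interruption is the final switch.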

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtimes by 85%, from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
USA

Worldwide Inquiries
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0611


This white paper describes Oracle's current internal methods for patch automation and patch optimization in a heterogeneous software environment, quantifies the resulting time savings, and describes the tools used to streamline the internal change control processes. It also recommends best practices that customers can use to improve their own patching processes.

New Business Higher Volume More Patching

Before acquiring Sun Microsystems in 2009, Oracle was an enterprise software company with little experience in managing a hardware business. Buying Sun dropped Oracle head-first into the deep end of that pool. An influx of new users increased demand on Oracle's internal systems by over 50 percent, and an entirely new set of requirements arose out of the need to support the hardware line of business.

Sun came with over 1,000 internal legacy applications, many focused on manufacturing and the hardware supply chain, which could only be consolidated as Oracle's internal solutions were extended to take their place. New systems needed to be implemented within Oracle to support the distinct requirements of the hardware business, and systems already in place had to be upgraded to support the increased load. As a result, integrating Sun placed additional strain on a patching window that was already too long.

Downtime Before Improvements

Before Oracle IT began the effort to improve downtime, general maintenance took over 100 hours every quarter, as shown in Figure 1. Major upgrades, such as from one version of E-Business Suite to the next, could alone take more than 48 hours.

Although a significant amount of time and money was spent on patching, the financial impact on the rest of the business was much worse than this direct cost. Oracle is a 24/7 global company, and any downtime impeded business across the enterprise.


Figure 1. Oracle Global Single Instance Downtime per Quarter Prior to Downtime Reduction Initiative (downtime in hours, 0-200, for Q3 2008 through Q2 2010)

This impact became much more material with the addition of the Sun hardware business. When systems went down, even for routine maintenance, manufacturing was severely impacted. Likewise, field service could not operate without visibility into the supply chain. Long patching windows predated Sun, but the additional load and increased consequences made it clear to Oracle IT that changes needed to be implemented quickly to reduce downtime. It was mandated that the system maintenance window be brought down from fifteen to three hours per week.

Identifying Component Causes of Downtime

Because maintenance downtime has multiple causes, Oracle IT began by identifying the factors that contributed the most to downtime and resource consumption at Oracle. Table 1 below summarizes these major contributing factors.

Factor | Description

Cold Patching | Environments had to be shut down to apply the overwhelming majority of patches.

Pre- and Post-Patching Steps | Shutting down and starting up databases and mid tiers in preparation for patching took over 30 minutes of the maintenance window. Post-patching steps were performed sequentially.

Script Performance | There were no official guidelines for patch developers on how quickly their patching scripts needed to execute. In addition, the E25K database server hardware then in use was inadequate for the required capacity.

Large, Infrequent Patch Bundles | Patches were primarily applied in a quarterly bundle containing over 300 patches.

Custom Patches | Custom patch application required custom scripts and manual steps.

Patch Management | The approval process was complicated, and the patch management tool used to track patches was inefficient.

Table 1. Top Contributors to Patching Downtime at Oracle

Once the factors contributing to downtime had been identified, Oracle IT began process improvements to reduce the downtime caused by each factor. The following sections provide details.

Reducing Downtime Component Causes

Reducing Cold Patching

The largest single contributor to patching downtime was cold patching. Up until 2009, more than 99% of patches were applied only after shutting down the servers running and supporting the application being patched. But not all of the over 700 patches applied each quarter required cold patching. If more patches could be applied hot, while the affected systems continued to run, this number could be brought down, directly reducing required downtime.

To help classify a patch as hot or cold, guidelines were developed based on the impact that patches had on the applications and on supporting systems. For example, a patch that simply delivered new reports could be applied hot, while a patch that updated a database table structure or a critical PL/SQL package needed to go in cold.
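A classifier along these lines can be sketched as follows. This is an illustrative sketch only; the file-type categories below are invented for the example and are not Oracle's actual guidelines.

```python
# Hypothetical hot/cold patch classification. The category sets are
# illustrative assumptions, not Oracle's internal rules.

# Content assumed safe to apply while the system keeps running.
HOT_SAFE = {"report", "message_catalog", "help_doc"}

# Content assumed to require a shutdown (structural or critical changes).
COLD_REQUIRED = {"table_ddl", "plsql_package", "trigger"}

def classify_patch(file_types):
    """Return 'cold' if any file in the patch needs a shutdown, else 'hot'."""
    if any(ft in COLD_REQUIRED for ft in file_types):
        return "cold"
    if all(ft in HOT_SAFE for ft in file_types):
        return "hot"
    return "cold"  # default conservatively to cold when impact is unknown

print(classify_patch({"report"}))                   # hot
print(classify_patch({"report", "plsql_package"}))  # cold
```

The key design point is the conservative default: anything not positively known to be safe is treated as a cold patch.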

Based on these guidelines, the bulk of the hot patches could be packaged separately from the cold ones and applied on a weekly basis while systems continued to run. The percentage of cold patches dropped from over 99% in 2009 to less than 60% in 2010 and 2011. Since every patch applied hot rather than cold reduced downtime, downtime declined proportionately.

Figure 2. Percentage of Patches Applied Hot vs. Cold in 2009 and 2011

Speeding up Pre and Post Patching Steps

During a patching window, supporting systems (the Concurrent Manager (CM), database instances, and application servers) had to be shut down before patching and then started up afterward, adding to downtime. These systems were being shut down in sequence, requiring each shutdown step to wait for the previous one to complete. For example, there are multiple database instances, and it could take as long as 10-15 minutes for each instance to shut down. This meant 30 minutes or more of the maintenance window were eaten up just shutting down the database instances. The same issue occurred in reverse when starting up and readying supporting systems. In addition, other pre- and post-patching steps had to be performed, including disabling and re-enabling security triggers, removing temporary database tables, and recompiling invalid schema objects.

To speed up pre- and post-patching steps, the team identified steps that could either be shortened or performed in parallel instead of sequentially. For example, CM waited for all running processes to finish before it would shut down, so to speed shutdown the team added a script that terminated all running processes after waiting a few minutes. The 50-plus application servers were also shut down in parallel instead of sequentially, bringing total shutdown time for these servers to less than 10 minutes.

Oracle IT also began doing a 'shutdown abort' of database instances to speed up their shutdown process. Each instance was forced to shut down within one minute. Whereas in the past each instance was taken down only after the previous one had completed shutting down, multiple database instances were now shut down in parallel, typically within 30 seconds of each other. Similarly, restarts of supporting systems, and the other steps needed to make these systems operational, were shifted to being performed in parallel. As a result of these changes, the time taken to complete the pre- and post-patching steps dropped from over 20 hours a quarter in 2010 to less than 10 hours a quarter in 2011, a reduction that went directly to the bottom line of overall downtime. Table 2 provides a detailed breakdown of the components.

Area | Q2 2010 | Q3 2010 | Q4 2010 | Q1 2011 | Q2 2011 | Q3 2011 | Q4 2011

Pre-patching and shutdown steps (hrs) | - | - | - | 4 | 3 | 3 | 2

Post-patching and startup steps (hrs) | - | - | - | 12.5 | 6.5 | 7.5 | 7

Combined pre and post times (hrs) | 27 | 23 | 10 | 16.5 | 9.5 | 10.5 | 9

Note: only combined pre and post statistics are available for Q2 2010 through Q4 2010.

Table 2. Downtime Caused by Pre- and Post-Patching Steps, by Quarter
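The sequential-to-parallel shutdown change described above can be sketched with a few lines of Python. This is a stand-alone illustration of the scheduling idea, not Oracle's scripts; the server names and the simulated shutdown latency are invented, and a real implementation would invoke the actual shutdown commands instead of sleeping.

```python
# Sketch of sequential vs. parallel shutdown of many application servers.
# Names and timings are hypothetical; shutdown_server() stands in for
# issuing a real shutdown command and waiting for it to complete.
import time
from concurrent.futures import ThreadPoolExecutor

def shutdown_server(name, delay=0.01):
    time.sleep(delay)  # simulated shutdown latency
    return name

servers = [f"appserver{i:02d}" for i in range(1, 51)]  # the "50-plus" mid tiers

# Sequential: total time is the sum of all individual shutdown times.
start = time.perf_counter()
for s in servers:
    shutdown_server(s)
sequential = time.perf_counter() - start

# Parallel: total time approaches the slowest single shutdown.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(servers)) as pool:
    done = list(pool.map(shutdown_server, servers))
parallel = time.perf_counter() - start

print(f"sequential={sequential:.2f}s parallel={parallel:.2f}s")
```

With 50 servers the parallel wall-clock time collapses toward the time of the single slowest shutdown, which is exactly the effect the team exploited.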

Increasing Patching Frequency

Patching efficiency was also improved by the rather counterintuitive process change of patching more often. The older maintenance process had centered on a quarterly release bundle, in which over 300 patches were applied during a single patching window each quarter.

Despite the apparent advantages of doing one large patching event per quarter, this process had serious unintended consequences. First, applying such a large bundle requires a large patching window, which can be more disruptive to business operations such as manufacturing and field services than several smaller ones. Second, a single quarterly window reduces business agility and does not allow for incremental changes. Third, patches would sometimes be rushed into the bundle before they were fully ready, to avoid missing these infrequent quarterly windows. If a rushed patch caused problems in the system, emergency patches would then have to be applied to correct the problems, causing additional downtime. Finally, applying so many patches simultaneously reduced accountability and made problems difficult to trace when they did occur.

Because the quarterly bundles had so many unintended consequences, the team found that they could actually achieve less downtime by patching more often. In the current process, patches are applied during regular weekly patching windows spread over the quarter. Figure 3 shows the more even spread of patches in 2011 when compared to 2010. To accommodate the increased patching frequency, the testing process had to be made more robust. This was achieved by introducing automated testing of critical application flows using the Oracle Application Testing Suite (OATS). OATS enables definition and management of the application testing process and validates application functionality.

Figure 3. Number of Patches Applied by Week, 2010-2011

Improving Patching Script Performance

Downtime also resulted from the poor performance of patch application scripts, which, in the absence of official tuning guidelines, often ran for over 30 minutes each. As part of the downtime reduction initiative, guidelines were put in place requiring patching scripts to be tuned so that every job within a submitted patch ran in under 10 minutes.

This mandate did consume some additional labor for script tuning. However, the team considered this labor a reasonable tradeoff, since it affected only direct IT costs, while downtime imposed much more significant costs across the company. With scripts tuned to run faster, the actual patching component of downtime was reduced. In addition, tuning of standard scripts benefited customers who had to apply them later.
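A guideline like this is only useful if it is checked. As a hypothetical sketch (the job names, the harness, and the timings are invented; the 10-minute limit is the one stated above), a certification step could time each patch job and flag violators before release:

```python
# Hypothetical harness enforcing the "every job under 10 minutes" rule
# during patch certification. Job names and workloads are invented.
import time

MAX_JOB_SECONDS = 600  # the 10-minute tuning guideline

def run_timed(job_name, job_fn):
    """Run one patch job and record whether it met the guideline."""
    start = time.perf_counter()
    job_fn()
    elapsed = time.perf_counter() - start
    return {"job": job_name, "seconds": elapsed, "ok": elapsed <= MAX_JOB_SECONDS}

results = [
    run_timed("update_fnd_tables", lambda: time.sleep(0.01)),
    run_timed("recompile_invalid_objects", lambda: time.sleep(0.02)),
]
violations = [r["job"] for r in results if not r["ok"]]
print("jobs needing further tuning:", violations)
```

Jobs that exceed the limit go back to their developers for tuning rather than into the weekly bundle.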

It should be noted that some of the improvement in script processing speed came not from script tuning but from faster hardware. During the period of the downtime reduction initiative, the servers that run Oracle's Global Single Instance (GSI) of E-Business Suite were upgraded from a four-node Real Application Clusters (RAC) configuration running Sun Fire E25Ks to a three-node RAC configuration running Sun SPARC Enterprise M9000s. The new M9000 servers provided a significant performance boost compared to the previous E25K servers. The main drivers for this upgrade were the ability to handle increased load from the Sun acquisition and to improve GSI performance in normal operation. However, as a side benefit, the M9000s did indeed process patching scripts much faster.

Script tuning and faster hardware combined to dramatically reduce the time taken for the actual patch application steps. In 2010, patch application steps consumed over 50 hours per quarter. In 2011, this dropped to 4 hours per quarter.

Automating Custom Patching

Like most large enterprise software deployments, Oracle's own implementation of E-Business Suite contains custom code and application customizations. These in turn require custom patches, and a significant number of the EBS patches applied at Oracle were custom. Furthermore, Oracle's internal footprint also includes a number of non-EBS applications, such as Siebel, Agile, and Oracle Application Express (APEX). The manual application process for custom patches to EBS, and for any patches to non-EBS applications, was both time-consuming and labor-intensive.

Standard patches to EBS had always been applied using an automated tool called AutoPatch. AutoPatch applied all bug fixes in a patch, managed version checking, tracked changes, and allowed restart capability. But no such capabilities were in place for custom EBS or non-EBS patches, which had to be hand-executed, or hand-placed into directories, by patching personnel. Aside from using up resource hours, this added a layer of complexity and contributed to errors and quality issues.

The team started following the same process to build custom patches as was used for standard patches, so that custom patches could be applied using AutoPatch. They also developed AutoPatch-like functionality in a shell script to automate the application of custom non-EBS patches. These two tools allowed Oracle IT to apply custom patches in much the same way as standard ones.
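The restart capability mentioned above is the core of what such a tool provides. A minimal sketch of the idea, assuming a simple JSON ledger of completed steps (the step names and ledger format are illustrative, not the format of Oracle's internal script):

```python
# Minimal sketch of AutoPatch-like restart capability: record each
# completed step so a failed run can resume where it left off.
# Step names and the ledger format are illustrative assumptions.
import json
import os

LEDGER = "applied_steps.json"

def load_ledger():
    """Return the set of step names already applied."""
    if not os.path.exists(LEDGER):
        return set()
    with open(LEDGER) as f:
        return set(json.load(f))

def apply_patch(steps):
    """Apply steps in order, skipping any already recorded as complete."""
    done = load_ledger()
    for name, action in steps:
        if name in done:
            continue  # restart capability: skip work finished in a prior run
        action()
        done.add(name)
        with open(LEDGER, "w") as f:
            json.dump(sorted(done), f)  # checkpoint after each step
    return done

if os.path.exists(LEDGER):
    os.remove(LEDGER)  # start this demo from a clean ledger

log = []
steps = [
    ("copy_files", lambda: log.append("copy_files")),
    ("run_sql", lambda: log.append("run_sql")),
]
apply_patch(steps)
apply_patch(steps)  # second run is a no-op thanks to the ledger
print(log)          # each step ran exactly once
```

Because the ledger is checkpointed after every step, a run that dies midway resumes at the first unfinished step instead of repeating (and possibly corrupting) completed work.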

Automation of Patch Management

Oracle IT also improved the change control process that led up to patching. Oracle had used a tool called Automated Release Updates (ARU) for many years to automate the process of defining, building, packaging, and distributing patches. For patch management, a tool called the Common Patch Request Tool (CPRT) had been used prior to the downtime reduction initiative. CPRT offered limited functionality to track patches and record manual deployment instructions, and it included a cumbersome approval process involving manual steps.

In addition, approvers were previously designated for each of the 200 applications supported by Oracle IT, and a patch containing updates to several applications required approvals from the designated approvers for each of the included applications. This process occurred before patch application and therefore did not contribute directly to downtime. However, it did consume time and resources, reduce accountability, and cause delays in rolling out fixes and new functionality to users.

To better manage patching-related processes, Oracle IT built a custom tool called the Patch Approval Submission System (PASS), which simplifies and automates patch tracking, approval, downtime management, and reporting. The switch from CPRT to PASS started in November 2010 and was completed in December 2011.

Patch type | Tools used in 2008 | Tools used in 2011

EBS custom patch | ARU → CPRT → manual patching | ARU → PASS → AutoPatch

EBS standard patch | ARU → CPRT → AutoPatch | ARU → PASS → AutoPatch

Non-EBS custom patch | ARU → CPRT → manual patching | ARU → PASS → automated tool that mimics AutoPatch behavior

Non-EBS standard patch | ARU → CPRT → manual patching | ARU → PASS → zip file with instructions for the patching team

Table 3. Types of Patches and Patching Tools Used

PASS automatically manages the workflow required to move a request to approval and then to patching. It allows developers to request target environments and patching windows for each ARU patch. At every step of the patching process, from identification of the issue to approval to actual implementation, PASS provides accountability and tracks who was doing what, and when. Table 3 shows the tools used to automate the patching processes for EBS and non-EBS patches.

Figure 4. Patching Workflow Steps Tracked in PASS: create patch request → submit patch for approval → approval → Patching Team picks up request → Patching Team applies patch → requester tests patch

PASS has also streamlined the approval process. Previously, each of the 200 supported applications had a designated approver, and a patch containing files impacting different applications required approval from the designees for each of those applications, making the process cumbersome. With PASS, the number of approvers for any given patch is reduced to a handful.

This reduction in approvers is possible because of another process change: a radical reduction in the number of people authorized to submit patches. In the old process, the developers who requested patches also submitted them. In the new PASS process, a separate layer of submitters has been designated to ensure patch quality and performance before submission. This adds a layer of accountability and eliminates the need for a large number of approvers. This smaller number of submitters is also required to provide more information when submitting patches. Table 4 below shows the questions submitters must answer when entering a patch into PASS.

1. Describe the issues being addressed by this patch.

2. Identify risks associated with the patch application.

3. Indicate the tracking number of the bug being fixed.

4. Indicate a target date for patching of the production environment.

5. Enter the name of the developer or IT reviewer.

6. Identify the files that are changed or impacted.

7. Briefly describe the code changes for the files.

8. Confirm that the patch has been tested, either manually or using PASS.

9. Indicate when the patch was last tested in a test environment.

10. Note the patch execution times in each previous environment.

Table 4. Information Required at Patch Submission
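A submission checklist like Table 4 lends itself to mechanical validation before a request ever reaches an approver. As a hypothetical sketch (the field names below are invented stand-ins for the ten questions; PASS's actual data model is not public), a pre-approval gate might look like:

```python
# Hypothetical validation of the Table 4 checklist; field names are
# invented stand-ins for the ten required submission questions.
REQUIRED_FIELDS = [
    "issue_description", "risks", "bug_number", "target_date",
    "reviewer", "impacted_files", "change_summary",
    "tested_confirmation", "last_tested_on", "execution_times",
]

def missing_fields(submission):
    """Return the checklist items the submitter still has to fill in."""
    return [f for f in REQUIRED_FIELDS if not submission.get(f)]

draft = {"issue_description": "Fix AP invoice rounding", "bug_number": "1234567"}
print(missing_fields(draft))  # the eight items still unfilled
```

Rejecting incomplete submissions automatically keeps the small pool of approvers focused on judgment calls rather than completeness checks.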

The new ARU/PASS process also provides efficient merging of patches so that they can be applied in a single package. As part of its Multi-Language Support (MLS), EBS supports eleven languages, and patches often need to be built for each language. By merging these into one package, common steps, such as maintaining file versions and updating history tables, can be performed once rather than multiple times.

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Figure 5. Screen Shot of Oracle's Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of its long experience with them. PASS has been customized to work well with the internal patch generation and source control tools used by Oracle's Applications Product Development group and its extension, the Applications IT group. However, Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP), both included in Enterprise Manager. ACMP is an end-to-end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments. AMP has patch management functionality similar to PASS. It is a system management solution for centralized management of multiple environments, allowing for proactive monitoring of requests and workflows, for reporting on history and trends, and for automation of repetitive database administration (DBA) tasks.

Figure 6. Screen Shot of Patch Manager in Application Change Management Pack (ACMP)

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Results

Oracle IT's downtime reduction initiative reduced maintenance downtime by 85%, from 104.5 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters, Q2 2011 through Q4 2011. This reduction occurred even though the number of patches remained essentially the same for most quarters, and even increased in Q2 2011. Figure 7 below shows the trend of downtime reduction along with the number of patches applied each quarter.

Figure 7. GSI Downtime by Quarter, Q2 2010 through Q4 2011 (left axis: time in hours, 0-120; right axis: number of patches applied, 0-1400; series: shutdown/startup time, patching time, total time, patches)

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative; therefore, a breakdown of hours into pre and post is not available for all quarters.

Area | Q1 2010 | Q2 2010 | Q3 2010 | Q4 2010 | Q1 2011 | Q2 2011 | Q3 2011 | Q4 2011

Patching (hrs) | - | 77.5 | 32.5 | 20 | 18 | 4.5 | 4 | 4

Pre-patching and shutdown steps (hrs) | - | - | - | - | 4 | 3 | 3 | 2

Post-patching and startup steps (hrs) | - | - | - | - | 12.5 | 6.5 | 7.5 | 7

Combined pre and post times (hrs) | - | 27 | 23 | 10 | 16.5 | 9.5 | 10.5 | 9

Total downtime (hrs) | 172.5 | 104.5 | 55.5 | 30 | 34.5 | 14 | 14.5 | 13

Table 5. Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)
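The rows of Table 5 are internally consistent, which is easy to confirm: for every quarter with full detail, patching hours plus combined pre/post hours equal the total. A short check, using the figures from the table:

```python
# Cross-check of Table 5: patching time plus combined pre/post time
# should equal total downtime for each quarter with full detail (hours).
patching = {"Q2 2010": 77.5, "Q3 2010": 32.5, "Q4 2010": 20, "Q1 2011": 18,
            "Q2 2011": 4.5, "Q3 2011": 4, "Q4 2011": 4}
pre_post = {"Q2 2010": 27, "Q3 2010": 23, "Q4 2010": 10, "Q1 2011": 16.5,
            "Q2 2011": 9.5, "Q3 2011": 10.5, "Q4 2011": 9}
total    = {"Q2 2010": 104.5, "Q3 2010": 55.5, "Q4 2010": 30, "Q1 2011": 34.5,
            "Q2 2011": 14, "Q3 2011": 14.5, "Q4 2011": 13}

for q in total:
    assert abs(patching[q] + pre_post[q] - total[q]) < 1e-9, q

# Reduction from the Q2 2010 peak to the Q4 2011 figure; the paper's
# 85% headline conservatively compares against "less than 15 hours".
reduction = (total["Q2 2010"] - total["Q4 2011"]) / total["Q2 2010"]
print(f"reduction: {reduction:.0%}")  # prints reduction: 88%
```

Comparing the Q2 2010 peak to the final Q4 2011 quarter gives slightly better than the 85% headline figure.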

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oracle's downtime reduction initiative. Reductions in some factors, such as hot patching and the pre- and post-patching steps, can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors contributed another 15% decrease in downtime, but these are much more interdependent, and their benefits are harder to allocate accurately. For example, both script performance tuning and the upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be specifically attributed to script tuning vs. upgraded hardware. Since Oracle IT's primary mission is to support the business, sometimes multiple improvements must be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out. Doing so would require putting exact numbers to the downtime caused specifically by patches that were rushed to make the cutoff, and by the reduction in accountability caused by very large bundles; that is not a straightforward calculation.

Figure 8. Contribution of Factor Improvements to the Downtime Reduction Initiative: hot patching, 66%; pre- and post-patching steps, 19%; patch tuning, hardware upgrades, and other factors, 15%

Despite the difficulty of exactly allocating a portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and that improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Factor | Description

Cold Patching | The percentage of patches applied hot increased from less than 1% in 2009 to over 40% in 2011.

Pre- and Post-Patching Steps | Systems are shut down and started back up in parallel. As a result, pre- and post-patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011.

Script Performance | Patching scripts are now tuned to run in under 10 minutes. Database server hardware upgrades also helped speed up script execution times.

Patch Frequency | Smaller patch sets are applied weekly, as opposed to large quarterly bundles. This has improved patch quality and reduced the number of follow-on patches required to correct bad patches.

Custom Patches | Custom patching (EBS and non-EBS) has been automated to reduce resource requirements and inefficiencies.

Patch Management | PASS provides improved patch management from initial request to approval to patching. ARU and PASS have allowed efficient merging of patches and more accountability in the patch approval process.

Table 6. Actions and Results of Oracle's Downtime Reduction Initiative, by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is also on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. So, for example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been successfully applied, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results with these new capabilities once they are adopted.

Conclusion

The improvements Oracle made to its patching processes reduced quarterly system maintenance downtime by 85%, from over 100 hours in the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

We recommend that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President


Page 3: Reducing Maintenance Downtime by 85%

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Abstract

Oracle internally runs a variety of standard and custom applications Over time the maintenance for keeping these applications up to date had become both time-consuming and labor-intensive Oraclersquos acquisition of Sun Microsystems exacerbated the problem with a 50 increase in system volume and additional requirements to support the hardware business This additional load put pressure on a patching window that Oracle IT already regarded as unacceptably long

Beginning in 2009 Oracle IT began an initiative to reduce downtime by automating regular system maintenance and software patching processes for both E-Business Suite and non-E-Business Suite applications The changes made reduced downtime related to software patching by 85 Most notably Oracle was able to perform an upgrade of its Global Single Instance of E-Business Suite from 1211 to 1213 with only 9 hours of downtime

This white paper describes Oraclersquos current internal methods for patch automation and patch optimization in a heterogeneous software environment quantifies the resulting time savings and describes the tools used to streamline the internal change control processes It also recommends best practices that customers can use to improve their patching processes

New Business Higher Volume More Patching

Before acquiring Sun Microsystems in 2009 Oracle was an enterprise software company with little experience in managing a hardware business Buying Sun dropped Oracle head-first into the deep end of that pool An influx of new users increased demand on Oraclersquos internal systems by over 50 percent and an entirely new set of requirements arose out of the need to support the hardware line of business

Sun came with over 1,000 internal legacy applications, many focused on manufacturing and the hardware supply chain, which could only be consolidated as Oracle's internal solutions were extended to take their place. New systems needed to be implemented within Oracle to support the distinct requirements of the hardware business, and systems already in place had to be upgraded to support the increased load. As a result, integrating Sun placed additional strain on a patching window that was already too long.

Downtime Before Improvements

Before Oracle IT began the effort to reduce downtime, general maintenance took over 100 hours every quarter, as shown in Figure 1. Major upgrades, such as from one version of E-Business Suite to the next, could alone take more than 48 hours.

Although a significant amount of time and money was spent on patching, the financial impact on the rest of the business was much worse than this direct cost. Oracle is a 24/7 global company, and any downtime impeded business across the enterprise.


[Figure 1 plots downtime (hrs), on a 0-200 hour scale, for each quarter from Q3 2008 through Q2 2010.]

Figure 1: Oracle Global Single Instance Downtimes per Quarter, Prior to the Downtime Reduction Initiative

This impact became much more material with the addition of the Sun hardware business. When systems went down, even for routine maintenance, manufacturing was severely impacted. Likewise, field service could not operate without visibility into the supply chain. Long patching windows predated Sun, but the additional load and increased consequences made it clear to Oracle IT that changes needed to be implemented quickly to reduce downtime. It was mandated that the system maintenance window be brought down from fifteen to three hours per week.

Identifying Component Causes of Downtime

Because maintenance downtime has multiple causes, Oracle IT began by identifying the factors that contributed the most to downtime and resource consumption at Oracle. Table 1 below summarizes these major contributing factors.

Cold Patching: Environments had to be shut down to apply the overwhelming majority of patches.

Pre- and Post-Patching Steps: Shutting down and starting up databases and mid-tiers in preparation for patching took over 30 minutes of the maintenance window. Post-patching steps were performed sequentially.

Script Performance: There were no official guidelines for patch developers on how quickly their patching scripts needed to execute. In addition, the E25K database server hardware then in use was inadequate for the required capacity.

Large, Infrequent Patch Bundles: Patches were primarily applied in a quarterly bundle containing over 300 patches.

Custom Patches: Custom patch application required custom scripts and manual steps.

Patch Management: The approval process was complicated, and the patch management tool used to track patches was inefficient.

Table 1: Top Contributors to Patching Downtime at Oracle

Once the factors contributing to downtime had been identified, Oracle IT began process improvements to reduce the downtime caused by each factor. The following sections provide details.

Reducing Downtime Component Causes

Reducing Cold Patching

The largest single contributor to patching downtime was cold patching. Until 2009, more than 99% of patches were applied only after shutting down the servers running and supporting the application being patched. But not all of the over 700 patches applied each quarter required cold patching. If more patches could be applied hot, while the affected systems continued to run, then this number could be brought down, directly reducing required downtime.

To help classify a patch as hot or cold, guidelines were developed based on the impact that patches had on the applications and on supporting systems. For example, a patch that simply delivered new reports could be applied hot, while a patch that updated a database table structure or a critical PL/SQL package needed to go in cold.

Based on these guidelines, the bulk of the hot patches could be packaged separately from the cold ones and applied on a weekly basis while systems continued to run. The percentage of cold patches dropped from over 99% in 2009 to less than 60% in 2010 and 2011. Since every patch applied hot rather than cold reduced downtime, downtime declined proportionately.
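The classification guidelines can be sketched as a simple triage rule. The artifact categories and thresholds below are illustrative assumptions, not Oracle IT's actual criteria; the point is only that a patch is forced cold by the most disruptive artifact it touches.

```python
# Sketch of a hot/cold patch triage rule, assuming hypothetical metadata
# about which artifact types a patch touches. The categories are
# illustrative, not Oracle IT's actual guidelines.

# Artifact types that force a cold (shutdown) patch in this sketch:
COLD_ARTIFACTS = {"table_ddl", "plsql_core", "middleware_config"}

def classify_patch(artifact_types):
    """Return 'cold' if any artifact requires a shutdown, else 'hot'."""
    return "cold" if COLD_ARTIFACTS & set(artifact_types) else "hot"

# A reports-only patch can go in hot; a table-structure change goes in cold.
print(classify_patch({"report"}))               # hot
print(classify_patch({"report", "table_ddl"}))  # cold
```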

Figure 2: Percentage of Patches Applied Hot vs. Cold in 2009 and 2011

Speeding Up Pre- and Post-Patching Steps

During a patching window, supporting systems (the Concurrent Manager (CM), database instances, and application servers) had to be shut down before patching and then started up afterward, adding to downtime. These systems were being shut down in sequence, requiring each shutdown step to wait for the previous one to complete. For example, there are multiple database instances, and it could take as long as 10-15 minutes for each instance to shut down. This meant 30 minutes or more of the maintenance window were eaten up just to shut down the database instances. The same issue occurred in reverse when starting up and readying supporting systems. In addition, other pre- and post-patching steps had to be performed, including disabling and re-enabling security triggers, removing temporary database tables, and recompiling invalid schema objects.

To speed up pre- and post-patching steps, the team identified steps that could either be shortened or performed in parallel instead of sequentially. For example, CM waited for all running processes to finish before it would shut down, so to speed shutdown, the team added a script that terminated all running processes after waiting a few minutes. The 50-plus application servers were also shut down in parallel instead of sequentially, bringing total shutdown time for these servers to less than 10 minutes.
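The parallel shutdown pattern is straightforward to sketch. The host names and the stop stub below are hypothetical stand-ins; a real version would invoke each server's own stop script (for example over ssh) rather than sleeping.

```python
# Sketch of shutting down many application servers in parallel rather than
# sequentially. Hosts and the stop command are hypothetical stand-ins.
import time
from concurrent.futures import ThreadPoolExecutor

HOSTS = [f"appserver{i:02d}" for i in range(1, 51)]  # 50 hypothetical hosts

def stop_server(host):
    """Stand-in for running the server's stop script remotely."""
    time.sleep(0.01)  # simulate shutdown latency
    return (host, "stopped")

# Sequential shutdown costs the sum of every server's shutdown time;
# parallel shutdown costs roughly the single slowest server.
with ThreadPoolExecutor(max_workers=len(HOSTS)) as pool:
    results = dict(pool.map(stop_server, HOSTS))

print(f"{len(results)} servers stopped")
```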

Oracle IT also began doing a 'shutdown abort' of database instances to speed up their shutdown process; each instance was forced to shut down within one minute. Whereas in the past each instance was taken down only after the previous one had completed shutting down, multiple database instances were now shut down in parallel, typically within 30 seconds of each other. Similarly, restarts of supporting systems, and the other steps needed to make these systems operational, were shifted to being performed in parallel. As a result of these changes, the time taken to complete the pre- and post-patching steps dropped from over 20 hours a quarter in 2010 to less than 10 hours a quarter in 2011, a reduction that went directly to the bottom line of overall downtime. Table 2 provides a detailed breakdown of the components.
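The "wait briefly, then force" pattern applied to both CM and the database instances can be illustrated generically: give a process a short grace period to exit cleanly, then terminate it. The subprocess here is only a stand-in for a database instance or CM process, not Oracle's actual shutdown script.

```python
# Sketch of a grace-period-then-force shutdown. The child process is a
# hypothetical stand-in for a database instance or CM process.
import subprocess
import sys

def stop_with_deadline(proc, grace_seconds):
    """Wait for a clean exit; on timeout, force termination (the 'abort' path)."""
    try:
        proc.wait(timeout=grace_seconds)
        return "clean"
    except subprocess.TimeoutExpired:
        proc.kill()   # analogous to 'shutdown abort'
        proc.wait()
        return "forced"

# A process that would run for 60 seconds is forced down after the grace period.
p = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
print(stop_with_deadline(p, grace_seconds=1))   # forced
```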

Area                                    Q2'10  Q3'10  Q4'10  Q1'11  Q2'11  Q3'11  Q4'11
Pre-patching and shutdown steps (hrs)     -      -      -      4      3      3      2
Post-patching and startup steps (hrs)     -      -      -    12.5    6.5    7.5     7
Combined pre and post times (hrs)        27     23     10    16.5    9.5   10.5     9

Note: only combined pre and post statistics are available for Q2-Q4 2010.

Table 2: Downtime Caused by Pre- and Post-Patching Steps, by Quarter

Increasing Patching Frequency

Patching efficiency was also improved by the rather counterintuitive process change of patching more often. The older maintenance process had centered on a quarterly release bundle, in which over 300 patches were applied during a single patching window each quarter.

Despite the apparent advantages of doing one large patching event per quarter, this process had serious unintended consequences. First, applying such a large bundle requires a large patching window, which can be more disruptive to business operations such as manufacturing and field services than several smaller ones. Second, a single quarterly window reduces business agility and does not allow for incremental changes. Third, patches would sometimes be rushed into the bundle before they were fully ready, to avoid missing these infrequent quarterly windows. If a rushed patch caused problems in the system, emergency patches would then have to be applied to correct the problems, causing additional downtime. Finally, applying so many patches simultaneously reduced accountability and made problems difficult to trace when they did occur.

Because the quarterly bundles had so many unintended consequences, the team found that they could actually achieve less downtime by patching more often. In the current process, patches are applied during regular weekly patching windows spread over the quarter. Figure 3 shows the more even spread of patches in 2011 when compared to 2010. To accommodate the increased patching frequency, the testing process had to be made more robust. This was achieved by introducing automated testing of critical application flows using the Oracle Application Testing Suite (OATS). OATS enables definition and management of the application testing process and validates application functionality.
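The gating idea behind the more robust testing can be sketched generically. This is not the OATS API; it is a minimal stand-in runner showing the concept: each critical flow is a named check that must pass before a weekly patch window proceeds, and the flows themselves are hypothetical.

```python
# Generic sketch of automated critical-flow smoke testing (not the OATS
# API). Each check stands in for driving a real end-to-end flow.

def check_login():
    return True   # stand-in for exercising the real login flow

def check_expense_report_entry():
    return True   # stand-in for a real end-to-end transaction

CRITICAL_FLOWS = {
    "login": check_login,
    "expense_report_entry": check_expense_report_entry,
}

def run_smoke_tests(flows):
    """Run every critical flow; return the names of any that failed."""
    return [name for name, check in flows.items() if not check()]

failures = run_smoke_tests(CRITICAL_FLOWS)
print("blocked:" if failures else "all critical flows passed", failures)
```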

Figure 3: Number of Patches Applied by Week, 2010-2011

Improving Patching Script Performance

Downtime also resulted from the poor performance of patch application scripts, which, in the absence of official tuning guidelines, often ran for over 30 minutes each. As part of the downtime reduction initiative, guidelines were put into place requiring patching scripts to be tuned so that every job within a submitted patch ran in under 10 minutes.
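Enforcing a per-job runtime guideline amounts to flagging outliers in test-run timings. The job names and timings below are hypothetical; the sketch only shows how such a check might work.

```python
# Sketch of enforcing the 10-minute-per-job guideline: given job timings
# from a patch test run, flag any job that exceeds the limit so it can be
# tuned before submission. Job names and timings are hypothetical.
MAX_JOB_SECONDS = 10 * 60

def jobs_needing_tuning(timings):
    """Return (job, seconds) pairs over the guideline, slowest first."""
    slow = [(job, secs) for job, secs in timings.items()
            if secs > MAX_JOB_SECONDS]
    return sorted(slow, key=lambda item: item[1], reverse=True)

timings = {
    "update_history_tables": 95,
    "rebuild_index": 1860,      # 31 minutes: must be tuned before submission
    "compile_plsql": 240,
}
print(jobs_needing_tuning(timings))   # [('rebuild_index', 1860)]
```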

This mandate did consume some additional labor for script tuning. However, the team considered this labor a reasonable tradeoff, since it affected only direct IT costs, while downtime imposed much more significant costs across the company. With scripts tuned to run faster, the actual patching component of downtime was reduced. In addition, tuning of standard scripts benefited customers who had to apply them later.

It should be noted that some of the improvement in script processing speed came not from script tuning but from faster hardware. During the period of the downtime reduction initiative, the servers that run Oracle's Global Single Instance (GSI) of E-Business Suite were upgraded from a four-node Real Application Cluster (RAC) running Sun Fire E25Ks to a three-node RAC running Sun SPARC Enterprise M9000s. The new M9000 servers provided a significant performance boost compared to the previous E25K servers. The main drivers for this upgrade were the ability to handle increased load from the Sun acquisition and the need to improve GSI performance in normal operation. However, as a side benefit, the M9000s did indeed process patching scripts much faster.

Script tuning and faster hardware combined to dramatically reduce the time taken for the actual patch application steps. In 2010, patch application steps consumed over 50 hours per quarter. In 2011, this dropped to 4 hours per quarter.

Automating Custom Patching

Like most large enterprise software deployments, Oracle's own implementation of E-Business Suite contains custom code and application customizations. These in turn require custom patches, and a significant number of the EBS patches applied at Oracle were custom. Furthermore, Oracle's internal footprint also includes a number of non-EBS applications, such as Siebel, Agile, and Oracle Application Express (APEX). The manual application process for custom patches to EBS, and for any patches to non-EBS applications, was both time-consuming and labor-intensive.

Standard patches to EBS had always been applied using an automated tool called AutoPatch. AutoPatch applied all bug fixes in a patch, managed version checking, tracked changes, and allowed restart capability. But no such capabilities were in place for custom EBS or non-EBS patches, which had to be hand-executed, or hand-placed into directories by patching personnel. Aside from using up resource hours, this added a layer of complexity and contributed to errors and quality issues.

The team started following the same process to build custom patches as was used for standard patches, so that custom patches could be applied using AutoPatch. They also developed AutoPatch-like functionality in a shell script to automate application of custom non-EBS patches. These two tools allowed Oracle IT to apply custom patches in much the same way as standard ones.
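The AutoPatch-like behavior reproduced for non-EBS patches (version checking, change tracking, restartability) can be sketched as an idempotent step runner: record each completed step in a state file, and on restart skip the steps that already ran. The file name and steps below are hypothetical.

```python
# Sketch of restartable patch application of the kind the shell script
# provided: each completed step is recorded so a rerun skips done work.
# State file name and patch steps are hypothetical.
import json
from pathlib import Path

STATE_FILE = Path("patch_state.json")

def apply_patch(patch_id, steps):
    """Run each (name, action) step once; restartable via the state file."""
    done = json.loads(STATE_FILE.read_text()) if STATE_FILE.exists() else {}
    completed = set(done.get(patch_id, []))
    for name, action in steps:
        if name in completed:
            continue                      # restart capability: skip done work
        action()
        completed.add(name)
        done[patch_id] = sorted(completed)
        STATE_FILE.write_text(json.dumps(done))   # checkpoint after each step
    return completed

log = []
steps = [("copy_files", lambda: log.append("copy")),
         ("run_sql", lambda: log.append("sql"))]

apply_patch("custom-001", steps)
apply_patch("custom-001", steps)   # a rerun applies nothing a second time
print(log)                         # ['copy', 'sql']
STATE_FILE.unlink()                # clean up the demo state file
```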

Automation of Patch Management

Oracle IT also improved the change control process leading up to patching. Oracle had used a tool called Automated Release Updates (ARU) for many years to automate the process of defining, building, packaging, and distributing patches. For patch management, a tool called the Common Patch Request Tool (CPRT) had been used prior to the downtime reduction initiative. CPRT offered limited functionality to track patches and record manual deployment instructions, and it included a cumbersome approval process involving manual steps.

In addition, approvers were previously designated for each of the 200 applications supported by Oracle IT, and a patch containing updates to several applications required approvals from the designated approvers for each of the included applications. This process occurred before patch application and therefore did not contribute directly to downtime. However, it did consume time and resources, reduce accountability, and cause delays in rolling out fixes and new functionality to users.

To better manage patching-related processes, Oracle IT built a custom Enterprise Resource Planning (ERP) tool called the Patch Approval Submission System (PASS), which simplifies and automates patch tracking, approval, downtime management, and reporting. The switch from CPRT to PASS started in November 2010 and was completed in December 2011.

Patch type: Tools used in 2008 / Tools used in 2011

EBS custom patch: ARU > CPRT > Manual patching / ARU > PASS > AutoPatch
EBS standard patch: ARU > CPRT > AutoPatch / ARU > PASS > AutoPatch
Non-EBS custom patch: ARU > CPRT > Manual patching / ARU > PASS > Automated tool that mimics AutoPatch behavior
Non-EBS standard patch: ARU > CPRT > Manual patching / ARU > PASS > Zip file with instructions for patching team

Table 3: Types of Patches and Patching Tools Used

PASS automatically manages the workflow required to move a request through approval and on to patching. It allows developers to request target environments and patching windows for each ARU patch. At every step of the patching process, from identification of the issue to approval to actual implementation, PASS provides accountability, tracking who did what, and when. Table 3 shows the tools used to automate the patching processes for EBS and non-EBS patches.


Create patch request > Submit patch for approval > Approval > Patching team picks up request > Patching team applies patch > Requester tests patch

Figure 4: Patching Workflow Steps Tracked in PASS

PASS has also streamlined the approval process. Previously, each of the 200 supported applications had a designated approver, and a patch containing files impacting different applications required approval from the designees for each of those applications, making the process cumbersome. With PASS, the number of approvers for any given patch is reduced to a handful.

This reduction in approvers is possible because of another process change: a radical reduction in the number of people authorized to submit patches. In the old process, developers who requested patches also submitted them. In the new PASS process, a separate layer of submitters has been designated to ensure patch quality and performance before submission. This adds a layer of accountability and eliminates the need for a large number of approvers. This smaller group of submitters is also required to provide more information when submitting patches. Table 4 below shows the questions submitters must answer when entering a patch into PASS.

1. Describe the issues being addressed by this patch.
2. Identify risks associated with the patch application.
3. Indicate the tracking number of the bug being fixed.
4. Indicate a target date for patching of the production environment.
5. Enter the name of the developer or IT reviewer.
6. Identify the files that are changed or impacted.
7. Briefly describe the code changes for the files.
8. Confirm that the patch has been tested, either manually or using PASS.
9. Indicate when the patch was last tested in a test environment.
10. Note the patch execution times in each previous environment.

Table 4: Information Required at Patch Submission
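The submission requirements above amount to a completeness check: a patch cannot move forward until every required field is answered. The field names below are hypothetical abbreviations of the Table 4 questions, not PASS's actual schema.

```python
# Sketch of a PASS-style submission check: every required field (abbreviated
# from Table 4; names are hypothetical) must be answered before submission.
REQUIRED_FIELDS = [
    "issue_description", "risks", "bug_number", "target_date",
    "reviewer", "impacted_files", "change_summary",
    "tested_confirmation", "last_test_date", "execution_times",
]

def missing_fields(submission):
    """Return the required fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not submission.get(f)]

submission = {"issue_description": "Fix AP invoice rounding",
              "bug_number": "1234567"}
print(missing_fields(submission))   # eight fields still unanswered
```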

The new ARU/PASS process also provides efficient merging of patches so that they can be applied as a single package. As part of its Multi-Language Support (MLS), EBS supports eleven languages, and patches often need to be built for each language. By merging these into one package, common steps such as maintaining file versions and updating history tables can be performed once rather than multiple times.
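The payoff from merging can be counted directly: steps common to all per-language patches run once in the merged package instead of once per language. The language codes and step names below are hypothetical stand-ins for illustration.

```python
# Sketch of the savings from merging per-language MLS patches. Language
# codes and step names are hypothetical stand-ins.
LANGUAGES = ["US", "FR", "DE", "JA"]   # abbreviated list for the demo

def common_steps_unmerged(languages):
    """Applied separately, common steps repeat once per language patch."""
    return len(languages) * ["maintain_file_versions", "update_history_tables"]

def common_steps_merged(languages):
    """In the merged package, each common step runs exactly once."""
    return ["maintain_file_versions", "update_history_tables"]

print(len(common_steps_unmerged(LANGUAGES)), "common-step runs unmerged")  # 8
print(len(common_steps_merged(LANGUAGES)), "common-step runs merged")      # 2
```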


Figure 5: Screen Shot of Oracle's Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of its long experience with them. PASS has been customized to work well with the internal patch generation and source control tools used by Oracle's Applications Product Development group and its extension, the Applications IT group. However, Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP), both included in Enterprise Manager. ACMP is an end-to-end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments. AMP has patch management functionality similar to PASS. It is a system management solution for centralized management of multiple environments, allowing for proactive monitoring of requests and workflows, for reporting on history and trends, and for automation of repetitive Database Administration (DBA) tasks.

Figure 6: Screen Shot of Patch Manager in Application Change Management Pack (ACMP)


Results

Oracle IT's downtime reduction initiative reduced maintenance downtime by 85%, from 104.5 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters, Q2 2011 through Q4 2011. This reduction occurred although the number of patches remained essentially the same for most quarters, and even increased in Q2 2011. Figure 7 below shows the trend of downtime reduction along with the number of patches applied each quarter.

[Figure 7 plots shutdown/startup time, patching time, and total downtime in hours (left axis, 0-120), together with the number of patches applied (right axis, 0-1400), for each quarter from Q2 2010 through Q4 2011.]

Figure 7: GSI Downtimes by Quarter

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative; therefore, a breakdown of hours into pre and post is not available for all quarters.

Area                                    Q1'10  Q2'10  Q3'10  Q4'10  Q1'11  Q2'11  Q3'11  Q4'11
Patching (hrs)                            -    77.5   32.5    20     18    4.5     4      4
Pre-patching and shutdown steps (hrs)     -      -      -      -      4      3      3      2
Post-patching and startup steps (hrs)     -      -      -      -    12.5    6.5    7.5     7
Combined pre and post times (hrs)         -     27     23     10    16.5    9.5   10.5     9
Total downtime (hrs)                   172.5  104.5   55.5    30    34.5    14    14.5    13

Table 5: Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oracle's downtime reduction initiative. Reductions in some factors, such as hot patching and the pre- and post-patching steps, can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors contributed another 15% decrease in downtime, but these are much more interdependent, and their benefits are harder to allocate accurately. For example, both script performance tuning and the upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be attributed specifically to script tuning vs. upgraded hardware. Since Oracle IT's primary mission is to support the business, sometimes multiple improvements must be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out. Doing so would require putting exact numbers on the downtime caused specifically by patches that were rushed to make the cutoff, and on the reduction in accountability caused by very large bundles: not a straightforward calculation.

[Figure 8 is a pie chart: hot patching, 66%; pre- and post-patching steps, 19%; patch tuning, hardware upgrades, and other factors, 15%.]

Figure 8: Contribution of Factor Improvements to the Downtime Reduction Initiative

Despite the difficulty of exactly allocating a portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Cold Patching: The percentage of patches applied hot increased from less than 1% in 2009 to over 40% in 2011.

Pre- and Post-Patching Steps: Systems are shut down and started back up in parallel. As a result, pre- and post-patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011.

Script Performance: Patching scripts are now tuned to run in under 10 minutes. Database server hardware upgrades also helped speed up script execution times.

Patch Frequency: Smaller patch sets are applied weekly, as opposed to large quarterly bundles. This has improved patch quality and brought down the number of follow-on patchings required to correct bad patches.

Custom Patches: Custom patching, EBS and non-EBS, has been automated to reduce resource requirements and inefficiencies.

Patch Management: PASS provides improved patch management from initial request to approval to patching. ARU and PASS have allowed efficient merging of patches and more accountability in the patch approval process.

Table 6: Actions and Results of Oracle's Downtime Reduction Initiative, by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is also on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. So, for example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been successfully applied, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results with these new capabilities once they are adopted.
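The editioning concept described above can be illustrated with a toy model: patches go to a "patch" copy while users keep reading the "run" copy, and a single switch moves users to the patched edition. This is a conceptual sketch only, not how EBS 12.2 online patching is actually implemented.

```python
# Toy model of edition-based cutover: patch a copy while users stay on the
# running edition, then switch atomically. Conceptual sketch only.
class EditionedSystem:
    def __init__(self, code):
        self.editions = {"run": dict(code), "patch": dict(code)}
        self.active = "run"

    def apply_patch(self, obj, new_version):
        self.editions["patch"][obj] = new_version   # users are unaffected

    def cutover(self):
        self.active = "patch"   # the brief switch is the only "downtime"

    def user_sees(self, obj):
        return self.editions[self.active][obj]

gsi = EditionedSystem({"payables_pkg": "v1"})
gsi.apply_patch("payables_pkg", "v2")
print(gsi.user_sees("payables_pkg"))   # v1 - still on the run edition
gsi.cutover()
print(gsi.user_sees("payables_pkg"))   # v2 - after the switch
```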

Conclusion

The improvements Oracle made to its patching processes reduced quarterly system maintenance downtime by 85%, from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to upgrade its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
USA

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0611

Page 4: Reducing Maintenance Downtime by 85%

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

0

20

40

60

80

100

120

140

160

180

200

Q3 2008 Q4 2008 Q1 2009 Q2 2009 Q3 2009 Q4 2009 Q1 2010 Q2 2010

Downtime

(hrs)

Figure 1 Oracle Global Single Instance Downtimes per Quarter Prior to Downtime Reduction Initiative

This impact became much more material with the addition of the Sun hardware business When systems went down even for routine maintenance manufacturing was severely impacted Likewise field service could not operate without visibility into the supply chain Long patching windows predated Sun but the additional load and increased consequences made it clear to Oracle IT that changes needed to be quickly implemented to reduce downtime It was mandated that we bring the system maintenance window down from fifteen to three hours per week

Identifying Component Causes of Downtime

Because maintenance downtime has multiple causes Oracle IT began by identifying the factors that contributed the most to downtime and resource consumption at Oracle Table 1 below summarizes these major contributing factors

Factor Description

Cold Patching Environments had to be shut down to apply the overwhelming majority of patches

Pre and post patching Steps

Shutting down and starting up databases and mid tiers in preparation for patching took over 30 minutes of the maintenance window Post patching steps were performed sequentially

Script performance There were no official guidelines for patch developers on how quickly their patching scripts needed to execute In addition the current E-25K database server hardware was inadequate at handling the required capacity

Large Infrequent Patch Bundles

Patches were primarily applied in a quarterly bundle containing over 300 patches

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Custom Patches Custom patch application required custom scripts and manual steps

Patch Management The approval process was complicated and the patch management tool used to track patches was inefficient

Table 1 Top Contributors to Patching Downtime at Oracle

Once the factors contributing to downtime had been identified Oracle IT began process improvements to reduce the downtime caused by each factor The following section provides details

Reducing Downtime Component Causes

Reducing Cold Patching

The largest single contributor to patching downtime was cold patching Up until 2009 more than 99 of patches were applied only after shutting down the servers running and supporting the application being patched But not all of the over 700 patches applied each quarter required cold patching If more patches could be applied hot while the affected systems continued to run then this number could be brought down ndash directly reducing required downtime

In order to help classify a patch as hot or cold guidelines were developed based on the impact that patches had on the applications and on supporting systems For example a patch that was simply delivering new reports could be applied hot while a patch that updated a database table structure or a critical PLSQL package needed to go in cold

Based on these guidelines, the bulk of the hot patches could be packaged separately from the cold ones and applied on a weekly basis while systems continued to run. The percentage of cold patches dropped from over 99% in 2009 to less than 60% in 2010 and 2011. Since every patch applied hot rather than cold reduced downtime, downtime declined proportionately.
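Classification guidelines of this kind reduce naturally to a rule table. The sketch below illustrates the idea; the impact categories and the rule set are hypothetical examples in the spirit of the guidelines described above, not Oracle IT's actual criteria.

```python
# Illustrative hot/cold patch classification. Impacts that change
# structures or code the running system depends on force a cold
# (downtime) patch; everything else can go in hot.
# The category names below are assumptions for illustration.
COLD_IMPACTS = {"table_structure", "critical_plsql_package", "middleware_config"}

def patch_mode(impacts):
    """Return 'cold' if any impact requires shutting systems down, else 'hot'."""
    return "cold" if COLD_IMPACTS.intersection(impacts) else "hot"

# A patch that only delivers new reports can be applied hot ...
assert patch_mode({"new_report"}) == "hot"
# ... while one that alters a table structure must go in cold.
assert patch_mode({"new_report", "table_structure"}) == "cold"
```

In practice the rule table grows with experience: each patch type that causes a problem when applied hot is added to the cold list.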

Figure 2: Percentage of Patches Applied Hot vs. Cold in 2009 and 2011

Speeding Up Pre- and Post-Patching Steps

During a patching window, supporting systems (the Concurrent Manager (CM), database instances, and application servers) had to be shut down before patching and then started up afterward, adding to downtime. These systems were being shut down in sequence, requiring each shutdown step to wait for the previous one to complete. For example, there are multiple database instances, and it could take as long as 10-15 minutes for each instance to shut down. This meant that 30 minutes or more of the maintenance window were eaten up just to shut down the database instances. The same issue occurred in reverse when starting up and readying supporting systems. In addition, other pre- and post-patching steps had to be performed, including disabling and re-enabling security triggers, removing temporary database tables, and recompiling invalid schema objects.

To speed up pre- and post-patching steps, the team identified steps that could either be shortened or performed in parallel instead of sequentially. For example, CM waited for all running processes to finish before it would shut down, so to speed shutdown the team added a script that terminated all running processes after waiting a few minutes. The 50-plus application servers were also shut down in parallel instead of sequentially, bringing total shutdown time for these servers to less than 10 minutes.

Oracle IT also began doing a 'shutdown abort' of database instances to speed up their shutdown process. Each instance was forced to shut down within one minute. Whereas in the past each instance was taken down only after the previous one had completed shutting down, multiple database instances were now shut down in parallel, typically within 30 seconds of each other. Similarly, restarts of supporting systems and other steps needed to make these systems operational were shifted to being performed in parallel. As a result of these changes, the time taken to complete the pre- and post-patching steps dropped from over 20 hours a quarter in 2010 to less than 10 hours a quarter in 2011, a reduction that went directly to the bottom line of overall downtime. Table 2 provides a detailed breakdown of the components.
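The arithmetic behind the parallel shutdown change can be sketched in a few lines. This is an illustrative stand-in, assuming a fixed per-system shutdown delay; the system names and timing are not Oracle IT's actual scripts.

```python
# Sketch of why parallel shutdown shrinks the maintenance window.
# The per-system delay here is an illustrative stand-in for a
# 'shutdown abort' style forced shutdown.
import time
from concurrent.futures import ThreadPoolExecutor

def shutdown(system):
    """Stand-in for forcing one system down."""
    time.sleep(0.1)
    return f"{system} down"

systems = [f"db_instance_{i}" for i in range(4)]

# Sequential: each shutdown waits for the previous one, so the
# total is the sum of the individual shutdown times.
start = time.perf_counter()
for s in systems:
    shutdown(s)
sequential = time.perf_counter() - start

# Parallel: all shutdowns run at once, so the total approaches
# the single slowest shutdown.
start = time.perf_counter()
with ThreadPoolExecutor(max_workers=len(systems)) as pool:
    results = list(pool.map(shutdown, systems))
parallel = time.perf_counter() - start

assert parallel < sequential
```

The same pattern applies in reverse for startup: launching all restarts at once bounds the wall-clock cost by the slowest system rather than the sum of all of them.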

Area                                     Q2 2010  Q3 2010  Q4 2010  Q1 2011  Q2 2011  Q3 2011  Q4 2011
Pre-patching and shutdown steps (hrs)       —        —        —       4        3        3        2
Post-patching and startup steps (hrs)       —        —        —       12.5     6.5      7.5      7
Combined pre and post times (hrs)           27       23       10      16.5     9.5      10.5     9

Note: only combined pre- and post-patching statistics are available for Q2-Q4 2010.

Table 2: Downtime Caused by Pre- and Post-Patching Steps, by Quarter

Increasing Patching Frequency

Patching efficiency was also improved by the rather counterintuitive process change of patching more often. The older maintenance process had centered on a quarterly release bundle, in which over 300 patches were applied during a single patching window each quarter.

Despite the apparent advantages of doing one large patching event per quarter, this process had serious unintended consequences. First, applying such a large bundle requires a large patching window, which can be more disruptive to business operations such as manufacturing and field services than several smaller ones. Second, a single quarterly window reduces business agility and does not allow for incremental changes. Third, patches would sometimes be rushed into the bundle before they were fully ready, to avoid missing these infrequent quarterly windows. If a rushed patch caused problems in the system, emergency patches would then have to be applied to correct the problems, causing additional downtime. Finally, applying so many patches simultaneously reduced accountability and made problems difficult to trace when they did occur.

Because the quarterly bundles had so many unintended consequences, the team found that they could actually achieve less downtime by patching more often. In the current process, patches are applied during regular weekly patching windows spread over the quarter. Figure 3 shows the more even spread of patches in 2011 when compared to 2010. To accommodate the increased patching frequency, the testing process had to be made more robust. This was achieved by introducing automated testing of critical application flows using the Oracle Application Testing Suite (OATS). OATS enables definition and management of the application testing process and validates application functionality.

Figure 3: Number of Patches Applied by Week, 2010-2011

Improving Patching Script Performance

Downtime also resulted from the poor performance of patch application scripts, which in the absence of official tuning guidelines often ran for over 30 minutes each. As part of the downtime reduction initiative, guidelines were put into place requiring patching scripts to be tuned so that every job within a submitted patch ran in under 10 minutes.

This mandate did consume some additional labor for script tuning. However, the team considered this labor a reasonable tradeoff, since it affected only direct IT costs while downtime imposed much more significant costs across the company. With scripts tuned to run faster, the actual patching component of downtime was reduced. In addition, tuning of standard scripts benefited customers who had to apply them later.
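Enforcing a per-job time limit like this amounts to timing each job and flagging offenders for tuning. The harness below is an illustrative sketch, with assumed job names and an injectable clock for testing; it is not Oracle IT's actual tooling.

```python
# Minimal sketch of enforcing the "every job in under 10 minutes"
# guideline. The job list and the injectable clock are assumptions
# made for illustration.
import time

LIMIT_SECONDS = 10 * 60  # the tuning guideline described above

def run_job(job, clock=time.perf_counter):
    """Run one patch job and return its elapsed wall-clock time."""
    start = clock()
    job()
    return clock() - start

def check_patch(jobs, clock=time.perf_counter):
    """Return names of jobs exceeding the limit, to be sent back for tuning."""
    return [name for name, job in jobs if run_job(job, clock) > LIMIT_SECONDS]

# A fast job passes the check.
assert check_patch([("update_seed_data", lambda: None)]) == []

# A fake clock simulates a job that took 700 seconds; it gets flagged.
ticks = iter([0.0, 700.0])
assert check_patch([("slow_rebuild", lambda: None)],
                   clock=lambda: next(ticks)) == ["slow_rebuild"]
```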

It should be noted that some of the improvement in script processing speed came not from script tuning but from faster hardware. During the period of the downtime reduction initiative, the servers that run Oracle's Global Single Instance (GSI) of E-Business Suite were upgraded from a four-node Real Application Cluster (RAC) running Sun Fire E25Ks to a three-node RAC running Sun SPARC Enterprise M9000s. The new M9000 servers provided a significant performance boost compared to the previous E25K servers. The main drivers for this upgrade were the ability to handle increased load from the Sun acquisition and to improve GSI performance in normal operation. However, as a side benefit, the M9000s did indeed process patching scripts much faster.

Script tuning and faster hardware combined to dramatically reduce the time taken for the actual patch application steps. In 2010, patch application steps consumed over 50 hours per quarter. In 2011, this dropped to 4 hours per quarter.

Automating Custom Patching

Like most large enterprise software deployments, Oracle's own implementation of E-Business Suite contains custom code and application customizations. These in turn require custom patches, and a significant number of the EBS patches applied at Oracle were custom. Furthermore, Oracle's internal footprint also includes a number of non-EBS applications, such as Siebel, Agile, and Oracle Application Express (APEX). The manual application process for custom patches to EBS, and for any patches to non-EBS applications, was both time-consuming and labor-intensive.

Standard patches to EBS had always been applied using an automated tool called AutoPatch. AutoPatch applied all bug fixes in a patch, managed version checking, tracked changes, and allowed restart capability. But no such capabilities were in place for custom EBS or non-EBS patches, which had to be hand-executed or hand-placed into directories by patching personnel. Aside from using up resource hours, this added a layer of complexity and contributed to errors and quality issues.

The team started following the same process to build custom patches as was used for standard patches, so that custom patches could be applied using AutoPatch. They also developed functionality similar to AutoPatch in a shell script to automate application of custom non-EBS patches. These two tools allowed Oracle IT to apply custom patches in much the same way as standard ones.
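The AutoPatch capabilities named above (version checking, change tracking, restartability) can be sketched as a small driver loop. This is a hypothetical data model for illustration, not the actual AutoPatch implementation or its shell-script counterpart.

```python
# Sketch of an AutoPatch-style driver for custom patches: skip files
# whose version is already applied (version checking), and record
# progress so a failed run can restart where it left off.
# The file/version data model below is an assumption for illustration.

def apply_patch(files, applied_versions, log):
    """files: list of (name, version) to apply.
    applied_versions: name -> currently applied version (change tracking).
    log: set of files already handled in this patch run (restart state)."""
    for name, version in files:
        if name in log:                        # restart: already handled
            continue
        if applied_versions.get(name, 0) >= version:
            log.add(name)                      # version check: skip older/equal
            continue
        applied_versions[name] = version       # "apply" the newer file
        log.add(name)
    return applied_versions

state = {"FORM_A": 2}                          # FORM_A already at version 2
log = set()
apply_patch([("FORM_A", 1), ("PKG_B", 3)], state, log)
assert state == {"FORM_A": 2, "PKG_B": 3}      # older FORM_A not re-applied
```

Re-running `apply_patch` with the same `log` is a no-op, which is what makes an interrupted patch run safely restartable.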

Automation of Patch Management

Oracle IT also improved the change control process that led up to patching. Oracle had used a tool called Automated Release Updates (ARU) for many years to automate the process of defining, building, packaging, and distributing patches. For patch management, a tool called the Common Patch Request Tool (CPRT) had been used prior to the downtime reduction initiative. CPRT offered limited functionality to track patches and record manual deployment instructions, and it included a cumbersome approval process involving manual steps.

In addition, approvers were previously designated for each of the 200 applications supported by Oracle IT, and a patch containing updates to several applications required approvals from the designated approvers for each of the included applications. This process occurred before patch application and therefore did not contribute directly to downtime. However, it did consume time and resources, reduce accountability, and cause delays in rolling out fixes and new functionality to users.

To better manage patching-related processes, Oracle IT built a custom Enterprise Resource Planning (ERP) tool called the Patch Approval Submission System (PASS), which simplifies and automates patch tracking, approval, downtime management, and reporting. The switch from CPRT to PASS started in November 2010 and was completed in December 2011.

Patch type               Tools used in 2008                Tools used in 2011
EBS custom patch         ARU -> CPRT -> Manual patching    ARU -> PASS -> AutoPatch
EBS standard patch       ARU -> CPRT -> AutoPatch          ARU -> PASS -> AutoPatch
Non-EBS custom patch     ARU -> CPRT -> Manual patching    ARU -> PASS -> Automated tool that mimics AutoPatch behavior
Non-EBS standard patch   ARU -> CPRT -> Manual patching    ARU -> PASS -> Zip file with instructions for patching team

Table 3: Types of Patches and Patching Tools Used

PASS automatically manages the workflow required to move a request to approval and then to patching. It allows developers to request target environments and patching windows for each ARU patch. At every step of the patching process, from identification of the issue to approval to actual implementation, PASS provides accountability and tracks who was doing what, and when. Table 3 shows the tools used to automate the patching processes for EBS and non-EBS patches.
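A workflow of this kind is essentially a state machine. The sketch below uses the workflow steps tracked in PASS; the transition mechanics and state names are an illustrative assumption, not the PASS implementation.

```python
# Sketch of a PASS-style patch workflow as a linear state machine.
# The state names mirror the tracked workflow steps; the transition
# logic itself is an assumption for illustration.
WORKFLOW = [
    "request_created", "submitted_for_approval", "approved",
    "picked_up_by_patching_team", "patch_applied", "tested_by_requester",
]

def advance(state):
    """Move a patch request to the next workflow step."""
    i = WORKFLOW.index(state)
    if i == len(WORKFLOW) - 1:
        raise ValueError("workflow complete")
    return WORKFLOW[i + 1]

# Walk one request through the entire workflow, recording each step
# (this history is what gives the process its accountability).
state = "request_created"
history = [state]
while state != "tested_by_requester":
    state = advance(state)
    history.append(state)
assert history == WORKFLOW
```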

Create patch request -> Submit patch for approval -> Approval -> Patching team picks up request -> Patching team applies patch -> Requester tests patch

Figure 4: Patching Workflow Steps Tracked in PASS

PASS has also streamlined the approval process. Previously, each of the 200 supported applications had a designated approver, and a patch containing files impacting different applications required approval from the designees for each of those applications, making the process cumbersome. With PASS, the number of approvers for any given patch is reduced to a handful.

This reduction in approvers is possible because of another process change: a radical reduction in the number of people authorized to submit patches. In the old process, developers who requested patches also submitted them. In the new PASS process, a separate layer of submitters has been designated to ensure patch quality and performance before submission. This adds a layer of accountability and eliminates the need for a large number of approvers. This smaller group of submitters is also required to provide more information when submitting patches. Table 4 below shows the questions submitters must answer when entering a patch into PASS.

1. Describe the issues being addressed by this patch.
2. Identify risks associated with the patch application.
3. Indicate the tracking number of the bug being fixed.
4. Indicate a target date for patching of the production environment.
5. Enter the name of the developer or IT reviewer.
6. Identify the files that are changed or impacted.
7. Briefly describe the code changes for the files.
8. Confirm that the patch has been tested, either manually or using PASS.
9. Indicate when the patch was last tested in a test environment.
10. Note the patch execution times in each previous environment.

Table 4: Information Required at Patch Submission

The new ARU/PASS process also provides efficient merging of patches so that they can be applied in a single package. As part of its Multi-Language Support (MLS), EBS supports eleven languages, and patches often need to be built for each language. By merging these into one package, common steps such as maintaining file versions and updating history tables can be performed once rather than multiple times.
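The payoff of merging is that shared steps are deduplicated while per-language content is preserved. The sketch below illustrates this; the step names and patch layout are assumptions for illustration, not the ARU/PASS package format.

```python
# Sketch of merging per-language patches so common steps run once.
# Each patch is modeled as common steps plus language-specific steps;
# this data model is an assumption made for illustration.
def merge_patches(patches):
    """Deduplicate common steps (e.g. maintaining file versions,
    updating history tables) and concatenate language-specific steps."""
    common, lang_steps = [], []
    seen = set()
    for p in patches:
        for step in p["common"]:
            if step not in seen:
                seen.add(step)
                common.append(step)
        lang_steps.extend(p["lang_steps"])
    return {"common": common, "lang_steps": lang_steps}

en = {"common": ["update_versions", "update_history"], "lang_steps": ["load_en"]}
fr = {"common": ["update_versions", "update_history"], "lang_steps": ["load_fr"]}
merged = merge_patches([en, fr])
assert merged["common"] == ["update_versions", "update_history"]  # run once
assert merged["lang_steps"] == ["load_en", "load_fr"]
```

With eleven languages, each deduplicated common step saves up to ten redundant executions per patch.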


Figure 5: Screen Shot of Oracle's Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of its long experience with them. PASS has been customized to work well with the internal patch generation and source control tools used by Oracle's Applications Product Development group and its extension, the Applications IT group. However, Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP), both included in Enterprise Manager. ACMP is an end-to-end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments. AMP has patch management functionality similar to PASS. It is a system management solution for centralized management of multiple environments, allowing for proactive monitoring of requests and workflows, for reporting on history and trends, and for automation of repetitive Database Administration (DBA) tasks.

Figure 6: Screen Shot of Patch Manager in Application Change Management Pack (ACMP)


Results

Oracle IT's downtime reduction initiative reduced maintenance downtime by 85%, from 104.5 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters, Q2 2011 through Q4 2011. This reduction occurred even though the number of patches remained essentially the same for most quarters, and even increased in Q2 2011. Figure 7 below shows the trend of downtime reduction along with the number of patches applied each quarter.

[Chart: quarterly downtime in hours (shutdown/startup time, patching time, and total) plotted against the number of patches applied, Q2 2010 through Q4 2011]

Figure 7: GSI Downtimes by Quarter

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that detailed tracking of the time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative; therefore, a breakdown of hours into pre and post is not available for all quarters.

Area                                     Q1 2010  Q2 2010  Q3 2010  Q4 2010  Q1 2011  Q2 2011  Q3 2011  Q4 2011
Patching (hrs)                              —       77.5     32.5     20       18       4.5      4        4
Pre-patching and shutdown steps (hrs)       —        —        —        —       4        3        3        2
Post-patching and startup steps (hrs)       —        —        —        —       12.5     6.5      7.5      7
Combined pre and post times (hrs)           —       27       23       10       16.5     9.5      10.5     9
Total downtime (hrs)                      172.5    104.5     55.5     30       34.5     14       14.5     13

Table 5: Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)
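The Table 5 figures are internally consistent, as a quick arithmetic check shows: for every quarter with full detail, patching time plus the combined pre- and post-patching time equals the total, and the headline 85% reduction follows from the Q2 2010 and 2011 totals.

```python
# Cross-checking the Table 5 data: per-quarter totals equal patching
# time plus combined pre/post time, and the headline reduction holds.
patching = {"Q2 2010": 77.5, "Q3 2010": 32.5, "Q4 2010": 20, "Q1 2011": 18,
            "Q2 2011": 4.5, "Q3 2011": 4, "Q4 2011": 4}
pre_post = {"Q2 2010": 27, "Q3 2010": 23, "Q4 2010": 10, "Q1 2011": 16.5,
            "Q2 2011": 9.5, "Q3 2011": 10.5, "Q4 2011": 9}
total = {"Q2 2010": 104.5, "Q3 2010": 55.5, "Q4 2010": 30, "Q1 2011": 34.5,
         "Q2 2011": 14, "Q3 2011": 14.5, "Q4 2011": 13}

# Component hours sum to the reported totals in every detailed quarter.
for q in total:
    assert abs(patching[q] + pre_post[q] - total[q]) < 1e-9

# Q2 2010 (104.5 hrs) down to Q4 2011 (13 hrs) is better than 85%.
reduction = 1 - total["Q4 2011"] / total["Q2 2010"]
assert reduction > 0.85
```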

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oracle's downtime reduction initiative. Reductions in factors such as hot patching and the pre- and post-patching steps can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors contributed another 15% decrease in downtime, but are much more interdependent, and their benefits are harder to allocate accurately. For example, both script performance tuning and an upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be attributed specifically to script tuning versus upgraded hardware. Since Oracle IT's primary mission is to support the business, sometimes multiple improvements must be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out. Doing so would require putting exact numbers to the downtime caused specifically by patches that were rushed to make the cutoff, and by the reduction in accountability caused by very large bundles, which is not a straightforward calculation.

[Pie chart: Hot Patching 66%; Pre- and Post-Patching Steps 19%; Patch Tuning, Hardware Upgrades, and Other Factors 15%]

Figure 8: Contribution of Factor Improvements to the Downtime Reduction Initiative

Despite the difficulties of exactly allocating a portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Factor / Description

Cold Patching: The percentage of patches applied hot increased from less than 1% in 2009 to over 40% in 2011.

Pre- and Post-Patching Steps: Systems are shut down and started back up in parallel. As a result, pre- and post-patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011.

Script Performance: Patching scripts are now tuned to run in under 10 minutes. Database server hardware upgrades also helped speed up script execution times.

Patch Frequency: Smaller patch sets are applied weekly, as opposed to large quarterly bundles. This has improved patch quality and brought down the number of follow-on patches required to correct bad patches.

Custom Patches: Custom patching, for both EBS and non-EBS applications, has been automated to reduce resource requirements and inefficiencies.

Patch Management: PASS provides improved patch management from initial request to approval to patching. ARU and PASS have allowed efficient merging of patches and more accountability in the patch approval process.

Table 6: Actions and Results of the Oracle Downtime Reduction Initiative, by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is also on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. So, for example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an Editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been successfully applied, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results from these new capabilities once they are adopted.

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtime by 85%, from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
USA

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners.

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0611

Page 5: Reducing Maintenance Downtime by 85%

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Custom Patches Custom patch application required custom scripts and manual steps

Patch Management The approval process was complicated and the patch management tool used to track patches was inefficient

Table 1 Top Contributors to Patching Downtime at Oracle

Once the factors contributing to downtime had been identified Oracle IT began process improvements to reduce the downtime caused by each factor The following section provides details

Reducing Downtime Component Causes

Reducing Cold Patching

The largest single contributor to patching downtime was cold patching Up until 2009 more than 99 of patches were applied only after shutting down the servers running and supporting the application being patched But not all of the over 700 patches applied each quarter required cold patching If more patches could be applied hot while the affected systems continued to run then this number could be brought down ndash directly reducing required downtime

In order to help classify a patch as hot or cold guidelines were developed based on the impact that patches had on the applications and on supporting systems For example a patch that was simply delivering new reports could be applied hot while a patch that updated a database table structure or a critical PLSQL package needed to go in cold

Based on these guidelines the bulk of the hot patches could be packaged separately from the cold ones and applied on a weekly basis while systems continued to run The percentage of cold patches dropped from over 99 in 2009 to less than 60 in 2010 and 2011 Since every patch applied hot rather than cold reduced downtime downtime declined proportionately

Figure 2 Percentage of Patches Applied Hot vs Cold in 2009 and 2011

Speeding up Pre and Post Patching Steps

During a patching window supporting systems - the Concurrent Manager (CM) database instances and application servers had to be shut down before patching and then started up after adding to downtime These systems were being shut down in sequence requiring each shutdown step to wait for the previous one to complete For example there are multiple database instances and it could take as long as 10-15 minutes for each instance to shut down This meant 30 minutes or more of the maintenance windows

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

were eaten up just to shut down the database instances The same issue occurred in reverse when starting up and readying supporting systems In addition other pre- and post-patching steps had to be performed including disabling and re-enabling security triggers removing temporary database tables and recompiling invalid schema objects

To speed up pre- and post-patching steps the team identified steps that could either be shortened or performed in parallel instead of sequentially For example CM waited for all running processes to finish before it would shut down So to speed shutdown the team added a script that terminated all running processes after waiting for a few minutes The 50 plus application servers were also shut down in parallel instead of sequentially bringing total shut-down time for these servers to less than 10 minutes

Oracle IT also began doing a lsquoshutdown abortrsquo of database instances to speed up their shutdown process Each instance was forced to shut down within one minute Whereas in the past each instance was taken down only after the previous one had completed shutting down multiple instances of the databases were now shut down in parallel typically within 30 seconds of each other Similarly restarts of supporting systems and other steps needed to make these systems operational were shifted to being performed in parallel As a result of these changes the time taken to complete the pre and post patching steps dropped from over 20 hours a quarter in 2010 to less than 10 hours a quarter in 2011 a reduction that went directly to the bottom line of overall downtime Table 3 provides a detailed breakdown of the components

Area Q2 2010 Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011

Pre patching and shutdown steps (hrs)

Note only combined pre and post statistics are available for these quarters

4 3 3 2

Post patching and startup steps (hrs)

125 65 75 7

Combined pre and post 27 23 10 165 95 105 9 times (hrs)

Table 2 Downtime Caused by Pre- and Post-Patching Steps by Quarter

Increasing Patching Frequency

Patching efficiency was also improved by the rather counterintuitive process change of patching more often The older maintenance process had centered on a quarterly release bundle in which over 300 patches were applied during a single patching window each quarter

Despite the apparent advantages of doing one large patching event per quarter this process had serious unintended consequences First applying such a large bundle requires a large patching window which can be more disruptive to business operations such as manufacturing and field services than several smaller ones Second a single quarterly window reduces business agility and does not allow for incremental changes Third patches would sometimes be rushed into the bundle before they were fully ready to avoid missing these infrequent quarterly windows If a rushed patch caused problems in the system emergency patches would then have to be applied to correct the problems ndash causing additional downtime Finally applying so many patches simultaneously reduced accountability and made problems difficult to trace when they did occur

Because the quarterly bundles had so many unintended consequences the team found that they could actually achieve less downtime by patching more often In the current process patches are applied during regular weekly patching windows spread over the quarter Figure 3 shows the more even spread of patches in 2011 when compared to 2010 To accommodate the increased patching frequency the testing

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

process had to be made more robust This was achieved by introducing automated testing of critical application flows using the Oracle Application Testing Suite (OATS) OATS enables definition and management of the application testing process and validates application functionality

Figure 3 Number of Patches Applied by Week 2010-2011

Improving Patching Script Performance

Downtime also resulted from the poor performance of patch application scripts which in the absence of official guidelines on tuning often ran for over 30 minutes each As part of the downtime reduction initiative guidelines were put into place requiring patching scripts to be tuned so every job within patches submitted ran in under 10 minutes

This mandate did consume some additional labor for script tuning However the team considered this labor a reasonable tradeoff since it affected only direct IT costs while downtime imposed much more significant costs across the company With scripts tuned to run faster the actual patching component of downtime was reduced In addition tuning of standard scripts benefitted customers who had to apply them later

It should be noted that some of the improvement in script processing speed did not come from script tuning but rather from faster hardware During the period of the downtime reduction initiative the servers that run Oraclersquos Global Single Instance (GSI) of E-Business Suite were upgraded from a four-node Real Application Cluster (RAC) running Sun Fire E25Ks to a three-node RAC running Sun SPARC Enterprise M9000s The new M9000 servers provided a significant performance boost compared to the previous E-25K servers The main drivers for this upgrade were the ability to handle increased load from the Sun acquisition and to improve GSI performance in normal operation However as a side benefit the M9000s did indeed process patching scripts much faster

Script tuning and faster hardware combined to dramatically reduce the time taken for the actual patch application steps. In 2010, patch application steps consumed over 50 hours per quarter; in 2011, this dropped to 4 hours per quarter.
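The 10-minute-per-job rule lends itself to mechanical enforcement. A minimal sketch of such a check (the job-timing data format here is illustrative, not Oracle's actual patch job report):

```python
# Flag patch jobs that exceed the 10-minute tuning threshold before
# the patch is submitted. Job names and timings below are hypothetical.
THRESHOLD_SECS = 10 * 60

def jobs_over_threshold(timings, threshold=THRESHOLD_SECS):
    """timings: iterable of (job_name, elapsed_seconds) pairs."""
    return [(job, secs) for job, secs in timings if secs > threshold]

# Example: one untuned script still runs for roughly 31 minutes.
timings = [("xx_custom_upg.sql", 1850), ("xx_rebuild_index.sql", 240)]
for job, secs in jobs_over_threshold(timings):
    print(f"{job}: {secs / 60:.1f} min exceeds the 10-minute limit")
```

A check like this could gate patch submission, so that untuned jobs are caught in test environments rather than during the production window.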

Automating Custom Patching

Like most large enterprise software deployments, Oracle's own implementation of E-Business Suite contains custom code and application customizations. These in turn require custom patches, and a significant number of the EBS patches applied at Oracle were custom. Furthermore, Oracle's internal footprint also


includes a number of non-EBS applications such as Siebel, Agile, and Oracle Application Express (APEX). The manual application process for custom patches to EBS, and for any patches to non-EBS applications, was both time-consuming and labor-intensive.

Standard patches to EBS had always been applied using an automated tool called AutoPatch. AutoPatch applied all bug fixes in a patch, managed version checking, tracked changes, and allowed restarts. But no such capabilities were in place for custom EBS or non-EBS patches, which had to be hand-executed, or hand-placed into directories, by patching personnel. Aside from using up resource hours, this added a layer of complexity and contributed to errors and quality issues.

The team started following the same process to build custom patches as was used for standard patches, so that custom patches could be applied using AutoPatch. They also developed AutoPatch-like functionality in a shell script to automate the application of custom non-EBS patches. These two tools allowed Oracle IT to apply custom patches in much the same way as standard ones.
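The internal shell tool itself is not published. The Python sketch below only illustrates the AutoPatch-like capabilities the text describes — version checking, change tracking, and restart after failure — using a hypothetical state file and step-runner callback:

```python
import json
from pathlib import Path

def apply_patch(manifest, run_step, state_file=Path("applied.json")):
    """Apply each (object_name, version) in order, AutoPatch-style.

    manifest:  list of (object_name, version) pairs in the patch
    run_step:  callback that does the real work (copy a file, run a
               SQL script, relink a binary, ...)
    The state file provides version checking, change tracking, and a
    restart point: re-running after a failure skips completed steps.
    """
    applied = json.loads(state_file.read_text()) if state_file.exists() else {}
    for name, version in manifest:
        if applied.get(name, 0) >= version:
            continue  # already at this version or newer: skip (restart-safe)
        run_step(name, version)
        applied[name] = version
        state_file.write_text(json.dumps(applied))  # record progress
    return applied
```

With this shape, re-running the same patch is a no-op, just as re-running AutoPatch on an already-applied patch is.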

Automation of Patch Management

Oracle IT also improved the change control process that led up to patching. Oracle had used a tool called Automated Release Updates (ARU) for many years to automate the process of defining, building, packaging, and distributing patches. For patch management, a tool called the Common Patch Request Tool (CPRT) had been used prior to the downtime reduction initiative. CPRT offered limited functionality to track patches and record manual deployment instructions, and it included a cumbersome approval process for patches involving manual steps.

In addition, approvers were previously designated for each of the 200 applications supported by Oracle IT, and a patch containing updates to several applications required approvals from the designated approvers for each of the included applications. This process occurred before patch application and therefore did not contribute directly to downtime. However, it did consume time and resources, reduce accountability, and cause delays in rolling out fixes and new functionality to users.

To better manage patching-related processes, Oracle IT built a custom Enterprise Resource Planning (ERP) tool called the Patch Approval Submission System (PASS), which simplifies and automates patch tracking, approval, downtime management, and reporting. The switch from CPRT to PASS started in November 2010 and was completed in December 2011.

Patch type             | Tools used, 2008             | Tools used, 2011
EBS custom patch       | ARU → CPRT → Manual patching | ARU → PASS → AutoPatch
EBS standard patch     | ARU → CPRT → AutoPatch       | ARU → PASS → AutoPatch
Non-EBS custom patch   | ARU → CPRT → Manual patching | ARU → PASS → Automated tool that mimics AutoPatch behavior
Non-EBS standard patch | ARU → CPRT → Manual patching | ARU → PASS → Zip file with instructions for patching team

Table 3: Types of Patches and Patching Tools Used

PASS automatically manages the workflow required to move a request to approval and on to patching. It allows developers to request target environments and patching windows for each ARU patch. At every step of the patching process, from identification of the issue through approval to actual implementation, PASS provides accountability and tracks who was doing what, and when. Table 3 shows the tools used to automate the patching processes for EBS and non-EBS patches.


Figure 4: Patching Workflow Steps Tracked in PASS (Create patch request → Submit patch for approval → Approval → Patching Team picks up request → Patching Team applies patch → Requester tests patch)
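The workflow in Figure 4 is a simple linear state machine. A sketch of how such tracking might look (state names taken from the figure; the audit-trail fields are illustrative, not PASS's actual schema):

```python
# PASS-style workflow tracking: each transition records who did what,
# and when. States follow Figure 4; no steps can be skipped.
from datetime import datetime, timezone

STATES = ["created", "submitted", "approved", "picked_up", "applied", "tested"]

class PatchRequest:
    def __init__(self, patch_id):
        self.patch_id = patch_id
        self.state = "created"
        # audit trail: (state, actor, timestamp) per workflow step
        self.audit = [("created", None, datetime.now(timezone.utc))]

    def advance(self, actor):
        """Move to the next workflow step, recording accountability."""
        self.state = STATES[STATES.index(self.state) + 1]
        self.audit.append((self.state, actor, datetime.now(timezone.utc)))
        return self.state
```

Because the trail is append-only and every transition names an actor, the "who was doing what, and when" question described above can be answered for any patch at any time.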

PASS has also streamlined the approval process. Previously, each of the 200 supported applications had a designated approver, and a patch containing files impacting different applications required approval from the designees for each of those applications, making the process cumbersome. With PASS, the number of approvers for any given patch is reduced to a handful.

This reduction in approvers is possible because of another process change: a radical reduction in the number of people authorized to submit patches. In the old process, developers who requested patches also submitted them. In the new PASS process, a separate layer of submitters has been designated to ensure patch quality and performance before submission. This adds a layer of accountability and eliminates the need for a large number of approvers. This smaller group of submitters is also required to provide more information when submitting patches. Table 4 below shows the questions submitters must answer when entering a patch into PASS.

1. Describe the issues being addressed by this patch.
2. Identify risks associated with the patch application.
3. Indicate the tracking number of the bug being fixed.
4. Indicate a target date for patching of the production environment.
5. Enter the name of the developer or IT reviewer.
6. Identify the files that are changed or impacted.
7. Briefly describe the code changes for the files.
8. Confirm that the patch has been tested, either manually or using PASS.
9. Indicate when the patch was last tested in a test environment.
10. Note the patch execution times in each previous environment.

Table 4: Information Required at Patch Submission

The new ARU/PASS process also provides efficient merging of patches so that they can be applied in a single package. As part of its Multi-Language Support (MLS), EBS supports eleven languages, and patches often need to be built for each language. By merging these into one package, common steps such as maintaining file versions and updating history tables can be performed once rather than multiple times.


Figure 5: Screen shot of Oracle's Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of its long experience with them. PASS has been customized to work well with the internal patch generation and source control tools used by Oracle's Applications Product Development group and its extension, the Applications IT group. However, Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP), both included in Enterprise Manager. ACMP is an end-to-end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments. AMP has patch management functionality similar to PASS. It is a system management solution for centralized management of multiple environments, allowing proactive monitoring of requests and workflows, reporting on history and trends, and automation of repetitive Database Administration (DBA) tasks.

Figure 6: Screen Shot of Patch Manager in Application Change Management Pack (ACMP)


Results

Oracle IT's downtime reduction initiative reduced maintenance downtime by 85%, from 104.5 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters, Q2 2011 through Q4 2011. This reduction occurred although the number of patches remained essentially the same for most quarters and even increased in Q2 2011. Figure 7 below shows the trend of downtime reduction along with the number of patches applied each quarter.
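The headline figure can be checked against the quarterly totals in Table 5:

```python
# Total downtime per quarter (hours), taken from Table 5.
downtime = {"Q2 2010": 104.5, "Q2 2011": 14.0, "Q3 2011": 14.5, "Q4 2011": 13.0}

baseline = downtime["Q2 2010"]
for quarter in ("Q2 2011", "Q3 2011", "Q4 2011"):
    reduction = 1 - downtime[quarter] / baseline
    print(f"{quarter}: {reduction:.0%} reduction vs Q2 2010")
```

Each of the three 2011 quarters comes out between 86% and 88% below the Q2 2010 baseline, consistent with the "85%" claim.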

Figure 7: GSI Downtimes by Quarter (shutdown/startup time, patching time, and total time in hours, plotted against the number of patches applied, Q2 2010 through Q4 2011)

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative; therefore, a breakdown of hours into pre and post is not available for all quarters.

Area                                  | Q1 2010 | Q2 2010 | Q3 2010 | Q4 2010 | Q1 2011 | Q2 2011 | Q3 2011 | Q4 2011
Patching (hrs)                        |    -    |   77.5  |   32.5  |   20    |   18    |   4.5   |    4    |    4
Pre-patching and shutdown steps (hrs) |    -    |    -    |    -    |    -    |    4    |    3    |    3    |    2
Post-patching and startup steps (hrs) |    -    |    -    |    -    |    -    |   12.5  |   6.5   |   7.5   |    7
Combined pre and post times (hrs)     |    -    |   27    |   23    |   10    |   16.5  |   9.5   |  10.5   |    9
Total downtime (hrs)                  |  172.5  |  104.5  |   55.5  |   30    |   34.5  |   14    |  14.5   |   13

Table 5: Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oracle's downtime reduction initiative. Reductions in some factors, such as hot patching and the pre- and post-patching steps, can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors


contributed another 15% decrease in downtime, but these are much more interdependent and their benefits harder to accurately allocate. For example, both script performance tuning and the upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be specifically attributed to script tuning versus upgraded hardware. Since Oracle IT's primary mission is to support the business, sometimes multiple improvements must be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out. Doing so would require putting exact numbers to the downtime caused specifically by patches that were rushed in order to make the cutoff, and by the reduction in accountability caused by very large bundles — not a straightforward calculation.

Figure 8: Contribution of Factor Improvements to the Downtime Reduction Initiative (Hot Patching: 66%; Pre- and Post-Patching Steps: 19%; Patch Tuning, Hardware Upgrades, and Other Factors: 15%)

Despite the difficulties of exactly allocating a portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and that improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Factor                       | Description
Cold patching                | The percentage of patches applied hot increased from less than 1% in 2009 to over 40% in 2011.
Pre- and post-patching steps | Systems are now shut down and started back up in parallel. As a result, pre- and post-patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011.
Script performance           | Patching scripts are now tuned to run in under 10 minutes. Database server hardware upgrades also helped speed up script execution times.
Patch frequency              | Smaller patch sets are applied weekly, as opposed to large quarterly bundles. This has improved patch quality and brought down the number of follow-on patchings required to correct bad patches.
Custom patches               | Custom patching (EBS and non-EBS) has been automated to reduce resource requirements and inefficiencies.
Patch management             | PASS provides improved patch management from initial request to approval to patching. ARU and PASS have allowed efficient merging of patches and more accountability in the patch approval process.

Table 6: Actions and Results of Oracle's Downtime Reduction Initiative, by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is also on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. For example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been successfully applied, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results from these new capabilities once they are adopted.
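Conceptually, editioning prepares a complete patched copy while users keep working against the running copy, then performs a quick cutover. A schematic sketch of that flow (the names and structure are illustrative only, not the actual EBS 12.2 mechanism):

```python
# Schematic of edition-based online patching: patch a copy while users
# stay on the run edition; only the cutover needs a brief outage.
import copy

def online_patch(run_edition, patches):
    """Apply patches to a copy of the system while the run edition
    stays online; return the patched edition ready for cutover."""
    patch_edition = copy.deepcopy(run_edition)   # patchable copy
    for apply_fn in patches:
        apply_fn(patch_edition)                  # long-running, zero downtime
    # Cutover point: restart middle-tier services against patch_edition.
    return patch_edition

# Users keep entering expense reports against gsi while Payables is patched.
gsi = {"payables": "12.1.1", "receivables": "12.1.1"}
patched = online_patch(gsi, [lambda e: e.update(payables="12.1.3")])
```

The key property the sketch shows is that the run edition is untouched until cutover, so the lengthy patch-application work no longer appears as user-visible downtime.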

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtime by 85% — from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite
May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
USA

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0611



Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

process had to be made more robust This was achieved by introducing automated testing of critical application flows using the Oracle Application Testing Suite (OATS) OATS enables definition and management of the application testing process and validates application functionality

Figure 3 Number of Patches Applied by Week 2010-2011

Improving Patching Script Performance

Downtime also resulted from the poor performance of patch application scripts which in the absence of official guidelines on tuning often ran for over 30 minutes each As part of the downtime reduction initiative guidelines were put into place requiring patching scripts to be tuned so every job within patches submitted ran in under 10 minutes

This mandate did consume some additional labor for script tuning However the team considered this labor a reasonable tradeoff since it affected only direct IT costs while downtime imposed much more significant costs across the company With scripts tuned to run faster the actual patching component of downtime was reduced In addition tuning of standard scripts benefitted customers who had to apply them later

It should be noted that some of the improvement in script processing speed did not come from script tuning but rather from faster hardware During the period of the downtime reduction initiative the servers that run Oraclersquos Global Single Instance (GSI) of E-Business Suite were upgraded from a four-node Real Application Cluster (RAC) running Sun Fire E25Ks to a three-node RAC running Sun SPARC Enterprise M9000s The new M9000 servers provided a significant performance boost compared to the previous E-25K servers The main drivers for this upgrade were the ability to handle increased load from the Sun acquisition and to improve GSI performance in normal operation However as a side benefit the M9000s did indeed process patching scripts much faster

Script tuning and faster hardware combined to dramatically reduce the time taken for the actual patch application steps In 2010 patch applications steps consumed over 50 hours per quarter In 2011 this dropped to 4 hours per quarter

Automating Custom Patching

Like most large enterprise software deployments Oraclersquos own implementation of E-Business Suite contains custom code and application customizations These in turn require custom patches A significant number of EBS patches applied at Oracle were custom Furthermore Oraclersquos internal footprint also

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

includes a number of non-EBS applications such as Seibel Agile and Oracle Application Express (APEX) The manual application process for custom patches to EBS and for any patches to non-EBS applications was both time-consuming and labor intensive

Standard patches to EBS had always been applied using an automated tool called AutoPatch Autopatch applied all bug fixes in a patch managed version checking tracked changes and allowed restart capability But no such capabilities were in place for custom EBS or non-EBS patches which had to be hand-executed or hand-placed by patching personnel into directories Aside from using up resource hours this added a layer of complexity and contributed to errors and quality issues

The team started following the same process to build custom patches as was used for standard patches so that custom patches could be applied using AutoPatch They also developed functionality similar to AutoPatch into a shell script to automate application of custom non-EBS patches These two tools allowed Oracle IT to apply custom patches in much the same way as standard ones

Automation of Patch Management

Oracle IT also improved the change control process that led up to patching Oracle had used a tool called called Automated Release Updates (ARU) for many years to automate the process of defining building packaging and distributing patches For patch management a tool called Common Patch Request Tool (CPRT) had been used prior to the downtime reduction initiative CPRT offered limited functionality to track patches and record manual deployment instructions It included a cumbersome approval process for patches involving manual steps

In addition approvers were previously designated for each of the 200 applications supported by Oracle IT and a patch containing updates to several applications required approvals from designated approvers for each of the included applications This process occurred before patch application and therefore did not contribute directly to downtime However it did consume time and resources reduce accountability and cause delays in rolling out fixes and new functionality to users

To better manage patching related processes Oracle IT built a custom Enterprise Resource Planning (ERP) tool called Patch Approval Submission System (PASS) which simplifies and automates patch tracking approval downtime management and reporting The switch from CPRT to PASS started in November 2010 and was completed in December 2011

Patch type Tools used 2008 2011

EBS custom patch ARU-gtCPRT-gtManual patching ARU-gtPASS-gtAutopatch EBS standard patch ARU-gtCPRT -gt Autopatch ARU-gtPASS-gtAutopatch Non-EBS custom patch ARU-gtCPRT-gtManual patching ARU-gtPASS-gtAutomated tool

that mimics Autopatch behavior Non-EBS standard patch ARU-gtCPRT-gtManual patching ARU-gtPASS-gtZip file with

instructions for patching team

Table 3 Types of Patches and Patching Tools Used

PASS automatically manages the workflow required to move a request to approval and to patching It allows developers request target environments and patching windows for each ARU patch At every step of the patching process from identification of the issue to approval to actual implementation PASS provides accountability and tracks all pieces of information on who was doing what and when Table 3 shows the tools used to automate the patching processes for EBS and non-EBS patches

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Create patch request

Submit patch for approval Approval

Patching Team picks up request

Patching Team applies

patch

Requester tests patch

Figure 4 Patching Workflow Steps Tracked in PASS

PASS has also streamlined the approval process Previously each of the 200 supported applications had a designated approver A patch containing files impacting different applications required approval from designees for each of those applications making the process cumbersome With PASS the number of approvers for any given patch is reduced to a handful

This reduction in approvers is possible because of another process change ndash a radical reduction in the number of people authorized to submit patches In the old process developers who requested patches also submitted them In the new PASS process a separate layer of submitters has been designated to ensure the patch quality and performance before submission This adds a layer of accountability and eliminates the need for a large number of approvers This smaller number of submitters is also required to provide more information when submitting patches Table 4 below shows the questions submitters must respond to when entering a patch into PASS

1 Describe the issues being addressed by this patch

2 Identify risks associated with the patch application

3 Indicate the tracking number of the bug being fixed

4 Indicate a target date for patching of the production environment

5 Enter the name of the developer or IT reviewer

6 Identify the files that are changed or impacted

7 Briefly describe the code changes for the files

8 Confirm that the patch has been tested either manually or using PASS

9 Indicate when the patch was last tested in a test environment

10 Note the patch execution times in each previous environment

Table 4 Information Required at Patch Submission

The new ARUPASS process also provides efficient merging of patches so that they can be applied in a single package As part of its Multi Language Support (MLS) EBS supports eleven languages and patches often need to be built for each language By merging these into one package common steps such as maintaining file versions and updating history tables can be performed once rather than multiple times

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Figure 5 Screen shot of Oraclersquos Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of their long experience with them PASS has been customized to work well with the internal patch generation and source

control tools that are used by the Oraclersquos Applications Product Development group and its extension the Applications IT group However Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP) both included in Enterprise Manager ACMP is an end to end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments AMP has patch management functionalities similar to PASS It is a system management solution for centralized management of multiple environments and allows for the proactive monitoring of requests and workflows for reporting on history and trends and for automation of repetitive Database Administration (DBA) tasks

Figure 6 Screen Shot of Patch Manager in Application Change Management Pack (ACMP)

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Results

Oracle ITrsquos downtime reduction initiative reduced maintenance downtime by 85 from 1045 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters from Q2 2011 through Q4 2011 This reduction occurred although the number of patches remained essentially the same for most quarters and even increased in Q2 2011 Figure 6 below shows the trend of downtime reduction along with the number of patches applied each quarter

[Combination chart: for each quarter from Q2 2010 through Q4 2011, downtime in hours (shutdown/startup time, patching time, and total time; scale 0-120) plotted alongside the number of patches applied (scale 0-1,400)]

Figure 7. GSI Downtimes by Quarter

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative. Therefore, a breakdown of hours into pre- and post-patching steps is not available for all quarters.

Area                                    Q1 2010  Q2 2010  Q3 2010  Q4 2010  Q1 2011  Q2 2011  Q3 2011  Q4 2011
Patching (hrs)                              --     77.5     32.5     20       18       4.5      4        4
Pre-patching and shutdown steps (hrs)       --       --       --     --        4       3        3        2
Post-patching and startup steps (hrs)       --       --       --     --       12.5     6.5      7.5      7
Combined pre and post times (hrs)           --     27       23       10       16.5     9.5     10.5      9
Total downtime (hrs)                     172.5    104.5     55.5     30       34.5    14       14.5     13

Table 5. Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oracle's downtime reduction initiative. Reductions in factors such as hot patching and the pre- and post-patching steps can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors contributed another 15% decrease in downtime, but are much more interdependent, and their benefits are harder to accurately allocate. For example, both script performance tuning and an upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could specifically be attributed to script tuning vs. upgraded hardware. Since Oracle IT's primary mission is to support the business, sometimes multiple improvements must be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out. Doing so would require putting exact numbers to the downtime caused specifically by patches that were rushed in order to make the cutoff, and by the reduction in accountability caused by very large bundles; this is not a straightforward calculation.

[Pie chart: Hot patching, 66%; pre- and post-patching steps, 19%; patch tuning, hardware upgrades, and other factors, 15%]

Figure 8. Contribution of Factor Improvements to the Downtime Reduction Initiative

Despite the difficulties of exactly allocating a portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and that improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Factor                        Description

Cold patching                 The percentage of patches applied hot increased from less than 1%
                              in 2009 to over 40% in 2011.

Pre- and post-patching steps  Systems are shut down and started back up in parallel. As a result,
                              pre- and post-patching times went down from over 20 hours per quarter
                              in 2010 to less than 10 hours per quarter in 2011.

Script performance            Patching scripts are now tuned to run in under 10 minutes. Database
                              server hardware upgrades also helped speed up script execution times.

Patch frequency               Smaller patch sets are applied weekly, as opposed to large quarterly
                              bundles. This has improved patch quality and brought down the number
                              of follow-on patches required to correct bad patches.

Custom patches                Custom patching, both EBS and non-EBS, has been automated to reduce
                              resource requirements and inefficiencies.

Patch management              PASS provides improved patch management, from initial request to
                              approval to patching. ARU and PASS have allowed efficient merging of
                              patches and more accountability in the patch approval process.

Table 6. Actions and Results of Oracle's Downtime Reduction Initiative, by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is also on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. So, for example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an Editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been successfully applied, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results from these new capabilities once they are adopted.
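The edition-based cutover described above can be illustrated with a toy simulation. This is hypothetical code, not the actual EBS 12.2 tooling; the module names and version strings are invented for illustration:

```python
import copy

def online_patch(run_edition: dict, patches: dict) -> dict:
    """Toy model of edition-based cutover: patch a copy while the
    running edition keeps serving users, then switch to the copy."""
    patch_edition = copy.deepcopy(run_edition)  # patchable copy of production
    patch_edition.update(patches)               # patches touch only the copy
    # ...users keep working against run_edition throughout this step...
    return patch_edition                        # brief restart, then switch users

live = {"AP": "12.1.3", "GL": "12.1.3"}         # hypothetical module versions
new_live = online_patch(live, {"AP": "12.2"})
print(live)      # the running edition is untouched while patching proceeds
print(new_live)  # the patched edition users are switched to
```

The design point the toy captures is that downtime shrinks to the switch-over itself, because no patch work happens against the edition users are on.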

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtime by 85%, from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
U.S.A.

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200

oracle.com

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only, and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0611


Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Figure 5 Screen shot of Oraclersquos Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of their long experience with them PASS has been customized to work well with the internal patch generation and source

control tools that are used by the Oraclersquos Applications Product Development group and its extension the Applications IT group However Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP) both included in Enterprise Manager ACMP is an end to end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments AMP has patch management functionalities similar to PASS It is a system management solution for centralized management of multiple environments and allows for the proactive monitoring of requests and workflows for reporting on history and trends and for automation of repetitive Database Administration (DBA) tasks

Figure 6 Screen Shot of Patch Manager in Application Change Management Pack (ACMP)

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Results

Oracle ITrsquos downtime reduction initiative reduced maintenance downtime by 85 from 1045 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters from Q2 2011 through Q4 2011 This reduction occurred although the number of patches remained essentially the same for most quarters and even increased in Q2 2011 Figure 6 below shows the trend of downtime reduction along with the number of patches applied each quarter

0

200

400

600

800

1000

1200

1400

0

20

40

60

80

100

120

Q2 2010 Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011

Number of p

atchesa

applied

Time

(hours)

ShutdownStartup time Patching time Total time Patches

Figure 7 GSI Downtimes by Quarter

Table 5 below provides the underlying data and some additional detail including the number of planned outages and actual patching events It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative Therefore a breakdown of hours into pre and post is not available for all quarters

Area Q1 2010 Q2 2010 Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011 Patching (hrs) 775 325 20 18 45 4 4 Pre patching and shutdown steps (hrs)

4 3 3 2

Post patching and startup steps (hrs)

125 65 75 7

Combined pre and post times (hrs)

27 23 10 165 95 105 9

Total downtime (hrs) 1725 1045 555 30 345 14 145 13

Table 5 Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oraclersquos downtime reduction initiative Reductions in factors such as hot patching and the pre- and post-patching steps can be quantified to the minute with no dependencies to cloud the issue As shown in Figure 7 the increase in hot patching contributed to a 66 decrease in patching downtime and the improvements in pre and post patching steps to a 19 reduction Improvements in other factors

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

contributed to another 15 decrease in downtime but are much more interdependent and their benefits harder to accurately allocate For example both script performance tuning and an upgrade of Oraclersquos Global Single Instance to faster hardware reduced the downtime associated with patching script execution However since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative the team could not accurately separate the effects of the hardware upgrade from those of script tuning A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be specifically be attributed to script tuning vs upgraded hardware Since Oracle ITrsquos primary mission is to support the business sometimes multiple improvements must be made simultaneously despite the inevitable confounds this produces

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out Doing so would require putting exact numbers to downtime caused specifically by patches that were rushed in order to make the cutoff and by the reduction in accountability caused by very large bundles not a straightforward calculation

66

19

15

Hot Patching

Pre and Post patching steps

Patch tuning hardware upgrades and other factors

Figure 8 Contribution of Factor Improvements on Downtime Reduction Initiative

Despite the difficulties of exactly allocating a portion of the downtime reduction to each factor it is clear that all of the factors cited in Table 1 contributed substantially to Oraclersquos downtime and improvements in each factor contributed to the overall 85 reduction Table 5 below revisits the factors that contributed to Oraclersquos previously high downtime and recaps the actions taken to improve them

Factor Description

Cold Patching Percentage of patches applied hot increased from less than 1 in 2009 to over 40 in 2011

Pre and post patching Steps

Systems are shut down and started back up in parallel As a result pre and post patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011

Script performance Patching scripts are now tuned to run in under 10 minutes Database server hardware upgrades also helped speed up script execution times

Patch Frequency Smaller patch sets are applied weekly as opposed to large quarterly bundles This has improved patch quality and brought down instances of follow on patchings required to correct bad patches

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Custom Patches Custom patching EBS and non-EBS has been automated to reduce resource requirements and inefficiencies

Patch Management PASS provides improved patch management from initial request to approval to patching ARU and PASS have allowed efficient merging of patches and more accountability in the patch approval process

Table 6 Actions and Results of Oracle Downtime Reduction Initiative by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper help is also on the way from Product Development that will extend the concept of hot patching substantially EBS 122 to be released in the near future is expected to reduce downtime further by performing most patching activities while the system remains online and available to users So for example a user will be able to continue entering an expense report while the Payables module is being patched

Online patching will be achieved through an Editioning feature that creates a patchable copy of the production system applies patches to that copy and then switches users to it Patches will be applied to a secondary file system and a separate copy of all database code objects affected by the patches will be maintained Once patches have been successfully applied users will be moved over to the patched editions of the file system and the database Patching downtime will result solely from restarting the middle tier services and is expected to be measured in minutes rather than hours Oracle IT will report on its results from these new capabilities once they are adopted

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtimes by 85 - from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011 In addition these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 1211 to 1213 with only 9 hours of downtime

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results In addition Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 122

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Reducing Maintenance Downtime by 85

Oraclersquos Internal Patch Automation and Process

Improvements in a Heterogeneous Enterprise

Application Deployment Including E-Business

Suite

May 2012

Authors Kishan Agrawal Operation Excellence

Manager Vinay Dwivedi Principal Product

Manager Jeffrey Pease Vice President Dave

Stephens Group Vice President

Oracle Corporation

World Headquarters

500 Oracle Parkway

Redwood Shores CA 94065

USA

Worldwide Inquiries

Phone +16505067000

Fax +16505067200

oraclecom

Copyright copy 2012 Oracle andor its affiliates All rights reserved This document is provided for information purposes only and the

contents hereof are subject to change without notice This document is not warranted to be error-free nor subject to any other

warranties or conditions whether expressed orally or implied in law including implied warranties and conditions of merchantability or

fitness for a particular purpose We specifically disclaim any liability with respect to this document and no contractual obligations are

formed either directly or indirectly by this document This document may not be reproduced or transmitted in any form or by any

means electronic or mechanical for any purpose without our prior written permission

Oracle and Java are registered trademarks of Oracle andor its affiliates Other names may be trademarks of their respective owners

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation All SPARC trademarks are used under license and

are trademarks or registered trademarks of SPARC International Inc AMD Opteron the AMD logo and the AMD Opteron logo are

trademarks or registered trademarks of Advanced Micro Devices UNIX is a registered trademark licensed through XOpen

Company Ltd 0611

Page 8: Reducing Maintenance Downtime by 85%

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

includes a number of non-EBS applications such as Seibel Agile and Oracle Application Express (APEX) The manual application process for custom patches to EBS and for any patches to non-EBS applications was both time-consuming and labor intensive

Standard patches to EBS had always been applied using an automated tool called AutoPatch Autopatch applied all bug fixes in a patch managed version checking tracked changes and allowed restart capability But no such capabilities were in place for custom EBS or non-EBS patches which had to be hand-executed or hand-placed by patching personnel into directories Aside from using up resource hours this added a layer of complexity and contributed to errors and quality issues

The team started following the same process to build custom patches as was used for standard patches so that custom patches could be applied using AutoPatch They also developed functionality similar to AutoPatch into a shell script to automate application of custom non-EBS patches These two tools allowed Oracle IT to apply custom patches in much the same way as standard ones

Automation of Patch Management

Oracle IT also improved the change control process that led up to patching Oracle had used a tool called called Automated Release Updates (ARU) for many years to automate the process of defining building packaging and distributing patches For patch management a tool called Common Patch Request Tool (CPRT) had been used prior to the downtime reduction initiative CPRT offered limited functionality to track patches and record manual deployment instructions It included a cumbersome approval process for patches involving manual steps

In addition approvers were previously designated for each of the 200 applications supported by Oracle IT and a patch containing updates to several applications required approvals from designated approvers for each of the included applications This process occurred before patch application and therefore did not contribute directly to downtime However it did consume time and resources reduce accountability and cause delays in rolling out fixes and new functionality to users

To better manage patching related processes Oracle IT built a custom Enterprise Resource Planning (ERP) tool called Patch Approval Submission System (PASS) which simplifies and automates patch tracking approval downtime management and reporting The switch from CPRT to PASS started in November 2010 and was completed in December 2011

Patch type Tools used 2008 2011

EBS custom patch ARU-gtCPRT-gtManual patching ARU-gtPASS-gtAutopatch EBS standard patch ARU-gtCPRT -gt Autopatch ARU-gtPASS-gtAutopatch Non-EBS custom patch ARU-gtCPRT-gtManual patching ARU-gtPASS-gtAutomated tool

that mimics Autopatch behavior Non-EBS standard patch ARU-gtCPRT-gtManual patching ARU-gtPASS-gtZip file with

instructions for patching team

Table 3 Types of Patches and Patching Tools Used

PASS automatically manages the workflow required to move a request to approval and to patching It allows developers request target environments and patching windows for each ARU patch At every step of the patching process from identification of the issue to approval to actual implementation PASS provides accountability and tracks all pieces of information on who was doing what and when Table 3 shows the tools used to automate the patching processes for EBS and non-EBS patches

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

[Figure 4: Patching Workflow Steps Tracked in PASS. Create patch request -> Submit patch for approval -> Approval -> Patching Team picks up request -> Patching Team applies patch -> Requester tests patch.]

PASS has also streamlined the approval process. Previously, each of the 200 supported applications had a designated approver, and a patch containing files impacting different applications required approval from the designees for each of those applications, making the process cumbersome. With PASS, the number of approvers for any given patch is reduced to a handful.

This reduction in approvers is possible because of another process change: a radical reduction in the number of people authorized to submit patches. In the old process, developers who requested patches also submitted them. In the new PASS process, a separate layer of submitters has been designated to ensure patch quality and performance before submission. This adds a layer of accountability and eliminates the need for a large number of approvers. This smaller number of submitters is also required to provide more information when submitting patches. Table 4 below shows the questions submitters must answer when entering a patch into PASS.

1. Describe the issues being addressed by this patch.
2. Identify risks associated with the patch application.
3. Indicate the tracking number of the bug being fixed.
4. Indicate a target date for patching of the production environment.
5. Enter the name of the developer or IT reviewer.
6. Identify the files that are changed or impacted.
7. Briefly describe the code changes for the files.
8. Confirm that the patch has been tested, either manually or using PASS.
9. Indicate when the patch was last tested in a test environment.
10. Note the patch execution times in each previous environment.

Table 4: Information Required at Patch Submission
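One way to picture the submission requirement is as a record with one field per Table 4 question, where a submission is rejected until every answer is filled in. This is a minimal sketch; the field names are invented for illustration and do not reflect PASS's actual schema.

```python
from dataclasses import dataclass, fields

@dataclass
class PatchSubmission:
    """Hypothetical record mirroring the ten Table 4 questions."""
    issue_description: str        # 1. issues addressed
    risks: str                    # 2. risks of applying the patch
    bug_tracking_number: str      # 3. bug being fixed
    production_target_date: str   # 4. target date for production
    reviewer_name: str            # 5. developer or IT reviewer
    impacted_files: str           # 6. files changed or impacted
    code_change_summary: str      # 7. brief description of changes
    tested_confirmation: str      # 8. tested manually or via PASS
    last_tested_date: str         # 9. last test in a test environment
    prior_execution_times: str    # 10. execution times in prior environments

def missing_answers(sub: PatchSubmission) -> list[str]:
    """Return the Table 4 items left blank, enforcing complete submissions."""
    return [f.name for f in fields(sub) if not getattr(sub, f.name).strip()]
```

A validation gate like this is what lets a small pool of submitters stand behind patch quality: nothing incomplete moves forward.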

The new ARU/PASS process also provides efficient merging of patches so that they can be applied in a single package. As part of its Multi-Language Support (MLS), EBS supports eleven languages, and patches often need to be built for each language. By merging these into one package, common steps such as maintaining file versions and updating history tables can be performed once rather than multiple times.
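The savings from merging come from amortizing the fixed, per-package steps across all language patches. The toy cost model below makes the arithmetic concrete; the cost figures are arbitrary illustrations, not measured values from Oracle's environment.

```python
# Toy cost model: applying per-language patches separately repeats the
# common steps (file version maintenance, history table updates) for
# every language, while a merged package pays that cost once.

def apply_separately(patches, common_cost=5, lang_cost=1):
    # Common steps repeated once per language patch.
    return sum(common_cost + lang_cost for _ in patches)

def apply_merged(patches, common_cost=5, lang_cost=1):
    # Common steps performed a single time for the merged package.
    return common_cost + lang_cost * len(patches)

langs = ["en", "fr", "de", "ja"]  # illustrative subset of the 11 languages
print(apply_separately(langs), apply_merged(langs))  # 24 vs 9 cost units
```

With all eleven MLS languages, the gap widens further, which is why merging was worth automating in ARU/PASS.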


Figure 5: Screenshot of Oracle's Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of its long experience with them. PASS has been customized to work well with the internal patch generation and source control tools that are used by Oracle's Applications Product Development group and its extension, the Applications IT group. However, Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP), both included in Enterprise Manager. ACMP is an end-to-end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments. AMP has patch management functionality similar to PASS. It is a system management solution for centralized management of multiple environments, and it allows for the proactive monitoring of requests and workflows, for reporting on history and trends, and for automation of repetitive Database Administration (DBA) tasks.

Figure 6: Screenshot of Patch Manager in Application Change Management Pack (ACMP)


Results

Oracle IT's downtime reduction initiative reduced maintenance downtime by 85%, from 104.5 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters from Q2 2011 through Q4 2011. This reduction occurred although the number of patches remained essentially the same for most quarters, and even increased in Q2 2011. Figure 7 below shows the trend of downtime reduction along with the number of patches applied each quarter.

[Figure 7: GSI Downtimes by Quarter. Chart for Q2 2010 through Q4 2011 plotting shutdown/startup time, patching time, and total time in hours (0-120) against the number of patches applied (0-1400).]

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that detailed tracking of the time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative; therefore, a breakdown of hours into pre and post is not available for all quarters.

Area                                    Q1'10  Q2'10  Q3'10  Q4'10  Q1'11  Q2'11  Q3'11  Q4'11
Patching (hrs)                              -   77.5   32.5     20     18    4.5      4      4
Pre-patching and shutdown steps (hrs)       -      -      -      -      4      3      3      2
Post-patching and startup steps (hrs)       -      -      -      -   12.5    6.5    7.5      7
Combined pre and post times (hrs)           -     27     23     10   16.5    9.5   10.5      9
Total downtime (hrs)                    172.5  104.5   55.5     30   34.5     14   14.5     13

Table 5: Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)
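The headline 85% figure can be checked directly against the Table 5 totals. The snippet below uses the quoted baseline of 104.5 hours in Q2 2010 and the "less than 15 hours" quarters of late 2011.

```python
# Verify the quoted downtime reduction using the Table 5 data:
# Q2 2010 total downtime was 104.5 hours; by Q2-Q4 2011 each quarter
# was under 15 hours (14, 14.5, and 13 hours respectively).
baseline_hours = 104.5          # Q2 2010 total downtime
late_2011_ceiling = 15.0        # each of Q2-Q4 2011 came in below this

reduction = (baseline_hours - late_2011_ceiling) / baseline_hours
print(f"{reduction:.1%}")  # 85.6%, consistent with the quoted 85%
```

Against the actual Q4 2011 figure of 13 hours, the reduction is slightly higher still, so 85% is a conservative statement of the result.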

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oracle's downtime reduction initiative. Reductions in some factors, such as hot patching and the pre- and post-patching steps, can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors contributed another 15% decrease in downtime, but these are much more interdependent, and their benefits are harder to allocate accurately. For example, both script performance tuning and an upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could specifically be attributed to script tuning versus upgraded hardware. Since Oracle IT's primary mission is to support the business, sometimes multiple improvements must be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements, such as increasing patch frequency, are similarly difficult to break out. Doing so would require putting exact numbers to the downtime caused specifically by patches that were rushed in order to make the cutoff, and by the reduction in accountability caused by very large bundles; not a straightforward calculation.

[Figure 8: Contribution of Factor Improvements to the Downtime Reduction Initiative. Pie chart: hot patching, 66%; pre- and post-patching steps, 19%; patch tuning, hardware upgrades, and other factors, 15%.]

Despite the difficulty of exactly allocating a portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Factor: Description

Cold patching: Percentage of patches applied hot increased from less than 1% in 2009 to over 40% in 2011.

Pre- and post-patching steps: Systems are shut down and started back up in parallel. As a result, pre- and post-patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011.

Script performance: Patching scripts are now tuned to run in under 10 minutes. Database server hardware upgrades also helped speed up script execution times.

Patch frequency: Smaller patch sets are applied weekly, as opposed to large quarterly bundles. This has improved patch quality and brought down the number of follow-on patches required to correct bad patches.

Custom patches: Custom patching, both EBS and non-EBS, has been automated to reduce resource requirements and inefficiencies.

Patch management: PASS provides improved patch management from initial request to approval to patching. ARU and PASS have enabled efficient merging of patches and more accountability in the patch approval process.

Table 6: Actions and Results of Oracle's Downtime Reduction Initiative, by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is also on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. So, for example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an Editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been successfully applied, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results from these new capabilities once they are adopted.
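The editioning idea described above can be sketched abstractly: one edition keeps serving users while patches land in a second copy, and cutover is a quick switch. This is a conceptual model only, not the actual EBS 12.2 mechanism; class and method names are invented for illustration.

```python
# Conceptual sketch of edition-based online patching: users stay on the
# "run" edition while patches are applied to the "patch" edition, then a
# brief cutover (the only downtime) swaps which edition is active.
class EditionedSystem:
    def __init__(self, version="v1"):
        self.editions = {"run": version, "patch": version}
        self.active = "run"

    def apply_patches(self, new_version):
        # Users continue working on self.active; only the offline copy changes.
        self.editions["patch"] = new_version

    def cutover(self):
        # Brief downtime: restart middle-tier services, then swap editions.
        self.active = "patch" if self.active == "run" else "run"
        return self.editions[self.active]

gsi = EditionedSystem()
gsi.apply_patches("v2")
print(gsi.cutover())  # users are now on the patched v2 edition
```

The key property is that the long-running work (applying patches) happens entirely off the serving path, leaving only the cutover on the downtime clock.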

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtimes by 85%, from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
USA

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0611

Page 9: Reducing Maintenance Downtime by 85%

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Create patch request

Submit patch for approval Approval

Patching Team picks up request

Patching Team applies

patch

Requester tests patch

Figure 4 Patching Workflow Steps Tracked in PASS

PASS has also streamlined the approval process Previously each of the 200 supported applications had a designated approver A patch containing files impacting different applications required approval from designees for each of those applications making the process cumbersome With PASS the number of approvers for any given patch is reduced to a handful

This reduction in approvers is possible because of another process change ndash a radical reduction in the number of people authorized to submit patches In the old process developers who requested patches also submitted them In the new PASS process a separate layer of submitters has been designated to ensure the patch quality and performance before submission This adds a layer of accountability and eliminates the need for a large number of approvers This smaller number of submitters is also required to provide more information when submitting patches Table 4 below shows the questions submitters must respond to when entering a patch into PASS

1 Describe the issues being addressed by this patch

2 Identify risks associated with the patch application

3 Indicate the tracking number of the bug being fixed

4 Indicate a target date for patching of the production environment

5 Enter the name of the developer or IT reviewer

6 Identify the files that are changed or impacted

7 Briefly describe the code changes for the files

8 Confirm that the patch has been tested either manually or using PASS

9 Indicate when the patch was last tested in a test environment

10 Note the patch execution times in each previous environment

Table 4 Information Required at Patch Submission

The new ARUPASS process also provides efficient merging of patches so that they can be applied in a single package As part of its Multi Language Support (MLS) EBS supports eleven languages and patches often need to be built for each language By merging these into one package common steps such as maintaining file versions and updating history tables can be performed once rather than multiple times

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Figure 5 Screen shot of Oraclersquos Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of their long experience with them PASS has been customized to work well with the internal patch generation and source

control tools that are used by the Oraclersquos Applications Product Development group and its extension the Applications IT group However Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP) both included in Enterprise Manager ACMP is an end to end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments AMP has patch management functionalities similar to PASS It is a system management solution for centralized management of multiple environments and allows for the proactive monitoring of requests and workflows for reporting on history and trends and for automation of repetitive Database Administration (DBA) tasks

Figure 6 Screen Shot of Patch Manager in Application Change Management Pack (ACMP)

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Results

Oracle ITrsquos downtime reduction initiative reduced maintenance downtime by 85 from 1045 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters from Q2 2011 through Q4 2011 This reduction occurred although the number of patches remained essentially the same for most quarters and even increased in Q2 2011 Figure 6 below shows the trend of downtime reduction along with the number of patches applied each quarter

0

200

400

600

800

1000

1200

1400

0

20

40

60

80

100

120

Q2 2010 Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011

Number of p

atchesa

applied

Time

(hours)

ShutdownStartup time Patching time Total time Patches

Figure 7 GSI Downtimes by Quarter

Table 5 below provides the underlying data and some additional detail including the number of planned outages and actual patching events It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative Therefore a breakdown of hours into pre and post is not available for all quarters

Area Q1 2010 Q2 2010 Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011 Patching (hrs) 775 325 20 18 45 4 4 Pre patching and shutdown steps (hrs)

4 3 3 2

Post patching and startup steps (hrs)

125 65 75 7

Combined pre and post times (hrs)

27 23 10 165 95 105 9

Total downtime (hrs) 1725 1045 555 30 345 14 145 13

Table 5 Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oraclersquos downtime reduction initiative Reductions in factors such as hot patching and the pre- and post-patching steps can be quantified to the minute with no dependencies to cloud the issue As shown in Figure 7 the increase in hot patching contributed to a 66 decrease in patching downtime and the improvements in pre and post patching steps to a 19 reduction Improvements in other factors

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

contributed to another 15 decrease in downtime but are much more interdependent and their benefits harder to accurately allocate For example both script performance tuning and an upgrade of Oraclersquos Global Single Instance to faster hardware reduced the downtime associated with patching script execution However since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative the team could not accurately separate the effects of the hardware upgrade from those of script tuning A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be specifically be attributed to script tuning vs upgraded hardware Since Oracle ITrsquos primary mission is to support the business sometimes multiple improvements must be made simultaneously despite the inevitable confounds this produces

The exact impacts of process improvements such as increasing patch frequency are similarly difficult to break out Doing so would require putting exact numbers to downtime caused specifically by patches that were rushed in order to make the cutoff and by the reduction in accountability caused by very large bundles not a straightforward calculation

66

19

15

Hot Patching

Pre and Post patching steps

Patch tuning hardware upgrades and other factors

Figure 8 Contribution of Factor Improvements on Downtime Reduction Initiative

Despite the difficulties of exactly allocating a portion of the downtime reduction to each factor it is clear that all of the factors cited in Table 1 contributed substantially to Oraclersquos downtime and improvements in each factor contributed to the overall 85 reduction Table 5 below revisits the factors that contributed to Oraclersquos previously high downtime and recaps the actions taken to improve them

Factor Description

Cold Patching Percentage of patches applied hot increased from less than 1 in 2009 to over 40 in 2011

Pre and post patching Steps

Systems are shut down and started back up in parallel As a result pre and post patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011

Script performance Patching scripts are now tuned to run in under 10 minutes Database server hardware upgrades also helped speed up script execution times

Patch Frequency Smaller patch sets are applied weekly as opposed to large quarterly bundles This has improved patch quality and brought down instances of follow on patchings required to correct bad patches

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Custom Patches Custom patching EBS and non-EBS has been automated to reduce resource requirements and inefficiencies

Patch Management PASS provides improved patch management from initial request to approval to patching ARU and PASS have allowed efficient merging of patches and more accountability in the patch approval process

Table 6 Actions and Results of Oracle Downtime Reduction Initiative by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper help is also on the way from Product Development that will extend the concept of hot patching substantially EBS 122 to be released in the near future is expected to reduce downtime further by performing most patching activities while the system remains online and available to users So for example a user will be able to continue entering an expense report while the Payables module is being patched

Online patching will be achieved through an Editioning feature that creates a patchable copy of the production system applies patches to that copy and then switches users to it Patches will be applied to a secondary file system and a separate copy of all database code objects affected by the patches will be maintained Once patches have been successfully applied users will be moved over to the patched editions of the file system and the database Patching downtime will result solely from restarting the middle tier services and is expected to be measured in minutes rather than hours Oracle IT will report on its results from these new capabilities once they are adopted

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtimes by 85 - from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011 In addition these improvements enabled Oracle to perform an upgrade of its Global Single Instance of E-Business Suite from 1211 to 1213 with only 9 hours of downtime

It is our recommendation that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results In addition Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 122

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Reducing Maintenance Downtime by 85

Oraclersquos Internal Patch Automation and Process

Improvements in a Heterogeneous Enterprise

Application Deployment Including E-Business

Suite

May 2012

Authors Kishan Agrawal Operation Excellence

Manager Vinay Dwivedi Principal Product

Manager Jeffrey Pease Vice President Dave

Stephens Group Vice President

Oracle Corporation

World Headquarters

500 Oracle Parkway

Redwood Shores CA 94065

USA

Worldwide Inquiries

Phone +16505067000

Fax +16505067200

oraclecom

Copyright copy 2012 Oracle andor its affiliates All rights reserved This document is provided for information purposes only and the

contents hereof are subject to change without notice This document is not warranted to be error-free nor subject to any other

warranties or conditions whether expressed orally or implied in law including implied warranties and conditions of merchantability or

fitness for a particular purpose We specifically disclaim any liability with respect to this document and no contractual obligations are

formed either directly or indirectly by this document This document may not be reproduced or transmitted in any form or by any

means electronic or mechanical for any purpose without our prior written permission

Oracle and Java are registered trademarks of Oracle andor its affiliates Other names may be trademarks of their respective owners

Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation All SPARC trademarks are used under license and

are trademarks or registered trademarks of SPARC International Inc AMD Opteron the AMD logo and the AMD Opteron logo are

trademarks or registered trademarks of Advanced Micro Devices UNIX is a registered trademark licensed through XOpen

Company Ltd 0611

Page 10: Reducing Maintenance Downtime by 85%

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Figure 5 Screen shot of Oraclersquos Patch Approval Submission System (PASS)

ARU and PASS are custom tools that Oracle IT continues to use and extend because of their long experience with them PASS has been customized to work well with the internal patch generation and source

control tools that are used by the Oraclersquos Applications Product Development group and its extension the Applications IT group However Oracle customers can use many of the same capabilities in the form of the Application Change Management Pack (ACMP) and the Application Management Pack (AMP) both included in Enterprise Manager ACMP is an end to end change management solution that works with a variety of source control applications and allows for the automated deployment of standard and custom patches across different environments AMP has patch management functionalities similar to PASS It is a system management solution for centralized management of multiple environments and allows for the proactive monitoring of requests and workflows for reporting on history and trends and for automation of repetitive Database Administration (DBA) tasks

Figure 6 Screen Shot of Patch Manager in Application Change Management Pack (ACMP)

Reducing Maintenance Downtime by 85 Oraclersquos Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Results

Oracle ITrsquos downtime reduction initiative reduced maintenance downtime by 85 from 1045 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters from Q2 2011 through Q4 2011 This reduction occurred although the number of patches remained essentially the same for most quarters and even increased in Q2 2011 Figure 6 below shows the trend of downtime reduction along with the number of patches applied each quarter

0

200

400

600

800

1000

1200

1400

0

20

40

60

80

100

120

Q2 2010 Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011

Number of p

atchesa

applied

Time

(hours)

ShutdownStartup time Patching time Total time Patches

Figure 7 GSI Downtimes by Quarter

Table 5 below provides the underlying data and some additional detail including the number of planned outages and actual patching events It should be noted that tracking the detail of time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative Therefore a breakdown of hours into pre and post is not available for all quarters

Area Q1 2010 Q2 2010 Q3 2010 Q4 2010 Q1 2011 Q2 2011 Q3 2011 Q4 2011 Patching (hrs) 775 325 20 18 45 4 4 Pre patching and shutdown steps (hrs)

4 3 3 2

Post patching and startup steps (hrs)

125 65 75 7

Combined pre and post times (hrs)

27 23 10 165 95 105 9

Total downtime (hrs) 1725 1045 555 30 345 14 145 13

Table 5 Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)

It is not simple to allocate percentages or hours of downtime reduction to all the factors addressed in Oraclersquos downtime reduction initiative Reductions in factors such as hot patching and the pre- and post-patching steps can be quantified to the minute with no dependencies to cloud the issue As shown in Figure 7 the increase in hot patching contributed to a 66 decrease in patching downtime and the improvements in pre and post patching steps to a 19 reduction Improvements in other factors


Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

Results

Oracle IT's downtime reduction initiative reduced maintenance downtime by 85%, from 104.5 hours in Q2 2010 to less than 15 hours in each of three consecutive quarters, Q2 2011 through Q4 2011. This reduction occurred even though the number of patches remained essentially the same for most quarters and actually increased in Q2 2011. Figure 7 below shows the trend of downtime reduction along with the number of patches applied each quarter.

Figure 7: GSI Downtimes by Quarter (chart of shutdown/startup time, patching time, and total time in hours, plotted against the number of patches applied, for Q2 2010 through Q4 2011)

Table 5 below provides the underlying data and some additional detail, including the number of planned outages and actual patching events. It should be noted that detailed tracking of the time consumed in pre- and post-patching steps was initiated as part of the downtime reduction initiative; therefore, a breakdown of hours into pre- and post-patching steps is not available for all quarters.

Area                                    Q1 2010  Q2 2010  Q3 2010  Q4 2010  Q1 2011  Q2 2011  Q3 2011  Q4 2011
Patching (hrs)                            n/a     77.5     32.5     20       18        4.5      4        4
Pre-patching and shutdown steps (hrs)     n/a      n/a      n/a     n/a       4        3        3        2
Post-patching and startup steps (hrs)     n/a      n/a      n/a     n/a      12.5      6.5      7.5      7
Combined pre and post times (hrs)         n/a     27       23       10       16.5      9.5     10.5      9
Total downtime (hrs)                     172.5   104.5     55.5     30       34.5     14       14.5     13

Table 5: Breakdown of GSI Downtime by Quarter (detail data not available for some quarters)
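As a sanity check on Table 5, a short script can verify that patching time plus the combined pre- and post-patching steps sum to the reported totals, and can recompute the headline reduction. The figures below are copied from the table; the quarter alignment of the partial rows is inferred from those sums, so this is a cross-check sketch rather than official source data.

```python
# Quarterly GSI downtime data from Table 5 (hours); None = detail not tracked.
quarters = ["Q1 2010", "Q2 2010", "Q3 2010", "Q4 2010",
            "Q1 2011", "Q2 2011", "Q3 2011", "Q4 2011"]
patching = [None, 77.5, 32.5, 20, 18, 4.5, 4, 4]
pre_post = [None, 27, 23, 10, 16.5, 9.5, 10.5, 9]
total    = [172.5, 104.5, 55.5, 30, 34.5, 14, 14.5, 13]

# Where detail exists, patching + pre/post steps should equal the total.
for q, p, pp, t in zip(quarters, patching, pre_post, total):
    if p is not None and pp is not None:
        assert abs((p + pp) - t) < 1e-9, q

# Headline reduction: Q2 2010 baseline (104.5 h) vs. Q4 2011 (13 h).
reduction = 1 - total[-1] / total[1]
print(f"Downtime reduction: {reduction:.0%}")
# Prints "Downtime reduction: 88%"; the paper reports it conservatively as 85%.
```

The rows reconcile exactly for every quarter with detail data, which supports the table's internal consistency.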

It is not simple to allocate percentages or hours of downtime reduction to each of the factors addressed in Oracle's downtime reduction initiative. Reductions in some factors, such as hot patching and the pre- and post-patching steps, can be quantified to the minute, with no dependencies to cloud the issue. As shown in Figure 8, the increase in hot patching contributed a 66% decrease in patching downtime, and the improvements in pre- and post-patching steps a 19% reduction. Improvements in other factors contributed another 15% decrease in downtime, but these are much more interdependent, and their benefits are harder to allocate accurately. For example, both script performance tuning and an upgrade of Oracle's Global Single Instance to faster hardware reduced the downtime associated with patching script execution. However, since the hardware upgrade was initiated to improve overall GSI performance and was scheduled independently of the downtime reduction initiative, the team could not accurately separate the effects of the hardware upgrade from those of script tuning. A pure research organization would have made one of these changes at a time and quantified exactly how many hours of downtime reduction could be attributed specifically to script tuning versus upgraded hardware. Since Oracle IT's primary mission is to support the business, multiple improvements must sometimes be made simultaneously, despite the inevitable confounds this produces.

The exact impacts of process improvements, such as increasing patch frequency, are similarly difficult to break out. Doing so would require putting exact numbers to the downtime caused specifically by patches that were rushed to make the cutoff, and to the reduction in accountability caused by very large bundles: not a straightforward calculation.

Figure 8: Contribution of Factor Improvements to the Downtime Reduction Initiative (hot patching: 66%; pre- and post-patching steps: 19%; patch tuning, hardware upgrades, and other factors: 15%)

Despite the difficulty of allocating an exact portion of the downtime reduction to each factor, it is clear that all of the factors cited in Table 1 contributed substantially to Oracle's downtime, and that improvements in each factor contributed to the overall 85% reduction. Table 6 below revisits the factors that contributed to Oracle's previously high downtime and recaps the actions taken to improve them.

Cold patching: The percentage of patches applied hot increased from less than 1% in 2009 to over 40% in 2011.

Pre- and post-patching steps: Systems are now shut down and started back up in parallel. As a result, pre- and post-patching times went down from over 20 hours per quarter in 2010 to less than 10 hours per quarter in 2011.

Script performance: Patching scripts are now tuned to run in under 10 minutes. Database server hardware upgrades also helped speed up script execution times.

Patch frequency: Smaller patch sets are applied weekly, as opposed to large quarterly bundles. This has improved patch quality and reduced the amount of follow-on patching required to correct bad patches.

Custom patches: Custom patching (EBS and non-EBS) has been automated to reduce resource requirements and inefficiencies.

Patch management: PASS provides improved patch management, from initial request to approval to patching. ARU and PASS have enabled efficient merging of patches and brought more accountability to the patch approval process.

Table 6: Actions and Results of Oracle's Downtime Reduction Initiative, by Downtime Component Cause

Future Product Enhancements to Further Reduce Downtime

In addition to the practices and tools described in this paper, help is on the way from Product Development that will extend the concept of hot patching substantially. EBS 12.2, to be released in the near future, is expected to reduce downtime further by performing most patching activities while the system remains online and available to users. For example, a user will be able to continue entering an expense report while the Payables module is being patched.

Online patching will be achieved through an editioning feature that creates a patchable copy of the production system, applies patches to that copy, and then switches users to it. Patches will be applied to a secondary file system, and a separate copy of all database code objects affected by the patches will be maintained. Once patches have been applied successfully, users will be moved over to the patched editions of the file system and the database. Patching downtime will result solely from restarting the middle-tier services and is expected to be measured in minutes rather than hours. Oracle IT will report on its results with these new capabilities once they are adopted.
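The edition-based flow described above can be sketched conceptually: patch a copy while users stay on the running edition, then cut over. The class and method names below are illustrative only, not the actual EBS 12.2 tooling.

```python
from copy import deepcopy
from dataclasses import dataclass, field

@dataclass
class Edition:
    """One edition of the application: file system plus database code objects."""
    files: dict = field(default_factory=dict)
    db_objects: dict = field(default_factory=dict)

class OnlinePatcher:
    """Conceptual model of online patching: copy, patch the copy, cut over."""
    def __init__(self, production: Edition):
        self.run = production   # edition users are currently on
        self.patch = None       # edition being patched

    def prepare(self):
        # Create a patchable copy of the running edition (secondary file
        # system plus copies of affected database code objects).
        self.patch = deepcopy(self.run)

    def apply(self, changes: dict):
        # Patches land only on the copy; users keep working on self.run.
        self.patch.files.update(changes.get("files", {}))
        self.patch.db_objects.update(changes.get("db_objects", {}))

    def cutover(self):
        # Brief outage: restart middle-tier services on the patched edition.
        self.run, self.patch = self.patch, None

prod = Edition(files={"payables.pld": "v1"}, db_objects={"AP_PKG": "v1"})
patcher = OnlinePatcher(prod)
patcher.prepare()
patcher.apply({"files": {"payables.pld": "v2"},
               "db_objects": {"AP_PKG": "v2"}})
assert prod.files["payables.pld"] == "v1"       # users still see v1
patcher.cutover()
assert patcher.run.files["payables.pld"] == "v2"  # users now on patched edition
```

The key property the sketch illustrates is that the long-running work (applying patches) happens off the critical path, so the only downtime is the cutover itself.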

Conclusion

The improvements made by Oracle to its patching processes reduced quarterly system maintenance downtime by 85%, from over 100 hours during the first quarter of 2010 to less than 15 hours in the last quarter of 2011. In addition, these improvements enabled Oracle to upgrade its Global Single Instance of E-Business Suite from 12.1.1 to 12.1.3 with only 9 hours of downtime.

We recommend that Oracle customers with sizable deployments and a need to reduce scheduled downtime consider adopting the process changes and solution patterns that enabled Oracle IT to achieve these results. In addition, Oracle IT recommends that customers begin evaluating the downtime reduction capabilities planned for E-Business Suite release 12.2.

Reducing Maintenance Downtime by 85%: Oracle's Internal Patch Automation and Process Improvements in a Heterogeneous Enterprise Application Deployment Including E-Business Suite

May 2012

Authors: Kishan Agrawal, Operation Excellence Manager; Vinay Dwivedi, Principal Product Manager; Jeffrey Pease, Vice President; Dave Stephens, Group Vice President

Oracle Corporation
World Headquarters
500 Oracle Parkway
Redwood Shores, CA 94065
USA

Worldwide Inquiries:
Phone: +1.650.506.7000
Fax: +1.650.506.7200
oracle.com

Copyright © 2012, Oracle and/or its affiliates. All rights reserved. This document is provided for information purposes only and the contents hereof are subject to change without notice. This document is not warranted to be error-free, nor subject to any other warranties or conditions, whether expressed orally or implied in law, including implied warranties and conditions of merchantability or fitness for a particular purpose. We specifically disclaim any liability with respect to this document, and no contractual obligations are formed either directly or indirectly by this document. This document may not be reproduced or transmitted in any form or by any means, electronic or mechanical, for any purpose, without our prior written permission.

Oracle and Java are registered trademarks of Oracle and/or its affiliates. Other names may be trademarks of their respective owners. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation. All SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. AMD, Opteron, the AMD logo, and the AMD Opteron logo are trademarks or registered trademarks of Advanced Micro Devices. UNIX is a registered trademark licensed through X/Open Company, Ltd. 0611
