Assess your SQL Server environment, establish effective backup and recovery, and maintain SQL Server management optimization

THE ESSENTIAL DBA PLAYBOOK FOR OPTIMIZED SQL SERVER MANAGEMENT

Table of Contents

Chapter One: Assessing Your SQL Server Environment

Chapter Two: Establishing Effective Backup and Recovery

Chapter Three: Ongoing SQL Server Management Optimization

Abstract

Whether you’re taking a position as a DBA at a new company or you’ve been with your current company for years, you may be facing the same challenge: not having the optimal hardware, software, workflow or culture you need to run a smooth environment. In fact, it’s often quite the opposite: the department may have poor security practices, nonexistent disaster recovery plans, poor documentation and substandard hardware or network configuration. Then there are the cultural or organizational challenges: poor management, under-resourced departments or poorly thought-out processes.

While it would be impractical to formulate a strategy for every possible situation, you can put in place a handful of strategies to overcome panic and methodically focus on issues within your control. “The Essential DBA Playbook for Optimized SQL Server Management” explains how, in three chapters:

• Chapter 1: Assessing Your SQL Server Environment — The first step is to assess the SQL Server environment. An initial database infrastructure status check is necessary for sustained, optimized database management.

• Chapter 2: Establishing Effective Backup and Recovery — The next step is to establish effective backup and recovery, keeping in mind high availability and disaster recovery best practices.

• Chapter 3: Ongoing SQL Server Management Optimization — Finally, you need to maintain your optimization efforts by communicating effectively, stabilizing the environment and taking advantage of automation.

This e-book covers all three chapters.

Chapter One: Assessing Your SQL Server Environment

Introduction

Too many SQL Server DBAs are stuck in firefighting mode, forced to focus their attention on the latest problem that pops up, regardless of its true severity. Without a triage framework for sorting database issues, you’ll be stuck in this state of perpetual near-panic — and you’ll be more likely to compound or misevaluate problems.

To avoid this problem, whenever you take on any new database management assignment, begin by assessing the overall condition of the SQL Server environment. It’s time for some rapid battlefield analysis. Take a deep breath and start organizing your efforts along the following steps:

1. Take inventory.

2. Catalog issues by severity.

3. Identify required resources.

4. Assess security measures for protecting information and the environment.

Step 1. Take inventory

How many instances and databases in the enterprise are you managing? An up-to-date inventory of the hardware and systems in your purview is critical. This helps define your field of battle.

Things you need to know:

• The total inventory of the databases on your network and the amount of data they contain

• Database history, including installation and creation dates

• Which SQL Server versions and service packs are deployed

• Server, instance and operating system names

• Who your end users are

• Which databases are being backed up and how frequently — and which databases aren’t being backed up at all

• The retention periods for backups

This information becomes the battlefield map on which you will locate and evaluate your trouble spots. Armed with your wider view of the battlefield, you can start to categorize and prioritize problem areas.
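One way to keep this inventory usable is to hold it as structured records rather than loose notes. The sketch below is illustrative only: the `DatabaseRecord` fields, the `backup_gaps` helper and the sample servers are all hypothetical, and in practice the values would come from your own discovery process.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class DatabaseRecord:
    # Hypothetical inventory fields mirroring the checklist above.
    server: str
    instance: str
    database: str
    sql_version: str
    size_gb: float
    created: datetime
    last_backup: Optional[datetime]  # None means no backup was ever recorded
    retention_days: Optional[int]


def backup_gaps(inventory, now, max_age):
    """Return records with no backup, or whose last backup is older than max_age."""
    return [
        rec for rec in inventory
        if rec.last_backup is None or now - rec.last_backup > max_age
    ]


# Made-up sample data for illustration.
inventory = [
    DatabaseRecord("SRV01", "MSSQLSERVER", "Sales", "SQL Server 2014 SP1", 120.0,
                   datetime(2015, 3, 1), datetime(2016, 6, 1, 2, 0), 30),
    DatabaseRecord("SRV01", "MSSQLSERVER", "Staging", "SQL Server 2014 SP1", 15.5,
                   datetime(2015, 9, 12), None, None),
    DatabaseRecord("SRV02", "REPORTING", "Warehouse", "SQL Server 2012 SP3", 800.0,
                   datetime(2013, 1, 20), datetime(2016, 5, 20, 2, 0), 14),
]

now = datetime(2016, 6, 1, 12, 0)
gaps = backup_gaps(inventory, now, max_age=timedelta(days=1))
gap_names = [rec.database for rec in gaps]
total_size_gb = sum(rec.size_gb for rec in inventory)

print("Databases with a backup gap:", gap_names)
print("Total data under management (GB):", total_size_gb)
```

The one-day threshold is an assumption chosen for the example; the right value depends on what each database owner can tolerate.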

Step 2. Catalog issues by severity

Now it’s time to determine the severity of issues and assign priorities. Analyze your environment with the big three resources in mind: disk I/O, memory and CPU. The capacity and performance of these resources will delineate what you can and cannot do in terms of optimizing SQL Server performance.

Identify and prioritize the most critical handful of issues in terms of environment performance, security or (in extreme cases) viability. It will help you if you think of database performance through the lens of the business rather than as a purely technological issue.

Once your systems are ranked in order of importance, rank them again in order of the severity of issues. For help, use SQL Server’s error management system, which assigns severity levels from zero to 24, from mere informational messages to existential threats.
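The two-pass ranking described above can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the band names follow the broad grouping in Microsoft's error severity documentation as best we understand it (informational, user-correctable, software or resource errors, fatal), and the `business_rank` field and sample issues are hypothetical.

```python
def severity_band(level):
    """Map a SQL Server error severity level (0-24) to a coarse triage band."""
    if not 0 <= level <= 24:
        raise ValueError(f"severity level out of range: {level}")
    if level <= 10:
        return "informational"
    if level <= 16:
        return "user-correctable error"
    if level <= 19:
        return "software or resource error"
    return "fatal system error"


# Made-up issues; business_rank 1 is the most important system.
issues = [
    {"system": "Reporting", "business_rank": 2, "severity": 17},
    {"system": "Orders", "business_rank": 1, "severity": 14},
    {"system": "Orders", "business_rank": 1, "severity": 21},
    {"system": "Archive", "business_rank": 3, "severity": 10},
]

# Sort by business importance first, then by descending severity within a system.
ranked = sorted(issues, key=lambda i: (i["business_rank"], -i["severity"]))

for issue in ranked:
    print(issue["system"], issue["severity"], severity_band(issue["severity"]))
```

Sorting on business rank before severity keeps the list aligned with the advice to view performance through the lens of the business.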

Step 3. Identify required resources

Once you’ve captured tactical data and prioritized which issues represent the biggest and most immediate threats to business operations, you need to identify which resources are required to stabilize the SQL Server environment in the short term. Possible resources include technical expertise, software tools, server capacity and backup storage. Ask questions like the following:

• If backups or database performance are an issue, what steps can be taken and roughly when can a long-term fix be identified?

• Is the environment sufficiently secure to protect information?

• If additional resources are not available, what can you do with existing resources?

• Does the organization understand the risk?

The easiest way to proceed is to identify issues you can fix with the hardware you have on hand, and then categorize the resources you need to add, such as a solid-state drive, another disk cabinet for a SAN or extra CPUs.

Once you’ve identified resource demands, communicate those needs to your superior and appropriate colleagues. We discuss communication strategy in more detail in chapter 3, but at a minimum, it’s important to convey a few key points:

• Issues that you have identified, prioritized and justified with actionable information — your interpretation of relevant data, tailored to your internal audience

• The impact to the business if these issues are not addressed

• Which issues you can address with resources at hand, and which will require additional resources

Your triage report should show that you have methodically assessed the situation and its importance. Most SQL Server databases have problems, but not all DBAs have a plan of action. Address issues and take a proactive approach to long-term improvement by alerting managers to both the problems and potential solutions at once, constructively.

Step 4. Assess security measures

While performance and availability will naturally be front-of-mind during the first stages of triage, you’ll also need to assess security liabilities, even though some may represent middle- or long-term threats. Be sure to keep the following best practices in mind:

REVIEW USER ACCESS, PERMISSIONS AND PASSWORDS.

When creating database objects, you must grant permissions to make those objects accessible to users. Every securable object has permissions that can be granted to, or revoked from, a principal using permission statements. Granting permissions to roles, rather than to users, simplifies security administration. The permission set designated by a given role is inherited by all members of that role. It is far easier to add or remove users from a role than it is to create separate permission sets for each individual user. Roles can be nested; however, too many levels of nesting can degrade performance.
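To see why role-based grants simplify administration, it can help to model inheritance explicitly. The sketch below is a simplified model, not how SQL Server evaluates permissions internally (it ignores DENY, ownership chaining and server-level roles), and every principal, role and object name in it is hypothetical.

```python
def effective_permissions(principal, member_of, granted):
    """Collect permissions granted directly and through roles, including nested roles."""
    seen = set()
    stack = [principal]
    perms = set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guards against circular role membership
        seen.add(current)
        perms |= granted.get(current, set())
        stack.extend(member_of.get(current, ()))
    return perms


# Who belongs to what; sales_writers is nested inside sales_readers.
member_of = {
    "alice": ["sales_readers"],
    "bob": ["sales_writers"],
    "sales_writers": ["sales_readers"],
}

# Permissions are granted to roles only, never to individual users.
granted = {
    "sales_readers": {("SELECT", "dbo.Orders")},
    "sales_writers": {("INSERT", "dbo.Orders"), ("UPDATE", "dbo.Orders")},
}

alice_perms = effective_permissions("alice", member_of, granted)
bob_perms = effective_permissions("bob", member_of, granted)

print(sorted(alice_perms))
print(sorted(bob_perms))
```

Moving a user between teams then means changing one membership entry rather than rewriting that user's permission set.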

FIND AND DOCUMENT COMPLIANCE LIABILITIES.

Log management for security compliance in SQL Server can be complicated. Suggestions come from all directions, including vendors, auditors and lawyers. As a DBA, it’s your job to determine what’s reasonable based on risk. Depending on your organization, industry and country, your company may have stringent compliance requirements: for example, the Health Insurance Portability and Accountability Act (HIPAA) for patient health data, or the Payment Card Industry Data Security Standard (PCI DSS) for credit card and transactional data. You can use SQL Server’s auditing functionality to find and document areas of concern, thereby lessening your company’s potential regulatory exposure. Regardless of the governing body, a big part of compliance initiatives is to monitor activity related to sensitive information and to keep good records.

INSTITUTE A CODE FREEZE DURING TRIAGE STEPS.

Until you’ve had time to find and remediate major performance and security liabilities, introducing new code into an environment is a no-go.

Conclusion

The first step in optimizing database management is to carefully assess the SQL Server environment. By taking the four steps detailed here — taking inventory, cataloging issues, identifying resources and assessing security — you will have the initial database infrastructure status information you need to move on to chapter 2 of the playbook, which explains best practices for effective backup and recovery.

Chapter Two: Establishing Effective Backup and Recovery

Introduction

In chapter 1 of the playbook, you learned how to assess your SQL Server environment. Now it’s time to ensure your organization has the backup and recovery it needs to minimize both data loss and downtime.

Many DBAs make the mistake of thinking that because they have high availability (HA), a backup and recovery strategy is unnecessary. However, the possibility of data loss is a risk even in HA scenarios. Moreover, end-user downtime will surely attract unwanted attention. Therefore, it’s critical to have a disaster recovery (DR) strategy even if you have high availability.

This chapter explains four key best practices for effective backup and recovery:

• Understand the differences between high availability and disaster recovery.

• Understand your options for high availability in SQL Server.

• Establish both a backup strategy and a recovery strategy.

• Build a disaster recovery strategy for the long term.

Understand the differences between high availability and disaster recovery.

As noted earlier, high availability and disaster recovery are not synonymous:

• High availability is the measurement of a system’s ability to remain accessible in the event of a system component failure. Generally, HA is implemented by building in multiple levels of fault tolerance into a system.

• Disaster recovery is the process by which a system is restored to a previous acceptable state, after a natural or man-made disaster. With DR, there can be a significant loss of service while the DR plan is executed and the environment is restored.

In short, HA is about maintaining service, and DR is about retaining data.

It is imperative that both HA and DR strategies be driven by business requirements. Therefore, they should address non-functional requirements such as:

• Performance

• System availability

• Fault tolerance

• Data retention

• Business continuity

• User experience

For HA, determine any service-level agreements (SLAs) expected of your system. For DR, use measurable characteristics, such as a recovery time objective (RTO) and a recovery point objective (RPO).
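An availability SLA is easier to discuss once it is translated into a downtime budget. The following sketch shows the arithmetic; the function name is ours, and the 365-day period is an assumption (leap years and planned maintenance windows are ignored).

```python
def allowed_downtime_minutes(availability_percent, period_days=365):
    """Downtime budget implied by an availability target over a period."""
    if not 0 < availability_percent <= 100:
        raise ValueError("availability must be in the range (0, 100]")
    total_minutes = period_days * 24 * 60
    return total_minutes * (100 - availability_percent) / 100


for target in (99.0, 99.9, 99.99):
    minutes = allowed_downtime_minutes(target)
    print(f"{target}% availability allows about {minutes:.1f} minutes of downtime per year")
```

Each extra "nine" cuts the budget by a factor of ten, which is a useful framing when the business asks for higher availability without additional resources.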

Understand your options for high availability in SQL Server.

High availability tools and functionality built into SQL Server are designed to ensure uptime. Here are a few options to consider:

• AlwaysOn Failover Cluster Instances — AlwaysOn Failover Cluster Instances provides server-instance level redundancy. It uses Windows Server Failover Clustering (WSFC) to provide local high availability through redundancy. A failover cluster instance (FCI) is a single instance of SQL Server that is installed across WSFC nodes and, possibly, across multiple subnets. On the network, an FCI appears as an instance of SQL Server running on a single computer, but the FCI provides failover from one WSFC node to another if the current node becomes unavailable.

• AlwaysOn Availability Groups — This enterprise-level high-availability solution was introduced in SQL Server 2012 to enable you to maximize availability for one or more user databases. It requires that the SQL Server instances reside on WSFC nodes, but it does not need the extra hardware typically associated with clustering, such as shared storage.

• Database mirroring — Database mirroring maintains a single standby database, or mirror database, of the production (principal) database. This approach increases the availability of a SQL Server database by supporting almost instantaneous failover, and also improves data protection.

• Log shipping — Like AlwaysOn Availability Groups and database mirroring, log shipping functions at the database level. It is an automated backup and restore process that allows you to create an additional copy of your database for failover. You can use log shipping to maintain one or more warm standby databases (secondary databases) for a single production database (the primary database).

Establish both a backup strategy and a recovery strategy.

Not surprisingly, backup and recovery go hand in hand. There is no way to implement a recovery strategy without first assessing backup processes. Backups are instrumental to protecting critical data and giving you the ability to execute point-in-time recovery for the data in your environment. If data is maliciously or inadvertently deleted from a table, no number of failovers on a cluster will bring that data back once the change has been committed. As the old IT adage states: “A DBA is only as good as their last backup.”

In chapter 1 of the playbook, you took inventory of your environment, including which databases are being backed up, how frequently those backups are taken and the retention periods for backups — as well as which SQL Servers aren’t being backed up at all.

If you do identify servers that are not currently backed up, you may need some backup of your own:

• Engage your managers and explicitly declare a backup gap as a major problem.

• Investigate whether there may be a valid reason for databases not being backed up and suggest a minimum level of protection (according to your tolerance).

• Follow up with the owners of the databases to determine their RTOs and RPOs.

With this information and support, implement the agreed-upon SLAs. Regularly test a range of scenarios to ensure that your data can be restored within the expected time periods and SLAs.

Build a disaster recovery strategy for the long term.

USE TRANSACTION LOG BACKUPS.

Without transaction log backups, it is impossible to restore your data to a specific point in time. Although managing transaction logs is costly — both in terms of the operational costs of your time and the capital expenditure on additional on-site and off-site storage — it’s likely your business cannot afford data loss.

As a guardian of the organization’s data, work to specify RPOs for your business. Many organizations schedule transaction log backups every 15–30 minutes; this baseline is a good starting point for your own conversations, during which you should stress that the RPO is, in effect, the amount of data loss the business agrees to accept.
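The link between log backup frequency and RPO can be made concrete with a small calculation. The sketch below is a simplification under stated assumptions: it treats each backup as instantaneous and ignores the possibility of capturing a tail-log backup after a failure. The function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def data_loss_window(log_backup_times, failure_time):
    """Time between the last log backup taken before the failure and the failure itself."""
    earlier = [t for t in log_backup_times if t <= failure_time]
    if not earlier:
        raise ValueError("no log backup precedes the failure")
    return failure_time - max(earlier)


def meets_rpo(backup_interval, rpo):
    """Worst case, a failure strikes just before the next backup, so loss approaches one interval."""
    return backup_interval <= rpo


# Made-up schedule: log backups every 15 minutes.
backups = [
    datetime(2016, 6, 1, 10, 0),
    datetime(2016, 6, 1, 10, 15),
    datetime(2016, 6, 1, 10, 30),
]
failure = datetime(2016, 6, 1, 10, 41)

lost = data_loss_window(backups, failure)
print("Work at risk:", lost)
print("15-minute schedule meets a 10-minute RPO:",
      meets_rpo(timedelta(minutes=15), timedelta(minutes=10)))
print("15-minute schedule meets a 30-minute RPO:",
      meets_rpo(timedelta(minutes=15), timedelta(minutes=30)))
```

Framing the schedule this way makes the trade-off explicit: a shorter RPO means more frequent log backups and more storage to manage.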

CREATE A RELIABLE RESTORE PROCESS.

Restore processes need to be seamless and automated. The best way to ensure effective restoration is to test and document processes continually until everybody in the department who may be called upon to do so can perform the required tasks. Great care should be taken to update the restore process on a frequent basis so that your RTOs can either be adhered to or altered if required.
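Restore tests are most useful when their timings are recorded and compared against the agreed RTO. Here is a minimal sketch of that comparison; the `rto_report` helper and the drill durations are hypothetical, and judging by the slowest drill is our own conservative assumption.

```python
from datetime import timedelta


def rto_report(drill_durations, rto):
    """Summarize restore drills against the RTO, using the slowest drill as the yardstick."""
    if not drill_durations:
        raise ValueError("at least one restore drill is required")
    worst = max(drill_durations)
    return {
        "worst": worst,
        "within_rto": worst <= rto,
        "headroom": rto - worst,  # negative when the RTO was missed
    }


# Made-up timings from three restore drills.
drills = [timedelta(minutes=42), timedelta(minutes=55), timedelta(minutes=48)]
report = rto_report(drills, rto=timedelta(hours=1))

print("Slowest restore:", report["worst"])
print("Within RTO:", report["within_rto"])
print("Headroom:", report["headroom"])
```

Shrinking headroom over successive drills is an early signal that the RTO needs to be renegotiated or the restore process improved.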

In many cases, RTOs can be significantly reduced by employing third-party solutions that reduce backup and restore times using advanced compression algorithms. Some tools also offer object-level restores, which let you avoid restoring your entire database if a single table was destroyed.

Conclusion

Establishing effective backup and recovery is critical in any environment. By carefully assessing your environment as detailed in chapter 1 of the playbook and then following the best practices for backup and recovery you just learned, you will have a SQL Server environment that is more reliable and easier to manage. In the third and final chapter of the playbook, we’ll cover how you can continue to optimize SQL Server management by establishing effective communication, stabilizing the environment and taking advantage of automation.

Chapter Three: Ongoing SQL Server Management Optimization

Introduction

Once you have assessed your SQL Server environment and established effective backup and recovery, you need to consider how to keep SQL Server management optimized over time. This chapter explains several best practices that will help you do so:

• Communicate proactively across the organization.

• Graduate from reactive mode into proactive mode.

• Take advantage of automation.

Communicate proactively across the organization.

Proactive communication is a critical skill for any successful DBA. As you work to establish and maintain an optimal SQL Server environment, you’ll need to have discussions with appropriate parties within your department, including both your manager and peers, as well as with key extra-departmental players whose needs have a large bearing on your work. Best practices for proactive communication include:

• Establish good relationships with developers and management — Capture and discuss key performance issues and events for your team.

• Gain credibility — Demonstrate awareness of problems and share what you intend to do about them.

• Establish transparency and accountability — Become a conduit of relevant information for management and colleagues, keeping them abreast of the health, status and ongoing priorities in managing the environment.

• Report on both problems and solutions — Keep a standardized report in which you update stakeholders on the status and next steps of key issues to show that you’re bringing constructive solutions along with the bad news. This is a career-enhancing habit that you need to make a part of your daily routine. In addition to enhancing transparency and accountability, these reports also give you some protection if a major potential problem that you flagged, and that required attention or resources from others, goes unheeded.

Figure 1. The basic steps in a baselining and benchmarking methodology: define goal; identify improvement candidates; prioritize candidates; identify metrics to track; create or update baseline; review a candidate; change component (non-production); run workload; benchmark performance; then, if there is an improvement, deploy to production, and if not, reset to base state.

Graduate from reactive mode into proactive mode.

In chapters 1 and 2 of this playbook, you ensured that your environment’s most immediate threats have been addressed and that your data is recoverable within acceptable parameters. It’s now time to look ahead and introduce stability and visibility into your environment. This is your opportunity to switch from reactive to proactive mode by tuning your environment to prevent problems in advance.

INTRODUCE BASELINING AND BENCHMARKING METHODOLOGY.

A baseline and benchmark methodology helps you spot problems and engage in a process of continuous improvement. Figure 1 shows the basic steps.
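The inner loop of that methodology (change one component in non-production, run the workload, benchmark, then deploy or reset) can be sketched as follows. This is an illustration under assumptions, not the methodology's official form: the 5 percent `min_gain` threshold, the function names and the simulated environment are all hypothetical.

```python
def evaluate_candidate(baseline_ms, run_workload, apply_change, reset, min_gain=0.05):
    """Apply one change in non-production, benchmark it, and keep it or reset."""
    apply_change()
    measured_ms = run_workload()
    gain = (baseline_ms - measured_ms) / baseline_ms
    if gain >= min_gain:
        # Improvement: promote the change and adopt the new baseline.
        return {"deploy": True, "gain": gain, "new_baseline_ms": measured_ms}
    # No meaningful improvement: return to the base state before the next candidate.
    reset()
    return {"deploy": False, "gain": gain, "new_baseline_ms": baseline_ms}


# Simulated non-production environment with two candidate changes.
env = {"index_added": False, "hint_added": False}


def run_workload():
    duration = 100.0
    if env["index_added"]:
        duration -= 20.0
    if env["hint_added"]:
        duration -= 1.0
    return duration


accepted = evaluate_candidate(
    100.0, run_workload,
    apply_change=lambda: env.update(index_added=True),
    reset=lambda: env.update(index_added=False),
)
rejected = evaluate_candidate(
    accepted["new_baseline_ms"], run_workload,
    apply_change=lambda: env.update(hint_added=True),
    reset=lambda: env.update(hint_added=False),
)

print("Index change deployed:", accepted["deploy"])
print("Hint change deployed:", rejected["deploy"])
```

Testing one candidate at a time, and resetting after a rejected change, keeps each measurement attributable to a single change.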

Our white paper, “Ten Tips for Optimizing SQL Server Performance,” provides a methodical approach for:

• Determining goals

• Tracking and analyzing data

• Creating a cycle of improvement that keeps you ahead of major performance problems

The paper also includes many other helpful tips for tuning and optimizing your environment so you can help ensure that your initial triage efforts have set the stage for longer-term performance and stability.

FOLLOW BEST PRACTICES FOR DEPLOYING CODE.

You’ll want to further protect your gains by instituting best practices around introducing new code into the environment. Many DBAs face constant requests to push out code from various points in the organization, which results in ad hoc deployments that not only cause a great deal of stress but also introduce security and performance risks.

Part of the solution lies in developing a standard process for reviewing and scheduling code deployment requests. Working with developers and other colleagues to craft this process might take some doing — be sure to take advantage of the best practices for proactive communication outlined earlier.

Every organization has its unique demands and working culture, but any good code deployment protocol will include the following components:

• Formal review

• A defined testing step, in a testing environment

• A step for discussing and tweaking problematic code

• Guidelines about when a rollback will be instituted and how it will be performed
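The four components above can be treated as a gate that every deployment request must pass. The sketch below shows one possible shape for such a gate; the `DeploymentRequest` fields and the sample requests are hypothetical and would need to be adapted to your own process.

```python
from dataclasses import dataclass


@dataclass
class DeploymentRequest:
    # Hypothetical fields, one per protocol component.
    name: str
    reviewed: bool
    tested_in_test_env: bool
    open_code_issues: int
    rollback_plan: str


def blocking_reasons(req):
    """List every protocol component the request has not yet satisfied."""
    reasons = []
    if not req.reviewed:
        reasons.append("formal review missing")
    if not req.tested_in_test_env:
        reasons.append("not tested in a testing environment")
    if req.open_code_issues > 0:
        reasons.append("problematic code still under discussion")
    if not req.rollback_plan.strip():
        reasons.append("no rollback guidelines")
    return reasons


ready = DeploymentRequest("add-orders-index", True, True, 0,
                          "Drop the index if write latency regresses")
not_ready = DeploymentRequest("rewrite-billing-proc", True, False, 2, "")

print(ready.name, blocking_reasons(ready))
print(not_ready.name, blocking_reasons(not_ready))
```

Returning the full list of reasons, rather than a simple yes or no, gives the requester something specific to act on.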

Take advantage of automation.

Stabilizing your environment also means stabilizing your routine; as you transition from reactive to proactive strategies, automation will become a necessity. As a bonus, automation will also give you the time you need to engage with your organization, tackle problems such as ensuring that your system can scale with the business, and become a rock-star DBA.

Any task that is repeatable is a candidate for automation. As you identify ways to move from being a firefighter to the guardian of an environment that breezes along, start looking for tasks that can be run automatically and for tools that can help deliver that automation.

Automating your monitoring with a third-party tool is the best way to gain a consistent and guaranteed level of analysis across your entire SQL Server estate. No two SQL Server professionals would write the same code or look in the same place to solve a given problem, but consistency and reliability are the cornerstones for every DBA. You need a tool that you can rely on to perform the same every time, 24x7x365.

Conclusion

Inheriting an environment that you didn’t have the luxury of building can present you with some highly stressful challenges, but solving those challenges can be equally satisfying. By following the best practices in this playbook, you can optimize your SQL Server environment and get out of firefighting mode — and hopefully enjoy some of your evenings and weekends after you’ve taken the reins.

© 2016 Quest Software Inc. ALL RIGHTS RESERVED.
