Post on 27-Jan-2015
Copyright 2009 – Database Architechs www.dbarchitechs.com
HA cover
SQL Saturday 2009 – Portland, Oregon
Paul Bertucci
• Founder, Database Architechs – www.dbarchitechs.com
– Specializing in HA, Database Design, Data Architecture, Data Replication, and P&T for SQL Server, Sybase, DB2 and Oracle
– Over 28 years of experience in the database industry
• Co-Author of SQL Server 2000 Unleashed! (SAMS)
• Co-Author of SQL Server 2005 Unleashed! (SAMS)
• Co-Author of SQL Server 2008 Unleashed! (SAMS) – Summer 2009!
• Co-Author of ADO.NET in 24 hours (SAMS)
• Author MS SQL Server High Availability (SAMS)
• Author Sybase Performance & Tuning
• Author Sybase Physical DB Design
• Veritas SQL Server Performance Series
• Former Chief Data Architect, Symantec Corporation
• Current Chief Architect, Autodesk Corporation
pbertucci@dbarchitechs.com
Agenda
• What is High Availability?
• How do you assess your HA Requirements?
• What are the MS SQL Server related options for HA?
• How each option delivers HA…
• Performance and Tuning is critical too – SQL Shot!
• Q & A
Test
1. What is the quickest way to test if your SQL Server Clustering configuration is failing over properly?
2. What is the SQL Server feature in SQL Server 2005/2008 that replaces Log Shipping?
What is Availability?
Application Availability
• Uptime
• Planned Downtime
• Unplanned Downtime
• Recoverable Disaster
Failure causes:
- Human
- Hardware
- Software
The Cost of Unavailability
• Airline Reservation Systems - $67K to $112K per hour
• ATM Service Fees - $12K to $17K per hour
• Brokerage (Retail) - $5.6M to $7.3M per hour
What is your cost of downtime?
Across all layers of your systems
• Network
• Application
• Middleware
• Database
• Operating System
• HARDWARE: Network Components, Servers, Disk Systems, Memory
Availability across planned operation
[Chart: Availability (%) on a 90% to 100% scale, plotted against the Availability Goals for Feb 14-28, Mar 1 – Apr 15, and Apr 16 – 20]

          Starting Date  Date of Failure  Days   Hours  Minutes  MBU (minutes)  TU      Avail %
Period 1  2/14/2008      2/28/2008        15.00  24.00  60.00    21600.00       38.00   99.82407
Period 2  3/1/2008       4/15/2008        46.00  24.00  60.00    66240.00       68.00   99.89734
Period 3  4/16/2008      4/20/2008        5.00   24.00  60.00    7200.00        442.00  93.86111
Overall   2/14/2008      4/20/2008        66.00  24.00  60.00    95040.00       548.00  99.4234
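The Avail % column can be reproduced with a few lines of Python. This is a minimal sketch, assuming that the MBU column is the planned minutes in the period (days x hours x minutes) and that TU is the number of minutes the application was unavailable; the function name is made up for illustration.

```python
def availability_pct(days, unavailable_minutes, hours_per_day=24, minutes_per_hour=60):
    """Percent of the planned minutes during which the application was up."""
    planned_minutes = days * hours_per_day * minutes_per_hour
    return 100.0 * (planned_minutes - unavailable_minutes) / planned_minutes

# (days in period, minutes unavailable), copied from the table
periods = {
    "Period 1": (15, 38),
    "Period 2": (46, 68),
    "Period 3": (5, 442),
}

for name, (days, down) in periods.items():
    print(f"{name}: {availability_pct(days, down):.5f}%")

# The overall figure is computed from the totals, not by averaging the periods.
total_days = sum(days for days, _ in periods.values())
total_down = sum(down for _, down in periods.values())
print(f"Overall: {availability_pct(total_days, total_down):.4f}%")
```

Note that one bad five-day period (442 minutes down) pulls the overall number below 99.5% even though the two longer periods were both above 99.8%.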
Availability Continuum
Characteristic                                        Availability Range
Extreme Availability     Near zero downtime!           (99.5% - 100%)
High Availability        Minimal downtime              (95% - 99.4%)
Standard Availability    With some downtime tolerance  (83% - 94%)
Acceptable Availability  Non-critical Applications     (70% - 82%)
Marginal Availability    Non-production Applications   (up to 69%)
8,760 hours/year | 168 hours/week | 24 hours/day
525,600 minutes/year | 10,080 minutes/week | 1,440 minutes/day
Availability Range describes the percentage of time relative to the “planned” hours of operations
Applications and Availability
[Figure: applications plotted between the Zero Planned Downtime and Zero Unplanned Downtime axes, across the Marginal, Acceptable, Standard, High, and Extreme Availability bands]
Applications shown: eCommerce, ATM, HR, Accounting, Marketing Mailers, 911, Inventory Mgmt
Five 9’s (99.999%) ~ 5.3 minutes/year downtime
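The downtime budget behind a "number of nines" target is simple arithmetic. The sketch below assumes 24x7 planned operation (525,600 minutes per year); the function name is illustrative only.

```python
MINUTES_PER_YEAR = 525_600  # 365 days x 24 hours x 60 minutes

def downtime_budget_minutes(availability_pct, planned_minutes=MINUTES_PER_YEAR):
    """Minutes of downtime allowed per period at a given availability target."""
    return planned_minutes * (1 - availability_pct / 100.0)

for pct in (99.0, 99.9, 99.99, 99.999):
    print(f"{pct}% -> {downtime_budget_minutes(pct):.1f} minutes/year")
```

At five nines the budget works out to a little over 5 minutes per year, which leaves almost no room for a manual recovery step.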
What do you need?
It’s as simple as 1, 2, 3 +
• Step One – Launch a brief “Phase 0” HA Assessment
• Step Two – Complete an HA Primary Variables gauge
• Step Three – Match your need to the optimal HA solution
• Step + (optional) – Determine the ROI of the HA solution
Assessing HA with Primary Variables
Each variable is rated on a gauge running from the first value to the second:
• Uptime Requirement: 0% to 100%
• Time to Recover: Long to Short
• Data Resiliency: Low to High
• Application Resiliency: Low to High
• Cost of Downtime ($$ lost/hr): Low to High
• Cost of the High Availability Solution ($$): Low to High
• Degree of Distributed Access/Synchronization: Low to High
• Scheduled Maintenance Frequency: Often to Never
• Performance/Scalability: Low to High
• Tolerance of Recovery Time: High to Low
Development Methodology
“With High Availability built in”
Phases: Assessment (scope), Requirements, Design, Code & Test, System Test & Acceptance, Implementation
0. Assessment
- Project Planning
- Project Sizing
- Deliverables Identified (SOW)
- Schedules/milestones
- High-Level Requirements (scope)
- Estimate HA Primary Variables (gauges)
1. Requirements
- Detail Requirements (process/data/technology)
- Early Prototyping (optional)
- Detailed HA Primary Variables
- Detailed Service Level Agreements/Rqmts
- Detailed Disaster Recovery requirements
2. Design
- Detail Design (data/process/technology)
- Choose and design the matching HA solution for the application
3. Code & Test
- Code Development/Unit Testing
- Fully integrate the HA solution with the application
4. System Test & Acceptance
- Full system Test/User Acceptance
- Full HA Test/Validation/Acceptance
5. Implementation
- Production Build/Implementation
- Production HA build/monitoring begins
Spiral/Rapid Methodology
Iterative approach
Phases: Inception, Elaboration/Prototype, Construction, Transition
0. Initial assessment
- Project Planning
- Project Sizing
- Deliverables Identified (SOW)
- Schedules/milestones
- High-Level Functions (scope)
- Estimate HA Primary Variables (gauges)
1. High-level Rqmts/Prototyping
- High-level requirements (process/data/technology)
- High-level HA Primary Variables
2. Early Code & Test
- Early code and testing of apps/DI (process/data/technology)
- Prototyping of HA options
3. Requirements
- Detail Requirements (process/data/technology)
- Detail HA Primary Variables
- Detailed SLA/Rqmts
- Detailed Disaster Recovery requirements
4. Design
- Detail Designs (process/data/technology)
- Choose and design the matching HA solution for the application (verified via prototypes)
5. Code & Test
- Code Development/Unit Testing
- Fully integrate the HA solution with the application
6. System Test & Acceptance
- Full system Test/User Acceptance
- Full HA Test/Validation/Acceptance
7. Implementation
- Production Build/Implementation
- Production HA build/monitoring begins
Valid High Availability Options
• Disk Methods
• Other HW
• Cluster Services
• Data Replication
• SQL Clustering
• DB Mirroring
• Log Shipping
MSCS Cluster Services
[Diagram: Node A and Node B, each running Windows 2003 Enterprise Edition with local binaries on C:, attached over SCSI to a shared disk (D:) and a quorum disk (Q:), with Cluster Group Resources]
SQL Clustering
[Diagram: virtual SQL Server 2008 VSQLDBARCH\VSQLSRV1 on a two-node cluster. Nodes COLTST1 and COLTST2 each run Windows 2003 Enterprise Edition and a physical SQL Server 2008 with local binaries on C:. Shared over SCSI: E: (Master DB, TempDB, Appl 1 DB) and Q: (Quorum Disk). Cluster Group Resources include MS DTC and SQL Agent. SQL connections go to the virtual SQL Server.]
Data Replication
[Diagram: Central Publisher/Remote Distributor replication model. Publication Server (“Primary”, AdventureWorks on SQL Server 2008), Distribution Server (distribution database on SQL Server 2008), and two Subscription Servers (“Replicate”, AdventureWorks on SQL Server 2008)]
Can be used as a Warm Standby and/or for Reporting needs
Database Mirroring
[Diagram: clients A, B, C, and D connect over the network to the Principal Server (Adventure Works DB and translog on SQL Server 2008), paired with a Mirror Server (Adventure Works DB and translog on SQL Server 2008) and a Witness Server (MSDB DB on SQL Server 2008)]
Database Mirroring with DB Snapshots
[Diagram: Principal Server and Mirror Server (AdventureWorks DB and translog on SQL Server 2008) with a Witness Server (MSDB DB on SQL Server 2008); Reporting Users reach a Database Snapshot over the network]
Log Shipping
[Diagram: Primary Server (“Source”, CallOne DB and translog on SQL Server 2008) writes TxnLog backups to \Backup\CallOne_tlog_200405141120.TRN; TxnLog copies land in \LogShare\CallOne_tlog_200405141120.TRN; TxnLog restores load them into the Secondary Server (“Destination”, CallOne DB on SQL Server 2008). A Monitor Server (“Monitor”, MSDB DB on SQL Server 2008) shows “Last log shipped”, “Delay between logs loaded”, and “Delay Answer”.]
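The kind of delay a monitor server watches can be approximated from the log file names alone. This is a rough sketch, not how the monitor server itself is implemented: it assumes the 12-digit suffix in names such as CallOne_tlog_200405141120.TRN is a yyyymmddhhmm timestamp, and both function names are hypothetical.

```python
import re
from datetime import datetime

# Assumed naming pattern: <database>_tlog_<yyyymmddhhmm>.TRN
TLOG_NAME = re.compile(r"^(?P<db>.+)_tlog_(?P<stamp>\d{12})\.TRN$", re.IGNORECASE)

def parse_tlog_name(filename):
    """Split a transaction log backup file name into (database, backup time)."""
    match = TLOG_NAME.match(filename)
    if match is None:
        raise ValueError(f"not a recognized log backup name: {filename}")
    stamp = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M")
    return match.group("db"), stamp

def minutes_behind(last_restored_file, now):
    """Minutes between the last restored log backup and the given time."""
    _, stamp = parse_tlog_name(last_restored_file)
    return (now - stamp).total_seconds() / 60

db, taken = parse_tlog_name("CallOne_tlog_200405141120.TRN")
print(db, taken)
print(minutes_behind("CallOne_tlog_200405141120.TRN", datetime(2004, 5, 14, 11, 35)))
```

The resulting number is a rough upper bound on how much committed work the secondary would be missing if the primary failed at that moment.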
RAID Disk I/O Summary
RAID Level    Fault Tolerance                                  Logical Reads  Physical I/Os per Read  Logical Writes  Physical I/Os per Write
RAID 0        None                                             1              1                       1               1
RAID 1 or 10  Best (Optimal for OLTP)                          1              1                       1               2 writes
RAID 5        Moderate (Optimal for mostly READ ONLY systems)  1              1                       1               2 reads + 2 writes (that’s 4 per write!)
NOTE: Several RAID vendors now show RAID 5 and RAID 10 performance as almost equivalent, thanks to cache/buffer advancements on their RAID controllers.
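To see what the per-write penalty means for a real workload, the table can be turned into a small calculator. The sketch below uses only the physical I/O counts from the summary above; the 700-read/300-write workload is a made-up example, and controller caching (see the note) is ignored.

```python
# Physical I/Os per logical operation, from the RAID Disk I/O Summary.
RAID_IO = {
    "RAID 0": {"read": 1, "write": 1},
    "RAID 1 or 10": {"read": 1, "write": 2},  # 2 writes
    "RAID 5": {"read": 1, "write": 4},        # 2 reads + 2 writes
}

def physical_ios(level, logical_reads, logical_writes):
    """Total physical I/Os generated by a mix of logical reads and writes."""
    cost = RAID_IO[level]
    return logical_reads * cost["read"] + logical_writes * cost["write"]

# Hypothetical OLTP-style mix: 700 logical reads and 300 logical writes.
for level in RAID_IO:
    print(f"{level}: {physical_ios(level, 700, 300)} physical I/Os")
```

The gap between RAID 10 and RAID 5 grows with the share of writes, which is why the table steers write-heavy OLTP toward RAID 10 and mostly read-only systems toward RAID 5.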
Fault Tolerance and SQL DB Files
Quorum Drive
  Description: The quorum drive used with MSCS should be isolated on a drive by itself (very often mirrored as well for maximum availability).
  Fault Tolerance: RAID 1 or RAID 10
SQL Server Database files (OLTP)
  Description: For OLTP (online transaction processing) systems, the database data/index files should be placed on a RAID 10 disk system.
  Fault Tolerance: RAID 10
SQL Server Database files (DSS)
  Description: For DSS (decision support) systems that are primarily READ ONLY, the database data/index files should be placed on a RAID 5 disk system.
  Fault Tolerance: RAID 5
Temp DB
  Description: Highly volatile disk I/O (when not able to do all its work in cache).
  Fault Tolerance: RAID 10
SQL Server Transaction Log files
  Description: The SQL transaction log files should be on their own mirrored volume for both performance and database protection (for DSS systems, this could also be RAID 5).
  Fault Tolerance: RAID 10 or RAID 1
Example DB data files configuration
[Diagram: TempDB, Master DB, OLTP X - DB, OLTP Y - DB, DSS - DB (read only), and their log files spread across drives E:, F:, G:, and H: on RAID 10 and RAID 5 volumes; Q: Quorum on RAID 1 or RAID 10]
Decision Tree approach
[Diagram: a Condition/Question node branches into Case A, Case B, Case C, Case D, ... Case n, which lead to Action V, Action W, Action X, Action Y, Action Z, ...]
HA options: Disk Methods, Other HW, Cluster Services, Data Replication, SQL Clustering, Database Mirroring, Log Shipping, Distributed Transactions, Database Snapshots
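Decision-Tree Path Traversal
[Diagram: question 1 branches into answers a-e (paths 1a2, 1b2, 1c2, 1d2, 1e2); question 2 branches again (paths 1a2a3, 1a2b3, 1a2c3, 1a2d3, 1a2e3), and so on down the tree]
Possible outcomes: HA Not Needed, HW/Disk Redundancy, Cluster Services, Data Replication, SQL Clustering, Log Shipping, Distributed Transactions, Database Mirroring, Database Snapshots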
Decision-Tree: ASP Questions 1-3
1. What % of availability must your application have?
• A% <= 70%: Marginal Availability
• 70% < A% <= 83%: Acceptable Availability
• 83% < A% <= 95%: Standard Availability
• 95% < A% <= 99.5%: High Availability
• A% > 99.5%: Extreme Availability
2. How much tolerance of downtime by end-users?
• Very High: Not Critical
• High: Low Criticality
• Medium: Standard Criticality
• Low: High Criticality
• Very Low: Extremely Critical
3. What is the per hour cost of downtime for this application?
• $C <= $3K: Very Low Cost
• $3K < $C <= $7K: Low Cost
• $7K < $C <= $12K: Moderate Cost
• $12K < $C <= $20K: High Cost
• $C > $20K: Very High Cost
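Questions 1 and 3 are plain threshold lookups, so they are easy to encode. The sketch below uses the cut-offs from the slide; the function names are illustrative, and `bisect_left` is used so that a value equal to a cut-off falls into the lower band, matching the "<=" bounds.

```python
from bisect import bisect_left

AVAILABILITY_CUTOFFS = [70, 83, 95, 99.5]  # upper bounds, in percent
AVAILABILITY_TIERS = [
    "Marginal Availability",
    "Acceptable Availability",
    "Standard Availability",
    "High Availability",
    "Extreme Availability",
]

COST_CUTOFFS = [3_000, 7_000, 12_000, 20_000]  # upper bounds, in $ per hour
COST_TIERS = ["Very Low Cost", "Low Cost", "Moderate Cost", "High Cost", "Very High Cost"]

def availability_tier(pct):
    """Question 1: map a required availability percentage to its tier."""
    return AVAILABILITY_TIERS[bisect_left(AVAILABILITY_CUTOFFS, pct)]

def downtime_cost_tier(cost_per_hour):
    """Question 3: map the hourly cost of downtime to its tier."""
    return COST_TIERS[bisect_left(COST_CUTOFFS, cost_per_hour)]

print(availability_tier(99.9))     # Extreme Availability
print(downtime_cost_tier(15_000))  # High Cost
```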
Decision-Tree: ASP Questions 4-6
4. How long does it take to get the application back online?
• Very Long: Marginal Recoverability
• Long: Acceptable Recoverability
• Average: Standard Recoverability
• Short: Fast Recoverability
• Very Short: Extreme Recoverability
5. How much of the application is distributed?
• None: Non-Distributed
• A Little: Low Distribution
• Medium: Moderately Distributed
• A Lot: High Distribution
• All: Extremely Distributed
6. How much data inconsistency can be tolerated?
• Very Little: Very High Consistency
• A Little: High Consistency
• Medium: Moderate Consistency
• A Lot: Low Consistency
• Very Much: Minimal Consistency
Decision-Tree: ASP Questions 7-9
7. How often is scheduled maintenance required?
• Rarely: Very High Downtime
• Not Often: High Downtime
• Average: Reasonable Downtime
• Often: Low Downtime
• Very Often: Minimal Downtime
8. How important is high performance and scalability?
• Not Very: Very Low Performance
• Somewhat: Low Performance
• Moderately: Reasonable Performance
• Very Much: High Performance
• Extremely: Extreme Performance
9. How important is the application connection to the end-user?
• Not Very: Not Needed
• Somewhat: Establish new Connection
• Moderately: Connection Retry process
• Very Much: Connection Re-established easily
• Extremely: Connection Fail-over
Decision-Tree: ASP Question 10
10. What is the estimated cost of the HA Solution (budget)?
• C$ < $10K: Very Low Cost
• $10K <= C$ < $100K: Low Cost
• $100K <= C$ < $250K: Moderate Cost
• $250K <= C$ < $500K: High Cost
• C$ >= $500K: Extreme Cost
Example path traversal:
1. 1e → Extreme Availability goal
2. 1e+2d → Very low tolerance of downtime
3. 1e+2d+3e → $15K/hr cost of downtime (High Cost)
4. 1e+2d+3e+4c → Average recovery time
5. 1e+2d+3e+4c+5a → No distributed components or synchronization
6. 1e+2d+3e+4c+5a+6b → A little data inconsistency can be tolerated
7. 1e+2d+3e+4c+5a+6b+7c → Average amount of scheduled downtime
8. 1e+2d+3e+4c+5a+6b+7c+8d → Performance is very important
9. 1e+2d+3e+4c+5a+6b+7c+8d+9b → Connection can be re-established
10. 1e+2d+3e+4c+5a+6b+7c+8d+9b+10c → Moderate HA Cost/Good budget
Best fitting HA Solution (together): Disk Methods, Other HW, Cluster Services, SQL Clustering
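The path notation is just the question number followed by the chosen answer letter, joined with "+". A small sketch of that bookkeeping follows; the function names are invented for illustration, and the answer letters are the ones used in the ASP example above.

```python
def traversal_path(answers):
    """Build a path such as '1e+2d+3e' from answer letters for questions 1..n."""
    return "+".join(f"{question}{letter}" for question, letter in enumerate(answers, start=1))

def cumulative_paths(answers):
    """The path after each question, as in the step-by-step listing."""
    return [traversal_path(answers[:n]) for n in range(1, len(answers) + 1)]

# Answers to questions 1-10 from the ASP example.
asp_answers = ["e", "d", "e", "c", "a", "b", "c", "d", "b", "c"]

for step, path in enumerate(cumulative_paths(asp_answers), start=1):
    print(f"{step}. {path}")
```

Keeping the full path, rather than only the final recommendation, makes it easy to revisit one answer later and see where the traversal would diverge.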
Basic “one-two” Punch approach
1. Build the proper foundation first:
• Hardware/Network Redundancy
• Disk Backups
• DB Backups
• Vendor SLA’s
• Training, QA, & Standards
• Software Upgrades
2. Then build in the appropriate HA solution that your application requires:
• Database Snapshots
• Disk Methods
• Other HW
• Cluster Services
• Data Replication
• SQL Clustering
• Database Mirroring
• Log Shipping
• Distributed Transactions
ASP – Scenario #1 with SQL Clustering
[Diagram: MSCS Active/Passive configuration. Nodes ASPProd1 and ASPProd2 each run Windows 2003 Enterprise Edition and a physical SQL Server 2005 with local binaries on C:; the virtual SQL Server 2005 is ASQL\ASPSERV1. Shared over SCSI: drives E:, F:, and G: (Master DB, TempDB, HOE DB) and Q: (Quorum Disk). Cluster Group Resources include MS DTC and SQL Agent. Clients arrive over the network through JRUN/WebServices/IIS.]
Log Shipping
[Diagram: Primary Server (“Source”, CallOne DB and translog on SQL Server 2000) writes TxnLog backups to \Backup\CallOne_tlog_200405141120.TRN; TxnLog copies land in \LogShare\CallOne_tlog_200405141120.TRN; TxnLog restores load them into the Secondary Server (“Destination”, CallOne DB on SQL Server 2000). A Monitor Server (“Monitor”, MSDB DB on SQL Server 2000) shows “Last log shipped”, “Delay between logs loaded”, and “Delay Answer”.]
Live REPL solution
[Diagram: Central Publisher/Remote Distributor replication model. Headquarters (Santa Clara): Publication Server (MktgDB on SQL Server 2000) and Distribution Server (distribution database on SQL Server 2000). Subscription Servers (MktgDB on SQL Server 2000) in North America (Reporting & “warm/hot” spare), Europe (Reporting), and Far East (Reporting).]
Central Publisher (default option)
[Diagram: a single SQL Server 2000 acts as Publication Server and Distribution Server (Northwind plus the distribution database). Northwind Subscription Servers: two on SQL Server 2000, one on Oracle, one on SQL Server 7.0.]
Central Publisher, Remote Distributor
[Diagram: Publication Server (Northwind on SQL Server 2000) with a separate Distribution Server (distribution database on SQL Server 2000). Northwind Subscription Servers: two on SQL Server 2000, one on Oracle, one on SQL Server 7.0.]
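Distributing Data
Columns: Data Access; Latency; Autonomy; Sites (locations); Frequency; Network; Machines; Owner; Other; REPLICATION
1. Read Only, Reporting
   Latency: short; Autonomy: high; Sites: many; Frequency: high; Network: fast/stable; Machines: 1 server/site; Owner: 1 OLTP site
   Other: Each site only needs regional data
   Replication: Central Publisher, Transactional repl, filter by region
2. Read Only, Reporting
   Latency: long; Autonomy: high; Sites: many; Frequency: low; Network: fast/stable; Machines: 1 server/site; Owner: 1 OLTP site
   Other: Each site only needs regional data
   Replication: Central Publisher, Snapshot repl, filter by region
3. Read Mostly, A few updates
   Latency: short; Autonomy: high; Sites: < 10; Frequency: medium; Network: fast/stable; Machines: 1 server/site; Owner: 1 OLTP site
   Other: Regional updates on one table
   Replication: Central Publisher, Transactional repl, Updating Subs
4. Read Mostly, A few updates
   Latency: medium; Autonomy: high; Sites: < 10; Frequency: medium; Network: slow/unreliable; Machines: 1 server/site; Owner: All update
   Other: Regional update, all tables
   Replication: Central Publisher, Merge repl
5. Inserts (new orders)
   Latency: short; Autonomy: high; Sites: many; Frequency: high; Network: fast/stable; Machines: 1 server/site; Owner: 1 report site
   Other: Each site only needs regional data
   Replication: Central Subscriber, Transactional repl
6. Hot/Warm Spare
   Latency: Very short; Autonomy: high; Sites: < 2; Frequency: high; Network: fast/stable; Machines: 1 server/site; Owner: 1 OLTP site
   Other: Fail-over
   Replication: Central Publisher, Remote Distributor, Transactional repl
7. Read equal, Equal updates
   Latency: short; Autonomy: high; Sites: < 10; Frequency: medium; Network: fast/stable; Machines: 1 server/site; Owner: All update
   Other: Regional update, all tables
   Replication: Peer-to-Peer Transactional repl
Database Mirroring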
Piecing it together
Foundation, Foundation, Foundation
• Foundation: Hardware/Network Redundancy, Disk Backups, DB Backups, Vendor SLA’s, Training, QA, & Standards, Software Upgrades
• System Stack: Network, Application, Middleware, Database, Operating System, HARDWARE (Network Components, Servers, Disk Systems, Memory)
ROI Calculation
Database Mirroring
[Diagram: clients A, B, C, and D connect over the network to the Principal Server (Applx DB and translog on SQL Server 2008), paired with a Mirror Server (Applx DB and translog on SQL Server 2008) and a Witness Server (MSDB DB on SQL Server 2008)]
• Transparent Client Redirect
• “Copy-on-Write” technology
[Figures: SQL Server 2005 Database Mirroring; SQL Server 2008 Database Mirroring]
Database Mirroring with DB Snapshot
[Diagram: Principal Server and Mirror Server (Applx DB and translog on SQL Server 2005) with a Witness Server (MSDB DB on SQL Server 2005); Reporting Users reach a Database Snapshot over the network]
Database Snapshot
[Diagram: on SQL Server 2008, the AdventureWorks DB holds the source data pages; the snapshot of AdventureWorks DB consists of a sparse file of pages plus a system catalog of changed pages. Snapshot Users query the snapshot:]
SELECT …..data……. FROM AdventureWorks SNAPSHOT
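The copy-on-write idea behind a database snapshot can be shown with a toy model. This is only a conceptual sketch in Python, not how SQL Server stores pages: the class and method names are invented, pages are plain dictionary entries, and only updates to pages that existed when the snapshot was taken are modeled.

```python
class SourceDatabase:
    """Toy source database: a mapping of page id to page contents."""

    def __init__(self, pages):
        self.pages = dict(pages)
        self.snapshots = []

    def create_snapshot(self):
        snapshot = Snapshot(self)
        self.snapshots.append(snapshot)
        return snapshot

    def write_page(self, page_id, contents):
        # Copy-on-write: before a page changes for the first time,
        # its original contents are copied into each snapshot.
        for snapshot in self.snapshots:
            snapshot.preserve(page_id)
        self.pages[page_id] = contents


class Snapshot:
    """Toy snapshot: starts empty and only stores pages changed since creation."""

    def __init__(self, source):
        self.source = source
        self.sparse_file = {}  # page id -> original contents

    def preserve(self, page_id):
        if page_id not in self.sparse_file and page_id in self.source.pages:
            self.sparse_file[page_id] = self.source.pages[page_id]

    def read_page(self, page_id):
        # Changed pages come from the sparse file; unchanged pages
        # are still read from the source database.
        if page_id in self.sparse_file:
            return self.sparse_file[page_id]
        return self.source.pages[page_id]


db = SourceDatabase({1: "order 1: open", 2: "order 2: open"})
snap = db.create_snapshot()
db.write_page(1, "order 1: shipped")
print(snap.read_page(1), "|", db.pages[1])
```

Because the snapshot stores only the pages that changed, it is cheap to create, but every first write to a page on the source pays for an extra copy.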
[Diagram: mirroring partners. Principal Server and Mirror Server (Adventure Works DB and translog on SQL Server 2008), instances SQL2008xyz and SQL2008zzz; each has Endpoint Name: “endpoint4mirroring” and Role: PARTNER]
PH Topology
PH Topology With Snapshots
[Diagram labels: Clustered OLTP Application (Active/Passive), SQL Server 2008 Principal Server, Replication, Network, Database Snapshot, Critical Report Users, Less Critical Reporting Users]
The Combo Pack
[Diagram: two SQL Server 2008 mirroring pairs (Principal Server and Mirror Server) with a SQL Server 2008 Witness Server, combined with replication: Publisher, Distributor, and two SQL Server 2008 Subscribers]
DB Availability Improvement!
[Timeline: restart stages over time for SQL Server 2000 and SQL Server 2005/2008: Transactions Rolled Forward, Transactions Rolled Back, Restart complete. Markers show the points where “SQL Server 2005/2008 database is available” and “SQL Server 2000 database is available”.]
Performance and Tuning counts in HA
SQL SHOT – MS SQL Server
Questions
Is there any time left????
SQL Server Resources
Stop!
Fail-Over via Move Group
ANSWER to question #1
Distributed Transactions
[Diagram: Northwind 1 and Northwind 2 databases on SQL Server 2000 at a “Primary Location” and a “Secondary Location”, coordinated by MS DTC]
• Reads: Try primary first. If not available, try secondary.
• Updates: Must succeed together, or both be rolled back (two-phase commit).
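The read and update rules on this slide can be sketched in a few lines. This is a toy, in-memory illustration and not the MS DTC API: the `Server` class, its methods, and both helper functions are invented for the example, and a real coordinator also has to handle failures between the two phases.

```python
class Server:
    """Toy stand-in for the database at one location."""

    def __init__(self, name):
        self.name = name
        self.available = True
        self.rows = {}
        self._staged = None

    def read(self, key):
        if not self.available:
            raise ConnectionError(f"{self.name} is not available")
        return self.rows[key]

    def prepare(self, key, value):
        # Phase 1: stage the change and vote yes only if reachable.
        if not self.available:
            return False
        self._staged = (key, value)
        return True

    def commit(self):
        # Phase 2 (all voted yes): make the staged change permanent.
        key, value = self._staged
        self.rows[key] = value
        self._staged = None

    def rollback(self):
        # Phase 2 (someone voted no): discard the staged change.
        self._staged = None


def read_with_fallback(primary, secondary, key):
    """Reads: try primary first; if it is not available, try secondary."""
    try:
        return primary.read(key)
    except ConnectionError:
        return secondary.read(key)


def update_both(servers, key, value):
    """Updates: commit on every server, or roll back on every server."""
    votes = [server.prepare(key, value) for server in servers]
    if all(votes):
        for server in servers:
            server.commit()
        return True
    for server in servers:
        server.rollback()
    return False


primary, secondary = Server("primary"), Server("secondary")
print(update_both([primary, secondary], "order-1", "shipped"))
primary.available = False
print(read_with_fallback(primary, secondary, "order-1"))
print(update_both([primary, secondary], "order-2", "new"))
```

The trade-off is visible in the last call: with one location down, reads still succeed from the other copy, but updates are refused so the two copies never diverge.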