TSM Symposium 2013: Tivoli Storage Manager: Future Expectations – Vendor Talks
17.-20. September 2013, Hilton Hotel Gendarmenmarkt, Berlin, Germany
IBM Tivoli Storage Manager for Virtual Environments-Data Protection for VMware:
Solution Design
Dan Wolfe
Tivoli Storage SWAT
© 2013 IBM Corporation 2
Agenda: The four “S’s” of Solution Design for DP for VMware
Strategizing (Planning)
Sizing
Scheduling
Support (for recovery/restore procedures)
NOTE: Familiarity with the concepts of TSM for Virtual Environments is a prerequisite; other sessions at the TSM Symposium cover TSM for VE concepts
Or… Simple Steps to a Successful System Solution
STRATEGIZING
Strategizing: Key areas
Transition from in-guest backup
vSphere architecture: vCenters, Datacenters, Clusters, etc.
Backup storage device selection (e.g., disk, file, tape, VTL)
Example environment description
Parameter              | Value            | Units
Utilized Data          | 400,000          | GB
Total VM’s             | 4,000            | Count
Total ESX Hosts        | 100              | Count
Total Clusters         | 20               | Count
Daily Data Change Rate | 2%               | Pct
Backup Window          | 10               | Hours
Days Retention         | Dev: 7, Prod: 30 | Days
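The daily incremental workload implied by this example environment can be worked out directly from the table (a minimal sketch using only the figures above):

```python
# Daily incremental workload implied by the example environment above.
utilized_gb = 400_000   # utilized data (GB)
change_rate = 0.02      # daily data change rate
window_hours = 10       # backup window (hours)

daily_incremental_gb = utilized_gb * change_rate             # 8,000 GB/day
aggregate_gb_per_hour = daily_incremental_gb / window_hours  # 800 GB/hour

print(daily_incremental_gb, aggregate_gb_per_hour)
```

This 800GB/hour aggregate figure is the one that reappears in the vBS sizing example later in the deck.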
Transitioning from legacy in-guest backup: Differences
Not available with legacy backup:
– Full image AND file-level restore from single backup image
– One step full image restore (NOTE: TBMR can provide this with legacy in-guest)
– Centralized file-level restore capability using DP for VMware “mount”
– Eliminate CPU and i/o workload on VM guest during backup
– Efficient, block-level backup
– Automatic detection of new VM’s
• Individual VM registration not required for backup
– Agentless
– Queries/reports are filespace based (not node based)
Transitioning from legacy in-guest backup: Differences
Not available with VM image backup
– Individual file exclude/include capability
• However, VM image backup can exclude/include specific vmdk’s
– Customized retention for files within a VM guest
– Version selection for file-level restore (across all backups)
Transitioning from legacy in-guest backup: Key planning elements
Initial phase-in of full backups:
– Establish timeline requirement for initial phase-in: e.g., how many weeks?
• vBS (vStorage Backup Server) sizing must include phase-in
• Temporary vBS’s for phase-in can be used
What to do with down-level VM’s?
– ESX/i V3.5 does not have CBT (Change Block Tracking)
– Consider continuing with in-guest backup until ESXi hosts are upgraded
Determine infrastructure requirements: image backup vs. legacy file-level
– Daily incremental backup amount will be similar, with some “inflation”:
• Disk-block (image) backup vs. file-level backup: some “rounding” will occur
• No ability to exclude individual files: O/S files, swap files
– Network infrastructure must be capable of workload
– TSM server/s must be capable of workload
Application backups: Use in-guest backup only, or combine with VM image backup?
vSphere architecture and TSM for Virtual Environments
Collaboration between VMware architects and storage/backup architects is important
Consider how vSphere architecture corresponds to backup requirements
– Architecture of vSphere environment can facilitate backup scheduling and policies
– For example:
• VM’s with similar retention requirement grouped within the same cluster
– Other groupings can be used:
• “Folders”
• Similar VM names (using v6.4 wildcard specification in backup schedule)
vSphere architecture and TSM for Virtual Environments (2)
Distribution of VM storage capacity will help to determine vBS placement
– Is capacity evenly distributed across clusters?
• Evenly distributed sizes result in more evenly distributed vBS’s
• In practice, capacity is rarely evenly distributed
– For example, clusters with larger amounts of storage may require multiple vBS’s
– Placement of vBS’s important to balance workload
Identify network infrastructure for backup/restore (backup vs. production)
– Using the right network interface requires planning and configuration
• This is a critical factor when using NBD and LAN communication to the TSM server
– VADP backup technology uses vmkernel port and Management Network
• The VLAN should include the TSM server when using LAN communication
Determine scope of vMotion and TSM node assignments
– Contain vMotion scope within a single TSM “Datacenter” node
Simplified example: vSphere Architecture
[Diagram: a vCenter Server managing the clusters and their datastores on a Datastore SAN (261TB * 80% = 209TB), divided into a Production Datacenter and a Dev/Test Datacenter.]
Production Clusters:
• 15 clusters
• 300TB
• 30 day retention
Dev/Test Clusters:
• 5 clusters
• 100TB
• 7 day retention
Backup storage pool device factors
Cost vs. restore performance
– Evaluate tradeoff between restore performance and storage costs
– Consider co-location by filespace (VM) with tape/VTL when possible
– Specific, critical VM’s can be configured as “exceptions” for management class
• Highest performance storage device for subset of VM’s
Deduplication strategy:
– TSM native
• NOTE: client-side deduplication will influence throughput estimates
– Appliance
– Hybrid:
• Subset of VM’s back up to a TSM server instance with TSM deduplication
• Subset of VM’s back up to a TSM server instance with a deduplicating appliance
– Deduplication ratio: benchmark to ensure realistic estimate is used
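Because the achieved ratio dominates physical capacity, it is worth seeing how sensitive the result is. A minimal sketch (the logical capacity and the ratios below are illustrative only; benchmark your own data as the slide advises):

```python
# Physical capacity needed for a fixed logical backup load at several
# deduplication ratios (figures are hypothetical; benchmark your own data).
logical_tb = 500.0  # total logical backup data, TB (illustrative)

for ratio in (2.0, 4.0, 8.0):
    physical_tb = logical_tb / ratio
    print(f"{ratio:.0f}:1 dedup -> {physical_tb:.1f} TB physical")
```

Halving an optimistic ratio estimate doubles the physical capacity required, which is why a conservative estimate avoids under-sizing.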
SIZING
Sizing: Estimate the resource requirements
Objectives of sizing:
– Determine number of TSM server instances
– Determine capacity requirements for TSM storage pools
– Determine number of vBS’s (vStorage Backup Servers)
• Determine physical or virtual vBS’s
Key parameters to determine sizing:
– Utilized storage capacity of VM’s to backup (current and future)
– Retention requirements
– Throughput estimates for backups AND restores
– Backup window
Sizing: Key factors
Workload estimates:
– Daily incremental backups
– Daily full backups (newly created VM’s, and VM’s with CBT “reset”)
– Daily full VM image restores
– Initial full backup phase-in period (“contingency”)
– Disaster Recovery requirements (“contingency”)
Backup and restore throughput estimates:
– This is the most critical parameter for sizing
• And the most difficult to estimate
– Benchmarking is STRONGLY encouraged
– There is an upper limit to performance, determined by the capabilities of the infrastructure: laws of physics prevail!
– There is NO lower limit to performance: many factors can work against good performance
Sizing: TSM server
Number of TSM Server instances required
– May be driven by any or all of the following:
– Capacity limits of a single TSM server instance (e.g., TSM DB size)
– Throughput limits of a single TSM server instance
– Peak load restore requirement
• Don’t forget this!
• DR restore requirements may drive the number of server instances required
– Organizational boundaries (e.g., separation of dev/test and production)
Storage pool sizing requirements:
– Total backup data
– Daily change rate of data (incremental backups)
– Retention requirements
– Expected deduplication ratio
• Key parameter, difficult to predict
• Use conservative estimate or benchmarking to avoid under-sizing
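The storage pool sizing factors listed above combine into a simple estimate: one retained full copy plus the daily incrementals kept for the retention period, reduced by the expected deduplication ratio. A minimal sketch using the example environment's figures (the formula itself is an assumption, a first-order approximation rather than an official TSM sizing method):

```python
def storage_pool_gb(utilized_gb, change_rate, retention_days, dedup_ratio=1.0):
    """Rough storage pool capacity: one retained full copy plus the daily
    incrementals kept for the retention period, reduced by the expected
    deduplication ratio."""
    logical = utilized_gb + utilized_gb * change_rate * retention_days
    return logical / dedup_ratio

# Example environment: 300 TB production (30-day retention) and
# 100 TB dev/test (7-day retention); no deduplication assumed here.
prod_gb = storage_pool_gb(300_000, 0.02, 30)  # 480,000 GB logical
dev_gb = storage_pool_gb(100_000, 0.02, 7)    # 114,000 GB logical
print(prod_gb, dev_gb)
```

Passing a benchmarked `dedup_ratio` then scales both results down accordingly.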
Sizing: Determine number (and placement) of vBS’s
Key factor for estimating number of vBS’s (vStorage Backup Servers):
– Throughput estimates:
• Full and incremental backup throughputs are typically different
• Backup and restore rates may be different
• Client-side deduplication will influence throughput rates (backup and restore)
Use of “contingencies”
– Contingency planning is important to ensure adequate capacity
• Initial phase-in
• DR restores
Determine physical or virtual vBS
Can use a combination of both physical and virtual
Suggest starting with virtual vBS
– determine if there are any “extenuating” circumstances
• For example, network infrastructure constraints requiring use of SAN data transfers
Physical vBS’s could be used temporarily for initial phase-in or DR situations
Resource utilization impact:
– Biggest impact from vBS will be on datastore i/o
• if TSM client-side deduplication is used, then CPU is also a consideration
– Datastore i/o utilization will generally be comparable between physical and virtual vBS
• therefore, ESX host resource utilization should not drive the decision
Simplified estimation for number of vBS’s
Simplified estimation technique:
– One vBS per 20 - 35TB of source data to backup
• Suggested minimum one vBS per DRS cluster (except for “small” clusters)
– This includes several factors:
• Incremental backups
• Occasional full backups (new VM’s and “CBT resets”)
• Occasional image restores
• Initial phase-in contingency
• DR restore contingency
– Assumptions (not all are listed: refer to appendix for more details):
• 2% daily change rate
• *Throughput: 20GB/hour incremental, 40GB/hour full
• 10 hour backup window
NOTE: *Throughput is for example only; benchmarking is strongly recommended
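Applying the "one vBS per 20-35TB of source data" rule of thumb to the 400TB example environment gives a range (a minimal sketch of the slide's own technique):

```python
import math

# Rule of thumb from this slide: one vBS per 20-35TB of source data.
source_tb = 400  # utilized source data from the example environment

low_estimate = math.ceil(source_tb / 35)   # optimistic end of the range
high_estimate = math.ceil(source_tb / 20)  # conservative end of the range
print(low_estimate, high_estimate)  # 12 20
```

The conservative end of the range (20) matches the "one proxy per cluster" layout in the example that follows.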
Example vStorage Backup Servers sizing (virtual)
[Diagram: one virtual vBS (1 of 20) deployed per cluster; the vCenter Server, clusters, and datastores sit on the Datastore SAN (261TB * 80% = 209TB), with Production and Dev/Test datacenters; TSM Server/s write VM data and control files (“VM MC” and “CTL” management classes) to a deduplicating VTL over the VTL SAN.]
Incremental backups ONLY:
• 800GB/Hour Aggregate
• 20GB/Hour per vBS
• 1 proxy per cluster (20)
With contingency:
• 250GB/hour per vBS
Notes: Example vBS Sizing
20 vBS’s are suggested
– Initial full backup phase-in and DR restore “contingency” is included
– Incremental backup workload is very low per vBS (20GB/hour)
Without contingency:
– 10 vBS’s are suggested
– However “Hotadd” transport is not available for all clusters
• 10 vBS’s for 20 clusters
– “Hotadd” can provide improved efficiency and throughput
Example of physical vBS
[Diagram: physical vBS’s read VM data from the Datastore SAN using SAN Transport and send it TSM LAN-free over the VTL SAN to a TSM storage pool on a deduplicating VTL; vCenter Server, clusters, and datastores as in the previous example.]
SCHEDULING
Scheduling of backups: Factors to consider
Schedule “Scope”
Schedule exceptions to scope
Number of backup sessions (parallel sessions per “datamover”)
Scheduling of Backups: Schedule Scope and Exceptions
Schedule scope:
– Schedule definition is simplified if schedule “scope” corresponds to management class
– VM, ESX host/s, folder, cluster, datastore
– Determined by TSM option “Domain.vmfull” (vm, vmhost, vmfolder, vmhostcluster, vmdatastore)
Schedule exceptions to scope:
– VM’s with in-guest application backups (e.g., TDP’s)
– Vmdk exceptions (for example, exclude application disk/s)
– Special retention or storage pool requirements
– “Exception” VM’s can be excluded and backed up with special schedule
Scheduling: What if VM’s are not organized by backup scope?
Scheduling becomes more complex if it needs to be done “ad hoc”
If VM’s cannot be organized by schedule scope:
– Consider the use of backup scheduling tools:
• For example Tivoli Workload Scheduler
• Tivoli lab services “custom scheduling tool”
Number of backup sessions
Number of backup sessions (parallel sessions per “datamover”)
– Assume one “datamover” process per proxy
• V6.4 provides multiple, parallel sessions per single datamover process
• VMMAXPARALLEL setting in dsm.opt
– Only on exception basis do you need to consider more than one
• For example, separate datamovers for different backup retention policies
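A quick way to reason about the VMMAXPARALLEL setting is to check whether one datamover's sessions can finish its nightly share within the backup window. A minimal sketch (the workload and per-session rate below are hypothetical; use benchmarked figures):

```python
import math

# Sessions one datamover needs to finish its nightly share within the
# backup window (workload and per-session rate are hypothetical values).
datamover_workload_gb = 400   # data this datamover must move per night
window_hours = 10
session_gb_per_hour = 20      # benchmarked single-session incremental rate

required_gb_per_hour = datamover_workload_gb / window_hours  # 40 GB/hour
vmmaxparallel = math.ceil(required_gb_per_hour / session_gb_per_hour)
print(vmmaxparallel)  # 2
```

The resulting value would then be set as VMMAXPARALLEL in the datamover's dsm.opt.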
SUPPORT FOR RESTORE AND RECOVERY
Support for restore and recovery
Define requirements for restore of individual file data and recovery of VM’s:
– Frequency of file-level restore
– Frequency of recovery of full VM images
– Disaster recovery requirements:
• How many VM’s, how long to restore
Define organizational responsibility for restores
– Who owns file-level restores?
• Central team, or self-service?
• Determine if Recovery Agent needs to be installed in-guest
– Who owns VM image restores?
• VMware team?
• Backup team?
Support for restore: access permissions
Permissions can be controlled by:
– vSphere role/permissions
• For plug-in GUI access, on a per vSphere admin login basis
• For TSM client access, on a per datamover basis (one VCenter login per datamover)
– TSM node ID
Determine access permissions for restores
– Image restores can be controlled by vSphere roles when using plug-in GUI
• VM read-only vs. write/create
Access to TSM VM image backup data
– control by TSM node assignment
– set access client command: “set access backup -type=VM <vmname> <nodename>”
– most common use is for “self-serve” file-level restore scenarios
TSM node relationships for datamovers
TSM Datacenter Node: DC1 (VM’s backed up as “filespaces”)
TSM Datamover Nodes: DC1_DM1, DC1_DM2, …
Datamovers use “asnode” to the Datacenter Node
TSM node relationships for limiting restore access
TSM Datacenter Node: DC1 (VM’s backed up as “filespaces”)
TSM “user” Nodes: vm1_user, vm2_user, vm3_user
Restore access is granted to individual user node id’s using “set access”:
set access backup -type=VM vm1 vm1_user
set access backup -type=VM vm2 vm2_user
set access backup -type=VM vm3 vm3_user
Disaster recovery planning
VM Image recovery requirements:
– Determine which VM’s are most critical and need fastest recovery
• Define recovery time objectives: how many VM’s, how long to restore
• Evaluate peak-load capacity of infrastructure, TSM server, storage devices
– Establish a plan for “bootstrapping” environment at recovery site
• How to recover vBS
• How to recover vCenter Server (for example, direct restore to ESXi host)
Alternate site recovery requirements
– Determine if VM’s need to be restored to an alternate physical datacenter
– Define TSM server replication requirements
• TSM node replication (active/active TSM servers)
• Hardware storage replication: VTL, Disk (active/passive TSM servers)
• Offsite copy storage pools
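A recovery time objective can be sanity-checked with the same kind of arithmetic used for backup sizing. A minimal sketch (all figures below are hypothetical; substitute measured restore throughput for your infrastructure):

```python
# Rough recovery-time check against a stated objective (all figures are
# hypothetical; substitute measured restore throughput).
critical_vms = 50
avg_vm_gb = 100
aggregate_restore_gb_per_hour = 500  # combined rate across available vBS's

hours_to_recover = (critical_vms * avg_vm_gb) / aggregate_restore_gb_per_hour
print(hours_to_recover)  # 10.0
```

If the result exceeds the recovery time objective, that drives the number of TSM server instances and vBS's held in contingency, as noted in the sizing section.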
VBS SIZING (APPENDIX)
Step by step guide to TSM for Virtual Environments 6.4 vBS sizing
Condensed version of document on IBM Developerworks
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Guide%20to%20vStorage%20Backup%20Server%20%28Proxy%29%20Sizing
Disclaimer:
– Throughput numbers that are contained in this document are intended to be used for estimation of vBS host sizing. Actual results are environment and configuration dependent and may vary significantly. Users of this document should verify the applicable data for their specific environment.
Example environment:
Steady State Daily Workload Assumptions
Steady State Incremental Backup Workload Calculation
Steady State Full Backup Workload Calculation
Steady State Full Restore Workload Calculation
Initial Full Backup Phase-In Calculation
Calculate Peak VM Image Restore Workload
Non-Deduplication Example With Initial Phase-In and Peak Restores
Calculate Number of VBS Hosts Required: Non-Deduplication
Non-Deduplication Example Excluding Initial Phase-In and Peak Restores
Calculate Number of VBS Hosts Required: Non-Deduplication