TSM Symposium 2013: Tivoli Storage Manager: Future Expectations – Vendor Talks
17.-20. September 2013, Hilton Hotel Gendarmenmarkt, Berlin, Germany
IBM Tivoli Storage Manager for Virtual Environments-Data Protection for VMware:
Solution Design
Dan Wolfe
Tivoli Storage SWAT
© 2013 IBM Corporation 2
Agenda: The four “S’s” of Solution Design for DP for VMware
Strategizing (Planning)
Sizing
Scheduling
Support (for recovery/restore procedures)
NOTE: Familiarity with the concepts of TSM for Virtual Environments is a prerequisite; other sessions at the TSM Symposium cover TSM for VE concepts
Or… Simple Steps to a Successful System Solution
STRATEGIZING
Strategizing: Key areas
Transition from in-guest backup
vSphere architecture: vCenters, Datacenters, Clusters, etc.
Backup storage device selection (e.g., disk, file, tape, VTL)
Example environment description
Parameter              | Value            | Units
Utilized Data          | 400,000          | GB
Total VM’s             | 4,000            | Count
Total ESX Hosts        | 100              | Count
Total Clusters         | 20               | Count
Daily Data Change Rate | 2%               | Pct
Backup Window          | 10               | Hours
Days Retention         | Dev: 7, Prod: 30 | Days
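The daily incremental workload implied by this example environment can be worked out directly from the table (a minimal sketch using only the figures above):

```python
# Daily incremental workload implied by the example environment above.
utilized_gb = 400_000   # utilized data (GB)
change_rate = 0.02      # daily data change rate
window_hours = 10       # backup window (hours)

daily_incremental_gb = utilized_gb * change_rate             # 8,000 GB/day
aggregate_gb_per_hour = daily_incremental_gb / window_hours  # 800 GB/hour

print(daily_incremental_gb, aggregate_gb_per_hour)
```

This 800GB/hour aggregate figure is the one that reappears in the vBS sizing example later in the deck.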
Transitioning from legacy in-guest backup: Differences
Not available with legacy backup:
– Full image AND file-level restore from single backup image
– One step full image restore (NOTE: TBMR can provide this with legacy in-guest)
– Centralized file-level restore capability using DP for VMware “mount”
– Eliminate CPU and i/o workload on VM guest during backup
– Efficient, block-level backup
– Automatic detection of new VM’s
• Individual VM registration not required for backup
– Agentless
– Queries/reports are filespace based (not node based)
Transitioning from legacy in-guest backup: Differences
Not available with VM image backup
– Individual file exclude/include capability
• However, VM image backup can exclude/include specific vmdk’s
– Customized retention for files within a VM guest
– Version selection for file-level restore (across all backups)
Transitioning from legacy in-guest backup: Key planning elements
Initial phase-in of full backups:
– Establish timeline requirement for initial phase-in: e.g., how many weeks?
• vBS (vStorage Backup Server) sizing must include phase-in
• Temporary vBS’s for phase-in can be used
What to do with down-level VM’s?
– ESX/i V3.5 does not have CBT (Change Block Tracking)
– Consider continuing with in-guest backup until ESXi hosts are upgraded
Determine infrastructure requirements: image backup vs. legacy file-level
– Daily incremental backup amount will be similar, with some “inflation”:
• Disk-block (image) backup vs. file-level backup: some “rounding” will occur
• No ability to exclude individual files: O/S files, swap files
– Network infrastructure must be capable of workload
– TSM server/s must be capable of workload
Application backups: Use in-guest backup only, or combine with VM image backup?
vSphere architecture and TSM for Virtual Environments
Collaboration between VMware architects and storage/backup architects is important
Consider how vSphere architecture corresponds to backup requirements
– Architecture of vSphere environment can facilitate backup scheduling and policies
– For example:
• VM’s with similar retention requirement grouped within the same cluster
– Other groupings can be used:
• “Folders”
• Similar VM names (using v6.4 wildcard specification in backup schedule)
vSphere architecture and TSM for Virtual Environments (2)
Distribution of VM storage capacity will help to determine vBS placement
– Is capacity evenly distributed across clusters?
• Evenly distributed sizes result in more evenly distributed vBS’s
• In practice, capacity is rarely evenly distributed
– For example, clusters with larger amounts of storage may require multiple vBS’s
– Placement of vBS’s important to balance workload
Identify network infrastructure for backup/restore (backup vs. production)
– Using the right network interface requires planning and configuration
• This is a critical factor when using NBD and LAN communication to the TSM server
– VADP backup technology uses vmkernel port and Management Network
• The VLAN should include the TSM server when using LAN communication
Determine scope of vMotion and TSM node assignments
– Contain vMotion scope within a single TSM “Datacenter” node
Simplified example: vSphere Architecture
[Diagram: a vCenter Server managing the clusters and their datastores on a Datastore SAN (261TB * 80% = 209TB), divided into a Production Datacenter and a Dev/Test Datacenter.]
Production Clusters:
• 15 clusters
• 300TB
• 30 day retention
Dev/Test Clusters:
• 5 clusters
• 100TB
• 7 day retention
Backup storage pool device factors
Cost vs. restore performance
– Evaluate tradeoff between restore performance and storage costs
– Consider co-location by filespace (VM) with tape/VTL when possible
– Specific, critical VM’s can be configured as “exceptions” for management class
• Highest performance storage device for subset of VM’s
Deduplication strategy:
– TSM native
• NOTE: client-side deduplication will influence throughput estimates
– Appliance
– Hybrid:
• Subset of VM’s back up to a TSM server instance with TSM deduplication
• Subset of VM’s back up to a TSM server instance with a deduplicating appliance
– Deduplication ratio: benchmark to ensure realistic estimate is used
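Because the achieved ratio dominates physical capacity, it is worth seeing how sensitive the result is. A minimal sketch (the logical capacity and the ratios below are illustrative only; benchmark your own data as the slide advises):

```python
# Physical capacity needed for a fixed logical backup load at several
# deduplication ratios (figures are hypothetical; benchmark your own data).
logical_tb = 500.0  # total logical backup data, TB (illustrative)

for ratio in (2.0, 4.0, 8.0):
    physical_tb = logical_tb / ratio
    print(f"{ratio:.0f}:1 dedup -> {physical_tb:.1f} TB physical")
```

Halving an optimistic ratio estimate doubles the physical capacity required, which is why a conservative estimate avoids under-sizing.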
SIZING
Sizing: Estimate the resource requirements
Objectives of sizing:
– Determine number of TSM server instances
– Determine capacity requirements for TSM storage pools
– Determine number of vBS’s (vStorage Backup Servers)
• Determine physical or virtual vBS’s
Key parameters to determine sizing:
– Utilized storage capacity of VM’s to backup (current and future)
– Retention requirements
– Throughput estimates for backups AND restores
– Backup window
Sizing: Key factors
Workload estimates:
– Daily incremental backups
– Daily full backups (newly created VM’s, and VM’s with CBT “reset”)
– Daily full VM image restores
– Initial full backup phase-in period (“contingency”)
– Disaster Recovery requirements (“contingency”)
Backup and restore throughput estimates:
– This is the most critical parameter for sizing
• And the most difficult to estimate
– Benchmarking is STRONGLY encouraged
– There is an upper limit to performance, determined by the capabilities of the infrastructure: laws of physics prevail!
– There is NO lower limit to performance: many factors can work against good performance
Sizing: TSM server
Number of TSM Server instances required
– May be driven by any or all of the following:
– Capacity limits of a single TSM server instance (e.g., TSM DB size)
– Throughput limits of a single TSM server instance
– Peak load restore requirement
• Don’t forget this!
• DR restore requirements may drive the number of server instances required
– Organizational boundaries (e.g., separation of dev/test and production)
Storage pool sizing requirements:
– Total backup data
– Daily change rate of data (incremental backups)
– Retention requirements
– Expected deduplication ratio
• Key parameter, difficult to predict
• Use conservative estimate or benchmarking to avoid under-sizing
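The storage pool sizing factors listed above combine into a simple estimate: one retained full copy plus the daily incrementals kept for the retention period, reduced by the expected deduplication ratio. A minimal sketch using the example environment's figures (the formula itself is an assumption, a first-order approximation rather than an official TSM sizing method):

```python
def storage_pool_gb(utilized_gb, change_rate, retention_days, dedup_ratio=1.0):
    """Rough storage pool capacity: one retained full copy plus the daily
    incrementals kept for the retention period, reduced by the expected
    deduplication ratio."""
    logical = utilized_gb + utilized_gb * change_rate * retention_days
    return logical / dedup_ratio

# Example environment: 300 TB production (30-day retention) and
# 100 TB dev/test (7-day retention); no deduplication assumed here.
prod_gb = storage_pool_gb(300_000, 0.02, 30)  # 480,000 GB logical
dev_gb = storage_pool_gb(100_000, 0.02, 7)    # 114,000 GB logical
print(prod_gb, dev_gb)
```

Passing a benchmarked `dedup_ratio` then scales both results down accordingly.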
Sizing: Determine number (and placement) of vBS’s
Key factor for estimating number of vBS’s (vStorage Backup Servers):
– Throughput estimates:
• Full and incremental backup throughputs are typically different
• Backup and restore rates may be different
• Client-side deduplication will influence throughput rates (backup and restore)
Use of “contingencies”
– Contingency planning is important to ensure adequate capacity
• Initial phase-in
• DR restores
Determine physical or virtual vBS
Can use a combination of both physical and virtual
Suggest starting with virtual vBS
– determine if there are any “extenuating” circumstances
• For example, network infrastructure constraints requiring use of SAN data transfers
Physical vBS’s could be used temporarily for initial phase-in or DR situations
Resource utilization impact:
– Biggest impact from vBS will be on datastore i/o
• if TSM client-side deduplication is used, then CPU is also a consideration
– Datastore i/o utilization will generally be comparable between physical and virtual vBS
• therefore, ESX host resource utilization should not drive the decision
Simplified estimation for number of vBS’s
Simplified estimation technique:
– One vBS per 20 - 35TB of source data to backup
• Suggested minimum one vBS per DRS cluster (except for “small” clusters)
– This includes several factors:
• Incremental backups
• Occasional full backups (new VM’s and “CBT resets”)
• Occasional image restores
• Initial phase-in contingency
• DR restore contingency
– Assumptions (not all are listed: refer to appendix for more details):
• 2% daily change rate
• *Throughput: 20GB/hour incremental, 40GB/hour full
• 10 hour backup window
NOTE: *Throughput is for example only; benchmarking is strongly recommended
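Applying the "one vBS per 20-35TB of source data" rule of thumb to the 400TB example environment gives a range (a minimal sketch of the slide's own technique):

```python
import math

# Rule of thumb from this slide: one vBS per 20-35TB of source data.
source_tb = 400  # utilized source data from the example environment

low_estimate = math.ceil(source_tb / 35)   # optimistic end of the range
high_estimate = math.ceil(source_tb / 20)  # conservative end of the range
print(low_estimate, high_estimate)  # 12 20
```

The conservative end of the range (20) matches the "one proxy per cluster" layout in the example that follows.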
Example vStorage Backup Servers sizing (virtual)
[Diagram: one virtual vBS (1 of 20) deployed per cluster; the vCenter Server, clusters, and datastores sit on the Datastore SAN (261TB * 80% = 209TB), with Production and Dev/Test datacenters; TSM Server/s write VM data and control files (“VM MC” and “CTL” management classes) to a deduplicating VTL over the VTL SAN.]
Incremental backups ONLY:
• 800GB/Hour Aggregate
• 20GB/Hour per vBS
• 1 proxy per cluster (20)
With contingency:
• 250GB/hour per vBS
Notes: Example vBS Sizing
20 vBS’s are suggested
– Initial full backup phase-in and DR restore “contingency” is included
– Incremental backup workload is very low per vBS (20GB/hour)
Without contingency:
– 10 vBS’s are suggested
– However “Hotadd” transport is not available for all clusters
• 10 vBS’s for 20 clusters
– “Hotadd” can provide improved efficiency and throughput
Example of physical vBS
[Diagram: physical vBS’s read VM data from the Datastore SAN using SAN Transport and send it TSM LAN-free over the VTL SAN to a TSM storage pool on a deduplicating VTL; vCenter Server, clusters, and datastores as in the previous example.]
SCHEDULING
Scheduling of backups: Factors to consider
Schedule “Scope”
Schedule exceptions to scope
Number of backup sessions (parallel sessions per “datamover”)
Scheduling of Backups: Schedule Scope and Exceptions
Schedule scope:
– Schedule definition is simplified if schedule “scope” corresponds to management class
– VM, ESX host/s, folder, cluster, datastore
– Determined by TSM option “Domain.vmfull” (vm, vmhost, vmfolder, vmhostcluster, vmdatastore)
Schedule exceptions to scope:
– VM’s with in-guest application backups (e.g., TDP’s)
– Vmdk exceptions (for example, exclude application disk/s)
– Special retention or storage pool requirements
– “Exception” VM’s can be excluded and backed up with special schedule
Scheduling: What if VM’s are not organized by backup scope?
Scheduling becomes more complex if it needs to be done “ad hoc”
If VM’s cannot be organized by schedule scope:
– Consider the use of backup scheduling tools:
• For example Tivoli Workload Scheduler
• Tivoli lab services “custom scheduling tool”
Number of backup sessions
Number of backup sessions (parallel sessions per “datamover”)
– Assume one “datamover” process per proxy
• V6.4 provides multiple, parallel sessions per single datamover process
• VMMAXPARALLEL setting in dsm.opt
– Only on exception basis do you need to consider more than one
• For example, separate datamovers for different backup retention policies
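A quick way to reason about the VMMAXPARALLEL setting is to check whether one datamover's sessions can finish its nightly share within the backup window. A minimal sketch (the workload and per-session rate below are hypothetical; use benchmarked figures):

```python
import math

# Sessions one datamover needs to finish its nightly share within the
# backup window (workload and per-session rate are hypothetical values).
datamover_workload_gb = 400   # data this datamover must move per night
window_hours = 10
session_gb_per_hour = 20      # benchmarked single-session incremental rate

required_gb_per_hour = datamover_workload_gb / window_hours  # 40 GB/hour
vmmaxparallel = math.ceil(required_gb_per_hour / session_gb_per_hour)
print(vmmaxparallel)  # 2
```

The resulting value would then be set as VMMAXPARALLEL in the datamover's dsm.opt.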
SUPPORT FOR RESTORE AND RECOVERY
Support for restore and recovery
Define requirements for restore of individual file data and recovery of VM’s:
– Frequency of file-level restore
– Frequency of recovery of full VM images
– Disaster recovery requirements:
• How many VM’s, how long to restore
Define organizational responsibility for restores
– Who owns file-level restores?
• Central team, or self-service?
• Determine if Recovery Agent needs to be installed in-guest
– Who owns VM image restores?
• VMware team?
• Backup team?
Support for restore: access permissions
Permissions can be controlled by:
– vSphere role/permissions
• For plug-in GUI access, on a per vSphere admin login basis
• For TSM client access, on a per datamover basis (one VCenter login per datamover)
– TSM node ID
Determine access permissions for restores
– Image restores can be controlled by vSphere roles when using plug-in GUI
• VM read-only vs. write/create
Access to TSM VM image backup data
– control by TSM node assignment
– set access client command: “set access backup -type=VM <vmname> <nodename>”
– most common use is for “self-serve” file-level restore scenarios
TSM node relationships for datamovers
TSM Datacenter Node: DC1 (VM’s backed up as “filespaces”)
TSM Datamover Nodes: DC1_DM1, DC1_DM2, …
Datamovers use “asnode” to the Datacenter Node
TSM node relationships for limiting restore access
TSM Datacenter Node: DC1 (VM’s backed up as “filespaces”)
TSM “user” Nodes: vm1_user, vm2_user, vm3_user
Restore access is granted to individual user node id’s using “set access”:
set access backup -type=VM vm1 vm1_user
set access backup -type=VM vm2 vm2_user
set access backup -type=VM vm3 vm3_user
Disaster recovery planning
VM Image recovery requirements:
– Determine which VM’s are most critical and need fastest recovery
• Define recovery time objectives: how many VM’s, how long to restore
• Evaluate peak-load capacity of infrastructure, TSM server, storage devices
– Establish a plan for “bootstrapping” environment at recovery site
• How to recover vBS
• How to recover vCenter Server (for example, direct restore to ESXi host)
Alternate site recovery requirements
– Determine if VM’s need to be restored to an alternate physical datacenter
– Define TSM server replication requirements
• TSM node replication (active/active TSM servers)
• Hardware storage replication: VTL, Disk (active/passive TSM servers)
• Offsite copy storage pools
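A recovery time objective can be sanity-checked with the same kind of arithmetic used for backup sizing. A minimal sketch (all figures below are hypothetical; substitute measured restore throughput for your infrastructure):

```python
# Rough recovery-time check against a stated objective (all figures are
# hypothetical; substitute measured restore throughput).
critical_vms = 50
avg_vm_gb = 100
aggregate_restore_gb_per_hour = 500  # combined rate across available vBS's

hours_to_recover = (critical_vms * avg_vm_gb) / aggregate_restore_gb_per_hour
print(hours_to_recover)  # 10.0
```

If the result exceeds the recovery time objective, that drives the number of TSM server instances and vBS's held in contingency, as noted in the sizing section.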
VBS SIZING (APPENDIX)
Step by step guide to TSM for Virtual Environments 6.4 vBS sizing
Condensed version of document on IBM Developerworks
https://www.ibm.com/developerworks/community/wikis/home?lang=en#!/wiki/Tivoli%20Storage%20Manager/page/Guide%20to%20vStorage%20Backup%20Server%20%28Proxy%29%20Sizing
Disclaimer:
– Throughput numbers that are contained in this document are intended to be used for estimation of vBS host sizing. Actual results are environment and configuration dependent and may vary significantly. Users of this document should verify the applicable data for their specific environment.
Example environment:
Steady State Daily Workload Assumptions
Steady State Incremental Backup Workload Calculation
Steady State Full Backup Workload Calculation
Steady State Full Restore Workload Calculation
Initial Full Backup Phase-In Calculation
Calculate Peak VM Image Restore Workload
Non-Deduplication Example With Initial Phase-In and Peak Restores
Calculate Number of VBS Hosts Required: Non-Deduplication
Non-Deduplication Example Excluding Initial Phase-In and Peak Restores
Calculate Number of VBS Hosts Required: Non-Deduplication