VMworld 2013: vSphere Data Protection (VDP) Technical Deep Dive and Troubleshooting Session

Page 1

VMware vSphere Data Protection (VDP) Technical Deep Dive and Troubleshooting Session

Darryl Hing, VMware Canada

Jacy Townsend, VMware

BCO4756

#BCO4756

Page 2

Agenda

What Is VDP?

Concepts

Gathering the log bundle

Log Analysis

Backup Best Practices

Commands

Resources

Page 3

Overview

File-level and image-level backups; full and incremental backups.

Variable Length Block Deduplication

Page 4

Overview

1. Replacement for VDR

2. Optimized for Virtual

3. Advanced Dedupe

4. Backup and Recovery

Page 6

Key Features

• Up to 100 VMs per appliance

• Up to 8 TB of deduplicated backup data capacity per appliance

• Up to 10 VDP virtual appliances supported per vCenter

Page 7

Key Features

• Powered by EMC Avamar

• Bundled with vSphere 5.1 Essentials Plus, Standard, Enterprise and Enterprise Plus

• Variable-length dedupe

Page 9

Important URLs

Configuration: https://<VDP_IP>:8543/vdp-configure

Management URL: https://<vCenter_IP>:9443/vsphere-client

FLR Portal: https://<VDP_IP>:8543/flr

Default Credentials: root/changeme

Page 10

Agenda

What Is VDP?

Concepts

Gathering the log bundle

Log Analysis

Backup Best Practices

Commands

Resources

Page 11

Terminologies – General Backup

Snapshot: Preserves the state of a VM at a point in time, including its power state.

Full Backup: Complete backup of the VM.

Differential: Files changed since the last FULL backup.

Incremental: Files changed since the last backup.

File Level Restore (FLR): Restore individual files.

(Diagram: a VM on VMware ESXi & ESX with a snapshot and full, differential and incremental backups.)

Page 12

Terminologies – Backup Types

• Full Backup

• Cumulative or Differential

• Incremental
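The three backup types differ only in the reference point that a change is measured against. A minimal sketch, assuming files are tracked by a hypothetical name-to-modification-time mapping; it illustrates the definitions, not how VDP itself selects data:

```python
from datetime import datetime


def changed_since(files: dict[str, datetime], since: datetime) -> set[str]:
    """Names of files whose modification time is later than `since`."""
    return {name for name, mtime in files.items() if mtime > since}


def select_files(kind: str, files: dict[str, datetime],
                 last_full: datetime, last_backup: datetime) -> set[str]:
    """Files that a backup of the given kind would copy."""
    if kind == "full":
        return set(files)                          # everything
    if kind == "differential":
        return changed_since(files, last_full)     # since the last FULL backup
    if kind == "incremental":
        return changed_since(files, last_backup)   # since the last backup of any kind
    raise ValueError(f"unknown backup kind: {kind!r}")


# Hypothetical history: full backup on March 1, incremental on March 2.
files = {
    "a.txt": datetime(2013, 3, 1, 9, 0),   # unchanged since the full backup
    "b.txt": datetime(2013, 3, 2, 9, 0),   # changed after the full backup
    "c.txt": datetime(2013, 3, 3, 9, 0),   # changed after the incremental
}
last_full = datetime(2013, 3, 1, 23, 0)
last_backup = datetime(2013, 3, 2, 23, 0)

print(sorted(select_files("differential", files, last_full, last_backup)))  # ['b.txt', 'c.txt']
print(sorted(select_files("incremental", files, last_full, last_backup)))   # ['c.txt']
```

A differential grows every day until the next full backup; an incremental stays small but a restore needs the full backup plus every incremental since.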

Page 13

Terminologies – VMware Specific

CBT (Changed Block Tracking): Identifies the disk sectors that have been altered.

Microsoft VSS (Volume Shadow Copy Service): Creates automatic or manual backup copies and snapshots of data.

Quiescing: Pausing or altering running processes that can modify the disk during a backup.

Steady State: When the amount of data being imported to the dedupe store is less than or equal to the amount of data being pruned.
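Steady state can be checked with simple arithmetic on daily figures. A minimal sketch, assuming hypothetical daily imported and pruned amounts (in GB of unique, post-dedupe data) are known; `days_until` is an illustrative helper, not a VDP function, and the 60% threshold in the example echoes the Consumption best practice in this deck:

```python
from typing import Optional


def is_steady_state(imported_gb: float, pruned_gb: float) -> bool:
    """Steady state: data imported into the dedupe store <= data pruned from it."""
    return imported_gb <= pruned_gb


def days_until(threshold_pct: float, capacity_gb: float, used_gb: float,
               imported_gb_per_day: float, pruned_gb_per_day: float) -> Optional[float]:
    """Days until usage reaches threshold_pct of capacity, or None if it never does."""
    target_gb = capacity_gb * threshold_pct / 100
    if used_gb >= target_gb:
        return 0.0
    growth = imported_gb_per_day - pruned_gb_per_day
    if growth <= 0:          # steady state: usage is flat or shrinking
        return None
    return (target_gb - used_gb) / growth


# Hypothetical appliance: 1000 GB capacity, 400 GB used.
print(is_steady_state(12.0, 15.0))            # True
print(days_until(60, 1000, 400, 20.0, 15.0))  # 40.0
```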

Page 14

Terminologies - RPO & RTO

Recovery Time Objective (RTO) – How quickly you need to have applications back up and running after downtime.

Recovery Point Objective (RPO) – The point in time to which data must be restored for work to resume successfully.

(Timeline: Last backup -> RPO -> Major incident -> RTO -> Backup data restored)
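Measured after an incident, the achieved RPO is the gap between the last backup and the incident, and the achieved RTO is the gap between the incident and the restore. A minimal sketch with hypothetical timestamps and targets:

```python
from datetime import datetime, timedelta


def achieved_rpo(last_backup: datetime, incident: datetime) -> timedelta:
    """Data-loss window: changes made after the last backup are lost."""
    return incident - last_backup


def achieved_rto(incident: datetime, restored: datetime) -> timedelta:
    """Downtime window: from the incident until the backup data is restored."""
    return restored - incident


# Hypothetical timeline and targets.
last_backup = datetime(2013, 3, 4, 23, 0)
incident = datetime(2013, 3, 5, 14, 30)
restored = datetime(2013, 3, 5, 18, 0)
rpo_target, rto_target = timedelta(hours=24), timedelta(hours=4)

rpo = achieved_rpo(last_backup, incident)
rto = achieved_rto(incident, restored)
print(rpo, rpo <= rpo_target)  # 15:30:00 True
print(rto, rto <= rto_target)  # 3:30:00 True
```

With one backup per day, the worst-case RPO approaches 24 hours; a tighter RPO needs more frequent backups.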

Page 15

Terminologies - Deduplication

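(Diagram: source objects VM-A, with blocks A B C / D E F / 1 2 3, and VM-B, with the same blocks in a different order, F E D / 3 1 2 / C A B. Only one copy of each block, A B C D E F 1 3 2, is kept as data, the objects hold pointers to it, and the stored data is then compressed, shown as X Y Z.)

• Identify duplicate or redundant data

• Only unique data is stored

• Saves pointers instead of multiple copies

• Consumes less disk space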
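The diagram's idea can be sketched as a hash-indexed store: each block is fingerprinted, stored only the first time it is seen, and each source object keeps an ordered list of pointers. SHA-1 fingerprints and one-letter blocks are assumptions chosen for illustration:

```python
import hashlib


def dedupe(objects: dict[str, list[bytes]]):
    """Store each unique block once; keep an ordered list of pointers per object."""
    store: dict[str, bytes] = {}          # hash -> the single stored copy
    pointers: dict[str, list[str]] = {}   # object name -> hashes, in order
    for name, blocks in objects.items():
        refs = []
        for block in blocks:
            key = hashlib.sha1(block).hexdigest()
            store.setdefault(key, block)  # only the first copy is kept
            refs.append(key)
        pointers[name] = refs
    return store, pointers


def rebuild(name: str, store: dict[str, bytes], pointers: dict[str, list[str]]) -> bytes:
    """Follow an object's pointers to reassemble its original contents."""
    return b"".join(store[key] for key in pointers[name])


# The blocks from the slide: VM-B holds the same nine blocks as VM-A, reordered.
vm_a = [c.encode() for c in "ABCDEF123"]
vm_b = [c.encode() for c in "FED312CAB"]
store, pointers = dedupe({"VM-A": vm_a, "VM-B": vm_b})
print(len(vm_a) + len(vm_b), "blocks in,", len(store), "stored")  # 18 blocks in, 9 stored
print(rebuild("VM-B", store, pointers))                           # b'FED312CAB'
```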

Page 16

Backup Process

1. Sticky Byte Factoring

2. Compression

3. Hashing

4. Store hash and data on GSAN
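The four steps can be strung together as a toy pipeline, in the order the slide lists them. The 0x00 boundary rule, zlib and SHA-1 are stand-ins chosen for illustration (assumptions), not what VDP uses; the point is that a second backup of unchanged data adds nothing to the store:

```python
import hashlib
import zlib

MARKER = 0x00         # toy boundary rule (assumption): cut after every 0x00 byte
MIN_SAVINGS = 0.25    # keep the compressed form only if it saves at least 25%


def factor(data: bytes) -> list[bytes]:
    """Step 1 (toy): boundaries depend on content, not on fixed offsets."""
    chunks, start = [], 0
    for i, byte in enumerate(data):
        if byte == MARKER:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def backup(data: bytes, store: dict) -> list[bytes]:
    """Run steps 1-4 and return the backup's ordered list of chunk hashes."""
    recipe = []
    for chunk in factor(data):                                # 1. factoring
        packed = zlib.compress(chunk)                         # 2. compression
        compressed = len(packed) <= len(chunk) * (1 - MIN_SAVINGS)
        payload = packed if compressed else chunk
        key = hashlib.sha1(payload).digest()                  # 3. hashing
        store.setdefault(key, (compressed, payload))          # 4. store once
        recipe.append(key)
    return recipe


def restore(recipe: list[bytes], store: dict) -> bytes:
    """Look up each hash, decompress where needed, and concatenate."""
    parts = []
    for key in recipe:
        compressed, payload = store[key]
        parts.append(zlib.decompress(payload) if compressed else payload)
    return b"".join(parts)


store: dict = {}
data = b"hello\x00world\x00" * 100 + b"tail"
first = backup(data, store)
size_after_first = len(store)
second = backup(data, store)       # identical data: nothing new is stored
print(size_after_first, len(store), restore(second, store) == data)  # 3 3 True
```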

Page 17

Sticky Byte Algorithm

Average data chunk size is 24 kB

Data chunk sizes vary between 1 kB and 64 kB

(Figure: First Backup splits 40 kB of data into 10 kB, 25 kB and 5 kB chunks; Subsequent Backup – Change in VM splits the same 40 kB into 5 kB, 10 kB and 25 kB chunks.)
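The sticky byte idea belongs to the family of content-defined chunking: boundaries are placed where a hash of the most recent bytes hits a chosen value, so an insertion only disturbs nearby chunks. A minimal sketch using a Rabin-Karp style rolling hash, assuming the 1 kB / 64 kB limits and roughly 24 kB average from the slide; the window size, base, modulus and boundary rule are assumptions, not Avamar's actual parameters:

```python
import random

MIN_SIZE = 1 * 1024          # slide: chunks are at least 1 kB ...
MAX_SIZE = 64 * 1024         # ... and at most 64 kB
TARGET_AVG = 24 * 1024       # slide: average chunk size is about 24 kB
WINDOW = 48                  # bytes covered by the rolling hash (assumption)
BASE, MOD = 257, (1 << 61) - 1
DIVISOR = TARGET_AVG - MIN_SIZE   # about one boundary per DIVISOR bytes past MIN_SIZE


def chunk(data: bytes) -> list[bytes]:
    """Content-defined chunking with a Rabin-Karp style rolling hash."""
    drop = pow(BASE, WINDOW - 1, MOD)   # weight of the byte leaving the window
    chunks, start, h = [], 0, 0
    for i, byte in enumerate(data):
        if i >= WINDOW:
            h = (h - data[i - WINDOW] * drop) % MOD   # slide the window forward
        h = (h * BASE + byte) % MOD
        size = i + 1 - start
        # Cut where the last WINDOW bytes hash to a chosen value, within size limits.
        if (size >= MIN_SIZE and h % DIVISOR == 0) or size >= MAX_SIZE:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


# Inserting three bytes at the front changes the first chunk; later boundaries
# line up with the same content again, so those chunks still deduplicate.
data = random.Random(0).randbytes(200_000)
before, after = chunk(data), chunk(b"xyz" + data)
shared = set(before) & set(after)
print(len(before), len(shared))
```

With fixed-size blocks, the same three-byte insertion would shift every block boundary and defeat deduplication for the rest of the stream.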

Page 18

Terminologies – Compression

Chunks are compressed to 30% - 50% of their original size

Average compressed chunk size is 12 kB – 16 kB

Compression is applied only when it achieves at least 25% compression

(Figure: 40 kB of data in 10 kB, 25 kB and 5 kB chunks, shown next to their compressed sizes.)
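The 25% rule amounts to compressing each chunk and keeping the compressed form only when it is at most 75% of the original size. A minimal sketch using zlib as a stand-in compressor (an assumption; the slide does not name the algorithm):

```python
import random
import zlib

MIN_SAVINGS = 0.25   # slide: compress only when it saves at least 25%


def maybe_compress(chunk: bytes) -> tuple[bool, bytes]:
    """Return (compressed?, payload); fall back to the raw chunk if savings < 25%."""
    packed = zlib.compress(chunk)
    if len(packed) <= len(chunk) * (1 - MIN_SAVINGS):
        return True, packed
    return False, chunk


text_like = b"VDP backup chunk " * 1500           # ~25 kB, highly repetitive
noise = random.Random(1).randbytes(24 * 1024)      # random data barely compresses
print(maybe_compress(text_like)[0], maybe_compress(noise)[0])  # True False
```

Storing incompressible chunks raw avoids spending CPU on decompression at restore time for no space gain.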

Page 19

Terminologies – Hashing

1. Data is used to create the hash, but it is not converted into the hash.

2. The hash created from each data object is called an atomic hash.

3. Atomic hashes are combined to create composites.

4. Hashing continues until a single root hash for the backup is created.
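The atomic-to-composite-to-root structure is a hash tree. A minimal sketch, assuming SHA-1 and a fan-out of two for composites; the slide does not say how many hashes go into each composite:

```python
import hashlib


def atomic_hash(chunk: bytes) -> bytes:
    """Hash of one data chunk; the chunk is read, not replaced, to make it."""
    return hashlib.sha1(chunk).digest()


def root_hash(chunks: list[bytes], fanout: int = 2) -> bytes:
    """Combine atomic hashes into composites, level by level, until one remains."""
    if not chunks:
        raise ValueError("need at least one chunk")
    if fanout < 2:
        raise ValueError("fanout must be at least 2")
    level = [atomic_hash(c) for c in chunks]
    while len(level) > 1:
        level = [hashlib.sha1(b"".join(level[i:i + fanout])).digest()
                 for i in range(0, len(level), fanout)]
    return level[0]


backup_1 = [b"chunk-1", b"chunk-2", b"chunk-3"]
backup_2 = [b"chunk-1", b"chunk-2", b"chunk-X"]   # one chunk changed
print(root_hash(backup_1) != root_hash(backup_2))  # True
```

Any change to any chunk changes the root, so a single root hash identifies the whole backup.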

Page 20

VMware Backup History

VDP: 2013 -> TBA

VDR: 2009 -> TBA

VCB: 2006 - 2010

Page 21

Agenda

What Is VDP?

Concepts

Gathering the log bundle

Log Analysis

Backup Best Practices

Commands

Resources

Page 22

Log Procurement

1. Open the VDP configure URL (https://<VDP_IP>:8543/vdp-configure).

2. Click “Collect Logs”.

3. Give the log bundle an appropriate name.

Page 23

How To Scope a VDP Issue

1. Who?

2. What?

3. When?

4. Where?

Page 24

Core Services

Scheduler (MCS): /usr/local/avamar/var/mc/server_log/mcserver.log

Worker Thread (AvaAgent): /usr/local/avamarclient/var-proxy-N/avagent.log

VMware API Module (AvVcbImage): /usr/local/avamarclient/var-proxy-N/<Jobname>-<EPOCH>-vmimage[w|l].log

Page 25

Core Services

Deduplication and Compression (AvTar): /usr/local/avamarclient/var-proxy-N/<Jobname>-<EPOCH>-vmimage[w|l]_avtar.log

Storage (GSAN): /data01/cur/gsan.log
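When many jobs have run, it helps to decode the per-job file names. A minimal sketch of a parser for the `<Jobname>-<EPOCH>-...vmimage[w|l]` names; the regular expression is a hypothetical pattern inferred from the sample names in this deck, and reading the 13-digit `<EPOCH>` as milliseconds since 1970 is an assumption that matches the March 2013 dates in the samples:

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern inferred from the sample file names:
# <Jobname>-<EPOCH>-[<other fields>-]vmimage[w|l](.log | _avtar.log | .alg)
JOB_FILE = re.compile(
    r"^(?P<job>.+?)-(?P<epoch_ms>\d{13})-(?:.*-)?"
    r"vmimage(?P<variant>[wl])(?P<suffix>_avtar\.log|\.log|\.alg)$"
)


def parse_job_file(name: str) -> dict:
    """Split a per-job file name into job name, creation time (UTC) and suffix."""
    m = JOB_FILE.match(name)
    if m is None:
        raise ValueError(f"unrecognized job file name: {name!r}")
    ms = int(m["epoch_ms"])   # assumption: milliseconds since the Unix epoch
    created = (datetime.fromtimestamp(ms // 1000, tz=timezone.utc)
               + timedelta(milliseconds=ms % 1000))
    return {"job": m["job"], "created_utc": created,
            "variant": m["variant"], "suffix": m["suffix"]}


name = ("Daily 5 Day Retention-1362432700504-"
        "618a82a5277ebb1dd536b018a407a21582926e6a-3016-vmimagew.alg")
info = parse_job_file(name)
print(info["job"], info["created_utc"].isoformat(), info["suffix"])
# Daily 5 Day Retention 2013-03-04T21:31:40.504000+00:00 .alg
```

Sorting a proxy directory's files by `created_utc` gives the job history in order.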

Page 26

Log Locations

• /usr/local/avamar/var/vdr/server_logs/vdr-configure.log

Installation

• /usr/local/avamar/var/avi/server_log/avinstaller.log*

• /usr/local/avamar/var/avi/server_log/AvamarInstallSles*.log

Configuration

Page 27

Log Locations

• /usr/local/avamar/var/mc/server_log/mcserver.log*

• /usr/local/avamar/var/vdr/server_logs/vdr-server*

• /usr/local/avamar/var/log/dpnctl.log*

• /usr/local/avamarclient/var-proxy-N/avagent*.log

• /usr/local/avamarclient/var-proxy-N/<jobname>-<EPOCH>-vmimage[w|l].log

• /usr/local/avamarclient/var-proxy-N/<jobname>-<EPOCH>-vmimage[w|l]_avtar.log

• /data01/cur/gsan.log

Backup and Restore

Page 28

Log Locations

• /usr/local/avamar/var/flr/server_log/flr-server.log

• /usr/local/avamarclient/bin/logs/FlrMerged.log

• /usr/local/avamarclient/bin/logs/VmwareFlr.log

• /usr/local/avamarclient/bin/logs/VmwareFlrWs.log

File Level Restore (FLR)

Page 29

ALG File – About the Job

<proxyDirectives>

<flag type="string" value="vm-221" name="vm_moref" />

<flag type="string" value="Windows Server 2008 R2" name="guest_fullname" />

<flag type="string" value="VDPTest" name="vmname" />

<flag type="string" value="[VMStore1] VDPTest/VDPTest.vmx" name="vmx_path" />

<flag type="string" value="/VDP_Lab" name="vmware_datacenter" />

<flag type="string" value="192.168.8.31" name="esxserver" />

<flag type="string" value="192.168.8.43" name="vmware_server" />

</proxyDirectives>

ALG File
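The same flags can be pulled out programmatically. A minimal sketch, assuming the `<proxyDirectives>` section is well-formed XML as shown; the rest of the ALG file is not parsed:

```python
import re
import xml.etree.ElementTree as ET


def proxy_directives(alg_text: str) -> dict[str, str]:
    """Return {flag name: value} from the <proxyDirectives> section of an ALG file."""
    # Assumption: only this section needs to be valid XML, so cut it out first.
    m = re.search(r"<proxyDirectives>.*?</proxyDirectives>", alg_text, re.S)
    if m is None:
        raise ValueError("no <proxyDirectives> section found")
    section = ET.fromstring(m.group(0))
    return {flag.get("name"): flag.get("value") for flag in section.iter("flag")}


SAMPLE = """
<proxyDirectives>
<flag type="string" value="vm-221" name="vm_moref" />
<flag type="string" value="VDPTest" name="vmname" />
<flag type="string" value="[VMStore1] VDPTest/VDPTest.vmx" name="vmx_path" />
<flag type="string" value="192.168.8.31" name="esxserver" />
</proxyDirectives>
"""
flags = proxy_directives(SAMPLE)
print(flags["vmname"], flags["esxserver"])  # VDPTest 192.168.8.31
```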

Page 30

LOG File – About the Process

2013-03-05 01:03:37 avvcbimage Info <9754>: VDDK IO 102400.00 MB, Performance: 297.5 MB/minute, Duration: 05:44:15

2013-03-04 16:38:53 avvcbimage Warning <14654>: The in-use blocks (pass 1) could not be found for 'VDP-136243273610203b57a3b4bb8946f82f4a78bdb8e0d0da870a', using disk extents.

2013-03-05 01:09:25 avvcbimage Error <9769>: Timeout on wait for spawned avtar process to complete

2013-03-05 01:09:25 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.

LOG File
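To pull the warnings, errors and FATAL lines out of a long job log, a small filter is enough. A minimal sketch, assuming each entry is a single line in the real file (the slide wraps them) and that the levels are the four seen in these samples:

```python
import re

LINE = re.compile(
    r"^(?P<date>\d{4}-\d{2}-\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<module>\S+) "
    r"(?P<level>Info|Warning|Error|FATAL) <(?P<code>\d+)>: (?P<msg>.*)$"
)


def problems(lines, levels=("Warning", "Error", "FATAL")):
    """Yield (timestamp, level, code, message) for lines at the given levels."""
    for line in lines:
        m = LINE.match(line.rstrip("\n"))
        if m and m["level"] in levels:
            yield f'{m["date"]} {m["time"]}', m["level"], int(m["code"]), m["msg"]


SAMPLE = [
    "2013-03-05 01:03:37 avvcbimage Info <9754>: VDDK IO 102400.00 MB, "
    "Performance: 297.5 MB/minute, Duration: 05:44:15",
    "2013-03-04 16:38:53 avvcbimage Warning <14654>: The in-use blocks (pass 1) "
    "could not be found for 'VDP-136243273610203b57a3b4bb8946f82f4a78bdb8e0d0da870a', "
    "using disk extents.",
    "2013-03-05 01:09:25 avvcbimage Error <9769>: Timeout on wait for spawned "
    "avtar process to complete",
]
for hit in problems(SAMPLE):
    print(hit)
```

On the appliance the input could be `open(path)` on a `vmimage[w|l].log` file; iterating the file object yields one line at a time.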

Page 31

Finding the Work Order Logs Quickly

# cd /usr/local/avamarclient/var-proxy-3

# for i in *.alg; do grep -m 1 " START" "$i" | rev | awk '{print $4" "$5}' | rev; grep vmname "$i" | awk -F\" '{print $4}'; echo "$i"; echo; done

2013-03-04 16:32:14

VM_Name_1

Daily 5 Day Retention-1362432700504-618a82a5277ebb1dd536b018a407a21582926e6a-3016-vmimagew.alg

2013-03-05 16:07:30

VM_Name_1

Daily 5 Day Retention-1362517629476-6acb4658af622ac48a52d73247aad95b1887af7c-3016-vmimagew.alg

Finding Work Orders

Page 32

Scenario 1

• /usr/local/avamar/var/mc/server_log/mcserver.log*

• /usr/local/avamar/var/vdr/server_logs/vdr-server*

• /usr/local/avamar/var/log/dpnctl.log*

• /usr/local/avamarclient/var-proxy-N/avagent*.log

• /data01/cur/gsan.log

Logs

Page 33

Scenario 1

2013-03-05 23:01:35 avvcbimage Info <16001>: Found 1 disk(s), 0 snapshots, and 1 snapshot ctk files, on the VMs datastore.

2013-03-05 23:01:35 avvcbimage Warning <16002>: Too many extra snapshot files (1) were found on the VMs datastore. This can cause a problem for the backup or restore.

2013-03-05 23:01:35 avvcbimage FATAL <16018>: The datastore information from VMX '[VMStore1] VDP_Protected_VM/VDP_Protected_VM.vmx' will not permit a restore or backup.

2013-03-05 23:01:35 avvcbimage Info <0000>: Starting graceful (staged) termination, Too many pre-existing snapshots will not permit a restore. (wrap-up stage)

2013-03-05 23:01:35 avvcbimage Error <9759>: createSnapshot: snapshot creation failed

LOG File
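The first log line counts disks, snapshots and snapshot ctk files in the VM's folder. A rough way to reproduce that count from a datastore file listing, assuming the usual VMware naming (`VM.vmdk`, `VM-flat.vmdk`, `VM-ctk.vmdk`, `VM-000001.vmdk`, `VM-000001-ctk.vmdk`); the counting rules and the folder listing are hypothetical, not VDP's own check:

```python
import re

SNAPSHOT_CTK = re.compile(r"-\d{6}-ctk\.vmdk$")                # e.g. VM-000001-ctk.vmdk
SNAPSHOT_DESC = re.compile(r"-\d{6}\.vmdk$")                   # e.g. VM-000001.vmdk
EXTENT = re.compile(r"-(?:flat|delta|sesparse|ctk)\.vmdk$")    # data/tracking files


def classify(file_names):
    """Count base disks, snapshot descriptors and snapshot CBT files by name."""
    counts = {"disks": 0, "snapshots": 0, "snapshot_ctk": 0}
    for name in file_names:
        if SNAPSHOT_CTK.search(name):
            counts["snapshot_ctk"] += 1
        elif SNAPSHOT_DESC.search(name):
            counts["snapshots"] += 1
        elif name.endswith(".vmdk") and not EXTENT.search(name):
            counts["disks"] += 1
    return counts


folder = [                                # hypothetical folder listing
    "VDP_Protected_VM.vmx",
    "VDP_Protected_VM.vmdk",
    "VDP_Protected_VM-flat.vmdk",
    "VDP_Protected_VM-ctk.vmdk",
    "VDP_Protected_VM-000001-ctk.vmdk",   # hypothetical leftover from an old snapshot
]
counts = classify(folder)
print(counts)  # {'disks': 1, 'snapshots': 0, 'snapshot_ctk': 1}
if counts["snapshot_ctk"] > counts["snapshots"]:
    print("snapshot ctk files without matching snapshots")
```

A base disk whose own name ends in six digits would be miscounted, so treat the result as a hint to inspect the folder, not a verdict.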

Page 34

Scenario 2

# grep "Node restarted" /data01/cur/err.log

2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted

When?

2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint cp.20130223140423 3300 out of 3590 stripes complete

2013/02/26-17:52:36.21084 {0.0} [perfbeat.0:273] WARN: <0963> server node 0.0 is swapping: check configuration

2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted

Why?
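Finding the “why” means reading the lines just before the restart. GNU grep can do this with `grep -B 3 "Node restarted" /data01/cur/err.log`; the same idea as a minimal Python sketch, with the log lines from the slide as sample input:

```python
from collections import deque


def context_before(lines, needle, before=3):
    """Return (preceding lines, matching line) for every line containing needle."""
    recent = deque(maxlen=before)   # sliding buffer of the last few lines
    hits = []
    for line in lines:
        if needle in line:
            hits.append((list(recent), line))
        recent.append(line)
    return hits


ERR_LOG = [
    "2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint "
    "cp.20130223140423 3300 out of 3590 stripes complete",
    "2013/02/26-17:52:36.21084 {0.0} [perfbeat.0:273] WARN: <0963> server node 0.0 "
    "is swapping: check configuration",
    "2013/02/26-17:52:38.81009 {P0.0} [gsan] <0017> Node restarted",
]
for previous, hit in context_before(ERR_LOG, "Node restarted"):
    print(hit)
    for line in previous:
        print("   before:", line)
```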

Page 35

Scenario 2 – Successful Checkpoint Sample

2013/02/27-14:18:54.19296 {0.0} [manage:196] <0054> checkpoint cp.20130227141853 started

2013/02/27-14:18:58.14928 {0.0} [strtask.1:3247] <0055> checkpoint cp.20130227141853 300 out of 3595 stripes complete

2013/02/27-14:19:00.72912 {0.0} [strtask.2:3483] <0055> checkpoint cp.20130227141853 600 out of 3595 stripes complete

<SNIP>

2013/02/27-14:19:27.42271 {0.0} [manage:2746] <0056> checkpoint cp.20130227141853 completed

2013/02/27-14:19:27.50773 {0.0} [sched.cp:3263] <4301> completed checkpoint maintenance

/data01/cur/err.log
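A checkpoint that logs `started` or progress lines but never `completed` (like cp.20130223140423 in Scenario 2) did not finish. A minimal sketch that summarizes checkpoints from err.log lines, assuming each entry is on one line in the real file:

```python
import re

CHECKPOINT = re.compile(
    r"checkpoint (?P<cp>cp\.\d+) (?:(?P<event>started|completed)"
    r"|(?P<done>\d+) out of (?P<total>\d+) stripes complete)"
)


def checkpoint_status(lines):
    """Summarize each checkpoint: did it start, how far it got, did it complete."""
    status = {}
    for line in lines:
        m = CHECKPOINT.search(line)
        if m is None:
            continue
        s = status.setdefault(m["cp"], {"started": False, "pct": 0.0, "completed": False})
        if m["event"] == "started":
            s["started"] = True
        elif m["event"] == "completed":
            s["completed"], s["pct"] = True, 100.0
        else:                                   # "N out of M stripes complete"
            s["pct"] = 100.0 * int(m["done"]) / int(m["total"])
    return status


ERR_LOG = [
    "2013/02/26-17:52:35.07740 {0.0} [strtask.6:3281] <0055> checkpoint "
    "cp.20130223140423 3300 out of 3590 stripes complete",
    "2013/02/27-14:18:54.19296 {0.0} [manage:196] <0054> checkpoint "
    "cp.20130227141853 started",
    "2013/02/27-14:18:58.14928 {0.0} [strtask.1:3247] <0055> checkpoint "
    "cp.20130227141853 300 out of 3595 stripes complete",
    "2013/02/27-14:19:27.42271 {0.0} [manage:2746] <0056> checkpoint "
    "cp.20130227141853 completed",
]
for cp, s in checkpoint_status(ERR_LOG).items():
    print(cp, s)
```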

Page 36

Scenario 3 – Storage Performance

2013/01/24-01:09:47.04134 {0.0} [perfbeat.7:197] WARN: <1060> perfbeat::outoftolerance mask=[backup,restore] average=2191.09 limit=219.1092 mbpersec=0.04

/data01/cur/gsan.log

Page 37

Scenario 2

# grep perfbeat /data01/cur/err.log | awk '{print $1"="$10}' | awk -F= '{print $1" - "$3}'

2013/02/18-13:16:05.93532 - 10.95

2013/02/18-13:19:40.12223 -

2013/02/18-13:20:44.07831 - 25.40

Performance Data

2013/02/18-13:19:40.12223 {0.0} [perfbeat.0:218] WARN: <0963> server node 0.0 is swapping: check configuration

Swapping
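The awk pipeline relies on `mbpersec` being the tenth whitespace-separated field, which is why the swapping line prints an empty value. A minimal sketch of the same extraction that looks the value up by name instead and flags swapping warnings, using sample lines from the slides:

```python
import re

MBPERSEC = re.compile(r"\bmbpersec=(?P<value>[\d.]+)")


def perfbeat_series(lines):
    """(timestamp, mbpersec or None, swapping?) for every perfbeat line."""
    series = []
    for line in lines:
        if "perfbeat" not in line:
            continue
        timestamp = line.split()[0]
        m = MBPERSEC.search(line)
        series.append((timestamp, float(m["value"]) if m else None, "is swapping" in line))
    return series


LINES = [
    "2013/01/24-01:09:47.04134 {0.0} [perfbeat.7:197] WARN: <1060> "
    "perfbeat::outoftolerance mask=[backup,restore] average=2191.09 "
    "limit=219.1092 mbpersec=0.04",
    "2013/02/18-13:19:40.12223 {0.0} [perfbeat.0:218] WARN: <0963> "
    "server node 0.0 is swapping: check configuration",
]
for row in perfbeat_series(LINES):
    print(row)
# ('2013/01/24-01:09:47.04134', 0.04, False)
# ('2013/02/18-13:19:40.12223', None, True)
```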

Page 38

What Next?

1. Review the monitor logs (vmware.log) from the time of the incident for both the VDP appliance and the target VM.

2. Review the vCenter logs from the time of the incident.

3. Review the ESX logs (hostd/vmkernel) from the time of the incident.

Page 39

Agenda

What Is VDP?

Concepts

Gathering the log bundle

Log Analysis

Backup Best Practices

• Troubleshooting

• Administration

Commands

Resources

Page 40

Backup Best Practices - Troubleshooting

Redeploy VDP: Should only be used to resume daily backups. Should not be used as a workaround except in extreme conditions.

SCOPE - W5: Define Who, What, When, Where and WHY.

Communications: Understand how the product works and which modules communicate with other modules.

Page 41

Backup Best Practices – Administration

Plan: Plan your deployment.

Storage: Ensure your storage infrastructure can handle the capacity and load. Always use hardware listed on the VMware HCL (Hardware Compatibility List).

Separate: Separate and group the workload between appliances or deduplication stores.

Page 42

Backup Best Practices – Administration

Set And Forget: Check backups regularly; do not set and forget.

Single Points Of Failure: Think about single points of failure and consider correcting these conditions.

Consumption: At 60% space utilization or higher, be mindful of storage consumption.

Page 43

Backup Best Practices – Administration

On Demand Backups: Limit on-demand backups during the maintenance window.

On Demand Maintenance: Avoid initiating on-demand maintenance activities (checkpoint (CP), checkpoint validation, or garbage collection (GC)).

Page 44

Backup Best Practices – Administration

Weekly

• Check the status of the deduplication store (checkpoints).

• Check the status of the backup subsystems.

• Review any failed backups.

Monthly / Quarterly

• Test the restore plan to ensure business continuity.

• Review and correct any new trends.

• Review storage performance and storage growth.

Page 45

Agenda

What Is VDP?

Concepts

Gathering the log bundle

Log Analysis

Backup Best Practices

Commands

Resources

Page 46

Commands - MCCLI

root@vdp:~/#: mccli server show-prop

State Full Access

Total capacity 535.7 GB

Capacity used 1.7 GB

Server utilization 0.3%

Bytes protected 10.0 GB

Time since Server initialization 21 days 21h:48m

Last checkpoint 2013-03-27 11:26:37 PDT

Last validated checkpoint 2013-03-27 11:26:37 PDT

System Name vdp.vdp.lab

IP address 192.168.2.99:26000

show-prop
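The Consumption best practice (be mindful at 60% utilization or higher) can be checked against this output. A minimal sketch, assuming the text of `mccli server show-prop` is available as a string; on the appliance it could be captured with `subprocess.run(["mccli", "server", "show-prop"], capture_output=True, text=True).stdout`, and the column spacing in the sample below is hypothetical:

```python
import re


def server_utilization(show_prop: str) -> float:
    """Extract the 'Server utilization' percentage from `mccli server show-prop` text."""
    m = re.search(r"Server utilization\s+(?P<pct>[\d.]+)%", show_prop)
    if m is None:
        raise ValueError("'Server utilization' not found in output")
    return float(m["pct"])


def needs_attention(pct: float, threshold: float = 60.0) -> bool:
    """Flag utilization at or above the 60% best-practice threshold."""
    return pct >= threshold


OUTPUT = """\
State                             Full Access
Total capacity                    535.7 GB
Capacity used                     1.7 GB
Server utilization                0.3%
"""
pct = server_utilization(OUTPUT)
print(pct, needs_attention(pct))  # 0.3 False
```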

Page 47

Commands - MCCLI

root@vdp:~/#: mccli server show-services

Name Status

-------------------------------- ---------------------------

Hostname vdp.vdp.lab

IP Address 192.168.2.99

Load Average 0.97

Last Admin Datastore Flush 2013-04-18 07:45:00 PDT

PostgreSQL database Running

192.168.2.103 All vCenter connections OK.

show-services

Page 48

Agenda

What Is VDP?

Concepts

Gathering the log bundle

Log Analysis

Backup Best Practices

Commands

Resources

Page 49

VMware Backup History - VDP

References

• Datasheet: http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-with-Operations-Management-Datasheet.pdf

• Admin Guide: http://www.vmware.com/files/pdf/products/vsphere/VMware-vSphere-Data-Protection-Administration-Guide.pdf

• VDDK Guide: https://www.vmware.com/support/developer/vddk/vddk-511-releasenotes.html

Page 50

Other VMware Activities Related to This Session

HOL:

HOL-SDC-1305

Business Continuity and Disaster Recovery In Action

Group Discussions:

BCO1002-GD

Data Protection and Backup with Jeff Hunter

BCO4756

Page 51

THANK YOU
