PBO3350BUS: Snapshots and SQL Server - Technical Deep Dive & Detailed Lab Findings

Rob Girard, Principal Technical Marketing Engineer
Shawn Meyers, SQL Server Principal Architect

#VMworld #PBO3350BUS

VMworld 2017 Content: Not for publication or distribution


About Shawn

© 2017 Tintri, Inc. All Rights Reserved.

Shawn Meyers

• SQL Server Principal Architect, practice lead

• Experience in VMware, Microsoft, SQL Server, storage infrastructure, performance tuning.

• Working in IT since 1992, SQL Server since 1996, VMware since 2009

@1dizzygoose linkedin.com/in/shawnmeyers42


About Rob

Rob Girard

• Principal Technical Marketing Engineer @ Tintri since January 2014
• Working in IT since 1997 with >10 years of VMware experience
• vExpert, VCAP4/5-DCA, VCAP4-DCD, VCP2/4/5, MCSE, CCNA, and TCSE

@robgirard www.linkedin.com/in/robgirard


Introduction


Met at SQL Elite Workshop, hosted by VMware and Tintri [April 2015]

Partnered to share expertise with different aspects of virtualization

Delivered VAP6433 Group Discussion session @ VMworld 2015

This session summarizes the research & lab behind that session

For those who want a closer look under the covers


Scope of Session & Testing

Scope-creep was winning….


The Great Debate – Application vs Crash

• Safe?
• Not Safe!
• Not Safe?
• Safe!
• …..


Agenda

• Explain types of snapshots
• SQL Server backup and recovery
• Factors impacting snapshots
• Testing setup
• Testing results
• Recommendations


Crash consistent vs application consistent

Crash Consistent

• Same concept as pulling the power plug out of the back of the server
• SQL Server recovery can take longer depending upon what the server was doing when the crash occurred

Application Consistent

• SQL Server will be in the same state as after an OS reboot
• SQL Server startup will be the same every time
• Flushes completed transactions
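
To make the difference concrete, here is a minimal Python sketch of log-based recovery. It is a toy model, not SQL Server's actual recovery code: the record layout and the `recover` function are assumptions made purely for illustration. A crash-consistent image comes up with log records still to process, while an application-consistent image has already been flushed, so there is little or nothing to replay.

```python
def recover(pages, log):
    """Toy write-ahead-log recovery (illustrative assumption, not SQL Server internals).

    pages: dict of key -> value as found on disk in the snapshot
    log:   list of ("write", txn, key, old, new) or ("commit", txn) records
    Returns the recovered pages and the number of log records processed.
    """
    state = dict(pages)
    committed = {rec[1] for rec in log if rec[0] == "commit"}
    work = 0

    # Redo pass: repeat every logged write in order.
    for rec in log:
        if rec[0] == "write":
            _, _txn, key, _old, new = rec
            state[key] = new
            work += 1

    # Undo pass: roll back writes from transactions that never committed.
    for rec in reversed(log):
        if rec[0] == "write" and rec[1] not in committed:
            _, _txn, key, old, _new = rec
            state[key] = old
            work += 1

    return state, work


# Crash-consistent image: one committed and one in-flight transaction in the log.
crash_log = [("write", 1, "a", 1, 10), ("commit", 1), ("write", 2, "b", 2, 20)]
print(recover({"a": 1, "b": 2}, crash_log))   # ({'a': 10, 'b': 2}, 3)

# Application-consistent image: everything was flushed, so the log is empty.
print(recover({"a": 10, "b": 2}, []))         # ({'a': 10, 'b': 2}, 0)
```

In this model the amount of recovery work grows with the amount of unprocessed log, which matches the slide's point that crash recovery time depends on what the server was doing.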


Snapshot 101: Anatomy of a Snapshot


• Think of a snapshot as layers within PhotoShop

• Let’s use a quick visual aid to get us started…


Snapshot Definition

Warning!!! – A snapshot is not a backup

• A snapshot is a point-in-time copy of the data that represents an image
• Can be used to recover anything from individual items to a full server, and everything in between
• At its most basic, it is a metadata file and a collection of pointer records for a point in time
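
The "metadata plus pointer records" idea can be sketched in a few lines of Python. This is a simplified model under stated assumptions: the `Volume` class, its method names, and the redirect-on-write behavior are hypothetical and do not describe any particular vendor's implementation.

```python
class Volume:
    """Toy volume whose snapshots are just saved pointer tables (illustrative only)."""

    def __init__(self):
        self.blocks = {}      # block id -> data; blocks are never overwritten
        self.live = {}        # logical address -> block id for the live volume
        self.snapshots = {}   # snapshot name -> frozen copy of the pointer table
        self._next_id = 0

    def write(self, addr, data):
        # Redirect-on-write: new data lands in a new block, so blocks that
        # older snapshots point at stay untouched.
        self.blocks[self._next_id] = data
        self.live[addr] = self._next_id
        self._next_id += 1

    def snapshot(self, name):
        # Only the pointers are copied, never the data itself.
        self.snapshots[name] = dict(self.live)

    def read(self, addr, snapshot=None):
        table = self.snapshots[snapshot] if snapshot is not None else self.live
        return self.blocks[table[addr]]


vol = Volume()
vol.write(0, "A")
vol.snapshot("before-patch")
vol.write(0, "B")
print(vol.read(0))                   # B
print(vol.read(0, "before-patch"))   # A
```

Because the snapshot shares its blocks with the live volume, losing the underlying storage loses both, which is one way to read the warning that a snapshot is not a backup.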


Snapshots – What can I use them for?

Their original intended purpose: Recovery

Patching

Cloning for various reasons:

• New server with a similar function
• Test/Dev/Staging Environments
• Non-Intrusive Recovery Verification
• Troubleshooting
• Moving Data


VMware snapshots

• Done at the hypervisor level
• Includes the state and data of a virtual machine (if memory is snapped)
• Can have snapshot chains
• Can have snapshot consolidation, orphan snapshots, removal issues

https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015180


Microsoft SQL Server Support Statement on Snapshots

SQL Server supports virtualization-aware backup solutions that use VSS (volume snapshots). For example, SQL Server supports Hyper-V backup.

Virtual machine snapshots that do not use VSS volume snapshots are not supported by SQL Server. Any snapshot technology that does a behind-the-scenes save of a VM’s point-in-time memory, disk, and device state without interacting with applications on the guest using VSS may leave SQL Server in an inconsistent state.

https://support.microsoft.com/en-us/kb/956893

Short version: you need to quiesce for support from Microsoft


Storage array based snapshots

• Known as “Crash-Consistent” snapshots within Tintri VMstore
• Don’t preserve VM state
• Chaining is efficient, automatic, and user-proof
• Very quick and imperceptible to the applications/users
• Superior to a LUN snapshot, as it captures only the VM you want, not the churn of everything else in the LUN



To quiesce or not to quiesce? That is the question!

• Windows servers use VSS
• Process for bringing the OS into alignment for a proper backup
• Flushes dirty pages (buffers) to disk
• Primarily used for backups, not for rollback snapshots
• Can be used to call custom scripts – Freeze & Thaw
• Performance & other implications? – Wait for test results!
• Extremely minor risk of the snapshot containing database corruption


Sample Freeze / Thaw script

A sample .bat can be placed in C:\Program Files\VMware\VMware Tools\backupScripts.d
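
The script shown on the original slide is not reproduced in this transcript. As a rough illustration only, the dispatch logic of such a script might look like the Python sketch below. It assumes, to the best of my understanding of VMware Tools, that each script in that folder is invoked with a single argument: `freeze` before the quiesced snapshot, and `thaw` (or `freezeFail` if quiescing failed) afterwards. The `handle` function and the `events` list are hypothetical names, and a real deployment would still need a .bat wrapper to launch it.

```python
def handle(action, events):
    """Dispatch on the argument passed by the caller (assumed: freeze/thaw/freezeFail).

    events is a plain list used here as a stand-in for real work or logging.
    Returns 0 on success and 1 for an unrecognized argument.
    """
    if action == "freeze":
        # Placeholder: pause or flush application writes before the snapshot.
        events.append("freeze")
        return 0
    if action in ("thaw", "freezeFail"):
        # Placeholder: resume normal operation after the snapshot,
        # or after a failed quiesce attempt.
        events.append("thaw")
        return 0
    return 1  # unknown argument: report failure to the caller


demo_events = []
print(handle("freeze", demo_events), handle("thaw", demo_events), demo_events)
```

A wrapper would pass its first command-line argument to `handle` and exit with the returned code.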


Testing Environment 1

• Hypervisor: vSphere 6.0U2
• Physical Servers: Cisco UCS B200 Blades (12-core Intel E5-2697s)
• Storage: Tintri T5080 VMstore
• Virtual Machine Config:
  • Windows Server 2012R2
  • Microsoft SQL 2016
  • 6 vCPU + 16 GB RAM
  • 7+ vDisks
• Client VMs: Windows 2012R2 w/ HammerDB


Testing methodology

• HammerDB to populate databases and generate load
• Take a variety of snapshots (vSphere & storage)
• Observe and measure:
  • Impact to VM during snapshot process
  • Recovery impact

Recovery Assessment

• Clone snapshots into VMs & power up in isolation (disconnect NIC)
• Review logs. Recovery times use time stamps from “Starting Master DB” until “Recovery Complete”
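
The timing method can be sketched as a small log parser. This is only an illustration: the two marker strings are my assumption of how the "Starting Master DB" and "Recovery Complete" events appear in a SQL Server error log, the sample lines are made up, and `recovery_seconds` is a hypothetical helper.

```python
from datetime import datetime


def parse_ts(line):
    """Parse the leading 'YYYY-MM-DD HH:MM:SS.ff' timestamp of a log line."""
    date_part, time_part = line.split()[:2]
    return datetime.strptime(f"{date_part} {time_part}", "%Y-%m-%d %H:%M:%S.%f")


def recovery_seconds(lines,
                     start_marker="Starting up database 'master'",
                     end_marker="Recovery is complete"):
    """Seconds between the first start marker and the last end marker."""
    start = end = None
    for line in lines:
        if start is None and start_marker in line:
            start = parse_ts(line)
        elif end_marker in line:
            end = parse_ts(line)
    if start is None or end is None:
        raise ValueError("start or end marker not found in log")
    return (end - start).total_seconds()


# Made-up sample lines, shaped like error log entries.
sample_log = [
    "2017-06-01 10:15:02.45 spid7s      Starting up database 'master'.",
    "2017-06-01 10:15:05.10 spid12s     Starting up database 'tpcc'.",
    "2017-06-01 10:15:13.45 spid9s      Recovery is complete. This is an informational message only.",
]
print(recovery_seconds(sample_log))  # 11.0
```

Measuring from the log rather than from a stopwatch keeps the comparison between cloned snapshots consistent.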


Testing methodology – Taking snaps


Testing methodology – Cloning Snaps to Clones


Testing methodology: Start to End Timing


Test results assumptions

• We try to go in with a clean slate, but we all have ideas beforehand about what the results are going to be
• These assumptions can prejudge the results
• We are sharing them so you know where we are coming from and can judge us
• These were written down before testing began


The assumptions

• Quiescing will take longer but will provide for better SQL Server recovery
• VAAI snapshots will be faster
• Managing VM stun time is key
• Memory snapshots are not needed; they offer no added value


Tests

01 Fill ‘er Up! – Creating databases
02 Long time, no see! – Time since the last quiesced snap
03 On, and on, and on…. The never-ending transaction!
04 vsstrace.exe


Test #1 – Fill ‘er Up!

What are we testing, and under what conditions?

• Crash-consistent vs Quiesced
• Does the size of the DB make a difference?
• Do multiple databases on the same disk make a difference?
• What about a dataspace spanning many disks?
• Snapping the SQL VM under varying levels of stress (up to & including 100% CPU)


Test #1 – Fill ‘er Up!

The Setup:

• 1 SQL 2016 VM w/ 11 vDisks: OS, Page, Data1, Logs1, TempDB, TempLogs, Backup, Data2, Logs2, Data3, Logs3
• 9 x HammerDB Client VMs to create databases of various sizes
• Primarily write operations


Test #1 – Fill ‘er Up! [DURING]

Observations:

• No errors observed on the HammerDB clients
• Screenshot of SQL Logs at time of quiesce….


Test #1 – Fill ‘er Up! [RECOVERY]

Observations:

• Review of SQL Logs… Screenshots


1 second recovery observed


Test #1 – Fill ‘er Up! [RECOVERY]

Observations:

• Disk I/O observed (VSS vs Crash-Consistent)
• Not much to report, but there was some I/O incurred on the crash-consistent snap that was not seen on VSS


Test #1 – Fill ‘er Up! [CONCLUSION]

• No stuns observed for Crash-Consistent at time of snap
• VSS stuns in these tests were minimal
• Recovery:
  • 1 second for VSS
  • 11 seconds for crash-consistent


What about my memory? …..I forget

• You can snap the memory of the VM within a VMware snapshot (enabled by default)


VMware Snaps w/ Memory

• Makes for longer stun times
• Can’t natively clone the snapped state, BUT you can use storage-based snapshots to create a clone, and then revert. Handy to troubleshoot a condition that would otherwise clear with a reboot.
• Not typically needed for backups, but needed for recovering a VM to an exact state, including in-flight transactions.
• “Revert” snapshot rolls back the VM into a running state


VMware Snaps w/ Memory – cont’d

• Takes longer to create & requires disk I/O: all provisioned VM memory needs to be dumped to disk


Snapshot consolidation

• Backup agents tend to take snaps, attach them to a different guest for backups, then revert the snapshot
• When errors occur, you have chain issues
• Snapshot consolidation can be a smooth process or can take forever
• Many factors


Recovery time

• Restore of a large database can take a long time with native backups
• Snaps can be back online in minutes, if not seconds
• Snaps can be mounted to recover individual objects


Snapping SQL Server drives

• All data and log drives need to be snapped at the same instant
  • Not an issue with Tintri Snaps!
• Many times data and log are on different datastores due to different IO patterns or to create multiple queues on the vSphere host
• When data and log are on different datastores, the snapshots must keep these consistent or there can be database corruption
• Not all SAN vendors offer a way to snap multiple datastores at the same instant


Test #2 – Long Time, No See!

• Does the time elapsed since the last VSS-quiesced snapshot have an impact on crash-consistent snaps, or future VSS snaps? (i.e. database growth)
• 12 hours since the last VSS snapshot


Test #2 – Long Time, No See! [DURING]

• No noticeable changes observed at time of snap compared to 12+ hours earlier, when databases first began populating


Test #2 – Long Time, No See! [RECOVERY]

• Recovery of Crash-Consistent snapshot, 12 hours earlier: 11 seconds


Test #2 – Long Time, No See! [RECOVERY]

• Inspect SQL logs from the crash-consistent snap, 12 hours since the last VSS operation


Test #2 – Long Time, No See! [RECOVERY]

• Compare to SQL Log of VSS-quiesced snap…


Test #2 – Long Time, No See! [RECOVERY]

• 13 seconds for the crash-consistent snap versus 11 seconds for the crash-consistent snap taken 12 hours earlier (i.e. within <15 mins of the last VSS-quiesced snap)
• 13 seconds (crash-consistent) vs 1 second (VSS)


Test #2 – Long Time, No See!

• Let’s compare disk activity…


Test #2 – Long Time, No See! [CONCLUSION]

• The longer you go between backups/VSS, the more work is required at recovery time
• Recovery time difference is minimal, assuming decent performance


SQL Server backup and recovery

Snapshots are not backups!!!

• SQL Server native backups, when structured properly, allow point-in-time recovery to the nearest transaction or sub-second
• Only completed transactions will be part of the backup
• Recovering from a snapshot will be either backup consistent or crash consistent
• Many SQL Server experts warn about all snapshots


SQL Server Native Backups

• DBA has the control they desire to manage their risk
• Using a mixture of both snapshots and backups provides the most flexibility
  • Think of them as multiple layers of protection; one does not replace the other
  • Depends upon SLA (business rules)
• Usually stored on a separate disk subsystem
• Storage corruption will not cause data loss


Recovery models

SQL Server has three recovery models

Simple – No log backups; can only be restored to the end of the last full or differential backup

Bulk logged – Can be restored to a point in time, except within a log backup that contains bulk-logged (minimally logged) operations; that log backup can only be restored to its end

Full – Can recover to any point in time covered by an unbroken log backup chain, down to a single transaction or even a certain millisecond
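To make the full recovery model's point-in-time restore concrete, here is a minimal sketch that builds the T-SQL `RESTORE` sequence as strings: the full backup and all but the last log backup are applied `WITH NORECOVERY`, and the last log backup is applied `WITH STOPAT`. The helper name `point_in_time_restore`, the database name, and the backup paths are hypothetical and not from the session; the statements are only generated, not executed.

```python
from datetime import datetime


def point_in_time_restore(db, full_backup, log_backups, stop_at):
    """Return T-SQL statements restoring `db` to the moment `stop_at`.

    Assumes the full recovery model and an unbroken chain of log backups
    (hypothetical helper for illustration only).
    """
    if not log_backups:
        raise ValueError("point-in-time restore needs at least one log backup")
    # ISO 8601 timestamp trimmed to millisecond precision.
    stamp = stop_at.strftime("%Y-%m-%dT%H:%M:%S.%f")[:-3]
    statements = [
        f"RESTORE DATABASE [{db}] FROM DISK = N'{full_backup}' WITH NORECOVERY;"
    ]
    # Earlier log backups are applied in full; the database stays in RESTORING.
    for path in log_backups[:-1]:
        statements.append(
            f"RESTORE LOG [{db}] FROM DISK = N'{path}' WITH NORECOVERY;"
        )
    # The last log backup is rolled forward only up to `stop_at`.
    statements.append(
        f"RESTORE LOG [{db}] FROM DISK = N'{log_backups[-1]}' "
        f"WITH STOPAT = '{stamp}', RECOVERY;"
    )
    return statements


if __name__ == "__main__":
    for statement in point_in_time_restore(
        "SalesDemo",                       # hypothetical database name
        r"B:\backup\SalesDemo_full.bak",   # hypothetical paths
        [r"B:\backup\SalesDemo_1.trn", r"B:\backup\SalesDemo_2.trn"],
        datetime(2017, 8, 28, 14, 30, 0, 250000),
    ):
        print(statement)
```

Under the simple model there are no log backups to replay, so only the first statement (with `RECOVERY` instead of `NORECOVERY`) would apply.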

Page 77

Set your phasers for VM stun time

Snapshots can pause (stun) a virtual machine in order to quiesce it

We have seen hourly 5-minute stuns of I/O (bad config)

Proper setup can make these stuns manageable

Some SQL Server databases can never be stunned

Even worse on a LUN-based datastore, where ALL VMs need to be stunned… prolonging the pain

Page 78

Test #3 – The Never-ending Transaction

Image credit: neverendingstory.com

Page 79

Test #3 – The Never-ending transaction

• What happens during quiesce if there’s no clean break in active I/O?
• Start with an ugly query that never commits
• Run it against a decent-sized database
• We used a 5,000-warehouse HammerDB database @ 500 GB
• Sit back and wait. And wait. And wait…
• While the query is running, take snaps after 20 mins, 1.5 hours and ~21 hours (7.5 million rows affected)
• Finally terminated the process after 17+ million rows were affected, 2.5 days later
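The core idea of this test, a transaction that modifies many rows and never commits, can be sketched on a small scale. The example below is a toy stand-in that uses SQLite from the Python standard library, not SQL Server or HammerDB; the table name and row count are made up. It shows that rows written inside an uncommitted transaction are visible while the transaction is in flight and are gone once the connection is dropped and the database is reopened, which is the same rollback that recovery has to perform after restoring a snapshot taken mid-transaction.

```python
import os
import sqlite3
import tempfile


def rows_after_abandoned_transaction(n_rows=1000):
    """Insert rows without committing, drop the connection, count survivors.

    Returns (rows visible while the transaction was open, rows after reopen).
    """
    path = os.path.join(tempfile.mkdtemp(), "demo.db")  # throwaway database file
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, note TEXT)")
    conn.commit()

    # The "ugly query": many modifications inside a transaction that never commits.
    conn.executemany(
        "INSERT INTO orders (note) VALUES (?)",
        (("pending",) for _ in range(n_rows)),
    )
    in_flight = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

    # Stand-in for the crash or snapshot restore: the commit never happens.
    conn.close()

    conn = sqlite3.connect(path)
    survived = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
    conn.close()
    return in_flight, survived


if __name__ == "__main__":
    in_flight, survived = rows_after_abandoned_transaction()
    print(f"visible during transaction: {in_flight}, after reopen: {survived}")
```

The cost that matters in the real test is how long that undo work takes when millions of rows are involved, which this toy does not measure.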

Page 80

Test #3 – The Never-ending transaction

Page 81

Test #3 – The Never-ending transaction [During]

Page 82

Test #3 – The Never-ending transaction [During]

Increased snapshot times (LONG!)

VMware snapshot creation: 02:40 (compared to ~40 seconds)

Snapshot removal: 34:43 (compared to ~20-second removals)
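For a sense of scale, the timings above can be turned into slowdown factors. This is a back-of-the-envelope sketch that assumes the figures are in mm:ss and treats the approximate baselines (~40 s and ~20 s) as exact; `to_seconds` is a helper introduced here for illustration.

```python
def to_seconds(mmss):
    """Convert an 'mm:ss' string to whole seconds (assumes mm:ss, not hh:mm)."""
    minutes, seconds = mmss.split(":")
    return int(minutes) * 60 + int(seconds)


snapshot_create = to_seconds("02:40")  # during the never-ending transaction
snapshot_remove = to_seconds("34:43")

create_slowdown = snapshot_create / 40  # baseline: ~40 s snapshot creation
remove_slowdown = snapshot_remove / 20  # baseline: ~20 s snapshot removal

print(f"creation: {snapshot_create} s, {create_slowdown:.0f}x the baseline")
print(f"removal:  {snapshot_remove} s, {remove_slowdown:.0f}x the baseline")
```

Under these assumptions, creation is roughly 4x slower and removal roughly 100x slower, so removal is where the long-running transaction hurts most.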

Page 83

Test #3 – The Never-ending transaction [During]

• SQL log during snap removal – “I/O requests taking longer than 15s”

Page 84

Test #3 – The Never-ending transaction [Recovery]

• Both the VSS-quiesced snapshot AND the crash-consistent snap entered recovery mode after boot up
• The SQL database was not ready, and relatively high I/O was observed on storage

Page 85

Test #3 – The Never-ending transaction [Recovery]

Page 86

Test #3 – The Never-ending transaction [Recovery]

Page 87

Test #3 – The Never-ending transaction [Recovery]

• Analysis phase took 290x (!!!) LONGER on the crash-consistent snap
• Advantage: VSS?
• Not quite… VSS = 7 ms, crash-consistent = 2,037 ms
• Only 2 seconds added to recovery time…
• …out of 16+ minute recoveries at 30K–40K IOPS
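The arithmetic behind "290x but only 2 seconds" is worth spelling out. The sketch below uses the two analysis-phase timings from the slide and assumes a total recovery time of exactly 16 minutes (the slide says 16+ minutes, so the real share is slightly smaller).

```python
vss_analysis_ms = 7         # analysis phase, VSS-quiesced snapshot
crash_analysis_ms = 2037    # analysis phase, crash-consistent snapshot
recovery_seconds = 16 * 60  # assumed lower bound for total recovery time

ratio = crash_analysis_ms / vss_analysis_ms
added_seconds = (crash_analysis_ms - vss_analysis_ms) / 1000
share_of_recovery = added_seconds / recovery_seconds

print(f"analysis phase ratio: {ratio:.0f}x")
print(f"extra time: {added_seconds:.2f} s")
print(f"share of total recovery: {share_of_recovery:.2%}")
```

The relative difference is large, but the absolute difference is a fraction of a percent of the total recovery time, which is dominated by the work of undoing the long-running transaction.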

Page 88

Test #3 – The Never-ending transaction [Recovery]

Page 89

Test #3 – The Never-ending transaction

Page 90

Test #4 – vsstrace

• Available in the Windows SDK
• Information overload!
• ~48,600 events for an “idle” SQL Server with 9 databases on 11 vDisks (37 seconds logged)
• ~35,500 events for an idle SQL Server with NO custom DBs (35.5 seconds logged)
• What to use it for? Deep inspection
• Beware the observer effect – Debugging isn’t free
• What can be found?
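To put "information overload" in numbers, the event counts and durations above can be converted into event rates. This is a rough sketch using the rounded figures from the slide; attributing the extra events evenly to the 9 custom databases is an assumption made here for illustration, not a finding from the session.

```python
# Rounded figures from the slide.
with_dbs_events, with_dbs_seconds = 48_600, 37.0
baseline_events, baseline_seconds = 35_500, 35.5
custom_databases = 9

rate_with_dbs = with_dbs_events / with_dbs_seconds  # events per second
rate_baseline = baseline_events / baseline_seconds

# Assumption: the difference is caused by the custom databases, split evenly.
extra_events = with_dbs_events - baseline_events
extra_per_database = extra_events / custom_databases

print(f"with 9 DBs: {rate_with_dbs:.0f} events/s")
print(f"no custom DBs: {rate_baseline:.0f} events/s")
print(f"extra events: {extra_events} (~{extra_per_database:.0f} per database)")
```

Even the baseline trace produces on the order of a thousand events per second, which is why filtering matters before any deep inspection.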

Page 91

Test #4 – vsstrace – Interesting Findings – DB names

Page 92

Test #4 – vsstrace – Interesting Findings – DB names

Page 93

Test #4 – vsstrace – Interesting Findings – DB names

Page 94

Test #4 – vsstrace – As it relates to SQL

• SQL-specific references use the SqlServerWriter by default
• Filter by the “WRITER” module to narrow down logging results (~10% of events in our sample: 4,841 of 48,647 total events)
• vssadmin list writers
• Many writers may be listed; vsstrace will tell you which is actually being used
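As a companion to `vssadmin list writers`, here is a minimal sketch that parses saved output and reports each writer's state, so the SQL writer can be checked before a quiesced snapshot is attempted. The sample text is an abbreviated illustration of the layout that command typically prints (writer and instance IDs omitted) and is an assumption, not captured from the lab; `writer_states` is a hypothetical helper.

```python
import re

# Abbreviated, illustrative sample; real output also lists writer/instance IDs.
SAMPLE_OUTPUT = """\
Writer name: 'System Writer'
   State: [1] Stable
   Last error: No error

Writer name: 'SqlServerWriter'
   State: [1] Stable
   Last error: No error
"""


def writer_states(text):
    """Map each writer name to a dict of its 'key: value' detail lines."""
    writers = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        match = re.match(r"Writer name: '(.+)'", line)
        if match:
            current = match.group(1)
            writers[current] = {}
        elif current is not None and ":" in line:
            key, value = line.split(":", 1)
            writers[current][key.strip()] = value.strip()
    return writers


if __name__ == "__main__":
    for name, details in writer_states(SAMPLE_OUTPUT).items():
        print(f"{name}: {details.get('State')} / {details.get('Last error')}")
```

A listing like this only shows which writers are registered and their state; as noted above, vsstrace is what shows which writer is actually used during a snapshot.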

Page 95

Ask Again: To quiesce or not to quiesce?

Summary of results on whether it’s worthwhile or not

Data loss (transactions rolled back)

Supportability

Stun Times

Recovery from snapshots

Page 96

Recommendations

Snapshots do not replace backups but are another tool for improved recovery

Snapshots offer many great benefits beyond just recovery

Quiesce if supportability is key – Suggest frequent crash-consistent snapshots with an occasional VM-consistent snapshot or VSS-enabled backup job

It is a business tradeoff; snapshots are not for every situation
