S de2784 footprint-reduction-edge2015-v2

© Copyright IBM Corporation 2015 Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM. sDE2784 Data Footprint Reduction – Understanding IBM Storage Efficiency Options Tony Pearson Master Inventor and Senior IT Specialist IBM Corporation


Abstract

Data Footprint Reduction is the catchall term for a variety of technologies designed to help reduce storage costs. This session will cover four techniques for data footprint reduction: thin provisioning, space-efficient snapshots, data deduplication and real-time compression.

Come to this session to learn how these technologies work, which IBM storage products provide these capabilities and how they will benefit your data center.


This week with Tony Pearson


Day        Time     Topic
Monday     10:30am  Software Defined Storage -- Why? What? How? (repeats Tuesday)
           03:00pm  IBM's Cloud Storage Options (repeats Wednesday)
           04:30pm  Data Footprint Reduction – Understanding IBM Storage Efficiency Options
Tuesday    10:30am  Software Defined Storage -- Why? What? How?
           12:30pm  What Is Big Data? Architectures and Practical Use Cases
           01:45pm  IBM Smarter Storage Strategy (repeats Wednesday)
Wednesday  09:00am  New Generation of Storage Tiering: Less Management, Lower Investment and Increased Performance
           10:30am  IBM Smarter Storage Strategy
           12:30pm  IBM's Cloud Storage Options
           01:45pm  IBM Spectrum Scale (Elastic Storage) Offerings
Thursday   12:30pm  The Pendulum Swings Back -- Understanding Converged and Hyperconverged Environments
Friday     09:00am  IBM Spectrum Storage Integration with OpenStack

Agenda

• Thin Provisioning
• Space-Efficient Copy
• Data Deduplication
• Compression

Why Space is Over-Allocated

Scenario 1

• Space requirements under-estimated

• Running out of space requires larger volume

• New request may take weeks to accommodate

  – Application outage if not addressed in time

• Data must be moved to the larger volume

  – Application outage during data movement

Scenario 2

• Space requirements over-estimated

• Capacity lasts for years

• No data migration

• No application outages

• No penalties

When faced with this dilemma,

most will err on the side of over-estimating


Fully Allocated vs. Thin Provisioning

Fully Allocated
• Host sees fully allocated amount
• Actual data written
• Allocated but unused space dedicated to this host, wasted space

Thin Provisioning
• Host sees full virtual amount
• Physical space allocated: actual data written
• Empty space available to others

Blocks, Grains, Extents and Volumes/LUNs

Host sees a volume or LUN that consists of blocks numbered 0 to nnnnnnnnnn

Volume/LUN – one or more extents

Extent – allocation unit, one or more grains

Grain – range of 1 or more blocks

Block – typically 512 or 4096 bytes

Thin Provisioning – Coarse and Fine Grain

Figure: three 10 x 10 grids of blocks numbered 00 to 99, labeled Fully Allocated, Coarse-Grain, and Fine-Grain. Each row of 10 blocks is one extent.

Blocks 00, 55, and 99 written:
• Fully Allocated: all 10 extents allocated
• Coarse-Grain: only 3 extents allocated (grain = extent, for example grain 90-99)
• Fine-Grain: only 1 extent allocated (grains 00-01, 54-55, and 98-99)
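The three extent counts in this example can be reproduced with a short sketch. It assumes a 100-block volume, 10-block extents, a coarse grain equal to one extent, and a 2-block fine grain, with fine grains packed densely into extents. The function name and parameters are illustrative only and are not taken from any IBM product.

```python
import math

def extents_allocated(written_blocks, volume_blocks=100,
                      extent_blocks=10, grain_blocks=None):
    """Count extents consumed for a set of written block numbers.

    grain_blocks=None models a fully allocated volume (a convention
    chosen for this sketch): every extent is reserved up front.
    """
    if grain_blocks is None:
        return math.ceil(volume_blocks / extent_blocks)
    # Each written block pulls in the whole grain that contains it.
    grains = {block // grain_blocks for block in written_blocks}
    # Assumption: allocated grains are packed densely into extents.
    return math.ceil(len(grains) * grain_blocks / extent_blocks)

written = [0, 55, 99]
fully = extents_allocated(written)                    # fully allocated
coarse = extents_allocated(written, grain_blocks=10)  # grain = extent
fine = extents_allocated(written, grain_blocks=2)     # 2-block grains
print(fully, coarse, fine)
```

With a smaller grain, scattered writes share an extent instead of each claiming one, which is why the fine-grain case needs a single extent.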


How IBM has Implemented Thin Provisioning

                  DS8000   XIV     SVC and Storwize   DCS3700, DCS3860
Type              Coarse   Fine    Fine               Fine
Allocation unit   1 GB     17 GB   16 MB to 8 GB      4 GB
Grain size        -        1 MB    32-256 KB          64 KB

Thin Provisioning

Advantages

Just-in-Time increased utilization percentage

Eliminates the pressure to make accurate space estimates

Dynamically expand volume without impacting applications or rebooting server

Reduces the data footprint and lowers costs

Shifts focus from volumes to storage pool capacity

Objections

• Not all file systems are cooperative or friendly

  – Deletion of files does not free space for others

  – “sdelete” writes zeros over deleted file space

• Other implementations may impact I/O performance

• May not support same set of features, copy services, or replication

• “Selling more tickets than seats”
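The "selling more tickets than seats" concern can be expressed as an oversubscription ratio for the storage pool. The sketch below is a hypothetical illustration: the function name, the report fields, and the sample numbers are assumptions made for this example and do not come from the presentation.

```python
def pool_report(physical_gb, volumes):
    """Summarize a thin-provisioned pool.

    volumes: list of (virtual_gb, written_gb) pairs sharing one pool.
    """
    virtual = sum(v for v, _ in volumes)
    written = sum(w for _, w in volumes)
    return {
        # Virtual capacity promised to hosts per GB of real capacity.
        "oversubscription": virtual / physical_gb,
        # What actually matters operationally: how full the pool is.
        "pool_used_pct": 100 * written / physical_gb,
        "free_gb": physical_gb - written,
    }

report = pool_report(200, [(100, 40), (100, 30), (200, 50)])
print(report)
```

Monitoring the pool-level figures rather than individual volumes reflects the shift in focus from volumes to storage pool capacity listed among the advantages.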


Space-Efficient Copies

Source: 100 GB allocated, 40 GB written

Traditional Copies: Destination 1, Destination 2, and Destination 3 consume 300 GB in total

Space-Efficient Copies (typical 10%): the same three destinations consume 30 GB in total
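The 300 GB and 30 GB figures follow from simple arithmetic, sketched here under the assumption that each traditional copy takes the full 100 GB allocation and each space-efficient copy takes about 10% of it. The function name is illustrative only.

```python
def copy_capacity_gb(source_gb, copies, change_pct=None):
    """Capacity consumed by point-in-time copies of one source volume.

    change_pct=None means traditional full copies; otherwise each copy
    is assumed to need only change_pct percent of the source size.
    """
    if change_pct is None:
        return source_gb * copies
    return source_gb * copies * change_pct / 100

traditional = copy_capacity_gb(100, 3)                     # three full copies
space_efficient = copy_capacity_gb(100, 3, change_pct=10)  # typical 10%
print(traditional, space_efficient)
```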


Storwize family - FlashCopy

FlashCopy: volume-level point-in-time copy with any mix of thin and fully-allocated volumes
• Up to 256 targets per source volume

Incremental FlashCopy: volume-level point-in-time copy
• Start incremental FlashCopy: data copied as normal
• Some data changed by apps
• Later, start incremental FlashCopy again: only changed data copied by background copy

Cascaded FlashCopy: copy the copies
• Disk0: source
• Disk1: FlashCopy target of Disk0
• Disk2: FlashCopy target of Disk1
• Disk3: FlashCopy target of Disk1
• Disk4: FlashCopy target of Disk3
• FlashCopy relationships are shown in the figure as maps (Map 1, Map 2, Map 4)

Space-efficient Copies

Advantages

Supports both Fully-allocated and Thin-Provisioned sources

Reduces the data footprint and lowers costs

Allows you to keep more copies online

Allows you to take copies more frequently

Can be used as checkpoint copies during batch processing

Objections

• Some implementations may impact I/O performance

• Other implementations require that you estimate the maximum percentage changed

• Typically 10-20%

• Exceeding the reserved space invalidates destination copy
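The last objection is easier to see with a toy model. The sketch below uses copy-on-write, which is one common way to build space-efficient copies and is an assumption here, since the slides do not say which technique any given product uses. The class and method names are hypothetical.

```python
class SpaceEfficientCopy:
    """Copy-on-write point-in-time copy with a fixed reserve, in grains."""

    def __init__(self, source, reserve_grains):
        self.source = source          # live list of grain contents
        self.reserve = reserve_grains
        self.saved = {}               # grain index -> original content
        self.valid = True

    def write_source(self, index, data):
        # Preserve the original grain the first time it is overwritten.
        if self.valid and index not in self.saved:
            if len(self.saved) >= self.reserve:
                self.valid = False    # reserve exceeded: copy is lost
                self.saved.clear()
            else:
                self.saved[index] = self.source[index]
        self.source[index] = data

    def read(self, index):
        if not self.valid:
            raise RuntimeError("copy invalidated: reserve exceeded")
        # Unchanged grains are read straight from the source.
        return self.saved.get(index, self.source[index])

source = [f"g{i}" for i in range(10)]
snap = SpaceEfficientCopy(source, reserve_grains=2)  # 20% reserve
snap.write_source(0, "new0")
snap.write_source(1, "new1")
view = [snap.read(i) for i in range(10)]  # still the original content
snap.write_source(2, "new2")              # third changed grain
print(snap.valid)
```

The copy stays readable while changes fit in the reserve, and the third changed grain invalidates it, which is why the change-rate estimate matters.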


Storage Optimization: Data Deduplication

Data deduplication reduces capacity requirements by only storing one unique instance of the data on disk and creating pointers for duplicate data elements

1. Data elements are evaluated to determine a unique signature for each

2. Signature values are compared to identify all duplicates

3. Duplicate data elements are eliminated and replaced with pointers to reference element
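The three steps can be sketched as a small hash-based store. This is a generic illustration and not how any particular product works: the fixed 4096-byte element size, the use of SHA-256 as the signature, and all names are assumptions made for the example. A byte-for-byte comparison is included as one way to guard against signature collisions.

```python
import hashlib

class DedupeStore:
    """Keep one instance of each unique data element, keyed by signature."""

    def __init__(self):
        self.elements = {}  # signature -> stored bytes

    def put(self, element):
        # Step 1: compute a signature for the element.
        signature = hashlib.sha256(element).hexdigest()
        # Step 2: compare against signatures already stored.
        existing = self.elements.get(signature)
        if existing is None:
            self.elements[signature] = element
        elif existing != element:
            # Byte-for-byte check: same signature, different data.
            raise ValueError("signature collision")
        # Step 3: the caller keeps only a pointer (the signature).
        return signature

    def get(self, signature):
        return self.elements[signature]

def store_stream(store, data, size=4096):
    return [store.put(data[i:i + size]) for i in range(0, len(data), size)]

def restore(store, pointers):
    return b"".join(store.get(p) for p in pointers)

store = DedupeStore()
data = b"A" * 4096 * 3 + b"B" * 4096
pointers = store_stream(store, data)
print(len(pointers), len(store.elements))
```

Four elements are written but only two unique instances are kept, and the stream is rebuilt by following the pointers.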

ProtecTIER Data Deduplication Advantages

Performance
• Measured performance over 2,800 MB/s inline deduplication backup rate
• Over 3,600 MB/s restore rate

Capacity
• Up to 1 PB physical capacity per cluster
• Reduces required disk capacity by up to 25 times

Enterprise-Class Data Integrity
• Binary diff process during dedupe designed for the highest data integrity

High Availability Cluster
• Active-active cluster eliminates single points of failure

IBM ProtecTIER – HyperFactor algorithm

Figure: backup servers send a new data stream through an FC switch to the TS7650G. HyperFactor compares the new data stream with existing data using a memory-resident index, and the "filtered" data is written to the repository on the storage arrays.

Only 4 GB needed to map 1 PB of physical disk!

Significantly Reduces Replication Bandwidth

Figure: at the primary site, backup servers write to a ProtecTIER gateway (represented capacity versus physical capacity). An IP-based WAN link connects it to a ProtecTIER gateway, backup server, and tape library at the secondary site.

• Deduplication enables a large amount of data to be replicated with significantly less bandwidth
• Virtual cartridges can be copied to physical tape at DR site

ProtecTIER Mainframe Edition (ME) for Shared Infrastructure
• Common distributed and mainframe backup and disaster recovery solution

Virtual Desktop Infrastructure (VDI)

• VDI represents only 5% of Flash deployment capacity*
• Deduplication and Compression can achieve 90% savings for VDI workloads

Atlantis ILIO™ Server-Side Optimization Software
• Eliminates the storage problem at the source
• Lower cost per desktop with better performance: less than $200 per stateless desktop, less than $300 per persistent desktop
• Proven at scale in the largest desktop virtualization deployments in the world
• Enterprise-class reliability with automated deployment and HA/DR

Figure: ILIO Diskless VDI and XenApp. ILIO runs with the hypervisor (ESX, XenServer, Hyper-V) on server hardware, uses RAM as cache, and performs application analysis, inline deduplication, content-aware IO processing, compression, and coalescing (IO blender fix). Connection labels: NFS or iSCSI; NFS, iSCSI, Fibre Channel or Local Disk. Storage shown: IBM FlashSystem.

* Source: The Adoption of and Leading Use Cases for Solid State Storage by Enterprise Customers, IDC, September 2013, IDC #242808

Data Deduplication

Advantages

Designed for backups and VDI

Can offer up to 25x data footprint reduction (96% savings)

Allows more backup copies to remain on disk for faster restores

Reduces cost of disk backup repositories

Available with a variety of interfaces, including VTL, Symantec OST, CIFS and NFS

Objections

• Dealing with Hash Collisions

  – May require byte-for-byte comparisons or keeping secondary copy of data

• Hash-based systems do not scale

• Other systems have slow restores

  – Re-hydrating data back to normal

• Primary active data may not dedupe very well

  – Your mileage may vary
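Reduction ratios and percentage savings are two views of the same number, which is how "25x" corresponds to "96% savings". A minimal sketch of the conversion, with illustrative function names:

```python
def savings_pct(ratio):
    """Percentage of capacity saved for a given reduction ratio (e.g. 25x)."""
    return 100 - 100 / ratio

def reduction_ratio(pct):
    """Reduction ratio for a given percentage of capacity saved."""
    return 100 / (100 - pct)

print(savings_pct(25))      # 25x reduction
print(reduction_ratio(80))  # 80% savings
print(reduction_ratio(95))  # 95% savings
```

The relationship is not linear: going from 80% to 95% savings raises the ratio from 5x to 20x.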


Lossy vs. Lossless Methods

Lossy

• Used with music, photos, video, medical images, scanned documents, fax machines

Lossless

• Used with databases, emails, spreadsheets, office documents, source code

Good enough?

Exactly the same

Lossy: Compress, then Decompress does not return data back to its original contents

Lossless: Compress, then Decompress returns data back to its original contents
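The difference can be demonstrated in a few lines. The lossless half uses Python's standard zlib module. The lossy half is a deliberately crude stand-in (quantizing sample values to multiples of 8), chosen for this sketch only and not representative of real audio or image codecs.

```python
import zlib

# Lossless: the round trip returns exactly the original bytes.
original = b"customer_id,balance\n" * 200
compressed = zlib.compress(original)
restored = zlib.decompress(compressed)

# Lossy (toy example): detail is discarded and cannot be recovered.
samples = [0, 3, 7, 12, 18, 25]
quantized = [s // 8 for s in samples]     # store coarser values
approximate = [q * 8 for q in quantized]  # "good enough?" reconstruction

print(restored == original, approximate == samples)
```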


How Compression Works

• Lempel-Ziv lossless compression builds a dictionary of repeated phrases: sequences of two or more characters that can be represented with fewer bits

• In the excerpt from Lord of the Rings shown on the slide, all of the red text represents repeated sequences eligible for compression

Source: The Lempel Ziv Algorithm, Christian Zeeh, 2003
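A minimal sketch of the dictionary idea, using the LZ78 variant of Lempel-Ziv because it is short to write down. This is an illustration of the general technique and not necessarily the variant used by any IBM product. Each output token is a pair: the index of a previously seen phrase plus one new character.

```python
def lz78_compress(text):
    """Encode text as (phrase_index, next_char) tokens; index 0 = empty."""
    dictionary = {}  # phrase -> 1-based index
    tokens = []
    phrase = ""
    for ch in text:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate  # keep extending a known phrase
        else:
            tokens.append((dictionary.get(phrase, 0), ch))
            dictionary[candidate] = len(dictionary) + 1
            phrase = ""
    if phrase:
        tokens.append((dictionary[phrase], ""))  # flush the tail
    return tokens

def lz78_decompress(tokens):
    """Rebuild the text by replaying the same dictionary construction."""
    phrases = [""]
    out = []
    for index, ch in tokens:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)
    return "".join(out)

text = "to be or not to be, to be or not to be"
tokens = lz78_compress(text)
print(len(text), len(tokens))
```

The more often phrases repeat, the longer the dictionary entries become and the fewer tokens are needed.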


Data Footprint Reduction

                        Active Data     Backup Data
Real-time Compression   40-80% (Best)   40-80%
Data Deduplication      20-30%          80-95% (Best)

Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written

– Short patterns for frequent data

– Longer patterns for infrequent data

– Can achieve 40 to 80 percent reduction in storage capacity for active data

Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data

– Store only one unique instance of the data

– Redundant data replaced with pointer

– Can achieve 80 to 95 percent reduction in storage capacity for backup data


Compressed Volumes based on Thin Provisioning

Full
• Actual data written
• Allocated but unused space dedicated to this host, wasted until written to

Thin Provisioning
• Host sees full virtual amount
• Physical space allocated only for actual data written

Thin Provisioning with Compression
• Host sees full virtual amount
• Physical space allocated, up to 80% reduction from actual data written

FIVO vs. VIFO

Fixed Input, Variable Output

• WAN transmission

• Sequential tape

• IBM Tivoli Storage Manager

• zip, tar, etc.

Variable Input, Fixed Output

Random Access Compression Engine™ (RACE)

• SAN Volume Controller

• Storwize V7000 and V7000 Unified

• FlashSystem V9000

• XIV Storage System

Figure: numbered data chunks (1 to 6) and the corresponding compressed data for each approach.
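The two approaches can be contrasted with a sketch built on Python's standard zlib module. This is not IBM's RACE implementation: the chunk size, block size, step size, and the greedy search are all assumptions chosen to make the idea concrete. In the fixed-input case equal input chunks produce outputs of varying size, while in the variable-input case each fixed-size output block holds as much input as will compress into it.

```python
import zlib

def fivo(data, chunk=1024):
    """Fixed input, variable output: compress equal-sized input chunks."""
    return [zlib.compress(data[i:i + chunk])
            for i in range(0, len(data), chunk)]

def vifo(data, block=512, step=64):
    """Variable input, fixed output: fill each fixed-size block greedily."""
    blocks, start = [], 0
    while start < len(data):
        end = min(start + step, len(data))
        while end < len(data):
            longer = min(end + step, len(data))
            if len(zlib.compress(data[start:longer])) > block:
                break  # the longer candidate no longer fits
            end = longer
        packed = zlib.compress(data[start:end])
        # Pad to the fixed block size; record how much input it covers.
        blocks.append((end - start, packed.ljust(block, b"\0")))
        start = end
    return blocks

def vifo_restore(blocks):
    # decompressobj stops at the end of the zlib stream, so padding
    # after it is ignored.
    return b"".join(zlib.decompressobj().decompress(p) for _, p in blocks)

data = b"".join(f"record {i} value {i * i}\n".encode() for i in range(2000))
chunks = fivo(data)
blocks = vifo(data)
print(len(chunks), len(blocks))
```

Because every output block has the same size, a block can be located and rewritten in place, which suits random-access disk storage.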


Compression for Disk data

Traditional Approaches
• The work to “update" a file may involve many more I/Os
• Data blocks shift: negative impact to deduplication
• No notion of data location, data is processed sequentially

Real-time Compression
• The work to “update" a file takes about the same or fewer I/Os
• Only modified block changed: enhances deduplication
• Data location via map

Figure: a file of blocks A to I compresses to ABC DEF GHI. After block E is replaced by MN, the traditional approach produces a new compressed file ABC DMN FGH I (blocks shift), while Real-time Compression produces ABC DEF1 GHI MN (unmodified blocks stay identical).

IBM Real-time Compression for File and Block level

For File and Block-level access

• IBM Storwize V7000 Unified

For Block-only access

• SAN Volume Controller

• Storwize V7000

• FlashSystem V9000

• XIV Storage System – NEW

To estimate space savings for file-level storage, use: Real-time Compression Appliance Scan Tool

To estimate space savings for block-level storage, use: Comprestimator Tool

IBM Real-time Compression – Estimated Savings

• IBM’s Random-Access Compression Engine (RACE) delivers excellent capacity savings for a variety of data types:

Data type                                                        Typical savings
Databases (DB2, Oracle, etc.)                                    ~80%
Virtual servers (VMware, etc.): Linux and Windows guest images   50% to 70%
Microsoft Office 2003                                            ~60%
Microsoft Office 2007 or later                                   ~20%
CAD/CAM engineering drawings                                     ~70%

• IBM Comprestimator tool can be used to evaluate expected compression benefits for specific environments

  – This pre-sales tool is available to estimate compression savings. Percentage savings shown are typical results based on client experiences; your mileage may vary.

  – http://www14.software.ibm.com/webapp/set2/sas/f/comprestimator/home.html

• 45-day Free Trial of Compression available

Source: IBM internal tests and field results
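As a rough planning aid, the typical percentages listed above can be applied to a mix of workloads. The sketch below is a back-of-the-envelope calculation only and is no substitute for running Comprestimator on real data. The dictionary keys, the function name, and the sample workload sizes are made up for this example.

```python
# Typical savings from the slide, as (low, high) percentages.
TYPICAL_SAVINGS_PCT = {
    "database": (80, 80),
    "virtual_server": (50, 70),
    "office_2003": (60, 60),
    "office_2007_or_later": (20, 20),
    "cad_cam": (70, 70),
}

def estimate_physical_tb(workload_tb):
    """Return (best_case_tb, worst_case_tb) after compression."""
    best = worst = 0
    for kind, tb in workload_tb.items():
        low_pct, high_pct = TYPICAL_SAVINGS_PCT[kind]
        best += tb * (100 - high_pct) / 100   # highest savings
        worst += tb * (100 - low_pct) / 100   # lowest savings
    return best, worst

best, worst = estimate_physical_tb(
    {"database": 10, "virtual_server": 20, "office_2007_or_later": 5}
)
print(best, worst)
```

For this hypothetical 35 TB mix the estimate falls between the two returned values, with the spread coming entirely from the virtual server range.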

Compression Acceleration Cards – Intel® QuickAssist Technology

Intel QuickAssist technology integrated into new Compression Acceleration cards

• Used to offload the LZ compression and decompression processing

• Each node supports up to two Compression Acceleration cards

• SVC uses 4 parallel compression threads per card

To use compressed volumes, nodes require at least:

• SVC 2145-DH8 or next generation Storwize V7000

• 64 GB of Cache Memory per node

• One Compression Acceleration card

When compression is enabled

• 38 GB is used as a Compression Cache

Optionally upgrade each node to contain a second Compression Acceleration card

• Upgrade recommended when normal data working set > 32 TB

New Dual Layer Cache Architecture

• First major update to cache since 2003
• Flexible design for plug and play style cache algorithm enhancements in the future
• “SVC” like L2 cache for advanced functions
• Upper Cache – simple write cache
• Lower Cache – algorithm intelligence
  – Understands mdisks
• Shared buffer space between two layers

Figure: 7.3.0 Software Stack. Components shown: SCSI Target, Forwarding, Replication, Upper Cache, FlashCopy, Mirroring, Thin Provisioning, Compression, Lower Cache, Virtualization, Easy Tier 3, RAID, SCSI Initiator. Interfaces: Fibre Channel, iSCSI, FCoE, SAS, PCIe. Side components: Configuration, Peer Communications, Interface Layer, Clustering. Three components are marked "New".

* Only 4F2 hardware limited to running no later than 5.1 Software due to 32bit CPU

Turbo Compression Explained

                                                           Store more   IOPS             Response time
Real Time Compression [RtC]                                Store more   Limited effect   Limited effect
Auto Tiering [Easy Tier and Flash Technology]              No effect    More IOPS        Faster response
Turbo Compression [RtC + Easy Tier and Flash Technology]   Store more   More IOPS        Faster response

Turbo Compression may double the net usability of existing infrastructures

Turbo Compression tests: Oracle TPC-C (07/2013) [2% Flash Capacity]
• 4x compression
• 2.1x IOPS throughput
• ½x response time
at a fraction of the cost of traditional means

Turbo Compression for Tiered Flash/Disk Pools

• Easy Tier (no compression)
  – 1 Volume 100 GB
  – 4% Flash (4 GB) → 23% of IOPS (assumption: skew = 7)
  – HDD Tier: 77% of IOPS

• Compression (RtC) (assumption: 66% savings)
  – 12% compressed data fits in 4 GB
  – 12% data → 60% of IOPS
  – HDD Tier: 40% of IOPS

• Turbo Compression
  – Pool IOPS capability nearly doubled without adding any Flash

Figure: Cumulative IOPS vs. Capacity (x-axis: Capacity %, y-axis: IO %). Marked points: 4% of capacity serves 23% of IO, and 12% of capacity serves 60% of IO.
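The arithmetic behind this slide can be checked in a few lines. The IOPS shares (23% and 60%) are taken directly from the slide's skew curve and are not computed here. The final step assumes the HDD tier is the bottleneck, so pool IOPS capability scales inversely with the share of IOPS that still lands on HDD; that reading is an interpretation made for this sketch.

```python
flash_gb, volume_gb = 4, 100
savings_pct = 66

# Uncompressed data that fits in flash once it is compressed.
effective_flash_gb = flash_gb * 100 / (100 - savings_pct)
effective_flash_pct = 100 * effective_flash_gb / volume_gb  # about 12%

# IOPS shares served from flash, read off the slide's curve.
hdd_share_easy_tier = 100 - 23  # 77% of IOPS still on HDD
hdd_share_turbo = 100 - 60      # 40% of IOPS still on HDD

# Assumption: HDD is the bottleneck, so capability scales with 1/share.
pool_iops_gain = hdd_share_easy_tier / hdd_share_turbo
print(round(effective_flash_pct), round(pool_iops_gain, 2))
```

A gain of a little under 2x is consistent with the slide's claim that pool IOPS capability is "nearly doubled".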


Compressing Existing Data

Figure: a volume mirror from a fully-allocated or thin-provisioned volume (Copy 0) to a compressed volume (Copy 1). Only non-zero blocks copied.

XIV Compression & Snapshot Views

• Comprestimator tool built into IBM XIV 11.6 GUI

• Right click to compress volume

• Snapshot usage now reporting per volume

Compression

Advantages

Can be used for data transmission, tape and disk data

Supports both file-based and block-based disk storage

Real-time compression can be used with Databases, CAD/CAM and Virtual Machines with no impact to application performance

Can offer up to 80% data footprint reduction savings

Real-time Compression is “Dedupe-Friendly” and combines well with Thin Provisioning

Objections

• Some implementations are post-process

  – Stores uncompressed data first, compresses later

• Other implementations impact application performance and/or consume substantial CPU resources

• Benefits vary by data type, and whether applications do their own compression or encryption

  – Your mileage may vary

Summary

• Data Footprint Reduction technologies have been around for many years

• Algorithms are stable, mature, and well-understood by the IT industry

• Data is returned byte-for-byte identical to what was originally stored

• Implementations between vendors and products can vary greatly

• IBM’s implementations tend to have faster performance, offer better scalability, are easier to use, and have a lower TCO


IBM Tucson Executive Briefing Center

• Tucson, Arizona is home for storage hardware and software design and development

• IBM Tucson Executive Briefing Center offers:

• Technology briefings

• Product demonstrations

• Solution workshops

• Take a video tour

• http://youtu.be/CXrpoCZAazg


About the Speaker

Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined

IBM Corporation in 1986 in Tucson, Arizona, USA, and has been there ever since. In his current role, Tony presents briefings

on storage topics covering the entire System Storage product line, and topics related to Cloud, Analytics and Social media. He

interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for

IBM’s integrated set of storage software, hardware and virtualization products.

Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners

every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine and #1

most read IBM blog on IBM’s developerWorks. The blog has been published into a series of books, Inside System Storage:

Volumes I through V.

Over the years, Tony has worked in development, marketing and consulting positions for various storage hardware and

software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in

Electrical Engineering both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and

software products.

9000 S. Rita Road

Bldg 9032 Floor 1

Tucson, AZ 85744

+1 520-799-4309 (Office)

[email protected]


Additional Resources from Tony Pearson

Email: [email protected]

Twitter: twitter.com/az99Øtony

Blog: ibm.co/Pearson

Books: www.lulu.com/spotlight/99Ø_tony

IBM Expert Network on Slideshare: www.slideshare.net/az99Øtony

Facebook: www.facebook.com/tony.pearson.16121

Linkedin: www.linkedin.com/profile/view?id=103718598

Continue growing your IBM skills

Global Skills Initiative

ibm.com/training provides a comprehensive portfolio of skills and career accelerators that are designed to meet all your training needs.

• Training in cities local to you - where and when you need it, and in the format you want
  – Use IBM Training Search to locate public training classes near to you with our five Global Training Providers
  – Private training is also available with our Global Training Providers

• Demanding a high standard of quality – view the paths to success
  – Browse Training Paths and Certifications to find the course that is right for you

• If you can’t find the training that is right for you with our Global Training Providers, we can help.
  – Contact IBM Training at [email protected]

Trademarks and Disclaimers

Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.

Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.

The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.

Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.

All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.

Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.

Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.

Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.

Photographs shown may be engineering prototypes. Changes may be incorporated in production models.

© IBM Corporation 2015. All rights reserved.References in this document to IBM products or services do not imply that IBM intends to make them available in every country.

Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.

ZSP03490-USEN-00
