© Copyright IBM Corporation 2015. Technical University/Symposia materials may not be reproduced in whole or in part without the prior written permission of IBM.
sDE2784
Data Footprint Reduction – Understanding IBM Storage Efficiency Options
Tony Pearson
Master Inventor and Senior IT Specialist
IBM Corporation
Abstract
Data Footprint Reduction is the catch-all term for a variety of technologies designed to help reduce storage costs. This session will cover four techniques for data footprint reduction: thin provisioning, space-efficient snapshots, data deduplication and real-time compression.
Come to this session to learn how these technologies work, which IBM storage products provide these capabilities and how they will benefit your data center.
This week with Tony Pearson
Monday
• 10:30am Software Defined Storage -- Why? What? How? (repeats Tuesday)
• 03:00pm IBM's Cloud Storage Options (repeats Wednesday)
• 04:30pm Data Footprint Reduction – Understanding IBM Storage Efficiency Options
Tuesday
• 10:30am Software Defined Storage -- Why? What? How?
• 12:30pm What Is Big Data? Architectures and Practical Use Cases
• 01:45pm IBM Smarter Storage Strategy (repeats Wednesday)
Wednesday
• 09:00am New Generation of Storage Tiering: Less Management Lower Investment and Increased Performance
• 10:30am IBM Smarter Storage Strategy
• 12:30pm IBM's Cloud Storage Options
• 01:45pm IBM Spectrum Scale (Elastic Storage) Offerings
Thursday
• 12:30pm The Pendulum Swings Back -- Understanding Converged and Hyperconverged Environments
Friday
• 09:00am IBM Spectrum Storage Integration with OpenStack
Why Space is Over-Allocated
Scenario 1
• Space requirements under-estimated
• Running out of space requires a larger volume
• New request may take weeks to accommodate
  – Application outage if not addressed in time
• Data must be moved to the larger volume
  – Application outage during data movement
Scenario 2
• Space requirements over-estimated
• Capacity lasts for years
• No data migration
• No application outages
• No penalties
When faced with this dilemma, most will err on the side of over-estimating.
Fully Allocated vs. Thin Provisioning
Fully Allocated
• Host sees the fully allocated amount
• Actual data written
• Allocated but unused space is dedicated to this host: wasted space
Thin Provisioning
• Host sees the full virtual amount
• Actual data written
• Physical space is allocated only for the actual data written
• Empty space is available to others
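To make allocate-on-write concrete, here is a minimal Python sketch. It assumes a toy model in which a thin volume takes whole extents from a shared pool the first time any block in that extent is written; the class and method names are hypothetical and do not describe any IBM product's internals.

```python
class ThinVolume:
    """Toy thin-provisioned volume: a physical extent is taken from a shared
    pool only when a block in that extent is written for the first time."""

    def __init__(self, virtual_blocks: int, blocks_per_extent: int, pool: list):
        self.virtual_blocks = virtual_blocks        # size the host sees
        self.blocks_per_extent = blocks_per_extent
        self.pool = pool                            # free physical extents, shared
        self.extent_map = {}                        # virtual extent -> physical extent

    def write(self, block: int) -> int:
        if not 0 <= block < self.virtual_blocks:
            raise IndexError("block outside the virtual volume")
        vext = block // self.blocks_per_extent
        if vext not in self.extent_map:             # first write to this extent
            if not self.pool:
                raise RuntimeError("storage pool out of space")
            self.extent_map[vext] = self.pool.pop()
        return self.extent_map[vext]

    def allocated_extents(self) -> int:
        return len(self.extent_map)


# Two volumes each present 10 extents (20 total) but share only 15 physical ones
pool = list(range(15))
vol_a = ThinVolume(virtual_blocks=1000, blocks_per_extent=100, pool=pool)
vol_b = ThinVolume(virtual_blocks=1000, blocks_per_extent=100, pool=pool)
for b in (0, 5, 150):
    vol_a.write(b)
vol_b.write(999)
print(vol_a.allocated_extents(), vol_b.allocated_extents(), len(pool))   # 2 1 12
```

Because the volumes promise more capacity than the pool holds, the pool is over-committed; this is the "selling more tickets than seats" concern, and the reason pool capacity rather than volume size becomes the thing to monitor.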
Blocks, Grains, Extents and Volumes/LUNs
• Volume/LUN: one or more extents. The host sees a volume or LUN that consists of blocks numbered 0 to nnnnnnnnnn
• Extent: the allocation unit, made up of one or more grains
• Grain: a range of one or more blocks
• Block: typically 512 or 4096 bytes
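The hierarchy above reduces to integer division. This sketch assumes 512-byte blocks, 64 KiB grains and 16 MiB extents (example values taken from the ranges this deck gives for SVC and Storwize) and assumes grains sit contiguously inside each extent; real products, especially fine-grain ones that pack grains on demand, may place grains differently.

```python
BLOCK_SIZE = 512                  # bytes per block (assumed)
GRAIN_SIZE = 64 * 1024            # bytes per grain (assumed)
EXTENT_SIZE = 16 * 1024 * 1024    # bytes per extent (assumed)

BLOCKS_PER_GRAIN = GRAIN_SIZE // BLOCK_SIZE     # 128
GRAINS_PER_EXTENT = EXTENT_SIZE // GRAIN_SIZE   # 256


def locate(block: int) -> tuple:
    """Map a host block number to (extent, grain within extent, block within grain)."""
    grain, block_in_grain = divmod(block, BLOCKS_PER_GRAIN)
    extent, grain_in_extent = divmod(grain, GRAINS_PER_EXTENT)
    return extent, grain_in_extent, block_in_grain


print(locate(0))        # (0, 0, 0)
print(locate(40000))    # (1, 56, 64)
```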
Thin Provisioning – Coarse and Fine Grain
[Figure: a volume of 100 blocks (00 to 99), drawn as a 10 x 10 grid in which each row of 10 blocks is one extent (for example, blocks 90-99 = one extent) and each grain is 2 blocks. Blocks 00, 55, and 99 have been written.
• Fully Allocated: all 10 extents allocated
• Coarse-Grain: only 3 extents allocated (the extents containing blocks 00, 55, and 99)
• Fine-Grain: only 1 extent allocated, holding grains 00-01, 54-55, and 98-99]
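The figure's three outcomes can be reproduced with a short Python sketch. It uses the figure's geometry (100 blocks, 10-block extents, 2-block grains) and assumes that fine-grain allocation packs touched grains densely into extents; that packing rule is a simplification for illustration.

```python
import math

VOLUME_BLOCKS = 100   # blocks 00-99, as in the figure
EXTENT_BLOCKS = 10    # one row of the grid
GRAIN_BLOCKS = 2      # e.g. grain 54-55


def fully_allocated(written):
    # Every extent is reserved up front, whether written or not
    return VOLUME_BLOCKS // EXTENT_BLOCKS


def coarse_grain(written):
    # One extent is allocated in place for each distinct extent touched
    return len({b // EXTENT_BLOCKS for b in written})


def fine_grain(written):
    # Only touched grains are stored, packed together into as few extents as possible
    grains = {b // GRAIN_BLOCKS for b in written}
    return math.ceil(len(grains) * GRAIN_BLOCKS / EXTENT_BLOCKS)


written = [0, 55, 99]
print(fully_allocated(written), coarse_grain(written), fine_grain(written))   # 10 3 1
```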
How IBM has Implemented Thin Provisioning
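• DS8000: coarse grain; allocation unit 1 GB
• XIV: fine grain; allocation unit 17 GB; grain size 1 MB
• SVC and Storwize: fine grain; allocation unit 16 MB to 8 GB; grain size 32-256 KB
• DCS3700, DCS3860: fine grain; allocation unit 4 GB; grain size 64 KB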
Thin Provisioning
Advantages
• Just-in-time allocation increases the utilization percentage
• Eliminates the pressure to make accurate space estimates
• Dynamically expands volumes without impacting applications or rebooting the server
• Reduces the data footprint and lowers costs
• Shifts focus from volumes to storage pool capacity
Objections
• Not all file systems are cooperative or thin-provisioning friendly
  – Deleting files does not free space for others
  – “sdelete” writes zeros over deleted file space
• Other implementations may impact I/O performance
• May not support the same set of features, copy services, or replication
• “Selling more tickets than seats”
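Why does zeroing help? Deleted file space still holds old data, so the storage system cannot tell it is free. Once a tool such as sdelete overwrites it with zeros, whole grains become all-zero and can be detected and returned to the pool. The scan below is a hypothetical illustration of that detection step, assuming a 64 KiB grain; it is not how any particular IBM product reclaims space.

```python
GRAIN = 64 * 1024  # assumed grain size in bytes


def reclaimable_grains(data: bytes, grain: int = GRAIN) -> list:
    """Return the indices of grains that contain only zero bytes."""
    return [i // grain
            for i in range(0, len(data), grain)
            if not any(data[i:i + grain])]   # any() is False when every byte is 0


region = bytearray(4 * GRAIN)      # four grains, all zeroed (e.g. after sdelete)
region[GRAIN + 10] = 0xFF          # grain 1 still holds live file data
print(reclaimable_grains(bytes(region)))   # [0, 2, 3]
```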
Space-Efficient Copies
• Source: 100 GB allocated, 40 GB written
• Traditional copies (Destinations 1, 2 and 3): 300 GB
• Space-efficient copies (Destinations 1, 2 and 3): 30 GB, typically 10% of the source per copy
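One common way to build a space-efficient copy is copy-on-write: the copy stores only the blocks the source overwrites after the copy is taken, and reads everything else from the source. The Python sketch below assumes that technique (others, such as redirect-on-write, also exist, and these slides do not say which each product uses); all names are hypothetical.

```python
class Volume:
    """A toy volume: a list of fixed-size blocks."""
    def __init__(self, nblocks: int, fill: bytes = b"\0"):
        self.blocks = [fill] * nblocks


class CowSnapshot:
    """Copy-on-write point-in-time copy: stores only blocks that the source
    overwrites after the snapshot is taken; the rest is read from the source."""
    def __init__(self, source: Volume):
        self.source = source
        self.saved = {}                      # block number -> original contents

    def preserve(self, block: int) -> None:
        # Save the original contents only the first time a block changes
        if block not in self.saved:
            self.saved[block] = self.source.blocks[block]

    def read(self, block: int) -> bytes:
        return self.saved.get(block, self.source.blocks[block])


def host_write(volume: Volume, snapshots: list, block: int, data: bytes) -> None:
    """Write to the source, preserving old data for every dependent snapshot first."""
    for snap in snapshots:
        snap.preserve(block)
    volume.blocks[block] = data


src = Volume(100, fill=b"v1")
snap = CowSnapshot(src)
for i in range(10):                          # the host changes 10% of the source
    host_write(src, [snap], i, b"v2")
print(len(snap.saved), snap.read(0), src.blocks[0])   # 10 b'v1' b'v2'
```

The snapshot consumes space for 10 blocks rather than 100, which is the same 10% ratio the slide cites as typical.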
Storwize family – FlashCopy
• FlashCopy: volume-level point-in-time copy, with any mix of thin-provisioned and fully allocated volumes
• Incremental FlashCopy: volume-level point-in-time copy
  – Start incremental FlashCopy: data is copied as normal
  – Some data is changed by applications
  – Later, start incremental FlashCopy again: only changed data is copied by the background copy
• Cascaded FlashCopy: copy the copies, up to 256 targets
  – [Figure: FlashCopy relationships (maps) starting from the source volume. Disk1 is a FlashCopy target of Disk0 (the source); Disk2 and Disk3 are FlashCopy targets of Disk1; Disk4 is a FlashCopy target of Disk3.]
Space-efficient Copies
Advantages
• Supports both fully allocated and thin-provisioned sources
• Reduces the data footprint and lowers costs
• Allows you to keep more copies online
• Allows you to take copies more frequently
• Can be used as checkpoint copies during batch processing
Objections
• Some implementations may impact I/O performance
• Other implementations require that you estimate the maximum percentage changed
  – Typically 10-20%
  – Exceeding the reserved space invalidates the destination copy
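The last objection can be illustrated with a small hypothetical model: the copy's change space is reserved up front as a percentage of the source, and once more distinct source blocks change than the reserve can hold, the point-in-time image can no longer be kept. The class below is a sketch of that rule under those assumptions, not any product's behavior.

```python
class ReservedSnapshot:
    """Hypothetical snapshot with change space reserved as a percentage of the
    source. When more distinct blocks change than the reserve can hold, the
    copy is marked invalid."""

    def __init__(self, source_blocks: int, reserve_pct: int):
        self.capacity = source_blocks * reserve_pct // 100   # blocks the reserve can hold
        self.saved = set()                                  # source blocks already preserved
        self.valid = True

    def on_source_write(self, block: int) -> None:
        if not self.valid or block in self.saved:
            return                       # nothing new to preserve
        if len(self.saved) >= self.capacity:
            self.valid = False           # reserve exhausted: destination copy invalidated
            return
        self.saved.add(block)


snap = ReservedSnapshot(source_blocks=1000, reserve_pct=10)   # room for 100 blocks
for b in range(100):
    snap.on_source_write(b)
print(snap.valid)          # True: exactly 10% of the source has changed
snap.on_source_write(5)    # rewriting an already preserved block uses no extra space
print(snap.valid)          # True
snap.on_source_write(500)  # the 101st distinct block overflows the reserve
print(snap.valid)          # False
```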
Storage Optimization: Data Deduplication
Data deduplication reduces capacity requirements by storing only one unique instance of the data on disk and creating pointers for duplicate data elements:
1. Data elements are evaluated to determine a unique signature for each
2. Signature values are compared to identify all duplicates
3. Duplicate data elements are eliminated and replaced with pointers to the reference element
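The three steps map directly onto a few lines of Python. This sketch assumes fixed 4 KiB chunks and SHA-256 signatures, a common textbook approach; it is not how ProtecTIER works, since the slides describe HyperFactor as using a memory-resident index and a binary diff rather than a simple hash table.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size in bytes; many products use variable-size chunks


def dedupe(data: bytes):
    store = {}        # signature -> the one stored copy of that chunk
    recipe = []       # ordered pointers (signatures) that rebuild the data
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        sig = hashlib.sha256(chunk).hexdigest()   # 1. signature for each element
        if sig not in store:                      # 2. compare against known signatures
            store[sig] = chunk
        recipe.append(sig)                        # 3. duplicates become pointers only
    return store, recipe


def rehydrate(store: dict, recipe: list) -> bytes:
    return b"".join(store[sig] for sig in recipe)


data = (b"A" * CHUNK + b"B" * CHUNK) * 5     # 10 chunks, only 2 unique
store, recipe = dedupe(data)
print(len(recipe), len(store))               # 10 2
assert rehydrate(store, recipe) == data
```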
ProtecTIER Data Deduplication Advantages
Performance
• Measured performance of over 2,800 MB/s inline deduplication backup rate
• Over 3,600 MB/s restore rate
Capacity
• Up to 1 PB physical capacity per cluster
• Reduces required disk capacity by up to 25 times
Enterprise-Class Data Integrity
• Binary diff process during dedupe designed for the highest data integrity
High Availability Cluster
• Active-active cluster eliminates single points of failure
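IBM ProtecTIER – HyperFactor algorithm
[Figure: backup servers send a new data stream through an FC switch to a TS7650G. HyperFactor uses a memory-resident index to compare the new data stream with existing data in the repository; only the "filtered" data is written to the storage arrays.]
• Only 4 GB of memory-resident index is needed to map 1 PB of physical disk!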
Significantly Reduces Replication Bandwidth
[Figure: at the primary site, backup servers write to a ProtecTIER gateway, with represented capacity and physical capacity shown. The gateway replicates over an IP-based WAN link to a ProtecTIER gateway and backup server at the secondary site, which also has a tape library.]
• Deduplication enables a large amount of data to be replicated with significantly less bandwidth
• Virtual cartridges can be copied to physical tape at the DR site
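The bandwidth saving comes from sending only data the DR site does not already hold. This Python sketch assumes fixed 4 KiB chunks identified by SHA-256 signatures and a hypothetical `make_chunk` helper for test data; ProtecTIER's actual replication protocol is not described in these slides.

```python
import hashlib

CHUNK = 4096  # assumed fixed chunk size in bytes


def signature(chunk: bytes) -> str:
    return hashlib.sha256(chunk).hexdigest()


def chunks_to_send(data: bytes, remote_signatures: set) -> dict:
    """Return only the chunks the DR site does not already hold, keyed by signature."""
    to_send = {}
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        sig = signature(chunk)
        if sig not in remote_signatures and sig not in to_send:
            to_send[sig] = chunk
    return to_send


def make_chunk(n: int) -> bytes:
    # Hypothetical test data: a 4 KiB chunk whose contents are unique per n
    return n.to_bytes(4, "big") * (CHUNK // 4)


# The DR site already holds yesterday's 50 chunks
remote = {signature(make_chunk(n)) for n in range(50)}
# Today's backup keeps 48 of them and adds 2 new chunks
today = b"".join(make_chunk(n) for n in range(48)) + make_chunk(100) + make_chunk(101)

sent = chunks_to_send(today, remote)
print(len(today) // CHUNK, len(sent))   # 50 2
```

Only 8 KiB of the 200 KiB backup crosses the WAN in this example; real savings depend on how much each day's backup overlaps with data already replicated.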
ProtecTIER Mainframe Edition (ME) for Shared Infrastructure
Common distributed and mainframe backup and disaster recovery solution
Virtual Desktop Infrastructure (VDI)
Atlantis ILIO™ Server-Side Optimization Software with IBM FlashSystem
[Figure: ILIO Diskless VDI and XenApp running on server hardware under the hypervisor (ESX, XenServer, Hyper-V), using RAM as cache. ILIO functions shown: application analysis, inline deduplication, content-aware IO processing, compression, and coalescing (IO blender fix). Connectivity shown: NFS, iSCSI, Fibre Channel, or local disk.]
• VDI represents only 5% of flash deployment capacity*
• Deduplication and compression can achieve 90% savings for VDI workloads
• Eliminates the storage problem at the source
• Lower cost per desktop with better performance: less than $200 per stateless desktop, less than $300 per persistent desktop
• Proven at scale in the largest desktop virtualization deployments in the world
• Enterprise-class reliability with automated deployment and HA/DR
* Source: The Adoption of and Leading Use Cases for Solid State Storage by Enterprise Customers, IDC, September 2013, IDC #242808
Data Deduplication
Advantages
• Designed for backups and VDI
• Can offer up to 25x data footprint reduction (96% savings)
• Allows more backup copies to remain on disk for faster restores
• Reduces the cost of disk backup repositories
• Available with a variety of interfaces, including VTL, Symantec OST, CIFS and NFS
Objections
• Dealing with hash collisions
  – May require byte-for-byte comparisons or keeping a secondary copy of the data
• Hash-based systems do not scale
• Other systems have slow restores
  – Rehydrating data back to its original form
• Primary active data may not dedupe very well
  – Your mileage may vary
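The byte-for-byte comparison mentioned under hash collisions can be sketched as follows. Matching signatures are treated only as candidates; the bytes are then compared, so two different chunks that happen to share a signature are both kept. A deliberately weak hypothetical `weak_hash` (first byte only) is used so that a collision can actually be demonstrated.

```python
import hashlib


def sha256(chunk: bytes) -> bytes:
    return hashlib.sha256(chunk).digest()


def store_chunk(store: dict, chunk: bytes, hash_fn=sha256) -> tuple:
    """Insert a chunk into a signature-indexed store and return a
    (signature, slot) pointer. Chunks whose signatures match are compared
    byte-for-byte, so different chunks sharing a signature are both kept."""
    sig = hash_fn(chunk)
    bucket = store.setdefault(sig, [])
    for slot, candidate in enumerate(bucket):
        if candidate == chunk:          # true duplicate: reuse the stored copy
            return sig, slot
    bucket.append(chunk)                # new data, or a hash collision
    return sig, len(bucket) - 1


def fetch(store: dict, pointer: tuple) -> bytes:
    sig, slot = pointer
    return store[sig][slot]


def weak_hash(chunk: bytes) -> bytes:
    # Deliberately weak "signature" (first byte only) so collisions are easy to show
    return chunk[:1]


store = {}
p1 = store_chunk(store, b"apple", weak_hash)
p2 = store_chunk(store, b"avocado", weak_hash)   # same signature, different data
p3 = store_chunk(store, b"apple", weak_hash)     # genuine duplicate
print(p1, p2, p3)          # (b'a', 0) (b'a', 1) (b'a', 0)
print(fetch(store, p2))    # b'avocado'
```

The cost of this safety is an extra read and compare for every signature match, which is one reason implementations differ in how they handle collisions.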
Lossy vs. Lossless Methods
Lossy: "Good enough?"
• Compress, then decompress: does not return data back to its original contents
• Used with music, photos, video, medical images, scanned documents, fax machines
Lossless: "Exactly the same"
• Compress, then decompress: returns data back to its original contents
• Used with databases, emails, spreadsheets, office documents, source code
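To see the difference in practice, this Python sketch shrinks the same hypothetical 8-bit samples two ways. zlib stands in for a lossless method; the "lossy" step is a deliberately crude quantizer invented for illustration, not a real audio or image codec.

```python
import zlib

# Hypothetical 8-bit samples, e.g. from a sensor or one row of an image
samples = bytes([17, 18, 200, 201, 202, 203, 99, 100] * 64)   # 512 bytes

# Lossless: zlib (DEFLATE) returns every byte exactly
packed = zlib.compress(samples)
assert zlib.decompress(packed) == samples

# Lossy (illustration only): keep the 4 high-order bits of each sample and
# pack two samples per byte. Half the size, but the low bits are gone for good.
lossy = bytes((samples[i] & 0xF0) | (samples[i + 1] >> 4)
              for i in range(0, len(samples), 2))
restored = bytes(v for b in lossy for v in (b & 0xF0, (b & 0x0F) << 4))
print(len(packed) < len(samples), len(lossy), restored == samples)   # True 256 False
```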
How Compression Works
• Lempel-Ziv lossless compression builds a dictionary of repeated phrases (sequences of two or more characters) that can be represented with fewer bits
• [Figure: an excerpt from The Lord of the Rings in which all of the red text represents repeated sequences eligible for compression]
Source: The Lempel Ziv Algorithm, Christian Zeeh, 2003
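Below is a minimal, unoptimized Python sketch of the LZ77 variant of Lempel-Ziv, in which the "dictionary" is a sliding window of recently seen text and each repeat is replaced by an (offset, length, next character) triple. It is for illustration only and is not the LZ implementation used by any IBM product.

```python
def lz77_compress(text: str, window: int = 255) -> list:
    """Greedy LZ77: emit (offset, length, next_char) triples, where offset and
    length point back to the longest earlier match inside the sliding window."""
    tokens, i = [], 0
    while i < len(text):
        best_len, best_off = 0, 0
        for j in range(max(0, i - window), i):
            length = 0
            # Leave one character for next_char; matches may overlap position i
            while i + length < len(text) - 1 and text[j + length] == text[i + length]:
                length += 1
            if length > best_len:
                best_len, best_off = length, i - j
        tokens.append((best_off, best_len, text[i + best_len]))
        i += best_len + 1
    return tokens


def lz77_decompress(tokens: list) -> str:
    out = []
    for offset, length, next_char in tokens:
        for _ in range(length):
            out.append(out[-offset])    # copy one character at a time to allow overlap
        out.append(next_char)
    return "".join(out)


quote = ("One Ring to rule them all, One Ring to find them, "
         "One Ring to bring them all and in the darkness bind them")
tokens = lz77_compress(quote)
assert lz77_decompress(tokens) == quote
print(len(quote), len(tokens))   # fewer tokens than characters
```

Each repeated phrase such as "One Ring to " collapses into a single back-reference; a real encoder would then pack the triples into as few bits as possible.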
Data Footprint Reduction
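• Real-time Compression: 40-80% savings on active data (best choice for active data); 40-80% on backup data
• Data Deduplication: 20-30% savings on active data; 80-95% on backup data (best choice for backup data)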
Real-Time Compression is a method of reducing storage needs by changing the encoding scheme as the data is being read and written
– Short patterns for frequent data
– Longer patterns for infrequent data
– Can achieve 40 to 80 percent reduction in storage capacity for active data
Data deduplication is a method of reducing storage needs by eliminating duplicate copies of data
– Store only one unique instance of the data
– Redundant data replaced with pointer
– Can achieve 80 to 95 percent reduction in storage capacity for backup data
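These slides quote savings both as percentages and as "Nx" ratios. The two are related by savings = 1 - 1/ratio, so 80-95% deduplication corresponds to 5:1 to 20:1 and 40-80% compression to roughly 1.67:1 to 5:1. A small Python sketch of the conversion:

```python
def savings_from_ratio(ratio: float) -> float:
    """An N:1 reduction ratio expressed as the fraction of capacity saved."""
    return 1 - 1 / ratio


def ratio_from_savings(savings: float) -> float:
    """A fraction of capacity saved expressed as an N:1 reduction ratio."""
    return 1 / (1 - savings)


print(round(savings_from_ratio(25), 2))             # 0.96 (25x deduplication)
for pct in (0.40, 0.80, 0.95):
    print(pct, round(ratio_from_savings(pct), 2))   # 1.67, 5.0, 20.0
```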
Compressed Volumes based on Thin Provisioning
• Fully allocated: the host sees the full amount; actual data written, plus allocated but unused space dedicated to this host, wasted until written to
• Thin Provisioning: the host sees the full virtual amount; physical space is allocated only for the actual data written
• Thin Provisioning with Compression: the host sees the full virtual amount; physical space allocated is up to 80% less than the actual data written
FIVO vs. VIFO
Fixed Input, Variable Output (FIVO)
• WAN transmission
• Sequential tape
• IBM Tivoli Storage Manager
• zip, tar, etc.
Variable Input, Fixed Output (VIFO): Random Access Compression Engine™ (RACE)
• SAN Volume Controller
• Storwize V7000 and V7000 Unified
• FlashSystem V9000
• XIV Storage System
[Figure: with FIVO, six equal-size data pieces (1-6) each compress to a piece of a different size; with VIFO, data pieces of different sizes are each compressed into equal-size compressed chunks.]
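A toy contrast of the two strategies, using zlib as the compressor, follows. It is a hypothetical illustration of the packing idea only, since RACE internals are not described in these slides. `fivo` compresses fixed 4 KiB input pieces and reports the variable output sizes; `vifo` keeps feeding input until the compressed result would overflow a fixed output chunk (1 KiB here, an arbitrary assumption) and reports how much input each chunk absorbed.

```python
import random
import zlib


def fivo(data: bytes, in_size: int) -> list:
    """Fixed Input, Variable Output: compress each fixed-size input piece
    independently; returns the compressed size of each piece."""
    return [len(zlib.compress(data[i:i + in_size]))
            for i in range(0, len(data), in_size)]


def vifo(data: bytes, out_size: int, step: int = 512) -> list:
    """Variable Input, Fixed Output: grow the input in `step`-byte increments
    while its compressed form still fits in one `out_size`-byte output chunk;
    returns how many input bytes each output chunk absorbed.
    Assumes `step` bytes always compress to at most `out_size` bytes."""
    consumed, pos = [], 0
    while pos < len(data):
        take = step
        while (pos + take < len(data)
               and len(zlib.compress(data[pos:pos + take + step])) <= out_size):
            take += step
        consumed.append(min(take, len(data) - pos))
        pos += take
    return consumed


rng = random.Random(0)
text = (b"storage efficiency " * 1000)[:16384]           # compresses very well
noise = bytes(rng.getrandbits(8) for _ in range(8192))   # barely compresses
data = text + noise

print(fivo(data, 4096))   # six outputs: small for text, about 4 KiB for noise
print(vifo(data, 1024))   # the first chunk absorbs far more input than the rest
```

Fixed-size output is what makes random access practical: each output chunk can be located through a map and replaced on its own, while variable-size FIVO output suits sequential streams such as the WAN and tape uses listed above.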
Compression for Disk Data
Traditional Approaches
[Figure: compression after modification. A file with blocks A through I is compressed as ABC DEF GHI. After block E is modified to MN, the new compressed file is ABC DMN FGH I: the blocks after the change shift.]
• The work to "update" a file may involve many more I/Os
• Data blocks shift
  – Negative impact to deduplication
• No notion of data location; data is processed sequentially
Real-time Compression
[Figure: compression after modification. The same change produces the new compressed file ABC DEF GHI MN: ABC, DEF, and GHI remain identical blocks, and the modified data is added as MN.]
• The work to "update" a file takes about the same or fewer I/Os
• Only the modified block changes
  – Enhances deduplication
• Data location via map
IBM Real-time Compression for File and Block level
For file- and block-level access:
• IBM Storwize V7000 Unified
For block-only access:
• SAN Volume Controller
• Storwize V7000
• FlashSystem V9000
• XIV Storage System – NEW
To estimate space savings for file-level storage, use the Real-time Compression Appliance Scan Tool.
To estimate space savings for block-level storage, use the Comprestimator Tool.
IBM Real-time Compression – Estimated Savings
• IBM’s Random-Access Compression Engine (RACE) delivers excellent capacity savings for a variety of data types:
  – Databases (DB2, Oracle, etc.): ~80%
  – Virtual servers (VMware, etc.), Linux and Windows virtual guest images: 50% to 70%
  – Microsoft Office 2003: ~60%
  – Microsoft Office 2007 or later: ~20%
  – CAD/CAM engineering drawings: ~70%
• IBM Comprestimator tool can be used to evaluate expected compression benefits for specific environments
  – This pre-sales tool is available to estimate compression savings. Percentage savings shown are typical results based on client experiences; your mileage may vary.
  – http://www14.software.ibm.com/webapp/set2/sas/f/comprestimator/home.html
• 45-day free trial of compression available
Source: IBM internal tests and field results
Compression Acceleration Cards – Intel® QuickAssist Technology
Intel QuickAssist technology integrated into new Compression Acceleration cards
• Used to offload the LZ compression and decompression processing
• Each node supports up to two Compression Acceleration cards
• SVC uses 4 parallel compression threads per card
To use compressed volumes, nodes require at least:
• SVC 2145-DH8 or next generation Storwize V7000
• 64 GB of cache memory per node
• One Compression Acceleration card
When compression is enabled:
• 38 GB is used as a compression cache
Optionally upgrade each node to contain a second Compression Acceleration card
• Upgrade recommended when the normal data working set is > 32 TB
7.3.0 Software Stack: New Dual Layer Cache Architecture
• First major update to cache since 2003
• Flexible design for plug-and-play style cache algorithm enhancements in the future
• "SVC"-like L2 cache for advanced functions
• Upper cache: simple write cache
• Lower cache: algorithm intelligence
  – Understands mdisks
• Shared buffer space between the two layers
[Figure: the 7.3.0 software stack. Layers shown include SCSI Target, Forwarding, Replication, Upper Cache, FlashCopy, Mirroring, Thin Provisioning, Compression, Lower Cache, Virtualization, Easy Tier 3, RAID, and SCSI Initiator; interfaces Fibre Channel, iSCSI, FCoE, SAS, and PCIe; and Configuration, Peer Communications, Interface Layer, and Clustering services. Three components are marked New.]
* Only 4F2 hardware is limited to running no later than 5.1 software, due to its 32-bit CPU.
Turbo Compression Explained
• Real Time Compression [RtC]: store more; limited effect on IOPS; limited effect on response time
• Auto Tiering [Easy Tier and Flash Technology]: no effect on capacity; more IOPS; faster response
• Turbo Compression [RtC + Easy Tier and Flash Technology]: store more; more IOPS; faster response
Turbo Compression may double the net usability of existing infrastructures.
Turbo Compression tests: Oracle TPC-C (07/2013), with 2% flash capacity
• 4x compression
• 2.1x IOPS throughput
• ½x response time
... at a fraction of the cost of traditional means
Turbo Compression for Tiered Flash/Disk Pools
• Easy Tier (no compression)
  – 1 volume, 100 GB
  – 4% flash (4 GB) serves 23% of IOPS (assumption: skew = 7)
  – HDD tier: 77% of IOPS
• Compression (RtC) (assumption: 66% savings)
  – 12% of the data, once compressed, fits in the 4 GB of flash
  – That 12% of the data serves 60% of IOPS
  – HDD tier: 40% of IOPS
• Turbo Compression
  – Pool IOPS capability nearly doubled without adding any flash
[Chart: cumulative IOPS % vs. capacity %, comparing Easy Tier alone, RtC, and Turbo Compression (TC); marked points at 4% capacity / 23% of IOPS and 12% capacity / 60% of IOPS.]
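The capacity arithmetic and the "nearly doubled" claim can be checked with a few lines of Python. The 12% to 60% step depends on the workload's skew curve and cannot be derived from these numbers alone, so it is taken as given. The speedup line assumes the HDD tier is the bottleneck, so total pool IOPS scales with 1 / (share of IOPS still landing on HDD).

```python
FLASH_GB = 4.0            # flash tier size in the slide's example
VOLUME_GB = 100.0         # volume size in the slide's example
SAVINGS = 0.66            # compression savings assumed on the slide
HDD_SHARE_BEFORE = 0.77   # share of IOPS on HDD with Easy Tier alone
HDD_SHARE_AFTER = 0.40    # share of IOPS on HDD with Turbo Compression

# Uncompressed data that fits in the flash tier once compressed
effective_gb = FLASH_GB / (1 - SAVINGS)
print(round(effective_gb, 1), round(100 * effective_gb / VOLUME_GB))   # 11.8 12

# If HDD is the bottleneck, pool IOPS capability scales inversely with its share
speedup = HDD_SHARE_BEFORE / HDD_SHARE_AFTER
print(round(speedup, 1))   # 1.9
```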
Compressing Existing Data
[Figure: a fully allocated or thin-provisioned volume (Copy 0) is mirrored with a volume mirror to a compressed volume (Copy 1); only non-zero blocks are copied.]
XIV Compression & Snapshot Views
• Comprestimator tool built into the IBM XIV 11.6 GUI
• Right-click to compress a volume
• Snapshot usage now reported per volume
Compression
Advantages
• Can be used for data transmission, tape and disk data
  – Supports both file-based and block-based disk storage
• Real-time compression can be used with databases, CAD/CAM and virtual machines with no impact to application performance
• Can offer up to 80% data footprint reduction savings
• Real-time Compression is “Dedupe-Friendly” and combines well with Thin Provisioning
Objections
• Some implementations are post-process
  – Stores uncompressed data first, compresses later
• Other implementations impact application performance and/or consume substantial CPU resources
• Benefits vary by data type, and by whether applications do their own compression or encryption
  – Your mileage may vary
Summary
• Data Footprint Reduction technologies have been around for many years
• Algorithms are stable, mature, and well-understood by the IT industry
• Data is returned byte-for-byte identical to what was originally stored
• Implementations between vendors and products can vary greatly
• IBM’s implementations tend to offer faster performance, better scalability, easier use, and lower TCO
Some great prizes to be won!
Please fill out an evaluation!
Session: sDE2784
IBM Tucson Executive Briefing Center
• Tucson, Arizona is home for storage hardware and software design and development
• IBM Tucson Executive Briefing Center offers:
  – Technology briefings
  – Product demonstrations
  – Solution workshops
• Take a video tour
  – http://youtu.be/CXrpoCZAazg
About the Speaker
Tony Pearson is a Master Inventor and Senior managing consultant for the IBM System Storage™ product line. Tony joined IBM Corporation in 1986 in Tucson, Arizona, USA, and has been there ever since. In his current role, Tony presents briefings on storage topics covering the entire System Storage product line, and topics related to Cloud, Analytics and Social media. He interacts with clients, speaks at conferences and events, and leads client workshops to help clients with strategic planning for IBM’s integrated set of storage software, hardware and virtualization products.

Tony writes the “Inside System Storage” blog, which is read by hundreds of clients, IBM sales reps and IBM Business Partners every week. This blog was rated one of the top 10 blogs for the IT storage industry by “Networking World” magazine and #1 most read IBM blog on IBM’s developerWorks. The blog has been published into a series of books, Inside System Storage: Volumes I through V.

Over the years, Tony has worked in development, marketing and consulting positions for various storage hardware and software products. Tony has a Bachelor of Science degree in Software Engineering, and a Master of Science degree in Electrical Engineering, both from the University of Arizona. Tony holds 19 IBM patents for inventions on storage hardware and software products.
9000 S. Rita Road
Bldg 9032 Floor 1
Tucson, AZ 85744
+1 520-799-4309 (Office)
Tony Pearson
Master Inventor,
Senior IT Specialist
IBM System Storage™
Email:[email protected]
Twitter: twitter.com/az990tony
Blog: ibm.co/Pearson
Books: www.lulu.com/spotlight/990_tony
IBM Expert Network on Slideshare: www.slideshare.net/az990tony
Facebook: www.facebook.com/tony.pearson.16121
Linkedin: www.linkedin.com/profile/view?id=103718598
Additional Resources from Tony Pearson
Continue growing your IBM skills
ibm.com/training provides a comprehensive portfolio of skills and career accelerators that are designed to meet all your training needs.
• Training in cities local to you, where and when you need it, and in the format you want
  – Use IBM Training Search to locate public training classes near to you with our five Global Training Providers
  – Private training is also available with our Global Training Providers
• Demanding a high standard of quality: view the paths to success
  – Browse Training Paths and Certifications to find the course that is right for you
• If you can’t find the training that is right for you with our Global Training Providers, we can help.
  – Contact IBM Training at [email protected]
Global Skills Initiative
Trademarks and Disclaimers
Adobe, the Adobe logo, PostScript, and the PostScript logo are either registered trademarks or trademarks of Adobe Systems Incorporated in the United States, and/or other countries. IT Infrastructure Library is a registered trademark of the Central Computer and Telecommunications Agency which is now part of the Office of Government Commerce. Intel, Intel logo, Intel Inside, Intel Inside logo, Intel Centrino, Intel Centrino logo, Celeron, Intel Xeon, Intel SpeedStep, Itanium, and Pentium are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Linux is a registered trademark of Linus Torvalds in the United States, other countries, or both. Microsoft, Windows, Windows NT, and the Windows logo are trademarks of Microsoft Corporation in the United States, other countries, or both. ITIL is a registered trademark, and a registered community trademark of the Office of Government Commerce, and is registered in the U.S. Patent and Trademark Office. UNIX is a registered trademark of The Open Group in the United States and other countries. Java and all Java-based trademarks and logos are trademarks or registered trademarks of Oracle and/or its affiliates. Cell Broadband Engine is a trademark of Sony Computer Entertainment, Inc. in the United States, other countries, or both and is used under license therefrom. Linear Tape-Open, LTO, the LTO Logo, Ultrium, and the Ultrium logo are trademarks of HP, IBM Corp. and Quantum in the U.S. and other countries.
Other product and service names might be trademarks of IBM or other companies. Information is provided "AS IS" without warranty of any kind.
The customer examples described are presented as illustrations of how those customers have used IBM products and the results they may have achieved. Actual environmental costs and performance characteristics may vary by customer.
Information concerning non-IBM products was obtained from a supplier of these products, published announcement material, or other publicly available sources and does not constitute an endorsement of such products by IBM. Sources for non-IBM list prices and performance numbers are taken from publicly available information, including vendor announcements and vendor worldwide homepages. IBM has not tested these products and cannot confirm the accuracy of performance, capability, or any other claims related to non-IBM products. Questions on the capability of non-IBM products should be addressed to the supplier of those products.
All statements regarding IBM future direction and intent are subject to change or withdrawal without notice, and represent goals and objectives only.
Some information addresses anticipated future capabilities. Such information is not intended as a definitive statement of a commitment to specific levels of performance, function or delivery schedules with respect to any future products. Such commitments are only made in IBM product announcements. The information is presented here to communicate IBM's current investment and development activities as a good faith effort to help with our customers' future planning.
Performance is based on measurements and projections using standard IBM benchmarks in a controlled environment. The actual throughput or performance that any user will experience will vary depending upon considerations such as the amount of multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the workload processed. Therefore, no assurance can be given that an individual user will achieve throughput or performance improvements equivalent to the ratios stated here.
Prices are suggested U.S. list prices and are subject to change without notice. Starting price may not include a hard drive, operating system or other features. Contact your IBM representative or Business Partner for the most current pricing in your geography.
Photographs shown may be engineering prototypes. Changes may be incorporated in production models.
© IBM Corporation 2015. All rights reserved. References in this document to IBM products or services do not imply that IBM intends to make them available in every country.
Trademarks of International Business Machines Corporation in the United States, other countries, or both can be found on the World Wide Web at http://www.ibm.com/legal/copytrade.shtml.
ZSP03490-USEN-00