Data Footprint Reduction: Understanding IBM Storage Options

  • Date posted: 14-Jan-2015
  • Category: Technology

description

sSE20 presented at IBM Edge 2012 conference

Transcript of Data Footprint Reduction: Understanding IBM Storage Options

  • 1. sSE20 Data Footprint Reduction: Understanding IBM Storage Efficiency Options
    Tony Pearson, Master Inventor and Senior Managing Consultant, IBM Corp
    Sanjay S Bhikot, Advisory Unix and Storage Administrator, Ricoh Americas Corp
    #IBMEDGE, 2012 IBM Corporation

  • 2. Data Footprint Reduction is the catch-all term for a variety of technologies designed to help reduce storage costs. This session will cover thin provisioning, space-efficient copies, deduplication and compression technologies, and describe the IBM storage products that provide these capabilities.

  • 3. Sessions -- Tony Pearson
    Monday
      1:00pm Storing Archive Data for Compliance Challenges
      4:15pm IBM Watson: What it Means for Society
    Tuesday
      4:15pm Using Social Media: Birds of a Feather (BOF)
    Wednesday
      9:00am Data Footprint Reduction: IBM Storage options
      2:30pm IBM's Storage Strategy in the Smarter Computing era
      4:15pm IBM SONAS and the Cloud Storage Taxonomy
    Thursday
      9:00am IBM Watson: What it Means for Society
      10:30am Tivoli Storage Productivity Center Overview
      5:30pm IBM Edge Free for All hosted by Scott Drummond

  • 4. Agenda
    • Thin Provisioning
    • Space-Efficient Copy
    • Data Deduplication
    • Compression

  • 5. History of Thin Provisioning
    Timeline markers on the slide: 1994, 1997, Today.
    • The StorageTek Iceberg 9200 Array introduced thin provisioning on slower 7200 RPM drives for mainframe systems.
    • IBM resold this as the RAMAC Virtual Array (RVA) for mainframe servers.
    • Today: thin provisioning is available for many operating systems on IBM storage, including DS8000, XIV, SVC, N series, Storwize V7000, DS3500 and DCS3700.

  • 6. Why Space is Over-Allocated
    Scenario 1: space requirements under-estimated
    • Running out of space requires a larger volume
    • New request may take weeks to accommodate
    • Application outage if not addressed in time
    • Data must be moved to the larger volume
    • Application outage during data movement
    Scenario 2: space requirements over-estimated
    • Capacity lasts for years
    • No data migration
    • No application outages
    • No penalties
    When faced with this dilemma, most will err on the side of over-estimating.

  • 7. Fully Allocated vs. Thin Provisioned
    Fully allocated
    • Host sees fully allocated amount
    • Actual data written
    • Allocated but unused space is dedicated to this host, wasted until written to
    Thin provisioned
    • Host sees full virtual amount
    • Physical space allocated
    • Actual data written
    • Empty space available to others

  • 8. Fully Allocated vs. Thin Provisioned
    Host sees a volume or LUN that consists of blocks numbered 0 to nnnnnnnnnn.
    • Volume/LUN: one or more extents
    • Extent: allocation unit, one or more grains
    • Grain: range of 1 or more blocks
    • Block: typically 512 or 4096 bytes

  • 9. Coarse and Fine-Grain
    [Figure: 10 x 10 grid of blocks 00-99; blocks 00, 55, and 99 written]
    • Fully Allocated: all 10 extents allocated
    • Coarse-Grain: only 3 extents allocated (grain 90-99 = extent)
    • Fine-Grain: only 1 extent allocated (grains 00-01, 54-55, 98-99)

  • 10. How IBM has implemented TP
                       IBM DS8000     IBM XIV   SVC and Storwize V7000   DS3500, DCS3700
    Type               Coarse         Fine      Fine                     Fine
    Allocation unit    1 GB           17 GB     16 MB to 8 GB            4 GB
    Grain size         (not listed)   1 MB      32-256 KB                64 KB

  • 11. Thick-to-Thin Migration
    Volume mirror:
    • Copy 0: fully-allocated volume
    • Copy 1: thin-provisioned volume
    Only non-zero blocks are copied.

  • 12. Empty Space Reclaim
    • Thin provisioning: allocations in 17 GB units, with 1 MB chunks (grains). Only non-zero blocks consume physical space.
    • Avoid writing empty blocks: any I/O request that tries to write a block of all zeros to unallocated space is ignored.
    • Background task to find empty chunks: a background task scans all blocks, looking for chunks containing all zeros.
    • Empty space reclaimed: empty chunks are returned to unallocated space, so that they can be used for other volumes.
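The grain allocation and empty-space reclaim behavior described in the slides above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not how any IBM product is implemented: the class name `ThinVolume`, its methods, and the two-block grain size are all hypothetical and chosen only to mirror the fine-grain example (blocks 00, 55, and 99 written).

```python
class ThinVolume:
    """Toy thin-provisioned volume (hypothetical model, not a product API).

    Physical grains are allocated only when a non-zero block is first written.
    """

    def __init__(self, virtual_blocks, blocks_per_grain=2, block_size=512):
        self.virtual_blocks = virtual_blocks      # size the host sees
        self.blocks_per_grain = blocks_per_grain  # assumed grain size
        self.block_size = block_size
        self.grains = {}  # grain index -> bytearray backing that grain

    def _locate(self, block_no):
        if not 0 <= block_no < self.virtual_blocks:
            raise IndexError("block outside the virtual volume")
        return divmod(block_no, self.blocks_per_grain)

    def write(self, block_no, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block")
        grain_no, slot = self._locate(block_no)
        if grain_no not in self.grains:
            if not any(data):
                return  # all-zero write to unallocated space is ignored
            self.grains[grain_no] = bytearray(self.blocks_per_grain * self.block_size)
        start = slot * self.block_size
        self.grains[grain_no][start:start + self.block_size] = data

    def read(self, block_no):
        grain_no, slot = self._locate(block_no)
        grain = self.grains.get(grain_no)
        if grain is None:
            return bytes(self.block_size)  # unallocated space reads as zeros
        start = slot * self.block_size
        return bytes(grain[start:start + self.block_size])

    def reclaim(self):
        """Background-style scan: free grains that contain only zeros."""
        empty = [g for g, buf in self.grains.items() if not any(buf)]
        for g in empty:
            del self.grains[g]
        return len(empty)

    def allocated_blocks(self):
        return len(self.grains) * self.blocks_per_grain


volume = ThinVolume(virtual_blocks=100)
for block in (0, 55, 99):
    volume.write(block, b"\x01" * 512)
print(sorted(volume.grains), volume.allocated_blocks())
```

In this sketch the host still sees 100 blocks, while only the three grains that hold written data consume physical space. Overwriting a block with zeros does not free anything by itself; the grain is returned only when `reclaim()` later finds it entirely empty.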
  • 13. Thin Provisioning: Pros and Cons
    Pros
    • Just-in-time, increased utilization percentage
    • Eliminates the pressure to make accurate space estimates
    • Dynamically expand volume without impacting applications or rebooting server
    • Reduces the data footprint and lowers costs
    • Shifts focus from volumes to storage pool capacity
    Cons
    • Not all file systems are cooperative or friendly
    • Deletion of files does not free space for others (sdelete writes zeros over deleted file space)
    • Some implementations may impact I/O performance
    • May not support same set of features, copy services, or replication
    • "Writing checks you can't cash"

  • 14. Agenda
    • Thin Provisioning
    • Space-Efficient Copy
    • Data Deduplication
    • Compression

  • 15. History of Space-Efficient Copies
    Timeline markers on the slide: 1993, 1997, Today.
    • NetApp introduces Snapshot in its WAFL file system.
    • IBM Enterprise Storage Server (ESS) introduces NOCOPY parameter on FlashCopy.
    • Today: Space-Efficient Copy is available on many IBM storage systems, including DS8000, XIV, SVC, N series, Storwize V7000, DS3500, DS5000 and DCS3700.

  • 16. Space-Efficient Copies
    • Source: 100 GB allocated, 40 GB written
    • Traditional copies (Destination 1, Destination 2, Destination 3): 300 GB
    • Space-efficient copies, 10% reserved: 30 GB

  • 17. Method 1: Copy on Write (COW)
    Copy is a set of pointers to original data (source blocks A, B, C, D).
    Write to original volume:
    • Pause I/O
    • Copy original block of data to destination
    • Update original block
    After the write, the source holds A, B, C2, D and the destination holds C.
    • Slows performance
    • May limit # of destination copies
    • Can be combined with background copy for a full copy
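The copy-on-write steps listed above can be illustrated with a minimal sketch. The `Volume` and `CowSnapshot` classes are hypothetical names invented for this example, and real implementations work on disk blocks with locking rather than Python lists; the point is only to show that the destination stores just the original contents of blocks that changed.

```python
class CowSnapshot:
    """Space-efficient copy: keeps only originals of blocks changed since it was taken."""

    def __init__(self, source):
        self.source = source
        self.saved = {}  # block index -> original data, preserved on first overwrite

    def read(self, index):
        # Unchanged blocks are read through the pointer to the source volume.
        if index in self.saved:
            return self.saved[index]
        return self.source.blocks[index]


class Volume:
    """Source volume that performs copy-on-write for each active snapshot."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        snap = CowSnapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Copy the original block to every destination that has not saved it yet,
        # then update the original block in place.
        for snap in self.snapshots:
            if index not in snap.saved:
                snap.saved[index] = self.blocks[index]
        self.blocks[index] = data


source = Volume(["A", "B", "C", "D"])
copy = source.snapshot()
source.write(2, "C2")
print(source.blocks, [copy.read(i) for i in range(4)], copy.saved)
```

The loop over `self.snapshots` inside `write` is where the extra work per destination shows up in this model, which is one way to read the slide's remarks that COW slows performance and may limit the number of destination copies.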
  • 18. Method 2: Redirect on Write (ROW)
    Copy is a set of pointers to original data (source blocks A, B, C, D).
    Write to original volume:
    • Re-directed to new empty space
    • Previous data left alone
    After the write, blocks A, B, C, D remain in place and C2 is written to new space.
    • Does not impact performance
    • Supports many destination copies

  • 19. Space-Efficient Copies: Pros and Cons
    Pros
    • Supports both fully-allocated and thin-provisioned sources
    • Reduces the data footprint and lowers costs
    • Allows you to keep more copies online
    • Allows you to take copies more frequently
    • Can be used as checkpoint copies during batch processing
    Cons
    • Some implementations may impact I/O performance
    • Requires that you estimate the maximum percentage changed (typically 10-20%)
    • Exceeding the reserved space invalidates the destination copy

  • 20. Agenda
    • Thin Provisioning
    • Space-Efficient Copy
    • Data Deduplication
    • Compression

  • 21. History of Data Deduplication
    Timeline markers on the slide: 2007, 2008, Today.
    • Advanced Single Instance Store (A-SIS) brings deduplication to the IBM N series and NetApp disk storage.
    • IBM acquires Diligent and introduces the ProtecTIER TS7600 virtual tape library with data deduplication.
    • Today: IBM offers a variety of choices, including ProtecTIER, N series, and Tivoli Storage Manager (TSM v6).

  • 22. Data Deduplication
    Data deduplication reduces capacity requirements by storing only one unique instance of the data on disk and creating pointers for duplicate data elements.

  • 23. Deduplication reduces disk required for backup copies

  • 24. Two Primary Data Deduplication Approaches
    • Hash-based: sometimes referred to as a Content Addressable Storage approach
    • HyperFactor deduplication: a different approach based on an agnostic view of data

  • 25. Hash-Based Approach
    1. Slice data into chunks, fixed or variable (A, B, C, D, E)
    2. Generate hash per chunk and save (Ah, Bh, Ch, Dh, Eh)
    3. Slice next data into chunks and look for hash match (A, B, C, D, E)
    4. Reference data previously stored

  • 26. HyperFactor Approach
    1. Look through data for similarity (new data stream)
    2. Read elements that are most similar
    3. Diff reference with version; will use several elements (Element A, Element B, Element C)
    4. Matches factored out; unique data added to repository

  • 27. Assessment of Hash-based Approaches
    Example: imagine a chunk size of 8 KB
    • 1 TB repository has ~125,000,000 8 KB chunks
    • Each hash is 20 bytes long
    • Need pointers scheme to reference 1 TB
    • The hash table requires 2.5 GB RAM: no issue
    • With a 100 TB repository, ~250 GB of RAM is required
    Applicable for all chunking methods
    • Hash table in memory
      • Overhead for in-band deduplication
      • Hash table will grow with data volume
      • Growing hash table may become performance bottleneck
      • Scalability issues
    • Hash collisions must be handled
    • Hash table must be protected
      • One copy might not be sufficient

  • 28. When Deduplication Occurs
    1. In-line processing
      • As data is received by the target device it is deduplicated in real time
      • Only unique data stored on disk
      • Data written to the disk storage is deduplicated
    2. Post-processing
      • As data is received by the target device it is temporarily stored on disk storage
      • Data is subsequently read back in to be processed by a deduplication engine

  • 29. Comparison of Offerings
                       Hash-based                                    HyperFactor
    In-line process    Other vendors                                 IBM ProtecTIER: TS7680G, TS7650G, TS7650, TS7620 Express, TS7610 Express
    Post-process       IBM Tivoli Storage Manager (TSM), N series

  • 30. IBM ProtecTIER with HyperFactor
    Gateways
    • Attaches up to 1 PB of disk
    • Two models:
      • TS7680 for IBM System z
      • TS7650G for distributed systems
    Appliances
    • Disk included inside
    • Three models for distributed systems:
      • TS7650 in three sizes
      • TS7620 (New!)
      • TS7610 ... in two sizes
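As a rough illustration of the hash-based approach and its memory assessment from the earlier slides, the sketch below deduplicates fixed-size chunks and reproduces the hash table arithmetic. It is a simplified model under assumptions: the names `HashDedupStore` and `hash_table_bytes` are hypothetical, SHA-1 is chosen only because it yields the 20-byte hash the slide mentions, decimal units (8 KB = 8,000 bytes) are assumed to match the slide's figures, and hash collisions are not handled even though the slide notes that they must be.

```python
import hashlib


class HashDedupStore:
    """Toy hash-based deduplication with fixed-size chunks (illustrative only)."""

    def __init__(self, chunk_size=8 * 1024):
        self.chunk_size = chunk_size
        self.chunks = {}  # 20-byte SHA-1 digest -> chunk bytes, stored once

    def write(self, data):
        """Store data and return its recipe: the list of chunk digests."""
        recipe = []
        for offset in range(0, len(data), self.chunk_size):
            chunk = data[offset:offset + self.chunk_size]  # 1. slice into chunks
            digest = hashlib.sha1(chunk).digest()          # 2. hash per chunk
            self.chunks.setdefault(digest, chunk)          # 3. store only if no match
            recipe.append(digest)                          # 4. reference stored data
        return recipe

    def read(self, recipe):
        return b"".join(self.chunks[digest] for digest in recipe)

    def stored_bytes(self):
        return sum(len(chunk) for chunk in self.chunks.values())


def hash_table_bytes(repository_bytes, chunk_size=8_000, hash_size=20):
    """RAM needed for hashes alone, ignoring pointers and table overhead."""
    return (repository_bytes // chunk_size) * hash_size


store = HashDedupStore(chunk_size=4)
first = store.write(b"AAAABBBBCCCC")
second = store.write(b"AAAAXXXXCCCC")
print(len(store.chunks), store.stored_bytes())
print(hash_table_bytes(10**12), hash_table_bytes(100 * 10**12))
```

Two 12-byte writes share two of their three chunks here, so only four unique chunks are kept. The sizing function gives 2.5 GB of hashes for a 1 TB repository and 250 GB for 100 TB, matching the slide's figures before any pointer scheme or redundancy is added.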
  • 31. ProtecTIER vs. Tivoli Storage Manager
    Both solutions offer the benefits of target-side deduplication:
    • Greatly reduced storage capacity requirements
    • Lower operational costs, energy usage and TCO
    Complementa