Centera Vs Filesystem

    T E C H N O L O G Y B R I E F

    Copyright The TANEJA Group, Inc. 2009. All Rights Reserved. 87 Elm Street, Suite 900, Hopkinton, MA 01748. Tel: 508-435-5040. Fax: 508-435-1530. www.tanejagroup.com

    Archiving Beyond File Systems: Object Storage

    EMC Centera And Disk Archival

    January 2009

    Disk-based archiving answers many challenges in an organization, but this strong trend also creates questions for end users. Specifically, because of their distinct architectural approaches, Taneja Group sees some confusion regarding the question of whether to deploy a traditional file system or an object storage (e.g. Content Addressed Storage, or CAS) approach in support of an enterprise archival initiative. While it may not appear critical at initial deployment, we believe that the wrong choice in the file system vs. object storage question will lead to far-ranging challenges that compound over the course of an archive's lifetime.

    Taneja Group has spent significant time researching object storage archiving, and we firmly believe that Content Addressed Storage provides differentiated business value and a lower total cost of ownership over traditional file system based approaches for long-term online disk archival requirements. In this brief we will examine the world of file system based archiving, then provide a comparative look at the advantages of a CAS solution such as EMC Centera.

    Changed Game: Disk Archival

    Taneja Group has spent many hours speaking with both prospective and existing disk archival end users. Across all of these interactions, one commonality comes through clearly: disk archival has changed the game with fundamentally unique requirements that distinguish it from the tape and optical world. We find that some end users come to this realization early in their selection process, while others discover after their initial deployment that they have a new kind of beast on their hands.

    Some of the key characteristics that we see defining the unique and emerging requirements of disk archival can be summarized as follows:

    Hyper-scalability. As disk-based archiving becomes the preferred method for long-term content preservation, we have seen the need for unprecedented scalability reaching into the tens, hundreds and thousands of terabytes. We observe that these growth rates are further compounded as some administrators retain as much content as possible in the readily accessible disk medium as opposed to sending data to an offline static archive on tape media. The speed and ease of use of disk-based archives has in fact made it practical for administrators to create what Taneja Group defines as Active Archives: disk-based archives where an organization's information is likely to be retained for long periods of time, is moved to the archive much earlier than ever before, and is used from there rather than from primary storage. By doing so, the Active Archive becomes a very cost effective extension of primary storage, allowing an organization to better manage primary storage capacity utilization and reduce the overall cost of storage and its management.

    Centralized archives. A properly architected disk-based archive changes stored data into a readily available, highly usable information asset. Because of this fact, we have seen organizations increasingly approaching their disk archives from an infrastructure-wide perspective. Specifically, we observe the trend that organizations want to deploy a centralized archiving platform in support of all relevant business operations. This trend towards centralized archives is driven by a number of factors, including total cost of ownership, internal governance, regulatory compliance, and storage consolidation projects across an organization. We have observed that in a high-growth disk archive, the alternative approach of supporting individual archive silos on a per-application basis has proven itself to be fundamentally unmanageable as these repositories grow in capacity over time.

    Dynamic application support. Because disk-based archiving often touches many applications (e.g. content management, email, file data, proprietary applications), disk-based solutions must be able to provide an abstracted view into all of the supported applications in a seamless fashion. This manner of dynamic application support has been historically absent in disk-based archiving solutions that instead were structured as application silos, each with their own archival content associations.

    Going further, we have observed that disk-archiving solutions are increasingly required to support multiple views across all of these applications, providing the end users with the ability to perform complex, simultaneous queries for data based on a range of programmable, business-relevant characteristics (e.g. various content attributes, usage history, and application associations).

    Long-term online. One of the interesting but little noted qualities of disk-based archiving is its tendency to become an attractor for more and more archival content. Regularly, we speak with end users who share that their growth rates in disk archives have exceeded their best projections prior to deployment. Upon examination, the reasons become clear: disk-based archives, because of their online and always available status, transform an organization's traditional relationship with archived content. Specifically, disk archives have enabled users to access and retrieve stored content within the context of their normal usage patterns. The historical retrieval gap that prevented offline and nearline archive content from playing an active role in real-time business has been removed.

    As a result, today, archived information is playing a more strategic role in workflows and business processes. With this increased access to information, the data repositories are growing at an accelerated rate with an ever-increasing requirement for immediate access. Our client engagements show that this general "always on" quality of disk-based archiving will persist over the lifetime of the archive, creating the challenging requirement that solutions be both supportable over many decades and still always available to users, on demand.

    The File System Challenge

    Given the unique characteristics of disk archiving outlined above, it is no wonder that we see increasing numbers of end users asking serious questions regarding the ability of their traditional file systems to deploy, scale, and manage disk archives effectively.

    The various questions regarding file systems result from one core technical issue: traditional file systems access and manage data in a hierarchical fashion, with significant dependencies on both the application and operating systems with which they are associated. As a result of that decades-old design principle, traditional file systems face undeniable challenges when it comes to supporting an enterprise disk archive with the profile provided above. Taneja Group has grouped these challenges into four general categories that we encourage end users to consider in their disk archival evaluation process.

    Challenge: File System Lock-In

    Because file systems straddle the kernel and user levels of a computing system, they create necessary dependencies on both the operating systems (OS) and applications of their hosts. Over the years, these OS and application dependencies have fostered sophisticated software innovations that have abstracted file systems in appropriate and useful ways (e.g. cluster file systems, virtual machines, application clustering).

    However, when placed in the context of today's disk-based archiving demands, these sophisticated augmentations to file systems are of little to no assistance in freeing the archive from lock-in to a specific application and OS.

    Specifically, the challenge resides in how file systems store and retrieve data. File systems store data in a hierarchical fashion, always relying on the data's placement within a file and directory structure for its storage and retrieval. As a result of this approach, traditional file systems cannot create an abstraction layer for archival data that treats stored data as an independent data object. In other words, all data stored via a file system is tightly associated with both its application and the OS that supports it.

    In the context of long-term disk archiving, this tight coupling of application and OS creates lock-in challenges on two fronts. First, it represents a management challenge for archiving content across multiple applications (and operating systems) in a centralized manner. Second, file systems pose a viability risk to the archive over time as they obsolesce along with applications and operating systems, thereby forcing obsolescence onto the captive archived data.

    Challenge: File System Growth

    As a file system grows in relation to its operating system and application, it eventually encroaches on the outer bounds of its available address space for storing data. The practical implication of hitting this boundary is a noticeably negative impact on performance. This is a very common IT concern, and it is especially well known to anyone who has ever faced a growing departmental file server. With today's dominant enterprise file systems (e.g. NTFS for Windows environments and the various Linux-based file systems), the maximum accessible limit hovers effectively around 2 terabytes per file system. Before reaching that capacity boundary, users will proactively extend their production environment into a new file system that provides a new address space onto which data can be stored.

    The requirement to migrate a production environment to a new file system is typically a time-consuming and manually intensive task. In the context of disk-based archiving, this manner of file system growth management quickly becomes untenable. With archives that regularly range into the multiple terabytes in size and continue along that growth trajectory, the need to continually manage the scaling and migration of multiple file systems and their associated applications constitutes a massive challenge.
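
The scaling burden described above can be made concrete with a little arithmetic. The sketch below is only an illustration: it takes the roughly 2 terabyte per-file-system ceiling cited in this brief at face value, while the 10 TB starting size and 50% annual growth rate are assumed example inputs, not measured figures.

```python
import math

# Practical per-file-system ceiling cited in the text (about 2 TB).
FILE_SYSTEM_LIMIT_TB = 2.0


def file_systems_needed(archive_tb: float) -> int:
    """Minimum number of separate file systems needed to hold the archive."""
    return math.ceil(archive_tb / FILE_SYSTEM_LIMIT_TB)


def project(start_tb: float, annual_growth: float, years: int):
    """Yield (year, capacity_tb, file_systems) under compound annual growth."""
    capacity = start_tb
    for year in range(years + 1):
        yield year, capacity, file_systems_needed(capacity)
        capacity *= 1 + annual_growth


# Assumed inputs for illustration only: 10 TB today, growing 50% per year.
for year, capacity, count in project(start_tb=10, annual_growth=0.5, years=5):
    print(f"year {year}: {capacity:7.2f} TB -> {count} file systems")
```

Under these assumed inputs the number of file systems to provision, mount and migrate grows from 5 to 38 within five years, which is the management problem the section describes.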

    Challenge: File System Access

    When a user establishes a given file system as an interface into an archival pool, they have made a commitment to begin layering data into increasingly complex hierarchies. Even when that single archiving file system is presented to multiple applications through a network mount (e.g. an NFS or CIFS interface), it still represents a unified, deep hierarchy of directory and file data. As the archive grows, the file system will have to expend increasingly more time performing deep queries into its directories to extract data. More critically, the data being stored is frozen in its relation to both its application and the other data stored around it.

    This tight coupling prevents the file system from being able to easily support dynamic data views into the environment across multiple applications and operating systems. Based on our client work, Taneja Group has seen that the true business value of disk-based archiving is derived from the ability of multiple archiving applications (e.g. content management, email, voice and video recordings, medical images, proprietary applications, file data, etc.) to communicate with each other in a seamless fashion. For this reason, we are confident that the restricted access flexibility of a traditional file system approach is increasingly unacceptable to an organization's end users.

    Challenge: File System Backup

    File systems in an archive solution have all the management challenges already discussed and no built-in mechanism for assuring content integrity and authenticity. As such, file systems can be easily corrupted. Knowing this, a common best practice is to conduct frequent backups, which further adds cost and complexity to the management burden of using file systems for archiving. With object storage approaches increasingly common, the advantages of their end-to-end data integrity and authenticity have become obvious to end users.

    Object Approaches to Archiving

    Looking beyond traditional file system based approaches to disk archiving, what else is available? Taneja Group knows that viable alternatives are in the market. In particular, a distributed object storage approach to disk archiving has been in use by many organizations for over half a decade. Because of its strikingly different architecture and additional use cases, the implications of object storage archiving are now clearly comprehended by the enterprise community.

    We have seen that the difference in approach is exemplified by the market-defining EMC Centera archival appliance. Centera utilizes a distributed object software model known as Content Addressed Storage (CAS). CAS-based archiving differs from traditional file system-based approaches in several key respects that have had a profound impact for all deployments. Most notably for this discussion, CAS does not utilize traditional file systems, nor does it need to utilize specified storage media, nor does it require kernel level integration with host applications. Clearly, the compounding effect of these differences adds up to a fundamentally different kind of archive architecture and a lower total cost of ownership. However, the most salient, driving difference resides in how CAS stores and retrieves data; in other words, what CAS does instead of using a hierarchical file system.

    To help organizations cut through the complexity of evaluating potential CAS-based solutions versus traditional file systems, we have summarized the following points of differentiation brought to the table by CAS:

    CAS: Flat address space

    Unlike traditional file systems, CAS does not rely on a hierarchical scheme of directories and files to organize data. Rather, such solutions rely on unique hash-code identifiers (a digital fingerprint) specific to each unique content element. This content-based addressing schema, which encapsulates entire files or sets of data independently from any file system, enables CAS to create what Taneja Group calls archival objects. We define archival objects as digital assets that have been processed by an object-based addressing technology and enhanced with metadata attributes that enable the asset to be utilized as an independent resource. With CAS, a unit of data and its metadata are inextricably linked, and captured as a unique object stored within a flat address space. The most important results of storing archival objects in this flat address space are (1) the content authenticity of archived objects is assured and (2) the archived objects are now abstracted and independent of their application and operating system associations. This translates into high flexibility with regard to the number and type of applications and operating systems with which CAS can be deployed.
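
The idea of addressing content by its fingerprint can be sketched in a few lines. This is a toy in-memory illustration, not the Centera implementation or its API: the class name, the method names and the choice of SHA-256 as the hash function are all assumptions made for the example.

```python
import hashlib


class ContentAddressedStore:
    """Toy CAS: every object lives in one flat namespace keyed by its hash."""

    def __init__(self):
        # content address -> bytes; there are no directories or file names
        self._objects = {}

    def put(self, data: bytes) -> str:
        """Store data and return its content address (hypothetical API)."""
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Return the data, verifying it still matches its own address."""
        data = self._objects[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored content no longer matches its address")
        return data


store = ContentAddressedStore()
address = store.put(b"scanned invoice, page 1")
assert store.get(address) == b"scanned invoice, page 1"
```

Because the address is derived from the content itself, any later change to the stored bytes is detectable on read, which is one way to picture the content authenticity point above. The address also carries no path, host or application name, which is what makes the object independent of where it came from.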

    By freeing archival storage from the constraints of hierarchical, limited capacity file systems, CAS reduces administrative complexity. Moreover, since CAS removes file system complexity and fragility, and increases the integrity of stored data objects, organizations can rely solely on replication for disaster recovery and eliminate on-site archive backup. As a result of this one-two punch against management overhead, Taneja Group has observed cases where organizations can easily manage orders of magnitude more archived information using a CAS solution vs. tape, optical or traditional file system based storage. In one observed case, it was more than 100 times as much information.

    CAS: A Single Instance Store

    CAS metadata is specific to each user's use of the content, yet points to the same piece of unique content. The result can be dramatic reductions in the quantity of storage required for an archive.
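
A minimal sketch of single-instance storage, under the same assumptions as any toy model: the `archive` function, the record fields and the use of SHA-256 are hypothetical and chosen only to show how several metadata records can point to one stored copy.

```python
import hashlib

blobs = {}    # content address -> bytes, each unique content stored once
records = []  # one metadata record per use, each pointing at an address


def archive(data: bytes, owner: str, application: str) -> str:
    """Record one use of the content; store the bytes only if they are new."""
    address = hashlib.sha256(data).hexdigest()
    blobs.setdefault(address, data)  # a repeat of known content adds no copy
    records.append(
        {"owner": owner, "application": application, "address": address}
    )
    return address


attachment = b"quarterly report contents"
archive(attachment, owner="alice", application="email")
archive(attachment, owner="bob", application="content-management")

print(len(records), "metadata records,", len(blobs), "stored copy")
# prints: 2 metadata records, 1 stored copy
```

Each user or application keeps its own metadata record, but the identical attachment consumes storage only once.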

    CAS: Metadata

    By storing metadata about content use, applications can often complete given information requests by searching the storage-based metadata and never open the content objects. The result is increased application performance. More profound is the ability to do cross-application information queries without using application cycles. This is possible because (1) content and metadata stored within CAS are application, file and operating system independent, (2) metadata is searchable and (3) specific to EMC Centera CAS, there is a search engine available in the repository. Easy cross-application querying provides immense benefits for day-to-day business, governance and compliance.
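
To illustrate answering a request from metadata alone, the sketch below queries a small in-memory catalog across applications without reading any content object. The field names, sample values and the `find` helper are hypothetical; they do not represent the Centera search engine or its query interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectMetadata:
    address: str      # content address of the stored object (placeholder)
    application: str  # application that archived the object
    owner: str
    year: int


# Sample catalog spanning three applications; values are invented.
catalog = [
    ObjectMetadata("a1", "email", "alice", 2007),
    ObjectMetadata("b2", "content-management", "bob", 2008),
    ObjectMetadata("c3", "medical-imaging", "alice", 2008),
]


def find(entries, **criteria):
    """Return addresses whose metadata matches every given field value."""
    return [
        entry.address
        for entry in entries
        if all(getattr(entry, key) == value for key, value in criteria.items())
    ]


# Cross-application query: everything owned by alice, whatever the source.
print(find(catalog, owner="alice"))  # prints: ['a1', 'c3']
```

Only the matching addresses are returned; the content objects themselves would be fetched later, and only if the caller actually needs them.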

    CAS: Application level access

    Because of the unique content-based addressing approach of CAS solutions, they are able to integrate directly with application environments via APIs. Unlike file systems that have kernel level dependencies on the operating system, CAS solutions extend their archival support cleanly within the user space of a given application. There are several significant impacts of this design approach. First, it means that multiple applications can simultaneously leverage the same centralized CAS archival storage infrastructure. Second, it means that very specific archiving management attributes (e.g. aging of data, protection of data, and access to data) can be executed on a per-application basis. These capabilities create a complete chain of information custody, allowing data to be completely controlled, managed and authenticated after leaving the primary application. These are capabilities not native to traditional file system archival approaches.

    CAS: Media Independence

    File systems and the operating systems on which they depend are designed and certified for deployment with specific disk types (e.g. SCSI, ATA) and protocols (e.g. Fibre Channel, iSCSI). By contrast, CAS-based archiving solutions are truly media independent. Because CAS leverages an object-based model for its indexing, it remains neutral to any storage media on which it resides. The implications for a long-term online disk archive are therefore very significant: when a CAS archival solution is deployed, it can migrate to new storage media over time without disturbing the integrity of the archived objects. For long-term disk-based archiving, this represents significant risk mitigation and investment protection that is not readily achievable with traditional file system archiving solutions.

    CAS: High Scalability

    With traditional archive solutions, scaling into higher storage capacities over time requires a constant awareness of the status of the file system versus remaining available address space. As the file system reaches its maximum capacity, administrators must expand the entire file system "silo" (operating system, file system, application) in order to scale the archive. By contrast, CAS-based archival solutions can expand in an open fashion into extremely high capacities (multiple petabytes) due to their flat address space. In addition, because CAS solutions can abstract themselves across multiple applications and storage media, they enable very granular and dynamic online scaling to take place for both application hosts and storage capacities, each according to their immediate demands.

    CAS: Self-managing

    Management of the archive infrastructure constitutes a major point of differentiation between the CAS object-model approach and traditional file systems. With file system-based archives, the administrator faces a familiar range of tasks in deployment, recovery, migration, and change management of the silo. By contrast, CAS-based approaches leverage their non-hierarchical architecture to distribute management controls across the entire archive infrastructure. For example, if a Centera disk or node fails, the archive cluster knows how to self heal without manual intervention. This distributed management structure extends to cover the deployment, scaling, recovery and protection of all the archival objects being stored by Centera. As a result of this approach, Centera removes a significant number of mundane "touches" from the disk-based archive that still exist with traditional file system based approaches. As an archive scales to higher capacities with more application associations, these self-managing qualities of CAS add up to a meaningful increase in overall environment efficiency.
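
One way to picture self-healing is a cluster that restores the replica count of every affected object when a node disappears. The sketch below is a generic illustration under loud assumptions: the two-copy protection level, the least-loaded placement rule and the `place` and `heal` functions are invented for the example and are not a description of how Centera actually protects or regenerates objects.

```python
REPLICAS = 2  # assumed protection level for this sketch


def place(nodes, address, data):
    """Store the object on the REPLICAS least-loaded nodes."""
    targets = sorted(nodes, key=lambda name: len(nodes[name]))[:REPLICAS]
    for name in targets:
        nodes[name][address] = data


def heal(nodes, failed):
    """Drop a failed node and re-copy its objects from surviving replicas."""
    # Only the addresses are used; the failed node's data is treated as gone.
    lost_addresses = list(nodes.pop(failed))
    for address in lost_addresses:
        holders = [name for name in nodes if address in nodes[name]]
        if not holders:
            continue  # no surviving copy to regenerate from
        spares = sorted(
            (name for name in nodes if address not in nodes[name]),
            key=lambda name: len(nodes[name]),
        )
        missing = max(0, REPLICAS - len(holders))
        for name in spares[:missing]:
            nodes[name][address] = nodes[holders[0]][address]


cluster = {"node1": {}, "node2": {}, "node3": {}}
for i in range(6):
    place(cluster, f"object-{i}", f"payload {i}".encode())

heal(cluster, "node1")  # simulate a node failure; no operator steps needed
```

After `heal` runs, every object is again held by two of the remaining nodes, without an administrator deciding where each copy should go.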

    Considered together, these qualities of CAS demonstrate that there are distinct advantages to creating disk-based archives outside of traditional file systems.

    Taneja Group Opinion

    We know very well the challenges that end users face in the deployment of disk archives. End users need to ask whether or not they desire a disk-based archive that provides high levels of scale, is readily available, can survive for long durations, and possesses minimal management requirements. End users with those requirements will find traditional file system-based approaches to disk archiving inadequate.

    As indicated above, the Taneja Group has observed there are many critical advantages to be gained by leveraging object-based storage in the form of CAS disk-based archiving solutions, such as EMC Centera. By stepping outside of the silo effect created via hierarchical file systems, CAS opens up a wide new range of functionality that allows a complete reconsideration of the role archival information plays in an organization.

    Since we first wrote on this subject more than 6 years ago, we have observed several things. First, we have seen these distinctions become self-evident, as more users adopt and scale CAS solutions to capacities that clearly demonstrate the unique capabilities of object storage. Second, because of CAS, and EMC Centera in particular, we have seen organizations change how they use archiving. When first introduced, disk-based archives replaced tape and optical solutions, which had been relegated to deep archives because of their lack of information retrieval speed. These were archives an organization would use to store information that they hoped they would seldom need. However, today we see a new storage dilemma for organizations where archiving is helping. Specifically, organizations that are being asked to store 30%, 50% and sometimes 100+% more information with flat or reduced IT budgets are moving information that can be archived to the archive much more quickly. They are creating what we have already discussed as Active Archives. These Active Archives further lower an organization's cost per megabyte to store information at the same time they are being leveraged to take large quantities of information out of the organization's backup streams. These Active Archives reduce backup costs and simplify the organization's IT infrastructure because the information no longer lives on primary storage and no longer needs to be backed up. However, our observation is that these organizations only create Active Archives when they are confident in the robustness, scalability, performance and cost effectiveness of their archive platform. With thousands of customers and hundreds of PBs of product shipped since its inception, EMC Centera is the shining example of how organizations are using object-based storage to create deep archives and this new generation of archives, Active Archives.

    NOTICE: The information and product recommendations made by the TANEJA GROUP are based upon public information and sources and may also include personal opinions both of the TANEJA GROUP and others, all of which we believe to be accurate and reliable. However, as market conditions change and are not within our control, the information and recommendations are made without warranty of any kind. All product names used and mentioned herein are the trademarks of their respective owners. The TANEJA GROUP, Inc. assumes no responsibility or liability for any damages whatsoever (including incidental, consequential or otherwise), caused by your use of, or reliance upon, the information and recommendations presented herein, nor for any inadvertent errors which may appear in this document.