Huawei HyperImage Technical White Paper.pdf

  • Doc. code

    HyperImage

    Technical White Paper

    Issue 01

    Date 2012-04-20

    HUAWEI TECHNOLOGIES CO., LTD.

  • HyperImage Technical White Paper CONFIDENTIAL

    2013-07-29 Copyright Huawei Technologies Co., Ltd. 2012. All rights reserved Page2, Total15

    Copyright Huawei Technologies Co., Ltd. 2012. All rights reserved.

    No part of this document may be reproduced or transmitted in any form or by any means

    without prior written consent of Huawei Technologies Co., Ltd.

    Trademarks and Permissions

    and other Huawei trademarks are trademarks of Huawei Technologies Co., Ltd.

    All other trademarks and trade names mentioned in this document are the property of their

    respective holders.

    Notice

    The purchased products, services and features are stipulated by the contract made between

    Huawei and the customer. All or part of the products, services and features described in this

    document may not be within the purchase scope or the usage scope. Unless otherwise

    specified in the contract, all statements, information, and recommendations in this document

    are provided "AS IS" without warranties, guarantees or representations of any kind, either

    express or implied.

    The information in this document is subject to change without notice. Every effort has been

    made in the preparation of this document to ensure accuracy of the contents, but all

    statements, information, and recommendations in this document do not constitute the

    warranty of any kind, express or implied.

    Huawei Technologies Co., Ltd.

    Address: Huawei Industrial Base

    Bantian, Longgang

    Shenzhen 518129

    People's Republic of China

    Website: http://www.huawei.com

    Email: [email protected]

    Chapter 1 Overview

    As the requirements of centralized storage applications become stricter, users

    need online protection for data to make the backup window shorter. The snapshot

    technology is effective in preventing data loss on online storage devices. More

    and more storage devices use the snapshot technology. What is a snapshot?

    How does a snapshot work? What is the function of a snapshot in a storage array?

    Defined by the Storage Networking Industry Association (SNIA), a snapshot is a

    fully usable copy of a defined collection of data that contains an image of the data

    as it appeared at the point in time at which the copy was initiated. A snapshot may

    be either a duplicate or a replicate of the data it represents.

    Many technologies can be used to implement the snapshot function. The most

    widely used two are the virtual snapshot technology and the split mirror

    technology. Huawei OceanStor storage arrays support both the virtual snapshot

    technology and the split mirror technology. So, users have multiple ways to

    protect data online. This document only describes the virtual snapshot technology.

    For information on the split mirror technology, see the OceanStor Storage Array

    HyperClone Technical White Paper.

    Chapter 2 Virtual Snapshot of the OceanStor

    Storage Array: HyperImage

    2.1 Basic Work Principle of the HyperImage

    The virtual snapshot of the OceanStor storage array is called HyperImage. The

    HyperImage can generate a virtual consistent image of source LUN at a certain

    point in time, thus quickly obtaining a consistent duplicate of data on the source

    LUN without interruption to services. The duplicate is available immediately after

    being generated. Writing data to or reading data from the duplicate has no

    impact on the source data. Therefore, the snapshot technology can implement

    online backup, data analysis, application tests, and so on. The HyperImage is

    implemented through the combination of the mapping table and the copy-on-write

    technology. The work principle is described as follows:

    1) Writing data to a storage array on which no snapshot is created is the same

    as writing data to a storage array without the snapshot function, that is,

    modifications to data are directly written to the disk block where the original

    data is. The original data is overwritten and not saved.

    2) Before being activated in a storage array, the HyperImage must be

    configured properly. First, a snapshot resource pool must be created, and

    then a source LUN must be selected. A snapshot resource pool consists of

    several resource LUNs. When the capacity of the resource pool is

    inadequate, new resource LUNs can be added to the resource pool

    dynamically.

    3) When the HyperImage is activated (this moment is called the snapshot point

    in time), the system creates a mapping table. The mapping table records the

    mapping between each piece of original data on the source LUN and its

    physical address. Resource LUNs store the original data of a disk block

    when that block is first updated after the snapshot point in time. Original

    data of any later update is overwritten rather than saved in the resource

    LUNs. Before new data is written to the source LUN, all address pointers in

    the mapping table point to the source LUN, and resource LUNs are empty, as

    shown in Figure 2-1.

    Figure 2-1 Activating the HyperImage

    4) When new data is to be written to the source LUN, the original data in the

    place where the new data is to be written onto is moved to a resource LUN,

    and the mapping table changes the corresponding mapping at the same time

    and records the new location of the original data. Then, the new data is

    written to the source LUN. The copy-on-write technology is applied in this

    process. As shown in Figure 2-2, when data s is to be written to block 3 of

    the source LUN, data d on the source LUN is moved to a resource LUN first.

    At the same time, the mapping on the snapshot LUN is changed. The

    address pointer points to the new position of data d on the resource LUN.

    Then data s is written onto the source LUN. Data on the source LUN is

    changed. Data d that is previously stored on the source LUN at the snapshot

    point in time is now stored on the resource LUN.

    Figure 2-2 Writing data onto the source LUN

    5) If new data is written to the same block of the source LUN later, the system

    checks the mapping table to find that the data in the block at the snapshot

    point in time has been moved to a resource LUN, and then, the new data is

    directly written to the same disk block. The original data is overwritten and

    not moved onto a resource LUN, that is, only one copy-on-write is

    implemented in the same position. Therefore, the data consistency at the

    snapshot point in time is effectively protected. See Figure 2-3.

    Figure 2-3 Overwriting data in the same position since the second update

    6) If the data at the snapshot point in time is required to be recovered, the

    recovery can be immediately implemented by the rollback function of the

    snapshot data. Through the rollback, the data in the storage array can be

    recovered to that at the snapshot point in time. Therefore, data is protected

    from being lost even if the source LUN is damaged by misoperations or

    viruses after the snapshot point in time. As shown in Figure 2-3, if data is

    required to be rolled back to that at the snapshot point in time, data d

    replaces data m in the block 3 of the source LUN. It must be noted that the

    snapshot rollback is irreversible. Through the rollback, data can be

    recovered to a specified time, but data processed between the specified time

    and the error time is lost.

    7) When the HyperImage stops, data in the mapping table and resource LUNs

    is emptied, and data at the snapshot point in time becomes unusable.

    As the preceding process shows, after the HyperImage is activated, the data at

    the snapshot point in time remains available regardless of how the source data

    is read, written, or changed, provided that the HyperImage is not stopped.
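    The steps above can be sketched in a few lines of Python. This is a minimal
    illustration under stated assumptions, not the array's implementation: the
    class name CowSnapshot, the list-based source LUN, and the dictionary that
    stands in for both the mapping table and the resource LUN are all
    hypothetical. Writes to the snapshot itself are not modeled.

```python
class CowSnapshot:
    """Toy copy-on-write snapshot of a block device held in a Python list."""

    def __init__(self, source):
        self.source = source      # live source LUN (list of blocks)
        self.resource = {}        # resource LUN: block index -> original data
        # An absent key plays the role of a mapping-table pointer that
        # still points at the source LUN.

    def write(self, block, data):
        # Copy-on-write: save the original only on the first update.
        if block not in self.resource:
            self.resource[block] = self.source[block]
        self.source[block] = data

    def read_snapshot(self, block):
        # Follow the mapping: resource LUN if moved, else source LUN.
        return self.resource.get(block, self.source[block])

    def rollback(self):
        # Restore every saved block; changes made after the snapshot are lost.
        for block, data in self.resource.items():
            self.source[block] = data
        self.resource.clear()


lun = ["a", "b", "c", "d"]
snap = CowSnapshot(lun)
snap.write(3, "s")    # first update: "d" is moved to the resource LUN
snap.write(3, "m")    # second update: overwrites "s", nothing is copied
image = [snap.read_snapshot(i) for i in range(len(lun))]
```

    Here image still reads a, b, c, d while the source LUN holds m in the last
    block, and calling snap.rollback() puts d back in place of m.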

    For hosts that use a cache management mechanism, the HyperImage alone cannot

    guarantee data consistency, because some data in the host cache may not yet

    have been delivered to the disks of the OceanStor storage array when the

    snapshot is taken. In this type of application, the HyperImage has to work

    with the snapshot agent on the host to keep data consistent. For the work

    principle of the HyperImage working with the snapshot agent, see the

    description of application scenarios.

    2.2 Features of the HyperImage

    2.2.1 Zero Backup Window

    The traditional backup reduces the performance of an application server (AS),

    even to an unacceptable level. So traditional backup tasks have to be processed

    when the AS is shut down or the service amount is comparatively small. The

    backup window refers to an interval of time during which a set of data can be

    backed up and that can be accepted by the applications. That is to say, the

    backup window is the maximum downtime accepted by the applications. A

    backup task through the HyperImage can be processed online, and the backup

    window is near to zero. The AS is not required to be down.

    2.2.2 Saving Disk Capacity

    When obtaining a consistent duplicate of the source LUN at the snapshot point in

    time, the user only needs to save the original data of the first update after the

    snapshot point in time on a resource LUN. The capacity of the resource LUN has

    nothing to do with that of the source LUN, but is determined by the variation of

    data on the source LUN after the snapshot point in time. The capacity of the

    resource LUN never exceeds that of the source LUN. When the variation of data

    on the source LUN is small, the HyperImage obtains a consistent duplicate of the

    source LUN by using a small disk space. The consistent duplicate can be used by

    other test services.

    Supporting sharing of resource LUNs (originated with Huawei), the OceanStor

    storage array enables the snapshot services of the entire system to occupy less

    disk capacity. Sharing of resource LUNs means that multiple source LUNs can

    share one resource LUN or use different resource LUNs, but all resource LUNs

    are in the resource pool, and data on each resource LUN is shared by the others.
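    As a rough worked example of the capacity argument, the space consumed in
    the resource pool can be estimated from the set of distinct blocks changed
    after the snapshot point in time. The function name and the 4096-byte block
    size are assumptions made for illustration; the paper does not state the
    copy-on-write granularity.

```python
def resource_space_used(changed_blocks, block_size):
    """Bytes held on the resource LUN: one saved copy per distinct changed block."""
    return len(set(changed_blocks)) * block_size


writes = [3, 3, 7, 3, 9]    # block indices written after the snapshot point
used = resource_space_used(writes, block_size=4096)
```

    Five writes touch only three distinct blocks, so only three original blocks
    are saved, however large the source LUN is.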

    2.2.3 Quick Data Recovery

    Backup data generated by traditional offline backup tasks cannot be read online.

    Obtaining a usable duplicate of the source data at the backup point in time

    requires a long-time data recovery process. The HyperImage can obtain the data

    on the source LUN at the snapshot point in time by reading the snapshot LUN

    directly. When data on source LUN is damaged, data on the source LUN at the

    snapshot point in time can be directly recovered from the snapshot LUN through

    the convenient data rollback.

    2.2.4 Continuous Data Protection Through Cyclical Timing Snapshots

    The OceanStor storage array supports creating virtual snapshots of a source LUN

    at multiple points in time. The user can make a timing policy to activate and stop

    the HyperImage. When automatic operations upon snapshots at multiple points in

    time move forward along the time axis, continuous data protection at a low cost is

    achieved.
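    A timing policy of this kind can be pictured as a rotating window of
    snapshot points. The sketch below is hypothetical: the function name, the
    keep parameter, and the rule of stopping the oldest snapshot first are
    assumptions, since the paper does not describe the policy format.

```python
from collections import deque


def rotate_snapshots(activation_times, keep):
    """Return the snapshot points still active after each activation.

    The oldest snapshot is stopped once more than `keep` are active.
    """
    active = deque()
    history = []
    for t in activation_times:
        active.append(t)          # activate a new snapshot
        if len(active) > keep:
            active.popleft()      # stop the oldest snapshot
        history.append(list(active))
    return history


history = rotate_snapshots(["13:00", "14:00", "15:00", "16:00"], keep=3)
```

    After the fourth activation the window holds the 14:00, 15:00, and 16:00
    snapshots, and the 13:00 snapshot has been stopped.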

    2.2.5 Snapshot Consistent Group

    In online transaction processing (OLTP) applications, snapshots of multiple

    source LUNs must be created at the same point in time, so that related

    application data on different LUNs stays consistent. For example, in the Oracle

    database, management data, service data, and log information are scattered on

    different source LUNs. The snapshots of the source LUNs of the three parts of

    data should be created at the same point in time, so that the management data,

    service data, and logs can be recovered to the same point in time during data

    recovery. Otherwise, data relevance is lost because the data of three parts cannot

    be recovered to the same time, and the data recovery becomes meaningless.

    The snapshot consistent group of the OceanStor storage array solves this

    problem. It freezes data on multiple source LUNs at the same snapshot point in

    time and then obtains consistent snapshots of these source LUNs at the same

    point in time.
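    The idea of freezing several LUNs together can be sketched with a single
    lock that writers and the snapshot operation share. The class and method
    names are hypothetical, and the full copies taken here only keep the
    example short; they are not how the array stores snapshot data.

```python
import threading


class ConsistencyGroup:
    """Freeze writes on several LUNs while their images are taken together."""

    def __init__(self, luns):
        self.luns = luns                   # name -> list of blocks
        self._freeze = threading.Lock()    # held while snapshots are activated

    def write(self, name, block, data):
        with self._freeze:                 # writes wait during activation
            self.luns[name][block] = data

    def snapshot_all(self):
        with self._freeze:
            # Full copies keep the sketch short; a real array would only
            # create mapping tables here.
            return {name: list(blocks) for name, blocks in self.luns.items()}


group = ConsistencyGroup({"data": ["d0"], "log": ["l0"]})
images = group.snapshot_all()
group.write("data", 0, "d1")
group.write("log", 0, "l1")
```

    Because no write can land between the two copies, the data image and the
    log image always describe the same moment.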

    2.2.6 Influence of Snapshots on Performance

    Using the HyperImage in a disk array has an influence on the system

    performance. When the HyperImage is activated, the read performance of the

    source LUN is not influenced, but write operations become more complex: the

    first write to a block after the snapshot point in time also copies the

    original data to a resource LUN. When the variation of data on the source LUN

    is great, these extra operations increase, thus reducing the system

    performance to some degree.
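    The write overhead can be estimated with a simple count of back-end disk
    operations. The cost model below is an assumption for illustration only:
    it charges three operations for the first write to a block (read the
    original, write it to the resource LUN, write the new data) and one for
    every later write, and it ignores mapping-table updates and caching.

```python
def backend_io_count(writes):
    """Count back-end disk operations for host writes under copy-on-write.

    First write to a block: read original + write it to the resource LUN
    + write new data (3 operations). Later writes to that block: 1 operation.
    """
    copied = set()
    ops = 0
    for block in writes:
        if block not in copied:
            copied.add(block)
            ops += 3
        else:
            ops += 1
    return ops


scattered = backend_io_count([0, 1, 2, 3])    # every write triggers a copy
repeated = backend_io_count([0, 0, 0, 0])     # only the first write does
```

    Four writes spread over four blocks cost twice as many operations as four
    writes to one block, which is why a large variation of data on the source
    LUN weighs more heavily on performance.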

    2.3 Applications of the HyperImage

    2.3.1 Rapid Data Backup and Data Recovery

    The OceanStor HyperImage is differentiated from other backup methods by its

    speed. Using the HyperImage does not involve data replication. Therefore, no

    matter how large the amount of data is, it is very quick to create a snapshot.

    Like a photo, a snapshot through the HyperImage is created instantly and

    records the data as it is at the moment of creation. When data is backed up,

    the HyperImage reduces the backup window to nearly zero.

    When data on the source LUN is damaged due to a virus invasion or human

    factors, the HyperImage can quickly roll back to the data at the specified snapshot

    point of time to implement rapid data recovery.

    It must be noted that the data backed up by the HyperImage disappears when the

    HyperImage stops. Therefore, in many backup scenarios, the snapshot

    technology is used together with backup software to back up data. The backup

    software reads data from the snapshot and backs data up. In this way, the backup

    window of traditional backup is eliminated.

    2.3.2 Continuous Data Protection

    Another significant feature that differentiates the HyperImage from other backup

    methods is that the backup data takes up a very small storage space. This feature

    enables users to create multiple snapshots of the same piece of source data.

    Users can recover the source data at different points in time by using different

    snapshots. The HyperImage does not only inherit the merits of the traditional

    virtual snapshot technology, but also uses multiple new technologies to flexibly

    support the service continuity solution. Continuous data protection can be

    implemented through multiple snapshots automatically created by the

    HyperImage at different points in time in a specified period. Figure 2-4 shows the

    work principle:

    Figure 2-4 Using the HyperImage to implement continuous data protection

    The following description assumes that the system is set to create a snapshot

    hourly from 1:00 pm. From 1:00 pm to 2:00 pm, a and d (names of two data

    blocks) change into a' and d'. Before a' and d' are written onto the source LUN, a

    and d are saved on the snapshot resource LUN. From 2:00 pm to 3:00 pm, g

    (name of a data block) changes to g'. Before g' is written onto the source LUN, g

    is saved on the resource LUN. From 3:00 pm to 4:00 pm, c and i (names of two

    data blocks) change into c' and i'. Before c' and i' are written onto the source LUN,

    c and i are saved on the resource LUN.

    To recover data at different points in time through the snapshot rollback, do as

    follows:

    1) To recover the data to that at 3:00 pm, write the saved c and i back over

    c' and i' in the current source volume.

    2) To recover the data to that at 2:00 pm, write the saved g back over g' in

    the source volume as recovered to 3:00 pm.

    3) To recover the data to that at 1:00 pm, write the saved a and d back over

    a' and d' in the source volume as recovered to 2:00 pm.
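    The hourly example and its step-by-step rollback can be reproduced with a
    small model. This is a sketch under assumptions: the SnapshotChain class is
    hypothetical, each original block is saved only with the newest snapshot,
    and a rollback discards the snapshots it passes through, which the paper
    does not specify.

```python
class SnapshotChain:
    """Hourly snapshots of one source volume; saved blocks go to the newest one."""

    def __init__(self, volume):
        self.volume = volume     # dict: block name -> current content
        self.snapshots = []      # (label, saved originals), oldest first

    def take(self, label):
        self.snapshots.append((label, {}))

    def write(self, block, data):
        if self.snapshots:
            saved = self.snapshots[-1][1]
            if block not in saved:            # copy-on-write, first update only
                saved[block] = self.volume[block]
        self.volume[block] = data

    def rollback(self, label):
        if label not in [name for name, _ in self.snapshots]:
            raise KeyError(label)
        # Undo the newest interval first, then walk back to `label`.
        while True:
            name, saved = self.snapshots.pop()
            self.volume.update(saved)
            if name == label:
                return


chain = SnapshotChain({k: k for k in "acdgi"})
chain.take("1:00 pm")
chain.write("a", "a'")
chain.write("d", "d'")
chain.take("2:00 pm")
chain.write("g", "g'")
chain.take("3:00 pm")
chain.write("c", "c'")
chain.write("i", "i'")
chain.rollback("2:00 pm")
```

    Rolling back to 2:00 pm first restores c and i, then g, and leaves a' and d'
    in place because those changes were made before 2:00 pm.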

    Continuous data protection through the HyperImage is not truly continuous,

    because the HyperImage cannot create real-time snapshots of the source LUN. The

    minimum interval between two snapshots determines the granularity of the

    continuous data protection. The OceanStor HyperImage uses an advanced

    algorithm to make full use of the resources when providing continuous data

    protection. Multiple snapshots of the same source LUN share a resource pool, so

    that storage capacity is saved and unnecessary copy-on-write is prevented. In

    addition, the overall performance of the storage array is improved.

    2.3.3 Redefining the Usage of the Data

    The consistent images created by the OceanStor HyperImage can be read

    directly. They are available for test, archiving, and query. They protect the

    production system and increase the usage of the backup data, as shown in Figure

    2-5.

    Figure 2-5 Snapshot copy

    2.3.4 Snapshot Agent of the OceanStor Series Storage Product:

    Snapshot Agent

    Databases and certain other applications use a cache management mechanism

    to keep a large amount of data in memory or in other fast storage media, so

    that the response and access performance of the system are enhanced.

    However, the HyperImage works only within the storage array. Data that is

    still in the host memory when the HyperImage starts has not yet been written

    onto the hard disks. As a result, a snapshot taken of these applications

    without further measures can be inconsistent.

    The Snapshot Agent is a tailored application for databases and these applications.

    Its core function is to ensure the integrity and consistency of data on the hard

    disks when data protective measures, such as the HyperImage, are working. In

    this way, the Snapshot Agent ensures that the data is available immediately after

    the recovery. The Snapshot Agent is installed on the database or hosts that use

    the cache management mechanism. Together with the HyperImage, it ensures

    data consistency at the snapshot point in time. Figure 2-6 shows the work

    principle.

    Figure 2-6 HyperImage based on the Snapshot Agent

    1) Before starting the HyperImage, the system notifies the Snapshot Agent

    installed on the host first.

    2) The Snapshot Agent notifies the application of the database that the

    HyperImage is going to start. The database system goes to the backup

    mode, and then performs the "commit" operation, that is, the database

    system delivers the data stored by the application in the host memory to the

    production volumes. This process is called "disk flushing." During the disk

    flushing, data cannot be written to the database.

    3) When the disk flushing succeeds, the Snapshot Agent notifies the system

    that the HyperImage can start.

    4) As soon as the HyperImage starts successfully, the Snapshot Agent is

    notified of recovering the application of the database to the normal mode.
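    The four-step handshake can be walked through with a toy model. Everything
    here is hypothetical (the Database class, the snapshot_with_agent function,
    and the event strings); it only illustrates the ordering of backup mode,
    disk flushing, snapshot start, and return to normal mode.

```python
class Database:
    def __init__(self):
        self.cache = []           # data still in host memory
        self.disk = []            # data on the production volume
        self.backup_mode = False

    def write(self, item):
        if self.backup_mode:
            raise RuntimeError("writes are blocked during disk flushing")
        self.cache.append(item)

    def flush(self):
        self.disk.extend(self.cache)
        self.cache.clear()


def snapshot_with_agent(db, log):
    log.append("array notifies agent")            # step 1
    db.backup_mode = True                         # step 2: backup mode
    db.flush()                                    # step 2: disk flushing
    log.append("agent reports flush complete")    # step 3
    image = list(db.disk)                         # the snapshot is taken
    db.backup_mode = False                        # step 4: normal mode
    log.append("agent resumes database")
    return image


db = Database()
db.write("tx1")
db.write("tx2")
events = []
image = snapshot_with_agent(db, events)
```

    Without the flush in step 2, tx1 and tx2 would still sit in the host cache
    and the image would miss them.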

    Caution:

    The Snapshot Agent developed by Huawei is applicable to only the SUSE 9

    platform and the Oracle 10g database. Now it is going through the certification on

    Veritas-related backup software. It is estimated that the Snapshot Agent can

    obtain the certificate in the second quarter of 2009.

    2.3.5 VSS and HyperImage

    Microsoft has provided the Volume Shadow Copy Service (VSS) since Windows

    Server 2003. The VSS defines the backup application framework on the Windows

    platform. With the VSS, the parties involved in the backup can collaborate

    efficiently with each other. The VSS provides the application programming interface (API) for the

    administrator to automatically create snapshots and perform the backup. With the

    collaboration of the VSS, the storage, backup software, and applications from a

    third party can be smoothly used in the storage and backup process. Thus the

    time and complexity of backup are reduced greatly.

    The VSS defines three different roles to describe the different functions of the

    software entities in the backup process. The three roles are the writer, requester,

    and provider. A writer is an entity that uses data objects, such as a database and

    a mail server. A requester mainly refers to the backup software, such as Veritas

    NetBackup (NBU) and Windows NTBackup. A provider is a snapshot method

    provider, which can be an operating system (OS), volume management software,

    or hardware (array). Taking the backup using the NBU as an example, Figure 2-7

    shows the realization process.

    Figure 2-7 VSS invoking the HyperImage in the backup

    1) The backup software NBU (requester) sends a backup request to the VSS

    and specifies the writer that needs to be backed up.

    2) After receiving the backup request forwarded by the VSS, the database

    server (writer) writes the data in the buffer area onto the hard disks and

    notifies the VSS.

    3) The VSS queries whether the array (provider) has the snapshot ability (the

    default priority that the VSS invokes the snapshots: the hardware snapshot,

    the software snapshot, and then the snapshot of the VSS itself). At the same

    time, the VSS notifies the NBU of issuing the snapshot activation command.

    4) The NBU sends the snapshot activation command to the VSS.

    5) The VSS locks the applications on the host temporarily to prevent

    inconsistency of data at the snapshot point in time resulting from the read

    and write operations performed by the host.

    6) The VSS forwards the snapshot activation command to the OceanStor

    storage array for executing.

    7) After the snapshot is activated, the VSS notifies the host of starting the

    applications.

    8) The NBU accesses and backs up data saved on the snapshot LUN of the

    OceanStor storage array, and then writes the data onto the tapes.
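    The provider priority mentioned in step 3 amounts to a simple ordered
    lookup. The sketch below is not the VSS API; the PRIORITY list, the
    provider labels, and the choose_provider function are hypothetical names
    used only to illustrate the stated order.

```python
PRIORITY = ["hardware", "software", "system"]    # order stated in step 3


def choose_provider(available):
    """Pick the snapshot provider type with the highest priority."""
    for kind in PRIORITY:
        if kind in available:
            return kind
    raise LookupError("no snapshot provider available")


with_array = choose_provider({"system", "hardware"})
without_array = choose_provider({"system", "software"})
```

    When the array offers a hardware snapshot, it is chosen ahead of the
    software snapshot and the snapshot of the VSS itself.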

    2.3.6 Backup Software and HyperImage

    The backup software NBU/BE implements the backup service in the mode of

    invoking the snapshot volume. This mode can be used to effectively avoid the

    backup window problem and can be used to back up the data online. Different

    backup modes such as LAN-Base, LAN-Free, and Server-Free can be

    implemented through different optional components of the backup software. As

    for the array, the problem is how to map the snapshot volume to the backup

    system so that the backup system can access the data. This problem can be

    solved in the following three ways: The first method is that the backup software

    directly invokes the snapshots of the array. This is the simplest way, but the

    prerequisite is that the backup software and the array are tightly coupled and they

    can provide interfaces. The second method is to use an intermediate to act as the

    coordinator. The typical solution is the VSS. The last method is to use a

    loose-coupling solution agreed by both parties. For example, both parties agree

    to perform the snapshot of the array at 9:50 a.m. and map the snapshot volume to

    the backup system. The backup software then performs the backup at 10:00 a.m.

    Taking the LAN-Base backup based on the NBU as an example, Figure 2-8

    shows the backup process using the agreement mode.

    Figure 2-8 LAN-Base backup based on the NBU and the snapshot

    1) Create snapshots of the production volume, either through a timing policy

    set by using the Snapshot Agent or manually. Then map

    the snapshot to the client. As shown in Figure 2-8, the snapshot is created at

    9:50 a.m. and the backup operation is performed at 10:00 a.m.

    2) Set the backup policy on the master server. (The backup time scheduled on

    the NBU must be later than the snapshot time scheduled by the agent, so that

    the agent has already created the snapshots of the production volume when the

    NBU starts to back up.)

    3) The master server sends the backup notification.

    4) The media server accesses the data saved on the client.

    5) The media server writes the data onto the tape devices.
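    The timing constraint of the agreement mode can be checked before the two
    schedules are put into use. The function name, the optional margin, and the
    calendar date are assumptions for illustration; only the 9:50 a.m. and
    10:00 a.m. times come from the example above.

```python
from datetime import datetime, timedelta


def schedule_is_safe(snapshot_at, backup_at, margin=timedelta(0)):
    """True if the backup starts after the snapshot, by at least `margin`."""
    return backup_at > snapshot_at and backup_at - snapshot_at >= margin


day = datetime(2012, 4, 20)    # arbitrary date, used only for the example
snap_time = day.replace(hour=9, minute=50)
ok = schedule_is_safe(snap_time, day.replace(hour=10), margin=timedelta(minutes=5))
too_early = schedule_is_safe(snap_time, day.replace(hour=9, minute=45))
```

    A margin larger than zero leaves time for the snapshot to be created and
    mapped to the client before the backup software reads it.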

    Taking the Server-Free backup based on the NBU as an example, Figure 2-9

    shows the backup process based on the intermediate VSS.

    Figure 2-9 Server-Free backup based on the NBU, the VSS and the snapshot

    1) Before backing data up, create the configuration file to record the source

    LUN ID, snapshot LUN ID, snapshot type, and array serial number; and

    configure the path of the in-band command (IC) and the host ID that is

    mapped to the array.

    2) The backup application NBU issues the backup command to the source

    LUN.

    3) The VSS framework (the VSS system service provided by the OS) queries

    the configuration information about the current snapshot.

    4) The master server notifies the client of creating the snapshots on the disk

    array.

    5) The VSS framework queries the information about the snapshot LUN.

    6) The VSS framework mounts the snapshot LUN on the backup system, that is,

    to the client and the media server, and notifies the NBU of the information on

    the snapshot LUN.

    7) The media server writes the data onto the tape devices by accessing the

    snapshot volume.
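    The items recorded in step 1 could be held in a small validated record
    such as the one below. The field names, the validation rules, and all
    sample values are placeholders invented for illustration; the paper does
    not describe the format of the actual configuration file.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SnapshotBackupConfig:
    source_lun_id: int
    snapshot_lun_id: int
    snapshot_type: str
    array_serial: str
    in_band_command_path: str
    host_id: int

    def validate(self):
        if self.source_lun_id == self.snapshot_lun_id:
            raise ValueError("source and snapshot LUN IDs must differ")
        for name in ("snapshot_type", "array_serial", "in_band_command_path"):
            if not getattr(self, name):
                raise ValueError(f"{name} must not be empty")
        return self


# All values below are placeholders, not real identifiers.
cfg = SnapshotBackupConfig(1, 2, "virtual", "SN-EXAMPLE", "/dev/example", 0).validate()
```

    Checking the record before the backup starts catches a missing or
    duplicated LUN ID earlier than a failed query in step 3 or step 5 would.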