2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

download 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

of 14

Transcript of 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    1/14

    Privacy-Preserving Public Auditingfor Secure Cloud Storage

    Cong Wang, Member, IEEE, Sherman S.M. Chow, Qian Wang, Member, IEEE,

    Kui Ren, Senior Member, IEEE, and Wenjing Lou, Senior Member, IEEE

    AbstractUsing cloud storage, users can remotely store their data and enjoy the on-demand high-quality applications and services

    from a shared pool of configurable computing resources, without the burden of local data storage and maintenance. However, the fact

    that users no longer have physical possession of the outsourced data makes the data integrity protection in cloud computing a

    formidable task, especially for users with constrained computing resources. Moreover, users should be able to just use the cloud

    storage as if it is local, without worrying about the need to verify its integrity. Thus, enabling public auditability for cloud storage is of

    critical importance so that users can resort to a third-party auditor (TPA) to check the integrity of outsourced data and be worry free. To

    securely introduce an effective TPA, the auditing process should bring in no new vulnerabilities toward user data privacy, and introduce

    no additional online burden to user. In this paper, we propose a secure cloud storage system supporting privacy-preserving public

    auditing. We further extend our result to enable the TPA to perform audits for multiple users simultaneously and efficiently. Extensive

    security and performance analysis show the proposed schemes are provably secure and highly efficient. Our preliminary experiment

    conducted on Amazon EC2 instance further demonstrates the fast performance of the design.

    Index TermsData storage, privacy preserving, public auditability, cloud computing, delegation, batch verification, zero knowledge

    1 INTRODUCTION

    CLOUD computing has been envisioned as the next-generation information technology (IT) architecture forenterprises, due to its long list of unprecedented advantagesin the IT history: on-demand self-service, ubiquitousnetwork access, location independent resource pooling,rapid resource elasticity, usage-based pricing and transfer-

    ence of risk [2]. As a disruptive technology with profoundimplications, cloud computing is transforming the verynature of how businesses use information technology. Onefundamental aspect of this paradigm shifting is that data arebeing centralized or outsourced to the cloud. From usersperspective, including both individuals and IT enterprises,storing data remotely to the cloud in a flexible on-demandmanner brings appealing benefits: relief of the burden forstorage management, universal data access with locationindependence, and avoidance of capital expenditure onhardware, software, and personnel maintenances, etc., [3].

    While cloud computing makes these advantages moreappealing than ever, it also brings new and challengingsecurity threats toward users outsourced data. Since cloudservice providers (CSP) are separate administrative entities,data outsourcing is actually relinquishing users ultimatecontrol over the fate of their data. As a result, the

    correctness of the data in the cloud is being put at riskdue to the following reasons. First of all, although theinfrastructures under the cloud are much more powerfuland reliable than personal computing devices, they are stillfacing the broad range of both internal and external threatsfor data integrity [4]. Examples of outages and securitybreaches of noteworthy cloud services appear from time totime [5], [6], [7]. Second, there do exist various motivationsfor CSP to behave unfaithfully toward the cloud usersregarding their outsourced data status. For examples, CSPmight reclaim storage for monetary reasons by discardingdata that have not been or are rarely accessed, or even hide

    data loss incidents to maintain a reputation [8], [9], [10]. Inshort, although outsourcing data to the cloud is economic-ally attractive for long-term large-scale storage, it does notimmediately offer any guarantee on data integrity andavailability. This problem, if not properly addressed, mayimpede the success of cloud architecture.

    As users no longer physically possess the storage of theirdata, traditional cryptographic primitives for the purposeof data security protection cannot be directly adopted [11].In particular, simply downloading all the data for itsintegrity verification is not a practical solution due to theexpensiveness in I/O and transmission cost across the

    network. Besides, it is often insufficient to detect the datacorruption only when accessing the data, as it does not giveusers correctness assurance for those unaccessed data andmight be too late to recover the data loss or damage.

    362 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013

    . C. Wang is with the Department of Computer Science, City University ofHong Kong, Hong Kong. E-mail: [email protected].

    . S.S.M. Chow is with the Department of Information Engineering, ChineseUniversity of Hong Kong, Shatin, N.T., Hong Kong.E-mail: [email protected].

    . Q. Wang is with the School of Computer, Wuhan University, China.E-mail: [email protected].

    . K. Ren is with the Department of Computer Science and Engineering, TheState University of New York at Buffalo. E-mail: [email protected].

    . W. Lou is with the Department of Computer Science, Virginia PolytechnicInstitute and State University, 7054 Haycock Rd., Falls Church, VA22043. E-mail: [email protected].

    Manuscript received 15 Nov. 2010; revised 12 May 2011; accepted 16 Oct.

    2011; published online 15 Dec. 2011.Recommended for acceptance by B. Veeravalli.For information on obtaining reprints of this article, please send e-mail to:[email protected], and reference IEEECS Log Number TC-2010-11-0627.Digital Object Identifier no. 10.1109/TC.2011.245.

    0018-9340/13/$31.00 2013 IEEE Published by the IEEE Computer Society

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    2/14

    Considering the large size of the outsourced data and theusers constrained resource capability, the tasks of auditingthe data correctness in a cloud environment can beformidable and expensive for the cloud users [12], [8].Moreover, the overhead of using cloud storage should beminimized as much as possible, such that a user does notneed to perform too many operations to use the data (in

    additional to retrieving the data). In particular, users maynot want to go through the complexity in verifying the dataintegrity. Besides, there may be more than one user accessesthe same cloud storage, say in an enterprise setting. Foreasier management, it is desirable that cloud only entertainsverification request from a single designated party.

    To fully ensure the data integrity and save thecloud userscomputation resources as well as online burden, it is ofcritical importance to enable public auditing service for clouddata storage, so that users may resort to an independentthird-party auditor (TPA) to audit the outsourced data whenneeded. The TPA, who has expertise and capabilities that

    users do not, can periodically check the integrity of all thedata stored in the cloud on behalf of the users, whichprovides a much more easier andaffordablewayfor theusersto ensure their storage correctness in the cloud. Moreover, inaddition to help users to evaluate the risk of their subscribedcloud data services, the audit result from TPA would also bebeneficial for the cloud service providers to improve theircloud-based service platform, and even serve for indepen-dent arbitration purposes [10]. In a word, enabling publicauditing services will play an important role for this nascentcloud economy to become fully established, where users willneed ways to assess risk and gain trust in the cloud.

    Recently, the notion of public auditability has beenproposed in the context of ensuring remotely stored dataintegrity under different system and security models [9],[13], [11], [8]. Public auditability allows an external party, inaddition to the user himself, to verify the correctness ofremotely stored data. However, most of these schemes [9],[13], [8] do not consider the privacy protection of usersdata against external auditors. Indeed, they may potentiallyreveal users data to auditors, as will be discussed inSection 3.4. This severe drawback greatly affects thesecurity of these protocols in cloud computing. From theperspective of protecting data privacy, the users, who own

    the data and rely on TPA just for the storage security oftheir data, do not want this auditing process introducingnew vulnerabilities of unauthorized information leakagetoward their data security [14], [15]. Moreover, there arelegal regulations, such as the US Health Insurance Port-ability and Accountability Act (HIPAA) [16], furtherdemanding the outsourced data not to be leaked to externalparties [10]. Simply exploiting data encryption beforeoutsourcing [15], [11] could be one way to mitigate thisprivacy concern of data auditing, but it could also be anoverkill when employed in the case of unencrypted/publiccloud data (e.g., outsourced libraries and scientific data

    sets), due to the unnecessary processing burden for cloudusers. Besides, encryption does not completely solve theproblem of protecting data privacy against third-partyauditing but just reduces it to the complex key management

    domain. Unauthorized data leakage still remains possibledue to the potential exposure of decryption keys.

    Therefore, how to enable a privacy-preserving third-party auditing protocol, independent to data encryption, isthe problem we are going to tackle in this paper. Our workis among the first few ones to support privacy-preservingpublic auditing in cloud computing, with a focus on datastorage. Besides, with the prevalence of cloud computing, aforeseeable increase of auditing tasks from different usersmay be delegated to TPA. As the individual auditing ofthese growing tasks can be tedious and cumbersome, anatural demand is then how to enable the TPA to efficientlyperform multiple auditing tasks in a batch manner, i.e.,simultaneously.

    To address these problems, our work utilizes thetechnique of public key-based homomorphic linear authen-ticator (or HLA for short) [9], [13], [8], which enables TPA toperform the auditing without demanding the local copy ofdata and thus drastically reduces the communication andcomputation overhead as compared to the straightforward

    data auditing approaches. By integrating the HLA withrandom masking, our protocol guarantees that the TPAcould not learn any knowledge about the data contentstored in the cloud server (CS) during the efficient auditingprocess. The aggregation and algebraic properties of theauthenticator further benefit our design for the batchauditing. Specifically, our contribution can be summarizedas the following three aspects:

    1. We motivate the public auditing system of datastorage security in cloud computing and provide aprivacy-preserving auditing protocol. Our schemeenables an external auditor to audit users cloud data

    without learning the data content.2. To the best of our knowledge, our scheme is the first

    to support scalable and efficient privacy-preservingpublic storage auditing in cloud. Specifically, ourscheme achieves batch auditing where multipledelegated auditing tasks from different users canbe performed simultaneously by the TPA in aprivacy-preserving manner.

    3. We prove the security and justify the performance ofour proposed schemes through concrete experi-ments and comparisons with the state of the art.

    The rest of the paper is organized as follows: Section 2

    introduces the system and threat model, and our designgoals. Then, we provide the detailed description of ourscheme in Section 3. Section 4 gives the security analysisand performance evaluation. Section 5 presents furtherdiscussions on a zero-knowledge auditing protocol, fol-lowed by Section 6 that overviews the related work. Finally,Section 7 gives the concluding remark of the whole paper.

    2 PROBLEM STATEMENT

    2.1 The System and Threat Model

    We consider a cloud data storage service involving three

    different entities, as illustrated in Fig. 1: the cloud user, whohas large amount of data files to be stored in the cloud; thecloud server, which is managed by thecloud service providertoprovide data storage service and has significant storage

    WANG ET AL.: PRIVACY-PRESERVING PUBLIC AUDITING FOR SECURE CLOUD STORAGE 363

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    3/14

    space and computation resources (we will not differentiateCS and CSP hereafter); the third-party auditor, who hasexpertise and capabilities that cloud users do not have andis trusted to assess the cloud storage service reliability onbehalf of the user upon request. Users rely on the CS forcloud data storage and maintenance. They may alsodynamically interact with the CS to access and update theirstored data for various application purposes. As users no

    longer possess their data locally, it is of critical importancefor users to ensure that their data are being correctly storedand maintained. To save the computation resource as wellas the online burden potentially brought by the periodicstorage correctness verification, cloud users may resort toTPA for ensuring the storage integrity of their outsourceddata, while hoping to keep their data private from TPA.

    We assume the data integrity threats toward users datacan come from both internal and external attacks at CS.These may include: software bugs, hardware failures, bugsin the network path, economically motivated hackers,malicious or accidental management errors, etc. Besides,

    CS can be self-interested. For their own benefits, such as tomaintain reputation, CS might even decide to hide thesedata corruption incidents to users. Using third-partyauditing service provides a cost-effective method for usersto gain trust in cloud. We assume the TPA, who is in thebusiness of auditing, is reliable and independent. However,it may harm the user if the TPA could learn the outsourceddata after the audit.

    Note that in our model, beyond users reluctance to leakdata to TPA, we also assume that cloud servers has noincentives to reveal their hosted data to external parties. Onthe one hand, there are regulations, e.g., HIPAA [16],requesting CS to maintain users data privacy. On the otherhand, as users data belong to their business asset [10], therealso exist financial incentives for CS to protect it from anyexternal parties. Therefore, we assume that neither CS norTPA has motivations to collude with each other during theauditing process. In other words, neither entities will deviatefrom the prescribed protocol execution in the followingpresentation.

    To authorize the CS to respond to the audit delegated toTPAs, the user can issue a certificate on TPAs public key,and all audits from the TPA are authenticated against sucha certificate. These authentication handshakes are omittedin the following presentation.

    2.2 Design Goals

    To enable privacy-preserving public auditing for cloud datastorage under the aforementioned model, our protocol

    design should achieve the following security and perfor-mance guarantees:

    1. Public auditability: to allow TPA to verify thecorrectness of the cloud data on demand withoutretrieving a copy of the whole data or introducingadditional online burden to the cloud users.

    2. Storage correctness: to ensure that there exists no

    cheating cloud server that can pass the TPAs auditwithout indeed storing users data intact.

    3. Privacy preserving: to ensure that the TPA cannotderive users data content from the informationcollected during the auditing process.

    4. Batch auditing: to enable TPA with secure andefficient auditing capability to cope with multipleauditing delegations from possibly large number ofdifferent users simultaneously.

    5. Lightweight: to allow TPA to perform auditingwith minimum communication and computationoverhead.

    3 THEPROPOSEDSCHEMES

    This section presents our public auditing scheme whichprovides a complete outsourcing solution of datanot onlythe data itself, but also its integrity checking. Afterintroducing notations and brief preliminaries, we start froman overview of our public auditing system and discuss twostraightforward schemes and their demerits. Then, wepresent our main scheme and show how to extent ourmain scheme to support batch auditing for the TPA upondelegations from multiple users. Finally, we discuss how togeneralize our privacy-preserving public auditing schemeand its support of data dynamics.

    3.1 Notation and Preliminaries

    . Fthe data file to be outsourced, denoted as asequence of n blocks m1; . . . ; mi; . . . ; mn2ZZp forsome large prime p.

    . MACmessage authentication code (MAC)function, defined as: K f0; 1g ! f0; 1gl where Kdenotes the key space.

    . H,hcryptographic hash functions.

    We now introduce some necessary cryptographic back-

    ground for our proposed scheme.Bilinear Map.LetGG1,GG2, andGGTbe multiplicative cyclic

    groups of prime order p. Letg1 andg2 be generators ofGG1and GG2, respectively. A bilinear map is a map e: GG1GG2! GGT such that for all u2 GG1, v2 GG2 and a; b2 ZZp,eua; vb eu; vab. This bilinearity implies that for any u1,u22GG1, v2 GG2, eu1 u2; v eu1; v eu2; v. Of course,there exists an efficiently computable algorithm for com-puting e and the map should be nontrivial, i.e., e isnondegenerate:eg1; g2 61.

    3.2 Definitions and Framework

    We follow a similar definition of previously proposedschemes in the context of remote data integrity checking[9], [11], [13] and adapt the framework for our privacy-preserving public auditing system.

    364 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013

    Fig. 1. The architecture of cloud data storage service.

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    4/14

    A public auditing scheme consists of four algorithms(KeyGen, SigGen, GenProof, VerifyProof). KeyGen isa key generation algorithm that is run by the user to setupthe scheme. SigGen is used by the user to generateverification metadata, which may consist of digital signa-tures. GenProof is run by the cloud server to generate aproof of data storage correctness, while VerifyProof isrun by the TPA to audit the proof.

    Running a public auditing system consists of two phases,Setupand Audit:

    . Setup: The user initializes the public and secretparameters of the system by executing KeyGen, andpreprocesses the data file F by using SigGen togenerate the verification metadata. The user thenstores the data fileFand the verification metadata atthe cloud server, and deletes its local copy. As partof preprocessing, the user may alter the data file Fby expanding it or including additional metadata tobe stored at server.

    .

    Audit: The TPA issues an audit message orchallenge to the cloud server to make sure that thecloud server has retained the data fileFproperly atthe time of the audit. The cloud server will derive aresponse message by executing GenProof using Fand its verification metadata as inputs. The TPAthen verifies the response via VerifyProof.

    Our framework assumes that the TPA is stateless, i.e.,TPA does not need to maintain and update state betweenaudits, which is a desirable property especially in the publicauditing system [13]. Note that it is easy to extend theframework above to capture a stateful auditing system,essentially by splitting the verification metadata into twoparts which are stored by the TPA and the cloud server,respectively. Our design does not assume any additionalproperty on the data file. If the user wants to have moreerror resilience, he can first redundantly encodes the datafile and then uses our system with the data that has error-correcting codes integrated.1

    3.3 The Basic Schemes

    Before giving our main result, we study two classes ofschemes as a warmup. The first one is a MAC-based solutionwhich suffers from undesirable systematic demeritsbounded usage and stateful verification, which may pose

    additional online burden to users, in a public auditingsetting. This also shows that the auditing problem is still noteasy to solve even if we have introduced a TPA. The secondone is a system based on homomorphic linear authentica-tors, which covers many recent proof of storage systems. Wewill pinpoint the reason why all existing HLA-basedsystems are not privacy preserving. The analysis of thesebasic schemes leads to our main result, which overcomes allthese drawbacks. Our main scheme to be presented is basedon a specific HLA scheme.

    MAC-based solution. There are two possible ways tomake use of MAC to authenticate the data. A trivial way isjust uploading the data blocks with their MACs to theserver, and sends the corresponding secret key sk to the

    TPA. Later, the TPA can randomly retrieve blocks with theirMACs and check the correctness via sk. Apart from thehigh (linear in the sampled data size) communication andcomputation complexities, the TPA requires the knowledgeof the data blocks for verification.

    To circumvent the requirement of the data in TPA

    verification, one may restrict the verification to just consist

    of equality checking. The idea is as follows: Before dataoutsourcing, the cloud user chooses s random messageauthentication code keysfskg1s, precomputes s (deter-ministic) MACs, fMACskFg1s for the whole data file

    F, and publishes these verification metadata (the keys andthe MACs) to TPA. The TPA can reveal a secret key sk tothe cloud server and ask for a fresh keyed MAC forcomparison in each audit. This is privacy preserving as longas it is impossible to recoverFin full givenMACskFand

    sk. However, it suffers from the following severe draw-backs: 1) the number of times a particular data file can beaudited is limited by the number of secret keys that must be

    fixed a priori. Once all possible secret keys are exhausted,the user then has to retrieve data in full to recompute andrepublish new MACs to TPA; 2) The TPA also has to

    maintain and update state between audits, i.e., keep trackon the revealed MAC keys. Considering the potentiallylarge number of audit delegations from multiple users,maintaining such states for TPA can be difficult and errorprone; 3) it can only support static data, and cannotefficiently deal with dynamic data at all. However,supporting data dynamics is also of critical importance forcloud storage systems. For the reason of brevity and clarity,

    our main protocol will be presented based on static data.

    Section 3.6 will describe how to adapt our protocol fordynamic data.

    HLA-based solution. To effectively support public

    auditability without having to retrieve the data blocksthemselves, the HLA technique [9], [13], [8] can be used.HLAs, like MACs, are also some unforgeable verificationmetadata that authenticate the integrity of a data block. Thedifference is that HLAs can be aggregated. It is possible tocompute an aggregated HLA which authenticates a linearcombination of the individual data blocks.

    At a high level, an HLA-based proof of storage system

    works as follow. The user still authenticates each element ofF fmig by a set of HLAs . The TPA verifies the cloudstorage by sending a random set of challenge fig. Thecloud server then returns

    Pii mi and its aggregated

    authenticatorcomputed from .Though allowing efficient data auditing and consuming

    only constant bandwidth, the direct adoption of these HLA-based techniques is still not suitable for our purposes. Thisis because the linear combination of blocks,

    Pii mi,

    may potentially reveal user data information to TPA, andviolates the privacy-preserving guarantee. Specifically, bychallenging the same set of c block m1; m2; . . . ; mc using

    c different sets of random coefficients fig, TPA canaccumulatec different linear combinations 1; . . . ; c. With

    fig and fig, TPA can derive the users datam1; m2; . . . ; mcby simply solving a system of linear equations.

    WANG ET AL.: PRIVACY-PRESERVING PUBLIC AUDITING FOR SECURE CLOUD STORAGE 365

    1. We refer readers to [17], [18] for the details on integration of error-correcting codes and remote data integrity checking.

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    5/14

    3.4 Privacy-Preserving Public Auditing Scheme

    Overview. To achieve privacy-preserving public auditing,we propose to uniquely integrate the homomorphic linearauthenticator with random masking technique. In ourprotocol, the linear combination of sampled blocks in the

    servers response is masked with randomness generated bythe server. With random masking, the TPA no longer has allthe necessary information to build up a correct group oflinear equations and therefore cannot derive the users datacontent, no matter how many linear combinations of thesame set of file blocks can be collected. On the other hand,the correctness validation of the block-authenticator pairscan still be carried out in a new way which will be shownshortly, even with the presence of the randomness. Ourdesign makes use of a public key-based HLA, to equip theauditing protocol with public auditability. Specifically, weuse the HLA proposed in [13], which is based on the short

    signature scheme proposed by Boneh, Lynn, and Shacham(hereinafter referred as BLS signature) [19].

    Scheme details. Let GG1, GG2, and GGT be multiplicativecyclic groups of prime order p, ande : GG1 GG2! GGT be abilinear map as introduced in preliminaries. Let g be agenerator of GG2. H is a secure map-to-point hashfunction: f0; 1g !GG1, which maps strings uniformly toGG1. Another hash function h: GGT !ZZp maps groupelement ofGGTuniformly to ZZp. Our scheme is as follows:

    SetupPhase: The cloud user runs KeyGen to generatethe public and secret parameters. Specifically, the userchooses a random signing key pair spk; ssk, a random

    x ZZp, a random element u GG1, and computes v gx

    .The secret parameter is sk x;ssk and the publicparameters arepk spk;v;g;u;eu; v.

    Given a data file F fmig, the user runs SigGen tocompute authenticator i HWi u

    mi x 2GG1 for each i.Here, Wi nameki and name is chosen by the useruniformly at random from ZZp as the identifier of file F.Denote the set of authenticators by fig1in.

    The last part of SigGen is for ensuring the integrity ofthe unique file identifiername. One simple way to do this isto compute t namekSSigsskname as the file tag for F,where SSigsskname is the signature on name under the

    private key ssk. For simplicity, we assume the TPA knowsthe number of blocks n. The user then sends F along withthe verification metadata ; t to the server and deletesthem from local storage.

    AuditPhase: The TPA first retrieves the file tagt. Withrespect to the mechanism we describe in the Setupphase,the TPA verifies the signature SSigsskname via spk, andquits by emitting FALSE if the verification fails. Otherwise,the TPA recovers name.

    Now it comes to the core part of the auditing process.To generate the challenge message for the audit chal, theTPA picks a randomc-element subsetI fs1; . . . ; scgof set1; n. For each element i2 I, the TPA also chooses arandom valuei(of bit length that can be shorter than jpj, asexplained in [13]). The message chal specifies thepositions of the blocks required to be checked. The TPAsendschal fi; igi2Ito the server.

    Upon receiving challenge chal fi; igi2I, the serverruns GenProof to generate a response proof of data storagecorrectness. Specifically, the server chooses a randomelement r ZZp, and calculates R eu; v

    r 2GGT. Let 0

    denote the linear combination of sampled blocks specified

    in chal: 0 P

    i2Iimi. To blind 0 with r, the servercomputes: r 0 modp, where hR 2 ZZp. Mean-while, the server also calculates an aggregated authenticator

    Qi2I

    ii 2GG1. It then sends f;;Rg as the response

    proof of storage correctness to the TPA. With the response,the TPA runs VerifyProofto validate it by first comput-ing hRand then checking the verification equation

    R e; g ?

    eYscis1

    HWii

    !u; v

    !: 1

    The protocol is illustrated in Table 1. The correctness of

    the above verification equation is elaborated as follows:

    R e; g eu; vr eYscis1

    HWi umi xi

    !; g

    !

    eur; v eYscis1

    HWii uimi

    !; g

    !x

    eur; v eYscis1

    HWii

    !u

    0; v

    !

    e Ysc

    is1

    HWii

    !u

    0 r; v

    !

    eYscis1

    HWii

    !u; v

    !:

    366 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013

    TABLE 1The Privacy-Preserving Public Auditing Protocol

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    6/14

    Properties of our protocol. It is easy to see that ourprotocol achieves public auditability. There is no secretkeying material or states for the TPA to keep or maintainbetween audits, and the auditing protocol does not poseany potential online burden on users. This approachensures the privacy of user data content during the auditingprocess by employing a random masking r to hide , alinear combination of the data blocks. Note that the value R

    in our protocol, which enables the privacy-preservingguarantee, will not affect the validity of the equation, dueto the circular relationship between R and in hRand the verification equation. Storage correctness thusfollows from that of the underlying protocol [13]. Thesecurity of this protocol will be formally proven in Section 4.Besides, the HLA helps achieve the constant communica-tion overhead for servers response during the audit: thesize off;;Rg is independent of the number of sampledblocksc.

    Previous work [9], [8] showed that if the server ismissing a fraction of the data, then the number of blocksthat needs to be checked in order to detect server

    misbehavior with high probability is in the order ofO1.In particular, ift fraction of data is corrupted, then randomsampling c blocks would reach the detection probabilityP1 1 tc. Here, every block is chosen uniformly atrandom. Whent 1% of the dataF, the TPA only needs toaudit for c 300 or 460 randomly chosen blocks of F todetect this misbehavior with probability larger than 95 and99 percent, respectively. Given the huge volume of dataoutsourced in the cloud, checking a portion of the data fileis more affordable and practical for both the TPA andthe cloud server than checking all the data, as long as thesampling strategies provides high-probability assurance. InSection 4, we will present the experiment result based on

    these sampling strategies.For some cloud storage providers, it is possible that

    certain information dispersal algorithms (IDA) may be usedto fragment and geographically distribute the users out-sourced data for increased availability. We note that thesecloud side operations would not affect the behavior of ourproposed mechanism, as long as the IDA is systematic, i.e., itpreserves users data in its original form after encoding withredundancy. This is because from users perspective, as longas there is a complete yet unchanged copy of his outsourceddata in cloud, the precomputed verification metadata ; twill remain valid. As a result, those metadata can still beutilized in our auditing mechanism to guarantee thecorrectness of users outsourced cloud data.

    Storage and communication tradeoff. As describedabove, each block is accompanied by an authenticator ofequal size of jpj bits. This gives about 2 storageoverhead on server. However, as noted in [13], we canintroduce a parameter s in the authenticator constructionto adjust this storage overhead, in the cost of commu-nication overhead in the auditing protocol between TPAand cloud server. In particular, we assume each block miconsists ofs sectors fmijg with 1 j s, where mij2 ZZp.The public parameter pk is now spk;v;g; fujg; feuj; vg,1 j s, where u1; u2; . . . ; us are randomly chosen from

    GG1. The authenticator i of mi is constructed as:i HWi

    Qsj1u

    mijj

    x 2GG1. Because we now haveone authenticator per block (or per s sectors), we reducethe storage overhead to 1 1=s .

    To respond to the auditing challenge chal fi; igi2I,for 1 j s, the cloud server chooses a random elementsrj ZZp, and calculatesRj eu; v

    rj 2GGT. Then, the serverblinds each 0j

    Pi2Iimijwithrj, and derives the blinded

    j rj 0jmodp, where hR1kR2k kRs 2ZZp. Theaggregated authenticator is still computed as before. It thensendsf; fj; Rjg1jsgas the proof response to TPA. With

    the proof, TPA first computes hR1kR2k kRs, andthen checks the following verification:

    R1 Rs e; g

    ?e

    Yscis1

    HWii

    !Ysj1

    uj

    j ; v

    !: 2

    The correctness elaboration is similar to (1) and thusomitted. The overall storage overhead is reduced to1 1=s , but the proof size now increases roughly sdue to the additional s element pairs fj; Rjg1js that thecloud server has to return. For presentation simplicity, wecontinue to choose s 1 in our following scheme descrip-tion. We will present some experiment results with largerchoice ofs in Section 4.

    3.5 Support for Batch Auditing

    With the establishment of privacy-preserving publicauditing, the TPA may concurrently handle multipleauditing upon different users delegation. The individualauditing of these tasks for the TPA can be tedious and veryinefficient. Given K auditing delegations on K distinctdata files from Kdifferent users, it is more advantageousfor the TPA to batch these multiple tasks together andaudit at one time. Keeping this natural demand in mind,we slightly modify the protocol in a single user case, and

    achieves the aggregation of Kverification equations (forKauditing tasks) into a single one, as shown in (3). As aresult, a secure batch auditing protocol for simultaneousauditing of multiple tasks is obtained. The details aredescribed as follows:

    Setup phase: Basically, the users just perform Setupindependently. Suppose there are Kusers in the system,and each user k has a data file Fk mk;1; . . . ; mk;n to beoutsourced to the cloud server, where k2 f1; . . . ; Kg. Forsimplicity, we assume each fileFk has the same number ofn blocks. For a particular user k, denote his/her secret keyas xk;sskk, and the corresponding public parameter as

    spkk; vk; g ; uk; euk; vkwherevk gxk . Similar to the singleuser case, each user k has already randomly chosen adifferent (with overwhelming probability) name namek 2ZZp for his/her file Fk, and has correctly generated thecorresponding file tag tk namekkSSigsskk namek. Then,each user k runs SigGenand computesk;i for block

    mk;i : k;i

    Hnamekki umk;ik

    xk

    HWk;i umk;ik

    xk 2GG1i2 f1; . . . ; ng;whereWk;i namekki. Finally, each userk sends fileFk, setof authenticators k, and tag tk to the server and deletes

    them from local storage.Auditphase: TPA first retrieves and verifies file tag tk

    for each user k for later auditing. If the verification fails,TPA quits by emitting FALSE. Otherwise, TPA recovers

    WANG ET AL.: PRIVACY-PRESERVING PUBLIC AUDITING FOR SECURE CLOUD STORAGE 367

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    7/14

    namek and sends the audit challenge chal fi; igi2I tothe server for auditing data files of all Kusers.

    Upon receiving chal, for each user k2 f1; . . . ; Kg, theserver randomly picks rk2ZZp and computes Rk euk; vk

    rk . Denote R R1 R2 RK, and L vk1kvk2k

    kvkK, our protocol further requires the server to computek hRkvkkL. Then, the randomly masked responses canbe generated as follows:

    k kXscis1

    imk;i rk modp and k Yscis1

    ik;i:

    The server then responds with ffk; kg1kK; Rg.To verify the response, the TPA can first compute k

    hRkvkkL for1 k K. Next, TPA checks if the followingequation holds:

    R eYKk1

    kk ; g !

    ?YKk1

    eYscis1

    HWk;ii !

    k ukk ; vk !

    : 3

    The batch protocol is illustrated in Table 2. Here, the left-hand side (LHS) of (3) expands as

    LHS R1 R2 RKYKk1

    e

    kk ; g

    YKk1

    Rk e

    kk ; g

    YK

    k1

    e Ysc

    is1

    HWk;ii !

    k

    uk

    k

    ; vk !;which is the right-hand side, as required. Note that the lastequality follows from (1).

    Efficiency improvement. As shown in (3), batch audit-ing not only allows TPA to perform the multiple auditingtasks simultaneously, but also greatly reduces the computa-tion cost on the TPA side. This is because aggregatingKverification equations into one helps reduce the numberof relatively expensive pairing operations from 2K, asrequired in the individual auditing, toK 1, which saves aconsiderable amount of auditing time.

    Identification of invalid responses. The verificationequation (3) only holds when all the responses are valid,and fails with high probability when there is even onesingle invalid response in the batch auditing, as we will

    show in Section 4. In many situations, a response collectionmay contain invalid responses, especially fkg1kK, causedby accidental data corruption, or possibly malicious activityby a cloud server. The ratio of invalid responses to the validcould be quite small, and yet a standard batch auditor will

    reject the entire collection. To further sort out these invalidresponses in the batch auditing, we can utilize a recursivedivide-and-conquer approach (binary search), as suggestedby Ferrara et al. [20]. Specifically, if the batch auditing fails,we can simply divide the collection of responses into twohalves, and repeat the auditing on halves via (3). TPA maynow require the server to send back all thefRkg1kK, as inindividual auditing. In Section 4.2.2, we show throughcarefully designed experiment that using this recursivebinary search approach, even if up to 20 percent ofresponses are invalid, batch auditing still performs fasterthan individual verification.

    3.6 Support for Data Dynamics

    In cloud computing, outsourced data might not only beaccessed but also updated frequently by users for variousapplication purposes [21], [8], [22], [23]. Hence, supportingdata dynamics for privacy-preserving public auditing isalso of paramount importance. Now, we show how to buildupon the existing work [8] and adapt our main scheme tosupport data dynamics, including block level operations ofmodification, deletion, and insertion.

    In [8], data dynamics support is achieved by replacingthe index informationi withmiin the computation of blockauthenticators and using the classic data structure

    Merkle hash tree (MHT) [24] for the underlying blocksequence enforcement. As a result, the authenticator foreach block is changed to i Hmi u

    mi x. We can adoptthis technique in our design to achieve privacy-preservingpublic auditing with support of data dynamics. Specifi-cally, in the Setup phase, the user has to generate andsend the tree root T RMH T to TPA as additional metadata,where the leaf nodes of MHT are values of Hmi. In theAudit phase, besides f;;Rg, the servers responseshould also include fHmigi2I and their correspondingauxiliary authentication informationauxin the MHT. Uponreceiving the response, TPA should first use T RMH T and

    aux to authenticate fHmigi2I computed by the server.OncefHmigi2Iare authenticated, TPA can then performthe auditing on f; ;R; fHmigi2Ig via (1), whereQ

    s1iscHWi

    i is now replaced byQ

    s1iscHmi

    i . All

    368 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013

    TABLE 2The Batch Auditing Protocol

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    8/14

    these changes does not interfere with the proposed randommasking technique, so data privacy is still preserved. Tosupport data dynamics, each data update would requirethe user to generate a new tree root T RMH T, which is latersent to TPA as the new metadata for storage auditing task.The details of handling dynamic operations are similar to[8] and thus omitted.

    Application to version control system. The abovescheme allows TPA to always keep the new tree root forauditing the updated data file. But it is worth noting that ourmechanism can be easily extended to work with versioncontrol system, where both current and previous versions ofthe data file F and the corresponding authenticators arestored and need to be audited on demand. One possible wayis to require TPA to keep tracks of both the current andprevious tree roots generated by the user, denoted asfT R1MH T; T R

    2MH T; . . . ; T R

    VMH Tg. Here, V is the number of

    file versions and T RVMH T is the root related to the mostcurrent version of the data file F. Then, whenever adesignated versionv(1v V) of data file is to be audited,

    the TPA just uses the correspondingT RvMH Tto perform theauditing. The cloud server should also keep track of all theversions of data file Fand their authenticators, in order tocorrectly answer the auditing request from TPA. Note thatcloud server does not need to replicate every block of datafile in every version, as many of them are the same afterupdates. However, how to efficiently manage such blockstorage in cloud is not within the scope of our paper.

    3.7 Generalization

    As mentioned before, our protocol is based on the HLA in[13]. It has been shown in [25] that HLA can be constructedby homomorphic identification protocols. One may applythe random masking technique we used to construct thecorresponding zero knowledge proof for different homo-morphic identification protocols. Therefore, our privacy-preserving public auditing system for secure cloud storagecan be generalized based on other complexity assumptions,such as factoring [25].

    4 EVALUATION

    4.1 Security Analysis

    We evaluate the security of the proposed scheme byanalyzing its fulfillment of the security guarantee described

    in Section 2.2, namely, the storage correctness and privacy-preserving property. We start from the single user case,where our main result is originated. Then, we show thesecurity guarantee of batch auditing for the TPA inmultiuser setting.

    4.1.1 Storage Correctness Guarantee

    We need to prove that the cloud server cannot generatevalid response for the TPA without faithfully storing thedata, as captured by Theorem 1.

    Theorem 1. If the cloud server passes theAuditphase, it mustindeed possess the specified data intact as it is.

    Proof. We show that there exists an extractor of 0 in therandom oracle model. With valid f; 0g, our theoremfollows from [13, Theorem 4.2].

    The extractor controls the random oracle h andanswers the hash query issued by the cloud server,which is treated as an adversary here. For a challenge hR returned by the extractor, the cloud serveroutputsf;;Rgsuch that the following equation holds:

    R e; g e Ysc

    is1

    HWii !

    u; v !: 4Suppose that our extractor can rewind a cloud server

    in the execution of the protocol to the point just beforethe challengehRis given. Now, the extractor setshRto be 6. The cloud server outputs f; ; Rg suchthat the following equation holds:

    R e

    ; g eYscis1

    HWii

    !u

    ; v

    0@

    1A: 5

    The extractor then obtains f; 0 = g asa valid response of the underlying proof of storage

    system [13]. To see why, recall that i HWi umi x. Ifwe divide (4) by (5), we have

    e

    ; g eYscis1

    HWii

    !u

    ; v

    0@

    1A

    e

    ; g eYscis1

    HWii

    !; gx

    0@

    1Aeu ; gx

    Yscis1

    HWii

    !xux

    Yscis1

    ii

    !

    Yscis1

    HWii

    !x

    ux

    ux

    Yscis1

    i=HWixi

    !

    ux

    Yscis1

    uxmi i

    !

    Xscis1

    mii

    !

    Xscis1

    mii !

    = :

    Finally, we remark that this extraction argument andthe random oracle paradigm are also used in the proof ofthe underlying scheme [13]. tu

    4.1.2 Privacy-Preserving Guarantee

    The below theorem shows that TPA cannot derive usersdata from the information collected during auditing.

    Theorem 2. From the servers response f;;Rg, TPA cannotrecover 0.

    Proof. We show the existence of a simulator that canproduce a valid response even without the knowledgeof 0, in the random oracle model. Now, the TPA istreated as an adversary. Given a valid from the

    WANG ET AL.: PRIVACY-PRESERVING PUBLIC AUDITING FOR SECURE CLOUD STORAGE 369

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    9/14

    cloud server, first, randomly pick ; from ZZp, setR e

    Qscis1

    HWii u; v=e; g. Finally, back-

    patch hR since the simulator is controlling therandom oracle h. We remark that this backpatchingtechnique in the random oracle model is also used inthe proof of the underlying scheme [13]. tu

    4.1.3 Security Guarantee for Batch Auditing

    Now, we show that our way of extending our result to amultiuser setting will not affect the aforementioned securityinsurance, as shown in Theorem 3.

    Theorem 3. Our batch auditing protocol achieves the samestorage correctness and privacy-preserving guarantee as in thesingle-user case.

    Proof. The privacy-preserving guarantee in the multiusersetting is very similar to that of Theorem 2, and thusomitted here. For the storage correctness guarantee, weare going to reduce it to the single-user case. We use theforking technique as in the proof of Theorem 1. However,

    the verification equation for the batch audits involvesKchallenges from the random oracle. This time we needto ensure that all the other K 1 challenges aredetermined before the forking of the concerned randomoracle response. This can be done using the idea in [26].As soon as the adversary issues the very first randomoracle query for i hRkvikL for any i2 1; K, thesimulator immediately determines the values j hRkvjkL for all j2 1; K. This is possible since theyare all using the same Rand L. Now, all but one of theks in (3) are equal, so a valid response can be extractedsimilar to the single-user case in the proof of Theorem 1.tu

    4.2 Performance AnalysisWe now report some performance results of our experi-ments. We consider our auditing mechanism happensbetween a dedicated TPA and some cloud storage node,where users data are outsourced to. In our experiment, theTPA/user side process is implemented on a workstationwith an Intel Core 2 processor running at 1.86 GHz,2,048 MB of RAM, and a 7,200 RPM Western Digital250 GB Serial ATA drive. The cloud server side process isimplemented on Amazon Elastic Computing Cloud (EC2)with a large instance type [27], which has 4 EC2 ComputeUnits, 7.5 GB memory, and 850 GB instance storage. The

    randomly generated test data is of 1 GB size. All algorithmsare implemented using C language. Our code uses thePairing-Based Cryptography (PBC) library version 0.4.21.The elliptic curve utilized in the experiment is an MNTcurve, with base field size of 159 bits and the embeddingdegree 6. The security level is chosen to be 80 bit, whichmeans jij 80 and jpj 160. All experimental resultsrepresent the mean of 20 trials.

    Because the cloud is a pay-per-use model, users have topay both the storage cost and the bandwidth cost (for datatransfer) when using the cloud storage auditing. Thus,when implementing our mechanism, we have to take into

    consideration both factors. In particular, we conducts theexperiment with two different sets of storage/communica-tion tradeoff parametersas introduced in Section 3.4. Whens 1, the mechanism incurs extra storage cost as large as

    the data itself, but only takes very small auditingbandwidth cost. Such a mechanism can be adopted whenthe auditing has to happen very frequently (e.g., checkingthe storage correctness every few minutes [21]), because theresulting data transfer charge could be dominant in the pay-per-use-model. On the other hand, we also choose aproperly largers 10, which reduces the extra storage costto only 10 percent of the original data but increases theauditing bandwidth cost roughly 10 times larger than the

    choice ofs 1. Such a case is relatively more desirable if theauditing does not need to happen frequently. In short, userscan flexibly choose the storage/communication tradeoffparameters for their different system application scenarios.

    On our not-so-powerful workstation, the measurementshows that the user setup phase (i.e., generating authentica-tors) achieves a throughput of around9.0 KB/s and17.2 KB/swhen s 1 and s 10, respectively. These results are notvery fast due to the expensive modular exponentiationoperations for each 20 byte block sector in the authenticatorcomputation. (See[28] for somesimilar experimental results.)Note that foreach data file to be outsourced, such setup phase

    happens once only. Further, since the authenticator genera-tion on each block is independent, these one-time operationscan be easily parallelized by using multithreading techniqueon the modern multicore systems. Therefore, variousoptimization techniques can be applied to speedup the userside setup phase. As ourpaperfocuses on privacy-preservingstorage auditing performance, in the following, we willprimarily assess the performance of the proposed auditingschemes on both TPA side and cloud server side, and showthey are indeed lightweight. We will focus on the cost of theprivacy-preserving protocol and our proposed batch audit-ing technique.

    4.2.1 Cost of Privacy-Preserving Protocol

    We begin by estimating the cost in terms of basiccryptographic operations (refer to Table 3 for notations).Suppose there are c random blocks specified in thechallenge message chal during the Audit phase. Underthis setting, we quantify the cost introduced by the privacy-preserving auditing in terms of server computation, auditorcomputation as well as communication overhead. Since thedifference for choices onshas been discussed previously, inthe following privacy-preserving cost analysis we only givethe atomic operation analysis for the case s 1 forsimplicity. The analysis for the case of s 10 follows

    similarly and is thus omitted.On the server side, the generated response includes an

    aggregated authenticator Q

    i2Iii 2GG1, a random

    factorR eu; vr 2GGT, and a blinded linear combination

    370 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013

    TABLE 3Notation of Cryptographic Operations

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    10/14

    of sampled blocks P

    i2Iimi r2 ZZp, where hR 2ZZp. The corresponding computation cost isc-MultExp1GG1 jij, Exp

    1GGT

    jpj, a nd Hash1ZZp AddcZZp

    Multc1

    ZZp, respectively. Compared to the existing HLA-based

    solution for ensuring remote data integrity [13], the extracost resulted from the random mask R is only a constant:Exp1GGTjpj Mult

    1ZZp

    Hash1ZZp Add1ZZp

    , which has nothingto do with the number of sampled blocksc. Whencis set tobe 300 to 460 for high assurance of auditing, as discussed inSection 3.4, the extra cost on the server side for privacy-preserving guarantee would be negligible against the totalserver computation for response generation.

    Similarly, on the auditor side, upon receiving the response

    f;R;g, the corresponding computation cost for response

    v a l i d at i o n i s Hash1ZZp cMultExp1GG1

    jij HashcGG1

    Mult1GG1 Mult

    1GGT Exp

    3GG1 jpj Pair

    2GG1;GG2 , among which

    only Hash1ZZp Exp2GG1

    jpj Mult1GGT account for the addi-

    tional constant computation cost. For c 460 or 300, and

    considering the relatively expensive pairing operations, this

    extra cost imposes little overhead on the overall cost of

    response validation, and thus can be ignored. For the sake of

    completeness, Table 4 gives the experiment result on

    performance comparison between our scheme and the state

    of the art [13]. It can be shown that the performance of our

    scheme is almost the same as that of [13], even if our scheme

    supports privacy-preserving guarantee while [13] does not.

    For the extra communication cost of our scheme when

    compared with [13], the servers response f;R;gcontains

    an additional random element R, which is a group element of

    GGTand has the size close to 960 bits.

    4.2.2 Batch Auditing Efficiency

    Discussion in Section 3.5 gives an asymptotic efficiencyanalysis on the batch auditing, by considering only the totalnumber of pairing operations. However, on the practicalside, there are additional less expensive operations requiredfor batching, such as modular exponentiations and multi-plications. Thus, whether the benefits of removing pairingssignificantly outweighs these additional operations remains

    to be verified. To get a complete view of batching efficiency,we conduct a timed batch auditing test, where the numberof auditing tasks is increased from 1 to approximately 200with intervals of 8. Note that we only focus on the choice of

    s 1 here, from which similar performance results can bedirectly obtained for the choice ofs 10. The performance

    of the corresponding nonbatched (individual) auditing isprovided as a baseline for the measurement. Following thesame settings c 300 and c 460, the average per taskauditing time, which is computed by dividing total auditingtime by the number of tasks, is given in Fig. 2 for both batchand individual auditing. It can be shown that compared toindividual auditing, batch auditing indeed helps reducingthe TPAs computation cost, as more than 15 percent of per-task auditing time is saved.

    4.2.3 Sorting Out Invalid Responses

    Now, we use experiment to justify the efficiency of ourrecursive binary search approach for the TPA to sort out the

    invalid responses for negative batch auditing result, asdiscussed in Section 3.5. This experiment is tightlypertained to the work in [20], which evaluates the batchverification of various short signatures.

    The feasibility of the recursive approach is evaluatedunder the choice of s 1, which is consistent with theexperiment settings in Section 4.2.2. We do not duplicateevaluation of the recursive binary search methodology fors 10, because similar results can be easily deduced fromthe choice of s 1. We first generate a collection of 256valid responses, which implies the TPA may concurrentlyhandle 256 different auditing delegations. We then conduct

    the tests repeatedly while randomly corrupting an-fraction, ranging from 0 to 20 percent, by replacing themwith random values. The average auditing time per taskagainst the individual auditing approach is presented inFig. 3. The result shows that even when the number ofinvalid responses exceeds 18 percent of the total batch size,the performance of batch auditing can still be safelyconcluded as more preferable than the straightforwardindividual auditing. Note that the random distribution ofinvalid responses within the collection is nearly the worstcase for batch auditing. If invalid responses are groupedtogether, even better results can be expected.

    5 ZEROKNOWLEDGEPUBLICAUDITING

    Though our scheme prevents the TPA from directly deriving

    0 from , it does not rule out the possibility of offline

    WANG ET AL.: PRIVACY-PRESERVING PUBLIC AUDITING FOR SECURE CLOUD STORAGE 371

    TABLE 4Performance under Different Number of Sampled

    Blocksc for High Assurance (95%) Auditing

    Fig. 2. Comparison on auditing time between batch and individualauditing: Per task auditing time denotes the total auditing time divided bythe number of tasks.

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    11/14

    guessing threat by TPA using valid from the response.Specifically, the TPA can always guess whether 0

    ?~0, by

    checking e; g ?

    eQsc

    is1 HWii

    u~0

    ; v, where ~0

    isconstructed from random coefficients chosen by the TPAin the challenge and the guessed message f~migs1isc .However, we must note that ~0 is chosen from ZZp and jpjis usually larger than 160 bits in practical security settings(see Section 4.2). Given no background information, thesuccess of this all-or-nothing guess on 0 launched by TPAover such a large space ZZp can be very difficult. Besides,because TPA must at least make csuccessful guesses on thesame set of blocks to derivefmigs1isc from the system ofclinear equations, we can specifycto be large enough in theprotocol (e.g., as discussed in Section 3.4, a strict choice ofc

    should be at least larger than 460), which can significantlydecrease the TPAs successful guessing probability. Inaddition, we can also restrict the number of reauditing onexactly the same set of blocks (e.g., to limit the repeatedauditing times on exactly the same set of blocks to be alwaysless thanc). In this way, TPA can be kept from accumulatingsuccessful guesses on 0 for the same set of blocks, whichfurther diminishes the chance for TPA to solve forfmigs1isc . In short, by appropriate choices of parameter cand group size ZZp, we can effectively defeat such potentialoffline guessing threat.

    Nevertheless, we present a public auditing scheme with

    provably zero knowledge leakage. This scheme can com-pletely eliminate the possibilities of above offline guessingattack, but at the cost of a little higher communication andcomputation overhead. The setup phase is similar to ourmain scheme presented in Section 3.4. The secret para-meters are sk x;ssk and the public parameters arepk spk;v;g;u;eu; v; g1, where g12 GG1 is an additionalpublic group element. In the auditphase, upon receivingchallenge chal fi; igi2I, the server chooses three ran-dom elements rm; r; ZZp, and calculates R eg1; g

    r eu; vrm 2GGT and hR 2ZZp. Let 0 denote the linearcombination of sampled blocks 0 Pi2Iimi, and denote the aggregated authenticator

    Qi2I

    ii 2GG1. To

    ensure the auditing leaks zero knowledge, the server has toblind both 0 and . Specifically, the server computes: rm

    0 modp, and g1. It then sendsf&;; ; Rg

    as the response proof of storage correctness to the TPA,where &r modp. With the response from theserver, the TPA runs VerifyProof to validate theresponse by first computing hR and then checkingthe verification equation

    R e; g ?

    e Ysc

    is1

    HWii !

    u; v ! eg1; g&: 6To see the correctness of the above equation, we have

    R e; g eg1; gr eu; vrm e g1

    ; g

    eg1; gr eu; vrm e; g eg1 ; g

    eu; vrm e; g eg1; gr

    eYscis1

    HWii

    ! u; v

    ! eg1; g

    &:

    The last equality follows from the elaboration of (1) inSection 3.4.

    Theorem 4.The above auditing protocol achieves zero-knowledgeinformation leakage to the TPA, and it also ensures the storagecorrectness guarantee.

    Proof.Zero-knowledge is easy to see. Randomly pick ;;& from ZZp and from GG1, set R e

    Qscis1

    HWii

    u; v eg1; g&=e; g and backpatch hR. For

    proof of storage correctness, we can extract similar tothe extraction of 0 as in the proof of Theorem 1.Likewise, can be recovered from . To conclude, avalid pair ofand 0 can be extracted. tu

    6 RELATEDWORKAteniese et al.[9] arethe first to consider public auditabilityintheir provable data possession (PDP) model for ensuringpossession of data files on untrustedstorages. They utilize theRSA-based homomorphic linear authenticators for auditingoutsourced data and suggest randomly sampling a fewblocks of the file. However, among their two proposedschemes, the one with public auditability exposes the linearcombination of sampled blocks to external auditor. Whenused directly, their protocol is not provably privacy preser-ving, and thus may leak user data information to the externalauditor. Juels et al. [11] describe a proof of retrievability

    (PoR) model, where spot-checking and error-correctingcodes are used to ensure both possession and retrieva-bility of data files on remote archive service systems.However, the number of audit challenges a user can performis fixed a priori, and public auditability is not supported intheir main scheme. Although they describe a straightforwardMerkle-tree construction for public PoRs, this approach onlyworks with encrypted data. Later, Bowers et al. [18] proposean improved framework for POR protocols that generalizesJuels work. Dodis et al. [29] also give a study on differentvariants of PoR with private auditability. Shacham andWaters [13] design an improved PoR scheme built from

    BLS signatures [19] with proofs of security in the securitymodel defined in [11]. Similar to the construction in [9], theyuse publicly verifiable homomorphic linear authenticatorsthat are built from provably secure BLS signatures. Based on

    372 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013

    Fig. 3. Comparison on auditing time between batch and individualauditing, when-fraction of 256 responses are invalid: Per task auditingtime denotes the total auditing time divided by the number of tasks.

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    12/14

    the elegant BLS construction, a compact and public verifiablescheme is obtained. Again, their approach is not privacypreserving due to the same reason as [9]. Shah et al. [15], [10]propose introducing a TPA to keep online storage honest byfirst encrypting the data then sending a number of pre-computed symmetric-keyed hashes over the encrypted datato theauditor. Theauditor verifies theintegrityof thedata fileand the servers possession of a previously committeddecryption key. This scheme only works for encrypted files,requires the auditor to maintain state, and suffers frombounded usage, which potentially brings in online burden tousers when the keyed hashes are used up.

    Dynamic data have also attracted attentions in the recentliterature on efficiently providing the integrity guarantee ofremotely stored data. Ateniese et al. [21] is the first topropose a partially dynamic version of the prior PDPscheme, using only symmetric key cryptography but with abounded number of audits. In [22], Wang et al. consider asimilar support for partially dynamic data storage in adistributed scenario with additional feature of data error

    localization. In a subsequent work, Wang et al. [8] proposeto combine BLS-based HLA with MHT to support fully datadynamics. Concurently, Erway et al. [23] develop a skip list-based scheme to also enable provable data possession withfull dynamics support. However, the verification in bothprotocols requires the linear combination of sampled blocksas an input, like the designs in [9], [13], and thus does notsupport privacy-preserving auditing.

    In other related work, Sebe et al. [30] thoroughly study aset of requirements which ought to be satisfied for a remotedata possession checking protocol to be of practical use.Their proposed protocol supports unlimited times of file

    integrity verifications and allows preset tradeoff between theprotocol running time and the local storage burden at theuser. Schwarz and Miller [31] propose the first study ofchecking the integrity of the remotely stored data acrossmultiple distributed servers. Their approach is based onerasure-correcting code and efficient algebraic signatures,which also have the similar aggregation property as thehomomorphic authenticator utilized in our approach.Curtmola et al. [32] aim to ensure data possession of multiplereplicas across the distributed storage system. They extendthe PDP scheme in [9] to cover multiple replicas withoutencoding each replica separately, providing guarantee thatmultiple copies of data are actually maintained. In [33],

    Bowers et al. utilize a two-layer erasure-correcting codestructure on the remotely archived data and extend theirPOR model [18] to distributed scenario with high-dataavailability assurance. While all the above schemes providemethods for efficient auditing and provable assurance on thecorrectness of remotely stored data, almost none of themnecessarily meet all the requirements for privacy-preservingpublic auditing of storage. Moreover, none of these schemesconsider batch auditing, while our scheme can greatlyreduce the computation cost on the TPA when coping witha large number of audit delegations.

    Portions of the work presented in this paper havepreviously appeared as an extended abstract in [1]. We

    have revised the paper a lot and improved many technicaldetails as compared to [1]. The primary improvements areas follows: First, we provide a new privacy-preservingpublic auditing protocol with enhanced security strength in

    Section 3.4. For completeness, we also include an additional(but slightly less efficient) protocol design for provablysecure zero-knowledge leakage public auditing scheme inSection 5. Second, based on the enhanced main auditingscheme, we provide a new provably secure batch auditingprotocol. All the experiments in our performance evaluationfor the newly designed protocol are completely redone.Third, we extend our main scheme to support datadynamics in Section 3.6, and provide discussions on howto generalize our privacy-preserving public auditingscheme in Section 3.7, which are lacking in [1]. Finally, weprovide formal analysis of privacy-preserving guaranteeand storage correctness, while only heuristic arguments aresketched in [1].

    7 CONCLUSION

    In this paper, we propose a privacy-preserving publicauditing system for data storage security in cloud comput-ing. We utilize the homomorphic linear authenticator and

    random masking to guarantee that the TPA would not learnany knowledge about the data content stored on the cloudserver during the efficient auditing process, which not onlyeliminates the burden of cloud user from the tedious andpossibly expensive auditing task, but also alleviates theusers fear of their outsourced data leakage. ConsideringTPA may concurrently handle multiple audit sessions fromdifferent users for their outsourced data files, we furtherextend our privacy-preserving public auditing protocol intoa multiuser setting, where the TPA can perform multipleauditing tasks in a batch manner for better efficiency.Extensive analysis shows that our schemes are provably

    secure and highly efficient. Our preliminary experimentconducted on Amazon EC2 instance further demonstratesthe fast performance of our design on both the cloud and theauditor side. We leave the full-fledged implementation of themechanism on commercial public cloud as an importantfuture extension, which is expected to robustly cope withvery large scale data andthus encourage users to adopt cloudstorage services more confidently.

    ACKNOWLEDGMENTS

    This work was supported in part by the US National Science

    Foundation (NSF) under grants CNS-1054317, CNS-1116939, CNS-1156318, and CNS-1117111, and by Amazonweb service research grant. A preliminary version [1] of thispaper was presented at the 29th IEEE Conference onComputer Communications (INFOCOM 10).

    REFERENCES[1] C. Wang, Q. Wang, K. Ren, and W. Lou, Privacy-Preserving

    Public Auditing for Storage Security in Cloud Computing, Proc.IEEE INFOCOM 10, Mar. 2010.

    [2] P. Mell and T. Grance, Draft NIST Working Definition ofCloud Computing, http://csrc.nist.gov/groups/SNS/cloud-computing/index.html, June 2009.

    [3] M. Armbrust, A. Fox, R. Griffith, A.D. Joseph, R.H. Katz, A.Konwinski, G. Lee, D.A. Patterson, A. Rabkin, I. Stoica, and M.Zaharia, Above the Clouds: A Berkeley View of Cloud Comput-ing, Technical Report UCB-EECS-2009-28, Univ. of California,Berkeley, Feb. 2009.

    WANG ET AL.: PRIVACY-PRESERVING PUBLIC AUDITING FOR SECURE CLOUD STORAGE 373

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    13/14

    [4] Cloud Security Alliance, Top Threats to Cloud Computing,http://www.cloudsecurityalliance.org, 2010.

    [5] M. Arrington, Gmail Disaster: Reports of Mass Email Deletions,http://www.techcrunch.com/2006/12/28/gmail-disasterreports-of-mass-email-deletions/, 2006.

    [6] J. Kincaid, MediaMax/TheLinkup Closes Its Doors, http://www.techcrunch.com/2008/07/10/mediamaxthelinkup-closes-its-doors/, July 2008.

    [7] Amazon.com, Amazon s3 Availability Event: July 20, 2008,http://status.aws.amazon.com/s3-20080720.html, July 2008.

    [8] Q. Wang, C. Wang, K. Ren, W. Lou, and J. Li, Enabling PublicAuditability and Data Dynamics for Storage Security in CloudComputing, IEEE Trans. Parallel and Distributed Systems, vol. 22,no. 5, pp. 847-859, May 2011.

    [9] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z.Peterson, and D. Song, Provable Data Possession at UntrustedStores, Proc. 14th ACM Conf. Computer and Comm. Security(CCS 07), pp. 598-609, 2007.

    [10] M.A. Shah, R. Swaminathan, and M. Baker, Privacy-PreservingAudit and Extraction of Digital Contents, Cryptology ePrintArchive, Report 2008/186, 2008.

    [11] A. Juels and J. Burton, S. Kaliski, PORs: Proofs of Retrievabilityfor Large Files, Proc. ACM Conf. Computer and Comm. Security(CCS 07),pp. 584-597, Oct. 2007.

    [12] Cloud Security Alliance, Security Guidance for Critical Areas of

    Focus in Cloud Computing, http://www.cloudsecurityalliance.org, 2009.

    [13] H. Shacham and B. Waters, Compact Proofs of Retrievability,Proc. Intl Conf. Theory and Application of Cryptology and InformationSecurity: Advances in Cryptology (Asiacrypt), vol. 5350, pp. 90-107,Dec. 2008.

    [14] C. Wang, K. Ren, W. Lou, and J. Li, Towards Publicly AuditableSecure Cloud Data Storage Services, IEEE Network Magazine,vol. 24, no. 4, pp. 19-24, July/Aug. 2010.

    [15] M.A. Shah, M. Baker, J.C. Mogul, and R. Swaminathan, Auditingto Keep Online Storage Services Honest, Proc. 11th USENIXWorkshop Hot Topics in Operating Systems (HotOS 07),pp. 1-6, 2007.

    [16] 104th United States Congress, Health Insurance Portability andAccountability Act of 1996 (HIPPA), http://aspe.hhs.gov/admnsimp/pl104191.htm, 1996.

    [17] R. Curtmola, O. Khan, and R. Burns, Robust Remote DataChecking, Proc. Fourth ACM Intl Workshop Storage Security and

    Survivability (StorageSS 08), pp. 63-68, 2008.[18] K.D. Bowers, A. Juels, and A. Oprea, Proofs of Retrievability:

    Theory and Implementation,Proc. ACM Workshop Cloud Comput-ing Security (CCSW 09), pp. 43-54, 2009.

    [19] D. Boneh, B. Lynn, and H. Shacham, Short Signatures from theWeil Pairing,J. Cryptology, vol. 17, no. 4, pp. 297-319, 2004.

    [20] A.L. Ferrara, M. Green, S. Hohenberger, and M. Pedersen,Practical Short Signature Batch Verification, Proc. Cryptogra-

    phers Track at the RSA Conf. 2009 on Topics in Cryptology (CT-RSA),pp. 309-324, 2009.

    [21] G. Ateniese, R.D. Pietro, L.V. Mancini, and G. Tsudik,Scalable and Efficient Provable Data Possession, Proc. IntlConf. Security and Privacy in Comm. Networks (SecureComm 08),pp. 1-10, 2008.

    [22] C. Wang, Q. Wang, K. Ren, and W. Lou, Towards Secure andDependable Storage Services in Cloud Computing, IEEE Trans.Service Computing, vol. 5, no. 2, 220-232, Apr.-June 2012.

    [23] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia,Dynamic Provable Data Possession,Proc. ACM Conf. Computerand Comm. Security (CCS 09), pp. 213-222, 2009.

    [24] R.C. Merkle, Protocols for Public Key Cryptosystems,Proc. IEEESymp. Security and Privacy, 1980.

    [25] G. Ateniese, S. Kamara, and J. Katz, Proofs of Storage fromHomomorphic Identification Protocols, Proc. 15th Intl Conf.Theory and Application of Cryptology and Information Security:

    Advances in Cryptology (ASIACRYPT),pp. 319-333, 2009.[26] M. Bellare and G. Neven, Multi-Signatures in the Plain Public-

    Key Model and a General Forking Lemma, Proc. ACM Conf.Computer and Comm. Security (CCS), pp. 390-399, 2006.

    [27] Amazon.com, Amazon Elastic Compute Cloud, http://aws.amazon.com/ec2/, 2009.[28] Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. Yau, Efficient

    Provable Data Possession for Hybrid Clouds, Cryptology ePrintArchive, Report 2010/234, 2010.

    [29] Y. Dodis, S.P. Vadhan, and D. Wichs, Proofs of Retrievability viaHardness Amplification,Proc. Theory of Cryptography Conf. Theoryof Cryptography (TCC), pp. 109-127, 2009.

    [30] F. Sebe, J. Domingo-Ferrer, A. Martnez-Balleste, Y. Deswarte, andJ.-J. Quisquater, Efficient Remote Data Possession Checking inCritical Information Infrastructures, IEEE Trans. Knowledge andData Eng.,vol. 20, no. 8, pp. 1034-1038, Aug. 2008.

    [31] T. Schwarz and E.L. Miller, Store, Forget, and Check: UsingAlgebraic Signatures to Check Remotely Administered Storage,Proc. IEEE Intl Conf. Distributed Computing Systems (ICDCS 06),

    2006.[32] R. Curtmola, O. Khan, R. Burns, and G. Ateniese, MR-PDP:

    Multiple-Replica Provable Data Possession,Proc. IEEE Intl Conf.Distributed Computing Systems (ICDCS 08),pp. 411-420, 2008.

    [33] K.D. Bowers, A. Juels, and A. Oprea, HAIL: A High-Availabilityand Integrity Layer for Cloud Storage, Proc. ACM Conf. Computerand Comm. Security (CCS 09), pp. 187-198, 2009.

    Cong Wang is an assistant professor in theComputer Science Department at City Univer-sity of Hong Kong. He received his BE and MEdegrees from Wuhan University in 2004 and2007, and the PhD degree from Illinois Instituteof Technology in 2012, all in electrical andcomputer engineering. He worked at the PaloAlto Research Center in the summer of 2011.His research interests are in the areas of cloudcomputing security, with current focus on

    secure data outsourcing and secure computation outsourcing in publiccloud. He is a member of the ACM and a member of the IEEE.

    Sherman S.M. Chow received the BEng (firstclass honors) and MPhil degrees from theUniversity of Hong Kong, and the MS and PhDdegrees from Courant Institute of MathematicalSciences, New York University. He is an assis-tant professor in the Department of InformationEngineering at the Chinese University of HongKong, a position he commenced after his post-doctoral research at the Department of Combi-natorics and Optimization, University of

    Waterloo. He interned at NTT Research and Development (Tokyo),Microsoft Research (Redmond) and Fuji Xerox Palo Alto Laboratory, andhas made research visits to the University of Maryland, University ofCalgary, University of Texas, HKU, MIT, and Queensland University ofTechnology. His research interestsare applied cryptography, privacy anddistributed systems security in general. He serves on the programcommittees of several international conferences including ACNS 2012-2013, ASIACRYPT 2012-2013, WPES 2012, ASIACCS 2013, IEEE-CNS2013 and Financial Crypt. 2013

    Qian Wang received the BS degree from WuhanUniversity, China, in 2003, his MS degree fromShanghai Institute of Microsystem and Informa-tion Technology, Chinese Academy of Sciences,China, in 2006, and his PhD degree from IllinoisInstitute of Technology in 2012, all in electrical

    and computer engineering. Currently, he is anassociate professor with the School of Compu-ter, Wuhan University. His research interestsinclude wireless network security and privacy,

    cloud computing security, and applied cryptography. He is a corecipientof the Best Paper Award from IEEE ICNP 2011. He is a member of theACM and a member of the IEEE.

    374 IEEE TRANSACTIONS ON COMPUTERS, VOL. 62, NO. 2, FEBRUARY 2013

  • 8/9/2019 2013-IEEE Privacy-Preserving Public Auditing for Secure Cloud Storage.pdf

    14/14

    Kui Ren is an associate professor in theComputer Science and Engineering Departmentat the State University of New York at Buffalo.Prior to that, he was an assistant/associateprofessor in the Electrical and Computer En-gineering Department at Illinois Institute ofTechnology. He obtained his PhD degree inelectrical and computer engineering from Wor-cester Polytechnic Institute in 2007. His re-search interests include security and privacy in

    cloud computing, wireless security, and smart grid security. His researchis supported by NSF, DOE, AFRL, and Amazon. He was a recipient ofan NSF Faculty Early Career Development (CAREER) Award in 2011and a corecipient of the Best Paper Award of IEEE ICNP 2011. Heserves on the editorial boards of IEEE Transactions on InformationForensics and Security, IEEE Transactions on Smart Grid, IEEEWireless Communications, IEEE Communications Surveys and Tutor-ials, and Journal of Communications and Networks. He is a seniormember of the IEEE.

    Wenjing Lou is an associate professor atVirginia Polytechnic Institute and State Univer-sity. Prior to joining Virginia Tech in 2011, shewas on the faculty of Worcester PolytechnicInstitute from 2003 to 2011. She received herPhD in Electrical and Computer Engineering atthe University of Florida in 2003. Her currentresearch interests are in cyber security, withemphases on wireless network security and datasecurity and privacy in cloud computing. She

    was a recipient of the US National Science Foundation CAREER awardin 2008. She is a senior member of the IEEE.

    . For more information on this or any other computing topic,please visit our Digital Library at www.computer.org/publications/dlib.

    WANG ET AL.: PRIVACY-PRESERVING PUBLIC AUDITING FOR SECURE CLOUD STORAGE 375