Oruta: Privacy-Preserving Public Auditing




Transcript of Oruta: Privacy-Preserving Public Auditing


Oruta: Privacy-Preserving Public Auditing for Shared Data in the Cloud

Boyang Wang, Baochun Li, Member, IEEE, and Hui Li, Member, IEEE

Abstract: With cloud storage services, it is commonplace for data to be not only stored in the cloud, but also shared across multiple users. However, public auditing for such shared data, while preserving identity privacy, remains an open challenge. In this paper, we propose the first privacy-preserving mechanism that allows public auditing on shared data stored in the cloud. In particular, we exploit ring signatures to compute the verification information needed to audit the integrity of shared data. With our mechanism, the identity of the signer on each block in shared data is kept private from a third party auditor (TPA), who is still able to publicly verify the integrity of shared data without retrieving the entire file. Our experimental results demonstrate the effectiveness and efficiency of our proposed mechanism when auditing shared data.

Index Terms: Public auditing, privacy-preserving, shared data, cloud computing.


1 INTRODUCTION

Cloud service providers manage an enterprise-class infrastructure that offers a scalable, secure and reliable environment for users, at a much lower marginal cost due to the shared nature of resources. It is routine for users to use cloud storage services to share data with others in a team, as data sharing has become a standard feature in most cloud storage offerings, including Dropbox and Google Docs.

The integrity of data in cloud storage, however, is subject to skepticism and scrutiny, as data stored in an untrusted cloud can easily be lost or corrupted due to hardware failures and human errors [1]. To protect the integrity of cloud data, it is best to perform public auditing by introducing a third party auditor (TPA), who offers its auditing service with more powerful computation and communication abilities than regular users.

The first provable data possession (PDP) mechanism [2] to perform public auditing is designed to check the correctness of data stored in an untrusted server without retrieving the entire data. Moving a step forward, Wang et al. [3] (referred to as WWRL in this paper) designed a public auditing mechanism for cloud data, so that during public auditing, the content of private data belonging to a personal user is not disclosed to the third party auditor.

We believe that sharing data among multiple users is perhaps one of the most engaging features that motivates cloud storage.

Boyang Wang and Hui Li are with the State Key Laboratory of Integrated Services Networks, Xidian University, Xi'an, 710071, China. Boyang Wang is also a visiting Ph.D. student at the Department of Electrical and Computer Engineering, University of Toronto. E-mail: {bywang,lihui}@mail.xidian.edu.cn

Baochun Li is with the Department of Electrical and Computer Engineering, University of Toronto, Toronto, ON, M5S 3G4, Canada. E-mail: [email protected]

A unique problem introduced during the process of public auditing for shared data in the cloud is how to preserve identity privacy from the TPA, because the identities of signers on shared data may indicate that a particular user in the group or a special block in shared data is a more valuable target than others.

For example, Alice and Bob work together as a group and share a file in the cloud. The shared file is divided into a number of small blocks, which are independently signed by users. Once a block in this shared file is modified by a user, this user needs to sign the new block using her public/private key pair. The TPA needs to know the identity of the signer on each block in this shared file, so that it is able to audit the integrity of the whole file based on requests from Alice or Bob.

Fig. 1. Alice and Bob share a file in the cloud. The TPA audits the integrity of shared data with existing mechanisms. (Figure: three auditing tasks over blocks, each marked as signed by Alice (A) or Bob (B); the 8th block is highlighted in every task.)

As shown in Fig. 1, after performing several auditing tasks, some private and sensitive information may be revealed to the TPA. On one hand, most of the blocks in the shared file are signed by Alice, which may indicate that Alice plays an important role in this group, such as a group leader. On the other hand, the 8-th block is frequently modified by different users. It means this block may contain high-value data, such as a final bid in an auction, that Alice



and Bob need to discuss and change several times. As described in the example above, the identities of signers on shared data may indicate which user in the group or which block in shared data is a more valuable target than others. Such information is confidential to the group and should not be revealed to any third party. However, no existing mechanism in the literature is able to perform public auditing on shared data in the cloud while still preserving identity privacy.

In this paper, we propose Oruta^1, a new privacy-preserving public auditing mechanism for shared data in an untrusted cloud. In Oruta, we utilize ring signatures [4], [5] to construct homomorphic authenticators [2], [6], so that the third party auditor is able to verify the integrity of shared data for a group of users without retrieving the entire data, while the identity of the signer on each block in shared data is kept private from the TPA. In addition, we further extend our mechanism to support batch auditing, which can audit multiple shared data simultaneously in a single auditing task. Meanwhile, Oruta continues to use random masking [3] to support data privacy during public auditing, and leverages index hash tables [7] to support fully dynamic operations on shared data. A dynamic operation indicates an insert, delete or update operation on a single block in shared data. A high-level comparison between Oruta and existing mechanisms in the literature is shown in Table 1. To the best of our knowledge, this paper represents the first attempt towards designing an effective privacy-preserving public auditing mechanism for shared data in the cloud.

1. Oruta: One Ring to Rule Them All.

TABLE 1
Comparison with Existing Mechanisms

                     PDP [2]   WWRL [3]   Oruta
  Public auditing    Yes       Yes        Yes
  Data privacy       No        Yes        Yes
  Identity privacy   No        No         Yes

The remainder of this paper is organized as follows. In Section 2, we present the system model and threat model. In Section 3, we introduce the cryptographic primitives used in Oruta. The detailed design and security analysis of Oruta are presented in Section 4 and Section 5. In Section 6, we evaluate the performance of Oruta. Finally, we briefly discuss related work in Section 7, and conclude this paper in Section 8.

2 PROBLEM STATEMENT

2.1 System Model

As illustrated in Fig. 2, our work in this paper involves three parties: the cloud server, the third party auditor (TPA) and users. There are two types of users in a group: the original user and a number of group users. The original user and group users are both members of the group. Group members are allowed to access and modify shared data created by the original user based on access control policies [8]. Shared data and its verification information (i.e., signatures) are both stored in the cloud server. The third party auditor is able to verify the integrity of shared data in the cloud server on behalf of group members.

Fig. 2. Our system model includes the cloud server, the third party auditor and users. (Figure: users exchange a shared data flow with the cloud server; a user sends 1. Auditing Request to the TPA, the TPA sends 2. Auditing Message to the cloud server and receives 3. Auditing Proof, and finally returns 4. Auditing Report to the user.)

In this paper, we only consider how to audit the integrity of shared data in the cloud with static groups. It means the group is pre-defined before shared data is created in the cloud, and the membership of users in the group is not changed during data sharing. The original user is responsible for deciding who is able to share her data before outsourcing data to the cloud. Another interesting problem is how to audit the integrity of shared data in the cloud with dynamic groups, where a new user can be added into the group and an existing group member can be revoked during data sharing, while still preserving identity privacy. We will leave this problem to our future work.

When a user (either the original user or a group user) wishes to check the integrity of shared data, she first sends an auditing request to the TPA. After receiving the auditing request, the TPA generates an auditing message to the cloud server, and retrieves an auditing proof of shared data from the cloud server. Then the TPA verifies the correctness of the auditing proof. Finally, the TPA sends an auditing report to the user based on the result of the verification.

2.2 Threat Model

2.2.1 Integrity Threats

Two kinds of threats related to the integrity of shared data are possible. First, an adversary may try to corrupt the integrity of shared data and prevent users from using data correctly. Second, the cloud service provider may inadvertently corrupt (or even remove) data in its storage due to hardware failures and human errors. Making matters worse, in order to avoid jeopardizing its reputation, the cloud service provider may be reluctant to inform users about such corruption of data.



2.2.2 Privacy Threats

The identity of the signer on each block in shared data is private and confidential to the group. During the process of auditing, a semi-trusted TPA, who is only responsible for auditing the integrity of shared data, may try to reveal the identity of the signer on each block in shared data based on verification information. Once the TPA reveals the identity of the signer on each block, it can easily distinguish a high-value target (a particular user in the group or a special block in shared data).

2.3 Design Objectives

To enable the TPA to efficiently and securely verify shared data for a group of users, Oruta should be designed to achieve the following properties: (1) Public Auditing: The third party auditor is able to publicly verify the integrity of shared data for a group of users without retrieving the entire data. (2) Correctness: The third party auditor is able to correctly detect whether there is any corrupted block in shared data. (3) Unforgeability: Only a user in the group can generate valid verification information on shared data. (4) Identity Privacy: During auditing, the TPA cannot distinguish the identity of the signer on each block in shared data.

3 PRELIMINARIES

In this section, we briefly introduce cryptographic primitives and their corresponding properties that we implement in Oruta.

3.1 Bilinear Maps

We first introduce a few concepts and properties related to bilinear maps. We follow the notation of [5], [9]:

1) $G_1$, $G_2$ and $G_T$ are three multiplicative cyclic groups of prime order $p$;

2) $g_1$ is a generator of $G_1$, and $g_2$ is a generator of $G_2$;

3) $\psi$ is a computable isomorphism from $G_2$ to $G_1$, with $\psi(g_2) = g_1$;

4) $e$ is a bilinear map $e: G_1 \times G_2 \rightarrow G_T$ with the following properties. Computability: there exists an efficiently computable algorithm for computing the map $e$. Bilinearity: for all $u \in G_1$, $v \in G_2$ and $a, b \in \mathbb{Z}_p$, $e(u^a, v^b) = e(u, v)^{ab}$. Non-degeneracy: $e(g_1, g_2) \neq 1$.

These properties further imply two additional properties: (1) for any $u_1, u_2 \in G_1$ and $v \in G_2$, $e(u_1 \cdot u_2, v) = e(u_1, v) \cdot e(u_2, v)$; (2) for any $u, v \in G_2$, $e(\psi(u), v) = e(\psi(v), u)$.

3.2 Complexity Assumptions

Definition 1: Discrete Logarithm Problem. For $a \in \mathbb{Z}_p$, given $g, h = g^a \in G_1$, output $a$.

The Discrete Logarithm assumption holds in $G_1$ if no $t$-time algorithm has advantage at least $\epsilon$ in solving the Discrete Logarithm problem in $G_1$, which means it is computationally infeasible to solve the Discrete Logarithm problem in $G_1$.

Definition 2: Computational Co-Diffie-Hellman Problem. For $a \in \mathbb{Z}_p$, given $g_2, g_2^{a} \in G_2$ and $h \in G_1$, compute $h^a \in G_1$.

The co-CDH assumption holds in $G_1$ and $G_2$ if no $t$-time algorithm has advantage at least $\epsilon$ in solving the co-CDH problem in $G_1$ and $G_2$. When $G_1 = G_2$ and $g_1 = g_2$, the co-CDH problem can be reduced to the standard CDH problem in $G_1$. The co-CDH assumption is a stronger assumption than the Discrete Logarithm assumption.

Definition 3: Computational Diffie-Hellman Problem. For $a, b \in \mathbb{Z}_p$, given $g_1, g_1^{a}, g_1^{b} \in G_1$, compute $g_1^{ab} \in G_1$.

The CDH assumption holds in $G_1$ if no $t$-time algorithm has advantage at least $\epsilon$ in solving the CDH problem in $G_1$.

3.3 Ring Signatures

The concept of ring signatures was first proposed by Rivest et al. [4] in 2001. With ring signatures, a verifier is convinced that a signature is computed using one of the group members' private keys, but the verifier is not able to determine which one. This property can be used to preserve the identity of the signer from a verifier.

The ring signature scheme introduced by Boneh et al. [5] (referred to as BGLS in this paper) is constructed on bilinear maps. We will extend this ring signature scheme to construct our public auditing mechanism.

3.4 Homomorphic Authenticators

Homomorphic authenticators (also called homomorphic verifiable tags) are basic tools to construct data auditing mechanisms [2], [3], [6]. Besides unforgeability (only a user with a private key can generate valid signatures), a homomorphic authenticable signature scheme, which denotes a homomorphic authenticator based on signatures, should also satisfy the following properties.

Let $(pk, sk)$ denote the signer's public/private key pair, $\sigma_1$ denote a signature on block $m_1 \in \mathbb{Z}_p$, and $\sigma_2$ denote a signature on block $m_2 \in \mathbb{Z}_p$.

Blockless verification: Given $\sigma_1$ and $\sigma_2$, two random values $y_1, y_2 \in \mathbb{Z}_p$ and a block $m' = y_1 m_1 + y_2 m_2 \in \mathbb{Z}_p$, a verifier is able to check the correctness of block $m'$ without knowing blocks $m_1$ and $m_2$.

Non-malleability: Given $\sigma_1$ and $\sigma_2$, two random values $y_1, y_2 \in \mathbb{Z}_p$ and a block $m' = y_1 m_1 + y_2 m_2 \in \mathbb{Z}_p$, a user who does not have the private key $sk$ is not able to generate a valid signature $\sigma'$ on block $m'$ by linearly combining signatures $\sigma_1$ and $\sigma_2$.

Blockless verification allows a verifier to audit the correctness of data stored in the cloud server with a single block, which is a linear combination of all the blocks in the data. If the combined block is correct, the verifier



believes that the blocks in the data are all correct. In this way, the verifier does not need to download all the blocks to check the integrity of the data. Non-malleability indicates that an attacker cannot generate valid signatures on invalid blocks by linearly combining existing signatures.

Other cryptographic techniques related to homomorphic authenticable signatures include aggregate signatures [5], homomorphic signatures [10] and batch-verification signatures [11]. If a signature scheme is blockless verifiable and malleable, it is a homomorphic signature scheme. In the construction of data auditing mechanisms, we should use homomorphic authenticable signatures, not homomorphic signatures.

4 HOMOMORPHIC AUTHENTICABLE RING SIGNATURES

4.1 Overview

In this section, we introduce a new ring signature scheme, which is suitable for public auditing. Then, in the next section, we will show how to build the privacy-preserving public auditing mechanism for shared data in the cloud based on this new ring signature scheme.

As we introduced in previous sections, we intend to utilize ring signatures to hide the identity of the signer on each block, so that private and sensitive information of the group is not disclosed to the TPA. However, traditional ring signatures [4], [5] cannot be directly used in public auditing mechanisms, because these ring signature schemes do not support blockless verification. Without blockless verification, the TPA has to download the whole data file to verify the correctness of shared data, which consumes excessive bandwidth and takes long verification times.

Therefore, we first construct a new homomorphic authenticable ring signature (HARS) scheme, which is extended from a classic ring signature scheme [5], denoted as BGLS. The ring signatures generated by HARS are able not only to preserve identity privacy but also to support blockless verification.

4.2 Construction of HARS

HARS contains three algorithms: KeyGen, RingSign and RingVerify. In KeyGen, each user in the group generates her public key and private key. In RingSign, a user in the group is able to sign a block with her private key and all the group members' public keys. A verifier is allowed to check whether a given block is signed by a group member in RingVerify.

Scheme Details. Let $G_1$, $G_2$ and $G_T$ be multiplicative cyclic groups of order $p$, and $g_1$ and $g_2$ be generators of $G_1$ and $G_2$ respectively. Let $e: G_1 \times G_2 \rightarrow G_T$ be a bilinear map, and $\psi: G_2 \rightarrow G_1$ be a computable isomorphism with $\psi(g_2) = g_1$. There is a public map-to-point hash function $H_1: \{0,1\}^* \rightarrow G_1$. The global parameters are $(e, \psi, p, G_1, G_2, G_T, g_1, g_2, H_1)$. The total number of users in the group is $d$. Let $U$ denote the group that includes all the $d$ users.

KeyGen. For a user $u_i$ in the group $U$, she randomly picks $x_i \in \mathbb{Z}_p$ and computes $w_i = g_2^{x_i} \in G_2$. Then, user $u_i$'s public key is $pk_i = w_i$ and her private key is $sk_i = x_i$.

RingSign. Given all the $d$ users' public keys $(pk_1, ..., pk_d) = (w_1, ..., w_d)$, a block $m \in \mathbb{Z}_p$, the identifier of this block $id$ and the private key $sk_s$ for some $s$, user $u_s$ randomly chooses $a_i \in \mathbb{Z}_p$ for all $i \neq s$, where $i \in [1, d]$, and lets $\sigma_i = g_1^{a_i}$. Then, she computes

$$\beta = H_1(id)\, g_1^{m} \in G_1, \quad (1)$$

and sets

$$\sigma_s = \left( \frac{\beta}{\psi\!\left(\prod_{i \neq s} w_i^{a_i}\right)} \right)^{1/x_s} \in G_1. \quad (2)$$

The ring signature of block $m$ is $\sigma = (\sigma_1, ..., \sigma_d) \in G_1^{d}$.

RingVerify. Given all the $d$ users' public keys $(pk_1, ..., pk_d) = (w_1, ..., w_d)$, a block $m$, an identifier $id$ and a ring signature $\sigma = (\sigma_1, ..., \sigma_d)$, a verifier first computes $\beta = H_1(id)\, g_1^{m} \in G_1$, and then checks

$$e(\beta, g_2) \stackrel{?}{=} \prod_{i=1}^{d} e(\sigma_i, w_i). \quad (3)$$

If the above equation holds, then the given block $m$ is signed by one of these $d$ users in the group. Otherwise, it is not.
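To make the construction concrete, consider the smallest interesting case: a group of $d = 2$ users in which $u_1$ signs. The following display simply specializes Equations (1)-(3); nothing beyond the scheme above is assumed. User $u_1$ picks a random $a_2 \in \mathbb{Z}_p$ and computes

$$\beta = H_1(id)\, g_1^{m}, \qquad \sigma_2 = g_1^{a_2}, \qquad \sigma_1 = \left( \frac{\beta}{\psi(w_2^{a_2})} \right)^{1/x_1},$$

and a verifier checks

$$e(\sigma_1, w_1)\, e(\sigma_2, w_2) = e\!\left( \frac{\beta}{g_1^{a_2 x_2}}, g_2 \right) e\!\left( g_1^{a_2 x_2}, g_2 \right) = e(\beta, g_2),$$

so the signature verifies, and, as Theorem 5 below shows, the pair $(\sigma_1, \sigma_2)$ does not reveal which of the two users signed.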

4.3 Security Analysis of HARS

Now, we discuss some important properties of HARS, including correctness, unforgeability, blockless verification, non-malleability and identity privacy.

Theorem 1: Given any block and its ring signature, a verifier is able to correctly check the integrity of this block under HARS.

Proof: Proving the correctness of HARS is equivalent to proving that Equation (3) is correct. Based on the properties of bilinear maps, the correctness of this equation can be proved as follows:

$$
\begin{aligned}
\prod_{i=1}^{d} e(\sigma_i, w_i) &= e(\sigma_s, w_s) \cdot \prod_{i \neq s} e(\sigma_i, w_i) \\
&= e\!\left( \left( \frac{\beta}{\psi(\prod_{i \neq s} w_i^{a_i})} \right)^{\frac{1}{x_s}}, g_2^{x_s} \right) \cdot \prod_{i \neq s} e(g_1^{a_i}, g_2^{x_i}) \\
&= e\!\left( \frac{\beta}{\psi(\prod_{i \neq s} g_2^{x_i a_i})}, g_2 \right) \cdot \prod_{i \neq s} e(g_1^{a_i x_i}, g_2) \\
&= e\!\left( \frac{\beta}{\prod_{i \neq s} g_1^{a_i x_i}}, g_2 \right) \cdot e\!\left( \prod_{i \neq s} g_1^{a_i x_i}, g_2 \right) \\
&= e\!\left( \frac{\beta}{\prod_{i \neq s} g_1^{a_i x_i}} \cdot \prod_{i \neq s} g_1^{a_i x_i}, g_2 \right) \\
&= e(\beta, g_2).
\end{aligned}
$$



Now we prove that HARS is able to resist forgery. We follow the security model and the game defined in BGLS [5]. In the game, an adversary is given all the $d$ users' public keys $(pk_1, ..., pk_d) = (w_1, ..., w_d)$, and is given access to the hash oracle and the ring-signing oracle. The goal of the adversary is to output a valid ring signature on a pair of block/identifier $(id, m)$, where this pair of block/identifier $(id, m)$ has never been presented to the ring-signing oracle. If the adversary achieves this goal, then it wins the game.

Theorem 2: Suppose $\mathcal{A}$ is a $(t', \epsilon')$-algorithm that can generate a forgery of a ring signature on a group of users of size $d$. Then there exists a $(t, \epsilon)$-algorithm that can solve the co-CDH problem with $t \leq 2t' + 2c_{G_1}(q_H + d q_s + q_s + d) + 2c_{G_2} d$ and $\epsilon \geq (\epsilon'/(e + e q_s))^2$, where $\mathcal{A}$ issues at most $q_H$ hash queries and at most $q_s$ ring-signing queries, $e = \lim_{q_s \to \infty}(1 + 1/q_s)^{q_s}$, exponentiation and inversion on $G_1$ take time $c_{G_1}$, and exponentiation and inversion on $G_2$ take time $c_{G_2}$.

Proof: The co-CDH problem can be solved by solving two random instances of the following problem: given $g_1^{ab}$, $g_2^{a}$ (and $g_1$, $g_2$), compute $g_1^{b}$. We shall construct an algorithm $\mathcal{B}$ that solves this problem. This problem is easy if $a = 0$. In what follows, we assume $a \neq 0$.

Initially, $\mathcal{B}$ randomly picks $x_2, ..., x_d$ from $\mathbb{Z}_p$ and sets $x_1 = 1$. Then, it sets $pk_i = w_i = (g_2^{a})^{x_i}$. Algorithm $\mathcal{A}$ is given the public keys $(pk_1, ..., pk_d) = (w_1, ..., w_d)$. Without loss of generality, we assume $\mathcal{A}$ submits distinct queries, which means that for every ring-signing query on a block $m$ and its identifier $id$, $\mathcal{A}$ has previously issued a hash query on block $m$ and identifier $id$.

On a hash query, $\mathcal{B}$ flips a coin that shows 0 with probability $p_c$ and 1 otherwise, where $p_c$ will be determined later. Then $\mathcal{B}$ randomly picks $r \in \mathbb{Z}_p$; if the coin shows 0, $\mathcal{B}$ returns $(g_1^{ab})^{r}$, otherwise it returns $\psi(g_2^{a})^{r}$.

Suppose $\mathcal{A}$ issues a ring-signing query on a block $m$ and its identifier $id$. By the assumption, a hash query has been issued on this pair of block/identifier $(m, id)$. If the coin $\mathcal{B}$ flipped for this hash query showed 0, then $\mathcal{B}$ fails and exits. Otherwise $\mathcal{B}$ has returned $H_1(id)\, g_1^{m} = \psi(g_2^{a})^{r}$ for some $r$. In this case, $\mathcal{B}$ chooses random $a_2, ..., a_d \in \mathbb{Z}_p$, computes $a_1 = r - (a_2 x_2 + ... + a_d x_d)$, and returns the signature $\sigma = (g_1^{a_1}, ..., g_1^{a_d})$.

Eventually $\mathcal{A}$ outputs a forgery $\sigma = (\sigma_1, ..., \sigma_d)$ on block $m$ and identifier $id$. Again by the assumption, a hash query has been issued on this pair of block/identifier $(m, id)$. If the coin flipped by $\mathcal{B}$ for this hash query did not show 0, then $\mathcal{B}$ fails. Otherwise, $H_1(id)\, g_1^{m} = g_1^{abr}$ for some $r$ chosen by $\mathcal{B}$, and $\mathcal{B}$ can output $g_1^{b}$ by computing $(\prod_{i=1}^{d} \sigma_i^{x_i})^{1/r}$.

Algorithm $\mathcal{A}$ cannot distinguish between $\mathcal{B}$'s simulation and real life. If $\mathcal{A}$ successfully forges a ring signature, then $\mathcal{B}$ can output $g_1^{b}$. The probability that $\mathcal{B}$ does not fail is $(1 - p_c)^{q_s} p_c$, which is maximized when $p_c = 1/(q_s + 1)$; the bound of this probability is then $1/(e(1 + q_s))$, where $e = \lim_{q_s \to \infty}(1 + 1/q_s)^{q_s}$. Algorithm $\mathcal{B}$ requires $d$ exponentiations on $G_2$ in setup, one exponentiation on $G_1$ for each of $\mathcal{A}$'s hash queries, $d + 1$ exponentiations on $G_1$ for each of $\mathcal{A}$'s signature queries, and $d$ exponentiations on $G_1$ in the output phase, so the running time of $\mathcal{B}$ is the running time of $\mathcal{A}$ plus $c_{G_1}(q_H + d q_s + q_s + d) + c_{G_2} d$.

Since the co-CDH problem can be solved by solving two random instances of the problem solved by algorithm $\mathcal{B}$, if $\mathcal{A}$ is a $(t', \epsilon')$-algorithm that can generate a forgery of a ring signature on a group of users of size $d$, then there exists a $(t, \epsilon)$-algorithm that can solve the co-CDH problem with $t \leq 2t' + 2c_{G_1}(q_H + d q_s + q_s + d) + 2c_{G_2} d$ and $\epsilon \geq (\epsilon'/(e + e q_s))^2$.

Theorem 3: For an adversary, it is computationally infeasible to forge a ring signature under HARS.

Proof: As we already proved in Theorem 2, if an adversary can forge a ring signature, then we can find a $(t, \epsilon)$-algorithm to solve the co-CDH problem in $G_1$ and $G_2$, which contradicts the fact that the co-CDH problem in $G_1$ and $G_2$ is hard. Therefore, for an adversary, it is computationally infeasible to forge a ring signature under HARS.

Then, based on Theorems 1 and 3, we show that HARS is a homomorphic authenticable ring signature scheme.

Theorem 4: HARS is a homomorphic authenticable ring signature scheme.

Proof: To prove HARS is a homomorphic authenticable ring signature scheme, we first prove that HARS is able to support blockless verification, which we defined in Section 3. Then we show HARS is also non-malleable.

Given all the $d$ users' public keys $(pk_1, ..., pk_d) = (w_1, ..., w_d)$, two identifiers $id_1$ and $id_2$, two ring signatures $\sigma_1 = (\sigma_{1,1}, ..., \sigma_{1,d})$ and $\sigma_2 = (\sigma_{2,1}, ..., \sigma_{2,d})$, and two random values $y_1, y_2 \in \mathbb{Z}_p$, a verifier is able to check the correctness of a combined block $m' = y_1 m_1 + y_2 m_2 \in \mathbb{Z}_p$ without knowing blocks $m_1$ and $m_2$ by verifying:

$$e(H_1(id_1)^{y_1} H_1(id_2)^{y_2} g_1^{m'}, g_2) \stackrel{?}{=} \prod_{i=1}^{d} e(\sigma_{1,i}^{y_1} \cdot \sigma_{2,i}^{y_2}, w_i).$$

Based on Theorem 1, the correctness of the above equation can be proved as:

$$
\begin{aligned}
e(H_1(id_1)^{y_1} H_1(id_2)^{y_2} g_1^{m'}, g_2)
&= e(H_1(id_1)^{y_1} g_1^{y_1 m_1}, g_2) \cdot e(H_1(id_2)^{y_2} g_1^{y_2 m_2}, g_2) \\
&= e(\beta_1, g_2)^{y_1} \cdot e(\beta_2, g_2)^{y_2} \\
&= \prod_{i=1}^{d} e(\sigma_{1,i}, w_i)^{y_1} \cdot \prod_{i=1}^{d} e(\sigma_{2,i}, w_i)^{y_2} \\
&= \prod_{i=1}^{d} e(\sigma_{1,i}^{y_1} \cdot \sigma_{2,i}^{y_2}, w_i).
\end{aligned}
$$

If the combined block $m'$ is correct, the verifier also believes that blocks $m_1$ and $m_2$ are both correct. Therefore, HARS is able to support blockless verification.

Meanwhile, an adversary, who does not have any user's private key, cannot generate a valid ring signature $\sigma'$ on an invalid block $m'$ by linearly combining $\sigma_1$ and $\sigma_2$ with $y_1$ and $y_2$. This is because, if each element $\sigma'_i$ in $\sigma'$ is computed as $\sigma'_i = \sigma_{1,i}^{y_1} \cdot \sigma_{2,i}^{y_2}$, the whole ring signature $\sigma' = (\sigma'_1, ..., \sigma'_d)$ cannot pass Equation (3) in RingVerify.



More specifically, if blocks $m_1$ and $m_2$ are signed by the same user, for example user $u_s$, then $\sigma'_s$ can be computed as

$$\sigma'_s = \sigma_{1,s}^{y_1} \cdot \sigma_{2,s}^{y_2} = \left( \frac{\beta_1^{y_1} \beta_2^{y_2}}{\psi\!\left(\prod_{i \neq s} w_i^{y_1 a_{1,i}} w_i^{y_2 a_{2,i}}\right)} \right)^{1/x_s}.$$

For all $i \neq s$, $\sigma'_i = \sigma_{1,i}^{y_1} \cdot \sigma_{2,i}^{y_2} = g_1^{(y_1 a_{1,i} + y_2 a_{2,i})}$, where $a_{1,i}$ and $a_{2,i}$ are random values. When ring signature $\sigma' = (\sigma'_1, ..., \sigma'_d)$ is verified with the invalid block $m'$ using Equation (3):

$$\prod_{i=1}^{d} e(\sigma'_i, w_i) = e(\beta_1^{y_1} \beta_2^{y_2}, g_2) \neq e(\beta', g_2),$$

which means it always fails to pass the verification, because $\beta_1^{y_1} \beta_2^{y_2} = H_1(id_1)^{y_1} H_1(id_2)^{y_2} g_1^{m'}$ is not equal to $\beta' = H_1(id')\, g_1^{m'}$.

If blocks $m_1$ and $m_2$ are signed by different users, for example user $u_s$ and user $u_t$, then $\sigma'_s$ and $\sigma'_t$ can be presented as

$$\sigma'_s = \left( \frac{\beta_1^{y_1}}{\psi\!\left(\prod_{i \neq s} w_i^{y_1 a_{1,i}}\right)} \right)^{1/x_s} \cdot g_1^{y_2 a_{2,s}}, \qquad \sigma'_t = g_1^{y_1 a_{1,t}} \cdot \left( \frac{\beta_2^{y_2}}{\psi\!\left(\prod_{i \neq t} w_i^{y_2 a_{2,i}}\right)} \right)^{1/x_t}.$$

For all $i \neq s$ and $i \neq t$, $\sigma'_i = \sigma_{1,i}^{y_1} \cdot \sigma_{2,i}^{y_2} = g_1^{(y_1 a_{1,i} + y_2 a_{2,i})}$, where $a_{1,i}$ and $a_{2,i}$ are random values. When ring signature $\sigma' = (\sigma'_1, ..., \sigma'_d)$ is verified with the invalid block $m'$ using Equation (3):

$$\prod_{i=1}^{d} e(\sigma'_i, w_i) = e(\beta_1^{y_1} \beta_2^{y_2}, g_2) \neq e(\beta', g_2),$$

which means it always fails to pass the verification. Therefore, an adversary cannot output valid ring signatures on invalid blocks by linearly combining existing signatures, which indicates that HARS is non-malleable. Because HARS is not only blockless verifiable but also non-malleable, it is a homomorphic authenticable signature scheme.

Now, following the theorem in [5], we show that a verifier cannot distinguish the identity of the signer among a group of users under HARS.

Theorem 5: For any algorithm $\mathcal{A}$, any group $U$ with $d$ users, and a random user $u_s \in U$, the probability $\Pr[\mathcal{A}(\sigma) = u_s]$ is at most $1/d$ under HARS, where $\sigma$ is a ring signature generated with user $u_s$'s private key $sk_s$.

Proof: For any $h \in G_1$ and any $s$, $1 \leq s \leq d$, the distribution $\{(g_1^{a_1}, ..., g_1^{a_d}): a_i \xleftarrow{R} \mathbb{Z}_p \text{ for } i \neq s,\ a_s \text{ chosen such that } \prod_{i=1}^{d} g_1^{a_i} = h\}$ is identical to the distribution $\{(g_1^{a_1}, ..., g_1^{a_d}): \prod_{i=1}^{d} g_1^{a_i} = h\}$. Therefore, given $\sigma = (\sigma_1, ..., \sigma_d)$, the probability that algorithm $\mathcal{A}$ distinguishes $\sigma_s$, which indicates the identity of the signer, is at most $1/d$. Details of the proof of identity privacy can be found in [5].

5 PRIVACY-PRESERVING PUBLIC AUDITING FOR SHARED DATA IN THE CLOUD

5.1 Overview

Using HARS and the properties we established in the previous section, we now construct Oruta, our privacy-preserving public auditing mechanism for shared data in the cloud. With Oruta, the TPA can verify the integrity of shared data for a group of users without retrieving the entire data. Meanwhile, the identity of the signer on each block in shared data is kept private from the TPA during the auditing.

5.2 Reduce Signature Storage

Another important issue we should consider in the construction of Oruta is the size of the storage used for ring signatures. According to the generation of ring signatures in HARS, a block $m$ is an element of $\mathbb{Z}_p$ and its ring signature contains $d$ elements of $G_1$, where $G_1$ is a cyclic group with order $p$. It means a $|p|$-bit block requires a $d \times |p|$-bit ring signature, which forces users to spend a huge amount of space on storing ring signatures. It is very frustrating for users, because cloud service providers, such as Amazon, charge users based on the storage space they use. To reduce the storage for ring signatures and still allow the TPA to audit shared data efficiently, we exploit an aggregated approach from [6]. Specifically, we aggregate a block $\mathbf{m}_j = (m_{j,1}, ..., m_{j,k}) \in \mathbb{Z}_p^{k}$ in shared data as $\prod_{l=1}^{k} \eta_l^{m_{j,l}}$ instead of computing $g_1^{m}$ in Equation (1), where $\eta_1, ..., \eta_k$ are random values of $G_1$. With this aggregation, the length of a ring signature is only $d/k$ of the length of a block. Similar methods to reduce the storage space of signatures can also be found in [7]. Generally, to obtain a smaller size of a ring signature than the size of a block, we choose $k > d$. As a trade-off, the communication cost of auditing will increase with an increase of $k$.
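As a quick numeric illustration of this trade-off, the toy calculation below simply plugs values into the $d/k$ ratio; the function name and the 160-bit element size are our own assumptions, not taken from the paper.

def ring_signature_storage(d: int, k: int, p_bits: int = 160):
    """Size of one block (k elements of Z_p) versus its ring signature (d elements of G1),
    assuming every element takes about p_bits bits, as in the discussion above."""
    block_bytes = k * p_bits // 8
    signature_bytes = d * p_bits // 8
    return block_bytes, signature_bytes, signature_bytes / block_bytes

# d = 10 users: without aggregation (k = 1) the signature is 10x the block size;
# with k = 100 elements per block it shrinks to d/k = 10% of the block size.
print(ring_signature_storage(10, 1))    # (20, 200, 10.0)
print(ring_signature_storage(10, 100))  # (2000, 200, 0.1)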

5.3 Support Dynamic Operations

To enable each user in the group to easily modify data in the cloud and share the latest version of the data with the rest of the group, Oruta should also support dynamic operations on shared data. A dynamic operation includes an insert, delete or update operation on a single block. However, since the computation of a ring signature includes an identifier of a block (as presented in HARS), traditional methods, which only use the index of a block as its identifier, are not suitable for supporting dynamic operations on shared data. The reason is that, when a user modifies a single block in shared data by performing an insert or delete operation, the indices of the blocks after the modified block are all changed (as shown in Figures 3 and 4), and the changes of these indices require users to re-compute the signatures of these blocks, even though the content of these blocks is not modified.



Fig. 3. After inserting block $\mathbf{m}_2'$, all the indices after block $\mathbf{m}_2$ are changed. (Figure: index/block tables before and after the insert; the block count grows from $n$ to $n+1$.)

Fig. 4. After deleting block $\mathbf{m}_2$, all the indices after block $\mathbf{m}_1$ are changed. (Figure: index/block tables before and after the delete; the block count shrinks from $n$ to $n-1$.)

By utilizing index hash tables [7], our mechanism allows a user to efficiently perform a dynamic operation on a single block, and avoids this type of re-computation on other blocks. Different from [7], in our mechanism, an identifier from the index hash table is described as $id_j = \{v_j, r_j\}$, where $v_j$ is the virtual index of block $\mathbf{m}_j$, and $r_j$ is a random value generated by a collision-resistant hash function $H_2: \{0,1\}^* \rightarrow \mathbb{Z}_q$ with $r_j = H_2(\mathbf{m}_j || v_j)$. Here, $q$ is a much smaller prime than $p$. The collision resistance of $H_2$ ensures that each block has a unique identifier. The virtual indices are able to ensure that all the blocks in shared data are in the right order. For example, if $v_i < v_j$, then block $\mathbf{m}_i$ is ahead of block $\mathbf{m}_j$ in shared data. When shared data is created by the original user, the initial virtual index of block $\mathbf{m}_j$ is computed as $v_j = j \cdot \delta$, where $\delta$ is a system parameter decided by the original user. If a new block $\mathbf{m}_j'$ is inserted, the virtual index of this new block $\mathbf{m}_j'$ is $v_j' = (v_{j-1} + v_j)/2$. Clearly, if block $\mathbf{m}_j$ and block $\mathbf{m}_{j+1}$ are both originally created by the original user, the maximal number of blocks that can be inserted between block $\mathbf{m}_j$ and block $\mathbf{m}_{j+1}$ is $\delta$. The original user can estimate and choose a proper value of $\delta$ based on the original size of shared data, the number of users in the group, the subject of the content in shared data, and so on. Generally, we believe $\delta = 10{,}000$ or $100{,}000$ is large enough for the maximal number of inserted blocks between two blocks originally created by the original user. Examples of different dynamic operations on shared data with index hash tables are described in Figures 5 and 6.
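To make the virtual-index bookkeeping concrete, the minimal Python sketch below maintains an index hash table and computes identifiers $id_j = \{v_j, r_j\}$. It is illustrative only: SHA-256 reduced modulo a small prime stands in for $H_2$, blocks are plain byte strings, and the helper names (initial_table, insert_after, update) are ours, not from the paper.

import hashlib

Q = 2**31 - 1          # a small prime standing in for q (illustrative choice)
DELTA = 10_000         # system parameter delta chosen by the original user

def h2(block: bytes, virtual_index: int) -> int:
    """Stand-in for the collision-resistant hash H2: {0,1}* -> Z_q, r_j = H2(m_j || v_j)."""
    digest = hashlib.sha256(block + str(virtual_index).encode()).hexdigest()
    return int(digest, 16) % Q

def initial_table(blocks):
    """Initial identifiers: v_j = j * delta for the j-th block (1-based)."""
    return [{"v": j * DELTA, "r": h2(b, j * DELTA), "block": b}
            for j, b in enumerate(blocks, start=1)]

def insert_after(table, pos, new_block):
    """Insert a new block between positions pos and pos+1: v' = (v_pos + v_{pos+1}) / 2.
    Identifiers of all other blocks stay unchanged, so their signatures are untouched."""
    v_prev = table[pos - 1]["v"] if pos >= 1 else 0
    v_next = table[pos]["v"]
    v_new = (v_prev + v_next) // 2
    table.insert(pos, {"v": v_new, "r": h2(new_block, v_new), "block": new_block})

def update(table, pos, new_block):
    """Update keeps the virtual index and recomputes only r_j = H2(m'_j || v_j)."""
    v = table[pos - 1]["v"]
    table[pos - 1] = {"v": v, "r": h2(new_block, v), "block": new_block}

# Example: three blocks, then insert a new block between the 1st and 2nd.
tbl = initial_table([b"block-1", b"block-2", b"block-3"])
insert_after(tbl, 1, b"new-block")
print([entry["v"] for entry in tbl])   # [10000, 15000, 20000, 30000]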

5.4 Construction of Oruta

Now, we present the details of our public auditing mechanism, Oruta. It includes five algorithms: KeyGen, SigGen, Modify, ProofGen and ProofVerify. In KeyGen, users generate their own public/private key pairs.

Fig. 5. Insert block $\mathbf{m}_2'$ into shared data using an index hash table as identifiers. (Figure: index hash tables with columns Index, Block, V, R before and after the insert; the new block receives virtual index $3\delta/2$.)

Fig. 6. Update block $\mathbf{m}_1$ and delete block $\mathbf{m}_3$ in shared data using an index hash table as identifiers. (Figure: index hash tables before and after the two operations.)

In SigGen, a user (either the original user or a group user) is able to compute ring signatures on blocks in shared data. Each user in the group is able to perform an insert, delete or update operation on a block, and compute the new ring signature on this new block in Modify. ProofGen is operated by the TPA and the cloud server together to generate a proof of possession of shared data. In ProofVerify, the TPA verifies the proof and sends an auditing report to the user.

Note that the group is pre-defined before shared data is created in the cloud and the membership of the group is not changed during data sharing. Before the original user outsources shared data to the cloud, she decides all the group members, and computes all the initial ring signatures of all the blocks in shared data with her private key and all the group members' public keys. After shared data is stored in the cloud, when a group member modifies a block in shared data, this group member also needs to compute a new ring signature on the modified block.

Scheme Details. Let $G_1$, $G_2$ and $G_T$ be multiplicative cyclic groups of order $p$, and $g_1$ and $g_2$ be generators of groups $G_1$ and $G_2$, respectively. Let $e: G_1 \times G_2 \rightarrow G_T$ be a bilinear map, and $\psi: G_2 \rightarrow G_1$ be a computable isomorphism with $\psi(g_2) = g_1$. There are three hash functions $H_1: \{0,1\}^* \rightarrow G_1$, $H_2: \{0,1\}^* \rightarrow \mathbb{Z}_q$ and $h: G_1 \rightarrow \mathbb{Z}_p$. The global parameters are $(e, \psi, p, q, G_1, G_2, G_T, g_1, g_2, H_1, H_2, h)$. The total number of users in the group is $d$. Let $U$ denote the group that includes all the $d$ users.

Shared data $M$ is divided into $n$ blocks, and each block $\mathbf{m}_j$ is further divided into $k$ elements in $\mathbb{Z}_p$. Therefore,



shared data $M$ can be described as an $n \times k$ matrix:

$$M = \begin{bmatrix} \mathbf{m}_1 \\ \vdots \\ \mathbf{m}_n \end{bmatrix} = \begin{bmatrix} m_{1,1} & \dots & m_{1,k} \\ \vdots & \ddots & \vdots \\ m_{n,1} & \dots & m_{n,k} \end{bmatrix} \in \mathbb{Z}_p^{n \times k}.$$

KeyGen. For user $u_i$ in the group $U$, she randomly picks $x_i \in \mathbb{Z}_p$ and computes $w_i = g_2^{x_i}$. User $u_i$'s public key is $pk_i = w_i$ and her private key is $sk_i = x_i$. The original user also randomly generates a public aggregate key $pak = (\eta_1, ..., \eta_k)$, where the $\eta_l$ are random elements of $G_1$.

SigGen. Given all the $d$ group members' public keys $(pk_1, ..., pk_d) = (w_1, ..., w_d)$, a block $\mathbf{m}_j = (m_{j,1}, ..., m_{j,k})$, its identifier $id_j$, and a private key $sk_s$ for some $s$, user $u_s$ computes the ring signature of this block as follows:

1) She first aggregates block $\mathbf{m}_j$ with the public aggregate key $pak$, and computes

$$\beta_j = H_1(id_j) \prod_{l=1}^{k} \eta_l^{m_{j,l}} \in G_1. \quad (4)$$

2) After computing $\beta_j$, user $u_s$ randomly chooses $a_{j,i} \in \mathbb{Z}_p$ and sets $\sigma_{j,i} = g_1^{a_{j,i}}$, for all $i \neq s$. Then she calculates

$$\sigma_{j,s} = \left( \frac{\beta_j}{\psi\!\left(\prod_{i \neq s} w_i^{a_{j,i}}\right)} \right)^{1/x_s} \in G_1. \quad (5)$$

The ring signature of block $\mathbf{m}_j$ is $\sigma_j = (\sigma_{j,1}, ..., \sigma_{j,d}) \in G_1^{d}$.

Modify. A user in the group modifies the $j$-th block in shared data by performing one of the following three operations:

Insert. This user inserts a new block $\mathbf{m}_j'$ into shared data. She computes the new identifier of the inserted block $\mathbf{m}_j'$ as $id_j' = \{v_j', r_j'\}$, where the virtual index is $v_j' = (v_{j-1} + v_j)/2$ and $r_j' = H_2(\mathbf{m}_j' || v_j')$. For the rest of the blocks, the identifiers of these blocks are not changed (as explained in Figure 5). This user outputs the new ring signature $\sigma_j'$ of the inserted block $\mathbf{m}_j'$ with SigGen, and uploads $\{\mathbf{m}_j', id_j', \sigma_j'\}$ to the cloud server. The total number of blocks in shared data increases to $n + 1$.

Delete. This user deletes block $\mathbf{m}_j$, its identifier $id_j$ and ring signature $\sigma_j$ from the cloud server. The identifiers of the other blocks in shared data remain the same. The total number of blocks in shared data decreases to $n - 1$.

Update. This user updates the $j$-th block in shared data with a new block $\mathbf{m}_j'$. The virtual index of this block remains the same, and $r_j'$ is computed as $r_j' = H_2(\mathbf{m}_j' || v_j)$. The new identifier of this updated block is $id_j' = \{v_j, r_j'\}$. The identifiers of the other blocks in shared data are not changed. This user outputs the new ring signature $\sigma_j'$ of this new block with SigGen, and uploads $\{\mathbf{m}_j', id_j', \sigma_j'\}$ to the cloud server. The total number of blocks in shared data is still $n$.

ProofGen. To audit the integrity of shared data, a user first sends an auditing request to the TPA. After receiving an auditing request, the TPA generates an auditing message [2] as follows:

1) The TPA randomly picks a $c$-element subset $J$ of the set $[1, n]$ to locate the $c$ selected blocks that will be checked in this auditing process, where $n$ is the total number of blocks in shared data.

2) For $j \in J$, the TPA generates a random value $y_j \in \mathbb{Z}_q$.

Then, the TPA sends an auditing message $\{(j, y_j)\}_{j \in J}$ to the cloud server (as illustrated in Fig. 7).

Fig. 7. The TPA sends an auditing message $\{(j, y_j)\}_{j \in J}$ to the cloud server.
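The TPA's side of ProofGen is plain random sampling; a minimal Python sketch is given below. The function name and the illustrative modulus are ours, not from the paper.

import secrets

def generate_auditing_message(n: int, c: int, q: int):
    """Pick a c-element subset J of [1, n] and a random nonzero coefficient y_j in Z_q
    for each selected block, as in steps 1) and 2) of ProofGen."""
    selected = sorted(secrets.SystemRandom().sample(range(1, n + 1), c))
    return [(j, secrets.randbelow(q - 1) + 1) for j in selected]

# Example: n = 1,000,000 blocks, c = 460 selected blocks, a small illustrative prime q.
challenge = generate_auditing_message(1_000_000, 460, q=2**31 - 1)
print(challenge[:3])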

After receiving an auditing message $\{(j, y_j)\}_{j \in J}$, the cloud server generates a proof of possession of the selected blocks with the public aggregate key $pak$. More specifically:

1) The cloud server chooses a random element $r_l \in \mathbb{Z}_q$, and calculates $\lambda_l = \eta_l^{r_l} \in G_1$, for $l \in [1, k]$.

2) To hide the linear combination of the selected blocks using random masking, the cloud server computes $\mu_l = \sum_{j \in J} y_j m_{j,l} + r_l h(\lambda_l) \in \mathbb{Z}_p$, for $l \in [1, k]$.

3) The cloud server aggregates the signatures as $\phi_i = \prod_{j \in J} \sigma_{j,i}^{y_j}$, for $i \in [1, d]$.

After the computation, the cloud server sends an auditing proof $\{\boldsymbol{\lambda}, \boldsymbol{\mu}, \boldsymbol{\phi}, \{id_j\}_{j \in J}\}$ to the TPA, where $\boldsymbol{\lambda} = (\lambda_1, ..., \lambda_k)$, $\boldsymbol{\mu} = (\mu_1, ..., \mu_k)$ and $\boldsymbol{\phi} = (\phi_1, ..., \phi_d)$ (as shown in Fig. 8).

Fig. 8. The cloud server sends an auditing proof $\{\boldsymbol{\lambda}, \boldsymbol{\mu}, \boldsymbol{\phi}, \{id_j\}_{j \in J}\}$ to the TPA.

ProofVerify. With an auditing proof $\{\boldsymbol{\lambda}, \boldsymbol{\mu}, \boldsymbol{\phi}, \{id_j\}_{j \in J}\}$, an auditing message $\{(j, y_j)\}_{j \in J}$, the public aggregate key $pak = (\eta_1, ..., \eta_k)$, and all the group members' public keys $(pk_1, ..., pk_d) = (w_1, ..., w_d)$, the TPA verifies the correctness of this proof by checking the following



equation:

$$e\!\left( \prod_{j \in J} H_1(id_j)^{y_j} \cdot \prod_{l=1}^{k} \eta_l^{\mu_l}, \; g_2 \right) \stackrel{?}{=} \left( \prod_{i=1}^{d} e(\phi_i, w_i) \right) \cdot e\!\left( \prod_{l=1}^{k} \lambda_l^{h(\lambda_l)}, \; g_2 \right). \quad (6)$$

If the above equation holds, then the TPA believes that the blocks in shared data are all correct, and sends a positive auditing report to the user. Otherwise, it sends a negative one.

Discussion. Based on the properties of bilinear maps, we can further improve the efficiency of verification by computing $d + 2$ pairing operations in verification instead of computing $d + 3$ pairing operations with Equation (6). Specifically, Equation (6) can also be described as

$$e\!\left( \prod_{j \in J} H_1(id_j)^{y_j} \cdot \prod_{l=1}^{k} \eta_l^{\mu_l} \cdot \left( \prod_{l=1}^{k} \lambda_l^{h(\lambda_l)} \right)^{-1}, \; g_2 \right) \stackrel{?}{=} \prod_{i=1}^{d} e(\phi_i, w_i). \quad (7)$$

In the construction of Oruta, we leverage random masking [3] to support data privacy during public auditing. If a user wants to protect the content of private data in the cloud, this user can also encrypt data before outsourcing it to the cloud server with encryption techniques, such as attribute-based encryption (ABE) [8], [12]. With sampling strategies [2], the TPA can detect any corrupted block in shared data with a high probability by only choosing a subset of all blocks in each auditing task. Previous work [2] has already proved that, if the total number of blocks in shared data is $n = 1{,}000{,}000$ and 1% of all the blocks are lost or removed, the TPA can detect these corrupted blocks with a probability greater than 99% by choosing 460 random blocks.
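As a quick sanity check on this sampling argument, the short Python snippet below computes the probability of detecting at least one corrupted block when sampling without replacement; the function name and parameters are ours, not from [2].

def detection_probability(n: int, corrupted: int, sampled: int) -> float:
    """Probability that at least one of `sampled` randomly chosen blocks
    (without replacement) is among the `corrupted` ones."""
    p_miss = 1.0
    for i in range(sampled):
        p_miss *= (n - corrupted - i) / (n - i)
    return 1.0 - p_miss

# n = 1,000,000 blocks, 1% corrupted, 460 blocks sampled per auditing task.
print(detection_probability(1_000_000, 10_000, 460))  # roughly 0.99, matching [2]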

5.5 Security Analysis of Oruta

Now, we discuss the security properties of Oruta, including its correctness, unforgeability, identity privacy and data privacy.

Theorem 6: During an auditing task, the TPA is able to correctly audit the integrity of shared data under Oruta.

Proof: Proving the correctness of Oruta is equivalent to proving that Equation (6) is correct. Based on the properties of bilinear maps and Theorem 1, the right-hand side (RHS) of Equation (6) can be expanded as follows:

$$
\begin{aligned}
\text{RHS} &= \prod_{i=1}^{d} e\!\left( \prod_{j \in J} \sigma_{j,i}^{y_j}, w_i \right) \cdot e\!\left( \prod_{l=1}^{k} \lambda_l^{h(\lambda_l)}, g_2 \right) \\
&= \prod_{i=1}^{d} \left( \prod_{j \in J} e(\sigma_{j,i}^{y_j}, w_i) \right) \cdot e\!\left( \prod_{l=1}^{k} \eta_l^{r_l h(\lambda_l)}, g_2 \right) \\
&= \prod_{j \in J} \left( \prod_{i=1}^{d} e(\sigma_{j,i}, w_i)^{y_j} \right) \cdot e\!\left( \prod_{l=1}^{k} \eta_l^{r_l h(\lambda_l)}, g_2 \right) \\
&= \prod_{j \in J} e(\beta_j, g_2)^{y_j} \cdot e\!\left( \prod_{l=1}^{k} \eta_l^{r_l h(\lambda_l)}, g_2 \right) \\
&= e\!\left( \prod_{j \in J} \left( H_1(id_j) \prod_{l=1}^{k} \eta_l^{m_{j,l}} \right)^{y_j}, g_2 \right) \cdot e\!\left( \prod_{l=1}^{k} \eta_l^{r_l h(\lambda_l)}, g_2 \right) \\
&= e\!\left( \prod_{j \in J} H_1(id_j)^{y_j} \cdot \prod_{l=1}^{k} \eta_l^{\sum_{j \in J} m_{j,l} y_j} \cdot \prod_{l=1}^{k} \eta_l^{r_l h(\lambda_l)}, g_2 \right) \\
&= e\!\left( \prod_{j \in J} H_1(id_j)^{y_j} \cdot \prod_{l=1}^{k} \eta_l^{\sum_{j \in J} m_{j,l} y_j + r_l h(\lambda_l)}, g_2 \right) \\
&= e\!\left( \prod_{j \in J} H_1(id_j)^{y_j} \cdot \prod_{l=1}^{k} \eta_l^{\mu_l}, g_2 \right).
\end{aligned}
$$

Theorem 7: For an untrusted cloud, it is computationally infeasible to generate an invalid auditing proof that can pass the verification under Oruta.

Proof: As proved in Theorem 3, for an untrusted cloud, if the co-CDH problem in $G_1$ and $G_2$ is hard, it is computationally infeasible to compute a valid ring signature on an invalid block under HARS.

In Oruta, besides generating valid ring signatures on arbitrary blocks, if the untrusted cloud can win Game 1, it can generate an invalid auditing proof for corrupted shared data and successfully pass the verification. We define Game 1 as follows:

Game 1: The TPA sends an auditing message $\{(j, y_j)\}_{j \in J}$ to the cloud; the correct auditing proof should be $\{\boldsymbol{\lambda}, \boldsymbol{\mu}, \boldsymbol{\phi}, \{id_j\}_{j \in J}\}$, which can pass the verification with Equation (6). The untrusted cloud generates an invalid proof as $\{\boldsymbol{\lambda}, \boldsymbol{\mu}', \boldsymbol{\phi}, \{id_j\}_{j \in J}\}$, where $\boldsymbol{\mu}' = (\mu_1', ..., \mu_k')$. Define $\Delta\mu_l = \mu_l' - \mu_l$ for $1 \leq l \leq k$, where at least one element of $\{\Delta\mu_l\}_{1 \leq l \leq k}$ is nonzero. If this invalid proof can still pass the verification, then the untrusted cloud wins. Otherwise, it fails.

Now, we prove that, if the untrusted cloud can win Game 1, we can find a solution to the Discrete Logarithm problem, which contradicts the assumption that the Discrete Logarithm problem is hard in $G_1$. We first assume the untrusted cloud can win Game 1. Then, according to Equation (6), we have

$$e\!\left( \prod_{j \in J} H_1(id_j)^{y_j} \cdot \prod_{l=1}^{k} \eta_l^{\mu_l'}, g_2 \right) = \left( \prod_{i=1}^{d} e(\phi_i, w_i) \right) \cdot e\!\left( \prod_{l=1}^{k} \lambda_l^{h(\lambda_l)}, g_2 \right).$$



Because $\{\boldsymbol{\lambda}, \boldsymbol{\mu}, \boldsymbol{\phi}, \{id_j\}_{j \in J}\}$ is a correct auditing proof, we also have

$$e\!\left( \prod_{j \in J} H_1(id_j)^{y_j} \cdot \prod_{l=1}^{k} \eta_l^{\mu_l}, g_2 \right) = \left( \prod_{i=1}^{d} e(\phi_i, w_i) \right) \cdot e\!\left( \prod_{l=1}^{k} \lambda_l^{h(\lambda_l)}, g_2 \right).$$

Then, we can learn that

$$\prod_{l=1}^{k} \eta_l^{\mu_l'} = \prod_{l=1}^{k} \eta_l^{\mu_l}, \qquad \prod_{l=1}^{k} \eta_l^{\Delta\mu_l} = 1.$$

For two elements $g, h \in G_1$, there exists $x \in \mathbb{Z}_p$ such that $g = h^x$, because $G_1$ is a cyclic group. Without loss of generality, given $g, h \in G_1$, each $\eta_l$ can be randomly and correctly generated by computing $\eta_l = g^{\xi_l} h^{\gamma_l} \in G_1$, where $\xi_l$ and $\gamma_l$ are random values of $\mathbb{Z}_p$. Then, we have

$$1 = \prod_{l=1}^{k} \eta_l^{\Delta\mu_l} = \prod_{l=1}^{k} (g^{\xi_l} h^{\gamma_l})^{\Delta\mu_l} = g^{\sum_{l=1}^{k} \xi_l \Delta\mu_l} \cdot h^{\sum_{l=1}^{k} \gamma_l \Delta\mu_l}.$$

Clearly, we can find a solution to the Discrete Logarithm problem. More specifically, given $g, h = g^x \in G_1$, we can compute

$$h = g^{-\frac{\sum_{l=1}^{k} \xi_l \Delta\mu_l}{\sum_{l=1}^{k} \gamma_l \Delta\mu_l}} = g^{x},$$

unless the denominator $\sum_{l=1}^{k} \gamma_l \Delta\mu_l$ is zero. However, as we defined in Game 1, at least one element of $\{\Delta\mu_l\}_{1 \leq l \leq k}$ is nonzero, and $\gamma_l$ is a random element of $\mathbb{Z}_p$; therefore, the denominator is zero only with probability $1/p$, which is negligible. It means that, if the untrusted cloud wins Game 1, we can find a solution to the Discrete Logarithm problem with a probability of $1 - 1/p$, which contradicts the assumption that the Discrete Logarithm problem is hard in $G_1$.

Therefore, for an untrusted cloud, it is computationally infeasible to win Game 1 and generate an invalid proof that can pass the verification.

Now, we show that the TPA is able to audit the integrity of shared data, but the identity of the signer on each block in shared data is not disclosed to the TPA.

Theorem 8: During an auditing task, the probability for the TPA to distinguish the identities of all the signers on the $c$ selected blocks in shared data is at most $1/d^c$.

Proof: By Theorem 5, for any algorithm $\mathcal{A}$, the probability of revealing the signer on one block in shared data is at most $1/d$. Because the $c$ selected blocks in an auditing task are signed independently, the total probability that the TPA can distinguish all the signers' identities on the $c$ selected blocks in shared data is at most $1/d^c$.

Let us reconsider the example in Section 1. With Oruta, the TPA knows each block in shared data is signed by either Alice or Bob, because it needs both users' public keys to verify the correctness of shared data. However, it cannot distinguish who the signer is on any single block (as shown in Fig. 9). Therefore, this third party cannot obtain private and sensitive information, such as who signs the most blocks in shared data or which block is frequently modified by different group members.

Fig. 9. Alice and Bob share a file in the cloud, and the TPA audits the integrity of shared data with Oruta. (Figure: three auditing tasks over the same blocks; every block appears only as signed by some user in the group, so the signer of the 8th block cannot be distinguished.)

Following a similar theorem in [3], we show that our scheme is also able to support data privacy.

Theorem 9: Given an auditing proof $\{\boldsymbol{\lambda}, \boldsymbol{\mu}, \boldsymbol{\phi}, \{id_j\}_{j \in J}\}$, it is computationally infeasible for the TPA to reveal any private data in shared data under Oruta.

Proof: If the combined element $\sum_{j \in J} y_j m_{j,l}$, which is a linear combination of elements in blocks, were directly sent to the TPA, the TPA could learn the content of data by solving linear equations after collecting a sufficient number of linear combinations. To preserve private data from the TPA, the combined element is computed with random masking as $\mu_l = \sum_{j \in J} y_j m_{j,l} + r_l h(\lambda_l)$. In order to still solve linear equations, the TPA must know the value of $r_l$. However, given $\eta_l \in G_1$ and $\lambda_l = \eta_l^{r_l} \in G_1$, computing $r_l$ is as hard as solving the Discrete Logarithm problem in $G_1$, which is computationally infeasible. Therefore, given $\boldsymbol{\lambda}$ and $\boldsymbol{\mu}$, the TPA cannot directly obtain any linear combination of elements in blocks, and cannot further reveal any private data in shared data $M$ by solving linear equations.

5.6 Batch Auditing

With the usage of public auditing in the cloud, the TPA may receive a large number of auditing requests from different users in a very short time. Unfortunately, allowing the TPA to verify the integrity of shared data for these users in several separate auditing tasks would be very inefficient. Therefore, with the properties of bilinear maps, we further extend Oruta to support batch auditing, which can improve the efficiency of verification on multiple auditing tasks.

More concretely, we assume there are $B$ auditing tasks that need to be performed, the shared data in all the $B$ auditing tasks are denoted as $M_1, ..., M_B$, and the number of users sharing data $M_b$ is denoted as $d_b$, where $1 \leq b \leq B$. To efficiently audit these shared data for different users in a single auditing task, the TPA sends an auditing message $\{(j, y_j)\}_{j \in J}$ to the cloud server. After receiving the auditing message, the cloud server generates an auditing proof $\{\boldsymbol{\lambda}_b, \boldsymbol{\mu}_b, \boldsymbol{\phi}_b, \{id_{b,j}\}_{j \in J}\}$ for each shared data $M_b$ as we presented in ProofGen, where $1 \leq b \leq B$, $1 \leq l \leq k$,



$1 \leq i \leq d_b$, and

$$\lambda_{b,l} = \eta_{b,l}^{r_{b,l}}, \qquad \mu_{b,l} = \sum_{j \in J} y_j m_{b,j,l} + r_{b,l}\, h(\lambda_{b,l}), \qquad \phi_{b,i} = \prod_{j \in J} \sigma_{b,j,i}^{y_j}.$$

Here $id_{b,j}$ is described as $id_{b,j} = \{f_b, v_j, r_j\}$, where $f_b$ is the identifier of shared data $M_b$, e.g., the name of shared data $M_b$. Clearly, if two blocks are in the same shared data, these two blocks have the same identifier of shared data. As before, when a user modifies a single block in shared data $M_b$, the identifiers of other blocks in shared data $M_b$ are not changed.

After the computation, the cloud server sends all the $B$ auditing proofs together to the TPA. Finally, the TPA verifies the correctness of these $B$ proofs simultaneously by checking the following equation with all the $\sum_{b=1}^{B} d_b$ users' public keys:

$$e\!\left( \prod_{b=1}^{B} \left( \prod_{j \in J} H_1(id_{b,j})^{y_j} \cdot \prod_{l=1}^{k} \eta_{b,l}^{\mu_{b,l}} \right), \; g_2 \right) \stackrel{?}{=} \left( \prod_{b=1}^{B} \prod_{i=1}^{d_b} e(\phi_{b,i}, w_{b,i}) \right) \cdot e\!\left( \prod_{b=1}^{B} \prod_{l=1}^{k} \lambda_{b,l}^{h(\lambda_{b,l})}, \; g_2 \right), \quad (8)$$

where $pk_{b,i} = w_{b,i}$. If the above verification equation holds, then the TPA believes that the integrity of all the $B$ shared data is correct. Otherwise, at least one of the shared data is corrupted.

Based on the correctness of Equation (6), the correctness of batch auditing can be presented as follows:

$$
\begin{aligned}
& \left( \prod_{b=1}^{B} \prod_{i=1}^{d_b} e(\phi_{b,i}, w_{b,i}) \right) \cdot e\!\left( \prod_{b=1}^{B} \prod_{l=1}^{k} \lambda_{b,l}^{h(\lambda_{b,l})}, g_2 \right) \\
&= \left( \prod_{b=1}^{B} \prod_{i=1}^{d_b} e(\phi_{b,i}, w_{b,i}) \right) \cdot \prod_{b=1}^{B} e\!\left( \prod_{l=1}^{k} \lambda_{b,l}^{h(\lambda_{b,l})}, g_2 \right) \\
&= \prod_{b=1}^{B} \left( \left( \prod_{i=1}^{d_b} e(\phi_{b,i}, w_{b,i}) \right) \cdot e\!\left( \prod_{l=1}^{k} \lambda_{b,l}^{h(\lambda_{b,l})}, g_2 \right) \right) \\
&= \prod_{b=1}^{B} e\!\left( \prod_{j \in J} H_1(id_{b,j})^{y_j} \cdot \prod_{l=1}^{k} \eta_{b,l}^{\mu_{b,l}}, g_2 \right) \\
&= e\!\left( \prod_{b=1}^{B} \left( \prod_{j \in J} H_1(id_{b,j})^{y_j} \cdot \prod_{l=1}^{k} \eta_{b,l}^{\mu_{b,l}} \right), g_2 \right).
\end{aligned}
$$

If all the $B$ auditing requests on the $B$ shared data are from the same group, the TPA can further improve the efficiency of batch auditing by verifying

$$e\!\left( \prod_{b=1}^{B} \left( \prod_{j \in J} H_1(id_{b,j})^{y_j} \cdot \prod_{l=1}^{k} \eta_{b,l}^{\mu_{b,l}} \right), \; g_2 \right) \stackrel{?}{=} \left( \prod_{i=1}^{d} e\!\left( \prod_{b=1}^{B} \phi_{b,i}, \; w_i \right) \right) \cdot e\!\left( \prod_{b=1}^{B} \prod_{l=1}^{k} \lambda_{b,l}^{h(\lambda_{b,l})}, \; g_2 \right). \quad (9)$$

Note that batch auditing will fail if at least one incorrect auditing proof exists among all the $B$ auditing proofs. To allow most of the auditing proofs to still pass the verification when there is only a small number of incorrect auditing proofs, we can utilize binary search [3] during batch auditing, as sketched below. More specifically, once the batch auditing of the $B$ auditing proofs fails, the TPA divides the set of all the $B$ auditing proofs into two subsets, each containing $B/2$ auditing proofs, and re-checks the correctness of the auditing proofs in each subset using batch auditing. If the verification result of one subset is correct, then all the auditing proofs in this subset are correct. Otherwise, this subset is further divided into two sub-subsets, and the TPA re-checks the correctness of the auditing proofs in each sub-subset with batch auditing until all the incorrect auditing proofs are found. Clearly, when the number of incorrect auditing proofs increases, the efficiency of batch auditing is reduced. Experimental results in Section 6 show that, when less than 12% of the auditing proofs among all the $B$ auditing proofs are incorrect, batch auditing is still more efficient than verifying these auditing proofs one by one.
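The divide-and-conquer step can be written as a short recursion. The Python sketch below is ours: batch_verify stands in for one run of Equation (8) over a set of proofs and is passed in as a callable, since the pairing computation itself is outside the scope of this illustration.

def find_incorrect_proofs(proofs, batch_verify):
    """Recursively isolate incorrect auditing proofs.
    `proofs` is a list of auditing proofs; `batch_verify(subset)` returns True
    iff every proof in `subset` passes one batch verification (Equation (8))."""
    if batch_verify(proofs):
        return []                      # the whole subset is correct
    if len(proofs) == 1:
        return list(proofs)            # a single failing proof has been isolated
    mid = len(proofs) // 2
    return (find_incorrect_proofs(proofs[:mid], batch_verify) +
            find_incorrect_proofs(proofs[mid:], batch_verify))

# Toy usage: proofs are labelled good/bad, and batch_verify just checks the labels.
proofs = ["good"] * 6 + ["bad"] + ["good"] * 9   # B = 16 proofs, one incorrect
print(find_incorrect_proofs(proofs, lambda s: "bad" not in s))  # ['bad']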

6 PERFORMANCE

In this section, we first analyze the computation and communication costs of Oruta, and then evaluate the performance of Oruta in experiments.

6.1 Computation Cost

The main cryptographic operations used in Oruta include multiplications, exponentiations, pairings and hashing operations. For simplicity, we omit additions in the following discussion, because they are much easier to compute than the four types of operations mentioned above.

During auditing, the TPA first generates some random values to construct the auditing message, which only introduces a small computation cost. Then, after receiving the auditing message, the cloud server needs to compute a proof {λ, μ, φ, {id_j}_{j∈J}}. The computation cost of calculating a proof is about (k + dc)ExpG1 + dcMulG1 + ckMulZp + kHashZp, where ExpG1 denotes the cost of computing one exponentiation in G1, MulG1 denotes the cost of computing one multiplication in G1, and MulZp and HashZp respectively denote the cost of computing one multiplication and one hashing operation in Zp. To check the correctness of the proof {λ, μ, φ, {id_j}_{j∈J}}, the TPA verifies it based on Equation (6). The total cost of verifying the proof is (2k + c)ExpG1 + (2k + c)MulG1 + dMulGT + cHashG1 + (d + 2)PairG1,G2, where PairG1,G2 denotes the cost of computing one pairing operation on G1 and G2.
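As an illustration of how these operation counts scale with the auditing parameters, the following Python sketch simply tabulates the counts above for given c, k, and d. It is a bookkeeping aid, not an implementation of Oruta; the example parameter values are the ones used later in Section 6.3.

```python
# Operation counts for one auditing task, transcribed from the cost
# expressions above (a bookkeeping sketch, not a cryptographic implementation).

def proof_generation_cost(c, k, d):
    """Operations the cloud server performs to build {lambda, mu, phi}."""
    return {
        "Exp_G1": k + d * c,     # lambda_l and phi_i exponentiations
        "Mul_G1": d * c,         # multiplications aggregating phi_i
        "Mul_Zp": c * k,         # multiplications when computing mu_l
        "Hash_Zp": k,            # h(lambda_l) evaluations
    }

def proof_verification_cost(c, k, d):
    """Operations the TPA performs to check Equation (6)."""
    return {
        "Exp_G1": 2 * k + c,
        "Mul_G1": 2 * k + c,
        "Mul_GT": d,
        "Hash_G1": c,            # H(id_j) evaluations
        "Pairing": d + 2,
    }

if __name__ == "__main__":
    # Parameters used in the experiments: c = 460 selected blocks,
    # k = 100 elements per block, d = 10 users in the group.
    print(proof_generation_cost(460, 100, 10))
    print(proof_verification_cost(460, 100, 10))
```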

6.2 Communication Cost

The communication cost of Oruta is mainly introduced by two factors: the auditing message and the auditing proof.


[Fig. 10. Impact of d (the size of the group) on signature generation time (ms), for k = 100 and k = 200.]

[Fig. 11. Impact of k (the number of elements per block) on signature generation time (ms), for d = 10 and d = 20.]

[Fig. 12. Impact of d on auditing time (s), where k = 100, for c = 460 and c = 300.]

[Fig. 13. Impact of k on auditing time (s), where d = 10, for c = 460 and c = 300.]

[Fig. 14. Impact of d on communication cost (KB), where k = 100, for c = 460 and c = 300.]

[Fig. 15. Impact of k on communication cost (KB), where d = 10, for c = 460 and c = 300.]

[Fig. 16. Impact of B (the number of auditing tasks) on the efficiency of batch auditing (average auditing time, ms), where k = 100 and d = 10; curves: Separate Auditing, Batch Auditing-D, Batch Auditing-S.]

[Fig. 17. Impact of A (the number of incorrect proofs) on the efficiency of batch auditing (average auditing time, ms), where B = 128; curves: Separate Auditing, Batch Auditing-S.]

[Fig. 18. Impact of A on the efficiency of batch auditing (average auditing time, ms), where B = 128; curves: Separate Auditing, Batch Auditing-D.]

For each auditing message {(j, y_j)}_{j∈J}, the communication cost is c(|q| + |n|) bits, where |q| is the length of an element of Z_q and |n| is the length of an index. Each auditing proof {λ, μ, φ, {id_j}_{j∈J}} contains (k + d) elements of G1, k elements of Zp and c elements of Zq; therefore, the communication cost of one auditing proof is (2k + d)|p| + c|q| bits.
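The two size formulas can be checked with a few lines of Python. The sketch below uses the parameter values chosen in Section 6.3 (|p| = 160, |q| = 80, |n| = 20 bits) and reproduces the communication costs listed in Table 2, treating 1 KB as 1,000 bytes.

```python
# Sketch of Oruta's communication-cost formulas (sizes in bits), using the
# experimental parameters from Section 6.3. Purely arithmetic illustration.

def auditing_message_bits(c, q_bits=80, n_bits=20):
    # The auditing message {(j, y_j)} contains c indices and c elements of Z_q.
    return c * (q_bits + n_bits)

def auditing_proof_bits(c, k, d, p_bits=160, q_bits=80):
    # The proof contains (k + d) elements of G_1, k elements of Z_p,
    # and c elements of Z_q, i.e. (2k + d)|p| + c|q| bits in total.
    return (2 * k + d) * p_bits + c * q_bits

if __name__ == "__main__":
    k, d = 100, 10
    for c in (460, 300):
        total_bits = auditing_message_bits(c) + auditing_proof_bits(c, k, d)
        print(c, total_bits / 8 / 1000, "KB")   # 14.55 KB and 10.95 KB
```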

6.3 Experimental Results

We now evaluate the efficiency of Oruta in experiments. To implement the cryptographic operations mentioned above, we utilize the GNU Multiple Precision Arithmetic (GMP, http://gmplib.org/) library and the Pairing-Based Cryptography (PBC, http://crypto.stanford.edu/pbc/) library. All the following experiments are based on C, and each is run over 1,000 times on a 2.26 GHz Linux system.

Because Oruta needs more exponentiations than pairing operations during the process of auditing, the elliptic curve we choose in our experiments is an MNT curve with a base field size of 159 bits, which has better performance than other curves when computing exponentiations. We choose |p| = 160 bits and |q| = 80 bits. We assume the total number of blocks in shared data is n = 1,000,000 and |n| = 20 bits. The size of shared data is 2 GB. To keep the detection probability greater than 99%, we set the number of selected blocks in an auditing task to c = 460 [2]. If only 300 blocks are selected, the detection probability is greater than 95%. We also assume the size of the group is d ∈ [2, 20] in the following experiments. Certainly, if a larger group size is used, the total computation cost will increase due to the increasing number of exponentiations and pairing operations.
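The choice of c = 460 and c = 300 follows the sampling analysis of [2]: if a fraction f of the blocks is corrupted, the probability that at least one corrupted block falls among c uniformly sampled blocks is approximately 1 - (1 - f)^c. The short Python check below assumes the commonly used target of a 1% corruption rate (an assumption of this sketch, following [2]) and shows that c = 460 and c = 300 give detection probabilities above 99% and 95%, respectively.

```python
# Detection probability of sampling-based auditing (following the analysis
# in [2]): with a fraction f of corrupted blocks, sampling c random blocks
# misses all of them with probability about (1 - f)**c.

def detection_probability(c, corrupted_fraction=0.01):
    return 1.0 - (1.0 - corrupted_fraction) ** c

if __name__ == "__main__":
    for c in (460, 300):
        print(c, round(detection_probability(c), 4))
    # c = 460 -> about 0.9902 (> 99%); c = 300 -> about 0.9510 (> 95%)
```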

6.3.1 Performance of Signature Generation

According to Section 5, the generation time of a ring signature on a block is determined by the number of users in the group and the number of elements in each


block. As illustrated in Fig. 10 and Fig. 11, when k is fixed, the generation time of a ring signature increases linearly with the size of the group; when d is fixed, the generation time of a ring signature increases linearly with the number of elements in each block. Specifically, when d = 10 and k = 100, a user in the group requires about 37 milliseconds to compute a ring signature on a block in shared data.

6.3.2 Performance of Auditing

Based on our preceding analysis, the auditing performance of Oruta under different detection probabilities is illustrated in Fig. 12-15 and Table 2. As shown in Fig. 12, the auditing time increases linearly with the size of the group. When c = 300, if there are two users sharing data in the cloud, the auditing time is only about 0.5 seconds; when the number of group members increases to 20, it takes about 2.5 seconds to finish the same auditing task. The communication cost of an auditing task under different parameters is presented in Fig. 14 and Fig. 15. Compared to the size of the entire shared data, the communication cost that the TPA consumes in an auditing task is very small. It is clear from Table 2 that, to maintain a higher detection probability, the TPA needs to consume more computation and communication overhead to finish the auditing task. Specifically, when c = 300, it takes the TPA 1.32 seconds to audit the correctness of shared data, where the size of shared data is 2 GB; when c = 460, the TPA needs 1.94 seconds to verify the integrity of the same shared data.

TABLE 2
Performance of Auditing

System Parameters     k = 100, d = 10
Storage Usage         2 GB + 200 MB (data + signatures)
Selected Blocks c     460         300
Communication Cost    14.55 KB    10.95 KB
Auditing Time         1.94 s      1.32 s

6.3.3 Performance of Batch Auditing

As we discussed in Section 5, when there are multiple auditing tasks, the TPA can improve the efficiency of verification by performing batch auditing. In the following experiments, we choose c = 300, k = 100 and d = 10. We can see from Fig. 16 that, compared to verifying these auditing tasks one by one, if the B auditing tasks are from different groups, batch auditing can save 2.1% of the auditing time per auditing task on average; if the B auditing tasks are from the same group, batch auditing can save 12.6% of the average auditing time per auditing task.

Now we evaluate the performance of batch auditing when incorrect auditing proofs exist among the B auditing proofs. As we mentioned in Section 5, we can use binary search in batch auditing to distinguish the incorrect proofs from the B auditing proofs. However, an increasing number of incorrect auditing proofs reduces the efficiency of batch auditing. It is therefore important to find the maximal number of incorrect auditing proofs that can exist among the B auditing proofs such that batch auditing is still more efficient than separate auditing.

In this experiment, we assume the total number of auditing proofs in batch auditing is B = 128 (because we leverage binary search, it is better to set B as a power of 2), the number of elements in each block is k = 100, and the number of users in the group is d = 10. Let A denote the number of incorrect auditing proofs. In addition, we assume that the worst case of the binary search is always required to detect the incorrect auditing proofs. According to Equations (8) and (9), the extra computation cost of binary search is mainly introduced by extra pairing operations. As shown in Fig. 17, if all 128 auditing proofs are from the same group, then as long as the number of incorrect auditing proofs is less than 16 (12% of all the auditing proofs), batch auditing is still more efficient than separate auditing. Similarly, as shown in Fig. 18, if all the auditing proofs are from different groups, then when the number of incorrect auditing proofs is more than 16, batch auditing is less efficient than verifying these auditing proofs separately.

7 RELATED WORK

Provable data possession (PDP), first proposed by Ateniese et al. [2], allows a verifier to check the correctness of a client's data stored at an untrusted server. By utilizing RSA-based homomorphic authenticators and sampling strategies, the verifier is able to publicly audit the integrity of data without retrieving the entire data, which is referred to as public verifiability or public auditing. Unfortunately, their mechanism is only suitable for auditing the integrity of static data. Juels and Kaliski [13] defined another similar model called proofs of retrievability (POR), which is also able to check the correctness of data on an untrusted server. The original file is augmented with a set of randomly valued check blocks called sentinels. The verifier challenges the untrusted server by specifying the positions of a collection of sentinels and asking the untrusted server to return the associated sentinel values. Shacham and Waters [6] designed two improved POR schemes. The first scheme is built from BLS signatures, and the second one is based on pseudo-random functions.

To support dynamic operations on data, Ateniese et al. [14] presented an efficient PDP mechanism based on symmetric keys. This mechanism can support update and delete operations on data; however, insert operations are not available in this mechanism. Because it exploits symmetric keys to verify the integrity of data, it is not publicly verifiable and only provides a user with a limited number of verification requests. Wang et al. utilized Merkle Hash Trees and BLS signatures [9] to support fully dynamic operations in a public auditing mechanism. Erway et al. [15] introduced dynamic provable data possession (DPDP) by using authenticated dictionaries,


which are based on rank information. Zhu et al. exploited a fragment structure to reduce the storage of signatures in their public auditing mechanism. In addition, they also used index hash tables to provide dynamic operations for users. The public auditing mechanism proposed by Wang et al. [3] is able to preserve users' confidential data from the TPA by using random masking. In addition, to operate multiple auditing tasks from different users efficiently, they extended their mechanism to enable batch auditing by leveraging aggregate signatures [5].

Wang et al. [16] leveraged homomorphic tokens to ensure the correctness of erasure-code-based data distributed on multiple servers. This mechanism is able not only to support dynamic operations on data, but also to identify misbehaving servers. To minimize communication overhead in the phase of data repair, Chen et al. [17] also introduced a mechanism for auditing the correctness of data in the multi-server scenario, where the data are encoded by network coding instead of erasure codes. More recently, Cao et al. [18] constructed an LT-codes-based secure and reliable cloud storage mechanism. Compared to previous work [16], [17], this mechanism can avoid high decoding computation cost for data users and save computation resources for online data owners during data repair.

To prevent attacks on remote data storage systems with deduplication, Halevi et al. [19] introduced the notion of proofs of ownership (POWs), which allows a client to prove to a server that she actually holds a data file, rather than just some hash values of the data file. Zheng et al. [20] further discussed how POW and PDP can co-exist under the same framework.

Recently, Franz et al. [21] proposed an oblivious outsourced storage scheme based on Oblivious RAM techniques, which is able to hide users' access patterns on outsourced data from an untrusted cloud. Di Vimercati et al. [22] utilize a shuffle index structure to protect users' access patterns on outsourced data.

8 CONCLUSION

In this paper, we propose Oruta, the first privacy-preserving public auditing mechanism for shared data in the cloud. We utilize ring signatures to construct homomorphic authenticators, so that the TPA is able to audit the integrity of shared data, yet cannot distinguish who the signer is on each block, thereby achieving identity privacy. To improve the efficiency of verification for multiple auditing tasks, we further extend our mechanism to support batch auditing. An interesting problem for our future work is how to efficiently audit the integrity of shared data with dynamic groups while still preserving the identity of the signer on each block from the third party auditor.

REFERENCES

[1] M. Armbrust, A. Fox, R. Griffith, A. D. Joseph, R. H. Katz, A. Konwinski, G. Lee, D. A. Patterson, A. Rabkin, I. Stoica, and M. Zaharia, "A View of Cloud Computing," Communications of the ACM, vol. 53, no. 4, pp. 50-58, April 2010.

[2] G. Ateniese, R. Burns, R. Curtmola, J. Herring, L. Kissner, Z. Peterson, and D. Song, "Provable Data Possession at Untrusted Stores," in Proc. ACM Conference on Computer and Communications Security (CCS), 2007, pp. 598-610.

[3] C. Wang, Q. Wang, K. Ren, and W. Lou, "Privacy-Preserving Public Auditing for Data Storage Security in Cloud Computing," in Proc. IEEE International Conference on Computer Communications (INFOCOM), 2010, pp. 525-533.

[4] R. L. Rivest, A. Shamir, and Y. Tauman, "How to Leak a Secret," in Proc. International Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT). Springer-Verlag, 2001, pp. 552-565.

[5] D. Boneh, C. Gentry, B. Lynn, and H. Shacham, "Aggregate and Verifiably Encrypted Signatures from Bilinear Maps," in Proc. International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT). Springer-Verlag, 2003, pp. 416-432.

[6] H. Shacham and B. Waters, "Compact Proofs of Retrievability," in Proc. International Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT). Springer-Verlag, 2008, pp. 90-107.

[7] Y. Zhu, H. Wang, Z. Hu, G.-J. Ahn, H. Hu, and S. S. Yau, "Dynamic Audit Services for Integrity Verification of Outsourced Storage in Clouds," in Proc. ACM Symposium on Applied Computing (SAC), 2011, pp. 1550-1557.

[8] S. Yu, C. Wang, K. Ren, and W. Lou, "Achieving Secure, Scalable, and Fine-grained Data Access Control in Cloud Computing," in Proc. IEEE International Conference on Computer Communications (INFOCOM), 2010, pp. 534-542.

[9] D. Boneh, B. Lynn, and H. Shacham, "Short Signatures from the Weil Pairing," in Proc. International Conference on the Theory and Application of Cryptology and Information Security (ASIACRYPT). Springer-Verlag, 2001, pp. 514-532.

[10] D. Boneh and D. M. Freeman, "Homomorphic Signatures for Polynomial Functions," in Proc. International Conference on the Theory and Applications of Cryptographic Techniques (EUROCRYPT). Springer-Verlag, 2011, pp. 149-168.

[11] A. L. Ferrara, M. Green, S. Hohenberger, and M. Ø. Pedersen, "Practical Short Signature Batch Verification," in Proc. RSA Conference, the Cryptographers' Track (CT-RSA). Springer-Verlag, 2009, pp. 309-324.

[12] V. Goyal, O. Pandey, A. Sahai, and B. Waters, "Attribute-Based Encryption for Fine-Grained Access Control of Encrypted Data," in Proc. ACM Conference on Computer and Communications Security (CCS), 2006, pp. 89-98.

[13] A. Juels and B. S. Kaliski, "PORs: Proofs of Retrievability for Large Files," in Proc. ACM Conference on Computer and Communications Security (CCS), 2007, pp. 584-597.

[14] G. Ateniese, R. D. Pietro, L. V. Mancini, and G. Tsudik, "Scalable and Efficient Provable Data Possession," in Proc. International Conference on Security and Privacy in Communication Networks (SecureComm), 2008.

[15] C. Erway, A. Kupcu, C. Papamanthou, and R. Tamassia, "Dynamic Provable Data Possession," in Proc. ACM Conference on Computer and Communications Security (CCS), 2009, pp. 213-222.

[16] C. Wang, Q. Wang, K. Ren, and W. Lou, "Ensuring Data Storage Security in Cloud Computing," in Proc. IEEE/ACM International Workshop on Quality of Service (IWQoS), 2009, pp. 1-9.

[17] B. Chen, R. Curtmola, G. Ateniese, and R. Burns, "Remote Data Checking for Network Coding-based Distributed Storage Systems," in Proc. ACM Cloud Computing Security Workshop (CCSW), 2010, pp. 31-42.

[18] N. Cao, S. Yu, Z. Yang, W. Lou, and Y. T. Hou, "LT Codes-based Secure and Reliable Cloud Storage Service," in Proc. IEEE International Conference on Computer Communications (INFOCOM), 2012.

[19] S. Halevi, D. Harnik, B. Pinkas, and A. Shulman-Peleg, "Proofs of Ownership in Remote Storage Systems," in Proc. ACM Conference on Computer and Communications Security (CCS), 2011, pp. 491-500.

[20] Q. Zheng and S. Xu, "Secure and Efficient Proof of Storage with Deduplication," in Proc. ACM Conference on Data and Application Security and Privacy (CODASPY), 2012.

[21] M. Franz, P. Williams, B. Carbunar, S. Katzenbeisser, and R. Sion, "Oblivious Outsourced Storage with Delegation," in Proc. Financial Cryptography and Data Security Conference (FC), 2011, pp. 127-140.

[22] S. D. C. di Vimercati, S. Foresti, S. Paraboschi, G. Pelosi, and P. Samarati, "Efficient and Private Access to Outsourced Data," in Proc. IEEE International Conference on Distributed Computing Systems (ICDCS), 2011, pp. 710-719.
