EncKV: An Encrypted Key-value Store with Rich Queries

Xingliang Yuan†‡, Yu Guo†, Xinyu Wang†‡, Cong Wang†‡, Baochun Li§, Xiaohua Jia†

† Department of Computer Science, City University of Hong Kong, China
‡ City University of Hong Kong Shenzhen Research Institute, China
§ Department of Electrical and Computer Engineering, University of Toronto, Canada

{xl.y, congwang, csjia}@cityu.edu.hk, {y.guo, xy.w}@my.cityu.edu.hk, [email protected]

ABSTRACT

Distributed data stores have been rapidly evolving to serve the needs of large-scale applications such as online gaming and real-time targeting. In particular, distributed key-value stores have been widely adopted due to their superior performance. However, these systems do not guarantee to provide strong protection of data confidentiality, and as a result fall short of addressing serious privacy concerns raised from massive data breaches.

In this paper, we introduce EncKV, an encrypted key-value store with secure rich query support. First, EncKV stores encrypted data records with multiple secondary attributes in the form of encrypted key-value pairs. Second, it leverages the latest practical primitives for searching over encrypted data, i.e., searchable symmetric encryption and order-revealing encryption, and provides encrypted indexes with guaranteed security to support exact-match and range-match queries via secondary attributes of data records. Third, it carefully integrates these indexes into a distributed index framework to facilitate secure query processing in parallel. To mitigate recent inference attacks on encrypted database systems, EncKV protects the order information during range queries, and presents an interactive batch query mechanism to further hide the associations across data values on different attributes. We implement an EncKV prototype on a Redis cluster, and conduct an extensive set of performance evaluations on the Amazon EC2 public cloud platform. Our results show that EncKV effectively preserves the efficiency and scalability of plaintext distributed key-value stores.

Keywords

Encrypted Key-value Store; Searchable Encryption; Order-revealing Encryption

ASIA CCS '17, April 02-06, 2017, Abu Dhabi, United Arab Emirates. © 2017 ACM. ISBN 978-1-4503-4944-4/17/04. DOI: http://dx.doi.org/10.1145/3052973.3052977

1. INTRODUCTION

In the last decade, a new group of distributed storage systems, also known as NoSQL data stores, has been rapidly evolving to handle data in large-scale applications, such as online gaming and product recommendation [18,32]. Among others, key-value (KV) stores are considered one of the most popular types of distributed data stores, due to their strength in terms of both performance and scalability. Exemplary systems include Redis [29], DynamoDB [9], and RAMCloud [22]. Recent advances in KV stores have further leveraged secondary indexes to enrich their features, i.e., supporting rich queries via secondary attributes other than the primary key [11,15].

However, privacy concerns are becoming increasingly more serious with large volumes of data stored in distributed KV stores, from the public clouds to private data warehouses, with recent incidents of massive data breaches [13]. Indeed, these KV stores do not provide strong protections of data confidentiality. Conventional mechanisms resort to access control that specifies the access scope at the user or group levels [7], or transparent server-side encryption that asks the servers (not the data owners) to encrypt data [21]. These mechanisms are not able to fully defend against serious threats of stealing data.

Recent work that seeks to protect the data while preserving the query functionality falls into two categories. In the first category, generic cryptographic primitives have been designed to enable specific query functions over encrypted data, such as searchable encryption [4, 8, 14] for keyword search, and order-revealing encryption¹ [1, 10, 19] for range search. In the second category, encrypted database systems that utilize various primitives to support a wide range of query functions have been implemented. Representative systems include CryptDB [27], BlindSeer [24], Arx [25], and Seabed [23]. Depending on the primitives adopted, they have different trade-offs on security, functionality, and efficiency. Unfortunately, neither category is specifically tailored for distributed KV stores.

To bridge this gap, we start from a very recent encrypted KV store design [33] that preserves advantages of modern KV stores while initiating an encrypted local index framework for efficient queries via secondary attributes of data. The core idea of this framework is to co-locate the encrypted data records and the corresponding encrypted indexes in the same nodes, so as to avoid inter-node interaction during query protocols and facilitate parallel query processing. Unfortunately, this initial framework serves only as a blueprint, as it did not present how to securely support different types of queries over distributed encrypted data records.

¹ Early results [1] that only support numeric comparisons are also called order-preserving encryption.

Our objectives in this paper are to address the following two challenges. First, appropriate cryptographic primitives need to be carefully chosen to balance security and practicality, and further customized and integrated into the aforementioned index framework. The selected primitives should be both sufficiently efficient for deployment scenarios and provably secure with guaranteed strength. Meanwhile, their integration should not diminish the advantages of the local index framework.

Second, leakage in our system should be minimized. Driven by recent inference attacks [16, 20] and leakage-abuse attacks [3, 12], we follow the latest primitives with the “best-possible” security notion in the literature. Further, we note that implementing rich queries in real systems might reveal auxiliary information which is not considered in generic primitives. For example, the associations between data values on different attributes are sometimes exposed unnecessarily, and have been shown to be exploitable, leading to compromised data confidentiality [10, 20]. Therefore, our proposed query protocols should protect such information.

In this paper, we propose EncKV, an encrypted key-value store with secure rich query support. We first classify common queries of KV stores into two categories, i.e., exact-match queries (keyword search, equality test and counting) and range-match queries (range search and prefix match). To support these two types of queries, EncKV leverages two primitives respectively: searchable symmetric encryption (SSE) [4, 8] and order-revealing encryption (ORE) [6, 19]. For low latency queries, EncKV follows the guideline of the encrypted local index framework given in [33]; that is, the client needs to track the location of each data record when it builds the local encrypted indexes² that index the data records on each node respectively.

For exact-match queries, EncKV carefully integrates Cash et al.’s elegant and efficient SSE scheme [4] into the local index framework, and customizes it to support exact-match queries via encrypted single or multiple secondary attributes of data. As a result, EncKV’s encrypted local indexes hold the security of SSE, and can readily be stored in any KV store back end for easy deployment.

For range-match queries, EncKV utilizes Lewi and Wu’s ORE scheme [19], which achieves the “best-possible” security notion for practical ORE. As with exact-match queries, the chosen ORE scheme is first customized and integrated into EncKV’s index framework. Furthermore, we observe that the order information (i.e., “>” and “<”) does not need to be revealed for range queries to be answered correctly. Accordingly, EncKV enhances the ORE scheme by using randomized orders in ciphertext comparisons, and only allows the server to know whether the query condition is matched.

Finally, EncKV provides an engineered approach to mitigate the inference attacks. Our empirical observation is that a query is commonly conducted in two phases. The first phase processes the corresponding encrypted indexes to find the primary keys (i.e., record IDs) that match a given query condition, and the second phase fetches the associated data values. Giving the server the ability to complete the two phases seamlessly would expose data correlations across different attributes. Therefore, EncKV splits the two phases and introduces an interactive batch query mechanism to reduce the leakage.

² In distributed data stores, “local index” implies that each node stores an index that only indexes its local records [15].

Figure 1: The architecture of EncKV.

EncKV has the following salient features:

• It stores data records with multiple attributes in the form of encrypted key-value pairs, and distributes them to the nodes via a standard data partitioning algorithm.

• It supports rich queries over distributed encrypted data records. The supported queries include keyword search, equality, count, join, range, like, sum, average, group by, max, and min.

• It offers guaranteed security, i.e., SSE’s security notion for exact-match queries, and ORE [19]’s security notion for range-match queries.

• It preserves linear scalability of KV stores with respect to their performance. The query throughput increases linearly with the number of nodes in the cluster. Meanwhile, it enables parallel query processing. The query latency decreases when more nodes are deployed.

The paper is organized as follows. Section 2 presents our system architecture and threat assumptions. Section 3 presents our system design. Our security analysis is conducted in Section 4, and an extensive array of evaluation results is shown in Section 5. Finally, we introduce related work in Section 6, and conclude the paper in Section 7.

2. OVERVIEW

2.1 System Architecture

The system architecture of EncKV is shown in Figure 1, containing two entities, the client and server node. EncKV serves the clients who wish to store their sensitive data records in KV stores. A cluster of server nodes can be leased from the public cloud or be deployed at on-premise data centers. These nodes store encrypted data records and provide secure query functions to the clients. Correspondingly, EncKV implements two separate modules for the client and server nodes. The client module performs data encryption and decryption, encrypted index construction, as well as query token generation. It maintains the client’s master key which is used to derive different private keys for the functions above. A node in the server module handles query requests from the client. It processes query tokens over the encrypted indexes, and utilizes the APIs of the underlying KV stores to put/get encrypted data records.

To insert a data record into EncKV, the client uses its private key to generate encrypted label-value (LV) pair(s)³. If the data is formatted in a rich data model rather than the simple key-value model, it will be randomly mapped into a set of encrypted LV pairs. This treatment allows EncKV to use the standard data partitioning algorithm (i.e., consistent hashing [9]) to distribute encrypted data records evenly across the nodes.

³ We use the term “label” instead of “key” to avoid ambiguity.

For the purpose of building local indexes, EncKV asks the client to maintain a small-sized consistent hashing ring which indicates the label range associated with each node. As a result, the client can directly insert LV pairs to targeted nodes and build the encrypted indexes for them on each node. To submit a secure query via secondary attributes of data, the client first generates a token set from the query condition attribute, and then broadcasts the tokens to each node respectively. After that, each node processes the tokens on its local index, and returns the matched, encrypted record IDs. Finally, the client decrypts the record IDs and generates labels to fetch the encrypted result values.
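The client-side ring described above can be sketched as follows. This is a minimal illustration, not EncKV's implementation: the `HashRing` class, the use of SHA-256 for ring positions, the node names, and the number of virtual points per node are all assumptions made for the example.

```python
import bisect
import hashlib


def _point(data: bytes) -> int:
    """Map bytes to a position on the ring (SHA-256 is an illustrative choice)."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


class HashRing:
    """Minimal consistent hashing ring kept at the client (hypothetical helper)."""

    def __init__(self, nodes, vnodes=64):
        # Each physical node owns several virtual points to even out the load.
        self._ring = sorted(
            (_point(f"{node}#{k}".encode()), node)
            for node in nodes
            for k in range(vnodes)
        )
        self._points = [p for p, _ in self._ring]

    def route(self, label: bytes) -> str:
        """Return the node owning the first ring point at or after the label."""
        idx = bisect.bisect_left(self._points, _point(label)) % len(self._ring)
        return self._ring[idx][1]


ring = HashRing(["node-1", "node-2", "node-3"])
print(ring.route(b"record-0001"))
```

Because the ring only stores a few points per node, the client can decide where a label lives without contacting the cluster, which is what lets it build each node's local index in advance.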

Remarks: (1) In our current implementation, EncKV does not add dummy records, although they can be inserted to mitigate inference attacks and leakage-abuse attacks (see our discussion in Section 4.3). (2) EncKV’s query protocols require two rounds of interaction. The first is to obtain the encrypted record IDs, and the second is to fetch the matched results. This treatment facilitates an immediate security improvement to hide the associations between data values on different attributes (see more details in Section 3.5). (3) EncKV’s index framework needs the client to generate query tokens for all the nodes, and each node produces partial results. Nevertheless, we show that this broadcast will not introduce too much overhead (see our evaluation in Section 5.2). One salient advantage of local indexes is that all nodes can process query tokens in parallel.

2.2 Threat Assumption

Following most of the prior studies on search over encrypted data [8, 19, 25, 27], EncKV considers the case that the client is secure and trusted. It will not expose the keys to server nodes, and the keys are securely stored at the client. EncKV assumes that the attackers will never have access to a client’s private keys, but they can dump all the encrypted indexes and KV pairs from server nodes. They can also monitor the query protocols and learn about the query tokens, accessed index entries, and encrypted results. EncKV does not consider the case where attackers can access the background information about the queries and datasets, e.g., the partial (entire) distribution or the content of queries or records [3,20]. Nevertheless, discussions on how to mitigate those threats are conducted in Section 4.3. Finally, EncKV does not handle the case where malicious attackers modify or delete the indexes and records intentionally, which has been addressed by orthogonal studies like [2].

2.3 Cryptographic Primitives

A symmetric encryption scheme (KGen, Enc, Dec) contains three algorithms: the key generation algorithm KGen takes a security parameter λ to return a secret key k; the encryption algorithm Enc takes a key k and a value v ∈ {0,1}^n to return a ciphertext v* ∈ {0,1}^n; the decryption algorithm Dec takes k and v* to return v.

Figure 2: Construction summarized from [33]

A function family F : {0,1}^λ × {0,1}^m → {0,1}^n is a family of pseudo-random functions if, for all probabilistic polynomial-time distinguishers D,

|Pr[D^{F(k,·)} = 1 | k ←$ {0,1}^λ] − Pr[D^g = 1 | g ←$ Func[m,n]]| < negl(λ),

where negl(λ) is a negligible function in λ.
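As a concrete point of reference, a keyed function of this shape can be sketched with HMAC-SHA256 from the Python standard library. Treating HMAC-SHA256 as the PRF F is a common heuristic and an assumption of this sketch; the paper does not state which instantiation EncKV uses, and the names `prf` and `kgen` are illustrative.

```python
import hashlib
import hmac
import os


def kgen(security_bytes: int = 32) -> bytes:
    """KGen: sample a uniformly random secret key (32 bytes assumed here)."""
    return os.urandom(security_bytes)


def prf(key: bytes, message: bytes) -> bytes:
    """F(k, x): HMAC-SHA256 standing in for the PRF, with n = 256 output bits."""
    return hmac.new(key, message, hashlib.sha256).digest()


k = kgen()
tag = prf(k, b"city")
print(len(tag))  # 32
```

The property the later constructions rely on is determinism under one key (the same input always yields the same tag) combined with outputs that look unrelated under any other key.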

3. THE ENCKV DESIGN

This section presents the designs of EncKV’s encrypted exact-match and range-match indexes in detail, based on which several types of queries are then instantiated. Features such as batch queries and incremental updates are also presented for security and practical considerations.

3.1 The Underlying Encrypted KV Store

EncKV builds on top of an encrypted KV store [33]. This prior design has two features. First, it proposes a secure data partition algorithm that dispatches encrypted data records across distributed nodes, while preserving horizontal scalability. Second, it sketches an encrypted local index framework towards efficient queries via secondary attributes of data in distributed data stores. EncKV’s index designs are carefully integrated into this framework for the practical performance of secure rich queries. Before introducing EncKV in greater detail, we shall first summarize the underlying encrypted KV store proposed in [33].

As an example, Figure 2 illustrates the column-oriented data model in [33], and other data models are supported as well. The core idea of its secure data partition algorithm is to map data records into encrypted label-value pairs. Specifically, each LV pair in EncKV is constructed as ⟨P(k_l, C||R), Enc(k_v, v)⟩, where k_l, k_v are private keys, P is a secure PRF, R is the record ID, C is a column (secondary) attribute, v is a value on C, and Enc is a symmetric key encryption algorithm.

Different from [33], which uses P(k_l, C||R) as the label for partition, EncKV uses the unique record ID R instead to preserve the locality for queries via multiple attributes. As a result, all the encrypted values for a given record are stored at the same node, but they are still fully scrambled to protect the auxiliary information such as the number of columns and the associations between the underlying values.
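The mapping from a record to encrypted label-value pairs can be sketched as below. Several things here are assumptions made for illustration: HMAC-SHA256 stands in for the PRF P, the `enc`/`dec` pair is a toy stream cipher built from HMAC rather than the cipher EncKV uses (a real deployment would use a vetted scheme such as AES in an authenticated mode), and the string "||" is used as a naive concatenation separator. Routing by the record ID R is not shown.

```python
import hashlib
import hmac
import os


def prf(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 standing in for the PRF P."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    blocks = (prf(key, nonce + n.to_bytes(4, "big")) for n in range(length // 32 + 1))
    return b"".join(blocks)[:length]


def enc(key: bytes, plaintext: bytes) -> bytes:
    """Toy randomized Enc: nonce || (plaintext XOR keystream). Illustration only."""
    nonce = os.urandom(16)
    stream = _keystream(key, nonce, len(plaintext))
    return nonce + bytes(p ^ s for p, s in zip(plaintext, stream))


def dec(key: bytes, ciphertext: bytes) -> bytes:
    nonce, body = ciphertext[:16], ciphertext[16:]
    stream = _keystream(key, nonce, len(body))
    return bytes(c ^ s for c, s in zip(body, stream))


def lv_pairs(k_l: bytes, k_v: bytes, record_id: str, row: dict) -> dict:
    """Map one record to pairs <P(k_l, C||R), Enc(k_v, v)>, one per column C."""
    return {
        prf(k_l, f"{column}||{record_id}".encode()): enc(k_v, value.encode())
        for column, value in row.items()
    }


k_l, k_v = os.urandom(32), os.urandom(32)
pairs = lv_pairs(k_l, k_v, "01", {"name": "alice", "city": "LA"})
label = prf(k_l, b"name||01")           # the client can recompute any label
print(dec(k_v, pairs[label]).decode())  # alice
```

Since each label is a PRF output, a node holding the pairs cannot tell which labels belong to the same record or how many columns a record has, which is the scrambling property the paragraph above refers to.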

Regarding the encrypted local index framework, the client is asked to maintain a consistent hashing ring so that it can trace the locations of values and build encrypted indexes that index the values stored on the same node. The benefits are two-fold: 1) Inter-node interaction is avoided during the query process, because if we directly adopt generic primitives, additional dedicated nodes are needed to store the encrypted global indexes. 2) Nodes can process the queries in parallel. The query workload is also balanced.

Algorithm 1 Build_ext: build exact-match indexes

Input: Private key k_e; secure PRFs {G1, G2, H1, H2}; values {v_1, ..., v_m} on attribute C.
Output: Encrypted indexes {I^ext_1, ..., I^ext_n}.
1: Initialize a hash table S to maintain counters;
2: for v_j ∈ {v_1, ..., v_m} do
3:   i ← route(R); // R is v_j's ID, i ∈ {1, n} is node ID
4:   t1 ← G1(k_e, C||v_j||i);
5:   t2 ← G2(k_e, C||v_j||i);
6:   if S.find(i||j) = ⊥ then
7:     c_i^j ← 0;
8:   else
9:     c_i^j ← S.find(i||j);
10:  end if
11:  α ← H1(t1, c_i^j);
12:  β ← H2(t2, c_i^j) ⊕ Enc(k_R, R);
13:  c_i^j++;
14:  S.put(i||j, c_i^j);
15:  I^ext_i.put(α, β);
16: end for

3.2 Exact-match Index and Query Protocol

In this subsection, we will articulate the designs of EncKV’s encrypted exact-match indexes and query protocols.

3.2.1 Encrypted Index Design

As mentioned, the construction of EncKV’s encrypted indexes for secure exact-match queries is inspired by a recently proposed (and elegant) SSE scheme [4]. This design uses KV pairs to index files that match the same keyword, and each file of this keyword is distinguished by a stateful counter. EncKV adopts this idea and indexes the record IDs that match the same values on a certain column attribute. To integrate the design into the distributed local index framework, EncKV’s client tracks the values and maintains the counters for each distinct value on different nodes during the index building procedure.

The detailed algorithm to index values {v_1, ..., v_m} for a given column attribute C is shown in Algorithm 1. This procedure is executed at the client. For each v_j for j from 1 to m, n counters are first initialized, where n is the number of nodes. Then the client finds the target node i for v_j based on the position of its record ID R on the consistent hashing ring. After that, it generates two tokens by embedding the value securely via secure PRF, i.e., t1 = G1(k_e, C||v_j||i) and t2 = G2(k_e, C||v_j||i), and further uses the corresponding counter c_i^j to generate the encrypted index entry, i.e., ⟨α = H1(t1, c_i^j), β = H2(t2, c_i^j) ⊕ Enc(k_R, R)⟩.

The encrypted index above holds the security notion of SSE. The index size is known to the nodes. Without querying, no other information about the underlying content is learned. Note that the counters will not be used to generate query tokens at the client, and thus can be dropped after the indexes are uploaded. If new records are incrementally added, those counters can be cached at the nodes in their encrypted form to generate new index entries. More details can be found in Section 3.6.
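The index-building loop of Algorithm 1 can be sketched in Python as follows. This is an illustration under stated assumptions rather than the authors' code: HMAC-SHA256 with a domain-separation prefix stands in for the four PRFs G1, G2, H1, H2; `enc_id` is a toy fixed-length Enc(k_R, R) for record IDs of at most 16 bytes; `route` is a hash-mod-n placeholder for the consistent hashing lookup; and counters are keyed per distinct value and node, following the prose description.

```python
import hashlib
import hmac
import os
from collections import defaultdict


def prf(key: bytes, *parts: bytes) -> bytes:
    """HMAC-SHA256 over length-prefixed parts; stands in for G1, G2, H1, H2."""
    msg = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def enc_id(k_r: bytes, record_id: str) -> bytes:
    """Toy Enc(k_R, R): 16-byte nonce || (16-byte padded ID XOR keystream)."""
    nonce = os.urandom(16)
    padded = record_id.encode().ljust(16, b"\x00")
    return nonce + xor(padded, prf(k_r, nonce))


def route(record_id: str, n: int) -> int:
    """Placeholder for the consistent hashing lookup; returns a node in 1..n."""
    return int.from_bytes(hashlib.sha256(record_id.encode()).digest(), "big") % n + 1


def build_ext(k_e: bytes, k_r: bytes, column: str, values: dict, n: int) -> dict:
    """values maps record ID -> value on `column`; returns {node: {alpha: beta}}."""
    counters = defaultdict(int)                 # hash table S
    indexes = {i: {} for i in range(1, n + 1)}  # I_1 .. I_n
    for record_id, value in values.items():
        i = route(record_id, n)
        node = str(i).encode()
        t1 = prf(k_e, b"G1", column.encode(), value.encode(), node)
        t2 = prf(k_e, b"G2", column.encode(), value.encode(), node)
        c = counters[(i, value)].to_bytes(4, "big")
        alpha = prf(t1, b"H1", c)
        beta = xor(prf(t2, b"H2", c), enc_id(k_r, record_id))
        counters[(i, value)] += 1
        indexes[i][alpha] = beta
    return indexes


k_e, k_r = os.urandom(32), os.urandom(32)
cities = {"01": "LA", "02": "LA", "03": "NYC", "04": "LA"}
indexes = build_ext(k_e, k_r, "city", cities, n=3)
print({node: len(idx) for node, idx in indexes.items()})
```

Each per-node index is just a dictionary of pseudo-random keys and masked values, which is why it can be stored in any KV back end without changes.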

3.2.2 Secure Query Protocol

The corresponding query protocol following the index construction is executed between the client and nodes as presented in Algorithm 2. Given a query via two attributes, the client wants to find all the values {v_r} in attribute C_r on the matching condition such that another attribute C_v’s value should exactly be the value v. First, the client generates query tokens {t1, t2} for each node, where t1 = G1(k_e, C_v||v||i) and t2 = G2(k_e, C_v||v||i). Each node processes these tokens in parallel. In particular, each node increments a counter c_i to locate all the matched index entries via H1(t1, c_i) till no entry is returned, and each entry is unmasked via XORing H2(t2, c_i) to get r, the encrypted record ID. After that, all matched {r} are sent back to the client for decryption. For each decrypted ID R, the client generates the corresponding label via P(k_l, C_r||R) to fetch the encrypted result value.

Algorithm 2 Query_ext: secure exact-match query protocol

Input: Private key k_e; query condition value v; query condition attribute C_v; result value attribute C_r.
Output: Encrypted matched results {v_r}.
Client.Token
1: for i ∈ {1, ..., n} do
2:   t1 ← G1(k_e, C_v||v||i);
3:   t2 ← G2(k_e, C_v||v||i);
4:   Send (t1, t2) to node i;
5: end for
Node_i.ExtQuery
1: c_i ← 0;
2: α ← H1(t1, c_i);
3: while find(α) ≠ ⊥ do
4:   β ← find(α);
5:   r ← I^ext_i.get(H2(t2, c_i) ⊕ β);
6:   c_i++;
7:   α ← H1(t1, c_i);
8:   Return r to client for decryption;
Client
9:   R ← Dec(k_R, r);
10:  l ← P(k_l, C_r||R);
11:  Fetch v_r via l;
12: end while
13: // Implementation note: all matched {r} are sent back in a batch, and {v_r} are also fetched in a batch.

During the query procedure, data values and attributes are strongly protected. Each node only learns the query tokens, accessed index entries, and encrypted result values. Due to the deterministic property of tokens, it also learns the repeated queries on the same attribute. Note that the proposed query protocol requires two rounds of interaction between the client and each node. Such treatment prevents the server from generating labels of other values which are never queried. Each node only learns the matched values associated to the same column attribute, and it will not learn the associations between values in different attributes of a record, thereby effectively addressing inference attacks. Formal security analysis will later be conducted in Section 4.1.
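The first round of the exact-match protocol (token generation at the client, counter walk at the node, decryption of record IDs at the client) can be sketched as follows. The sketch is self-contained and single-node; HMAC-SHA256 as the PRFs, the toy `enc_id`/`dec_id` pair, and all function names are assumptions made for illustration. The second round, where the client derives labels P(k_l, C_r||R) and fetches the values, is not shown.

```python
import hashlib
import hmac
import os


def prf(key: bytes, *parts: bytes) -> bytes:
    msg = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def tokens(k_e: bytes, column: str, value: str, node: int):
    """Client.Token: (t1, t2) for one node."""
    parts = (column.encode(), value.encode(), str(node).encode())
    return prf(k_e, b"G1", *parts), prf(k_e, b"G2", *parts)


def node_query(index: dict, t1: bytes, t2: bytes) -> list:
    """Node_i.ExtQuery: walk the counter until no entry is found."""
    results, c = [], 0
    while True:
        ctr = c.to_bytes(4, "big")
        beta = index.get(prf(t1, b"H1", ctr))
        if beta is None:
            return results
        results.append(xor(beta, prf(t2, b"H2", ctr)))  # unmask Enc(k_R, R)
        c += 1


# Tiny single-node setup so the query has something to search.
k_e, k_r = os.urandom(32), os.urandom(32)


def enc_id(rid: str) -> bytes:
    nonce = os.urandom(16)
    return nonce + xor(rid.encode().ljust(16, b"\x00"), prf(k_r, nonce))


def dec_id(blob: bytes) -> str:
    return xor(blob[16:], prf(k_r, blob[:16])).rstrip(b"\x00").decode()


index, seen = {}, {}
for rid, city in [("01", "LA"), ("02", "LA"), ("03", "NYC")]:
    t1, t2 = tokens(k_e, "city", city, node=1)
    ctr = seen.get(city, 0).to_bytes(4, "big")
    index[prf(t1, b"H1", ctr)] = xor(prf(t2, b"H2", ctr), enc_id(rid))
    seen[city] = seen.get(city, 0) + 1

# Round 1: the node returns encrypted record IDs; the client decrypts them.
t1, t2 = tokens(k_e, "city", "LA", node=1)
matched = sorted(dec_id(r) for r in node_query(index, t1, t2))
print(matched)  # ['01', '02']
```

Note that the node only ever sees tokens and masked entries; it never holds k_R or k_l, so it cannot turn a matched record ID into labels for other columns on its own.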

3.3 Range-match Index and Query Protocol

In this subsection, we will present EncKV’s encrypted range-match indexes and the corresponding query protocols.

Algorithm 3 Build_rng: build range-match indexes

Input: Private keys k_r, k_o; secure PRFs {G1, G2, H1, H3}; values {v_1, ..., v_m} on attribute C.
Output: Encrypted indexes {I^rng_1, ..., I^rng_n}.
1: Initialize a hash table S to maintain counters;
2: for v_j ∈ {v_1, ..., v_m} do
3:   i ← route(R); // R is v's ID, i ∈ {1, n} is node ID
4:   t1 ← G1(k_r, C||i);
5:   t2 ← G2(k_r, C||i);
6:   if S.find(i) = ⊥ then
7:     c_i ← 0;
8:   else
9:     c_i ← S.find(i);
10:  end if
11:  α ← H1(t1, c_i);
12:  ct_R ← ORE_enc(k_o, v_j, c_i); // shown in Algorithm 4
13:  β ← H3(t2, c_i) ⊕ (ct_R||Enc(k_R, R));
14:  c_i++;
15:  S.put(i, c_i);
16:  I^rng_i.put(α, β);
17: end for

3.3.1 Design Rationale

In the current literature, a challenge of enabling secure range queries is how to design secure comparison schemes with minimized leakage. Most of the existing ORE primitives leak more information (i.e., order information) than SSE. Recent attacks show that order information can be exploited to recover the underlying values of ciphertexts [10,12,20]. To address this issue, the desired ORE primitive must achieve a strong security notion; that is, the ciphertexts should be semantically-secure encryptions of their underlying values [19]. Then if the attackers obtain the encrypted database, they can never learn any useful information.

However, only achieving the requirement above is not necessarily secure, because attackers still know order information during the queries. For example, attackers can learn which values are smaller than or greater than the query value. Such information can be combined with partial knowledge of the query distribution and record distribution to compromise the database [16].

To this end, we start from one of the latest ORE schemes [19] that defend against inference attacks, and it is also the most efficient ORE scheme currently. As this scheme does not fully address the second issue, we further enhance it to protect the order information. Here, we consider the order as “>” or “<”. Our observation is that secure range queries can still be supported without leaving the order in cleartext. The core idea of our design is to hide the order in both query token generation and ciphertext comparison. As a result, the only known information is that the index entries match an encrypted order condition. The attackers will learn neither the order of the underlying values on a column attribute, nor whether two different queries are conducted in the same order condition.

3.3.2 Encrypted Index Design

First of all, the construction of encrypted range-match indexes follows the same treatment as the encrypted exact-match indexes. For security, each index entry should be strongly encrypted, and the information on which entries are associated with the same column attribute should also be hidden before querying. This objective can be achieved through searchable encryption techniques. For index and data locality, EncKV’s client is still required to track the locations of data values. Algorithm 3 presents the index building procedure. For each value v_j on a column attribute C, the client first locates the node where the record is stored, and then generates the encrypted index entry ⟨α, β⟩ by securely embedding C and the counter c_i. Note that the underlying content of β also contains the ORE ciphertext ct_R which is computed from our enhanced ORE scheme introduced in the next paragraph. As a result, the encrypted range-match index is integrated into EncKV’s local index framework.

Algorithm 4 ORE_enc: enhanced ORE encryption

Input: Private key k_o; secure PRFs {F1, F2, F3}; secure PRP π; value v; counter c;
Output: ORE ciphertext ct_R
1: Derive k1, k2, k3 from k_o;
2: Generate a nonce γ;
3: for i ∈ {1, B} do
4:   for j ∈ {1, 2^b} do
5:     j* ← π^{-1}(F2(k2, v|_{i−1}), j);
6:     if CMP(j*, v|_i) ≠ 0 then
7:       s_{i,j} ← F3(k3, CMP(j*, v|_i)||C||j);
8:       z_{i,j} ← Q1(s_{i,j}, c) + Q2(F1(k1, v|_{i−1}||j), γ);
9:     else
10:      z_{i,j} ← "equal" + Q2(F1(k1, v|_{i−1}||j), γ);
11:    end if
12:  end for
13:  ct_R|_i ← z_{i,1}, ..., z_{i,2^b};
14: end for
15: ct_R ← γ, ct_R|_1, ..., ct_R|_B;

Algorithm 5 ORE_cmp: ORE compare operation

Input: ORE query token ct_L; ORE ciphertext ct_R;
Output: true or false.
1: γ, u'_1, ..., u'_B ← ct_R;
2: u_1, ..., u_B ← ct_L;
3: for i ∈ {1, ..., B} do
4:   x_i, v|_i, q_i ← u_i;
5:   z_{i,1}, ..., z_{i,2^b} ← u'_i;
6:   s_i ← z_{i,v|_i} − Q2(x_i, γ);
7:   if s_i ≠ 0 and s_i = Q1(q_i, c) then
8:     return true; // condition matched
9:   end if
10: end for
11: return false;

Algorithm 6 Query_rng: secure range-match query protocol

Input: Private keys k_r, k_o; query condition value v; order condition cmp ∈ {>, <}; query condition attribute C_v; result value attribute C_r.
Output: Encrypted matched results {v_r}.
Client.Token
1: for i ∈ {1, ..., n} do
2:   t1 ← G1(k_r, C_v||i);
3:   t2 ← G2(k_r, C_v||i);
4:   for i ∈ {1, B} do
5:     v|_i ← π(F2(k2, v|_{i−1}), v|_i);
6:     q_i ← F3(k3, cmp||C||v|_i);
7:     u_i ← F1(k1, v|_{i−1}||v|_i), v|_i, q_i;
8:   end for
9:   ct_L ← (u_1, ..., u_B);
10:  Send (t1, t2, ct_L) to node i;
11: end for
Node_i.RngQuery
1: c_i ← 0;
2: α ← H1(t1, c_i);
3: while find(α) ≠ ⊥ do
4:   β ← find(α);
5:   r ← I^rng_i.get(H3(t2, c_i) ⊕ β);
6:   Parse r as r_x ← Enc(k_R, R), r_y ← ct_R;
7:   c_i++;
8:   // ORE compare operation shown in Algorithm 5
9:   if ORE_cmp(ct_L, ct_R) = true then
10:    Return r_x to the client;
11:  end if
12:  α ← H1(t1, c_i);
13: end while
14: // Note: we ignore the steps to fetch final results, which is the same in Line 7 to 10 in Algorithm 2.

Enhanced ORE scheme: The idea of the ORE scheme proposed in [19] is to split a message into bit blocks of equal length, and to compare two messages block by block, starting from the most significant blocks. For example, if the message space is 4 bits and the block size is 2 bits, each message will be encrypted into 2 blocks. Specifically, each block has 2^2 possible values in total, {00, 01, 10, 11}. The message block to be encrypted, say “10”, will be transformed into 4 sub blocks, where the order information {>, >, =, <} relative to each value above is securely embedded with its prefix block⁴. Here, the order cmp is defined as the output of the comparison CMP(m1, m2) for blocks m1 and m2. For more details, we refer the readers to [19].
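The block splitting and per-block order information in the example above can be reproduced in plaintext with a few lines of Python. This sketch only illustrates what is being embedded before any encryption is applied; the helper names are hypothetical, and the most-significant-block-first ordering is an assumption of the sketch.

```python
def split_blocks(value: int, total_bits: int, block_bits: int) -> list:
    """Split a value into blocks, most significant block first."""
    assert total_bits % block_bits == 0
    mask = (1 << block_bits) - 1
    return [
        (value >> shift) & mask
        for shift in range(total_bits - block_bits, -1, -block_bits)
    ]


def order_info(block: int, block_bits: int) -> list:
    """Order of `block` relative to every possible block value, in plaintext."""
    def cmp(a: int, b: int) -> str:
        return ">" if a > b else "<" if a < b else "="
    return [cmp(block, candidate) for candidate in range(1 << block_bits)]


print(split_blocks(0b1001, total_bits=4, block_bits=2))  # [2, 1]
print(order_info(0b10, 2))                               # ['>', '>', '=', '<']
```

In the actual scheme each of these four symbols is hidden inside a pseudo-random sub block, and the sub blocks are shuffled by a keyed permutation so their position does not reveal which candidate value they correspond to.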

We note that the original scheme reveals the order between the query value and each ciphertext on the column. Such leakage tells partial order information between ciphertexts, i.e., some ciphertexts are smaller than or greater than others. Even if the order is transformed into a pseudo-random tag, such tags must be sent along with the queries, which is exploitable if the attackers know the query distribution.

To minimize the leakage, we propose to protect the order by embedding it securely via PRF with the column attribute, the block index, and the stateful counter, as shown in Line 7 and Line 8 of Algorithm 4, i.e., our enhanced ORE encryption algorithm. The sub block j in block i is encrypted as Q1(s_{i,j}, c) + Q2(F1(k1, v|_{i−1}||j), γ), where s_{i,j} = F3(k3, CMP(j*, v|_i)||C||j), v|_i is the block value, v|_{i−1} is the prefix block value, C is the column attribute, and c is the counter of this value on the column. j* is the securely permuted j, ranging over the possible values of this block, where j ∈ [1, 2^b], b is the bit length of each block, and B is the number of blocks.

⁴ The prefix block will be set as “null”, if the encrypted block is the first block [19].

Figure 3: Our proposed ORE compare operation

This improved construction guarantees that the order in each sub block is different, and the order conditions for different values and attributes are also different. Due to the deterministic property of PRF, the query comparison can still correctly be performed via token matching, which will later be introduced in the query protocol.

3.3.3 Secure Query Protocol

Based on the index construction, we present the range-match query protocol in detail in Algorithm 6. Given a query via two attributes, the client wants to find all values {v_r} in attribute C_r on the matching condition such that another attribute C_v’s value should be smaller than the value v. Similar to the exact-match query, the client generates query tokens {t1, t2} for each node from C_v. For ORE comparison, the client needs to compute another token ct_L which contains the encrypted blocks {u_1, ..., u_B} with a distinct encrypted order condition q_i for each block. Each node processes {t1, t2} in parallel, i.e., unmasking the corresponding ORE index entries via incremental counters. After that, each node calls the ORE compare operation ORE_cmp to compare ct_L and ct_R in each entry above, as presented in Algorithm 5. The process is conducted from the most significant block. Symmetric to the block encryption, the encrypted order is obtained via s_i = z_{i,v|_i} − Q2(x_i, γ), where z_{i,v|_i} is the block of ct_R, x_i is the corresponding block of ct_L, and γ is the nonce of this ciphertext. With the encrypted query condition q_i, Q1(q_i, c) is computed via counter c to check whether it matches s_i. If matched, the encrypted record ID will be sent to the client to fetch the final result values on attribute C_r.

Note that this design reveals the repeated queries, and the equality of query values and ciphertexts. It also indicates the position of the first block in which two values differ, which is the same as in the adopted ORE scheme [19]. More detailed security analysis will be given in Section 4.1. The query time complexity in the current treatment is O(m_C), where m_C is the number of values on attribute C at a certain node. The performance can further be improved via binary search, i.e., sorting values before encryption as indicated in [19].
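To make the token matching concrete, the following is a simplified single-block sketch (B = 1, no prefix block) of the encryption in Algorithm 4, the token generation in Algorithm 6, and the comparison in Algorithm 5. It is an illustration under several assumptions and not the authors' implementation: HMAC-SHA256 stands in for F1, F2, F3, Q1 and Q2; tags are added modulo 2^128; "equal" is encoded as 0; the keyed permutation is derived by sorting PRF outputs; and the order symbol stored for a slot is the order of the encrypted value relative to that slot's candidate value.

```python
import hashlib
import hmac
import os

MOD = 1 << 128  # tags live in the integers modulo 2^128 (illustrative choice)


def prf(key: bytes, *parts: bytes) -> bytes:
    msg = b"".join(len(p).to_bytes(2, "big") + p for p in parts)
    return hmac.new(key, msg, hashlib.sha256).digest()


def to_int(tag: bytes) -> int:
    return int.from_bytes(tag[:16], "big")


def permutation(k2: bytes, size: int) -> list:
    """Keyed shuffle of 0..size-1; perm[value] is the slot of that value."""
    order = sorted(range(size), key=lambda v: prf(k2, b"pi", bytes([v])))
    perm = [0] * size
    for slot, value in enumerate(order):
        perm[value] = slot
    return perm


def ore_enc(keys, column: bytes, v: int, c: int, bits: int = 4):
    """One-block version of Algorithm 4: returns (nonce, [z_0, ..., z_{2^b-1}])."""
    k1, k2, k3 = keys
    perm, gamma = permutation(k2, 1 << bits), os.urandom(16)
    z = [0] * (1 << bits)
    for j_star in range(1 << bits):   # j_star: plaintext candidate value
        j = perm[j_star]              # j: its permuted slot
        mask = to_int(prf(prf(k1, bytes([j])), b"Q2", gamma))
        if v == j_star:
            z[j] = mask               # "equal" encoded as 0
        else:
            order = b">" if v > j_star else b"<"
            s = prf(k3, order, column, bytes([j]))
            z[j] = (to_int(prf(s, b"Q1", c.to_bytes(4, "big"))) + mask) % MOD
    return gamma, z


def ore_token(keys, column: bytes, q: int, order: bytes, bits: int = 4):
    """Client side for one block: (x, slot, encrypted order condition)."""
    k1, k2, k3 = keys
    slot = permutation(k2, 1 << bits)[q]
    return prf(k1, bytes([slot])), slot, prf(k3, order, column, bytes([slot]))


def ore_cmp(token, ct, c: int) -> bool:
    """One-block Algorithm 5: true iff the hidden order condition holds."""
    (x, slot, q_tag), (gamma, z) = token, ct
    s = (z[slot] - to_int(prf(x, b"Q2", gamma))) % MOD
    return s != 0 and s == to_int(prf(q_tag, b"Q1", c.to_bytes(4, "big")))


keys = (os.urandom(32), os.urandom(32), os.urandom(32))
ages = [3, 7, 12]
cts = [ore_enc(keys, b"age", v, c) for c, v in enumerate(ages)]
tok = ore_token(keys, b"age", 7, b">")
print([ore_cmp(tok, ct, c) for c, ct in enumerate(cts)])  # [False, False, True]
```

The node learns only whether the recovered tag equals the tag derived from the token; because the order symbol is folded into a PRF together with the column, slot, and counter, a ">" token and a "<" token look equally random to it. Extending the sketch to several blocks would require threading the prefix block through F1 and F2 as in the algorithms above.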

3.4 Secure Rich Query Instantiation

EncKV’s encrypted indexes readily enable rich queries supported in existing NoSQL data stores [9, 11]. These stores implement SQL-like query languages for easy data management. Accordingly, this section uses SQL-like query examples to introduce how EncKV supports those queries.

Figure 4: Secure query protocol illustration. (a) Exact-match query; (b) Range-match query.

Keyword Search, Equality and Count queries: Given a keyword search or equality query “SELECT name WHERE city = LA” in Figure 4(a), the client first generates a pair of tokens $\{t_1, t_2\}$ for each node based on the keyword condition “city = LA”, where $t_1 = G_1(k_e, \text{city}\|\text{LA}\|i)$, $t_2 = G_2(k_e, \text{city}\|\text{LA}\|i)$, and $i$ is the node ID. Then, each node processes these tokens in parallel. Specifically, all the matched index entries are located via $H_1(t_1, c_i)$ in node $i$, where $c_i$ is a counter. The encrypted record IDs ($Enc(k_R, 01)$ and $Enc(k_R, 02)$), recovered by XORing the entries with $H_2(t_2, c_i)$, are returned. Finally, the client decrypts them and obtains the encrypted result values via labels $P(k_l, \text{name}\|01)$, $P(k_l, \text{name}\|02)$. Regarding count queries, the client needs to count the values returned from each node and aggregate the counts.
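The token generation and counter-based lookup described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the EncKV implementation: HMAC-SHA256 truncated to 128 bits stands in for the PRFs $G_1, G_2, H_1, H_2$, the byte prefixes used for domain separation are hypothetical, and the encrypted record IDs are treated as opaque 16-byte strings.

```python
import hashlib
import hmac


def prf(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 truncated to 128 bits (an assumed stand-in for the PRFs)."""
    return hmac.new(key, msg, hashlib.sha256).digest()[:16]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def gen_tokens(k_e: bytes, attr: str, value: str, node_id: int):
    """Client side: t1 = G1(k_e, attr||value||i), t2 = G2(k_e, attr||value||i)."""
    msg = f"{attr}||{value}||{node_id}".encode()
    return prf(k_e, b"G1" + msg), prf(k_e, b"G2" + msg)


def build_entries(t1: bytes, t2: bytes, enc_record_ids):
    """Client side: one <alpha, beta> entry per matching record, keyed by a counter."""
    index = {}
    for c, enc_id in enumerate(enc_record_ids):
        ctr = c.to_bytes(4, "big")
        index[prf(t1, b"H1" + ctr)] = xor(prf(t2, b"H2" + ctr), enc_id)
    return index


def search(index, t1: bytes, t2: bytes):
    """Node side: increment the counter until a label is missing, unmasking each hit."""
    hits, c = [], 0
    while True:
        ctr = c.to_bytes(4, "big")
        beta = index.get(prf(t1, b"H1" + ctr))
        if beta is None:
            return hits
        hits.append(xor(beta, prf(t2, b"H2" + ctr)))
        c += 1
```

A count query in this sketch would simply take the length of the list returned by each node and sum the lengths at the client.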

Range and Like queries: Consider a range-match query such as “SELECT name WHERE age > 20” in Figure 4(b). The client first generates tokens $t_1 = G_1(k_r, \text{age}\|i)$, $t_2 = G_2(k_r, \text{age}\|i)$, together with the ORE ciphertext $ct_L(20)$ containing the randomized “>”. Each node scans index entries via $H_1(t_1, c_i)$ with incremented $c_i$, and gets $ct_R$ by XORing $H_3(t_2, c_i)$ with the entries. Then node $i$ returns the encrypted record ID ($Enc(k_R, 02)$) to the client when the $OREcmp(ct_L(20), ct_R)$ algorithm outputs true. Finally, the client fetches the result value ($E(k_v, \text{bob})$) via label $P(k_l, \text{name}\|02)$.

EncKV also supports LIKE (aka prefix) queries. For instance, the query “SELECT City WHERE City LIKE 'A%'” obtains answers like “Argentina” or “Australia”. Our adopted ORE scheme [19] supports comparison on both numeric values and alphanumeric strings. Recall that each ORE ciphertext is encrypted by blocks, and the previous block content is embedded in the current block ciphertext for prefix matching. During the comparison, the first differing block between $ct_L$ and $ct_R$ indicates that all of their previous blocks are the same.

Join Queries: EncKV supports Join queries such that any attributes of two tables can be joined together. Here, we define a generic join query statement, FROM T1 JOIN T2 ON T1.C1 = T2.C2 WHERE field, where T1 and T2 are the two tables being joined, C1 and C2 are attributes of T1 and T2, and field is a join condition such as an exact-match or range-match operation. For example, given a query “SELECT T2.GPA FROM T1 JOIN T2 ON T1.id = T2.id WHERE T1.age > 20”, the client parses the query and performs “SELECT T1.id FROM T1 WHERE T1.age > 20” via the range-match indexes, which derives the matched record ID set $R$. Then, the client generates labels $\{P(k_l, \text{T2.GPA}\|R_i)\}_{R_i \in R}$ to get the final results.

Sum and Average queries: Following the treatment in prior encrypted databases, nodes in EncKV can perform aggregation on encrypted data values by using an additive homomorphic encryption (HOM) scheme [27]. Values to be aggregated are encrypted via a certain HOM encryption scheme, i.e., $\langle P(k_l, C\|R\|\text{HOM}), HEnc(k_v, v)\rangle$. When the client issues a query in the form “SELECT SUM(score) FROM Score WHERE age > 20”, it first queries the matched record ID set $R$ via the range-match indexes from “SELECT stu ID FROM Score WHERE age > 20”. Then each node locates and aggregates the HOM ciphertexts via labels $\{P(k_l, \text{score}\|R_i\|\text{HOM})\}_{R_i \in R}$. This procedure can be done in parallel under the local index framework. Finally, each node returns intermediate results to the client for aggregation. The average value can then be computed at the client.
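Node-side aggregation over additively homomorphic ciphertexts can be sketched as below. The text does not fix a particular HOM scheme, so Paillier is assumed here purely for illustration; the function names are hypothetical, and the two Mersenne primes are toy parameters far too small for real use.

```python
import math
import secrets

# Toy Paillier parameters (assumption: two known Mersenne primes, illustration only).
P, Q = 2**31 - 1, 2**61 - 1
N = P * Q
N2 = N * N
LAM = (P - 1) * (Q - 1) // math.gcd(P - 1, Q - 1)  # lcm(P-1, Q-1)
MU = pow(LAM, -1, N)  # valid because the generator is g = N + 1


def h_enc(m: int) -> int:
    """Encrypt m as (1 + m*N) * r^N mod N^2 with fresh randomness r."""
    while True:
        r = secrets.randbelow(N - 1) + 1
        if math.gcd(r, N) == 1:
            break
    return (1 + m * N) % N2 * pow(r, N, N2) % N2


def h_add(c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts."""
    return c1 * c2 % N2


def h_dec(c: int) -> int:
    """Decrypt with L(c^lambda mod N^2) * mu mod N, where L(u) = (u - 1) / N."""
    return (pow(c, LAM, N2) - 1) // N * MU % N


def node_sum(ciphertexts):
    """What a node would do: fold the matched HOM ciphertexts into one."""
    acc = h_enc(0)
    for c in ciphertexts:
        acc = h_add(acc, c)
    return acc
```

In this sketch each node returns one `node_sum` ciphertext; the client combines them with `h_add`, decrypts once, and divides by the number of matched records to obtain the average.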

Group By queries: EncKV performs Group By queries by combining exact-match queries with aggregation. Suppose the client issues a group-by request such as “SELECT city, sum(age) GROUP BY city”. It first finds the record ID set $R$ for each group, such as “LA”, via the exact-match query “SELECT stu ID WHERE city = LA”. Here, we assume that the client knows all the distinct city names. Then it can use the HOM labels $\{P(k_l, \text{age}\|R_j\|\text{HOM})\}_{R_j \in R}$ to ask the corresponding nodes to compute the aggregation of the age values associated with “LA”. Aggregation results are also finalized at the client, just as in Sum queries. Other groups follow the same treatment.

Max and Min queries: To support Max and Min queries, the client inserts a specific LV pair for the MAX/MIN data value on a column attribute. For example, if the maximum value of the attribute “age” is “100”, the client generates the LV pair $\langle P(k_l, \text{age}\|\text{MAX}), E(k_v, 100)\rangle$. When the client wants to query the maximum value, it computes the label $P(k_l, \text{age}\|\text{MAX})$ to fetch it. The minimum value can be obtained in the same way.


3.5 Batch Queries

EncKV’s query protocols across different attributes are conducted in two phases. The first phase finds the encrypted record IDs that match the query condition on a specific query attribute, and the second phase fetches the values of these matched records on a targeted attribute. As mentioned in Section 3.2.2, EncKV implements those two phases in two separate rounds to hide the associations of encrypted KV pairs in the same records. However, we observe that such treatment might still allow the nodes to learn the associations between index entries and result values (see the analysis in Section 4).

To reduce the above leakage, an engineering approach is to conduct queries in a batched manner. For a batch of queries, the client first parses the query conditions so that overlapping or repeated query conditions are queried only once in the batch. After receiving the encrypted result IDs, the client decrypts them and generates the labels from the distinct IDs and the targeted attributes. In addition, the client permutes those labels and fetches the final result values from the corresponding nodes. We note that such an improvement can be realized via a dedicated query planner, which is also used for improving performance in [25, 31]. With this approach, the associations between values and index entries on the same records can be well protected.
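The client-side steps of a batch (deduplicating conditions, deriving labels from distinct IDs, permuting the labels) can be sketched as follows. This is an illustrative sketch only: the function names and the query representation are hypothetical, HMAC-SHA256 is an assumed stand-in for the PRF $P$, and the first-round index lookup is abstracted as a dictionary of already decrypted record IDs.

```python
import hashlib
import hmac
import secrets


def label(k_l: bytes, attr: str, record_id: str) -> bytes:
    """P(k_l, attr||record_id), with HMAC-SHA256 as an assumed PRF."""
    return hmac.new(k_l, f"{attr}||{record_id}".encode(), hashlib.sha256).digest()[:16]


def plan_batch(queries):
    """Keep each query condition once, in first-seen order.

    `queries` is a list of (condition, target_attribute) pairs.
    """
    return list(dict.fromkeys(cond for cond, _ in queries))


def fetch_labels(k_l: bytes, queries, matched_ids):
    """Build one label per distinct (attribute, record ID) pair, then permute.

    `matched_ids` maps each condition to the record IDs decrypted by the client
    after the first round.
    """
    distinct = {(attr, rid) for cond, attr in queries for rid in matched_ids[cond]}
    labels = [label(k_l, attr, rid) for attr, rid in sorted(distinct)]
    secrets.SystemRandom().shuffle(labels)  # hide which query each label came from
    return labels
```

Because the labels are deduplicated and shuffled before the second round, a node sees neither how many queries produced a given label nor the order in which the queries were issued.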

3.6 Secure Update Operations

EncKV provides two kinds of update operations when new data records are added, i.e., bulk update and incremental update. Bulk update is suitable for adding a large number of records, e.g., migrating an unencrypted database to EncKV. Encrypted exact-match and range-match indexes can be built via the index building functions introduced in Algorithm 1 and Algorithm 3 respectively. Yet, we note that building new indexes requires the client to generate additional tokens for the new indexes. Thus, periodic index consolidations are needed to preserve the complexity of token generation.

Incremental update is suitable when data records are occasionally inserted into EncKV. As a result, new index entries need to be added to existing indexes. To implement incremental update, the state information (i.e., counters) on each indexed attribute should be carefully maintained either at the client or at the nodes in encrypted form, so that the client can generate the corresponding index entries without affecting subsequent queries. We note that this operation will leak additional information, just like most of the prior dynamic SSE schemes [4, 14]. Once the attributes are queried, the nodes will know whether the newly inserted index entries are associated with those attributes. One recent scheme addresses this issue via token permutation [2], and we will integrate this technique into EncKV in the future.

Currently, we consider the single-client setting. The multi-client setting will be addressed as future work. In additionto access control, concurrent query operations will also beinvestigated when encrypted index entries and data recordsare accessed from different clients simultaneously.

4. SECURITY ANALYSIS

This section analyzes EncKV’s security strength. The values of different attributes in each record are stored as encrypted LV pairs. The encrypted underlying data storage defends against offline inference attacks [20], because auxiliary information such as schema information is completely hidden even if the attacker obtains the entire encrypted database. Our security analysis focuses on our proposed secure exact-match queries and secure range-match queries. We follow the primitives we adopted, SSE [4] and ORE [19], to quantify the security guarantees of EncKV’s query protocols respectively. In addition, we discuss EncKV’s protection against a series of recent attacks on search over encrypted data.

4.1 Security on Exact-match Queries

Since EncKV’s secure exact-match queries are realized in the framework of SSE [8], the nodes only learn the controlled leakage, but never learn the underlying contents of queries and results. Basically, the index size is learned once the index is uploaded to the server. Search and access patterns are learned along with the queries, where the search pattern indicates the repeated queries, and the access pattern indicates the accessed ciphertexts. Our targeted queries contain multiple query attributes, and thus the access pattern also includes the associations between values of those attributes. Following the notion of SSE, we first define the leakage functions in EncKV as follows:

$$\mathcal{L}^{ext}_1(C) = (\{m_i\}_n, \langle |\alpha|, |\beta| \rangle)$$

where $C$ is the set of secondary attributes, $m_i$ is the size of the local index $I^{ext}_i$ of node $i$, $n$ is the number of nodes, and $|\alpha|$, $|\beta|$ are the lengths of the label and value in an index entry.

$$\mathcal{L}^{ext}_2(v_C, C_v, C_r) = (\{t^i_1, t^i_2\}_n, \{\{\langle \alpha, \beta \rangle, \langle l, v^* \rangle\}_{c_i}\}_n)$$

where $v_C$ is the query value, $C_v$ is the attribute of $v_C$, $C_r$ is the attribute of the result values, and $\{t^i_1, t^i_2\}_n$ are the tokens for the $n$ nodes respectively. Given a query, the matched index entries and results $\{\langle \alpha, \beta \rangle, \langle l, v^* \rangle\}_{c_i}$ at each node are known.

$$\mathcal{L}^{ext}_3(Q) = (M_{q \times q}, T_{v^* \to \alpha})$$

where $Q$ is a set of $q$ adaptive queries, and $M_{q \times q}$ is a symmetric bit matrix that traces the same queries. $M_{i,j}$ and $M_{j,i}$ are equal to 1 if $t^i_1 = t^j_1$ for $i, j \in [1, q]$. Otherwise, they are equal to 0. $T_{v^* \to \alpha}$ is an inverted list that traces the index entries that match each result value, which is also known as inference information [20]. For each posting list $v^* | \{\alpha_1, \cdots, \alpha_a\}$ in $T$, the associations between the queried index entries of different attributes and each queried result value are learned.
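The search-pattern part of the third leakage function can be made concrete with a short sketch. The function name is hypothetical, and the input is simply the list of first tokens $t_1$ that a node observes across $q$ queries; the matrix entry is 1 exactly when two queries carried the same token.

```python
def search_pattern(first_tokens):
    """Build the q x q bit matrix M with M[i][j] = 1 iff query i and query j
    used the same first token, which is what a node can observe on its own."""
    q = len(first_tokens)
    return [
        [1 if first_tokens[i] == first_tokens[j] else 0 for j in range(q)]
        for i in range(q)
    ]
```

Since token equality is symmetric, the resulting matrix is symmetric with ones on the diagonal, matching the definition above.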

In terms of the quantified leakage, we present the security definition of exact-match queries in Definition 1 in the Appendix, and give the following theorem.

Theorem 1. Ext is adaptively secure with $(\mathcal{L}^{ext}_1, \mathcal{L}^{ext}_2, \mathcal{L}^{ext}_3)$ under the random-oracle model if $G_1, G_2, H_1, H_2, P$ are secure PRFs.

We leave the detailed proof to the Appendix.

The security notion of EncKV’s exact-match queries is stronger than deterministic encryption (DET), which is used in several existing encrypted databases [27]. DET-based designs expose to the server all the equal values on an attribute, while EncKV does not reveal that information. As for other auxiliary information, associations between values across attributes (aka inter-column and intra-column associations) are directly exposed in existing encrypted databases with legacy compatibility, while such information is greatly reduced in EncKV. On the one hand, because the attribute is secretly embedded in the encrypted index, the server will never learn whether the tokens of different query values are on the same attribute or not. On the other hand, EncKV’s batch query mechanism further hides the associations of columns.

4.2 Security on Range-match Queries

EncKV’s secure range-match queries are designed on top of the ORE scheme proposed in [19]. Therefore, the security achieves the same level as the scheme in [19]. Briefly, the ciphertexts are semantically secure, and a comparison reveals only the first block that differs between the two values. To achieve general protection and integrate the indexes into the local index framework, EncKV leverages SSE techniques as an overlay to mask ORE ciphertexts. Similar to exact-match queries, inference information will also be learned, since queries may involve multiple query attributes. Accordingly, we define the leakage functions as follows:

$$\mathcal{L}^{rng}_1(C) = (\{m_i\}_n, \langle |\alpha|, |\beta| \rangle)$$

where $C$ is the set of secondary attributes, $m_i$ is the size of the local index $I^{rng}_i$ of node $i$, $n$ is the number of nodes, and $|\alpha|$, $|\beta|$ are the lengths of the label and value in an index entry.

$$\mathcal{L}^{rng}_2(v_C, C_v, C_r) = (\{t^i_1, t^i_2\}_n, ct_L, \{\{\langle \alpha, \beta \rangle, \langle l, v^* \rangle\}_{c_i}\}_n)$$

where $v_C$ is the query value, $C_v$ is the query attribute, $C_r$ is the attribute of the result value, $ct_L$ is the token for ORE comparison, and $\{t^i_1, t^i_2\}_n$ are the tokens for the $n$ nodes respectively. Given a query, the matched index entries and result pairs $\{\langle \alpha, \beta \rangle, \langle l, v^* \rangle\}_{c_i}$ at each server node are known. In addition, the rest of the index entries on this column will also be learned.

$$\mathcal{L}^{rng}_3(v_C, cmp) = (\{\{b_{dif}\}_{c_i}\}_n)$$

where $b_{dif}$ is the first block that differs in the comparison of matched ORE ciphertexts.

$$\mathcal{L}^{rng}_4(Q) = (M_{q \times q}, T_{v^* \to \alpha})$$

where $Q$ is a set of $q$ queries, and $M_{q \times q}$ is a symmetric bit matrix that traces the same queries. $M_{i,j}$ and $M_{j,i}$ are equal to 1 if $t^i_1 = t^j_1$ for $i, j \in [1, q]$. Otherwise, they are equal to 0. $T_{v^* \to \alpha}$ is an inverted list that indicates the associations between the index entries of different attributes and the result values, as defined for exact-match queries. Accordingly, we present the security definition of range-match queries in Definition 2 in the Appendix and give the theorem below.

Theorem 2. Rng is non-adaptively secure with $(\mathcal{L}^{rng}_1, \mathcal{L}^{rng}_2, \mathcal{L}^{rng}_3, \mathcal{L}^{rng}_4)$ if $G_1, G_2, H_1, H_3, P, F_1, F_2, F_3$ are secure PRFs.

We give the proof in the Appendix.

Our enhanced ORE scheme protects the order information in queries and ciphertexts during the comparison. Recall that in line 6 of Algorithm 6, the query order is protected in the ORE query token, i.e., $q_i = F_2(k_3, cmp\|C\|v|_i)$, where $cmp$ is the order, $C$ is the query attribute, and $v|_i$ is the $i$th block of the query value. As a result, different query values or attributes result in different query tokens. Then $q_i$ is used by the server node to compute $Q_1(q_i, c)$ in line 7 of Algorithm 5. If the output matches $s_i$ in the ciphertext, this entry is considered matched.

4.3 Further Discussion

Remark on inference attacks: Recent attacks [16, 20] use auxiliary information to compromise the confidentiality of encrypted databases. In [20], reference databases, table structures, order information, and frequency statistics are exploited to attack databases protected by property-preserving encryption, i.e., order-preserving encryption (OPE) and deterministic encryption (DET).

We note that EncKV effectively defends against these attacks. Our exact-match and range-match indexes are built via SSE [8] and ORE [19], which are semantically secure. Exact-match queries will not reveal matched values without querying, and range-match queries via our enhanced ORE scheme will not even reveal the order information. In addition, the associations between data values are also protected by the proposed batch query mechanism. Again, note that one may always add dummy records to improve security.

In [16], a generic attack is proposed, which relies only on prior knowledge of the query distribution. The attackers can recover the database without knowing the orders of query results. This attack samples queries and observes the size of the results for each to find the order of ciphertexts with high probability. We are aware that no existing encrypted database design can address this attack. Nevertheless, we argue that attackers need to make more effort in EncKV, i.e., compromising all of EncKV’s nodes to obtain the full size of the results of each query. Otherwise, this attack cannot be launched in the first place.

Remark on leakage-abuse attacks: In [3], several attacks on SSE are proposed by abusing search and access patterns. Despite their effectiveness, these attacks rely heavily on the amount of prior information about the databases and queries. Besides, as shown in their countermeasure evaluation, random padding largely reduces the reconstruction ratio of queries and data. A very recent attack is proposed to compromise SSE schemes that support legacy applications [28]. The targeted schemes leak more information than schemes that build an encrypted index with minimized leakage, and thus that attack is not applicable to EncKV.

Attacks on ORE schemes are also proposed in [10, 12]. However, we note that those attacks target specific schemes with specific leaked information [1, 6, 17], i.e., orders of underlying ciphertexts or orders of some bits of ciphertexts, which will not be learned in our adopted ORE scheme. Therefore, they are not applicable to EncKV. Besides, they require auxiliary information such as inter-column correlations, which can be protected by EncKV’s batch query mechanism. Intra-column correlations are also partially protected, because query tokens of the same query attributes are not the same for different nodes. If the attackers only get partial views from some of the server nodes, those attacks are further mitigated. In the future, we will empirically analyze EncKV’s strength in defending against those attacks.

5. EXPERIMENTAL EVALUATION

5.1 Prototype Implementation

To assess the performance of EncKV, we implement a prototype and deploy it to Amazon Web Services. We create 4 AWS M4-xlarge instances as the clients, and a Redis (v3.2.0) cluster that consists of 9 AWS M4-xlarge instances as the nodes to store encrypted indexes and records. Each instance is assigned 4 vcores (2.4 GHz Intel Xeon® E5-2676 v3 CPU), 16GB RAM and 40GB SSD, and runs Ubuntu Server 14.04. EncKV’s prototype utilizes Apache Thrift (v0.9.2) to implement the remote procedure call (RPC).

Figure 5: Performance. Panels: (a) Ext index build, time cost (s) vs. number of data values; (b) Rng index build, time cost (s) vs. number of data values; (c) Ext query throughput, entries/s vs. number of cores; (d) Rng query throughput, entries/s vs. number of cores; (e) Ext query latency, time cost (s) vs. number of cores for 8K, 16K, and 32K results; (f) Ext query comparison, time cost (s) vs. number of result values for Ext and YWWQL16; (g) Rng query latency, time cost (s) vs. number of cores for 8K, 16K, and 32K results; (h) Insertion latency, insertion time (s) vs. number of newly added values for no index, Rng index, and Ext index.

Table 1: Encrypted index space consumption

(a) Encrypted exact-match index

| # Indexed values | 400K | 600K | 800K |
| Size (GB) | 0.012 | 0.018 | 0.024 |

(b) Encrypted range-match index

| # Indexed values | 400K | 600K | 800K |
| Size (2-bit block) (GB) | 0.209 | 0.313 | 0.417 |
| Size (4-bit block) (GB) | 0.399 | 0.599 | 0.799 |
| Size (8-bit block) (GB) | 3.070 | 4.604 | 6.139 |

EncKV uses OpenSSL (v1.0.1f) for the implementation of cryptographic building blocks. The secure PRF is implemented via the AES cipher (128 bits). The enhanced ORE scheme is implemented on top of the implementation⁵ of the ORE scheme [19]. In this evaluation, we set 8 bits as the block size for ORE encryption. In the future, more parameter settings will be evaluated. The encrypted exact-match and range-match indexes are integrated into the implementation⁶ of the distributed encrypted index framework [33]. In total, EncKV contains about 10144 lines of C++ code.

⁵ An implementation of order-revealing encryption: online at https://github.com/kevinlewi/fastore.
⁶ An encrypted, distributed, and searchable key-value store: online at https://github.com/CongGroup/BlindDB.

5.2 Performance Evaluation

The evaluation of EncKV mainly focuses on the encrypted index and query performance.

Index evaluation: We first report the index space consumption in Table 1. For the encrypted exact-match index, the size of each entry $\langle \alpha, \beta \rangle$ is 256 bits, where $\alpha$ and $\beta$ are each 128 bits long. Table 1(a) shows that the index size increases linearly from 0.012 GB (400K indexed values) to 0.024 GB (800K indexed values). For the encrypted range-match index, each entry also needs to store the ORE ciphertext $ct_R$ for comparison. As an ORE ciphertext is encrypted by blocks, the size of $ct_R$ depends on the block length $b$. Each block ciphertext contains $2^b$ sub blocks, each 64 bits long (truncated from the AES cipher output). With $\alpha$, $\beta$, a 128-bit nonce $\gamma$, and $ct_R$, the size of an entry for a 32-bit value is $128 + 128 + 64 \times 2^b \times 32/b + 128$ bits. As mentioned in [19], there is a tradeoff between security and space. A larger block size gives stronger security while introducing more space cost, which is also shown in Table 1(b).
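The entry-size formula for the two indexes can be checked numerically against Table 1. The sketch below uses hypothetical function names and makes two assumptions that are not stated in the text: "400K" means 400,000 entries, and "GB" means $2^{30}$ bytes. Under those assumptions the computed sizes round to the reported values.

```python
def ext_entry_bits() -> int:
    """Exact-match entry: 128-bit label alpha plus 128-bit value beta."""
    return 128 + 128


def rng_entry_bits(b: int, value_bits: int = 32) -> int:
    """Range-match entry: alpha, beta, the ORE ciphertext ct_R, and a 128-bit nonce.

    ct_R has value_bits / b blocks, each holding 2^b sub blocks of 64 bits.
    """
    return 128 + 128 + 64 * 2**b * (value_bits // b) + 128


def index_size_gb(entries: int, entry_bits: int) -> float:
    """Index size in GB, assuming 1 GB = 2^30 bytes."""
    return entries * entry_bits / 8 / 2**30


for b in (2, 4, 8):
    sizes = [round(index_size_gb(n, rng_entry_bits(b)), 3) for n in (400_000, 600_000, 800_000)]
    print(f"{b}-bit block: {sizes} GB")
```

The $2^b$ factor dominates: moving from 4-bit to 8-bit blocks halves the number of blocks but multiplies the sub blocks per block by 16, which is why the 8-bit index is roughly eight times larger.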

Figure 5(a) and Figure 5(b) measure the index building time at the client. Both time costs increase linearly with the number of indexed values. The range-match index takes more time because it needs to generate ORE ciphertexts in addition to masking the encrypted record IDs.

Query evaluation: To evaluate the scalability of EncKV, we measure the query throughput in Figure 5(c) and Figure 5(d). The results show that the total number of index entries processed per second in both indexes increases with the number of cores. Specifically, the throughput of exact-match queries reaches up to 255K entries per second with 9 nodes, while the throughput of range-match queries is lower, at 213K entries per second. The overhead comes from the cost of PRP and PRF operations on compared blocks during the ORE comparison. The results confirm that EncKV’s throughput scales with the number of cores for both query types.

To gain a deeper understanding of the query performance of EncKV, we evaluate the latency of exact-match and range-match queries, respectively. In Figure 5(e), we can see that as the number of nodes increases, the latency of exact-match queries that return a fixed number of results is reduced in similar proportion. The latency of exact-match queries with 32 cores is roughly half of the latency with 16 cores for returning 32K matched encrypted values. Likewise, Figure 5(g) shows that the latency of range-match queries follows a similar downward trend as the number of cores increases. When the number of compared ciphertexts is 32K, the query latency with 36 cores is around 17s, which is almost one-third of the latency with 12 cores. Thus, we can confirm that EncKV benefits from the encrypted local index framework and can effectively handle queries in parallel.

Figure 5(f) compares the exact-match query performance with the scheme proposed in [33], denoted as YWWQL16, which conducts token matching by enumerating all values on a column. Here, we pre-insert around 70K records for both designs. As seen, the query time of our design scales only with the number of matched results, while YWWQL16 scans the entire column no matter how many values are matched. Specifically, as the number of matched values grows from 2K to 32K, the query time of our design increases from approximately 0.92s to 14.4s. In contrast, the query time of YWWQL16 stays constant at about 32.7s.

Figure 6: Query token bandwidth overhead. (a) Exact-match query; (b) Range-match query. Both panels plot the ratio (%) for 10, 30, and 50 nodes.

Recall that EncKV also supports incremental updates for newly added records. We accordingly evaluate the cost of index entry insertion in Figure 5(h). The comparison shows that inserting new records into both encrypted indexes will not introduce too much overhead compared to the case without indexing. Note that this cost includes network transmission for each new indexed value, and thus it is much higher than the index building cost (bulk update). For 1000 values, it takes 0.686s and 0.875s to index entries for the exact-match and range-match indexes respectively.

The adopted local index framework requires the client to generate query tokens for each node. To understand the bandwidth overhead, we show the ratio between the query token size and the result size in Figure 6. The result in Figure 6(a) indicates that the ratio for exact-match queries decreases gradually as the size of the results increases. Specifically, the ratio for 50 nodes drops from about 1.25% to approximately 0.25% when the number of retrieved result values increases from 4K to 20K. On the other hand, an increasing number of nodes raises the bandwidth. The ratio for an 8K result size increases from around 0.13% to about 0.63% when the number of nodes rises from 10 to 50. Nevertheless, the bandwidth overhead of query tokens is negligible relative to the size of the results. The range-match queries follow a similar trend, as shown in Figure 6(b), but the corresponding ratio is higher than for the exact-match queries. The reason is that the range query token contains an additional ORE ciphertext $ct_L$ to perform ORE comparison, which enlarges the size of the query token. As shown, the ratio reaches 10.3% for 50 nodes to return 4K results.
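The exact-match ratios quoted above are consistent with a simple back-of-the-envelope model, sketched below. The model is an assumption made here for illustration and is not given in the text: the client sends one token pair per node, each token pair is taken to be 32 bytes (two 128-bit tokens), and each result value is also taken to be 32 bytes. The function name and default sizes are hypothetical.

```python
def token_ratio(nodes: int, results: int, token_bytes: int = 32, value_bytes: int = 32) -> float:
    """Ratio of total query-token bytes to total result bytes.

    Assumes one token pair of `token_bytes` per node and `value_bytes` per result;
    both defaults are illustrative guesses, not measured values.
    """
    return nodes * token_bytes / (results * value_bytes)


for nodes, results in [(50, 4_000), (50, 20_000), (10, 8_000), (50, 8_000)]:
    print(f"{nodes} nodes, {results} results: {100 * token_ratio(nodes, results):.3f}%")
```

Under these assumed sizes the model gives 1.25% and 0.25% for 50 nodes at 4K and 20K results, and 0.125% and 0.625% for 10 and 50 nodes at 8K results, in line with the figures reported for exact-match queries. It does not cover range-match queries, whose tokens additionally carry $ct_L$.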

6. RELATED WORK

Encrypted database systems: To enable rich queries over encrypted data records, a line of encrypted database systems has been implemented [24, 25, 27] (just to list a few). The first fully functional system is CryptDB [27], which uses property-preserving encryption (PPE) schemes and a dedicated query planner [31] to support legacy SQL queries. Although CryptDB applies randomized encryption on top of PPE ciphertexts, the security strength after queries is reduced to the security of PPE such as DET and OPE. Blind Seer [24] devises secure rich query protocols based on secure multi-party computation, which requires the assumption of non-colluding index and data servers. Arx [25] is proposed to minimize the leakage in range queries. It designs a scheme based on Yao’s garbled circuits, where tree (range index) nodes are compared with the encoded query value via circuit evaluation. Because circuits cannot be reused for security reasons, the circuits on tree nodes need to be updated every time they are accessed. This might introduce considerable bandwidth overhead and limit the potential throughput of range queries. Very recently, an encrypted data analytics system called Seabed [23] was designed. It leverages a symmetric-key based homomorphic encryption scheme for fast aggregation over encrypted data records. Additionally, it designs a customized schema to partition one column into multiple columns to address inference attacks. However, that customization requires prior knowledge of the query and data distribution for partitioning, and thus it is hardly dynamic. Also, it introduces large overhead from random padding, e.g., a storage overhead of about 10×.

Searchable Symmetric Encryption: Another line of related work [4, 8, 14, 30] (just to list a few) is a cryptographic primitive for keyword search over encrypted data, i.e., searchable symmetric encryption (SSE). In [8], the security notion of SSE is formalized. Later, in [14], the notion of dynamic SSE is further formalized. In [4], documents are encrypted into keyword and document ID pairs, and packing mechanisms are proposed to improve the read efficiency when the index is too large to fit in memory. Other schemes like [30] consider the multi-client setting of SSE. The scheme in [30], proposed for boolean queries, makes existing SSE multi-client query protocols non-interactive so as to reduce the communication overhead.

Order-revealing Encryption: Order-revealing encryption (ORE) is the primitive that enables comparison on ciphertexts for secure range queries [1, 5, 6, 17, 19, 26] (just to list a few). The early result [1], also known as order-preserving encryption (OPE), only supports numeric comparison, and the orders are directly learned from ciphertexts. To improve the security, new OPE schemes are designed [17, 26], but these schemes introduce multiple rounds of interaction (i.e., $O(\log N)$, where $N$ is the number of indexed values) for each query. The first ORE scheme is proposed in [6], but this scheme leaks the first bit where two messages differ. Very recently, a new ORE scheme is proposed in [19], where the ciphertexts achieve semantic security. The comparison only reveals the first differing block. Concurrently, another secure ORE scheme based on pairings is introduced in [5].

7. CONCLUSION

This paper introduces a functionally rich key-value store, called EncKV, with guaranteed data protection. EncKV stores encrypted data records with multiple secondary attributes in the form of encrypted key-value pairs. It leverages the latest cryptographic primitives (i.e., SSE and ORE) to design encrypted indexes for exact-match and range-match queries, respectively. For practical query performance, EncKV integrates those indexes into an encrypted local index framework so that each node can process queries in parallel. A formal security analysis is given to quantify the security strength of the proposed secure query protocols. EncKV’s prototype is deployed on a Redis cluster, and our evaluation on Amazon AWS demonstrates its efficiency.


Acknowledgment

This work was supported in part by the Research Grants Council of Hong Kong (Project CityU 11276816 and CityU 11205014), the Natural Science Foundation of China (Project No. 61572412), Innovation and Technology Commission of Hong Kong under ITF Project ITS/307/15, and an AWS in Education Research Grant award. The authors would like to thank David J. Wu for providing us their preliminary ORE code in the early stage of EncKV’s implementation.

8. REFERENCES

[1] A. Boldyreva, N. Chenette, Y. Lee, and A. O’Neill.Order-Preserving Symmetric Encryption. InProc. EUROCRYPT, 2009.

[2] R. Bost, P.-A. Fouque, and D. Pointcheval. VerifiableDynamic Symmetric Searchable Encryption:Optimality and Forward Security. Cryptology ePrintArchive, Report 2016/062, 2016.

[3] D. Cash, P. Grubbs, J. Perry, and T. Ristenpart.Leakage-Abuse Attacks against SearchableEncryption. In Proc. ACM CCS, 2015.

[4] D. Cash, J. Jaeger, S. Jarecki, C. Jutla, H. Krawczyk,M.-C. Rosu, and M. Steiner. Dynamic SearchableEncryption in Very Large Databases: Data Structuresand Implementation. In Proc. NDSS, 2014.

[5] D. Cash, F.-H. Liu, A. O’Neill, and C. Zhang.Reducing the Leakage in Practical Order-RevealingEncryption. Cryptology ePrint Archive, Report2016/661, 2016.

[6] N. Chenette, K. Lewi, S. A. Weis, and D. J. Wu.Practical Order-Revealing Encryption with LimitedLeakage. In Proc. FSE, 2016.

[7] C. Coronel, S. Morris, and P. Rob. Database Systems:Design, Implementation, and Management. CengageLearning, 2009.

[8] R. Curtmola, J. A. Garay, S. Kamara, andR. Ostrovsky. Searchable symmetric encryption:Improved definitions and efficient constructions.Journal of Computer Security, 19(5):895–934, 2011.

[9] G. DeCandia, D. Hastorun, M. Jampani,G. Kakulapati, A. Lakshman, A. Pilchin,S. Sivasubramanian, P. Vosshall, and W. Vogels.Dynamo: Amazon’s Highly Available Key-ValueStore. ACM SIGOPS Operating Systems Review,41(6):205–220, 2007.

[10] F. B. Durak, T. M. DuBuisson, and D. Cash. Whatelse is revealed by order-revealing encryption? InProc. ACM CCS, 2016.

[11] FoundationDB. FoundationDB: Data Modeling.Online at http://www.odbms.org/wp-content/uploads/2013/11/data-modeling.pdf, 2013.

[12] P. Grubbs, K. Sekniqi, V. Bindschaedler, M. Naveed,and T. Ristenpart. Leakage-Abuse Attacks againstOrder-Revealing Encryption. Cryptology ePrintArchive, Report 2016/895, 2016.

[13] Information is Beautiful. World’s Biggest DataBreaches. Online athttp://www.informationisbeautiful.net/visualizations/worlds-biggest-data-breaches-hacks/, 2016.

[14] S. Kamara, C. Papamanthou, and T. Roeder.Dynamic searchable symmetric encryption. InProc. ACM CCS, 2012.

[15] A. Kejriwal, A. Gopalan, A. Gupta, Z. Jia, S. Yang, and J. Ousterhout. Slik: Scalable low-latency indexes for a key-value store. In Proc. USENIX ATC, 2016.

[16] G. Kellaris, G. Kollios, K. Nissim, and A. O’Neill. Generic Attacks on Secure Outsourced Databases. In Proc. ACM CCS, 2016.

[17] F. Kerschbaum. Frequency-hiding order-preserving encryption. In Proc. ACM CCS, 2015.

[18] N. Leavitt. Will NoSQL Databases Live Up to Their Promise? IEEE Computer, 43(2):12–14, 2010.

[19] K. Lewi and D. J. Wu. Order-Revealing Encryption: New Constructions, Applications, and Lower Bounds. In Proc. ACM CCS, 2016.

[20] M. Naveed, S. Kamara, and C. V. Wright. Inference Attacks on Property-Preserving Encrypted Databases. In Proc. ACM CCS, 2015.

[21] Oracle. Transparent Data Encryption. Online at http://www.oracle.com/technetwork/database/options/advanced-security/index-099011.html/, 2016.

[22] J. Ousterhout, A. Gopalan, A. Gupta, A. Kejriwal, C. Lee, B. Montazeri, D. Ongaro, S. J. Park, H. Qin, M. Rosenblum, et al. The RAMCloud Storage System. ACM TOCS, 33(3):7, 2015.

[23] A. Papadimitriou, R. Bhagwan, N. Chandran, R. Ramjee, A. Haeberlen, H. Singh, A. Modi, and S. Badrinarayanan. Big Data Analytics over Encrypted Datasets with Seabed. In Proc. USENIX OSDI, 2016.

[24] V. Pappas, B. Vo, F. Krell, S. Choi, V. Kolesnikov, A. Keromytis, and T. Malkin. Blind Seer: A Scalable Private DBMS. In Proc. IEEE S&P, 2014.

[25] R. Poddar, T. Boelter, and R. A. Popa. Arx: A strongly encrypted database system. Cryptology ePrint Archive, Report 2016/591, 2016.

[26] R. A. Popa, F. H. Li, and N. Zeldovich. An ideal-security protocol for order-preserving encoding. In Proc. IEEE S&P, 2013.

[27] R. A. Popa, C. Redfield, N. Zeldovich, and H. Balakrishnan. CryptDB: protecting confidentiality with encrypted query processing. In Proc. ACM SOSP, 2011.

[28] D. Pouliot and C. V. Wright. The shadow nemesis: Inference attacks on efficiently deployable, efficiently searchable encryption. In Proc. ACM CCS, 2016.

[29] Redis. An advanced key-value cache and store. Online at http://redis.io/, 2015.

[30] S.-F. Sun, J. K. Liu, A. Sakzad, R. Steinfeld, and T. H. Yuen. An efficient non-interactive multi-client searchable encryption with support for boolean queries. In Proc. ESORICS, 2016.

[31] S. Tu, M. F. Kaashoek, S. Madden, and N. Zeldovich. Processing analytical queries over encrypted data. In Proc. VLDB, 2013.

[32] K.-Y. Whang. NoSQL vs. Parallel DBMS for Large-scale Data Management. DASFAA Panel, 2011.

[33] X. Yuan, X. Wang, C. Wang, C. Qian, and J. Lin. Building an encrypted, distributed, and searchable key-value store. In Proc. ACM AsiaCCS, 2016.


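Appendix

Definition 1. Let $\mathsf{Ext} = (\mathsf{KGen}, \mathsf{Build}^{ext}, \mathsf{Query}^{ext})$ be EncKV’s encrypted exact-match index construction. Given leakage $\mathcal{L}^{ext}_1$, $\mathcal{L}^{ext}_2$, and $\mathcal{L}^{ext}_3$, a probabilistic polynomial-time (PPT) adversary $\mathcal{A}$, and a PPT simulator $\mathcal{S}$, define the following experiments.

$\mathbf{Real}_{\mathcal{A}}(k)$: The client calls $\mathsf{KGen}(1^k)$ to output a private key $K$. $\mathcal{A}$ selects a dataset $D$ and asks the client to build $\{I^{ext}_1, \cdots, I^{ext}_n\}$ via $\mathsf{Build}^{ext}$. Then $\mathcal{A}$ performs a polynomial number $q$ of adaptive queries, and asks the client for tokens and ciphertexts. Finally, $\mathcal{A}$ outputs a bit.

$\mathbf{Ideal}_{\mathcal{A},\mathcal{S}}(k)$: $\mathcal{A}$ selects $D$. $\mathcal{S}$ generates $\{I'^{ext}_1, \cdots, I'^{ext}_n\}$ for $\mathcal{A}$ based on $\mathcal{L}^{ext}_1$. $\mathcal{A}$ performs a polynomial number $q$ of adaptive queries. From $\mathcal{L}^{ext}_2$ and $\mathcal{L}^{ext}_3$, $\mathcal{S}$ returns the simulated ciphertexts and tokens. Finally, $\mathcal{A}$ outputs a bit.

$\mathsf{Ext}$ is adaptively secure with $(\mathcal{L}^{ext}_1, \mathcal{L}^{ext}_2, \mathcal{L}^{ext}_3)$ if for all PPT adversaries $\mathcal{A}$, there exists a simulator $\mathcal{S}$ such that $\Pr[\mathbf{Real}_{\mathcal{A}}(k) = 1] - \Pr[\mathbf{Ideal}_{\mathcal{A},\mathcal{S}}(k) = 1] \le \mathsf{negl}(k)$, where $\mathsf{negl}(k)$ is a negligible function in $k$.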
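To make the two experiments concrete, the following minimal Python sketch contrasts an index built with a secret key, as in the real experiment, with one produced from size leakage alone, as in the ideal experiment. It is an illustration under stated assumptions, not EncKV's implementation: HMAC-SHA256 with a domain-separation prefix stands in for the PRFs, payloads are fixed at 32 bytes, lengths are counted in bytes rather than bits, the node identifier is omitted, and the names `real_index`, `build_leakage`, and `simulated_index` are hypothetical.

```python
import hashlib
import hmac
import os


def prf(key: bytes, msg: bytes) -> bytes:
    """HMAC-SHA256 as a stand-in PRF (an assumption of this sketch)."""
    return hmac.new(key, msg, hashlib.sha256).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def real_index(key, attr, postings):
    """Real side: build label-value pairs from the secret key.

    postings maps an attribute value to a list of 32-byte payloads.
    """
    index = {}
    for value, payloads in postings.items():
        t1 = prf(key, b"G1|" + attr + b"|" + value)  # token for labels
        t2 = prf(key, b"G2|" + attr + b"|" + value)  # token for masks
        for c, payload in enumerate(payloads):
            ctr = c.to_bytes(4, "big")
            index[prf(t1, ctr)] = xor(prf(t2, ctr), payload)
    return index


def build_leakage(index):
    """Size leakage only: entry count and byte lengths of one LV pair.

    Assumes a non-empty index.
    """
    alpha, beta = next(iter(index.items()))
    return len(index), len(alpha), len(beta)


def simulated_index(m, alpha_len, beta_len):
    """Ideal side: m random LV pairs of the leaked lengths, no key needed."""
    sim = {}
    while len(sim) < m:
        sim[os.urandom(alpha_len)] = os.urandom(beta_len)
    return sim
```

Both indexes have the same number of entries and the same entry lengths, which is all that the first leakage function is stated to reveal about the index.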

Theorem 1. $\mathsf{Ext}$ is adaptively secure with $(\mathcal{L}^{ext}_1, \mathcal{L}^{ext}_2, \mathcal{L}^{ext}_3)$ under the random-oracle model if $G_1$, $G_2$, $H_1$, $H_2$, and $P$ are secure PRFs.

Proof. The objective is to prove that the adversary $\mathcal{A}$ cannot differentiate the real game from the simulated game defined in Definition 1. We define $\{H_{G_1}, H_{G_2}, H_{H_1}, H_{H_2}, H_P\}$ to be random oracles.

The simulator $\mathcal{S}$ simulates the encrypted exact-match indexes $\{I'^{ext}_1, \cdots, I'^{ext}_n\}$ on each of the $n$ nodes. For node $i$, $\mathcal{S}$ obtains $m_i$ and $\langle |\alpha|, |\beta| \rangle$ from $\mathcal{L}^{ext}_1$, where $m_i$ is the number of index entries, and $|\alpha|$ and $|\beta|$ are the bit lengths of the encrypted label-value (LV) pair of each entry. Based on this information, $\mathcal{S}$ inserts a total of $m_i$ random LV pairs $\{\alpha, \beta\}_{m_i}$, each with the same length as a real index entry, into $I'^{ext}_i$.

$\mathcal{S}$ simulates the first query $(v_C, C_v, C_r)$, which returns the encrypted values on attribute $C_r$ of the records that match $v_C$ on attribute $C_v$. In particular, for node $i$, $\mathcal{S}$ generates simulated query tokens $t'_1 = H_{G_1}(k'_e \| C_v \| v_C \| i)$ and $t'_2 = H_{G_2}(k'_e \| C_v \| v_C \| i)$, where $k'_e$ is a random string. After that, having learned the number of matched entries $c_i$ from $\mathcal{L}^{ext}_2$, $\mathcal{S}$ locates $c_i$ index entries in $I'^{ext}_i$ by computing $\alpha' = H_{H_1}(t'_1, c_i)$ with the counter running from 0 to $c_i$, and returns $R'^{*} = H_{H_2}(t'_2, c_i) \oplus \beta'$ at the same time. $R'^{*}$ can be derived from $(\lambda, H_R(k'_R \| \lambda) \oplus R)$, where $\lambda$ and $k'_R$ are random strings, $H_R$ is a random oracle, and $R$ is the record ID. Next, with a random string $k'_l$, $\mathcal{S}$ simulates $l' = H_P(k'_l \| C_r \| R)$, and then generates a random $v^{*\prime}$ with the same length as $v^{*}$. From $\mathcal{L}^{ext}_3$, $\mathcal{S}$ updates $M'_{1,1} = 1$ in a matrix $M'_{q \times q}$. Besides, it creates an inverted list $T'_{v^{*\prime} \to \alpha'}$ and inserts $v^{*\prime} | \alpha'$ for each $v^{*\prime}$.
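The role of the random oracles in answering the first query can be sketched as follows. This is a minimal Python illustration, assuming a lazily sampled oracle that the simulator is allowed to program; the class `ProgrammableOracle` and the functions `simulate_first_query` and `server_search` are hypothetical names, tokens are sampled directly as random 32-byte strings, and the server-side loop is an assumed counter-based lookup rather than EncKV's actual query procedure.

```python
import os


class ProgrammableOracle:
    """Lazily sampled random oracle that the simulator may program."""

    def __init__(self, out_len=32):
        self.table = {}
        self.out_len = out_len

    def program(self, x: bytes, y: bytes) -> None:
        if x in self.table and self.table[x] != y:
            raise ValueError("oracle point already defined")
        self.table[x] = y

    def query(self, x: bytes) -> bytes:
        if x not in self.table:
            self.table[x] = os.urandom(self.out_len)
        return self.table[x]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def simulate_first_query(sim_index, used, c_i, h1):
    """Pick random tokens, then program h1 so they hit c_i unused entries."""
    t1, t2 = os.urandom(32), os.urandom(32)
    free = [a for a in sim_index if a not in used]
    if len(free) < c_i:
        raise ValueError("not enough unused index entries")
    for c in range(c_i):
        h1.program(t1 + c.to_bytes(4, "big"), free[c])
        used.add(free[c])
    return t1, t2


def server_search(index, t1, t2, h1, h2):
    """Assumed server loop: walk the counter until a label is missing."""
    results, c = [], 0
    while True:
        ctr = c.to_bytes(4, "big")
        alpha = h1.query(t1 + ctr)
        if alpha not in index:
            return results
        results.append(xor(h2.query(t2 + ctr), index[alpha]))
        c += 1
```

Because the index entries were chosen at random before any query was known, the simulator needs only the result count to make the tokens consistent with the index; the unmasked results are themselves random strings, matching the shape of the simulated ciphertexts.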

For the subsequent $j$th query ($j$ from 2 to $q$), if the query has been issued before, say it is the same as the first query, the corresponding elements $M'_{1,j}$ and $M'_{j,1}$ are updated to 1. All simulated query tokens and results of this query can be copied directly from the tokens and results simulated for the first query. If the query has not been issued before, the tokens and results are simulated with the steps shown above for the first query. Note that because $T_{v^{*} \to \alpha}$ from $\mathcal{L}^{ext}_3$ reveals the repeated result values queried from different attributes, any $\langle l', v^{*\prime} \rangle$ that appeared before can be obtained from $T'_{v^{*\prime} \to \alpha'}$.

Due to the pseudorandomness of the secure PRFs, $\mathcal{A}$ cannot differentiate the simulated tokens and results from the real tokens and results.
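The treatment of repeated queries in the proof above can be sketched in a few lines of Python. The sketch assumes the query-repetition leakage is given as a symmetric 0/1 matrix in which entry (i, j) is 1 when queries i and j are the same; the function name `simulate_tokens` and the 32-byte token length are illustrative choices, not taken from the paper.

```python
import os


def simulate_tokens(search_pattern):
    """Return one simulated token pair per query.

    search_pattern[i][j] == 1 means queries i and j are identical, so a
    repeated query reuses the tokens of its first occurrence and a fresh
    query receives new random tokens.
    """
    tokens = []
    for j, row in enumerate(search_pattern):
        earlier = next((i for i in range(j) if row[i] == 1), None)
        if earlier is None:
            tokens.append((os.urandom(32), os.urandom(32)))
        else:
            tokens.append(tokens[earlier])
    return tokens
```

For three queries where the third repeats the first, the matrix `[[1, 0, 1], [0, 1, 0], [1, 0, 1]]` yields identical token pairs for queries 1 and 3 and an independent pair for query 2.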

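Definition 2. Let $\mathsf{Rng} = (\mathsf{KGen}, \mathsf{Build}^{rng}, \mathsf{Query}^{rng})$ be EncKV’s encrypted range-match index construction. Given leakage $\mathcal{L}^{rng}_1$, $\mathcal{L}^{rng}_2$, $\mathcal{L}^{rng}_3$, and $\mathcal{L}^{rng}_4$, a PPT adversary $\mathcal{A}$, and a PPT simulator $\mathcal{S}$, define the following experiments.

$\mathbf{Real}_{\mathcal{A}}(k)$: The client calls $\mathsf{KGen}(1^k)$ to output a private key $K$. $\mathcal{A}$ selects a dataset $D$ and asks the client to build $\{I^{rng}_1, \cdots, I^{rng}_n\}$ via $\mathsf{Build}^{rng}$. Then $\mathcal{A}$ performs a polynomial number $q$ of non-adaptive queries, and asks the client for tokens and ciphertexts. Finally, $\mathcal{A}$ outputs a bit.

$\mathbf{Ideal}_{\mathcal{A},\mathcal{S}}(k)$: $\mathcal{A}$ selects $D$. $\mathcal{S}$ generates $\{I'^{rng}_1, \cdots, I'^{rng}_n\}$ for $\mathcal{A}$ based on $\mathcal{L}^{rng}_1$. $\mathcal{A}$ performs a polynomial number $q$ of non-adaptive queries. From $\mathcal{L}^{rng}_2$, $\mathcal{L}^{rng}_3$, and $\mathcal{L}^{rng}_4$, $\mathcal{S}$ returns the simulated ciphertexts and tokens. Finally, $\mathcal{A}$ outputs a bit.

$\mathsf{Rng}$ is non-adaptively secure with $(\mathcal{L}^{rng}_1, \mathcal{L}^{rng}_2, \mathcal{L}^{rng}_3, \mathcal{L}^{rng}_4)$ if for all PPT adversaries $\mathcal{A}$, there exists a simulator $\mathcal{S}$ such that $\Pr[\mathbf{Real}_{\mathcal{A}}(k) = 1] - \Pr[\mathbf{Ideal}_{\mathcal{A},\mathcal{S}}(k) = 1] \le \mathsf{negl}(k)$, where $\mathsf{negl}(k)$ is a negligible function in $k$.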

Theorem 2. $\mathsf{Rng}$ is non-adaptively secure with $(\mathcal{L}^{rng}_1, \mathcal{L}^{rng}_2, \mathcal{L}^{rng}_3, \mathcal{L}^{rng}_4)$ if $G_1$, $G_2$, $H_1$, $H_3$, $P$, $F_1$, $F_2$, and $F_3$ are secure PRFs.

Proof. The objective is to prove that the adversary $\mathcal{A}$ cannot differentiate the real game from the simulated game defined in Definition 2.

For each of the $n$ nodes, $\mathcal{S}$ iterates over the $q$ queries all at once. $\mathcal{S}$ generates random keys $\{k'_r, k'_o, k'_R, k'_1, k'_2, k'_3, k'_l\}$. Regarding the $j$th query $(cmp, v_C, C_v, C_r)$ ($cmp \leftarrow \{<, >\}$), which returns the encrypted values on attribute $C_r$ of the records that match $(cmp, v_C)$ on attribute $C_v$, $\mathcal{S}$ simulates tokens and ciphertexts from $(\mathcal{L}^{rng}_2, \mathcal{L}^{rng}_3, \mathcal{L}^{rng}_4)$. For node $i$, $\mathcal{S}$ computes $t'_1 = G_1(k'_r, C_v \| i)$ and $t'_2 = G_2(k'_r, C_v \| i)$, and then computes $\alpha' = H_1(t'_1, c_i)$ with the counter running from 0 to $c_i$.

To simulate the ORE query token $ct'_L$, $\mathcal{S}$ splits $v_C$ into $b$ blocks, and generates simulated encrypted blocks as $v'_{|i} = \pi(F(k'_2, v_{|i-1}), v_{|i})$, $q'_i = F_2(k'_3, cmp \| C_v \| v'_{|i})$, and $u'_i = F_1(k'_1, v_{|i-1} \| v'_{|i}), v'_{|i}, q'_i$. To simulate an ORE ciphertext $ct'_R$ on $C_v$, $\mathcal{S}$ obtains $b_{dif}$, the first block that differs between $ct_L$ and $ct_R$, from $\mathcal{L}^{rng}_3$. Then $\mathcal{S}$ simulates 2b sub blocks in each block. For $b_{dif}$, $\mathcal{S}$ simulates the matched sub block as $z' = Q_1(q'_i, c) + Q_2(u'_i, \gamma')$, where $c$ is the counter of $ct_R$, and generates random strings for the rest of the sub blocks. For the previous blocks, $\mathcal{S}$ simulates equal sub blocks as $z' = Q_2(u'_i, \gamma')$, and generates random strings for the rest. Likewise, $\mathcal{S}$ generates random strings for the blocks after $b_{dif}$.
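The quantity that drives this part of the simulation is the index of the first differing block. The short Python sketch below shows how that index and the comparison outcome follow from two plaintext values once they are split into equal-width blocks, most significant block first. It operates on plaintexts purely for illustration; the function names, the 0-based block index, and the choice of block width are assumptions of the sketch and are not taken from the paper.

```python
def split_blocks(value, bits, block_bits):
    """Split a bits-wide unsigned integer into blocks, most significant first."""
    if bits % block_bits != 0:
        raise ValueError("bits must be a multiple of block_bits")
    if not 0 <= value < (1 << bits):
        raise ValueError("value out of range")
    b = bits // block_bits
    mask = (1 << block_bits) - 1
    return [(value >> (block_bits * (b - 1 - i))) & mask for i in range(b)]


def first_differing_block(x, y, bits, block_bits):
    """Return (index of first differing block, comparison of x against y).

    The index is None when the two values are equal.
    """
    xs = split_blocks(x, bits, block_bits)
    ys = split_blocks(y, bits, block_bits)
    for i, (bx, by) in enumerate(zip(xs, ys)):
        if bx != by:
            return i, "<" if bx < by else ">"
    return None, "="
```

With 16-bit values and 4-bit blocks, `0x1A3F` and `0x1A7F` share their first two blocks and differ in the third, so the call returns index 2 and the outcome `<`. Blocks before that index compare as equal and blocks after it carry no information about the order, which mirrors the three cases handled by the simulator.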

After that, $\mathcal{S}$ computes $\beta' = H_3(t'_2) \oplus (ct'_R \| \mathsf{Enc}(K'_R, R))$, and inserts $\{\alpha', \beta'\}$ into $I'^{rng}_i$. A total of $c_i$ index entries will be inserted. For the matched index entries, $\mathcal{S}$ computes $l' = P(k'_l, R \| C_r)$. In the meantime, $\mathcal{S}$ can also generate the tokens and ciphertexts that appeared before from $\mathcal{L}^{rng}_4$, just as in simulating exact-match queries. When all queries are simulated, $\mathcal{S}$ inserts random index entries until $I'^{rng}_i$ has $m_i$ entries, where $m_i$ is obtained from $\mathcal{L}^{rng}_1$.

Due to the pseudorandomness of the secure PRFs, $\mathcal{A}$ cannot differentiate the simulated tokens and results from the real tokens and results.
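The last two steps of the simulation, masking a payload under a token-derived pad and filling the index with random entries up to the leaked size, can be sketched as follows. This Python fragment is illustrative only: SHAKE-256 is used as a stand-in for the masking function so that the pad can match a payload of any length, lengths are in bytes, and the names `h3`, `mask`, and `pad_index` are hypothetical.

```python
import hashlib
import os


def h3(token: bytes, length: int) -> bytes:
    """Stand-in masking function: SHAKE-256 output of the requested length."""
    return hashlib.shake_256(token).digest(length)


def mask(token: bytes, payload: bytes) -> bytes:
    """XOR the payload with a pad derived from the token (self-inverse)."""
    pad = h3(token, len(payload))
    return bytes(a ^ b for a, b in zip(pad, payload))


def pad_index(index, m_i, alpha_len, beta_len):
    """Add random label-value pairs until the index holds m_i entries."""
    if len(index) > m_i:
        raise ValueError("index already exceeds the leaked size")
    while len(index) < m_i:
        index.setdefault(os.urandom(alpha_len), os.urandom(beta_len))
    return index
```

Applying `mask` twice with the same token recovers the payload, and after `pad_index` the simulated index has exactly the leaked number of entries regardless of how many were produced while answering queries.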

