Post on 01-Jan-2016
Relational-Based Encryption for Efficient Data Sharing on Encrypted Cloud Relational
Databases
Introduction
• Encrypting sensitive data items in the database is necessary, especially in a cloud database
Company and cloud database service provider (SP)
Store data on cloud (the encrypted table held by SP):
Item_ID   Cost     Wholesale_price
Egask5    A42fgs   2S46Dg
asD3j64   139ASs   Dd3fj2
Get back data (the plain table at the company):
Item_ID   Cost   Wholesale_price
1076      10     20
3308      15     50
Note: SP does not have the key
Introduction
• CryptDB, TrustedDB and Cipherbase are 3 recent encrypted cloud relational database systems that support querying
• Conventional encryption schemes are used to encrypt data without considering the data structure, i.e., the relational form
• For instance, all 3 systems above use AES as the underlying encryption scheme with semantic security
– CryptDB also uses some other encryption schemes at the same time, e.g., OPES and the Paillier system, but these methods are either less secure or slower than AES
Inflexibility of conventional encryption schemes, e.g., AES, in data sharing
Item_ID Cost Wholesale_price
Egask5 A42fgs 2S46Dg
asD3j64 139ASs Dd3fj2
Alice, SP, Bob
Alice: "Bob is my business partner. I want to let him know the wholesale price of some of my selected products."
Alice’s options
1. Send the decryption key to Bob: but Bob is then able to see other data that are not intended to be shared (infeasible)
2. Send the decrypted data to Bob: high processing cost and communication cost to Alice (doable but expensive, baseline)
(Figure: Alice’s data; the data in blue are to be shared with Bob)
Application of data sharing
1. Alice is a company user of SP. Now, Alice hires Bob, who is a data analytics expert to perform analysis. Alice has to share some of her data with Bob
2. Alice and Bob are two business partners. They share some data for gaining advantages, e.g., more market information.
Data sharing problem
• Adversary model
– SP and Bob are both compromised by the same attacker, i.e., the attacker can observe everything seen by SP and Bob and control the actions of SP and Bob
• Sharing goal
– Alice defines a subset of data DS to be shared with Bob
– Functional: Bob observes plain values of DS
– Security:
• Bob and the attacker cannot observe plain values of data outside DS
• SP cannot observe any data
– Performance: low processing cost and communication cost to Alice
(Figure: Alice, SP and Bob, with the attacker behind SP and Bob)
Overview of our schemes
• Proposed encryption framework – relational-based encryption for supporting data sharing
• Basic scheme: Hash-based construction
• Constructions with pre-computed index
– Static index
– Trapdoor-based index for arbitrary sharing
Relational-based encryption (RBE)
• Idea: Use individual key to encrypt each data item
Plain values + Value key table = Encrypted values
Plain values:
A    B    C
a1   b1   c1
a2   b2   c2
Value key table:
A     B     C
k11   k12   k13
k21   k22   k23
Encrypted values:
A     B     C
a’1   b’1   c’1
a’2   b’2   c’2
To share b1
• Give k12 to Bob
• Bob can only decrypt b’1; other values are safe since Bob does not have the other value keys
How to maintain the value key table?
• Assumption:
– We assume there is only one table in the database
• Extension to multiple tables is straightforward
– Model
• The table has n tuples and m columns
• Each tuple/column has a tuple/column ID
• T is the set of tuple IDs, C is the set of column IDs
• A tuple/column ID is not the name of the tuple/column
– A tuple/column ID is not supposed to change once generated
– Updating a tuple/column ID leads to re-encryption of the entire tuple/column
How to maintain the value key table?
• Individual key for each data item
Tuple ID: t1, t2 (one per row)
Column ID: c1, c2, c3 (for columns A, B, C)
k11 = ValueKeyGen(t1, c1, K)
• k11: value key
• K: master key held by Alice, so only Alice can do it
• ValueKeyGen: one-way function, i.e., one cannot recover the input from the output. Example: encryption / one-way hash
Solution framework
(Figure: Alice and SP; a table with tuple IDs t1, t2 and column IDs c1, c2, c3 for columns A, B, C; value keys k11, k12, k21, k22; plain values v11, v12, v21, v22; encrypted values v'11, v'12, v'21, v'22)
1. K = KeyGen()
– Randomly generate IDs; store at SP
2. kij = ValueKeyGen(ti, cj, K)
3. v’ij = Enc(vij, kij)
4. vij = Dec(v’ij, kij)
5. DS = ShareProtocol()
– A protocol between Alice, Bob and SP to let Bob observe DS
Hash-based construction
• Assume the data DS to be shared with Bob is of relational form, i.e., DS is also a table with a set of tuples TS and a set of columns CS
– TS is a subset of T
– CS is a subset of C
• We will show how to remove this assumption later, in the last scheme we propose
Hash-based construction
• 1. K = KeyGen()
– Generate a random bitstring KB
• 2. kij = ValueKeyGen(ti, cj, K)
– kij = h(h(ti xor KB) xor h(cj xor KB))
• 3. v’ij = Enc(vij, kij)
– v’ij = vij xor kij
• 4. vij = Dec(v’ij, kij)
– vij = v’ij xor kij
Note that kij is used to encrypt the value of tuple ti at column cj only; this is known as a one-time pad
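The four steps above can be sketched in a few lines of Python. This is only an illustration under assumptions not stated on the slides: SHA-256 plays the role of h, IDs and KB are 32-byte strings, and values are zero-padded to 32 bytes (the function names mirror the slide, the padding is hypothetical).

```python
import hashlib
import secrets

BLOCK = 32  # bytes; SHA-256 output size, also the maximum value length here


def xor(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length byte strings."""
    assert len(a) == len(b)
    return bytes(x ^ y for x, y in zip(a, b))


def h(data: bytes) -> bytes:
    """Stand-in for the hash h (SHA-256 is an assumption)."""
    return hashlib.sha256(data).digest()


def key_gen() -> bytes:
    """Step 1: generate the random bitstring KB."""
    return secrets.token_bytes(BLOCK)


def value_key_gen(t_i: bytes, c_j: bytes, kb: bytes) -> bytes:
    """Step 2: kij = h(h(ti xor KB) xor h(cj xor KB))."""
    return h(xor(h(xor(t_i, kb)), h(xor(c_j, kb))))


def enc(v: bytes, k: bytes) -> bytes:
    """Step 3: v'ij = vij xor kij (toy zero-padding to BLOCK bytes)."""
    return xor(v.ljust(BLOCK, b"\0"), k)


def dec(v_enc: bytes, k: bytes) -> bytes:
    """Step 4: vij = v'ij xor kij, then strip the toy padding."""
    return xor(v_enc, k).rstrip(b"\0")


kb = key_gen()
t1, c2 = secrets.token_bytes(BLOCK), secrets.token_bytes(BLOCK)  # random IDs
k12 = value_key_gen(t1, c2, kb)
cipher = enc(b"20", k12)
plain = dec(cipher, k12)
```

The zero-padding is a simplification: a value that itself ends in zero bytes would not round-trip, so a real encoding would need a length prefix or similar.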
Sharing protocol
• Alice:
– For every ti in TS, compute t’i = h(ti xor KB), send t’i to Bob
– For every cj in CS, compute c’j = h(cj xor KB), send c’j to Bob
• Bob:
– Compute value key kij = h(t’i xor c’j)
– Use kij to decrypt data
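A minimal sketch of this protocol, again assuming SHA-256 for h and 32-byte IDs; the table contents, the dictionary layout and the variable names are hypothetical.

```python
import hashlib
import secrets


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed stand-in for h


kb = secrets.token_bytes(32)  # Alice's secret KB
tuple_ids = {name: secrets.token_bytes(32) for name in ("t1", "t2", "t3")}
column_ids = {name: secrets.token_bytes(32) for name in ("c1", "c2")}

# Plain table (values padded to 32 bytes) and its encrypted copy stored at SP.
plain = {(t, c): f"{t}-{c}".encode().ljust(32, b"\0")
         for t in tuple_ids for c in column_ids}


def value_key(t_id: bytes, c_id: bytes) -> bytes:
    return h(xor(h(xor(t_id, kb)), h(xor(c_id, kb))))


encrypted = {(t, c): xor(v, value_key(tuple_ids[t], column_ids[c]))
             for (t, c), v in plain.items()}

# Alice: share TS = {t1, t2} on CS = {c2} by sending only hashed IDs.
ts, cs = ["t1", "t2"], ["c2"]
t_hints = {t: h(xor(tuple_ids[t], kb)) for t in ts}   # t'i
c_hints = {c: h(xor(column_ids[c], kb)) for c in cs}  # c'j

# Bob: rebuild kij = h(t'i xor c'j) and decrypt the shared cells.
bob_view = {(t, c): xor(encrypted[(t, c)], h(xor(t_hints[t], c_hints[c])))
            for t in ts for c in cs}
```

Alice sends |TS| + |CS| hashes rather than |TS| x |CS| decrypted values, which is where the m + n sharing cost comes from.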
Background on construction
• Assumption: Random Oracle Model
• h is a secure hash function
– One-way: cannot derive input from output
– Random: the output is “random” (an adversary cannot distinguish it from a real random number)
– From h(a xor b), an attacker cannot know what a or b or a xor b is
• The Random Oracle Model is used to prove many other schemes used in practice, e.g., RSA encryption (with OAEP), RSA signature
Background on one-time pad
• Each key is used only once
– Note: the length of the key should not be less than the message length
• Perfect secrecy can be achieved
– Secure even against adversaries with infinite computational power
• Simple function for encryption/decryption
– v’ = v xor k
Security
• Adversary (with SP’s and Bob’s views together)
– Encrypted data
– t’i = h(ti xor KB)
– c’j = h(cj xor KB)
– Tuple ID: ti
– Column ID: cj
– Shared data
(Figure: the adversary’s view of the plain data, the value keys and the encrypted data v'11, v'12, v'21, v'22; only the shared entries are known, all other plain values and value keys are shown as "?")
• Can’t derive information about plain data or value key from only the encrypted value (one-time pad)
• Knows ti and t’i but cannot derive KB (random oracle model)
Optimization
• kij = h(h(ti xor KB) xor h(cj xor KB))
• To encrypt/decrypt the entire table
– Compute t’i = h(ti xor KB) for all tuples: same for all values of the same tuple (n hashes)
– Compute c’j = h(cj xor KB) for all columns: same for all values of the same column (m hashes)
– kij = h(t’i xor c’j) (mn hashes)
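The hash count of this optimization can be checked with a small sketch that counts calls to h. SHA-256 as h, the table size and all names are assumptions for illustration.

```python
import hashlib
import secrets

hash_calls = 0  # counts how many times h is evaluated


def h(data: bytes) -> bytes:
    global hash_calls
    hash_calls += 1
    return hashlib.sha256(data).digest()  # assumed stand-in for h


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


n, m = 5, 3  # toy table: n tuples, m columns
kb = secrets.token_bytes(32)
tuple_ids = [secrets.token_bytes(32) for _ in range(n)]
column_ids = [secrets.token_bytes(32) for _ in range(m)]

# Hash each tuple ID and each column ID once, then reuse them.
t_hashed = [h(xor(t, kb)) for t in tuple_ids]    # n hashes
c_hashed = [h(xor(c, kb)) for c in column_ids]   # m hashes
keys = [[h(xor(tp, cp)) for cp in c_hashed]      # mn hashes
        for tp in t_hashed]
```

Evaluating the nested formula separately for every cell would cost 3mn hashes; caching the inner hashes brings this down to mn + m + n.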
Cost analysis
Operation | RBE-HB | AES
KeyGen | O(1) | O(1)
Encryption/decryption | mn + m + n hashes* | mn encryptions/decryptions*
Sharing: Alice’s computation | m + n | mn decryptions
Sharing: Alice’s communication | 2m + 2n (one between SP and Alice; one between Alice and Bob) | 2mn values (one between SP and Alice; one between Alice and Bob)
Bob’s decryption cost | mn | Nil
* SHA-256 vs AES performance: similar, about 100k operations per second
* Any encryption function can be our hash function
Need for indexing support
• Problem of the basic construction
– Number of tuples, n, is usually a big number
– Still a high cost to Alice during data sharing
• The problem cannot be resolved without an index
– Alice needs a way to define the sharing space DS
– Number of possible combinations of different tuples: 2^n
– Minimum average size to denote one combination: n = lg(2^n)
Limiting sharing options by a hierarchy
• Assume there is a known hierarchy such that the data to be shared can mostly be described by the hierarchy, e.g., share all chocolate product sales orders (tuples)
– Otherwise, just stick to the basic scheme
• A B+-tree or any other structure can also be used, e.g., if sharing is mostly related to time, we can use a B+-tree ordered by time
(Figure: example hierarchy with root "All", the years 2013, 2014, 2015 below it, and product categories Biscuits, Chocolate, Candy, … below the years)
Limiting sharing options by a hierarchy
• Alice can choose several nodes in the hierarchy (tree)
– All tuples under the chosen nodes are shared with Bob
– The leaf node of the hierarchy is a tuple
• Assume the number of nodes |N| selected by Alice is small
• Our idea:
– Alice computes an index Δ and sends Δ to SP
– In sharing, Alice “shares” the selected nodes with Bob, and Bob is able to communicate with SP and observe all descendant nodes but not any other nodes
– |N| << |TS|
– Alice’s cost in sharing can be significantly reduced
Index structure
Leaf level: t1 t2 t3 t4 t5 t6 t7 t8
Node over t1, t2: E(t’1, kn12) E(t’2, kn12) E(kn12, K)
Node over t1 to t4: E(kn12, kn14) E(kn34, kn14) E(kn14, K)
Root node: E(kn14, kn18) E(kn58, kn18) E(kn18, K)
• K: master key owned by Alice; KB is part of K
• E: encryption, e.g., AES
• knij: node key
• t’i = h(ti xor KB)
Index maintenance
• Too cumbersome to discuss everything here
• Key issues
– Inserting an entry: re-encrypt the entry using the new parent node key
– Deleting an entry: trivial
– Bob can still see the originally shared data after any update: version control by SP
– Preventing Bob from seeing more data after an unshared entry is added to a shared node
• The parent node needs to re-generate the node key and re-encrypt all entries
Sharing using index
Alice, SP, Bob
1. Alice maintains an index at SP
2. Alice issues a query to SP, e.g., Alice wishes to share the sales records of Jan 2015
3. SP finds the shared nodes in the index. Shared nodes: all tuples covered by these nodes are to be shared with Bob
4. Alice retrieves the shared node information
5. Alice processes the shared node information and sends it to Bob
6. Bob communicates with SP and decrypts all shared data
Sharing
(Figure: the index structure, with t1 to t4 marked as the shared tuples)
Just need to share the node holding E(kn12, kn14) E(kn34, kn14) E(kn14, K)
Sharing
(Figure: the same index structure, with t1 to t4 marked as the shared tuples)
• Alice retrieves E(kn14, K)
• Alice sends kn14 to Bob
• With kn14, Bob can decrypt the nodes
Sharing
• Bob obtains
– t’1 = h(t1 xor KB)
– t’2 = h(t2 xor KB)
– t’3 = h(t3 xor KB)
– t’4 = h(t4 xor KB)
• At the same time, Alice sends the same column information as in the hash-based construction
• Data decryption is the same as in the hash-based construction
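The index-based sharing of t1 to t4 can be sketched as follows. The slides name AES as an example for E; to stay within the Python standard library, this sketch swaps in a toy XOR-with-hash-pad cipher, which is an assumption and not a secure substitute. The `slot` argument, the node names and the dictionary layout are likewise hypothetical.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed stand-in for h


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def E(x: bytes, key: bytes, slot: bytes) -> bytes:
    """Toy cipher standing in for AES (NOT secure): XOR with a pad derived
    from the key and the entry's slot name. The same call also decrypts."""
    return xor(x, h(key + slot))


kb = secrets.token_bytes(32)
master = secrets.token_bytes(32)  # K, never leaves Alice
tuple_ids = {f"t{i}": secrets.token_bytes(32) for i in range(1, 5)}
t_hashed = {name: h(xor(tid, kb)) for name, tid in tuple_ids.items()}  # t'i
kn12, kn34, kn14 = (secrets.token_bytes(32) for _ in range(3))  # node keys

# Index stored at SP: children are encrypted under the node key, and the
# node key itself is encrypted under K.
index = {
    "n12": {"children": {t: E(t_hashed[t], kn12, t.encode()) for t in ("t1", "t2")},
            "own_key": E(kn12, master, b"n12")},
    "n34": {"children": {t: E(t_hashed[t], kn34, t.encode()) for t in ("t3", "t4")},
            "own_key": E(kn34, master, b"n34")},
    "n14": {"children": {"n12": E(kn12, kn14, b"n12"), "n34": E(kn34, kn14, b"n34")},
            "own_key": E(kn14, master, b"n14")},
}

# Alice retrieves E(kn14, K) from SP, decrypts it, and sends kn14 to Bob.
shared_key = E(index["n14"]["own_key"], master, b"n14")


def collect(node: str, key: bytes) -> dict:
    """Bob walks down from a shared node, decrypting child entries."""
    found = {}
    for child, blob in index[node]["children"].items():
        value = E(blob, key, child.encode())
        if child in index:          # inner node: value is the child's node key
            found.update(collect(child, value))
        else:                       # leaf: value is t'i
            found[child] = value
    return found


bob_hints = collect("n14", shared_key)
```

Alice handles one node key instead of four hashed tuple IDs; Bob does the traversal work with SP.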
Security
• K is never given out, so E(x, K) is not useful to the attacker
• knij is known to Bob (i.e., the attacker) iff it is shared; otherwise, knij is not derivable by Bob/the attacker
(Figure: the index structure with entries E(t’1, kn12) E(t’2, kn12) E(kn12, K); E(kn12, kn14) E(kn34, kn14) E(kn14, K); E(kn14, kn18) E(kn58, kn18) E(kn18, K))
Cost
Operation | RBE-Index | RBE-HB | AES
KeyGen | O(1) | O(1) | O(1)
Encryption/decryption | O(mn) | O(mn) | O(mn)
Index construction (all tuples at once) | O(n) | - | -
Sharing: Alice’s computation | O(x+m) | O(m+n) | O(mn)
Sharing: Alice’s communication | O(x+m) | O(m+n) | O(mn)
Bob’s decryption cost | O(mn) | O(mn) | -
x: number of shared nodes
Sharing of data in arbitrary form
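• Can be described as multiple sub-tables
(Figure: from the table with columns A, B, C and rows (a1, b1, c1), (a2, b2, c2), the data to share are a1, a2 and b1, i.e., the sub-table {a1, a2} of column A plus the sub-table {b1} of column B)
– Sub-table {a1, a2}: Alice sends h(t1 xor KB), h(t2 xor KB) and h(ca xor KB)
– Sub-table {b1}: Alice sends h(t1 xor KB) and h(cb xor KB)
• The basic schemes reveal more information: with h(t2 xor KB) and h(cb xor KB) both in hand, Bob can also derive the value key of b2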
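The extra leakage in the basic scheme can be demonstrated with a short sketch: Alice intends to share only a1, a2 and b1, yet the hints she sends are enough to decrypt b2. SHA-256 as h and all variable names are assumptions.

```python
import hashlib
import secrets


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()  # assumed stand-in for h


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


kb = secrets.token_bytes(32)
t1, t2, ca, cb = (secrets.token_bytes(32) for _ in range(4))

# Alice encrypts b2 (tuple t2, column B); it is NOT meant to be shared.
b2_plain = b"b2".ljust(32, b"\0")
b2_enc = xor(b2_plain, h(xor(h(xor(t2, kb)), h(xor(cb, kb)))))

# Hints for sub-table {a1, a2}: t'1, t'2, c'a. Hints for sub-table {b1}: t'1, c'b.
bob_hints = {
    "t1": h(xor(t1, kb)),
    "t2": h(xor(t2, kb)),
    "ca": h(xor(ca, kb)),
    "cb": h(xor(cb, kb)),
}

# Bob combines t'2 (from the first sub-table) with c'b (from the second).
leaked = xor(b2_enc, h(xor(bob_hints["t2"], bob_hints["cb"])))
```

Because hints are per tuple and per column, any shared row hint combines with any shared column hint, so the shareable region is always a full rectangle.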
Naïve idea
• Use multiple indexes, one on each column
(Figure: index with leaf entries E(k11, kn12) E(k21, kn12) E(kn12, K), then E(kn12, kn14) E(kn34, kn14) E(kn14, K), then E(kn14, kn18) E(kn58, kn18) E(kn18, K))
• Leaf entries hold the value key instead of the row ID
• Significant space overhead and index maintenance overhead
Trapdoor-based index
• One single index for all columns
• The index is like a function/trapdoor
• RSA-based
To get the value keys of column B
(Figure: the plain data table with columns A, B, C, the value key table k11 to k23, and the tuple IDs t1, t2; given some hint hx, Bob obtains k12 and k22)
• Not other value keys of other columns / tuples
RBE-Trapdoor-based index
• 1. K = KeyGen()
– Generate two big primes p, q
– Set n = pq (overloading the symbol n a bit)
– Remember Φ(n) = (p-1)(q-1)
• SP and Bob do not know Φ(n)
– RSA:
• Generate e relatively prime to Φ(n)
• d = e^-1 mod Φ(n)
• x^(ed) mod n = x for any x
• The attacker does not know Φ(n) and so cannot compute the inverse
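A toy numeric check of the RSA relations above. The primes are far too small for real use and are chosen only so the numbers are easy to follow; the values of p, q, e and x are illustrative assumptions. The three-argument `pow` with exponent -1 needs Python 3.8 or later.

```python
from math import gcd

p, q = 61, 53              # toy primes (real keys use very large primes)
n = p * q                  # 3233
phi = (p - 1) * (q - 1)    # Φ(n) = 3120, known to Alice only

e = 17
assert gcd(e, phi) == 1    # e must be relatively prime to Φ(n)
d = pow(e, -1, phi)        # d = e^-1 mod Φ(n)

x = 65
round_trip = pow(pow(x, e, n), d, n)   # x^(ed) mod n
```

Without Φ(n), computing d from e is believed to be as hard as factoring n, which is what the index relies on.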
RBE-Trapdoor-based index
• 2. kij = ValueKeyGen(ti, cj, K)
– kij = cj^ti mod n
• 3. v’ij = Enc(vij, kij)
– v’ij = vij xor kij (same as basic scheme)
• 4. vij = Dec(v’ij, kij)
– vij = v’ij xor kij (same as basic scheme)
• Note: Tuple ID ti and column ID cj are stored at SP in encrypted form
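A minimal sketch of steps 2 to 4 with integers, using toy parameters (the modulus, the IDs and the value are illustrative assumptions, far smaller than anything usable in practice).

```python
n = 61 * 53  # toy modulus n = pq; real moduli are thousands of bits long


def value_key_gen(t_i: int, c_j: int, n: int) -> int:
    """Step 2: kij = cj^ti mod n."""
    return pow(c_j, t_i, n)


def enc(v: int, k: int) -> int:
    """Step 3: v'ij = vij xor kij."""
    return v ^ k


def dec(v_enc: int, k: int) -> int:
    """Step 4: vij = v'ij xor kij."""
    return v_enc ^ k


t1, c2 = 7, 1234           # hypothetical tuple ID and column ID
k12 = value_key_gen(t1, c2, n)
cipher = enc(20, k12)
plain = dec(cipher, k12)
```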
Index structure
• Alice generates a random rx for each node x
– rx is not known by SP or Bob
– rx is co-prime to Φ(n)
(Figure: root node with r1; child nodes with r2 and r3)
Leaf level entries: r2 t1 mod Φ(n) | E(r2)     r3 t2 mod Φ(n) | E(r3)
Root entries: r1 r2^-1 mod Φ(n) | r1 r3^-1 mod Φ(n) | E(r1)
• (r1 r2^-1)(r2 t1) = r1 t1 mod Φ(n)
• Need r1^-1 r3 (decryption key of r1 r3^-1)
Index maintenance
• A bit more complex than the basic scheme
– Inserting/moving an entry: simple
– Deleting an entry: trivial
– Bob can still see the originally shared data after any update: version control by SP
– Preventing Bob from seeing more data after an unshared entry is added to a shared node
• Need to go to the root of the shared sub-tree. That node needs to re-generate the node key and re-generate all entries; descendant nodes are not affected.
Sharing
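• To share t1, t2 on column c1
(Figure: root node with r1; child nodes with r2 and r3)
Leaf level entries: r2 t1 mod Φ(n) | E(r2)     r3 t2 mod Φ(n) | E(r3)
Root entries: r1 r2^-1 mod Φ(n) | r1 r3^-1 mod Φ(n) | E(r1)
• Hint given by Alice: c1^(r1^-1)
– Attacker does not know c1 or r1
• From the index, Bob obtains r1 t1 mod Φ(n)
• Value key: kij = cj^ti mod n, here (c1^(r1^-1))^(r1 t1) mod n = c1^t1 mod n
• Summary: with this hint, the attacker cannot derive value keys of other tuples/columns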
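The trapdoor-based sharing of t1 and t2 on column c1 can be traced numerically. This is a sketch under assumptions: the primes, IDs and node randoms are toy values, the index is reduced to one root with two leaves, and the reading of the index entries (root entry times leaf entry gives r1 ti mod Φ(n)) is reconstructed from the slide figure.

```python
from math import gcd

p, q = 61, 53
n = p * q
phi = (p - 1) * (q - 1)    # Φ(n), known to Alice only

t1, t2 = 101, 202          # hypothetical tuple IDs
c1 = 1234                  # hypothetical column ID, co-prime to n

r1, r2, r3 = 7, 11, 17     # node randoms, each co-prime to Φ(n)
assert all(gcd(r, phi) == 1 for r in (r1, r2, r3))


def inv(x: int) -> int:
    """Inverse modulo Φ(n); only Alice can compute this."""
    return pow(x, -1, phi)


# Index stored at SP (the E(rx) parts are omitted in this sketch).
leaf = {"t1": (r2 * t1) % phi, "t2": (r3 * t2) % phi}
root = {"t1": (r1 * inv(r2)) % phi, "t2": (r1 * inv(r3)) % phi}

# Alice's hint for column c1 at the root: c1^(r1^-1) mod n.
hint = pow(c1, inv(r1), n)

# Bob multiplies the entries along each path (congruent to r1*ti mod Φ(n))
# and raises the hint to that power. He never needs Φ(n) itself.
bob_keys = {t: pow(hint, root[t] * leaf[t], n) for t in ("t1", "t2")}

# Value keys as Alice computes them directly: kij = cj^ti mod n.
alice_keys = {"t1": pow(c1, t1, n), "t2": pow(c1, t2, n)}
```

The hint is tied to one column and one sub-tree, so sharing several columns under the same node costs one hint per column, matching the O(xm) sharing cost.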
Sharing summary
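• Alice’s work:
– Generate ca^(r1^-1)
– Generate cb^(r2^-1)
(Figure: the table with columns A, B, C and rows (a1, b1, c1), (a2, b2, c2), next to the index with r1 at the root and r2, r3 below)
Leaf level entries: r2 t1 mod Φ(n) | E(r2)     r3 t2 mod Φ(n) | E(r3)
Root entries: r1 r2^-1 mod Φ(n) | r1 r3^-1 mod Φ(n) | E(r1)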
Cost
Operation | RBE-TBI | RBE-Index | RBE-HB | AES
KeyGen | O(1) | O(1) | O(1) | O(1)
Encryption/decryption | O(mn) | O(mn) | O(mn) | O(mn)
Index construction (all tuples at once) | O(n) | O(n) | - | -
Sharing: Alice’s computation | O(xm) | O(x+m) | O(m+n) | O(mn)
Sharing: Alice’s communication | O(xm) | O(x+m) | O(m+n) | O(mn)
Bob’s decryption cost | O(mn) | O(mn) | O(mn) | -
x: number of shared nodes
Integration with existing encrypted cloud relational database
• CryptDB
– Already a family of encryption schemes, so there could be multiple copies of the same data, each encrypted by a different encryption scheme
– Just use our method as another encryption scheme to provide a data sharing service to users
• TrustedDB, Cipherbase
– Use trusted hardware
– Trusted hardware can take the role of Alice
– Query computation is independent of the underlying encryption method
– Replace AES by our scheme to reduce the load of the trusted hardware
• Trusted hardware has much less power than a usual computer
Experiment plan
Test | RBE-TBI | RBE-Index | RBE-HB | AES
Encryption/decryption speed test | High (RSA) | Low | Low | Low
Sharing test, following index | Varying sharing size; measure Alice’s computation cost, communication cost and SP/Bob’s cost; index on X, query a<X<b
Sharing test, not following index | Varying sharing size; measure Alice’s computation cost, communication cost and SP/Bob’s cost; query a<X<b and c<Y<d (may be dropped if performance is too bad)
Index maintenance (B+-tree for the test) | Should be efficient enough, moderate overhead | Should be efficient enough, low overhead | - | -
• Dataset: TPC-? data generator
• Probably 10m tuples
• Share about 1% to 50%
• May pick X as some meaningful column, e.g., SalesDate
Expected experiment results
• Encryption/decryption is efficient enough, comparable to AES for basic cases
• Sharing cost to Alice is significantly reduced compared to AES
– Even if the shared data does not follow the index structure well, hopefully not worse than AES
• Support efficient updates to the index
Implementation progress
Test | RBE-TBI | RBE-Index | RBE-HB | AES
Encryption/decryption speed test | Not started | Started | Optimizing | Optimizing
Backup
Time taken:
• Key generation: 466.62 ms
• Data generation: 691.82 ms
• Data encryption: 1793.86 ms
• Trial data decryption: 1543.86 ms
• Hint generation: 39.59 ms
• Peer viewer decryption: 1495.78 ms

Time taken:
• Key generation: 489.27 ms
• Data generation: 664.14 ms
• Data encryption: 823.85 ms
• Trial data decryption: 786.83 ms
• Hint generation: 799.33 ms
• Peer viewer decryption: 29.67 ms

Setup: 100k rows, 20 columns, random data, all data shared