Private Information Retrieval
description
Transcript of Private Information Retrieval
Private Information Retrieval
Based on the talk by
Yuval Ishai, Eyal Kushilevitz, Tal Malkin
Outline Introduction Common approaches
Information-theoretical Computational
Summary
Private Information Retrieval (PIR):assumptions
Semi-honest assumption on servers Server is trustable in terms of honestly
following the protocol Server knows every bit of the data Server may record client’s
requests/queries
Malicious servers Drop messages Change messages Collude with other parties
Private Information Retrieval (PIR): intro
Goal: allow user to query database while hiding the identity of the data-items she is after.
Note: hides identity of data-items; not the existence of interaction with the user.
Motivation: patent databases; stock quotes; web access; many more....
Paradox(?): imagine buying in a store without the seller knowing what you buy.
(Encrypting requests is useful against third parties; not against the owner of data.)
Modeling Server: holds n-bit string x n should be thought of as very large
User: wishes to retrieve xi
and to keep i private
Remark: it is the most basic version; the building block for involved retrieval.
Server sends entire database x to User.
Information theoretic privacy.
Communication: n
SERVER
xi
USER
x =x1,x2 , . . ., xn
x1,x2 , . . ., xn
Trivial Private Protocol
Is this optimal?
Obstacle
Theorem [CGKS]: In any 1-server PIR with informationtheoretic privacy the communication is
at least n.
Information theoretic privacy/security:The ciphertext gives no information about the
plaintext
More “solutions” User asks for additional random indices.
Pick a few random indices to hide the real one Drawback: can be estimated
Employ general crypto protocols to compute xi privately. 1-out-N Oblivious Transfer Drawback: highly inefficient (polynomial in n).
Anonymity (e.g., via Anonymizers). Note: address different problems: hides
identity of user; not the fact that xi is retrieved.
Two Approaches
Information-Theoretic PIR [CGKS95,Amb97,...] Replicate database among k servers.
servers do collude.
Computational PIR [CG97,KO97,CMS99,...] Computational privacy, based on cryptographic assumptions – NP hard to break the approach
Known Communication Upper Bounds
Multiple servers, information-theoretic PIR: 2 servers, comm. n1/3 [CGKS95]
k servers, comm. n1/(k) [CGKS95, Amb96,…,BIKR02]
log n servers, comm. Poly( log(n) ) [BF90, CGKS95]
Single server, computational PIR: Comm. Poly( log(n) ), n is the # of items Under appropriate computational assumptions
[KO97,CMS99]
Approach I: k-Server PIR
Correctness: User obtains xi
Privacy: No single server gets information about i
U
S1
x {0,1}n
S2x {0,1}n
i
x {0,1}n Sk
Information-Theoretic 2-Server PIR
Best Known Protocol: comm. n1/3 Let’s look at an example with comm.
cost n1/2
Two protocols:Protocol I: n bit queries, 1 bit answersProtocol II: n1/2 bit queries, n1/2 bit
answers
Protocol I: 2-server O(n) PIR
S2
i
U
i
n
Q1{1,…,n}
S1
Q2=Q1 i
0 0 1 1 0 011 10 00
*User sent O(n) bits
1 + 1 = 0
0 + 0 = 0
1 + 0 = 1
Meaning of Q2=Q1 i
Q1 is a random subset
Protocol I: 2-server PIR
S2
i
U
i
n
Q1 {1,…,n}
S1
11
Qa x
Q2=Q1 i
0 1 0 0 1 1 0 1 0 0 010
*Server replies 1 bit
Protocol I: 2-server PIR
S2
i
U
i
n
Q1 {1,…,n}
S1
a1a2=xi
11
Qa x
2
2Q
a x
Q2=Q1 i
0 1 0 0 1 0 1 0 0 01 110
Protocol II: PIR with O(n1/2) Communication
S2
j,i
U
i
X
m=n1/2
m=n1/2
Q1 {1,…,m}
S1
1,1 1,2 1,, ,..., ma a a
Q2=Q1 i
j
2,1 2,2 2,, ,..., ma a a
1, ja
1,2a1,1a
a1,ja2,j=xj,i
Make the n-bit vector as a n1/2 * n1/2 matrix
Apply ex-or sum to each row
Computational PIR with O(n1/2) Comm. Based on encryption
Quadratic Residue (QR)
N = p*q , p,q are primes.Q(y, N) = 0 iff exists w such that w^2 = y (mod N) 1 otherwise
Security:
if p, q is unknown, it is computationally impossible
to determine Q(y,N).
If p,q is known, Q(y,N) can be determined in O(|N|^3)
Understanding modulo: w^2 = y+k*N, k can be any integer
Example: 2^2 = 4 (mod 10), 4^2 = 6 (mod 10)
Example Quadratic Residue:
E(0) QRN
E(1) NQRN
Properties:
QR QR = QR
NQR QR = NQR
For any y, y^2 is QR.
Computational PIR with O(n1/2) Comm.
0 1 1 0 1 1 1 0 1 1 0 0 0 0 0 1
n1/2
n1/2
b
Goal: user wants to know entry M(a,b)
1.User picks N=pq and sends N to server
2.User picks uniformly at random s=n1/2 numbers, from the set Z={x|1<=x<=N, gcd(N, x)=1}, such that yb is NQR and yj (j!=b) is QR, and sends them to server
3. For each row r, server calculateW(r,j) = yj
2 if M(r,j) =0, yj if M(r,j)=1
a
),(1
jrwZt
jr
5. Zr is a QR iff M(r,b) =0
4. Server sends Z1,…,ZsTo User, and User checks with Za is QR
Related work Can be a building block for high-level
security protocols Has connection with “Locally
Decodable Codes” (LDC) It was proved that the three
problems: PIR, LDC, and Circuit-based SMC are equivalent.
Summary
Focus so far: communication complexityObstacle: time complexity All existing protocols require high
computation by the servers (linear computation per query).
Are there methods to reduce server cost?