Private Information Retrieval

Private Information Retrieval

Based on the talk by

Yuval Ishai, Eyal Kushilevitz, Tal Malkin

Outline Introduction Common approaches

Information-theoretical Computational

Summary

Private Information Retrieval (PIR):assumptions

Semi-honest assumption on servers Server is trustable in terms of honestly

following the protocol Server knows every bit of the data Server may record client’s

requests/queries

Malicious servers Drop messages Change messages Collude with other parties

Private Information Retrieval (PIR): intro

Goal: allow user to query database while hiding the identity of the data-items she is after.

Note: hides identity of data-items; not the existence of interaction with the user.

Motivation: patent databases; stock quotes; web access; many more....

Paradox(?): imagine buying in a store without the seller knowing what you buy.

(Encrypting requests is useful against third parties; not against the owner of data.)

Modeling Server: holds n-bit string x n should be thought of as very large

User: wishes to retrieve xi

and to keep i private

Remark: it is the most basic version; the building block for involved retrieval.

Server sends entire database x to User.

Information theoretic privacy.

Communication: n

SERVER

xi

USER

x =x1,x2 , . . ., xn

x1,x2 , . . ., xn

Trivial Private Protocol

Is this optimal?

Obstacle

Theorem [CGKS]: In any 1-server PIR with informationtheoretic privacy the communication is

at least n.

Information theoretic privacy/security:The ciphertext gives no information about the

plaintext

More “solutions” User asks for additional random indices.

Pick a few random indices to hide the real one Drawback: can be estimated

Employ general crypto protocols to compute xi privately. 1-out-N Oblivious Transfer Drawback: highly inefficient (polynomial in n).

Anonymity (e.g., via Anonymizers). Note: address different problems: hides

identity of user; not the fact that xi is retrieved.

Two Approaches

Information-Theoretic PIR [CGKS95,Amb97,...] Replicate database among k servers.

servers do collude.

Computational PIR [CG97,KO97,CMS99,...] Computational privacy, based on cryptographic assumptions – NP hard to break the approach

Known Communication Upper Bounds

Multiple servers, information-theoretic PIR: 2 servers, comm. n1/3 [CGKS95]

k servers, comm. n1/(k) [CGKS95, Amb96,…,BIKR02]

log n servers, comm. Poly( log(n) ) [BF90, CGKS95]

Single server, computational PIR: Comm. Poly( log(n) ), n is the # of items Under appropriate computational assumptions

[KO97,CMS99]

Approach I: k-Server PIR

Correctness: User obtains xi

Privacy: No single server gets information about i

U

S1

x {0,1}n

S2x {0,1}n

i

x {0,1}n Sk

Information-Theoretic 2-Server PIR

Best Known Protocol: comm. n1/3 Let’s look at an example with comm.

cost n1/2

Two protocols:Protocol I: n bit queries, 1 bit answersProtocol II: n1/2 bit queries, n1/2 bit

answers

Protocol I: 2-server O(n) PIR

S2

i

U

i

n

Q1{1,…,n}

S1

Q2=Q1 i

0 0 1 1 0 011 10 00

*User sent O(n) bits

1 + 1 = 0

0 + 0 = 0

1 + 0 = 1

Meaning of Q2=Q1 i

Q1 is a random subset

Protocol I: 2-server PIR

S2

i

U

i

n

Q1 {1,…,n}

S1

11

Qa x

Q2=Q1 i

0 1 0 0 1 1 0 1 0 0 010

*Server replies 1 bit

Protocol I: 2-server PIR

S2

i

U

i

n

Q1 {1,…,n}

S1

a1a2=xi

11

Qa x

2

2Q

a x

Q2=Q1 i

0 1 0 0 1 0 1 0 0 01 110

Protocol II: PIR with O(n1/2) Communication

S2

j,i

U

i

X

m=n1/2

m=n1/2

Q1 {1,…,m}

S1

1,1 1,2 1,, ,..., ma a a

Q2=Q1 i

j

2,1 2,2 2,, ,..., ma a a

1, ja

1,2a1,1a

a1,ja2,j=xj,i

Make the n-bit vector as a n1/2 * n1/2 matrix

Apply ex-or sum to each row

Computational PIR with O(n1/2) Comm. Based on encryption

Quadratic Residue (QR)

N = p*q , p,q are primes.Q(y, N) = 0 iff exists w such that w^2 = y (mod N) 1 otherwise

Security:

if p, q is unknown, it is computationally impossible

to determine Q(y,N).

If p,q is known, Q(y,N) can be determined in O(|N|^3)

Understanding modulo: w^2 = y+k*N, k can be any integer

Example: 2^2 = 4 (mod 10), 4^2 = 6 (mod 10)

Example Quadratic Residue:

E(0) QRN

E(1) NQRN

Properties:

QR QR = QR

NQR QR = NQR

For any y, y^2 is QR.

Computational PIR with O(n1/2) Comm.

0 1 1 0 1 1 1 0 1 1 0 0 0 0 0 1

n1/2

n1/2

b

Goal: user wants to know entry M(a,b)

1.User picks N=pq and sends N to server

2.User picks uniformly at random s=n1/2 numbers, from the set Z={x|1<=x<=N, gcd(N, x)=1}, such that yb is NQR and yj (j!=b) is QR, and sends them to server

3. For each row r, server calculateW(r,j) = yj

2 if M(r,j) =0, yj if M(r,j)=1

a

),(1

jrwZt

jr

5. Zr is a QR iff M(r,b) =0

4. Server sends Z1,…,ZsTo User, and User checks with Za is QR

Related work Can be a building block for high-level

security protocols Has connection with “Locally

Decodable Codes” (LDC) It was proved that the three

problems: PIR, LDC, and Circuit-based SMC are equivalent.

Summary

Focus so far: communication complexityObstacle: time complexity All existing protocols require high

computation by the servers (linear computation per query).

Are there methods to reduce server cost?

Private Information Retrieval

Documents

Transcript of Private Information Retrieval