
Private Information Retrieval

Yuval Ishai, Computer Science Department

Technion

Talk Overview

• Intro to PIR
  – Motivation and problem definition
  – Toy examples
  – State of the art

• Relation with other primitives
  – Locally Decodable Codes
  – (Oblivious Transfer, CRHF)

• Constructions

• Open problems

Private Information Retrieval (PIR) [CGKS95]

• Goal: allow a user to access a database while hiding what she is after.

• Motivation: patent databases, web searches, etc.
• Paradox(?): imagine buying in a store without the seller knowing what you buy.

Note: encrypting requests is useful against third parties, but not against the server holding the data.

Modeling

[Figure: the server holds a database x ∈ {0,1}^n; the user holds an index i ∈ {1,…,n} and learns x_i, while the server learns nothing ("???") about i.]

Some “solutions”

1. User downloads the entire database.
   Drawback: n communication bits (vs. log n + 1 without privacy).
   Main research goal: minimize communication complexity.

2. User masks i with additional random indices.
   Drawback: gives a lot of information about i.

3. Enable anonymous access to the database.
   Addresses a different concern: hides the identity of the user, not the fact that x_i is retrieved.

Fact: PIR as described so far requires Ω(n) communication bits.

Two Approaches

Information-Theoretic PIR [CGKS95, Amb97, …]
• Replicate the database among k servers.
• Unconditional privacy against any t colluding servers (default: t = 1).

Computational PIR [KO97, CMS99, …]
• Computational privacy, based on cryptographic assumptions.

Model for I.T. PIR

[Figure: each of k servers S1, …, Sk holds a copy of the database X; the user sends a query to each server, learns x_i, and each server individually learns nothing ("???") about i.]

Information-Theoretic PIR for Dummies

2-server PIR with O(n^{1/2}) communication:
• View X as an n^{1/2} × n^{1/2} bit matrix.
• User picks random queries q1, q2 ∈ {0,1}^{n^{1/2}} such that q1 + q2 = e_i (the unit vector selecting the column containing x_i) and sends q_j to server S_j.
• Server S_j replies with a_j = X·q_j.
• User computes a1 + a2 = X·e_i, i.e., the column of X containing x_i.

[Figure: X drawn as an n^{1/2} × n^{1/2} bit matrix, with the queries q1, q2 and answers a1 = X·q1, a2 = X·q2.]
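The following is a minimal Python sketch of this 2-server scheme (the database size, variable names, and square-matrix layout are illustrative choices, not part of the original protocol description):

```python
import random

# Toy 2-server information-theoretic PIR with O(sqrt(n)) communication.
# The n-bit database is viewed as an s x s matrix X over GF(2), s = sqrt(n).
s = 4
X = [[random.randint(0, 1) for _ in range(s)] for _ in range(s)]

def server(q):
    """A server's answer a_j = X . q_j (mod 2): XOR of the columns selected by q."""
    return [sum(X[r][c] & q[c] for c in range(s)) % 2 for r in range(s)]

def retrieve(i):
    """User learns x_i = X[row][col]; each single server sees only a uniform query."""
    row, col = divmod(i, s)
    q1 = [random.randint(0, 1) for _ in range(s)]             # uniformly random
    q2 = [q1[c] ^ (1 if c == col else 0) for c in range(s)]   # q1 + q2 = e_col
    a1, a2 = server(q1), server(q2)
    column = [a1[r] ^ a2[r] for r in range(s)]                # a1 + a2 = X . e_col
    return column[row]

assert all(retrieve(i) == X[i // s][i % s] for i in range(s * s))
```

Each query and each answer is s = n^{1/2} bits, and each query on its own is uniformly distributed, which is exactly the O(n^{1/2}) protocol sketched above.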

Computational PIR for Dummies
Tool: homomorphic encryption

Protocol (X viewed as an n^{1/2} × n^{1/2} matrix, with x_i in column i):
• Homomorphic property: combining E(a) and E(b) gives E(a+b).
• User sends E(e_i), one ciphertext per column, e.g. E(0), E(0), E(1), E(0) (= c1, c2, c3, c4).
• Server replies with E(X·e_i): for each row it combines the ciphertexts c_j selected by that row's bits (e.g. c2·c3, c1·c2·c3, c1·c2, c4).
• User decrypts and recovers the i-th column of X.

⇒ PIR with ~O(n^{1/2}) communication.
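A minimal sketch of this single-server protocol, instantiating the homomorphic encryption with a toy Goldwasser–Micali scheme (XOR-homomorphic over bits); the tiny hard-coded primes are for illustration only and give no security, and all names are illustrative:

```python
import math
import random

# --- Toy Goldwasser-Micali encryption: XOR-homomorphic over single bits. ---
p, q = 499, 547                      # toy primes (insecure, illustration only)
N = p * q

def find_pseudosquare():
    """Find y that is a quadratic non-residue mod p and mod q (Jacobi symbol +1 mod N)."""
    y = 2
    while not (pow(y, (p - 1) // 2, p) == p - 1 and pow(y, (q - 1) // 2, q) == q - 1):
        y += 1
    return y

Y = find_pseudosquare()

def enc(bit):
    r = random.randrange(1, N)
    while math.gcd(r, N) != 1:
        r = random.randrange(1, N)
    return (r * r * pow(Y, bit, N)) % N               # E(b) = r^2 * y^b mod N

def dec(c):
    return 0 if pow(c, (p - 1) // 2, p) == 1 else 1   # c is a square mod p  <=>  bit 0

# --- Single-server PIR for an s x s bit matrix X: O(sqrt(n)) ciphertexts each way. ---
s = 4
X = [[random.randint(0, 1) for _ in range(s)] for _ in range(s)]

def query(col):
    """User: encrypt the unit vector e_col, one ciphertext per column."""
    return [enc(1 if j == col else 0) for j in range(s)]

def answer(cts):
    """Server: for each row, multiply the ciphertexts selected by the row's bits.
    Since E(a)*E(b) = E(a XOR b), the result encrypts X . e_col (mod 2)."""
    out = []
    for row in X:
        acc = enc(0)
        for bit, c in zip(row, cts):
            if bit:
                acc = acc * c % N
        out.append(acc)
    return out

col = 2
column = [dec(c) for c in answer(query(col))]
assert column == [X[r][col] for r in range(s)]        # user recovers the col-th column
```

The communication here is about 2·n^{1/2} ciphertexts; the actual [KO97] protocol recurses on this idea (using QRA-based encryption) to drive the communication down further, which this sketch does not attempt.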

Bounds for Computational PIR

          servers   comm.        assumption
[CG97]    2         O(n^ε)       one-way function
[KO97]    1         O(n^ε)       QRA / homomorphic encryption
[CMS99]   1         polylog(n)   Φ-hiding
…                                DCRA [Lipmaa]
[KO00]    1         n − o(n)     trapdoor permutation

Bounds for I.T. PIR

Upper bounds:
– O(log n / loglog n) servers, polylog(n) comm. [BF90, BFKR91, CGKS95]
– 2 servers, O(n^{1/3}); k servers, O(n^{1/k}) [CGKS95]
– k servers, O(n^{1/(2k-1)}) [Amb97, Ito99, IK99, BI01, WY05]
  • t-private: O(n^{t/(2k-1)}) [BI01, WY05]
– k servers, O(n^{c loglog k / (k log k)}) [BIKR02]

Lower bounds:
– log n + 1 (even with no privacy)
– 2 servers: ~5 log n; k servers: c_k log n [Man98, WdW04]
– Better bounds for restricted 2-server protocols [CGKS95, GKST02, BFG02, KdW03, WdW04]

[Figure: a schematic chart placing the known protocols (CGKS, Amb, IK/BI, WY, BIKR) along an "efficient"/"inefficient" axis and a "clean"/"dirty" axis.]

Why Information-Theoretic PIR?

Cons:
• Requires multiple servers
• Privacy only against limited collusions
• Worse asymptotic complexity (for constant k): O(n^c) vs. polylog(n)

Pros:
• Neat question
• Unconditional privacy
• Better "real-life" efficiency
• Allows very short queries or very short answers (+ applications [DIO98, BIM99])
• Closely related to a natural coding problem [KT00]

Locally Decodable Codes [KT00]

Requirements:
• High fault-tolerance: recover any x_i even from a codeword with up to δm faults…
• Local decoding: …reading only k symbols of y, with success probability ≥ 1/2 + ε.

[Figure: x ∈ {0,1}^n is encoded into a codeword y of length m; a decoder reads k positions of a (possibly corrupted) y to recover x_i.]

Question: how large must m(n) be in a k-query LDC?
• k = 2: 2^{Θ(n)}
• k = 3: 2^{O(n^{1/2})} (upper bound); Ω(n^2) (lower bound)

From I.T. PIR to LDC

A k-server PIR protocol with α-bit queries and β-bit answers gives a k-query LDC of length 2^α over Σ = {0,1}^β, defined by y[q] = Answer(x, q).

• The converse relation also holds.
• Binary LDC ⇔ PIR with one answer bit per server.
• The best known LDCs are obtained from PIR protocols:
  – constant q: m = exp(n^{c loglog q / (q log q)})
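As a concrete (toy-sized) illustration of the y[q] = Answer(x, q) transformation, the sketch below turns the 2-server scheme from earlier into a 2-query code; names and sizes are illustrative, and only the smoothness of the decoder (each probe is individually uniform) is demonstrated, not decoding under actual corruption:

```python
import random
from itertools import product

# Database of n = s*s bits viewed as an s x s matrix X over GF(2).
s = 4
X = [[random.randint(0, 1) for _ in range(s)] for _ in range(s)]

def answer(q):
    """The 2-server PIR answer function: the column combination X . q (mod 2)."""
    return tuple(sum(X[r][c] & q[c] for c in range(s)) % 2 for r in range(s))

# LDC from PIR: one codeword symbol per possible query, y[q] = Answer(x, q).
# Length 2^alpha with alpha = s = n^{1/2} query bits; alphabet Sigma = {0,1}^s.
y = {q: answer(q) for q in product((0, 1), repeat=s)}

def decode(codeword, i):
    """2-query local decoder for x_i = X[row][col]; each probe is uniformly distributed."""
    row, col = divmod(i, s)
    q1 = tuple(random.randint(0, 1) for _ in range(s))
    q2 = tuple(q1[c] ^ (1 if c == col else 0) for c in range(s))   # q1 + q2 = e_col
    return codeword[q1][row] ^ codeword[q2][row]

assert all(decode(y, i) == X[i // s][i % s] for i in range(s * s))
```

Because each probe is uniform on its own, corrupting a δ fraction of y changes each probe's value with probability at most δ, so the decoder still succeeds with probability at least 1 − 2δ, which is the fault-tolerance requirement above.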

A Question about MPC

Ben-Or, Goldwasser, Wigderson, 1988; Chaum, Crépeau, Damgård, 1988:
  k ≥ 3 players can compute any function f of their inputs with total work = poly(circuit-size).
  Information-theoretic MPC is feasible!

Beaver, Micali, Rogaway, 1990:
  Can this be done using a constant number of rounds?
  … or with work = poly(formula-size) and constant rounds [BB89, …]

Beaver, Feigenbaum, Kilian, Rogaway, 1990 (open question):
  Can k computationally unbounded players compute an arbitrary f with communication = poly(input-length)?

Question Reformulated

Is the communication complexity of MPC strongly correlated with the computational complexity of the function being computed?

[Figure: the set of all functions, with the subset of efficiently computable functions highlighted; one region is marked "communication-efficient MPC", the other "no communication-efficient MPC".]

Connecting MPC and LDC

[Figure: a timeline 1990 – 1995 – 2000 connecting the MPC question, PIR, and LDC via [KT00] and [IK04].]

• The three problems are "essentially equivalent", up to considerable deterioration of parameters.

cPIR and the Crypto World

[Figure: known implications relating cPIR to other cryptographic primitives: homomorphic encryption, trapdoor permutations*, oblivious transfer (OT), key agreement (KA), collision-resistant hashing (CRHF*), non-interactive unconditionally-hiding (NI UH) commitment, and secure computation.]

PIR as a Building Block

• Private storage [OS98]

• Sublinear-communication secure computation
– 1-out-of-n Oblivious Transfer (SPIR) [GIKM98,NP99,…]

– Keyword search [CGN99,FIPR05]

– Statistical queries [CIKRRW01]

– Approximate distance [FIMNSW01, IW04]

– Communication-preserving secure function evaluation [NN01]

Time Complexity of PIR

Focus so far: communication complexity.

Obstacle: time complexity
• The server(s) must spend at least linear time.

Workarounds:
• Preprocessing [BIM00]
• Amortization [BIM00, IKOS04]

[Table: open questions ("?") for the amortized setting, indexed by single user vs. multiple users and adaptive vs. non-adaptive queries.]

Protocols

• High-level structure of all known protocols
– User maps i into a point z ∈ F^m

– User secret-shares z between servers using some t-threshold LSSS over F

– Server j responds with a linear function of x determined by its share of z.

• Two types of protocols:

– Polynomial-based [BF90,BFKR91,CGKS95,…,WY05]

• LSSS = Shamir
• Scale well with k, t

– Replication-based [IK99,BI01,BIKR02]

• LSSS = CNF
• Do not scale well with k, t: involve a (k choose t) replication overhead
• However, dominate polynomial-based protocols up to a (k choose t) factor [CDI05]

• Best known protocols for constant k

Polynomial-Based Protocols

• Step 1: Arithmetization
  – Fix a degree parameter d (will be determined by k)
  – Goal: communication = O(n^{1/d})
  – User maps i ∈ [n] into a weight-d vector z of length m = O(n^{1/d}):
      1 ↦ 11100…0,  2 ↦ 11010…0,  …,  n ↦ 00…0111
  – Servers view x as a degree-d m-variate polynomial (example for d = 3):
      P(Z_1,…,Z_m) = x_1·Z_1Z_2Z_3 + x_2·Z_1Z_2Z_4 + … + x_n·Z_{m-2}Z_{m-1}Z_m
  – Privately retrieving the i-th bit of x ⇔ privately evaluating P on z.

Basic Protocol: t=1
• Goal: user learns P(z) without revealing z.
• Step 2: Secret sharing of z
  – Pick a random "direction" λ ∈ F^m
  – z_j = z + j·λ goes to S_j
  [Figure: the line {z + w·λ : w ∈ F} through z ∈ F^m, with the points z+λ, z+2λ, z+3λ, z+4λ handed to the servers.]
• Step 3: S_j responds with P(z_j)
  – User can extrapolate P(z) from P(z_1),…,P(z_k) if k > d:
    • Define the degree-d univariate polynomial Q(W) = P(z + W·λ)
    • Q(0) = P(z) can be extrapolated from the d+1 distinct values Q(j) = P(z_j)

Query length m = O(n^{1/d}), answer length 1.
Using k servers: O(n^{1/(k-1)}) communication.
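The sketch below puts Steps 1–3 together for the t = 1, k = d+1 case over a small prime field; the specific field size, database size, and helper names are illustrative choices:

```python
import random
from itertools import combinations
from math import comb

p = 101                                   # a small prime field F_p (illustrative)
d = 2                                     # degree parameter
k = d + 1                                 # number of servers (need k > d)

def choose_m(n):
    """Smallest m with C(m, d) >= n, so every index fits a distinct weight-d vector."""
    m = d
    while comb(m, d) < n:
        m += 1
    return m

def eval_P(x, v):
    """P(v) = sum_j x_j * prod_{l in S_j} v_l over F_p, where S_j is the j-th d-subset."""
    total = 0
    for j, S in enumerate(combinations(range(len(v)), d)):
        if j >= len(x):
            break
        prod = 1
        for l in S:
            prod = prod * v[l] % p
        total = (total + x[j] * prod) % p
    return total

def user_queries(i, m):
    """Step 2: encode i as a weight-d point z and share it as z_j = z + j*lambda."""
    S_i = list(combinations(range(m), d))[i]
    z = [1 if l in S_i else 0 for l in range(m)]
    lam = [random.randrange(p) for _ in range(m)]           # random "direction"
    return [[(z[l] + j * lam[l]) % p for l in range(m)] for j in range(1, k + 1)]

def interpolate_at_zero(points):
    """Lagrange interpolation of Q(0) from (j, Q(j)) pairs over F_p."""
    total = 0
    for j, qj in points:
        num = den = 1
        for l, _ in points:
            if l != j:
                num = num * (-l) % p
                den = den * (j - l) % p
        total = (total + qj * num * pow(den, p - 2, p)) % p
    return total

n = 20
x = [random.randint(0, 1) for _ in range(n)]
m = choose_m(n)
i = 7
queries = user_queries(i, m)
answers = [(j + 1, eval_P(x, queries[j])) for j in range(k)]  # Step 3: S_j returns P(z_j)
assert interpolate_at_zero(answers) == x[i]                   # Q(0) = P(z) = x_i
```

Each query is m = O(n^{1/d}) field elements and each answer a single field element; each z_j on its own is uniformly random, which is the t = 1 privacy guarantee.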

Basic Protocol: General t
• Goal: user learns P(z) without revealing z.
• Step 2: Secret sharing of z (for random λ_1,…,λ_t ∈ F^m):
  – z_j = z + j·λ_1 + j²·λ_2 + … + j^t·λ_t
• Step 3: S_j responds with P(z_j)
  – User can extrapolate P(z) from P(z_1),…,P(z_k) if k > dt:
    • Define the degree-dt univariate polynomial Q(W) = P(z + W·λ_1 + W²·λ_2 + … + W^t·λ_t)
    • P(z) = Q(0) can be extrapolated from dt+1 distinct values Q(j)

⇒ O(n^{1/d}) communication using k = dt+1 servers.
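A small self-contained check of the t-private sharing and degree-dt interpolation, on an illustrative toy polynomial (P(Z_1,Z_2) = Z_1·Z_2, so d = 2) with t = 2 and k = dt+1 = 5 servers; the field size and names are again illustrative:

```python
import random

p = 101                                    # small prime field (illustrative)
d, t = 2, 2                                # polynomial degree, privacy threshold
k = d * t + 1                              # number of servers needed

def P(v):                                  # toy degree-d polynomial P(Z1, Z2) = Z1*Z2
    return v[0] * v[1] % p

def share(z):
    """t-private shares z_j = z + j*lam_1 + ... + j^t*lam_t (coordinate-wise, j = 1..k)."""
    lams = [[random.randrange(p) for _ in z] for _ in range(t)]
    return [[(z[c] + sum(pow(j, s + 1, p) * lams[s][c] for s in range(t))) % p
             for c in range(len(z))] for j in range(1, k + 1)]

def interpolate_at_zero(points):
    """Lagrange interpolation of Q(0) from (j, Q(j)) pairs over F_p."""
    total = 0
    for j, qj in points:
        num = den = 1
        for l, _ in points:
            if l != j:
                num = num * (-l) % p
                den = den * (j - l) % p
        total = (total + qj * num * pow(den, p - 2, p)) % p
    return total

z = [7, 23]                                # the secret point z in F_p^m (m = 2 here)
shares = share(z)
answers = [(j + 1, P(shares[j])) for j in range(k)]
# Q(W) = P(z + W*lam_1 + W^2*lam_2) has degree d*t = 4, so k = 5 values determine Q(0) = P(z).
assert interpolate_at_zero(answers) == P(z)
```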

Improved Variant [WY05]
• Goal: user learns P(z) without revealing z.
• Step 2: Secret sharing of z (as in the general-t protocol):
  – z_j = z + j·λ_1 + j²·λ_2 + … + j^t·λ_t
• Step 3: S_j responds with P(z_j), along with all m partial derivatives of P evaluated at z_j
  – User can extrapolate P(z) if k > dt/2:
    • Define the degree-dt univariate polynomial Q(W) = P(z + W·λ_1 + W²·λ_2 + … + W^t·λ_t)
    • P(z) = Q(0) can be extrapolated from the 2k > dt values Q(j), Q′(j)
• Complexity: O(m) communication in each direction

⇒ Same communication using half as many servers!

Breaking the O(n^{1/(2k-1)}) Barrier [BIKR02]

        communication in k-server PIR       length of k-query binary LDC
        previous         new                previous          new
k = 2   O(n^{1/3})       -                  2^{O(n)}          -
k = 3   O(n^{1/5})       O(n^{1/5.25})      2^{O(n^{1/2})}    -
k = 4   O(n^{1/7})       O(n^{1/7.87})      2^{O(n^{1/3})}    2^{O(n^{3/10})}
k = 5   O(n^{1/9})       O(n^{1/10.83})     2^{O(n^{1/4})}    2^{O(n^{1/5})}
k = 6   O(n^{1/11})      O(n^{1/13.78})     2^{O(n^{1/5})}    2^{O(n^{1/7})}

Arithmetization

• As before, except that now F = GF(2)
  – Fix a degree parameter d (will be determined by k)
  – Goal: communication = O(n^{1/d})
  – User maps i ∈ [n] into a weight-d vector z of length m = O(n^{1/d}):
      1 ↦ 11100…0,  2 ↦ 11010…0,  …,  n ↦ 00…0111
  – Servers view x as a degree-d m-variate polynomial (example for d = 3):
      P(Z_1,…,Z_m) = x_1·Z_1Z_2Z_3 + x_2·Z_1Z_2Z_4 + … + x_n·Z_{m-2}Z_{m-1}Z_m
  – Privately retrieving the i-th bit of x ⇔ privately evaluating P on z.

Effect of Degree Reduction

degree d, m variables, size n   →   degree d/c, m variables, size O(n^{1/c})

Degree Reduction Using Partial Information [BKL95,BI01]

[Figure: two settings. Left: the servers hold a degree-d polynomial P and the point z ∈ F^m is hidden from them; the user wants P(z). Right: the servers hold Q and each entry of the point y is known to all but one server; the user wants Q(y).]

Example (k = 3, d = 6): each degree-6 monomial of Q is assigned to a server that misses at most d/k = 2 of its variables (e.g., to S1, S3, S1, S2, S3), giving Q(y) = Q1(y) + Q2(y) + Q3(y) with deg Q_j ≤ d/k = 2.

Back to PIR

[Figure: the same two settings. Left: z is hidden from the servers ⇒ n comm. bits. Right: each entry of y is known to all but one server ⇒ O(n^{1/k}) comm. bits.]

Reduction: let z = y_1 + … + y_k, where the y_j are otherwise random, and define Q(Y_1,…,Y_k) = P(Y_1 + … + Y_k).

Initial Protocol

• User picks random y_1,…,y_k s.t. y_1 + … + y_k = z, and sends to S_j all y's except y_j.   [O(m) = O(n^{1/d}) bits per server]

• Servers define an mk-variate degree-d polynomial Q(Y_1,…,Y_k) = P(Y_1 + … + Y_k).

• Each S_j computes a degree-(d/k) polynomial Q_j such that Q(y) = Q_1(y) + … + Q_k(y).
  [each monomial M goes to a server S_j missing at most d/k of its variables]

• S_j sends a description of Q_j to the User.   [O(n^{1/k}) bits]

• User computes Σ_j Q_j(y) = x_i.

A Closer Look

Useful parameters (deg Q_j ≤ ⌊d/k⌋):
• d = k-1:  query length O(n^{1/(k-1)}),  ⌊d/k⌋ = 0  ⇒ answer length 1   [best previous binary PIR]
• d = 2k-1: query length O(n^{1/(2k-1)}), ⌊d/k⌋ = 1  ⇒ answer length O(n^{1/(2k-1)})   [best previous PIR]
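To make the d = k-1 case concrete, here is a self-contained sketch over GF(2) with one-bit answers; the rule for assigning expanded monomials to servers (send each one to the lowest-indexed server whose own block it avoids) is one concrete choice, and the sizes and names are illustrative:

```python
import random
from itertools import combinations, product
from math import comb

k = 3                                     # number of servers
d = k - 1                                 # degree parameter: gives one-bit answers

def choose_m(n):
    m = d
    while comb(m, d) < n:
        m += 1
    return m

n = 20
x = [random.randint(0, 1) for _ in range(n)]
m = choose_m(n)
subsets = list(combinations(range(m), d))          # the j-th d-subset encodes index j

def user_query(i):
    """Encode i as a weight-d vector z over GF(2) and CNF-share it: z = y_1 ^ ... ^ y_k.
    Server j receives every block except y_j (so any single server sees only randomness)."""
    z = [1 if l in subsets[i] else 0 for l in range(m)]
    ys = [[random.randint(0, 1) for _ in range(m)] for _ in range(k - 1)]
    ys.append([z[l] ^ (sum(y[l] for y in ys) % 2) for l in range(m)])
    return [{b: ys[b] for b in range(k) if b != j} for j in range(k)]

def server_answer(j, known):
    """One-bit answer: XOR of all expanded monomials of Q(Y_1,...,Y_k) = P(Y_1 ^ ... ^ Y_k)
    assigned to server j.  Each expanded monomial touches at most d = k-1 blocks, so some
    block is missing, and the server owning the lowest missing block can evaluate it."""
    total = 0
    for idx, S in enumerate(subsets[:n]):
        if x[idx] == 0:
            continue
        for blocks in product(range(k), repeat=d):          # choose a block per variable
            missing = [b for b in range(k) if b not in blocks]
            if missing[0] != j:
                continue
            total ^= int(all(known[blocks[t]][S[t]] for t in range(d)))
    return total

i = 7
queries = user_query(i)
bits = [server_answer(j, queries[j]) for j in range(k)]
assert sum(bits) % 2 == x[i]              # XOR of the k one-bit answers equals x_i
```

Each server receives (k-1)·m = O(n^{1/(k-1)}) bits and returns a single bit, matching the first row above; taking d = 2k-1 instead makes each Q_j a degree-1 polynomial, which corresponds to the O(n^{1/(2k-1)}) protocol in the second row.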

Boosting the Integer Truncation Effect

• Idea: apply multiple "partial" degree reduction steps.

• Generalized degree reduction:
  Assign each monomial to the k' servers V which jointly miss the least number of variables:
    Q(y) = Σ_{V ⊆ [k], |V| = k'} Q_V(y)

  [Example (k = 3, d = 6, k' = 2): the monomials of Q are assigned to the pairs {S1,S2}, {S2,S3}, {S1,S2}, {S1,S2}, {S1,S3}.]

Chain of operations (replication, degree, size):

  reduction:  (k, d, n)  →  (k', d', n' = O(n^{d'/d}))  →  (k'', d'', n'' = O(n'^{d''/d'}))  →  …  →  (1, d/k, O(n^{(d/k)/d}))

  The missing operator, conversion:  (k, d, n)  →  (k, d', n),  changing #vars m → m' = O(m^{d/d'}).

  Additional cost: re-distribute the new point y'.

Example: k = 3 ⇒ O(n^{4/21}) communication (queries and answers):

                replication   degree   #vars         size
  start         3             7        O(n^{1/7})    n
  reduction     2             4        O(n^{1/7})    O(n^{4/7})
  conversion    2             3        O(n^{4/21})   O(n^{4/7})
  reduction     1             1        O(n^{4/21})   O(n^{4/21})

In Search of the Missing Operator

Must have m' = Ω(m^{d/d'}).
Question: for which d' < d can one get m' = O(m^{d/d'})?
• Possible when d' | d.
• Open otherwise; a positive result ⇒ better PIR.
• Simplest open case: d = 3, d' = 2, m' = O(m^{3/2}).

[Figure: converting a degree-d polynomial P in m variables, evaluated at y, into a degree-d' polynomial P' in m' variables, evaluated at y', so that P(y) = P'(y').]

A Workaround

• Can't solve the polynomial conversion problem in general…
• … but it is easy to solve given the promise weight(y) = const.
• Stronger degree reduction: instead of
    Q(y_1,…,y_k) = Σ_{|V|=k'} Q_V(y_1,…,y_k),
  require
    Q(y_1,…,y_k) = Σ_{|V|=k'} Q_V(y_1 + … + y_k).

• Main technical lemma: good parameters for strong degree reduction.

Open Problems

• Better upper bounds

– Known: O(n^{c loglog k / (k log k)})
– What is the true limit of our technique?

• Generalize best upper bound to t>1

• Tight bounds for polynomial conversion

• Lower bounds

– Known: c·log n

– Simplest cases:
  • k = 2
  • k = 3, single answer bit per server