
Information Sharing across Private Databases

Rakesh Agrawal

Alexandre Evfimievski

Ramakrishnan Srikant

IBM Almaden Research Center


Today’s Information Sharing Systems

Assumption: Information in each database can be freely shared.

[Figure: Mediator and Federated architectures (each labeled Q and R) and a Centralized architecture.]


Selective Document Sharing

R is shopping for technology.

S has intellectual property it may want to license.

First find the specific technologies where there is a match, and then reveal further information about those.

[Figure: R holds a ShoppingList; S holds a TechnologyList.]

Example 2: Govt. agencies sharing information on a need-to-know basis.


Medical Research

Validate a hypothesized link between adverse reaction to a drug and a specific DNA sequence.

Researchers should not learn anything beyond 4 counts:

[Figure: data sources labeled MayoClinic, DNA Sequences, and DrugReactions.]

                     Adverse Reaction    No Adv. Reaction
Sequence Present            ?                   ?
Sequence Absent             ?                   ?


Minimal Necessary Information Sharing

Compute queries across databases so that no more information than necessary is revealed.

Need is driven by several trends:
– End-to-end integration of information systems across companies.
– Simultaneously compete and cooperate.
– Security: need-to-know information sharing.
– Privacy legislation & stated privacy policies.


Talk Outline

Motivation
Problem Definition
Protocols
Cost Analysis
Conclusions


Current Techniques

Trusted Third Party
– Has to be completely trusted, both with respect to intent and competence against security breaches.

Secure Multi-Party Computation
– Given two parties with inputs x and y, compute f(x,y) such that the parties learn only f(x,y) and nothing else.
– Can be solved by building a combinatorial circuit and simulating that circuit [Yao86].

Cost makes these generic solutions impractical for database-size problems.


Our Security Model

No third party.

Main parties directly execute a protocol, which is designed to guarantee that they do not learn any more than they would have learnt had they given the data to a trusted third party and got back the answer.

Honest-but-curious behavior: Parties follow the protocol properly, except that they can record all computation & received messages, and analyze them to learn additional information.


Problem Statement (Ideal)

Given
– Two parties: R (receiver) and S (sender)
– Databases: $D_R$ and $D_S$
– Query Q spanning the tables in $D_R$ and $D_S$

Compute the answer to Q and return it to R without revealing any additional information to either party.

Anything R can learn from the answer to the query is fair game!

Example: If $Q = V_R \cap V_S$, then for all $v \in V_R - V_S$, R knows $v \notin V_S$.


Problem Statement (Minimal Sharing)

Given:
– Two parties: R (receiver) and S (sender)
– Databases: $D_R$ and $D_S$
– Query Q spanning the tables in $D_R$ and $D_S$
– Additional (pre-specified) categories of information I

Compute the answer to Q and return it to R without revealing any additional information to either party, except for the information contained in I.


Protocols

Protocols for four key operations:
– Intersection, Equijoin, Intersection Size & Equijoin Size

Notation:
– $T_R$, $T_S$: tables in $D_R$ and $D_S$ respectively.
– $V_R$, $V_S$: set of distinct values in $T_R$ and $T_S$ respectively.

Additional Information I:
– For intersection, intersection size & equijoin, $I = \{|V_S|, |V_R|\}$
– For equijoin size, I also includes the distribution of duplicates & some subset of information in $V_S \cap V_R$


Related Work

[NP99]: Protocols for list intersection problem
– Oblivious evaluation of n polynomials of degree n each.
– Oblivious evaluation of $n^2$ polynomials.

[HFH99]: Find people with common preferences, without revealing the preferences.
– Intersection protocols are similar to ours, but do not provide proofs of security.

Private Information Retrieval

Privacy Preserving Data Mining


Talk Outline

Motivation
Problem Definition
Protocols
– Intersection
– Intersection Size & Equijoin Size
– Joins
– Proof Methodology
Cost Analysis
Conclusions


A Simple, but Incorrect, Intersection Protocol

[Figure: R holds $V_R$; S holds $V_S$.]

R & S agree to use an encryption function $f_e$ (with key e).

S sends $f_e(V_S)$ to R, where $f_e(V_S)$ is shorthand for $\{f_e(x) \mid x \in V_S\}$.

R computes $V_R \cap V_S := \{v \in V_R \mid f_e(v) \in f_e(V_S)\}$.

Problem: For any element x, R can check whether $f_e(x)$ is in $f_e(V_S)$.
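To see the flaw concretely, here is a minimal Python sketch of this simple protocol. It assumes a toy safe prime p = 2039 (q = 1019 is also prime) and uses the hypothetical stand-in h(v) = v² mod p to map integer values into the quadratic residues; neither choice comes from the slides, and real parameters would be 1024 bits or more. Because R knows the shared key e, it can encrypt any candidate value itself and test it against $f_e(V_S)$.

```python
P = 2039            # toy safe prime: Q = (P - 1) // 2 = 1019 is also prime
Q = (P - 1) // 2


def h(v: int) -> int:
    """Hypothetical stand-in for a hash into Dom f (the quadratic residues mod P).

    v -> v^2 mod P is injective for 1 <= v <= Q; it is NOT a cryptographic hash.
    """
    return pow(v, 2, P)


def f(e: int, x: int) -> int:
    """f_e(x) = x^e mod P."""
    return pow(x, e, P)


e = 77                          # hypothetical key that R and S agreed on
V_R = {3, 14, 15, 92}
V_S = {14, 92, 589, 653}

# S -> R: f_e(V_S)
f_e_V_S = {f(e, h(v)) for v in V_S}

# Intended use: R keeps the values whose encryption appears in f_e(V_S).
intersection = {v for v in V_R if f(e, h(v)) in f_e_V_S}

# The flaw: R knows e, so it can test ANY x, not just its own values.
leaked = sorted(x for x in range(1, Q + 1) if f(e, h(x)) in f_e_V_S)

print(sorted(intersection))  # [14, 92]
print(leaked)                # [14, 92, 589, 653]
```

Scanning every candidate value recovers all of $V_S$, which is exactly what the protocol was supposed to hide.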


Intersection Protocol: Intuition

We still want to encrypt the values in $V_R$ and $V_S$ and compare the encrypted values.

However, we want an encryption function that can only be computed jointly by R and S, not separately.


Commutative Encryption

Pair of encryption functions f and g such that f(g(v)) = g(f(v)).

Assuming the Decisional Diffie-Hellman (DDH) hypothesis,

$$f_e(x) = x^e \bmod p$$

where
– p: safe prime number, i.e., both p and q = (p−1)/2 are primes
– Dom f: all quadratic residues modulo p, and
– encryption key $e \in \{1, 2, \ldots, q-1\}$

is a commutative encryption.
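A quick numerical check of the two properties used later, as a minimal Python sketch with a toy safe prime p = 2039 (q = 1019) and arbitrary keys chosen for illustration; real deployments use p of 1024 bits or more. It enumerates Dom f with Euler's criterion, then confirms that $f_d$ and $f_e$ commute and that $f_e$ permutes Dom f (so it can be decrypted).

```python
P = 2039            # toy safe prime: P = 2*Q + 1 with Q = 1019 prime
Q = (P - 1) // 2


def is_quadratic_residue(x: int) -> bool:
    """Euler's criterion: for x not divisible by P, x is a QR mod P iff x^Q = 1 (mod P)."""
    return x % P != 0 and pow(x, Q, P) == 1


def f(e: int, x: int) -> int:
    """f_e(x) = x^e mod P, defined on the quadratic residues mod P."""
    assert 1 <= e <= Q - 1, "key must lie in {1, ..., q-1}"
    assert is_quadratic_residue(x), "x must be in Dom f"
    return pow(x, e, P)


dom_f = [x for x in range(1, P) if is_quadratic_residue(x)]
d, e = 123, 456     # illustrative keys for the two parties

# f_d(f_e(x)) == f_e(f_d(x)) for every x in Dom f
commutes = all(f(d, f(e, x)) == f(e, f(d, x)) for x in dom_f)

# f_e maps Dom f onto itself one-to-one (e is coprime to the prime Q)
is_permutation = sorted(f(e, x) for x in dom_f) == dom_f

print(len(dom_f), commutes, is_permutation)  # 1019 True True
```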


Commutative Encryption (2)

The powers commute:

$$(x^d \bmod p)^e \bmod p = x^{de} \bmod p = (x^e \bmod p)^d \bmod p$$

DDH hypothesis: The distribution of $\langle g^a, g^b, g^{ab} \rangle$ is computationally indistinguishable from the distribution of $\langle g^a, g^b, g^c \rangle$, where $a, b, c \in_r \mathrm{Dom}\,f$ ($\in_r$ denotes a uniformly random choice).
– Implication: $\langle x, x^e, y, y^e \rangle$ is also indistinguishable from $\langle x, x^e, y, z \rangle$, where $x, y, z \in_r \mathrm{Dom}\,f$.
– Note: DDH does not hold if the adversary can select a, b, c.


Intersection Protocol

[Figure: R holds $V_R$ and secret key $e_R$; S holds $V_S$ and secret key $e_S$.]

S sends $f_{e_S}(V_S)$ to R.

To satisfy DDH, we apply $f_{e_S}$ to $h(V_S)$, where h is a hash function, not directly to $V_S$.


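Intersection Protocol

[Figure: R holds $V_R$ and key $e_R$; S holds $V_S$ and key $e_S$.]

R receives $f_{e_S}(V_S)$ and encrypts it again with its own key, obtaining $f_{e_R}(f_{e_S}(V_S))$.

By the commutative property, this equals $f_{e_S}(f_{e_R}(V_S))$.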


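Intersection Protocol

[Figure: R holds $V_R$ and key $e_R$; S holds $V_S$ and key $e_S$.]

R sends $f_{e_R}(V_R)$ to S.

S replies with $\langle y, f_{e_S}(y) \rangle$ for each $y \in f_{e_R}(V_R)$.

Since R knows $\langle x, f_{e_R}(x) \rangle$, it obtains $\langle x, f_{e_S}(f_{e_R}(x)) \rangle$ for each $x \in V_R$.

R then keeps the values $x \in V_R$ for which $f_{e_S}(f_{e_R}(x))$ appears in $f_{e_R}(f_{e_S}(V_S))$; by commutativity these are the elements of $V_R \cap V_S$.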
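Putting the three steps together, here is a minimal Python sketch of the intersection protocol under stated assumptions: a toy safe prime p = 2039, the hypothetical injective stand-in h(v) = v² mod p in place of the hash h (so values must be integers in 1..q), and Python's `random` module for keys and shuffling. None of this is secure; it only makes the message flow runnable. Both parties are simulated in one function, and the comments mark who computes what.

```python
import random

P = 2039            # toy safe prime (Q = 1019 is prime); real use needs >= 1024 bits
Q = (P - 1) // 2


def h(v: int) -> int:
    """Hypothetical stand-in for the hash h into Dom f (injective for 1 <= v <= Q)."""
    return pow(v, 2, P)


def f(e: int, x: int) -> int:
    """Commutative encryption f_e(x) = x^e mod P."""
    return pow(x, e, P)


def intersection_protocol(V_R: set, V_S: set, rng: random.Random) -> set:
    """Simulates R and S; returns what R learns: the values in both V_R and V_S."""
    e_R = rng.randint(1, Q - 1)     # R's secret key
    e_S = rng.randint(1, Q - 1)     # S's secret key

    # S -> R: f_eS(h(V_S)), reordered so the order reveals nothing
    from_S = [f(e_S, h(v)) for v in V_S]
    rng.shuffle(from_S)

    # R: f_eR(f_eS(h(V_S)))
    double_S = {f(e_R, y) for y in from_S}

    # R -> S: f_eR(h(V_R)); R remembers which x produced each y
    y_of = {x: f(e_R, h(x)) for x in V_R}

    # S -> R: pairs <y, f_eS(y)> for each y it received
    pairs = {y: f(e_S, y) for y in sorted(y_of.values())}

    # R: f_eS(f_eR(h(x))) equals f_eR(f_eS(h(x))) by commutativity
    return {x for x, y in y_of.items() if pairs[y] in double_S}


rng = random.Random(2003)
print(sorted(intersection_protocol({3, 14, 15, 92}, {14, 92, 589, 653}, rng)))  # [14, 92]
```

Because every key is coprime to the prime q, each $f_e$ is a bijection on Dom f, so with this injective h there are no false matches in the toy setting.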


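Intersection Size Protocol

[Figure: R holds $V_R$ and key $e_R$; S holds $V_S$ and key $e_S$.]

R sends $f_{e_R}(V_R)$ to S, and S sends $f_{e_S}(V_S)$ to R.

R computes $f_{e_R}(f_{e_S}(V_S))$.

S computes $f_{e_S}(f_{e_R}(V_R))$ and sends it to R; by commutativity this equals $f_{e_R}(f_{e_S}(V_R))$.

R counts the values common to the two doubly encrypted sets, which gives $|V_R \cap V_S|$.

Unlike in the intersection protocol, S does not pair these values with the $f_{e_R}(x)$ they came from, so R cannot map $z \in f_{e_R}(f_{e_S}(V_R))$ back to $x \in V_R$.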
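The same toy setting (p = 2039, the hypothetical h(v) = v² mod p standing in for the hash, Python's `random` for keys) gives a minimal sketch of the intersection size protocol. What it illustrates is the last step of the slide: S returns the doubly encrypted values without the y they came from and in shuffled order, so R can count matches but cannot attribute them to its own values.

```python
import random

P = 2039            # toy safe prime (Q = 1019 is prime)
Q = (P - 1) // 2


def h(v: int) -> int:
    """Hypothetical stand-in for the hash into Dom f (injective for 1 <= v <= Q)."""
    return pow(v, 2, P)


def f(e: int, x: int) -> int:
    return pow(x, e, P)


def intersection_size_protocol(V_R: set, V_S: set, rng: random.Random) -> int:
    """Simulates R and S; R learns only the intersection size (and |V_S|)."""
    e_R, e_S = rng.randint(1, Q - 1), rng.randint(1, Q - 1)

    to_S = sorted(f(e_R, h(x)) for x in V_R)     # R -> S: f_eR(h(V_R))
    to_R = sorted(f(e_S, h(v)) for v in V_S)     # S -> R: f_eS(h(V_S))

    # S -> R: f_eS(f_eR(h(V_R))) WITHOUT the y each value came from, shuffled,
    # so R cannot map these values back to its own x's.
    double_R = [f(e_S, y) for y in to_S]
    rng.shuffle(double_R)

    # R: f_eR(f_eS(h(V_S))), then count the common doubly encrypted values
    double_S = {f(e_R, y) for y in to_R}
    return sum(1 for z in double_R if z in double_S)


print(intersection_size_protocol({3, 14, 15, 92}, {14, 92, 589, 653}, random.Random(1)))  # 2
```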


Equijoin Size Protocol

Same as the intersection size protocol, but allows duplicates.

Can reveal some subset of information in $V_R \cap V_S$, based on the distribution of duplicates.
– If each element in $V_R \cap V_S$ has the same number of duplicates in $V_R$, the protocol does not reveal any additional information beyond the join size and the distribution of duplicates in $V_S$.
– If each element in $V_R \cap V_S$ has a unique number of duplicates in $V_R$, the protocol reveals $V_R \cap V_S$ and the number of duplicates in $V_S$ for elements in $V_R \cap V_S$.
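A minimal sketch of the equijoin size protocol in the same toy setting (p = 2039, hypothetical h(v) = v² mod p, Python's `random`), keeping duplicates as multisets. The second half shows the leak from the second bullet: when every value of R has a distinct duplicate count, R can invert its own counts and learn $V_R \cap V_S$ together with S's duplicate counts for those values. The input lists are made up for illustration.

```python
import random
from collections import Counter

P = 2039            # toy safe prime (Q = 1019 is prime)
Q = (P - 1) // 2


def h(v: int) -> int:
    """Hypothetical stand-in for the hash into Dom f (injective for 1 <= v <= Q)."""
    return pow(v, 2, P)


def f(e: int, x: int) -> int:
    return pow(x, e, P)


def equijoin_size_protocol(A_R: list, A_S: list, rng: random.Random):
    """A_R, A_S: join-column values of T_R and T_S, duplicates included.

    Returns the join size plus the two multisets of doubly encrypted values
    that R ends up holding (these are what can leak information)."""
    e_R, e_S = rng.randint(1, Q - 1), rng.randint(1, Q - 1)
    to_S = sorted(f(e_R, h(a)) for a in A_R)      # R -> S, duplicates kept
    to_R = sorted(f(e_S, h(a)) for a in A_S)      # S -> R, duplicates kept
    double_R = [f(e_S, y) for y in to_S]           # S -> R, shuffled
    rng.shuffle(double_R)
    c_R = Counter(double_R)
    c_S = Counter(f(e_R, y) for y in to_R)         # computed by R
    join_size = sum(n * c_S[z] for z, n in c_R.items())
    return join_size, c_R, c_S


A_R = [1, 2, 2, 3, 3, 3]     # every value of R has a distinct duplicate count
A_S = [2, 3, 3, 7]
size, c_R, c_S = equijoin_size_protocol(A_R, A_S, random.Random(5))
print(size)  # 8  (= 2*1 + 3*2)

# Leak: R knows its own counts, so a unique count identifies the value x.
value_with_count = {n: x for x, n in Counter(A_R).items()}
learned = {value_with_count[n]: c_S[z] for z, n in c_R.items() if z in c_S}
print(sorted(learned.items()))  # [(2, 1), (3, 2)]
```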


Equijoin Protocol: Intuition

R needs some extra information ext(v) for values $v \in V_R \cap V_S$.
– ext(v): information about the other attributes in $T_S$ for those records where $T_S.A = v$.

S has a second secret key $e_S'$.

For each value $v \in V_S$,
– S generates an encryption key $\kappa = f_{e_S'}(v)$, and
– encrypts ext(v) using encryption function K with key $\kappa$.

S allows R to learn $f_{e_S'}(v)$ only for $v \in V_R$.

K need not be a commutative encryption.


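Join Protocol

[Figure: R holds $V_R$ and key $e_R$; S holds keys $e_S$ and $e_S'$.]

R sends $f_{e_R}(V_R)$ to S.

S replies with $\langle y, f_{e_S}(y), f_{e_S'}(y) \rangle$ for each $y \in f_{e_R}(V_R)$.

R thus holds $\langle x, f_{e_S}(f_{e_R}(x)), f_{e_S'}(f_{e_R}(x)) \rangle$ for each $x \in V_R$, and removes its own encryption:

$$f_{e_R}^{-1}(f_{e_S}(f_{e_R}(x))) = f_{e_R}^{-1}(f_{e_R}(f_{e_S}(x))) = f_{e_S}(x)$$

and likewise for $e_S'$, giving $\langle x, f_{e_S}(x), f_{e_S'}(x) \rangle$ for each $x \in V_R$.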
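The step above assumes R can invert its own encryption. With the definitions from the commutative-encryption slide (p a safe prime, q = (p−1)/2 prime, Dom f the quadratic residues, keys in {1, …, q−1}), one way to see this is that $f_{e_R}^{-1}$ is itself a power map; $d_R$ is notation introduced here for the inverse key.

```latex
\begin{aligned}
& d_R = e_R^{-1} \bmod q \ \text{(exists since } q \text{ is prime and } 1 \le e_R \le q-1\text{)},
  \qquad e_R d_R = 1 + kq \ \text{for some integer } k \ge 0,\\
& w \in \mathrm{Dom}\,f \implies w^{q} \equiv 1 \pmod p \quad \text{(Euler's criterion)},\\
& \bigl(w^{e_R}\bigr)^{d_R} = w^{1+kq} = w \cdot \bigl(w^{q}\bigr)^{k} \equiv w \pmod p
  \quad\Longrightarrow\quad f_{e_R}^{-1} = f_{d_R},\\
& f_{d_R}\bigl(f_{e_S}(f_{e_R}(x))\bigr) = \bigl(x^{e_S}\bigr)^{e_R d_R} \equiv x^{e_S} = f_{e_S}(x) \pmod p.
\end{aligned}
```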


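Join Protocol

[Figure: R holds $V_R$, key $e_R$, and $\langle x, f_{e_S}(x), f_{e_S'}(x) \rangle$ for $x \in V_R$; S holds $V_S$ with ext($V_S$) and keys $e_S$, $e_S'$.]

S sends $\langle f_{e_S}(v), K(f_{e_S'}(v), \mathrm{ext}(v)) \rangle$ for each $v \in V_S$ to R.

K: encryption function that encrypts ext(v) using $f_{e_S'}(v)$ as the encryption key.

Matching on the $f_{e_S}$ values, R obtains $\langle x, f_{e_S}(x), f_{e_S'}(x), K(f_{e_S'}(x), \mathrm{ext}(x)) \rangle$ for each $x \in V_R \cap V_S$, and decrypts ext(x) with the key $f_{e_S'}(x)$.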
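Combining both join slides, here is a minimal Python sketch in the same toy setting (p = 2039, hypothetical h(v) = v² mod p standing in for the hash, Python's `random` for keys). The slides specify K only as an encryption function that "need not be a commutative encryption"; here K is a hypothetical XOR-with-SHA-256-keystream cipher, chosen only because it is short and self-inverse, not as a recommendation. `e_S2` stands for $e_S'$, and R undoes its own layer with the inverse power $d_R = e_R^{-1} \bmod q$.

```python
import hashlib
import random

P = 2039            # toy safe prime (Q = 1019 is prime); real use needs >= 1024 bits
Q = (P - 1) // 2


def h(v: int) -> int:
    """Hypothetical stand-in for the hash into Dom f (injective for 1 <= v <= Q)."""
    return pow(v, 2, P)


def f(e: int, x: int) -> int:
    """Commutative encryption f_e(x) = x^e mod P."""
    return pow(x, e, P)


def K(key: int, data: bytes) -> bytes:
    """Hypothetical symmetric cipher: XOR with a SHA-256 keystream derived from key.

    Applying K twice with the same key returns the original bytes. Illustration only:
    it gives no integrity protection and is not a vetted construction.
    """
    stream = b""
    block = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(f"{key}:{block}".encode()).digest()
        block += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def join_protocol(V_R: set, ext: dict, rng: random.Random) -> dict:
    """ext maps each v in V_S to ext(v) as bytes. Returns {x: ext(x)} for x in both V_R and V_S."""
    e_R = rng.randint(1, Q - 1)                      # R's key
    e_S = rng.randint(1, Q - 1)                      # S's first key
    e_S2 = rng.randint(1, Q - 1)                     # S's second key (e_S')

    # R -> S: f_eR(h(V_R))
    y_of = {x: f(e_R, h(x)) for x in V_R}

    # S -> R: <y, f_eS(y), f_eS'(y)> for each received y
    reply = {y: (f(e_S, y), f(e_S2, y)) for y in sorted(y_of.values())}

    # R: remove its own layer with f_eR^{-1} = f_dR, where d_R = e_R^{-1} mod Q
    d_R = pow(e_R, -1, Q)
    mine = {x: (f(d_R, reply[y][0]), f(d_R, reply[y][1])) for x, y in y_of.items()}

    # S -> R: <f_eS(h(v)), K(f_eS'(h(v)), ext(v))> for v in V_S
    table = {f(e_S, h(v)): K(f(e_S2, h(v)), data) for v, data in ext.items()}

    # R: match on f_eS(h(x)), then decrypt ext(x) with kappa = f_eS'(h(x))
    return {x: K(kappa, table[tag]) for x, (tag, kappa) in mine.items() if tag in table}


result = join_protocol({3, 14, 92}, {14: b"license A", 92: b"license B", 653: b"license C"},
                       random.Random(42))
print(sorted(result.items()))  # [(14, b'license A'), (92, b'license B')]
```

`pow(e_R, -1, Q)` (modular inverse) requires Python 3.8 or later.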


Proof Methodology

Consider two distributions:
– S’s view of the protocol.
– A simulation of S’s view that only uses what S is supposed to have at the end of the protocol, e.g., $V_S$, $V_S \cap V_R$, and $|V_R|$ for intersection.

If, for any $V_S$ and $V_R$, these two distributions are computationally indistinguishable, then the protocol is secure.
– i.e., S cannot learn anything else from the protocol.


Proof Methodology (2)

Simulation only uses the knowledge S is supposed to have at the end of the protocol.

Distinguisher can also use the inputs of R, i.e., $V_R$, but not R’s secret keys.
– Implication: S doesn’t learn anything from the protocol even if S (correctly) guesses some of R’s inputs.


Proofs

We prove (for each protocol) that if the two distributions can be distinguished, the DDH hypothesis is false.

Easy to come up with protocols that look okay, but are flawed …
– Proof of security is important for real-world acceptance & use.
– The proofs are also fun!


Talk Outline

Motivation
Problem Statement
Protocols
Cost Analysis
Conclusions


Cost Analysis: Operations

Cost is dominated by exponentiations.

Let $C_e$ = cost of $x^e \bmod p$
– x, e, p are all 1024-bit integers
– Roughly 0.02 seconds on a Pentium 3 (in 2001) [NP01], or roughly $2 \times 10^5$ exponentiations per hour

Intersection: $2(|V_R| + |V_S|)\,C_e$

Join: $(2|V_R| + 5|V_S|)\,C_e$

Algorithms are trivially parallelizable.
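The two cost formulas can be turned into a small estimator. This is a minimal Python sketch that assumes, as the slide does, that the work splits evenly across processors; the function names are ours, and the example sizes of one million values per side are illustrative.

```python
C_E_SECONDS = 0.02                          # slide: ~0.02 s per 1024-bit x^e mod p (Pentium 3, 2001)
EXPS_PER_HOUR_EXACT = 3600 / C_E_SECONDS    # about 1.8e5; the slide rounds this to 2e5
EXPS_PER_HOUR = 2e5                         # rounded rate used in the slide estimates


def intersection_exps(n_R: int, n_S: int) -> int:
    """Exponentiations for intersection: 2(|V_R| + |V_S|)."""
    return 2 * (n_R + n_S)


def join_exps(n_R: int, n_S: int) -> int:
    """Exponentiations for join: 2|V_R| + 5|V_S|."""
    return 2 * n_R + 5 * n_S


def hours(n_exps: int, processors: int = 1) -> float:
    """Wall-clock hours, assuming the work splits evenly ("trivially parallelizable")."""
    return n_exps / EXPS_PER_HOUR / processors


print(hours(intersection_exps(10**6, 10**6), processors=10))  # 2.0
print(hours(join_exps(10**6, 10**6), processors=10))          # 3.5
```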


Selective Document Sharing: Implementation

For each pair of documents $d_R \in D_R$ and $d_S \in D_S$:
– R and S execute the intersection protocol to get $|d_R|$, $|d_S|$, and $|d_R \cap d_S|$.
– Then compute the similarity function f between the documents.

Note: This protocol also reveals to R, for each document $d_R \in D_R$, the size $|d_R \cap d_S|$ for each $d_S \in D_S$.
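The slide leaves the similarity function f unspecified. As an illustration only, two common set-based measures can be computed from exactly the three sizes the protocol returns; the choice of Jaccard and cosine similarity, the threshold, and the names below are assumptions, not the authors' choice.

```python
import math


def jaccard(n_R: int, n_S: int, n_common: int) -> float:
    """Intersection size over union size, with union = |d_R| + |d_S| - intersection."""
    return n_common / (n_R + n_S - n_common)


def cosine(n_R: int, n_S: int, n_common: int) -> float:
    """Cosine similarity of the two documents' binary word-occurrence vectors."""
    return n_common / math.sqrt(n_R * n_S)


def matching_pairs(sizes: dict, threshold: float, sim=jaccard) -> list:
    """sizes maps (r_doc, s_doc) -> (|d_R|, |d_S|, intersection size), one entry per run.

    Returns the pairs whose similarity reaches the threshold, best first."""
    scored = [(sim(*triple), pair) for pair, triple in sizes.items()]
    return [pair for score, pair in sorted(scored, reverse=True) if score >= threshold]


sizes = {("r1", "s1"): (1000, 1000, 250), ("r1", "s2"): (1000, 800, 20)}
print(matching_pairs(sizes, threshold=0.1))   # [('r1', 's1')]
print(round(cosine(1000, 1000, 250), 2))      # 0.25
```

Only the matching pairs would then proceed to the "reveal further information" stage described in the motivating example.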


Selective Document Sharing: Cost Analysis

If
– $|D_R|$ = 10 documents, $|D_S|$ = 100 documents,
– each document has 1000 words,
– 10 parallel processors,

then: 2 hours computation time & 35 minutes communication time (on a T1 line).
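One accounting that reproduces the 2-hour computation figure, assuming each of the 10 × 100 document pairs runs the protocol independently at the intersection cost $2(|d_R| + |d_S|)\,C_e$ and using the rounded rate of $2 \times 10^5$ exponentiations per hour from the cost slide:

```latex
\underbrace{10 \times 100}_{\text{document pairs}} \times \underbrace{2\,(1000 + 1000)}_{\text{exponentiations per pair}}
= 4 \times 10^{6} \ \text{exponentiations},
\qquad
\frac{4 \times 10^{6}}{(2 \times 10^{5} \ \text{per hour}) \times 10 \ \text{processors}} = 2 \ \text{hours}.
```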


Medical Research: Implementation

Let
– $V_R$ = set of ids in R’s database that took the drug.
– $V_R'$ = subset of $V_R$ with adverse reaction.
– $V_S$ = set of ids in S’s database.
– $V_S'$ = subset of $V_S$ with DNA sequence.

Execute the intersection size protocol 4 times:
$(V_R - V_R') \cap (V_S - V_S')$, $(V_R - V_R') \cap V_S'$, $V_R' \cap (V_S - V_S')$, $V_R' \cap V_S'$
– Modified version of protocol that sends results directly to researchers.
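The four runs fill the 2 × 2 table from the Medical Research slide. Below is a minimal Python sketch of the quantity each run computes, with a plaintext `len(a & b)` standing in for the intersection size protocol (in the real system each call would be one protocol run and no party would see the other's sets). The function name and example ids are hypothetical.

```python
def contingency_table(V_R: set, V_R_adv: set, V_S: set, V_S_seq: set,
                      intersection_size=lambda a, b: len(a & b)) -> dict:
    """V_R: ids that took the drug; V_R_adv: those with an adverse reaction;
    V_S: ids in S's database; V_S_seq: those with the DNA sequence.
    intersection_size is a plaintext stand-in for one run of the protocol."""
    no_adv = V_R - V_R_adv
    no_seq = V_S - V_S_seq
    return {
        ("sequence present", "adverse reaction"): intersection_size(V_R_adv, V_S_seq),
        ("sequence present", "no adv. reaction"): intersection_size(no_adv, V_S_seq),
        ("sequence absent", "adverse reaction"): intersection_size(V_R_adv, no_seq),
        ("sequence absent", "no adv. reaction"): intersection_size(no_adv, no_seq),
    }


table = contingency_table(V_R=set(range(1, 11)), V_R_adv={1, 2, 3},
                          V_S=set(range(1, 9)) | {20}, V_S_seq={1, 2, 5, 20})
for cell, count in table.items():
    print(cell, count)
```

The four cells always sum to $|V_R \cap V_S|$ (8 in this example), since the four set pairs partition that intersection.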


Medical Research: Cost Analysis

If $|V_R| = |V_S|$ = 1 million ids, and 10 parallel processors:
– 4 hours computation time.
– 1.5 hours communication time.
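The 4-hour computation figure can be reproduced under the same assumptions as the cost slide: each intersection size run on sets A and B costs $2(|A| + |B|)\,C_e$ (each party encrypts its own set once and the other party's encrypted set once), the work splits evenly over 10 processors, and the rate is $2 \times 10^5$ exponentiations per hour. Across the four runs, each of $V_R'$, $V_R - V_R'$, $V_S'$, $V_S - V_S'$ is used twice:

```latex
\sum_{\text{4 runs}} 2\,(|A| + |B|)
= 2\bigl(2|V_R'| + 2|V_R - V_R'| + 2|V_S'| + 2|V_S - V_S'|\bigr)
= 4\,(|V_R| + |V_S|)
= 8 \times 10^{6},
\qquad
\frac{8 \times 10^{6}}{(2 \times 10^{5} \ \text{per hour}) \times 10} = 4 \ \text{hours}.
```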


Talk Outline

Motivation
Problem Statement
Protocols
Cost Analysis
Conclusions


Summary

Identified information sharing across private databases as a new area for database research.

Developed novel protocols for intersection, intersection size & equijoin, and proved that these protocols disclose minimal information.
– Also gave a protocol for equijoin size. This protocol reveals some information about which tuples joined, based on the distribution of duplicates.

Showed how new applications can be built using these protocols.


Future Work

What is the tradeoff between the additional information disclosed and efficiency?
– Will we be able to obtain much faster protocols if we are willing to disclose additional information?

Can we formalize models of minimal disclosure and discover corresponding protocols for higher-level database operations?


Backup


System Components

[Figure: Operating System, Secure Communication, Cryptographic Protocol, Libraries (incl. Encryption Primitives), Database.]


Lemma 1

For polynomial m, the distribution of the 2 × m-tuple

$$\begin{pmatrix} x_1 & \ldots & x_{m-1} & x_m \\ f_e(x_1) & \ldots & f_e(x_{m-1}) & f_e(x_m) \end{pmatrix}$$

is indistinguishable from the distribution of the tuple

$$\begin{pmatrix} x_1 & \ldots & x_{m-1} & x_m \\ f_e(x_1) & \ldots & f_e(x_{m-1}) & z_m \end{pmatrix}$$

where $x_1, \ldots, x_m, z_m \in_r \mathrm{Dom}\,f$ and the key e is chosen at random.


Lemma 2

For polynomial m and n, the distribution of the 2 × n-tuple

$$\begin{pmatrix} x_1 & \ldots & x_m & x_{m+1} & \ldots & x_n \\ f_e(x_1) & \ldots & f_e(x_m) & f_e(x_{m+1}) & \ldots & f_e(x_n) \end{pmatrix}$$

is indistinguishable from the distribution of the tuple

$$\begin{pmatrix} x_1 & \ldots & x_m & x_{m+1} & \ldots & x_n \\ f_e(x_1) & \ldots & f_e(x_m) & z_{m+1} & \ldots & z_n \end{pmatrix}$$

where all $x_i, z_i \in_r \mathrm{Dom}\,f$ and the key e is chosen at random.


Lemma 3

For polynomial m and n, the distribution of the 3 × n-tuple

$$\begin{pmatrix} x_1 & \ldots & x_m & x_{m+1} & \ldots & x_n \\ f_e(x_1) & \ldots & f_e(x_m) & f_e(x_{m+1}) & \ldots & f_e(x_n) \\ f_{e'}(x_1) & \ldots & f_{e'}(x_m) & f_{e'}(x_{m+1}) & \ldots & f_{e'}(x_n) \end{pmatrix}$$

is indistinguishable from the distribution of the tuple

$$\begin{pmatrix} x_1 & \ldots & x_m & x_{m+1} & \ldots & x_n \\ y_1 & \ldots & y_m & y_{m+1} & \ldots & y_n \\ z_1 & \ldots & z_m & z_{m+1} & \ldots & z_n \end{pmatrix}$$

where all $x_i, y_i, z_i \in_r \mathrm{Dom}\,f$ and the keys e, e' are chosen at random.