Graph Analysis with Node Differential Privacy

22
1 Graph Analysis with Node Differential Privacy Sofya Raskhodnikova Penn State University Joint work with Shiva Kasiviswanathan (GE Research), Kobbi Nissim (Ben-Gurion U. and Harvard U.), Adam Smith (Penn State)

description

Graph Analysis with Node Differential Privacy . Sofya Raskhodnikova Penn State University. Joint work with Shiva Kasiviswanathan ( GE Research ), Kobbi Nissim ( Ben-Gurion U. and Harvard U. ), Adam Smith ( Penn State ). Publishing information about graphs. - PowerPoint PPT Presentation

Transcript of Graph Analysis with Node Differential Privacy

Page 1: Graph Analysis with  Node Differential  Privacy

1

Graph Analysis with Node Differential Privacy

Sofya RaskhodnikovaPenn State University

Joint work with Shiva Kasiviswanathan (GE Research), Kobbi Nissim (Ben-Gurion U. and Harvard U.),

Adam Smith (Penn State)

Page 2: Graph Analysis with  Node Differential  Privacy

2

Publishing information about graphs

Many datasets can be represented as graphs• “Friendships” in online social network• Financial transactions• Email communication• Romantic relationships

American J. Sociology,  Bearman, Moody, Stovel

image source http://community.expressor-software.com/blogs/mtarallo/36-extracting-data-facebook-social-graph-expressor-tutorial.html

Privacy is a big issue!

Page 3: Graph Analysis with  Node Differential  Privacy

Differential privacy for graph data

Differential privacy [Dwork McSherry Nissim Smith 06]An algorithm A is -differentially private if for all pairs of neighbors and all sets of answers S:

𝑷𝒓 [ 𝑨 (𝑮 )∈𝑺 ] ≤𝒆𝝐𝑷𝒓 [ 𝑨 (𝑮′ )∈𝑺 ]

Graph G

Aqueries

answers)(

Government,researchers,businesses

(or) maliciousadversary

3image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

Trustedcurator

Users

Page 4: Graph Analysis with  Node Differential  Privacy

4

Two variants of differential privacy for graphs

• Edge differential privacy

Two graphs are neighbors if they differ in one edge.

• Node differential privacy

Two graphs are neighbors if one can be obtained from the other by deleting a node and its adjacent edges.

G: G:

G: G:

Page 5: Graph Analysis with  Node Differential  Privacy

Node differentially private analysis of graphs

5image source http://www.queticointernetmarketing.com/new-amazing-facebook-photo-mapper/

• Two conflicting goals: utility and privacy– Impossible to get both in the worst case

• Previously: no node differentially private algorithms that are accurate on realistic graphs

Graph G

Aqueries

answers)(

Government,researchers,businesses

(or) maliciousadversary

Trustedcurator

Users

Page 6: Graph Analysis with  Node Differential  Privacy

6

Our contributions

• First node differentially private algorithms that are accurate for sparse graphs– node differentially private for all graphs– accurate for a subclass of graphs, which includes

• graphs with sublinear (not necessarily constant) degree bound• graphs where the tail of the degree distribution is not too heavy• dense graphs

• Techniques for node differentially private algorithms• Methodology for analyzing the accuracy of such

algorithms on realistic networks

Concurrent work on node privacy [Blocki Blum Datta Sheffet 13]

Page 7: Graph Analysis with  Node Differential  Privacy

7

• Node differentially private algorithms for releasing – number of edges– counts of small subgraphs

(e.g., triangles, -triangles, -stars)– degree distribution

• Accuracy analysis of our algorithms for graphs with not-too-heavy-tailed degree distribution: with -decay for constant Notation: fraction of nodes in G of degree

– Every graph satisfies -decay– Natural graphs (e.g., “scale-free” graphs, Erdos-Renyi) satisfy

Our contributions: algorithms

A graph G satisfies -decay if for all

𝒅 𝑡 ⋅𝒅

≤ 𝒕−𝜶

… …

Frequency

Degrees…

……

Page 8: Graph Analysis with  Node Differential  Privacy

8

Our contributions: accuracy analysis• Node differentially private algorithms for releasing

– number of edges– counts of small subgraphs

(e.g., triangles, -triangles, -stars)– degree distribution

• Accuracy analysis of our algorithms for graphs with not-too-heavy-tailed degree distribution: with -decay for constant

– number of edges– counts of small subgraphs (e.g., triangles, -triangles, -stars)– degree distribution

A graph G satisfies -decay if for all

(1+o(1))-approximation

}

……

Page 9: Graph Analysis with  Node Differential  Privacy

9

Previous work ondifferentially private computations on graphs

Edge differentially private algorithms• number of triangles, MST cost [Nissim Raskhodnikova Smith 07]• degree distribution [Hay Rastogi Miklau Suciu 09, Hay Li Miklau Jensen 09,

Karwa Slavkovic 12]• small subgraph counts [Karwa Raskhodnikova Smith Yaroslavtsev 11]• cuts [Blocki Blum Datta Sheffet 12]

Edge private against Bayesian adversary (weaker privacy)• small subgraph counts [Rastogi Hay Miklau Suciu 09]

Node zero-knowledge private (stronger privacy)• average degree, distances to nearest connected, Eulerian,

cycle-free graphs for dense graphs [Gehrke Lui Pass 12]

Page 10: Graph Analysis with  Node Differential  Privacy

10

Challenge for node privacy: high local sensitivity• Local sensitivity [NRS’07]:

• Global sensitivity [DMNS’06]:

For many functions of the data, node is high.• Consider adding a node connected to all other nodes.• Examples: (G)= |E(G)|. (G)= # of s in G.

𝑳𝑺 𝒇 (𝑮 )= max𝐺 ′ :𝐧 𝐞𝐢𝐠𝐡𝐛𝐨𝐫 of 𝐺

|𝑓 (𝐺 )− 𝑓 (𝐺′ )|

Edge is 1; node is for all G. Edge is ; node is |E(G)|.

Page 11: Graph Analysis with  Node Differential  Privacy

11

“Projections” on graphs of small degree

Let = family of all graphs, = family of graphs of degree .Notation. = node over.

= node over .Observation. is low for many useful .Examples: = (compare to = ) = (compare to = )

Idea: ``Project’’ on graphs in for a carefully chosen d << n.

𝓖𝓖𝑑

Goal: privacy for all graphs

Page 12: Graph Analysis with  Node Differential  Privacy

12

Method 1: Lipschitz extensions

• Release via GS framework [DMNS’06]

• Requires designing Lipschitz extension for each function – we base ours on maximum flow and linear and convex programs

𝓖𝓖𝑑

low

high

𝒇 ′= 𝒇

=

A function is a Lipschitz extension of from to if

agrees with on and=

Page 13: Graph Analysis with  Node Differential  Privacy

13

Lipschitz extension of : flow graph

For a graph G=(V, E), define flow graph of G:

Add edge iff .(G) is the value of the maximum flow in this graph.Lemma. (G)/2 is a Lipschitz extension of .

s

1

3

5

1'

3'

5'

t

2

4

2'

4'

𝑑1

𝑑

Page 14: Graph Analysis with  Node Differential  Privacy

14

Lipschitz extension of : flow graph

For a graph G=(V, E), define flow graph of G:

Add edge iff .(G) is the value of the maximum flow in this graph.Lemma. (G)/2 is a Lipschitz extension of .Proof: (1) (G) = for all G (2) = 2⋅

s

1

3

5

1'

3'

5'

t

2

4

2'

4'

𝑑 𝑑1

deg (𝑣 )/¿ deg (𝑣 )/¿1/

Page 15: Graph Analysis with  Node Differential  Privacy

15

Lipschitz extension of : flow graph

For a graph G=(V, E), define flow graph of G:

(G) is the value of the maximum flow in this graph.Lemma. (G)/2 is a Lipschitz extension of .Proof: (1) (G) = for all G (2) = 2⋅ = 2

s

1

3

5

1'

3'

5'

t

2

4

2'

4'

𝑑 𝑑1

6'

𝑑 𝑑

6

Page 16: Graph Analysis with  Node Differential  Privacy

16

For a graph G=([n], E), define LP with variables for all triangles :

(G) is the value of LP.Lemma. (G) is a Lipschitz extension of .

• Can be generalized to other counting queries• Other queries use convex programs

Lipschitz extensions via linear/convex programs

Maximize

for all triangles for all nodes

∑𝑇=△ of 𝐺

𝑥𝑇

¿𝝏𝒅 𝒇 △

Page 17: Graph Analysis with  Node Differential  Privacy

17

Method 2: Generic reduction to privacy over

• Time(A) = Time(B) + O(m+n)• Reduction works for all functions How it works: Truncation T(G) outputs G with nodes of degree removed.• Answer queries on T(G) instead of G

𝓖𝓖𝑑

low

high

𝑻

Input: Algorithm B that is node-DP over Output: Algorithm A that is node-DP over , has accuracy similar to B on “nice” graphs

via Smooth Sensitivity framework [NRS’07] via finding a DP upper bound on [Dwork Lei 09, KRSY’11]

and running any algorithm that is -node-DP over

T query f

+ noiseT(G)

G

S (G)

A

Page 18: Graph Analysis with  Node Differential  Privacy

18

Generic Reduction via Truncation• Truncation T(G) removes

nodes of degree .• On query , answer

How much noise?• Local sensitivity of as a map

• Lemma., where = nodes of degree .

• Global sensitivity is too large.

… …d

Frequency

Degrees

Nodes that determine

𝐿𝑆𝑇 (𝐺 )= max𝐺′ :𝐧𝐞𝐢𝐠𝐡𝐛𝐨𝐫 of 𝐺

𝑑𝑖𝑠𝑡 (𝑇 (𝐺 ) ,𝑇 (𝐺′ ))

Page 19: Graph Analysis with  Node Differential  Privacy

19

Smooth Sensitivity of Truncation

Lemma. is a smooth bound for , computable in time

“Chain rule”: is a smooth bound for

Smooth Sensitivity Framework [NRS ‘07]is a smooth bound on local sensitivity ofif

– for all neighbors

T query f

+ noiseT(G)

G

S (G)

A

Page 20: Graph Analysis with  Node Differential  Privacy

20

Utility of the Truncation Mechanism

Lemma. If we truncate to a random degree in ,

• Application to releasing the degree distribution: an -node differentially private algorithm such that

with probability at least if satisfies -decay for

Utility: If G is -bounded, expected noise magnitude is .

T query f

+ noiseT(G)

G

S (G)

A

Page 21: Graph Analysis with  Node Differential  Privacy

21

Techniques used to obtain our results• Node differentially private algorithms for releasing

– number of edges– counts of small subgraphs (e.g., triangles, -triangles, -stars)– degree distribution

via Lipschitz extensions

} via generic reduction

Page 22: Graph Analysis with  Node Differential  Privacy

22

Conclusions• It is possible to design node differentially private algorithms

with good utility on sparse graphs– One can first test whether the graph is sparse privately

• Directions for future work– Node-private algorithm for releasing cuts– Node-private synthetic graphs– What are the right notions of privacy for graph data?