De-anonymizing Social Networks

Community-Enhanced De-anonymization of Online Social Networks, by Shirin Nilizadeh, Apu Kapadia & Yong-Yeol Ahn. Presented by Elaine Aryeetey


Page 1: De- anonymizing  Social Networks

Community-Enhanced De-anonymization of Online Social Networks

By Shirin Nilizadeh, Apu Kapadia & Yong-Yeol Ahn

Presented By Elaine Aryeetey

Page 2: De- anonymizing  Social Networks

Content
• Introduction
• Terms & Definitions
• Research Purpose
• NS Algorithm
• De-Anonymization Model
• Experiment
• Conclusion
• Q&A

Page 3: De- anonymizing  Social Networks

Introduction
• Online social networks (OSNs) are very popular
• Data is released to marketers, academic researchers, and developers of new applications
• Can providers profit from the data while honoring privacy?
• The identity of the user is removed

Page 4: De- anonymizing  Social Networks

Introduction

Page 5: De- anonymizing  Social Networks

Introduction

Page 6: De- anonymizing  Social Networks

OSN Providers

Page 7: De- anonymizing  Social Networks

Terms & Definitions
• De-anonymization: Data mining strategy in which anonymous data is cross-referenced with other data sources to re-identify the anonymous data source.
• Reference Graph: Network graph used for cross-referencing
• Anonymized Graph: OSN datasets sold to third parties with user identities removed
• Seed: Known common identities in the two graphs

Page 8: De- anonymizing  Social Networks

Terms & Definitions
• Community: Group of nodes (people) that are densely connected to each other while having fewer connections to nodes outside the community
• Anonymity: The state of not being identifiable within a set of subjects
• Noise: How the reference graph differs from the anonymized graph

Page 9: De- anonymizing  Social Networks

Social Network Graph

Page 10: De- anonymizing  Social Networks

Social Network Graph

Page 11: De- anonymizing  Social Networks

Attack Model
• An adversary with data may try to de-anonymize it
• Has access to two networks, G(V, E) and G′(V′, E′)
• We focus on the cases where V ≈ V′ and E ≈ E′, i.e., where the vertices and edges are approximately the same

Page 12: De- anonymizing  Social Networks

Some proposed De-Anonymization Approaches
• Adversarial models where the attacker has attribute information from the reference network

Figure panels: Public Flickr Network; Anonymized Flickr Network

Page 13: De- anonymizing  Social Networks

Some proposed De-Anonymization Approaches
• An attacker may have access to the social network structure with real identities

Figure panels: Twitter Network; Instagram Network

Page 14: De- anonymizing  Social Networks

Research Purpose
Study the problem of de-anonymization using network alignment techniques.
Why? Datasets can be de-anonymized using network alignment techniques to map nodes from the reference graph into the anonymized graph.

Page 15: De- anonymizing  Social Networks

Graph Definition
• DEFINITION 1. A graph G(V, E) consists of a set of nodes V that represents the users in the network and a set of undirected edges E ⊆ {e = (u, v) : u, v ∈ V}. We denote the degree of a node v by k_v. Let N = |V| be the total number of nodes in G.
• DEFINITION 2. A graph G's community structure (C) is a disjoint partition of the vertices in G, namely C = {c_1, c_2, . . . , c_m}.
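The two definitions can be made concrete with a small sketch. The adjacency-map representation, the helper names `build_graph` and `is_disjoint_partition`, and the toy edge list are all illustrative assumptions, not part of the presented work.

```python
from collections import defaultdict

def build_graph(edges):
    """Return an adjacency map {node: set(neighbors)} for undirected edges."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)

def is_disjoint_partition(communities, nodes):
    """Check Definition 2: every node belongs to exactly one community."""
    seen = set()
    for community in communities:
        if seen & community:      # a node appears in two communities
            return False
        seen |= community
    return seen == set(nodes)     # no node is left without a community

# Toy graph (hypothetical data): a triangle a-b-c with a tail c-d-e.
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
G = build_graph(edges)
degree = {v: len(neighbors) for v, neighbors in G.items()}  # k_v
N = len(G)                                                  # N = |V|
C = [{"a", "b", "c"}, {"d", "e"}]                           # community structure
```

Here `degree["c"]` is 3 and `C` passes the partition check because the two communities do not overlap and together cover all five nodes.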

Page 16: De- anonymizing  Social Networks

Re-identification algorithm by Narayanan and Shmatikov (NS)
Won a link prediction contest on an anonymized data set on Kaggle
1. Seed detection: Maps a small number of users (seeds) between two networks by searching for unique subgraphs
2. Propagation: Expands the set of matched users by incrementally comparing and mapping the neighbors of the previously mapped seeds.

Page 17: De- anonymizing  Social Networks

Re-identification algorithm by Narayanan and Shmatikov (NS)
Propagation
• Randomly picks an already-mapped node pair (u, u′) ∈ M, where u ∈ V, u′ ∈ V′
• Randomly picks a node v from the set of unmapped neighbors of u, then compares it with each unmapped node v′ in the set of unmapped neighbors of u′ using the similarity score

$$S(v, v') = \frac{\left|\{(w, w') : w \in N(v),\ w' \in N(v'),\ (w, w') \in M\}\right|}{\sqrt{k_v k_{v'}}}$$

• Uniqueness measured by eccentricity
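A minimal sketch of the similarity score and the eccentricity check follows. It assumes graphs stored as adjacency maps and the mapping M stored as a dict from reference nodes to anonymized nodes. Eccentricity is taken as the gap between the best and second-best score divided by the standard deviation of the scores; using the population standard deviation is an assumption. The function names are illustrative.

```python
import math
import statistics

def similarity(v, v_prime, adj, adj_prime, mapping):
    """S(v, v'): count already-mapped neighbor pairs, normalized by sqrt(k_v * k_v')."""
    neighbors_prime = adj_prime[v_prime]
    # w is a neighbor of v; it counts if its image under M is a neighbor of v'.
    shared = sum(1 for w in adj[v] if mapping.get(w) in neighbors_prime)
    return shared / math.sqrt(len(adj[v]) * len(neighbors_prime))

def eccentricity(scores):
    """(best - second best) / standard deviation; 0.0 when it is undefined."""
    if len(scores) < 2:
        return 0.0
    ordered = sorted(scores, reverse=True)
    sigma = statistics.pstdev(scores)  # population std dev (assumption)
    return 0.0 if sigma == 0 else (ordered[0] - ordered[1]) / sigma

# Toy example (hypothetical data): a triangle versus a triangle with one extra leaf.
adj = {"u": {"v", "w"}, "v": {"u", "w"}, "w": {"u", "v"}}
adj_prime = {"u1": {"v1", "w1"}, "v1": {"u1", "w1"},
             "w1": {"u1", "v1", "x1"}, "x1": {"w1"}}
M = {"u": "u1", "w": "w1"}

scores = [similarity("v", c, adj, adj_prime, M) for c in ("v1", "x1")]
```

The candidate v1 shares both mapped neighbors with v and scores higher than the leaf x1, which shares only one. Candidates are neighbors of mapped nodes, so their degree is at least one and the denominator is never zero.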

Page 18: De- anonymizing  Social Networks

Community-Enhanced De-Anonymization Model

Overview
Community-aware mapping algorithm built upon community-blind mapping algorithms
1. Community detection (disjoint and non-overlapping)
2. Community mapping using already-known seeds and using the network of communities
3. Seed enrichment
4. Global propagation

Page 19: De- anonymizing  Social Networks

De-Anonymization Model
Community Detection

Figure panels: Reference Network; Anonymised Network

Page 20: De- anonymizing  Social Networks

De-Anonymization Model
Community mapping using Seed Identification
Communities associated with seed nodes can be mapped to each other.

Figure panels: Reference Network; Anonymised Network
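As a rough illustration of mapping communities through seeds, the sketch below lets each seed pair vote for a community pair and keeps the most frequent target. The majority-vote rule, the function name, and the toy data are assumptions made for illustration; the slide only states that communities associated with seed nodes can be mapped to each other.

```python
from collections import Counter

def map_communities_by_seeds(seeds, community_of, community_of_prime):
    """Map each reference community to the anonymized community that
    receives the most seed votes (majority vote is an assumption)."""
    votes = {}
    for u, u_prime in seeds.items():
        c = community_of[u]                    # community of the seed in G
        c_prime = community_of_prime[u_prime]  # community of its match in G'
        votes.setdefault(c, Counter())[c_prime] += 1
    return {c: counts.most_common(1)[0][0] for c, counts in votes.items()}

# Hypothetical data: three seed pairs spread over two communities.
seeds = {"a": "x1", "b": "x2", "d": "x4"}
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}
community_of_prime = {"x1": "A", "x2": "A", "x3": "A", "x4": "B", "x5": "B"}

community_map = map_communities_by_seeds(seeds, community_of, community_of_prime)
```

Communities that contain no seed get no entry in the result, which is why a second mapping step over the network of communities is still needed.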

Page 21: De- anonymizing  Social Networks

De-Anonymization Model
Community mapping using network of communities
Each community is a node, and a weighted edge between two communities represents the number of connections between nodes in the two communities
• Computes the similarity score S for each neighbor of the mapped nodes in the right graph
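The network of communities described above can be sketched by collapsing every node into its community and counting the edges that cross community boundaries. The function name and the toy data are illustrative assumptions.

```python
from collections import Counter

def community_graph(edges, community_of):
    """Collapse nodes into communities; an edge weight is the number of
    node-level edges running between the two communities."""
    weights = Counter()
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        if cu != cv:  # edges inside one community do not create a link
            weights[frozenset((cu, cv))] += 1
    return weights

# Hypothetical data: community 0 = {a, b, c}, community 1 = {d, e}.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("c", "e"), ("d", "e")]
community_of = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1}

weights = community_graph(edges, community_of)
```

The resulting weighted graph has one node per community, so a node-level mapping algorithm can be run on it with the seed-mapped communities as starting points.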

Page 22: De- anonymizing  Social Networks

De-Anonymization Model
Seed Enrichment & Local Propagation: Finding more seeds at the community level using distance metrics

Figure panels: Reference Network; Anonymised Network

Page 23: De- anonymizing  Social Networks

De-Anonymization Model
Seed Enrichment & Local Propagation
Distance metric
Nodes are matched and identified as seeds if either their degree or their clustering coefficients are similar enough and above a certain eccentricity threshold.

$$D_d(v_i, v_j) = \frac{|d(v_i) - d(v_j)|}{\max(d(v_i), d(v_j))}$$
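The degree distance is a relative difference, so it stays between 0 (identical degrees) and values close to 1 (very different degrees). A minimal sketch, where the function name and the guard for two zero-degree nodes are assumptions:

```python
def degree_distance(d_i, d_j):
    """D_d(v_i, v_j) = |d(v_i) - d(v_j)| / max(d(v_i), d(v_j))."""
    largest = max(d_i, d_j)
    # Two isolated nodes have identical degree; return 0.0 instead of dividing by zero.
    return 0.0 if largest == 0 else abs(d_i - d_j) / largest

close = degree_distance(8, 10)  # degrees 8 and 10 differ by 20% of the larger one
far = degree_distance(1, 10)    # degrees 1 and 10 differ by 90% of the larger one
```

A pair such as (8, 10) is a much better seed candidate than (1, 10) under this metric.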

Page 24: De- anonymizing  Social Networks

De-Anonymization Model
Global Propagation
Applies the community-blind mapping algorithm to the whole network using all the currently mapped nodes as seeds. Necessary because
1. Communities may not be mapped correctly
2. Communities may not be mapped at all

Page 25: De- anonymizing  Social Networks

De-Anonymization Model
Degree Of Anonymity: Estimating the degree of anonymity in the anonymized network, given G(V, E) (reference) and G′(V′, E′) (anonymized)

Page 26: De- anonymizing  Social Networks

De-Anonymization Model
Anonymity for a user u ∈ V

Normalized degree of anonymity for user u

Degree of anonymity for the whole system
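One common way to compute these three quantities is with Shannon entropy, and the bit values reported in the results slides appear consistent with it. The sketch below assumes that formulation: per-user anonymity is the entropy of P(u ∼ u′) over all candidates u′, the normalized value divides by the maximum entropy, and the system value averages over users. The averaging step and all function names are assumptions.

```python
import math

def anonymity_bits(probs):
    """Shannon entropy, in bits, of the distribution P(u ~ u') over candidates u'."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def normalized_anonymity(probs):
    """Entropy divided by its maximum, log2(number of candidates)."""
    n = len(probs)
    return 0.0 if n < 2 else anonymity_bits(probs) / math.log2(n)

def system_anonymity(per_user_probs):
    """Mean of the normalized per-user values (averaging is an assumption)."""
    values = [normalized_anonymity(p) for p in per_user_probs]
    return sum(values) / len(values)

# Hypothetical users: one fully anonymous among 4 candidates, one fully re-identified.
hidden_user = [0.25, 0.25, 0.25, 0.25]
exposed_user = [1.0, 0.0, 0.0, 0.0]
overall = system_anonymity([hidden_user, exposed_user])
```

A uniform distribution gives a normalized value of 1 (the attacker learned nothing), while a distribution concentrated on a single candidate gives 0.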

Page 27: De- anonymizing  Social Networks

De-Anonymization Model
Degree Of Anonymity For Community Blind
P(u ∼ u′ | M) can be assigned for each u ∈ V:
1. If u is mapped by the algorithm to z′, the vertices u′ ∈ V′ can be partitioned as:
a. Mapped node z′, (u, z′) ∈ M.
b. Nodes y′ not mapped to u, (u, y′) ∉ M.
2. If u is not mapped, (u, u′) ∉ M, we consider the entire mapping: P(u ∼ u′ | M) = 1/|V′|

Page 28: De- anonymizing  Social Networks

De-Anonymization Model
Degree Of Anonymity For Community Aware
• Set of communities in G′ not mapped.
• c ↔ c′: community mapped by the algorithm.
• P(u ∼ u′ | , ) can be assigned values for all u′ ∈ V′ based on the following cases

Page 29: De- anonymizing  Social Networks

De-Anonymization Model
Degree Of Anonymity For Community Aware
1. If u is mapped to z′, and u ∈ c where c is mapped to c′ and z′ ∈ c′, V′ is partitioned as:
a. Mapped node z′
b. Nodes y′ within c′ not mapped to u
c. The remaining nodes r′ not in c′
p_map,1a + p_map,1b + p_map,1c = 1

Page 30: De- anonymizing  Social Networks

De-Anonymization Model
Degree Of Anonymity For Community Aware
2. If u is mapped to z′, where u ∈ c and c is mapped to c′ and z′ ∉ c′, V′ is partitioned as:
a. Mapped node z′
b. Nodes y′ within c′
c. Remaining nodes r′ not in c′ and not mapped to u
p_map,2a + p_map,2b + p_map,2c = 1

Page 31: De- anonymizing  Social Networks

De-Anonymization Model
Degree Of Anonymity For Community Aware
3. If u is mapped to z′, where u ∈ c and c is not mapped to any community, V′ is partitioned as:
a. Mapped node z′
b. Remaining nodes r′ not mapped to u
4. If u is not mapped to any node in G′ and the community of u is not mapped to any community:
a. Correct mapping is within the entire set

Page 32: De- anonymizing  Social Networks

De-Anonymization Model
Degree Of Anonymity For Community Aware
5. If u is not mapped to any node in G′ but the community of u, c, is mapped to community c′:
a. Nodes within c′
b. Remaining nodes not in c′

Page 33: De- anonymizing  Social Networks

Experiments
• An ensemble of networks with the same number of nodes, edges, noise level, and type of noise
• Simpler model used
• Prepare a copy of the original network, partially alter its structure, and compare the network alignment performance of two approaches: community-aware and community-blind
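The "copy and partially alter" step can be sketched as follows. The noise model here is an assumption chosen for illustration: remove a fraction of the edges and add the same number of random new ones, which keeps the node and edge counts unchanged. The slides do not specify which type of noise was used, and `add_noise` is a hypothetical helper.

```python
import random

def add_noise(edges, nodes, noise_level, seed=0):
    """Return a noisy copy of an undirected edge set: drop a fraction of the
    edges and add as many random edges that were not in the original."""
    rng = random.Random(seed)
    original = {tuple(sorted(e)) for e in edges}
    k = round(noise_level * len(original))            # number of edges to replace
    removed = set(rng.sample(sorted(original), k))
    noisy = original - removed
    node_list = sorted(nodes)
    # Assumes the graph has at least k non-edges; otherwise this loop cannot finish.
    while len(noisy) < len(original):
        u, v = rng.sample(node_list, 2)
        candidate = tuple(sorted((u, v)))
        if candidate not in original:
            noisy.add(candidate)
    return noisy

# Hypothetical data: a ring of 10 nodes with 20% noise.
ring = [(i, (i + 1) % 10) for i in range(10)]
noisy_ring = add_noise(ring, range(10), noise_level=0.2)
```

Calling the function with different `seed` values yields an ensemble of networks that share the same node count, edge count, and noise level.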

Page 34: De- anonymizing  Social Networks

Experiments
Datasets

Data Set                                 Nodes    Edges     Date Range
Collaboration network                    36,458   171,735   Jan 1, 1995 to Mar 31, 2005
Twitter mention network (4 partitions)   90,332   377,588   Mar 24, 2012 to Apr 25, 2012
Twitter mention network (9 partitions)   9,745    50,164    Mar 24, 2012 to Apr 25, 2012

Page 35: De- anonymizing  Social Networks

Experiments - Setup
• Generate an ensemble of 10 networks for each noise level
• Run the InfoMap algorithm
• Attacker has less prior knowledge (only a small number of seeds)
• Small set of initial seeds for both the community-blind and community-aware algorithms
• Performance calculation

Page 36: De- anonymizing  Social Networks

Experiments
Results: Impact of noise


Page 37: De- anonymizing  Social Networks

Experiments
Results
• For 10% noise and 16 seeds, A(G) is 0.45 for the community-aware (CA) algorithm and 0.83 for the community-blind (CB) algorithm (anonymity of 6.81 and 12.57 bits, respectively)
• In the collaboration network, the community-aware algorithm is able to correctly map about 15% of users while the community-blind algorithm can barely re-identify any user. The degree of anonymity is 0.84 and 1 (anonymity of 12.72 and 15.15 bits)
• The difference between the performance of the two algorithms greatly increases when the noise is above 15% and 20%.

Page 38: De- anonymizing  Social Networks

Experiments
Results: Impact of number of seeds

Page 39: De- anonymizing  Social Networks

Experiments
Results
• In the Twitter network with four seeds, the CA algorithm successfully re-identifies 77% of users while the CB algorithm only re-identifies about 7% of the users. The degree of anonymity is about 0.13 and 0.97 (anonymity of 2.14 and 15.97 bits).
• The community-aware algorithm decreases the anonymity by 13.83 additional bits compared to the community-blind algorithm.
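The bit values can be cross-checked with a short calculation. It assumes that the normalized degree of anonymity is scaled by the maximum entropy log2(N), and that these results refer to the 90,332-node Twitter mention network from the datasets slide. Both are assumptions, although the reported numbers agree with them.

```python
import math

def anonymity_in_bits(normalized_degree, num_nodes):
    """Scale a normalized degree of anonymity by the maximum entropy log2(N)."""
    return normalized_degree * math.log2(num_nodes)

TWITTER_NODES = 90_332  # node count taken from the datasets slide

ca_bits = anonymity_in_bits(0.13, TWITTER_NODES)  # community-aware
cb_bits = anonymity_in_bits(0.97, TWITTER_NODES)  # community-blind
extra_reduction = cb_bits - ca_bits
```

Under these assumptions the community-aware and community-blind values come out at roughly 2.14 and 15.97 bits, and their difference at roughly 13.83 bits, matching the slide.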

Page 40: De- anonymizing  Social Networks

Experiments
Results: Network Size
• The performance difference between the community-aware and community-blind algorithms is more obvious when the network is bigger
• On a smaller network, both algorithms perform better at re-identifying users and tolerating noise
• The community-aware approach exhibits a slightly higher error rate in some cases, but most of these occur when the community-blind approach fails completely, and the community-aware approach correctly identifies many more users.

Page 41: De- anonymizing  Social Networks

Experiments
Results: Overlapped Data Set

Page 42: De- anonymizing  Social Networks

Experiments
Results
• The community-aware algorithm reduces the degree of anonymity while the community-blind algorithm fails regardless of the number of seeds (left column)
• The community-blind algorithm fails completely when the noise level is more than 10%, whereas the community-aware algorithm fails when the noise level is more than 30% (right column)

Page 43: De- anonymizing  Social Networks

Conclusion
• The approach doesn't increase time complexity
• The approach is more robust against noise added to the anonymized data set
• It can perform well with fewer known seeds as well as on larger networks
• The approach is not tied to any specific algorithm; other community detection methods and community-blind network alignment algorithms could be 'plugged in' to the framework

Page 44: De- anonymizing  Social Networks

Conclusion
When mapping two networks that are not identical to each other, the community-based mapping algorithm is almost always guaranteed to reduce the anonymity more and find more successful mappings than the community-blind, global mapping algorithm.

Page 45: De- anonymizing  Social Networks

Questions & Comments