Findingthe&Most&Appropriate&Auxiliary&Data&& …...ThetruenodeOverlap(isthefractionofthe nodesinF...

1
The true node Overlap is the fraction of the nodes in F aux that appear in F anon Finding the Most Appropriate Auxiliary Data for Social Graph Deanonymization Priya Govindan* Sucheta Soundarajan Tina EliassiRad * [email protected] Selecting the most appropriate auxiliary data for deanonymization of F anon reduces to the problem of predicting the amount of node overlap between F anon and F aux Given no additional info, an adversary can identify graphs with low Overlap with F anon Given labels for some of the nodes, an adversary can predict the Overlap quite well. If also given some seed matches, an adversary can estimate the Lookalikes for the given graph pair Given labels for one graph, an adversary can learn to predict Overlap in another graph. 1. No link structure 2. Nodes have many lookalikes (i.e., similar structural features) 3. DifNicult to distinguish between nodes in F anon that are present in F aux and those that are absent in F aux How can an adversary select the most appropriate auxiliary graph to breach the privacy of individuals in an anonymized graph? The features for each node in F anon are: 1. Node’s degree 2. Avg. degree of node’s neighbors 3. Node’s clustering coefNicient 4. Avg. clustering coefNicient of node’s neighbors 5. # of edges between node’s neighbors 6. # of nodes adjacent to node’s neighbors 7. # of edges outgoing from the node’s neighbors Given: F anon , F aux , matches for m% nodes Predicted Overlap = ratio of predicted ‘present’ labels, by learning a classiNier on labeled nodes Predicted Lookalikes = Average lookalikes of seed matches Given: F anon , F1 aux , F2 aux , labels on all nodes of F1 aux A classiNier is trained on the labels of F1 aux ; is used to predict labels on the nodes of F2 aux Predicted Overlap between F anon and F2 aux = ratio of predicted ‘present’ labels, by the classiNier Lookalikes for a node x in F aux is the number of nodes in F anon that are at least as similar to x, as its matching node x’ in F anon Goal of the adversary Find an auxiliary graph G aux , whose structural feature matrix (F aux ) has high Overlap and low Lookalikes with F anon Lookalikes for a graphpair is the average lookalikes of nodes in F aux , normalized by size of F anon Given: F anon , F aux , and labels (present/absent) for k nodes selected uniformly at random Predicted Overlap = ratio of `present’ labels in k seeds Given: F anon and F aux Predicted !"#$%&' = !"#$%&% !"#$%&' × 1 !"#$%&&" !"#$%&'( ! !"#" , !"#$%&'( ! !"# Case 2: Adversary has labels/ matches for some nodes Case 3: Adversary has labels on another auxiliary graph Case 1: Adversary has no side info DADNANEEOEENE1 2 3 n Nodes Local and Egonet based Features Structural Feature Matrix of the Anonymized Graph Organization Owns Releases Social Graph Anonymize Anonymized graph F anon When the predicted nodeoverlap is low (< 0.5), then the true nodeoverlap is also low (< 0.2) Given 10 seedlabels chosen uniformly at random, overlap can be predicted quite well (MAE = 0.03 ) Given 10% seedmatches, overlap can be predicted very accurately (~100%) when the predicted lookalikes is < 0.1 Most values lie on the diagonal (with RMSE = 0.14 from the 45degree line), thus transferlearning predictions are deemed good estimates of the true overlap where the Maximum Overlap is the minimum of |F anon | and |F aux |, divided by |F aux |

Transcript of Findingthe&Most&Appropriate&Auxiliary&Data&& …...ThetruenodeOverlap(isthefractionofthe nodesinF...

Page 1: Findingthe&Most&Appropriate&Auxiliary&Data&& …...ThetruenodeOverlap(isthefractionofthe nodesinF aux thatappearinF anon $ Findingthe&Most&Appropriate&Auxiliary&Data&& for&Social&Graph&Deanonymization&&

The  true  node  Overlap  is  the  fraction  of  the  nodes  in  Faux  that  appear  in  Fanon  

Finding  the  Most  Appropriate  Auxiliary  Data    for  Social  Graph  Deanonymization    Priya  Govindan*          Sucheta  Soundarajan          Tina  Eliassi-­‐Rad  

*  [email protected]    

•  Selecting  the  most  appropriate  auxiliary  data  for  deanonymization  of    Fanon  reduces  to  the  problem  of  predicting  the  amount  of  node-­‐overlap  between  Fanon  and  Faux  

•  Given  no  additional  info,  an  adversary  can  identify  graphs  with  low  Overlap  with  Fanon  

•  Given  labels  for  some  of  the  nodes,  an  adversary  can  predict  the  Overlap  quite  well.  If  also  given  some  seed  matches,  an  adversary  can  estimate  the  Lookalikes  for  the  given  graph  pair  

•  Given  labels  for  one  graph,  an  adversary  can  learn  to  predict  Overlap  in  another  graph.  

1.  No  link  structure    

2.  Nodes  have  many    lookalikes  (i.e.,  similar    structural  features)  

3.  DifNicult  to  distinguish    between    nodes  in  Fanon  that  are  present  in  Faux  and  those  that  are  absent  in  Faux  

How  can  an  adversary  select  the  most  appropriate  auxiliary  graph  to  breach  the  privacy  of  individuals  in  an  anonymized  graph?    

The  features  for  each  node  in  Fanon  are:  1.  Node’s  degree  2.  Avg.  degree  of  node’s  neighbors  3.  Node’s  clustering  coefNicient  4.  Avg.  clustering  coefNicient  of  node’s  neighbors    

5.  #  of  edges  between  node’s  neighbors  6.  #  of  nodes  adjacent  to  node’s  neighbors  7.  #  of  edges  outgoing  from  the  node’s  neighbors    

Given:  Fanon  ,  Faux  ,  matches  for  m%  nodes  

Predicted  Overlap  =  ratio  of  predicted  ‘present’  labels,  by  learning  a  classiNier  on  labeled  nodes  

Predicted  Lookalikes  =  Average  lookalikes  of  seed  matches  

 

 

Given:  Fanon  ,  F1aux  ,  F2aux  ,  labels  on  all  nodes  of  F1aux  A  classiNier  is  trained  on  the  labels  of  F1aux;  is  used  to  predict  labels  on  the  nodes  of  F2aux    

Predicted  Overlap  between  Fanon  and  F2aux  =  ratio  of  predicted  ‘present’  labels,  by  the  classiNier  

 

   

Lookalikes  for  a  node  x  in  Faux    is  the  number  of  nodes  in  Fanon  that  are  at  least  as  similar  to  x,  as  its  matching  node  x’  in  Fanon  

Goal  of  the  adversary  Find  an  auxiliary  graph  Gaux,  whose    structural  feature  matrix  (Faux)  has  high  Overlap  and  low  Lookalikes  with  Fanon  

Lookalikes  for  a  graph-­‐pair  is  the  average  lookalikes  of  nodes  in  Faux,  normalized  by  size  of  Fanon  

Given:  Fanon  ,  Faux  ,  and  labels  (present/absent)  for  k  nodes  selected  uniformly  at  random  

Predicted  Overlap  =  ratio  of  `present’  labels  in  k  seeds  

Given:  Fanon  and  Faux  

Predicted!!"#$%&' = !"#$%&%!!"#$%&'!×!1− !"#$%&&"! !"#$%&'( !!"#" ,!"#$%&'( !!"# !

Case  2:  Adversary  has  labels/matches  for  some  nodes  

Case  3:  Adversary  has  labels  on  another  auxiliary  graph    

Case  1:  Adversary  has  no  side  info   D egree Clust. Coeff.

Avg Degree of N brs

Avg Clust. Coeff. of N brs

# Edges in E gonet

# Outgoing Edges from E gonet

# Nbrs of E gonet

1 2 3 … … … … … … … … … … … … … … … … … … … … … … … … … n

Structural Feature Table, F

Nodes

Local and Egonet based Features

Structural  Feature  Matrix    of  the  Anonymized  Graph  

Organization  

Owns

Releases

Social  Graph  

Anonym

ize

Anonymized  graph  

Fanon  

When  the  predicted  node-­‐overlap  is    low    (<  0.5),  then  the  true  node-­‐overlap    is  also  low  (<  0.2)  

Given  10  seed-­‐labels  chosen  uniformly  at  random,  overlap  can  be  predicted  quite  well  (MAE  =  0.03  )    

Given  10%  seed-­‐matches,  overlap  can    be  predicted  very  accurately  (~100%)    when  the  predicted  lookalikes  is  <  0.1    

Most  values  lie  on  the  diagonal  (with    RMSE  =  0.14  from  the  45-­‐degree  line),    thus  transfer-­‐learning  predictions  are  deemed  good  estimates  of  the  true  overlap  

where  the  Maximum  Overlap  is  the  minimum  of  |Fanon|  and  |Faux|,  divided  by  |Faux|