My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an...

13
1 My Privacy My Decision: Control of Photo Sharing on Online Social Networks Kaihe Xu, Student Member, IEEE, Yuanxiong Guo, Member, IEEE, Linke Guo, Member, IEEE, Yuguang Fang, Fellow, IEEE, Xiaolin Li, Member, IEEE Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately, it may leak users’ privacy if they are allowed to post, comment, and tag a photo freely. In this paper, we attempt to address this issue and study the scenario when a user shares a photo containing individuals other than himself/herself (termed co-photo for short). To prevent possible privacy leakage of a photo, we design a mechanism to enable each individual in a photo be aware of the posting activity and participate in the decision making on the photo posting. For this purpose, we need an efficient facial recognition (FR) system that can recognize everyone in the photo. However, more demanding privacy setting may limit the number of the photos publicly available to train the FR system. To deal with this dilemma, our mechanism attempts to utilize users’ private photos to design a personalized FR system specifically trained to differentiate possible photo co-owners without leaking their privacy. We also develop a distributed consensus- based method to reduce the computational complexity and protect the private training set. We show that our system is superior to other possible approaches in terms of recognition ratio and efficiency. Our mechanism is implemented as a proof of concept Android application on Facebook’s platform. Index Terms—Social network, photo privacy, secure multi-party computation, support vector machine, collaborative learning 1 I NTRODUCTION O SNS have become integral part of our daily life and has profoundly changed the way we interact with each other, fulfilling our social needs–the needs for social interactions, information sharing, appreciation and respect. It is also this very nature of social media that makes people put more content, including photos, over OSNs without too much thought on the content. However, once something, such as a photo, is posted on- line, it becomes a permanent record, which may be used for purposes we never expect. For example, a posted photo in a party may reveal a connection of a celebrity to a mafia world. Because OSN users may be careless in posting content while the effect is so far-reaching, privacy protection over OSNs becomes an important issue. When more functions such as photo sharing and tagging are added, the situation becomes more compli- cated. For instance, nowadays we can share any photo as we like on OSNs, regardless of whether this photo contains other people (is a co-photo) or not. Currently there is no restriction with sharing of co-photos, on the contrary, social network service providers like Facebook K. Xu, Y. Fang, X. Li are with the Department of Electrical and Computer Engineering, Gainesville, FL 32611, USA. E-mail: xukaihe@ufl.edu, {fang, andyli}@ece.ufl.edu. L. Guo is with Department of Electrical and Computer Engineering, Binghamton University, Binghamton, NY 13902, USA. E-mail: [email protected]. Y. Guo is with School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, OK 74078, USA. E-mail: [email protected]. This work was supported in part by the U.S. National Science Foundation under Grants CNS-1343356 and CNS-1423165. The work of X. Li was partially supported by CCF-1128805 and ACI-1229576. are encouraging users to post co-photos and tag their friends in order to get more people involved. However, what if the co-owners of a photo are not willing to share this photo? Is it a privacy violation to share this co- photo without permission of the co-owners? Should the co-owners have some control over the co-photos? To answer these questions, we need to elaborate on the privacy issues over OSNs. Traditionally, privacy is regarded as a state of social withdrawal. According to Altman’s privacy regulation theory [1][15], privacy is a dialectic and dynamic boundary regulation process where privacy is not static but “a selective control of access to the self or to ones group”. In this theory, “dialectic” refers to the openness and closeness of self to others and “dynamic” means the desired privacy level changes with time according to environment. During the process of privacy regulation, we strive to match the achieved privacy level to the desired one. At the optimum privacy level, we can experience the desired confidence when we want to hide or enjoy the desired attention when we want to show. However, if the actual level of privacy is greater than the desired one, we will feel lonely or isolated; on the other hand, if the actual level of privacy is smaller than the desired one, we will feel over-exposed and vulnerable. Unfortunately, on most current OSNs, users have no control over the information appearing outside their profile page. In [21], Thomas, Grier and Nicol examine how the lack of joint privacy control can inadvertently reveal sensitive information about a user. To mitigate this threat, they suggest Facebook’s privacy model to be adapted to achieve multi-party privacy. Specifically, there should be a mutually acceptable privacy policy This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795 Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Transcript of My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an...

Page 1: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

1

My Privacy My Decision: Control of PhotoSharing on Online Social Networks

Kaihe Xu, Student Member, IEEE, Yuanxiong Guo, Member, IEEE, Linke Guo, Member, IEEE,Yuguang Fang, Fellow, IEEE, Xiaolin Li, Member, IEEE

Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately, it may leak users’privacy if they are allowed to post, comment, and tag a photo freely. In this paper, we attempt to address this issue and study thescenario when a user shares a photo containing individuals other than himself/herself (termed co-photo for short). To prevent possibleprivacy leakage of a photo, we design a mechanism to enable each individual in a photo be aware of the posting activity and participatein the decision making on the photo posting. For this purpose, we need an efficient facial recognition (FR) system that can recognizeeveryone in the photo. However, more demanding privacy setting may limit the number of the photos publicly available to train theFR system. To deal with this dilemma, our mechanism attempts to utilize users’ private photos to design a personalized FR systemspecifically trained to differentiate possible photo co-owners without leaking their privacy. We also develop a distributed consensus-based method to reduce the computational complexity and protect the private training set. We show that our system is superior toother possible approaches in terms of recognition ratio and efficiency. Our mechanism is implemented as a proof of concept Androidapplication on Facebook’s platform.

Index Terms—Social network, photo privacy, secure multi-party computation, support vector machine, collaborative learning

F

1 INTRODUCTION

O SNS have become integral part of our daily lifeand has profoundly changed the way we interact

with each other, fulfilling our social needs–the needsfor social interactions, information sharing, appreciationand respect. It is also this very nature of social mediathat makes people put more content, including photos,over OSNs without too much thought on the content.However, once something, such as a photo, is posted on-line, it becomes a permanent record, which may be usedfor purposes we never expect. For example, a postedphoto in a party may reveal a connection of a celebrityto a mafia world. Because OSN users may be carelessin posting content while the effect is so far-reaching,privacy protection over OSNs becomes an importantissue. When more functions such as photo sharing andtagging are added, the situation becomes more compli-cated. For instance, nowadays we can share any photoas we like on OSNs, regardless of whether this photocontains other people (is a co-photo) or not. Currentlythere is no restriction with sharing of co-photos, on thecontrary, social network service providers like Facebook

• K. Xu, Y. Fang, X. Li are with the Department of Electrical and ComputerEngineering, Gainesville, FL 32611, USA.E-mail: [email protected], {fang, andyli}@ece.ufl.edu.

• L. Guo is with Department of Electrical and Computer Engineering,Binghamton University, Binghamton, NY 13902, USA.E-mail: [email protected].

• Y. Guo is with School of Electrical and Computer Engineering, OklahomaState University, Stillwater, OK 74078, USA.E-mail: [email protected].

This work was supported in part by the U.S. National Science Foundationunder Grants CNS-1343356 and CNS-1423165. The work of X. Li waspartially supported by CCF-1128805 and ACI-1229576.

are encouraging users to post co-photos and tag theirfriends in order to get more people involved. However,what if the co-owners of a photo are not willing to sharethis photo? Is it a privacy violation to share this co-photo without permission of the co-owners? Should theco-owners have some control over the co-photos?

To answer these questions, we need to elaborate onthe privacy issues over OSNs. Traditionally, privacy isregarded as a state of social withdrawal. According toAltman’s privacy regulation theory [1][15], privacy isa dialectic and dynamic boundary regulation processwhere privacy is not static but “a selective control of accessto the self or to ones group”. In this theory, “dialectic”refers to the openness and closeness of self to othersand “dynamic” means the desired privacy level changeswith time according to environment. During the processof privacy regulation, we strive to match the achievedprivacy level to the desired one. At the optimum privacylevel, we can experience the desired confidence whenwe want to hide or enjoy the desired attention when wewant to show. However, if the actual level of privacyis greater than the desired one, we will feel lonely orisolated; on the other hand, if the actual level of privacyis smaller than the desired one, we will feel over-exposedand vulnerable.

Unfortunately, on most current OSNs, users have nocontrol over the information appearing outside theirprofile page. In [21], Thomas, Grier and Nicol examinehow the lack of joint privacy control can inadvertentlyreveal sensitive information about a user. To mitigatethis threat, they suggest Facebook’s privacy model tobe adapted to achieve multi-party privacy. Specifically,there should be a mutually acceptable privacy policy

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 2: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

2

determining which information should be posted andshared. To achieve this, OSN users are asked to specifya privacy policy and a exposure policy. Privacy policy isused to define group of users that are able to access aphoto when being the owner, while exposure policy isused to define group of users that are able to access whenbeing a co-owner. These two policies will together mutu-ally specify how a co-photo could be accessed. However,before examining these policies, finding identities in co-photos is the first and probably the most import step. Inthe rest of this paper we will focus on a RF engine tofind identities on a co-photo.

FR problems over OSNs are easier than a regularFR problem because the contextual information of OSNcould be utilized for FR[20]. For example, people show-ing up together on a co-photo are very likely to befriends on OSNs, and thus, the FR engine could betrained to recognize social friends (people in social circle)specifically. Training techniques could be adapted fromthe off-the-shelf FR training algorithms, but how toget enough training samples is tricky. FR engine withhigher recognition ratio demands more training sam-ples (photos of each specific person), but online photoresources are often insufficient. Users care about privacyare unlikely to put photos online. Perhaps it is exactlythose people who really want to have a photo privacyprotection scheme. To break this dilemma, we proposea privacy-preserving distributed collaborative trainingsystem as our FR engine. In our system, we ask eachof our users to establish a private photo set of theirown. We use these private photos to build personal FRengines based on the specific social context and promisethat during FR training, only the discriminating rules arerevealed but nothing else.

With the training data (private photo sets) distributedamong users, this problem could be formulated as atypical secure multi-party computation problem. Intu-itively, we may apply cryptographic technique to protectthe private photos, but the computational and commu-nication cost may pose a serious problem for a largeOSN. In this paper, we propose a novel consensus-based approach to achieve efficiency and privacy atthe same time. The idea is to let each user only dealwith his/her private photo set as the local train dataand use it to learn out the local training result. Afterthis, local training results are exchanged among users toform a global knowledge. In the next round, each userlearns over his/hers local data again by taking the globalknowledge as a reference. Finally the information will bespread over users and consensus could be reached. Weshow later that by performing local learning in parallel,efficiency and privacy could be achieved at the sametime.

Comparing with previous works, our contributions areas follows.

1) In our paper, the potential owners of shared items(photos) can be automatically identified

with/without user-generated tags.2) We propose to use private photos in a

privacy-preserving manner and social contexts toderive a personal FR engine for any particularuser.

3) Orthogonal to the traditional cryptographicsolution, we propose a consensus-based methodto achieve privacy and efficiency.

The rest of this paper is organized as follows. InSection 2, we review the related works. Section 3 presentsthe formulation of our problem and the assumptions inour study. In Section 4, we give a detailed descriptionof the proposed mechanism, followed by Section 5,conducting performance analysis of the proposed mech-anism. In Section 6, we describe our implementation onAndroid platform with the Facebook SDK and the exten-sive experiments to validate the accuracy and efficiencyof our system. Finally, Section 7 concludes the paper.

2 RELATED WORK

In [12], Mavridis et al. study the statistics of photosharing on social networks and propose a three realmsmodel: “a social realm, in which identities are entities,and friendship a relation; second, a visual sensory realm,of which faces are entities, and co-occurrence in imagesa relation; and third, a physical realm, in which bod-ies belong, with physical proximity being a relation.”They show that any two realms are highly correlated.Given information in one realm, we can give a goodestimation of the relationship of the other realm. In[19], [20], Stone et al., for the first time, propose to usethe contextual information in the social realm and co-photo relationship to do automatic FR. They define apairwise conditional random field (CRF) model to findthe optimal joint labeling by maximizing the conditionaldensity. Specifically, they use the existing labeled photosas the training samples and combine the photo co-occurrence statistics and baseline FR score to improvethe accuracy of face annotation. In [6], Choi et al. discussthe difference between the traditional FR system and theFR system that is designed specifically for OSNs. Theypoint out that a customized FR system for each user isexpected to be much more accurate in his/her own photocollections. A similar work is done in [5], in which Choiet al. propose to use multiple personal FR engines towork collaboratively to improve the recognition ratio.Specifically, they use the social context to select the suit-able FR engines that contain the identity of the queriedface image with high probability.

While intensive research interests lie in FR enginesrefined by social connections, the security and privacyissues in OSNs also emerge as important and crucialresearch topics. In [17], the privacy leakage caused bythe poor access control of shared data in Web 2.0 iswell studied. To deal with this issue, access controlschemes are proposed in [13] and [4]. In these works,flexible access control schemes based on social contexts

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 3: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

3

are investigated. However, in current OSNs, when post-ing a photo, a user is not required to ask for permis-sions of other users appearing in the photo. In [2],Besmer and Lipford study the privacy concerns on photosharing and tagging features on Facebook. A surveywas conducted in [2] to study the effectiveness of theexisting countermeasure of untagging and shows thatthis countermeasure is far from satisfactory: users areworrying about offending their friends when untagging.As a result, they provide a tool to enable users torestrict others from seeing their photos when posted asa complementary strategy to protect privacy. However,this method will introduce a large number of manualtasks for end users. In [18], Squicciarini et al. propose agame-theoretic scheme in which the privacy policies arecollaboratively enforced over the shared data. Each useris able to define his/her privacy policy and exposurepolicy. Only when a photo is processed with owner’sprivacy policy and co-owner’s exposure policy couldit be posted. However, the co-owners of a co-photocannot be determined automatically, instead, potentialco-owners could only be identified by using the taggingfeatures on the current OSNs.

3 PROBLEM STATEMENT AND HYPOTHESES

3.1 Privacy policy and exposure policyIn this paper, we assume that each user i has a privacypolicy Pi(x) and a exposure policy Vi(x) for a specificphoto x. The privacy policy Pi(x) indicates the set ofusers who can access photo x and exposure policy Vi(x)indicates the set of users who can access x when user iis involved. After people on co-photo x are recognizedwith our algorithm as a set I, the set of users who followboth the privacy policy and exposure policy could becalculated by:

S = Pi(x)⋂k∈I

Vk(x) (1)

We assume that our users have defined their privacypolicy and exposure policy and these policies are mod-ifiable. The exposure policy is treated as a private datathat shall not be revealed, and a secure set intersectionprotocol [11] is used to find the access policy S in 1.After the access policy S is established, the co-photo xwill be shared with users in S.

3.2 FR with social contextsAn FR engine for a large-scale social network mayrequire discriminating millions of individuals. It seemsto be a daunting task that could never be accomplished.However, when we decompose it into several personalFR engines, the situation will change for better. Socialcontexts contains a large amount of useful informa-tion which could be utilized as a priori knowledge tohelp the facial recognition[19]. In [12], Mavridis, Kazmiand Toulis develop a three-realm model to study facial

Fig. 1: A friendship graph in visual sensory realm

recognition problems on OSN photos. The three realmsinclude a social realm, in which identities are entities,and friendship a relation; a visual sensory realm, ofwhich faces are entities and occurrence in images arelation; and a physical realm, in which bodies belong,with physical proximity being a relation. It is shownthat the relationship in the social realm and physicalrealm are highly correlated with the relationship in thevisual sensory realm. In this manner, we can use thesocial context to construct a priori distribution Pi overthe identities on the co-photos for user i. With this prioridistribution, while trying to recognize people on the co-photos, the FR engine could focus on a small portionof “close” friends (friends who are geographically closeand interacting frequently with user i).

Fig.1 shows a relational graph in the visual sensoryrealm. We assume that for user i, we can define a thresh-old on the priori distribution Pi to get a small group ofidentities consisting of i and his one-hop neighbors (e.g.,close friends), denoted as the neighborhood Bi. Then ourgoal for the personal FR at user i is to differentiate usersin Bi. For example, in Fig.1, if Bob has a co-photo, weassume that users appear in the photo are among the setof {Divid, Eve, Tom, Bob}.

3.3 FR systemWe assume that useri has a photo set of size Ni ofhimself/herself as his/her private training samples (say,stored on his/her own device such as smart phone).From the private photo set, a user detects and extractsthe faces on each photo with the standard face detectionmethod [23]. For each face, a vector of size p is extractedas the feature vector. Then, for user i, his/her privatetraining set could be written as xi of size Ni × p. Inthe rest of this paper, we use one record and one photointerchangeably to refer one row in xi.

With the private training set, each user will have apersonal FR engine to identify his/her one-hop neigh-bors. The personal FR can be constructed as a multi-classclassification system, where each class is correspondingto one user (himself/herself or one friend). In the restof this paper, we use one class interchangeably with theappearance of one user. In the realm of machine learning,usually a multi-class classification system is constructedby combining several binary classifiers together with theone of the following strategies[7]:• One-against-all method uses winner-take-all strat-

egy. It constructs n binary classifiers for each of

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 4: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

4

n classes. The goal of each binary classifier is todistinguish one class from the rest with a deci-sion function. Hence, the ith decision function fiis trained by taking records from user i as positivesamples and the records from all the other usersas the negative samples. When a testing record xcomes, if fi concludes that it belongs to class i, x islabeled as class i.

• One-against-one method uses max-voting-winstrategy. It constructs n(n − 1)/2 binary classifiers,in which each classifier is aimed to distinguish twoclasses. The idea is that if we can distinguish anytwo classes, then we can identify any of them.Hence, classifier uij is constructed by taking recordsfrom i as positive samples and records from j asnegative ones. Later on when we are trying toidentify a test record x, if uij concludes that x is inclass i, then the vote of class i is added by one. Aftertesting all the n(n− 1)/2 classifiers, x is assigned tothe class with the largest voting value.

However, no matter which method we use, it requiresa centralized node to access all the training samplesfrom each class, which is conflicting with our promisethat the private training samples will not be disclosedduring the whole process. In the rest of this paper wewill focus on how to build the personal FR engineswithout disclosing the private photo sets. Notice thatthe identification criterion could be asymmetric betweendifferent personal FR engines, which means that the wayhow David finds out Bob and how Bob finds out Davidare not the same as shown in Fig. 1. The reason is that,for Bob, his personal FR engine only knows how to findout David from the candidate set (”suspects” for short)of {Bob, David, Eve, Tom}, while for David, his personalFR only knows how to find out Bob from the suspects of{Alice, Bob, David, Tom}. In other words, with differentfriend sets (friendship graph) at each node, the personalFR engines are trained with different negative trainingsamples.

4 SYSTEM OVERVIEWIn this section, we present the detailed description of oursystem. Generally speaking, the consensus result couldbe achieve by iteratively refining the local training re-sult: firstly, each user performs local supervised learningonly with its own training set, then the local resultsare exchanged among collaborators to form a globalknowledge. In the next round, the global knowledge isused to regularize the local training until convergence. Inthis section, firstly, we use a toy system with two usersto demonstrate the principle of our design. Then, wediscuss how to build a general personal FR with morethan two users. Finally, we discuss the scalability of ourdesign at the large scale of OSNs.

4.1 A toy systemSuppose there are only two users user1 and user2 withprivate training data x1 and x2. In order to distinguish

them, we only need to find a binary decision functionf(·). When a probing sample x comes, if f(x) > 0,x belongs to user1 and vice versa. In this paper, thedecision function is determined by the support vectormachine as f(x) = K(w, x) + b, where K(·, ·) is thekernel function and we use linear kernel for the ease ofpresentation. For the training samples xi of size Ni × p,where Ni is the number of training samples, and p is thenumber of features in each training sample. Denote u asu = [w, b] of size (p + 1) × 1, Xi as Xi = [xi, 1] of sizeNi×(p+1) and Yi is a Ni×Ni diagonal matrix indicatingclass labels of samples in Xi on its diagonal elements.Let X1 denote the positive sample set, X2 the negativesample set and a diagonal matrix Π is constructed asa (p + 1) × (p + 1) diagonal matrix with Π(i, i) = 1 fori = 1, 2, . . . , p and Π(p+ 1, p+ 1) = 0. Then, the decisionfunction f(·) can be obtained by solving the followingproblem:

minu,ξ1≥0,ξ2≥0

1

2uTΠu+ C‖ξ1‖+ C‖ξ2‖

s.t. Y1X1u ≥ 1− ξ1,Y2X2u ≥ 1− ξ2.

(2)

In problem (2), by minimizing 12u

TΠu, we find u thatmaximizes the margin between the positive and negativetraining set. The constraints are used to ensure that thedecision function satisfies the training set. ξi is a setof slack variables in case the training samples are notseparable. If a certain positive sample X1k cannot makeX1ku > 1, a positive slack variable ξ1k is assigned sothat X1pu > 1 − ξ1p. Meanwhile, a penalty of Cξ1p isassigned to the objective function, where C is the user-chosen penalty parameter and vice versa for the negativesamples. Notice that the constraints are private trainingdata which are not available for a centralized SVMsolver. Our approach is to split (2) into two subproblemswith their own constraints and an additional constraintu1 = u2 as:

minu1,ξ1≥0

1

4uT1 Πu1 + C‖ξ1‖

s.t. Y1X1u1 ≥ 1− ξ1,u1 = u2, (3a)

minu2,ξ2≥0

1

4uT2 Πu2 + C‖ξ2‖

s.t. Y2X2u2 ≥ 1− ξ2,u1 = u2. (3b)

We can easily show that problem (3) is an identicaltransformation of problem (2) by substituting u = u1 =u2 and putting together the constraints[8]. Problem (3a)and (3b) could be assigned to user1 and user2 accord-ingly and be solved by alternatively optimize u1 andu2. ut1 and ut2 might be very different at the first fewiterations, however, they will slowly reach the consensusas t grows.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 5: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

5

To solve this problem, firstly, we need to find theaugmented Lagrange function with the Language mul-tipliers of {λi} and {αi} as:

L({ui}, {λi}, {αi}) =1

4

∑i=1,2

uTi Πui +∑

i,j=1,2

αTi (ui − uj)

−∑i=1,2

λTi (YiXiui − 1 + ξi) +∑

i,j=1,2

ρ

2‖ui − uj‖2.

(4)In Eq. (4), we omit the Language multipliers of the

slack variables, which can be canceled out in the Wolfedual problem. Here, ρ

2‖ui − uj‖2 is the regularizationterm, which has two roles: (1) It eliminates the conditionthat L is differentiable such that the solution convergesunder far more general conditions. (2) By adjusting theparameter of ρ, we can trade off the speed of conver-gence for better steady-state approximation[8].L could then be minimized in a cyclic fashion: at each

iteration, L is minimized with respect to one variablewhile keeping all other variables fixed. According toAlternating Direction Method of Multipliers (ADMM)[3],update of the variables at each iteration t + 1 could besummarized as follows,

ut+1i = argmin

ui

L(ui, {utj 6=i}, {λti}, {αti});

αt+1i =αti + ρ(ut+1

i − ut+1j ).

(5)

In (5), ui is calculated through the Wolfe dual problem.User i is could compute ut+1

i locally, because it is onlyrelated to Xi, Yi, λti and utj but have nothing to do withXj and Yj . This data isolation property is the essenceof our secure collaborative learning model and the de-tailed security analysis will be presented in Section 5).With KKT conditions and Wolfe dual, detailed iterativeupdates are listed in Eq. (6).

λt+1i = argmax

{−λTi YiXi(Π + 4ρI)−1XT

i Yiλi

+[1 + 2YiXi(Π + 4ρI)−1(αti − αtj − 2ρutj)]Tλi}

ut+1i = 2(Π + 4ρI)−1[XT

i Yiλt+1i − (αti − αtj) + 2ρutj ]

αt+1i = αti + ρ(ut+1

i − ut+1j ).

(6)Generally, the proposed distributed training scheme

of a toy system could be summarized in Algorithm 1.In this Algorithm, uij = F (Xi, Xj) is the computationof classifier uij with Xi as positive training samples andXj as negative training samples. qd(A,B) is a standardquadratic programming solver that gives the optimalsolution of max{− 1

2xTAx + BTx}, and notice that we

omit the constraint of 0 ≤ λ ≤ C for brevity. thresholdis the user-defined stopping criteria, a larger thresholdresults with fewer iterations while a larger discrepancybetween ui and uj . Theorem 4.1 asserts the convergenceof Algorithm 1.

Theorem 4.1: By following Algorithm 1, the resultinguij will converge to the feasible solution u of problem(2) after a certain number of iterations.

Algorithm 1: Iterative Method to Compute uijInput: Positive samples Xi, Negative samples Xj

Output: The classifier uij(·)Initial λ, u0i , u

0j , α

0i , α

0j as vectors of all zeros;

A = 2Xi(Π + 4ρI)−1XTi ;

for t = 0, 1, 2... doB = 1 + 2Xi(Π + 4ρI)−1(αti − αtj − 2ρutj);λt+1 = qd(A,B);ut+1i = 2(Π + 4ρI)−1[XT

i λt+1 − (αti − αtj) + 2ρutj ];

if |ut+1i − uti| <threshold then

break;else

αt+1i = αti + ρ(ut+1

i − ut+1j );

send ut+1i and αt+1

i to user j;request ut+1

j and αt+1j from user j;

endendreturn ut+1

i ;

Proof: For the toy system, problem (2) could bewritten in a general form as follows:

minv,u

F1(u1) + F2(u2)

s.t. Au1 = u2,

u1 ∈ S1, u2 ∈ S2.

(7)

In problem (7), F1(·) is the local problem for user1 andF2(·) is the local problem for user2. A is an identitymatrix to ensure that u1 = u2. It is proved in [8] and[3] that the convergence of problem (7) is guaranteed aslong as one of the following two conditions is true: S1is bounded; or ATA is nonsingular. In our scheme, A isan identity matrix, hence, u1 and u2 will converge to thesame optimal value.

4.2 OSNs with social contextsIn the previous subsection, we show how to build abinary classifier in a toy system with two users. Whenconsidering the practical scenario, each user may havemore than one friend, and thus multi-class classifiersare required. Generally speaking, a multi-class classi-fier is achieved by using one of the two strategies tocombine several binary classifiers: one-against-all andone-against-one. In this section, we analyze their perfor-mance and present mechanisms with the proper strategy.

4.2.1 Two strategies and classifier reuseFirst, let us introduce some notations: we denote user ias the initiator when Xi is used as the positive trainingsamples and user j as the cooperator when Xj is usedas negative samples. We denote a node i in friendshipgraph and its one-hop neighbors as Bi: the neighborhoodof i. A personal FR engine for user i should be trained todistinguish users in Bi. We use a node i on the friendshipgraph interchangeably with user i.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 6: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

6

(a) In the neighborhood of David

(b) In the neighborhood of Bob

Fig. 2: A one-hop neighbor could also be a two-hopneighbor

For the strategy of one-against-all, each user j in Biare associated with a binary classifier fj(·) by making jinitiator and {k ∈ Bi, k 6= j} cooperators. Denote Di thedegree of user i, there will be Di + 1 classifiers and eachclassifier involves Di + 1 users. The cost to build oneclassifier is hence O

(nεTaD̄

), where O

(nε)

is the cost oflocal SVM training with n training items, Ta is numberof iterations to converge and D̄ is the average degreefor a node in friendship graph. Hence, total cost in oneneighborhood is O

(nεTaD̄2

).

For the strategy of one-against-one, 12D̄(D̄ + 1) toy

systems need to be trained. The cost for each toy systemwith 2 users is O

(nεTo

), where O

(nε)

is the cost of localSVM training with n training items, To is number of iter-ations to converge. Hence, total cost in one neighborhoodis O

(nεToD̄2

).

Comparing these two strategies, we can see that theonly difference is the term of Ta and To, average numberof iterations needed to converge for systems with D̄users and 2 users, respectively. Intuitively, To should bemuch smaller than Ta, because less data is considered.Another factor makes one-against-one strategy appear-ing is that we could reuse classifiers among mutualfriends. For example, in Fig. 2, Tom, Bob and David aremutual friends. When working in David’s neighborhood,we need to build classifier of {Tom, Bob} and later on,when working in Bob’s neighborhood, we need to buildanother classifier of {Tom, Bob}. We know that these twoclassifiers are identical and hence could be reused. Thefactor of classifier reuse is highly depend on number ofcomplete subgraphs, which seems to be very commonover OSNs. According to the data research team fromFacebook, the degrees of separation on Facebook is 3.74,meaning that the average distance between any twopeople is only 3.74 hops on friendship graph. This meansfriendship graph contains large numbers of completesubgraphs. In comparison, classifiers cannot be reusedin the one-against-all strategy because they are trainedwith different cooperators. The procedure to establish

classifiers considering classifier reuse is summarized inAlgorithm 2.

Algorithm 2: Classifier Computation AlgorithmInitial as Ci = ∅,∀i ∈ N ;for i ∈ N do

for j ∈ Bi doif uij * Ci then

uij = F (Xi, Xj);uji = −uij ;Ci = {uij , Ci}; Cj = {uji, Cj};

endend

endfor i ∈ N do

for k, j ∈ Bi || k 6= j doif ukj * Ck then

ukj = F (Xk, Xj);else

Request ujk from user j;endCi = {ujk, Ci};

endend

According to Algorithm 2, there are two steps to buildclassifiers for each neighborhood: firstly find classifiersof {self, friend} for each node, then find classifiers of{friend, friend}. Notice that the second step is tricky,because the friend list of the neighborhood owner couldbe revealed to all his/her friends. On the other hand,friends may not know how to communicate with eachother. For this consideration, when building classifiersof {friend, friend}, all the local training results aresend to the neighborhood owner, who will coordinatethe collaborative training processes by forwarding localtraining results to right collaborators. In this manner,friends need not to know who they are working withand how to talk with them.

4.2.2 Stranger detection

When Algorithm 2 is done, user i is able to differentiateall his friend with classifiers in Ci. The only thing remainsto assemble binary classifiers to be a multi-class classifier.In this paper, we construct a decision tree by arrang-ing binary classifiers similarly to the DAGSVM[16]. Inthe original DAGSVM, the tree nodes contains binaryclassifiers. Decisions of left or right is made based onoutput of the tree nodes and class labels are stored at leafnodes. But a limitation of DAGSVM is that it is basedon a strong assumption: users on a co-photo are friends, inother words, DAGSVM will always classify x to be oneof the friends. In reality, this is not the case, we should beprepared of strangers. For example, Bob has a co-photowith him and Alice at a popular attraction spot. It isvery likely that random people could be captured in the

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 7: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

7

David

Tom Alice

Bob

uDT uTA uAB

uDA uTB

uDB

uTA

uTB

uTD

uBA

uDA

uTA

Strangers

Fig. 3: Improved decision tree

photo. False positive errors could be generated if we tryto classify random people as if we know them.

However, detection of strangers (or outliers) is a well-known difficult problem, because there is no such a classof stranger has been involved in training. Intuitively,this stranger class includes everyone other than thesein a certain neighborhood. It requires tons of trainingsamples to construct such a all-embracing class and wesimply cannot afford to it. However, we observed thata photo of a stranger could make the binary classifiersto output contradictory decisions. For example, in Fig.3,a test sample x of Alice cannot make fAD(x) > 0 andfTA(x) > 0 at the same, otherwise, x belongs to Aliceand doesn’t belongs to Alice at the same time. Basically,we propose a stranger rejection mechanism based onthe following two assumptions: (1) If a certain classparticipates in the training process, its probing samplesnever generate contradictory decisions. (2) If a certain classdoes not participate in the training process, its probingsamples will make the classifiers to output unpredictabledecisions and may result with contradictory decisions.

Fig.3 illustrate how DAGSVM is extended to capturecontradictory decisions by adding more tree nodes. In thisextended decision tree, if a probing sample passes allthe classifiers of one class, it is assigned to this class,otherwise, it is classified to be a stranger. Theorem 4.2states correctness of this design.

Theorem 4.2: Assuming a probing sample x istraveling through the DAG decision tree and gets thedestination class i, then the only possible class for x topass all its tests is i.

Proof: The DAG decision tree is based on the exclu-sive method. Each tree node test will rule out a wrongclass. Then, at the leaf node, only one possible classremains. If a sample x exists such that the DAG decisionis i while it could pass all the tests of class j, we willget the conflicting result that fij(x) > 0 and fij(x) < 0.Hence, the only possible class of x is the DAG decisionclass i.

Notice that the proposed stranger detection schemebrings trivial extra storage cost and computation cost totravel through the tree, which are still O

(D̄2)

and O(D̄),

respectively.

4.3 ScalabilityTo make our system design scalable, we need to considerthe following two cases: (1) The private photo set Xi andthe corresponding labels Yi may change over time as Xt

i

and Y ti . This happens when the appearance of user i haschanged, or the photos in the training set are modified(adding new images or deleting existing images). (2) Thefriendship graph may change over time. For example,when a user moves to another city for work or study,new friends should be added to friendship graph.

For the first case, when Xi is changing, then all theclassifiers related to Xi in Algorithm 2 should change.We can modify the iterations in (6) as

λt+1i = argmax {−λTi Y t+1

i Xt+1i (Π + 4ρI)−1Xt+1

i Y t+1i λi

+ [1 + 2Y t+1i Xt+1

i (Π + 4ρI)−1(αti − αtj − 2ρutj)]Tλi}

ut+1i = 2(Π + 4ρI)−1[Y t+1

i Xt+1i λt+1

i − (αti − αtj) + 2ρutj ]

αt+1i = αti + ρ(ut+1

i − ut+1j ).

(8)In Eq.(8), the private training set Xi now is a variable

over time. At each iteration t, local training results arecalculated with the current training set Xt

i . Intuitively,the training set Xi varies in a much slower rate thanthe iterative updates of parameters. In other words,we assume that Xi remain invariant across a sufficientnumber of iterations, during which the resulting uij willclosely track the optimal classifier between the trainingsets.

If the social circle of a user is changed, his/her per-sonal FR engine should also be modified. If this modifi-cation is made by adding a new friend, new classifiersshould be computed or reused by following Algorithm.2. After that, the existing decision tree could be extendedby adding tree nodes with these new classifiers. If themodification is generated by removing the friendship,one just need to remove all the corresponding classifiersand reassemble his/her decision tree.

5 PERFORMANCE ANALYSIS

In this section, we present the performance analysisof our scheme. In the first subsection, we analyze thecomputational complexity of our FR system and compar-isons with other two possible approaches. In the secondsubsection, the detailed privacy analysis is presented.

5.1 Benefits of our designIn this subsection, we describe the expected computa-tional complexity of three approaches: centralized so-lution, one-against-all strategy and our approach. Thenotations involved are: N is the number nodes and D̄is the average degree of friendship graph, n and p areparameters of private training data X , denote numberof training records and length of each training record,respectively.• Centralized approach has a centralized FR engine

in charge of recognizing all users over a large OSN.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 8: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

8

To protect the training photos, a privacy-preservingSVM training method [25] is used. In this approach,Yu et al. use secure dot product protocols [9] toevaluate kernel matrix of SVM. The computationalcost for the secure dot product protocol based onhomomorphic encryption is O

(p logm

), where m is

the value of the exponent. SVM kernel matrix iscomposed of (Nn)2 dot products, hence the totalcost is O

(N ε+1nε

)+ O

(N3n2p logm

), where ε is a

factor between 2 and 3.• One-against-all approach decompose the friend-

ship graph and use our proposed consensus-basedtraining method to perform collaborative training.As we discussed in the previous section, at eachiteration, local SVM problem only deals with atraining set of size n× p. Hence, computational costis O

(Ta(nε + n2p)

)≈ O

(nTaε

). There are D̄2 local

training problems to find D̄ classifiers for one neigh-borhood. Hence the total cost for N neighborhoodsis O

(ND̄2Tanε

).

• One-against-one: The analysis of this approach issimilar to the one-against-all approach, except thatthe average rounds in one training process shouldbe much less, due to the fact that there are only twoparticipants instead of D̄ + 1 ones. If we considerthe complete subgraph in the friendship graph, theexpected cost should be less than O

(ND̄2Tonε

).

A theoretical comparison of the three approaches arelisted in Table. 1. We can see that the distributed so-lutions with context information can greatly reduce thecomputation. Meanwhile, among the two distributedapproaches, the proposed approach should be muchmore efficient than using the one-against-all approach.In Section 6, we will further demonstrate one-against-one strategy is much more efficient than one-against-allstrategy with numerical results.

complexity Privacy-preserving

Strangerdetection

Centralized O(Nε+1nε

)X ×

OVA O(ND̄2Tanε

)X ×

Our approach O(ND̄2Tonε

)X X

TABLE 1: Theoretical comparison of the threeapproaches

5.2 Security analysisIn this paper, private information of a user is consideredas his/hers privacy and exposure policies; friend listand the private training data set Xa. In the rest of thissubsection, we show how these private information areprotected from a semi-honest adversary.

Privacy and exposure policies: In 1, access policy ofx is determined by the intersection of owner’s privacypolicy and co-owners’ exposure policy. In [10], Kissnerand Song proposed privacy-preserving set operations in-cluding set intersection by employing the mathematical

properties of polynomials. We can directly adopt theirscheme to find the access policy S.

Friend list: Basically, in our proposed one-against-onestrategy a user needs to establish classifiers between{self, friend} and {friend, friend} also known as the twoloops in Algorithm. 2. During the first loop, there is noprivacy concerns of Alice’s friend list because friendshipgraph is undirected. However, in the second loop, Aliceneed to coordinate all her friends to build classifiersbetween them. According to our protocol, her friendsonly communicate with her and they have no idea ofwhat they are computing for.

Friend list could also be revealed during the classifierreuse stage. For example, suppose Alice want to findubt between Bob and Tom, which has already beencomputed by Bob. Alice will first query user k to seeif ukj has already been computed. If this query is madein plaintext, Bob immediately knows Alice and Bob arefriends. To address this problem, Alice will first makea list for desired classifiers use private set operations in[10] to query against her neighbors’ classifiers lists oneby one. Classifiers in the intersection part will be reused.Notice that even with this protection, mutual friendsbetween Alice and Bob are still revealed to Bob, thisis the trade-off we made for classifiers reuse. Actually,OSNs like Facebook shows mutual friends anyway andthere is no such privacy setting as “hide mutual friends”.

Private training sets: We assume that Alice and Bob ina toy system are semi-honest. They will follow the pro-tocol but are so curious that they store all the exchangeddata and try to trace back others’ private training sets.The analysis is done on behalf of Alice (Alice stores allthe data and tries to find the private photo set of BobXb) and the analysis for Bob is similar. To show theprivate training sets are secure, we only need to showthat during the To rounds of parameter exchanges, anadversary cannot reverse engineer X of the other user.After To rounds of parameter exchange, informationavailable to Alice is {utb, αtb}, for t = 1...To. Her goal isto find an Nb× (p+ 1) matrix Xb with Nb×p unknowns.Alice is familiar with the training mechanism and sheknows that the parameters at hand have the relationshipas follows:

A = 2Xbc−1XT

b , (9)

XTb λ = cutb + d, (10)

B = 1 +Xbc−1d, (11)

λ = arg min0≤λ≤C

2

1

2λTAλ+BTλ (12)

where c = 2(Π + 4ρI)−1, d = αtb − αta − 2ρuta couldbe computed accordingly for each iteration. Notice thatthe value of λ comes from the quadratic optimizationproblem (12), in which A is a fixed matrix determinedby Xb, B is changing by iterations. We need to showthat, with multiple {B, ub} tuples, Alice cannot get anyinformation of Xb. To solve the quadratic optimization

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 9: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

9

problem (12), we need to first find its Lagrange function:

L =1

2λTAλ+BTλ+ τT (λ− C

2)− νTλ, (13)

where τ and ν are Lagrange multipliers. The solutionof problem (12) could be obtained through the KKTconditions:

−Aλ− τ + ν = BT (14)

τT (λ− C

2) = 0 (15)

νTλ = 0 (16)

0 ≤ λ ≤ C

2, ν, τ ≥ 0.

With (9) and (10), (14) can be written as

−Xb(c · utb + d)− τ + ν = B. (17)

If the parameters {c, utb, d, B, τ, ν} are known to Alice,she can get Nb equations, one for each training sampleat Bob. With more than p iterations, she should be able torecover Xb by having enough equations to find out Nb×punknowns. However, the Lagrange multipliers τ and νare calculated when Bob is trying to solve problem (12)at each iteration and he will not reveal these parametersto Alice. τ and ν are easy to compute for Bob with matrixA, but it is hard to make a reasonable guess for Alice.In this way, at each iteration, by revealing Nb equations,2Nb unknowns are introduced. Alice could never haveenough equations to find out Xb.

From another point of view, the information availableto Alice is that support vectors of Bob are sitting on a pdimension hyperplane (ub). One support vector could befound by intersecting p such hyperplanes. However, Bobwill never tell Alice which hyperplane contain whichsupport vectors, hence, Alice could not form the properlinear equations to solve a support vector. For these non-support vector training samples, the only informationfor Alice is that those samples are laying on the oppositeside of the hyperplane, Alice have no clue of where theyare.

6 EVALUATION

Our system is evaluated with two criteria: network-wideperformance and facial recognition performance. Theformer is used to capture the real-world performance ofour design on large-scale OSNs in terms of computationcost, while the latter is an important factor for theuser experience. In this section, we will describe ourAndroid implementation first and then the experimentsto evaluate these two criteria.

6.1 Implementation

Our prototype application is implemented on GoogleNexus 7 tablets with Android 4.2 Jelly Bean (API level17) and Facebook SDK. We use OpenCV Library 2.4.6to carry out the face detection and Eigenface method to

Compute

classifiers

Feature

extraction

Check with

privacy policieswho post

Post

photo

Pick

friends

Classifiers

Private

training set

Pick close

friends

Start

Specify policy

Training set

Post photo

Fig. 4: System structure of our application

carry out the FR. Fig.4 shows the graphical user interface(GUI). A log in/out button could be used for log in/outwith Facebook. After logging in, a greeting message andthe profile picture will be shown. Our prototype worksin three modes: a setup mode, a sleeping mode and aworking mode.

Running in the setup mode, the program is workingtowards the establishment of the decision tree. For thispurpose, the private training set Xi and neighborhoodBi need to be specified. Xi could be specified by theuser with the button “Private training set”. When it ispressed, photos in the smart phone galleries could beselected and added to Xi. To setup the neighborhood Bi,at this stage, a user needs to manually specify the set of“close friends” among their Facebook friends with thebutton “Pick friends” as their neighborhood. Accordingto the Facebook statistics, on average a user has 130friends, we assume only a small portion of them are“close friends”. In our application, each user picks upto 30 “close friends”. Notice that all the selected friendsare required to install our application to carry out thecollaborative training. With Xi and Bi specified, thesetup mode could be activated by pressing the button“Start”. Key operations and the data flow in this modeare enclosed by a yellow dashed box on the systemarchitecture Fig.4.

During the training process, a socket is establishedexchange local training results. After the classifiers areobtained, decision tree is constructed and the programswitches from the setup mode to the sleeping mode.Facebook allows us to create a list of friends such as“close friends” or “Acquaintances”. We can share aphoto only to friends on list. According to the proposedscheme, this friend list should be intersection of owner’sprivacy policy and co-owners’ exposure policies. How-ever, in Facebook API, friend lists are read-only items,they cannot be created or updated through the currentAPI. That means we cannot customize a friend listto share a co-photo. Currently, when the button “PostPhoto” is pressed, co-owners of x are identified, thennotifications along with x are send to the co-owners to

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 10: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

10

request permissions. If they all agree to post x, x will beshared on the owner’s page like a normal photo. In thissense, users could specify their privacy policy but theirexposure policies are either everybody on earth or nobodydepending on their attitude toward x. The data flow fora photo posting activity is illustrated by the solid redarrows. After the requests are sent out, the program willgo back to the sleeping mode. If Xi or Bi is modified,the program will be invoked to the setup mode. In thiscase, the operations in the yellow dashed box will beperformed again and decision tree will be updated.

6.2 Network-wide performance

At this stage, a large number of users are absent forus to carry out the network-wide evaluation. We sim-ulate a real-life social network with the small-worldnetwork[24]. The simulations are conducted on a desk-top with Intel i3 550 3.4 GHz and 4.0 GB memory. Weuse the database of “Face Recognition Data, Universityof Essex, UK” to assign training set for each simulatedusers. The database contains photos for 395 individualsand 20 images per individual with varying poses andfacial expressions. Users are assigned with photos fromthe same individual randomly.

In a small world network, there are three input pa-rameters: the total number of vertex N , the averagenode degree D̄ and rewire probability p. In the rest ofthis section, we use D̄ and the number of neighborsinterchangeably to denote the average number of usersin one’s neighborhood. To construct a small-world net-work, first we arrange the vertices and connect them ina ring. Then we connect every vertex with its D̄ nearestneighbors. Finally, for each vertex, with probability p, itsexisting edge is rewired with another randomly selectedvertex. It is shown in [14] that the rewire probabilityis highly related to the geodesic distance (the averageshortest distance between any two vertices). We wantto show that in a small-world network, there exist alot of complete subgraphs, which greatly reduces thesetup time by reusing the existing classifiers. Due toresource limitations, we simulate on a network with 3000vertices. The the computation cost is measured by totalcomputation time.

Fig.5 and Fig.7 plot our simulation results in a networkof 3000 nodes with a fixed rewire probability of 0.3 anda varying D̄ from 6 to 18. Specifically, as in Fig.5, theone-against-all (OVA) approach and our proposed one-against-one (OVO) approach are compared in terms oftotal computation cost. We can see that the computationcost of the proposed OVO approach is much lowerand the efficiency gain is increasing with number ofneighbors. In the previous section, we argued that thisphenomenon is caused by two reasons: first, the averagenumber of iterations to converge in our OVO approachshould be much smaller; second, the classifiers could bereused with the existence of complete subgraphs.

Fig.6 illustrates the results for the computation cost

6 8 10 12 14 16 180

5x 10

5

Tot

al tr

aini

ng ti

me

Number of neighbors

0

2

4

6

8

10

Effi

cien

cy g

ain

ovaovoovaovo

Fig. 5: Total computation cost and the efficiency gainagainst the number of neighbors

0 6 12 18 24 300

5

10

15

20

Tra

inin

g tim

e(s)

Number of neighbors

0

10

20

30

40

Itera

tions

Training time

Num of iterations

Fig. 6: The average training time and iterations againstthe number of neighbors

6 8 10 12 14 16 180

1

2

3

4

5

6

Ave

rage

sho

rtes

t dis

tanc

e

Average degree

0.33

0.35

0.37

0.39

0.41

0.43

0.45

Pro

babi

lity

of k

onw

ledg

e re

use

shortest distancereuse probability

Fig. 7: the average shortest distance and knowledgereuse probability against average degree

and the average number of iterations, which are increas-ing with the number of participants. In this simulation,each user has 20 training samples and each sample isa vector of 20 features. The stopping criteria is set tobe 5%, which means the algorithm will return ui if itsvariation is less than 5% between two adjacent iterations.On the one hand, we can see from Fig.6 that for 2 users,it only takes less than 5 iterations to converge, whilefor 30 users, it takes more than 30 iterations. Moreover,

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 11: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

11

for 30 users, each iteration involves 30 users, both thecomputation and communication cost are much higherthan the case where there are only 2 users. As a result,the training time in total for 30 users is 100 times morethan that for 2 users.

The probability of classifier reuse is studied in Fig.7, inwhich we plot the probability of reuse together againstthe average shortest distance. By reusing a classifier, wemean that when user i and user j attempt to computea classifier uij , instead of conducting the iterative algo-rithm immediately, they first try to look up at the localtable. If uij exists in the table, this classifier could bereused. Fig.7 shows that with a small average shortestdistance, the reuse probability is high because a smallerdistance between vertices means the vertices are “wellconnected”, in which a complete subgraph is more likelyto exist.

6.3 Facial recognition performance

In this subsection, we study the recognition ratio againstthe number of friends and the number of strangers. Stan-dard face detection in [23] is used for face detection andeigenface [22] is used to extract features and vectorize thetraining image. However, the standard eigenface methodis a centralized approach, it may not be applicable to ourdistributed case. To address this, we assume principlecomponents have already been extract to form a vectorspace S. User’s facial photos are projected into thisspace as feature vectors. Based on our simulation results,we find that this modification is reasonable due to thefact that the important features on human face lie ononly a few directions. Facial feature extraction is beyondthe scope of this paper. Better facial feature extractionmethod can be applied to our system to obtain a betterrecognition ratio.

In Fig.8, we show the recognition ratios of our pro-posed scheme and the scheme with DAG decision tree.As in Fig.8(a), when there are no strangers, both ourproposed scheme and the DAG scheme could achievevery high recognition ratio of more than 80% when thenumber of users is fewer than 30. While in Fig.8(b),among the users, 10% of them are strangers, we can seethat the recognition ratio of our scheme has a higherrecognition ratio than the DAG scheme by 5%. Thereason is that our scheme is able to reject strangers.The solid line on each figure represents recognition ratioof strangers ps, which is increasing with number ofusers. Intuitively, if there are more users, there willbe more classifiers and the chance that a stranger getscontradictory decisions will be higher. Fig.8(c) shows asimilar case where there are 30% strangers. In this case,our scheme outperforms the DAG scheme by 10% interms of recognition ratio. This is achieved by the abilityof identifying strangers. With 30 users, the probability ofidentifying a stranger is around 35%.

Another criterion to measure the performance is thefalse positive rate. In the previous section we argued

dag pfpour approach pfpour approach pfn

Fa

lse

re

co

gn

itio

n r

atio

Fig. 9: The false negative and false positive ratios

that a false positive recognition will reveal the test imageto the wrong person. Thus, a low false positive rate isdesirable. If there are no strangers, the false positive rateis only determined by the recognition accuracy. If thereare strangers, the false positive is also determined bymisclassification of the strangers. Fig. 9 illustrates bothfalse positive rate and false negative rate of our schemeand the DAG scheme. We observe that false positive rateof our scheme is 10% lower than original DAG schemeon average. Notice that false negative recognitions couldalso be introduced by our stranger detection scheme,according to Fig. 9, the more users, the higher chancea user is recognized as a stranger.

7 CONCLUSION AND DISCUSSION

Photo sharing is one of the most popular features inonline social networks such as Facebook. Unfortunately,careless photo posting may reveal privacy of individualsin a posted photo. To curb the privacy leakage, weproposed to enable individuals potentially in a phototo give the permissions before posting a co-photo. Wedesigned a privacy-preserving FR system to identifyindividuals in a co-photo. The proposed system is fea-tured with low computation cost and confidentiality ofthe training set. Theoretical analysis and experimentswere conducted to show effectiveness and efficiencyof the proposed scheme. We expect that our proposedscheme be very useful in protecting users’ privacy inphoto/image sharing over online social networks. How-ever, there always exist trade-off between privacy andutility. For example, in our current Android application,the co-photo could only be post with permission of allthe co-owners. Latency introduced in this process willgreatly impact user experience of OSNs. More over, localFR training will drain battery quickly. Our future workcould be how to move the proposed training schemes topersonal clouds like Dropbox and/or icloud.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 12: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

12

10 15 20 25 300

0.2

0.4

0.6

0.8

1R

ecog

nitio

n ra

tio

Number of users

dag

our approach

(a) No stranger

10 15 20 25 300

0.2

0.4

0.6

0.8

1

Rec

ogni

tion

ratio

Number of users

0.2

0.24

0.28

0.32

0.36

0.4

Str

ange

r id

entif

icat

ion

prob

abili

ty

dag

our approach

ps

(b) 10% strangers

10 15 20 25 300

0.2

0.4

0.6

0.8

1

Rec

ogni

tion

ratio

Number of users

0.2

0.24

0.28

0.32

0.36

0.4

Str

ange

r id

entif

icat

ion

prob

abili

ty

dag

our approach

ps

(c) 30% strangers

Fig. 8: Recognition ratio with varying number of users

REFERENCES

[1] I. Altman. Privacy regulation: Culturally universal or culturallyspecific? Journal of Social Issues, 33(3):66–84, 1977.

[2] A. Besmer and H. Richter Lipford. Moving beyond untagging:photo privacy in a tagged world. In Proceedings of the SIGCHIConference on Human Factors in Computing Systems, CHI ’10, pages1563–1572, New York, NY, USA, 2010. ACM.

[3] S. Boyd, N. Parikh, E. Chu, B. Peleato, and J. Eckstein. Distributedoptimization and statistical learning via the alternating directionmethod of multipliers. Found. Trends Mach. Learn., 3(1):1–122, Jan.2011.

[4] B. Carminati, E. Ferrari, and A. Perego. Rule-based access controlfor social networks. In R. Meersman, Z. Tari, and P. Herrero,editors, On the Move to Meaningful Internet Systems 2006: OTM2006 Workshops, volume 4278 of Lecture Notes in Computer Science,pages 1734–1744. Springer Berlin Heidelberg, 2006.

[5] J. Y. Choi, W. De Neve, K. Plataniotis, and Y.-M. Ro. Collaborativeface recognition for improved face annotation in personal photocollections shared on online social networks. Multimedia, IEEETransactions on, 13(1):14–28, 2011.

[6] K. Choi, H. Byun, and K.-A. Toh. A collaborative face recognitionframework on a social network platform. In Automatic Face GestureRecognition, 2008. FG ’08. 8th IEEE International Conference on,pages 1–6, 2008.

[7] K.-B. Duan and S. S. Keerthi. Which is the best multiclass svmmethod? an empirical study. In Proceedings of the 6th internationalconference on Multiple Classifier Systems, MCS’05, pages 278–285,Berlin, Heidelberg, 2005. Springer-Verlag.

[8] P. A. Forero, A. Cano, and G. B. Giannakis. Consensus-baseddistributed support vector machines. J. Mach. Learn. Res., 99:1663–1707, August 2010.

[9] B. Goethals, S. Laur, H. Lipmaa, and T. Mielik?inen. On privatescalar product computation for privacy-preserving data mining.In In Proceedings of the 7th Annual International Conference in In-formation Security and Cryptology, pages 104–120. Springer-Verlag,2004.

[10] L. Kissner and D. Song. Privacy-preserving set operations. InIN ADVANCES IN CRYPTOLOGY - CRYPTO 2005, LNCS, pages241–257. Springer, 2005.

[11] L. Kissner and D. X. Song. Privacy-preserving set operations.In V. Shoup, editor, CRYPTO, volume 3621 of Lecture Notes inComputer Science, pages 241–257. Springer, 2005.

[12] N. Mavridis, W. Kazmi, and P. Toulis. Friends with faces: Howsocial networks can enhance face recognition and vice versa. InComputational Social Network Analysis, Computer Communicationsand Networks, pages 453–482. Springer London, 2010.

[13] R. J. Michael Hart and A. Stent. More content - less control: Accesscontrol in the web 2.0. In Proceedings of the Workshop on Web 2.0Security and Privacy at the IEEE Symposium on Security and Privacy,2007.

[14] M. E. Newman. The structure and function of complex networks.SIAM review, 45(2):167–256, 2003.

[15] L. Palen. Unpacking privacy for a networked world. pages 129–136. Press, 2003.

[16] J. C. Platt, N. Cristianini, and J. Shawe-taylor. Large margin dagsfor multiclass classification. In Advances in Neural InformationProcessing Systems 12, pages 547–553, 2000.

[17] D. Rosenblum. What anyone can know: The privacy risks of socialnetworking sites. Security Privacy, IEEE, 5(3):40–49, 2007.

[18] A. C. Squicciarini, M. Shehab, and F. Paci. Collective privacy man-agement in social networks. In Proceedings of the 18th InternationalConference on World Wide Web, WWW ’09, pages 521–530, NewYork, NY, USA, 2009. ACM.

[19] Z. Stone, T. Zickler, and T. Darrell. Toward large-scale facerecognition using social network context. Proceedings of the IEEE,98(8):1408–1415.

[20] Z. Stone, T. Zickler, and T. Darrell. Autotagging facebook:Social network context improves photo annotation. In ComputerVision and Pattern Recognition Workshops, 2008. CVPRW’08. IEEEComputer Society Conference on, pages 1–8. IEEE, 2008.

[21] K. Thomas, C. Grier, and D. M. Nicol. unfriendly: Multi-partyprivacy risks in social networks. In M. J. Atallah and N. J.Hopper, editors, Privacy Enhancing Technologies, volume 6205 ofLecture Notes in Computer Science, pages 236–252. Springer, 2010.

[22] M. Turk and A. Pentland. Eigenfaces for recognition. Journal ofcognitive neuroscience, 3(1):71–86, 1991.

[23] P. Viola and M. Jones. Robust real-time object detection. InInternational Journal of Computer Vision, 2001.

[24] D. J. Watts and S. H. Strogatz. Collective dynamics of “small-world” networks. nature, 393(6684):440–442, 1998.

[25] H. Yu, X. Jiang, and J. Vaidya. Privacy-preserving svm usingnonlinear kernels on horizontally partitioned data. In Proceedingsof the 2006 ACM symposium on Applied computing, SAC ’06, pages603–610, New York, NY, USA, 2006. ACM.

Kaihe Xu received his B.E. degree from Nan-jing university of Posts and Telecommunications,China, in 2010, the M.S. degree from IllinoisInstitute of Technology, USA, in 2012. He is cur-rently working toward the PhD degree in depart-ment of Electrical and Computer Engineering,University of Florida. His research interests in-clude Secure multi-party computation, machinelearning and distributed system.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

Page 13: My Privacy My Decision: Control of Photo Sharing on Online ... · Abstract—Photo sharing is an attractive feature which popularizes Online Social Networks (OSNs). Unfortunately,

13

Yuanxiong Guo received his B.E. degree inelectronic information science and technologyfrom Huazhong University of Science and Tech-nology, Wuhan, China, in 2009. He receivedM.S. and Ph.D. degree in Electrical and Com-puter Engineering from the University of Floridain 2012 and 2014, respectively. He has beenan assistant professor in the School of Electri-cal and Computer Engineering, Oklahoma StateUniversity from August 2014. His research in-terests include Smart Grids, Power and Energy

Systems, Cyber-Physical Systems, and Sustainable Computing andNetworking Systems.

Linke Guo received his B.E. degree in electronicinformation science and technology from BeijingUniversity of Posts and Telecommunications in2008. He received M.S. and Ph.D. degree inElectrical and Computer Engineering from theUniversity of Florida in 2011 and 2014, respec-tively. He has been an assistant professor in theDepartment of Electrical and Computer Engi-neering, Binghamton University, State Universityof New York from August 2014. His researchinterests include network security and privacy,

social networks, and applied cryptography. He has served as theTechnical Program Committee (TPC) members for several conferencesincluding IEEE INFOCOM, ICC, GLOBECOM, and WCNC. He is amember of the IEEE and ACM.

Yuguang Fang (F’08) received a Ph.D. degreein Systems Engineering from Case Western Re-serve University in January 1994 and a Ph.Ddegree in Electrical Engineering from BostonUniversity in May 1997. He was an assistantprofessor in the Department of Electrical andComputer Engineering at New Jersey Instituteof Technology from July 1998 to May 2000. Hethen joined the Department of Electrical andComputer Engineering at University of Floridain May 2000 as an assistant professor, got an

early promotion to an associate professor with tenure in August 2003and a professor in August 2005. He has published over 350 papersin refereed professional journals and conferences. He received theNational Science Foundation Faculty Early Career Award in 2001 andthe Office of Naval Research Young Investigator Award in 2002. Hewon the Best Paper Award at IEEE ICNP2006. He has served onmany editorial boards of technical journals including IEEE Transactionson Communications, IEEE Transactions on Wireless Communications,IEEE Transactions on Mobile Computing, Wireless Networks, and IEEEWireless Communications (including the Editor-in-Chief). He is alsoserving as the Technical Program Co-Chair for IEEE INFOCOM2014.He is a fellow of the IEEE.

Xiaolin Li is Associate Professor in Departmentof Electrical and Computer Engineering at Uni-versity of Florida. His research interests includeParallel and Distributed Systems, CyberCPhysi-cal Systems, and Network Security. His researchhas been sponsored by National Science Foun-dation (NSF), Department of Homeland Security(DHS), and other funding agencies. He is anassociate editor of several international journalsand a program chair or co-chair for over 10international conferences and workshops. He is

on the executive committee of IEEE Technical Committee on ScalableComputing (TCSC) and served as a panelist for NSF. He received aPh.D. degree in Computer Engineering from Rutgers University, USA.He is the founding director of the Scalable Software Systems Laboratory(http://www.s3lab.ece.ufl.edu). He received the National Science Foun-dation CAREER Award in 2010. He is a member of IEEE and ACM.

This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication.The final version of record is available at http://dx.doi.org/10.1109/TDSC.2015.2443795

Copyright (c) 2015 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].