IEEE TRANSACTIONS ON INFORMATION …faculty.missouri.edu/lindan/papers/iprivacy.pdfpersonal...

IEEE TRANSACTIONS ON INFORMATION FORENSICS AND SECURITY, 2016 1

iPrivacy: Image Privacy Protection by IdentifyingSensitive Objects via Deep Multi-Task Learning

Jianping Fan, Zhenzhong Kuang, Baopeng Zhang, Jun Yu, Dan Lin

Abstract—Even major social websites like Facebook can allowusers to manually specify their coarse-grained privacy settings,more and more privacy concerns still raise for social imagesharing. The reason is that it may not be easy for common usersto correctly configure their privacy preferences manually due toboth the lack of privacy knowledge and the huge time consuming.To automate such privacy setting process for social image sharing,a new approach callediPrivacy is developed in this paper: (a)massive social images and their privacy settings are leveragedto learn the object-privacy relatedness effectively and identifya large set of privacy-sensitive object classes automatically; (b)a hierarchical deep multi-task learning algorithm is developedto jointly learn more representative deep CNNs (convolutionalneural networks) and more discriminative tree classifier over avisual tree, so that we can achieve fast and accurate detection oflarge numbers of privacy-sensitive object classes; (c) automaticrecommendation of privacy settings for image sharing can beachieved by detecting the underlying privacy-sensitive objectsfrom the images being shared, recognizing their classes, andidentifying their privacy settings; and (d) one simple solution forimage privacy protection is provided by blurring the privacy-sensitive objects automatically. We have conducted extensiveexperimental studies on real-world images and the results havedemonstrated both efficiency and effectiveness of our proposedapproach.

Index Terms—Image sharing, privacy setting recommendation,object-privacy alignment, image privacy protection, privacy-sensitive object classes, hierarchical deep multi-task learning, treeclassifier for hierarchical object detection.

I. I NTRODUCTION

W Ith the growing popularity of smart-phones and othermobile devices, high-quality cameras are becoming

increasingly ubiquitous and pervasive, as a result, capturinghigh-quality images has become one part of our daily ac-tivities and image sharing has now become very popular insocial platforms like Facebook, Instagram and Foursquare.Bydefault, the shared images can be seen by anyone in socialnetworks. Since images can intuitively tell when and where aspecial moment took place, who participated and what weretheir relationships, the shared images can reveal much of users’personal and social environments and their private lives. Thus,

Jianping Fan, Zhenzhong Kuang, Baopeng Zhang and Jun Yu are with theDepartment of Computer Science, University of North Carolinaat Charlotte,Charlotte, NC 28223, USA. e-mail: [email protected]

Dan Lin is with the Department of Computer Science, Missouri Universityof Science and Technology, Rolla, MO 65409, USA. e-mail: [email protected]

This research is supported by National Science Foundation under 1651166-CNS and 1651455-CNS.

Copyright (c) 2008 IEEE. Personal use of this material is permitted.However, permission to use this material for any other purposes must beobtained from the IEEE by sending an email to [email protected].

Fig. 1. Illustration of some privacy-sensitive object classes: (a) user-independent classes: (a1) sensitive people, (a2) sensitive locations, (a3)public toilet, (a4) discrimination texts; (b) user-dependent classes: (b1)home shrines which may indicate personal religions, (b2) visual attributeswhich can indicate smoking in public, (b3) conference name tags, (b4)personal information tags.

privacy protection [1-4] is a critical issue to be addresseddur-ing social image sharing [5-16]. Unfortunately, many peopleespecially young users of social networks often share privateimages about themselves, their friends and classmates withoutbeing aware of the potential impact on their future lives causedby unwanted disclosure and privacy violations.

To ensure privacy, most social image sharing sites allowusers to manually specify coarse-grained privacy settings(preferences): whether an image is public, private or visible totheir family members or friends. However, due to the lack ofprivacy knowledge, it would not be easy for common users tocorrectly configure privacy settings to achieve their desiredlevels of privacy protection; also, given the large numberof images being shared and the tedious steps needed forfine-grained privacy settings, some users may not be willingto spend extra time on providing such fine-grained privacysettings [1-4].

Based on these observations, in this work, a new approachcallediPrivacy (image Privacy) is developed to automate suchprivacy setting process for social image sharing. Unlike manyprevious works [5-13] which typically recommend privacysettings based on similarity of users’ profiles or image tags,our idea is to automatically detect the privacy-sensitive objectsfrom the images being shared, recognize their classes, andidentify their privacy settings, so that the image owners can bewarned what objects in the images need to be protected beforesharing. The critical challenge to be conquered here is how toidentify all the privacy-sensitive object classes efficiently andlearn the object-privacy relatedness precisely from massivesocial images, so that such knowledge can later be usedto: (1) determine which object classes should be detectedfrom individual images being shared; and (2) leverage objectdetection results to recommend the best-matching privacysettings for image sharing. Considering 1.82 billions active


users of social networks and trillions of shared images, theremay exist a large set of privacy-sensitive object classes. Asillustrated in Fig. 1, such privacy-sensitive object classes canfurther be partitioned into two categories: (a)user-independentclassessuch as humans, locations and discrimination texts inimages; and (b)user-dependent classessuch as home shrinesand visual attributes for personal hobbies.

Another critical issue for automating the privacy settingprocess is the time limitation, e.g., users may expect to gettheir privacy setting recommendations quickly. Because therecould have large numbers of privacy-sensitive object classes,detecting the privacy-sensitive objects from the images beingshared and recognizing their classes could be very expensive,thus recommending the best-matching privacy settings forimage sharing could be an extremely time consuming process.Specifically, if a flat approach is employed, the computationalcost will grow linearly with the total number of privacy-sensitive object classes (to be detected and recognized) andhence it is not scalable; if a hierarchical approach [19-20]is adopted, the object detection process could be speed updramatically but it would seriously suffer from the so-calledinter-level error propagationproblem, i.e., the mistakes madeat the parent nodes will propagate to their child nodes andsuch mistakes cannot be recovered.

To address the aforementioned challenges, our iPrivacysystem takes four main steps (as illustrated in Fig. 2): (1) DeepCNNs are learned to achieve semantic image segmentation andidentify large numbers of object classes from massive socialimages, and an automatic object-privacy alignment algorithmis developed to learn the object-privacy relatedness and iden-tify a large set of privacy-sensitive object classes; (2) A visualtree is learned to organize large numbers of privacy-sensitiveobject classes hierarchically in a coarse-to-fine fashion,whichcan provide a good environment to determine the inter-relatedlearning tasks automatically; (3) A hierarchical deep multi-task learning (HD-MTL) algorithm is developed to learnmore representative deep CNNs and more discriminative treeclassifier jointly over the visual tree, so that we can achieve fastand accurate detection of large numbers of privacy-sensitiveobject classes; (4) A soft prediction scheme is used to enhancethe performance of our hierarchical object detection approachby exploiting multiple paths simultaneously. A simple solutionfor image privacy protection is further provided by blurringthe privacy-sensitive objects automatically. We have conductedextensive experimental studies on real-world images and theresults have demonstrated both efficiency and effectiveness ofour proposed approach.

The remaining of the paper is organized as follows. Section2 reviews the related work briefly; Section 3 presents ouralgorithm on leveraging deep learning for semantic imagesegmentation and our automatic object-privacy alignment al-gorithm for assigning the privacy settings given at the imagelevel into the most relevant object classes; Section 4 introducesour algorithm for visual tree construction; Section 5 presentsour algorithm for joint learning of the deep CNNs and thetree classifier over the visual tree; Section 6 describes oursoftprediction algorithm for hierarchical object detection; Section7 reports the experimental results for algorithm and system

Fig. 2. An Overview of the key components of our iPrivacy system.

evaluation; Section 8 concludes the paper and outlines thefuture work.

II. RELATED WORK

In this section, we briefly review the most relevant re-searches on: (1) privacy protection for social image sharing[5-18, 49-50]; (2) semantic image analysis [44-48]; and (3)deep learning [23-30, 51] and multi-task learning for objectdetection [21-22, 37-38].

A. Privacy Protection for Social Image Sharing

Several recent works have studied how to automate theprivacy setting process for image sharing [5-18]. Bonneauet al. [14] proposed the concept of privacy suites whichrecommend users a suite of privacy settings that “expert”users or other trusted friends have already set, so that normalusers can either directly choose a setting or only need to dominor modification. Ravichandran et al. [13] studied how topredict a user’s privacy preferences for location-based data(i.e., share her location or not) based on location and time ofday. Fang et al. [16] proposed a privacy wizard to help usersgrant privileges to their friends. The wizard asks users to firstassign privacy labels to the selected friends, and then usesthisas the input to construct a classifier to classify friends basedon their profiles and automatically assign privacy labels totheunlabeled friends. More recently, Klemperer et al. [12] studiedwhether the keywords and captions (which are provided by theusers when they tag their photos) can be used to help userscreate and maintain access-control policies more intuitively,where the social tags created for organizational purposes canbe re-purposed to help create reasonable access-control rules.

The aforementioned approaches focus on deriving policysettings for only traits, so they mainly consider social contextsuch as one’s friend list. While interesting, they may notbe sufficient to address challenges brought by image filesfor which privacy may vary substantially not just because ofsocial contexts but also due to the actual image content asconsidered in our work. Zerr’s work [17] and Squicciarini etal.[18] have explored privacy-aware image classification by usinga mixed set of features, both content and meta-data. Morerecently, Tonge and Caragea [49] have first integrated the deep


features for image privacy prediction, Spyromitros-Xioufis etal. [50] have recently leveraged user-dependent images andprivacy settings to support personalized privacy-aware imageclassification. Both teams have found that the deep featurescan yield remarkable improvements on the performance ascompared with other handcrafted visual features such as SIFT,GIST and color histograms. On the other hand, our approachprovides a finer level of image classification by detecting theprivacy-sensitive objects from the images. More importantly,our approach is very efficient in: (a) identifying a large setofprivacy-sensitive object classes from massive social images;(b) learning the object-privacy relatedness, e.g., the correspon-dences between the object classes and the privacy settings forimage sharing; and (c) supporting fast and accurate detectionof the privacy-sensitive objects from the images being sharedand leveraging the object-privacy relatedness to recommendthe best-matching privacy settings for image sharing.

B. Semantic Image Analysis

Even social tags have been leveraged for automating theprivacy setting process and social tagging has been provedto be a valuable tool in image sharing platforms, it is worthnoting that image privacy protection is about image content(object classes) rather than social tags. Thus it is very at-tractive to develop new algorithm to automatically identifythe correspondences (relatedness) between the object classesand the privacy settings for image sharing. It is worth notingthat the privacy settings are given at the image level ratherthan at the object level and each social image may containmultiple objects, thus there may have huge uncertainty on suchobject-privacy relatedness. To reduce the uncertainty, itis veryattractive to develop new algorithms to: (a) achieve semanticsegmentation of object regions and some pioneering researcheshave been done recently on leveraging deep learning to achievesemantic image segmentation [44-48]; and (b) accurately alignthe object classes with the privacy settings for image sharingand we can take the lessons from some pioneering researcheson automatic object-tag alignment [39-43].

By performing semantic image segmentation and automaticobject-privacy alignment, we can learn from massive socialimages (and their privacy settings) to determine the correspon-dences between the object classes and their privacy settings,such object-privacy correspondences may further allow us to:(1) identify a large set of privacy-sensitive object classes;and (b) recommend the best-matching privacy settings for theimages being shared by detecting their underlying privacy-sensitive objects, recognizing their classes, and identifyingtheir privacy settings automatically.

C. Deep Multi-Task Learning for Object Detection

Achieving automatic detection of privacy-sensitive objectsfrom images may play an important role in image privacyprotection. Deep learning [23-25] has demonstrated its out-standing abilities on learning high-level features and signifi-cantly boosting the accuracy rates for large-scale object detec-tion (i.e., detecting and recognizing large numbers of objectclasses), but they still have room to improve. For example,

softmax is used to flatly map the high-level features into largenumbers of object classes, where the inter-task correlations(inter-class visual similarities) are completely ignored. As aresult, the process for learning the deep CNNs may be pushedaway from the global optimum because the gradients of theobjective function are not uniform for all the object classes andsuch learning process may distract on discerning the objectclasses that are hard to be discriminated [26-30]. In our work,a tree structure is seamlessly integrated with deep networkto identify the inter-related learning tasks and avoid suchdistraction effectively.

By considering multiple inter-related learning tasks jointly,multi-task learning [21-22] has demonstrated its strong abilityon learning more discriminative classifiers for large-scaleobject detection. Even multi-task learning has demonstratedmany advantages in theory, there are at least two obstaclesfor applying multi-task learning to support large-scale objectdetection. Thefirst obstacle is how to identify the inter-related learning tasks automatically, and traditional multi-task learning algorithms usually assume that all the tasksare equally related. However, such assumption may not holdin the scenario of large-scale object detection because it isunnecessary for each object class to be related with all theothers. Thesecond obstacleis how to leverage inter-taskrelationships for enhancing multi-task learning. In ordertoshare information appropriately, multi-task learning usuallyneeds to assume how the tasks are correlated. By consideringthe differences of the inter-task relationships among multiplelearning tasks, some researchers have proposed joint learningapproach to learn the inter-task relationships and the multi-taskclassifiers simultaneously [22]. For large-scale object detectionapplication, such joint learning approach may seriously sufferfrom the problem of huge computational cost. Some attemptshave recently been made to exploit the tree structures in multi-task learning [37] and deep learning [26-30, 51].

III. SEMANTIC IMAGE SEGMENTATION AND AUTOMATIC

OBJECT-PRIVACY ALIGNMENT

The first step in our iPrivacy system is to: (1) achieve seman-tic segmentation of object regions for massive social images;(2) leverage large-scale social images and their privacy settingsto identify a large set of privacy-sensitive object classes; (3)assign the privacy settings (which are given at the image level)into the most relevant object regions (object classes); and(4)learn the correspondences between the object classes and theprivacy settings, e.g., object-privacy relatedness. In this work,the privacy settings are coarsely partitioned into 3 groups: (a)public; (b) private; and (c) shared with friends or family.

We have collected massive social images and their privacysettings, where the privacy settings are loosely given at theimage level without providing their exact correspondenceswith the underlying object classes. Based on these observa-tions, each social image is first segmented into a set of seman-tic object regions. Deep CNNs have shown its strong abilityon supporting pixel-level image classification (i.e., semanticimage segmentation by assigning one particular semantic labelto every pixel in an image) [44-48]. To extract semantic object


Fig. 3. Our results on semantic image segmentation: (a) original social images; and (b) semantic segmentation of object regionsfor human beingsand cars, bikes, motorcycles.

Fig. 4. Our results on semantic image segmentation: (a) original social images; and (b) semantic segmentation of object regionsfor human beingsand animals (horses, cats and dogs).

regions from the images, we have trained a fully convolutionalnetwork in an end-to-end way to enable pixel-level predictionand classification, and a CRF (conditional random fields)model is further learned to integrate the neighboring pixelsfor the same object classes to generate semantic object regions[44]. As shown in Fig. 3 and Fig. 4, by integrating deep CNNswith CRF models for semantic image segmentation, we canidentify semantic object regions and their categories (objectclasses) effectively.

After the semantic object regions are extracted from imagesand their object classes are recognized, an automatic object-privacy alignment algorithm is developed to achieve more pre-cise alignment between the object regions (object classes)andthe privacy settings for image sharing. By performing semanticimage segmentation, each social image is partitioned into asetof semantic object regions and each semantic object regionmay correspond to one certain type of object classes, e.g.,one image may contain multiple objects. The semantics foreach social image can be described effectively by all its objectclasses. For a given social image, by projecting all its objecttags (object classes) over the full set of 1000 object classes,we can obtain a 1000-dimensional sparse representation forthe given social image, e.g., bag of object classes. Thus thesemantic similarity between two social images can then bedefined as:

κI(Xi, Xj) =1000∑

l=1

δ(X li , X

lj) (1)

whereXi andXj are the bags of object classes for these twosocial images,δ(X l

i , Xlj) is defined as:

δ(X li , X

lj) =

1, if X li = X l

j = 1;

0, otherwise(2)

whereX li = X l

j = 1 indicates that two social imagesXi andXj contain the same object class, e.g., thelth object class.

Thus massive social images are further grouped into a largenumber of clusters according to their semantic similaritiesκI(·, ·). Some experimental results on semantic image cluster-ing are shown in Fig. 5, where the semantically-similar social

images in the same cluster are visualized according to theirsemantic similarities.

Each cluster contains large amounts of semantically-similarsocial images, such semantically-similar social images inthesame cluster contain similar object classes. Thus the privacysettings for the semantically-similar social images in thesame cluster are merged to generate a short list of commonprivacy settings. Such common privacy settings are re-rankedaccording to their occurrence frequencies and some of themwith higher occurrence frequencies are selected to define theprivacy settings for such semantically-similar social images inthe same cluster. Thus performing semantic image clusteringmay significantly reduce huge uncertainties on the object-privacy relatedness.

For a given cluster, the privacy settings for all itssemantically-similar social images can be integrated to gen-erate a short listP of privacy settings. For a given privacysetting t from the short listP , t ∈ P , its relevance scoreγ(Ci, t) with the object classCi is defined as:

γ(Ci, t) =‖ Ψ(Ci, t) ‖

‖ Ψ(C,P ) ‖(3)

whereΨ(C,P ) is the full set of the semantically-similar socialimages in the same cluster which contain a set of privacy-sensitive object classesC = C1, · · · , Ci, · · · , C and arerelated with a set of privacy settingsP , Ψ(Ci, t) is a subsetof Ψ(C,P ) and it is used to indicate the semantically-similarsocial images which contain the privacy-sensitive object classCi and are assigned with the specific privacy settingt,Ψ(Ci, t) ⊆ Ψ(C,P ), Ci ∈ C and t ∈ P . ‖ Ψ(Ci, t) ‖ isthe number of semantically-similar social images inΨ(Ci, t)and‖ Ψ(C,P ) ‖ is the number of semantically-similar socialimages inΨ(C,P ).

It is more straightforward to calculate the object-privacyrel-evance scoreγ(Ci, t) by using the co-occurrences‖ Ψ(Ci, t) ‖between the given privacy settingt and the object classCi

because their co-occurrences‖ Ψ(Ci, t) ‖ can indicate theprobability for the privacy settingt to be assigned with theobject classCi. For object-privacy relevance score estimation,we normalize the co-occurrences‖ Ψ(Ci, t) ‖ because of


Fig. 5. Image clustering results, where the semantically-similarsocial images are illustrated according to their semantic similarities.

two reasons: (a) each cluster may consist of large amounts ofsemantically-similar social images and contain large numbersof object classes; (b) our focus is to find the common privacysettings for all the object classes.

For a given set of object classes, there may have high co-occurrencesφ(·, ·) in massive social images, thus an objectco-occurrence network is constructed for characterizing suchinter-object co-occurrences and providing a good environmentto refine the object-privacy relevance scores iteratively.Ourobject co-occurrence network consists of two key components:(a) object tags for interpreting the object classes; and (b)theirco-occurrences in massive social images.

For two given object classesCi andCj , their co-occurrenceφ(Ci, Cj) is defined as:

φ(Ci, Cj) = ρ(Ci, Cj) logρ(Ci, Cj)

ρ(Ci) + ρ(Cj)(4)

where ρ(Ci, Cj) is the co-occurrence probability for twoobject classesCi andCj in massive social images,ρ(Ci) andρ(Cj) are their individual occurrence probabilities.

ρ(Ci, Cj) =N(Ci, Cj)

N, ρ(Ci) =

N(Ci)

N, ρ(Cj) =

N(Cj)

N(5)

whereN(Ci, Cj) is the number of social images which containtwo object classesCi and Cj simultaneously,N(Ci) is thenumber of social images which contain the object classCi,N(Cj) is the number of social images which contain the objectclassCj , N is the total number of social images.

The object classes, which have large values of co-occurrencesφ(·, ·), are connected to form an object co-occurrence network. The object classes, which frequently co-occur in massive social images, are strongly related and theymay share similar privacy settings. For examples, multipleobject classes, such computer screens, offices and notebooks,may co-occur frequently in the same social images and sharesimilar privacy settings for image sharing. One small part ofour object co-occurrence network is shown in Fig. 6, whereeach object class is linked with multiple object classes withlarger values ofφ(·, ·). Our object co-occurrence network canprovide a good environment for refining the object-privacyrelevance scores and identifying the privacy settings for largenumbers of object classes.

To achieve more accurate alignment, a random walk processis further performed over such object co-occurrence networkfor refining the object-privacy relevance scores iteratively.Given an image cluster which containsm object classes, weuse ρk(Ci, t) to denote the object-privacy relevance score

Fig. 6. A small part of our object co-occurrence network.

between the object classCi and the privacy settingt at thekthiteration, e.g.,ρ0(Ci, t) = γ(Ci, t) is the initial object-privacyrelevance score as defined in Eq.(3). We further defineΦ asanm×m transition matrix, its elementψij is used to definethe transition probability from the object classCi to its inter-related object classCj on our object co-occurrence network,e.g., such inter-object transition means that the frequently co-occurring object classes may share similar privacy settings forimage sharing.ψij is defined as:

ψij =φ(Ci, Cj)

∑

Ck∈ΩCiφ(Ci, Ck)

(6)

whereΩCiis the first-order neighbors of the object classCi

on our object co-occurrence network,φ(Ci, Cj) is the inter-object correlation betweenCi andCj as defined in Eq. (4).

For a given object classCi and a given privacy settingt, therandom walk process over our object co-occurrence networkfor refining their relevance score is formulated as:

ρk(Ci, t) = θ∑

Cj∈ΩCi

ρk−1(Ci, t)ψij + (1− θ)γ(Ci, t) (7)

where γ(Ci, t) is the initial object-privacy relevance scorebetween the given object classCi and the given privacy settingt, ρk−1(Ci, t) is the object-privacy relevance score at the(k−1)th iteration andθ = 0.4 is a weight parameter (e.g., theoriginal relevance score is more important than the transitionrelevance score). For a given object classCi, all the relevant


privacy settings are re-ranked according to their object-privacyrelevance scores withCi and the privacy setting with thelargest object-privacy relevance score is finally selectedforthe given object classCi.

By assigning the privacy settings (given at the image level)into the most relevant object classes precisely, our object-privacy alignment algorithm can effectively: (a) identifya largeset of privacy-sensitive object classes (which their privacysettings are private); and (b) determine the object-privacycorrespondences. Finally, recommending the privacy settingsfor the images being shared can be achieved by detecting theirunderlying privacy-sensitive objects and using such object-privacy correspondences to recommend the best-matchingprivacy settings for image sharing. In our current implemen-tation, we have identified 268 privacy-sensitive object classes.

IV. V ISUAL TREE CONSTRUCTION

The second step in our iPrivacy system is to provide aneffective organization of large numbers of privacy-sensitiveobject classes. For this, a visual treeT = (V,E) is learned andit comprises a set of nodesV and a set of edgesE. Each non-leaf nodec ∈ V is associated with a set of privacy-sensitiveobject classesL(c) ⊆ 1, . . . ,M, each leaf node is associatedwith one particular privacy-sensitive object class. For a givennon-leaf nodec, it contains only a subset of privacy-sensitiveobject classes for its parent node.

For two privacy-sensitive object classesCj and Ci, theirinter-class visual similarityS(Ci, Cj) is defined as:

S(Ci, Cj) =1

R2

∑

xl∈Ci

∑

xh∈Cj

κo(xl, xh) (8)

whereκo(·, ·) is the Gaussian kernel function for similaritycharacterization,xl and xh are the deep features for thelthsocial image from the object classCi and thehth socialimage from the object classCj , R is the total number ofsocial images from each object class. GivenN privacy-sensitive object classes, their inter-class similarity matrix S

is obtained automatically and its component is defined asSij = S(Ci, Cj).

For a large set of privacy-sensitive object classes, their inter-class visual similarities can be used as a proxy to determinetheir inter-class separability. The visually-similar privacy-sensitive object classes are more difficult to be separated by thenode classifiers, they should be assigned into the same coarse-grained group (superclass) of privacy-sensitive object classesto avoid early incorrect partitioning at a high-level node of thevisual tree, i.e., partitioning mistakes at the high-levelnodesare more critical because of inter-level error propagation. Thetop-down partitioning of large numbers of privacy-sensitiveobject classes starts from the root node that contains all theseN privacy-sensitive object classes and ends at the leaf nodes(i.e., each leaf node contains one particular privacy-sensitiveobject class).

For the current non-leaf nodec (which containsM privacy-sensitive object classes), itsM privacy-sensitive object classesare partitioned intoB smaller groups (i.e.,B child nodes at

Fig. 7. Hierarchical partitioning of the inter-class similarity ma trix S forvisual tree construction: B = 4 and H = 5, where B is the branchingfactor to indicate the number of sibling child nodes under the sameparent node andH is the maximum depth of visual tree (from root nodeto leaf node).

next level) by minimizing inter-group visual similarity andmaximizing intra-group visual similarity:

min

ψ(c,B) =B∑

l=1

∑

Ci∈Gl

∑

Cj∈Gc/GlS(Ci, Cj)

∑

Ci∈Gl

∑

Ch∈GlS(Ci, Ch)

(9)

whereS(Ci, Cj) is the inter-class visual similarity betweentwo privacy-sensitive object classesCi andCj , Gc = Gl|l =1, · · · , B is used to representB groups (clusters) ofMprivacy-sensitive object classes for the current non-leafnodec, Gc/Gl is used to represent otherB−1 groups inGc exceptGl.

This node partitioning process is performed repeatedly untila complete tree is created (i.e., each leaf node contains onesingle privacy-sensitive object class). Note that if the set L(c)for the current non-leaf nodec has less thanB privacy-sensitive object classes, then only|L(c)| child nodes aregenerated. Thus the visually-similar privacy-sensitive objectclasses, which share significant common visual properties butcontain subtle visual differences, are finally assigned intothe sibling leaf nodes under the same parent node. Suchhierarchical clustering process for visual tree construction isillustrated in Fig. 7 for one specific branching factorB = 4and the maximum depthH = logB N .

By organizing large numbers of privacy-sensitive objectclasses hierarchically in a coarse-to-fine fashion, our visualtree can provide a good environment for determining the inter-related learning tasks automatically, which may allow ourhierarchical deep multi-task learning (HD-MTL) algorithmtolearn more discriminative tree classifier. In addition, learning


Fig. 8. The illustration of the key components of our HD-MTL algorithm :(a) 3 commonly-shared convolutional layers for all the privacy-sensitiveobject classes; (b) 2 group-specific convolutional layers and 2 fully-connected layers for the visually-similar privacy-sensitive object classesin the same group; (c) the last layer for the tree classifier and the visualtree to replace the flat softmax-layer.

the tree classifier for hierarchical object detection can reducethe computational cost significantly by ruling out unlikelygroups of privacy-sensitive object classes (i.e., irrelevant high-level nodes) at an early stage, so that we can achieve fast andaccurate detection of large numbers of privacy-sensitive objectclasses.

V. JOINT LEARNING OF DEEPCNNS AND TREE

CLASSIFIER FORLARGE-SCALE OBJECTDETECTION

The third step in our iPrivacy system is to learn the treeclassifier and the deep CNNs jointly over the visual tree in anend-to-end fashion, so that we can achieve fast and accuratedetection of large numbers of privacy-sensitive object classes.As illustrated in Fig. 8, our deep network for hierarchical ob-ject detection are partitioned into three parts: (a) 3 commonly-shared convolutional layers (i.e., the first 3 convolutionallayers in CNNs) to learn the common representations forall the privacy-sensitive object classes; (b) 2 group-specificconvolutional layers (i.e., the fourth and the fifth convolutionallayers in CNNs) and 2 group-specific fully-connected layersto learn the class-specific representations for the visually-similar privacy-sensitive object classes in the same group, e.g.,the sibling privacy-sensitive object classes under the sameparent node; and (c) the last layer for the tree classifier toreplace the flat softmax-layer [23-25]. Because each group(superclass) contains only a small number of visually-similarprivacy-sensitive object classes (i.e., each parent node containsat mostB sibling leaf nodes on our visual tree), we scaledown the number of kernel mappings for the 2 group-specificconvolutional layers and 2 fully-connected layers into10%of that for the deep CNNs in Caffe [24], and Dropout [32] isapplied to 2 group-specific fully-connected layers with a valueof 0.5 to prevent over-fitting.

A bottom-upapproach is further developed to achieve jointlearning of the deep CNNs and the tree classifier over thevisual tree: (a) To distinguish the visually-similar privacy-sensitive object classes at the sibling leaf nodes under the

same parent node, adeep multi-task learningalgorithm isdeveloped to train their multi-task softmax classifiers jointlyfor enhancing their discrimination power, and a joint objectivefunction is used to simultaneously refine both the multi-tasksoftmax classifiers for the sibling leaf nodes (i.e., visually-similar privacy-sensitive object classes under the same parentnode) and the deep CNNs for image representation; (b) Ahierarchical deep multi-task learning(HD-MTL) algorithm isdeveloped to train more discriminative classifiers for high-levelnodes and control the inter-level error propagation effectively,where a joint objective function is used to simultaneouslyupdate both the tree classifier and the deep CNNs.

A. Deep Multi-Task Learning

It is worth noting that our visual tree has provided agood environment to identify the inter-related learning tasksautomatically, e.g., the tasks for learning the classifiersfor thesibling leaf nodes under the same parent node are stronglyinter-related because such sibling privacy-sensitive objectclasses share some common visual properties significantly.Adeep multi-task learning algorithm is developed to train suchinter-related classifiers jointly to enhance their discriminationpower, where the inter-task relationships are leveraged toregularize the manifold model structures. The complexityfor joint classifier training can be controlled effectivelybyfocusing on at mostB sibling privacy-sensitive object classes(i.e., visually-similar privacy-sensitive object classes) underthe same parent node (superclass), and the negative imagescan be selected locally from other sibling leaf nodes underthe same parent node.

For a given parent nodech at the second level of the visualtree, the multi-task softmax classifiers for its visually-similarprivacy-sensitive object classes (its sibling leaf nodes at thefirst level of the visual tree) are trained simultaneously byoptimizing a joint objective function:

min

ν

R∑

l=1

B∑

j=1

ξlj + δ1Tr

(

WWT)

+δ2

2Tr

(

WLWT)

(10)

subject to:

∀Rl=1∀Bj=1 : ylj

(

WTj · xlj + b

)

≥ 1− ξlj , ξlj ≥ 0 (11)

whereR is the number of training images for each privacy-sensitive object class,Wj is the classifier parameter for thejthprivacy-sensitive object classCj , Tr(·) is used to representthe trace of matrix,ξlj indicates the training error rate,δ1and δ2 are the regularization parameters,ν is the penaltyterm, W = (W1, · · · ,Wj , · · · ,WB), L is the Laplacianmatrix of the relevant inter-class similarity matrixS. Theinter-class visual similaritiesS(·, ·) as defined in Eq. (8) areused to approximate the inter-task relationships, the manifoldregularization termTr

(

WLWT)

is used to enforce that: iftwo visually-similar privacy-sensitive object classesci andcjhave larger inter-class visual similarityS(·, ·), their multi-tasksoftmax classifiers will have stronger correlations. The dualproblem is then defined as:

min

B∑

j=1

R∑

l=1

βj

l −1

2δ1βTY ℜ

(

ℜ+δ2

δ1ℜ

(

L⊗

I

)

ℜ

)

−1

ℜY β

(12)


subject to:

∀Rl=1∀Bj=1 :

R∑

l=1

βjl · y

jl = 0, 0 ≤ βj

l ≤ 1 (13)

whereY = [ylj | l ∈ 1, R, j ∈ 1, B]T ∈ 0, 1R×B andylj ∈ 0, 1B×1 is the class indicator vector for the trainingimagexlj , I is the identical matrix,L

⊗

I is the Kroneckerproduct between two matrixL and I, β = (β1, · · · , βj , · · · ,βB) is the set of the dual variables,βj = (βj

1, · · · , βjl , · · · ,

βjR), ℜ is a block diagonal similarity matrix and it is defined

as:

ℜ =

ℜ1

. . .ℜj

. . .ℜB

(14)

where ℜj ∈ RR×R is the similarity matrix forR trainingimages for thejth privacy-sensitive object class.

After the optimalβ∗ is obtained by solving Eqs. (12-13),we can compute the optimalα∗ = (α∗

1, · · · , α∗j , · · · , α∗

B) as:

α∗ =1

2δ1

(

ℜ+δ2δ1

(

ℜ(

L⊗

I)

ℜ)−1

ℜY β∗

)

(15)

where α∗j = (α1∗

j , · · · , αl∗j , · · · , αR∗

j ). Finally, the multi-task softmax classifiers for the sibling privacy-sensitiveobjectclasses under the same parent node (superclass)ch are definedas:

∀Bj=1 : f1cj (x) |F 1cj=

R∑

l=1

αl∗j κ(x

lj , x) + b∗j , cj ∈ ch (16)

By embedding the inter-class visual similarities (i.e., inter-task relationships) into a manifold structure regularizationterm, our deep multi-task learning algorithm can explicitlyconsider the differences of the inter-task relationships and theireffects on multi-task learning. By focusing on the visually-similar privacy-sensitive object classes under the same parentnode (superclass), our deep multi-task learning algorithmcaneffectively control the complexity for joint classifier trainingand achieve good sample balance by selecting the negativeinstances locally from other visually-similar privacy-sensitiveobject classes under the same parent node. Our deep multi-tasklearning algorithm can simultaneously compare and contrastthe visually-similar privacy-sensitive object classes toenhancetheir separability, thus it is able to establish two separabledecision functions: (1) the common prediction function sharedamong the multi-task softmax classifiers for the visually-similar privacy-sensitive object classes under the same parentnode; and (2) the class-specific prediction function for themulti-task softmax classifier for each privacy-sensitive objectclass.

By explicitly separating the common prediction functionfrom the class-specific prediction function, our multi-tasksoftmax classifiers can have higher discrimination poweron distinguishing the visually-similar privacy-sensitive objectclasses under the same parent node (which are usually hard

to be separated), e.g., learning such inter-related tasks jointlycan reduce the risk of over-fitting to a specific detection taskand boost the performance of all the individual detectiontasks. Because the visually-similar privacy-sensitive objectclasses have similar learning difficulty, the gradients of theirjoint objective function could be more uniform and the back-propagation operations can stick on reaching the global opti-mum effectively.

We jointly learn the multi-task softmax classifiers for mul-tiple visually-similar privacy-sensitive object classesunderthe same parent node and the deep CNNs (3 commonly-shared convolutional layers and 2 group-specific convolutionallayers and 2 fully-connected layers) according to the jointobjective function as defined in Eqs. (11-12). The errorsof these inter-related learning tasks are back-propagatedtoupdate the weights for the deep CNNs [23-25, 31]. Given atraining instance, the predictions of the inter-related tasks arecalculated. We formulate the training error rateξlj in the formof softmax regression:

ξlj = −Iyljlog

exp(WTj x

lj + b)

∑Bi=1 exp(W

Ti x

li + b)

(17)

where Iylj is the indicator function such thatIylj = 1if ylj = 1 (i.e., (xlj , y

lj) is the positive training image for the

privacy-sensitive object classcj), otherwiseIylj = 0. Thejoint objective function in Eqs.(11-12) is then reformulated as:

£(W,X, Y ) = −

R∑

l=1

B∑

j=1

Iyljlog

exp(WTj x

lj + b)

∑Bi=1 exp(W

Ti x

li + b)

+

δ1Tr(

WWT)

+δ2

2Tr

(

WLWT)

+λ

R∑

l=1

B∑

j=1

ylj

(

WTj · x

lj + b

)

−

1 + Iyljlog

exp(WTj x

lj + b)

∑Bi=1 exp(W

Ti x

li + b)

(18)

whereλ is the Lagrange weight parameter. The joint objectivefunction £(W,X, Y ) is optimized by using the stochasticalternating direction method of multipliers (ADMM) algorithm[33-34], which is able to handle non-smooth regularizationterm. The corresponding gradients for the joint objectivefunction £(W,X, Y ) are calculated as∂£(W,X,Y )

∂W , and theyare back-propagated [31] through the deep CNNs (both 3commonly-shared convolutional layers and 2 group-specificconvolutional layers and 2 fully-connected layers) to fine-tunethe weights.

B. Hierarchical Deep Multi-Task Learning

One salient principle of our hierarchical object detectionapproach is that:an image or an object proposalx fromthe image should first be assigned into the parent node(i.e.,yl+1ch

· f l+1ch

(x) ≥ 0, yl+1ch

= +1) correctly if it can further beassigned into the child node(i.e., ylcj · f

lcj (x) ≥ 0, ylcj = +1),

thus aninter-level relationship constraint(discriminative reg-ularization term) is defined as:

∀Bj=1 : f l+1ch

(x)− f lcj (x) ≥ 0, cj ∈ ch (19)


Fig. 9. Our experimental results on integrating the deep CNNs and thetree classifier to detect and recognize the most significant privacy-sensitiveobject class (i.e., human beings) and protecting image privacy via simply blurring privacy-sensitive objects: (a) original image; (b) human regionsdetected by deep learning; (c) detected human objects; (d) face identification for privacy-sensitive objects; (e) faceblurring for privacy-sensitiveobjects; (f) shared images with privacy protection via simple face blurring.

Such inter-level relationship constraint (discriminative regu-larization term) can fully capture the correlation of inter-levelpredictions and its effects on hierarchical classifier training,e.g., it can force our hierarchical deep multi-task learning (HD-MTL) algorithm to train more discriminative classifiers forthehigh-level nodes on the visual tree, so that the tree classifiercan control the inter-level error propagation effectively.

It is worth noting that distinguishing the coarse-grainedgroups of privacy-sensitive object classes on the high-levelnon-leaf nodes is much easier than distinguishing the visually-similar privacy-sensitive object classes on the sibling leafnodes, thus it is possible for us to learn more discriminativeclassifiers for the high-level non-leaf nodes on the visual tree.

Another salient principle of our hierarchical object detectionapproach is that:all the images or all the object proposals,which can be assigned into the same parent node correctly,should further be able to be assigned into the relevant childnodes, thus aninter-level complementarity constraint(discrim-inative regularization term) is defined as:

∀Bh=1 : f l+1ch

(x) =B∑

j=1

ηjflcj (x) (20)

whereηj is defined as:

ηj =exp

(

−f lcj (x))

∑Bi=1 exp

(

−f lci(x))

(21)

.Thus training the multi-task softmax classifiers for the

sibling non-leaf nodes under the same parent nodeck isachieved by:

min

ν

R∑

m=1

B∑

h=1

ξmj + γ1Tr(

WWT)

+γ22Tr

(

WLWT)

(22)subject to:

∀Rm=1∀Bh=1 : ymh

(

WTh · xmh + b

)

≥ 1− ξmh , ξmh ≥ 0, ch ∈ ck

(23)

∀Bh=1 : f l+1ch

(x)− f lcj (x) ≥ 0 (24)

∀Bh=1 : f l+1ch

(x) =B∑

j=1

ηjflcj (x) (25)

where W = (W1, · · · ,Wh, · · · ,WB). The bundle method[35-36] is used to solve the optimization problem as definedin Eqs. (23-25).


Fig. 10. Our experimental results on integrating the deep CNNs and thetree classifier to detect and recognize the most significant privacy-sensitiveobject class (i.e., human beings) and protecting image privacy via simply blurring privacy-sensitive objects: (a) original image; (b) human regionsdetected by deep learning; (c) detected human objects; (d) face identification for privacy-sensitive objects; (e) faceblurring for privacy-sensitiveobjects; (f) shared images with privacy protection via simple face blurring.

By leveraging the visual tree to generate subtrees (eachsubtree contains one parent node and at mostB sibling childnodes) iteratively and determine the inter-related learning tasksautomatically, our hierarchical deep multi-task learning(HD-MTL) algorithm can providean iterative solutionfor large-scale machine learning, so that training large numbers of nodeclassifiers over the visual tree (i.e., tree classifier) becomescomputationally tractable. Such tree classifier can effectivelyrule out unlikely coarse-grained groups of privacy-sensitiveobject classes (i.e., irrelevant high-level nodes on the visualtree) at an early stage, which can significantly reduce thecomputational cost for detecting and recognizing the privacy-sensitive objects from the images to be shared. By leverag-ing two inter-level constraints(discriminative regularizationterms) to force our HD-MTL algorithm on training morediscriminative classifiers for the high-level non-leaf nodes onthe visual tree, our tree classifier can control the inter-levelerror propagation effectively and obtain high accuracy rateson large-scale object detection.

For the sibling high-level non-leaf nodes on the visual tree,we jointly train: (a) their multi-task softmax classifiers;and(b) the deep CNNs according to the joint objective function

as defined in Eqs. (22-25). We use back-propagation to update:(1) the multi-task softmax classifiers for the sibling high-levelnon-leaf nodes under the same parent node; (2) the multi-tasksoftmax classifiers for their child nodes at the lower levelsofthe visual tree until the relevant leaf nodes; and (3) the weightsfor the deep CNNs. Given a training image or an objectproposal from the training image, predictions of the inter-related tasks (i.e., the sibling high-level non-leaf nodesunderthe same parent node) are calculated, and the correspondinggradients are back-propagated [31] to fine-tune both the treeclassifier and the deep CNNs simultaneously.

VI. SOFT PREDICTION FORHIERARCHICAL OBJECT

DETECTION

The fourth step in our iPrivacy system is to automaticallydetect the privacy-sensitive objects from the images beingshared, recognize their classes, and identify their privacysettings for image sharing. After the tree classifier and thedeepCNNs are available, they are used to predict the label (objectclass) for a given image or an object proposalx in the image,i.e., identifying its best-matching privacy-sensitive object classhierarchically. The overall algorithm is the following: the


image or the object proposalx traverses the tree classifierstarting from the root node of the visual tree. At each level,the best-matching entry that has the largest score (defined as ρin the following) is selected and then it continues to the childnode pointed by this entry. The search stops until reaching theleaf node whereby the best matching leaf node that containsthe privacy-sensitive object class finally.

For a given imagex being shared or an object proposalxfrom the given image, its confidence scoreρ(cj , x), with themulti-task softmax classifierf lcj (x) for the non-leaf nodecjat thelth level of the visual tree, is defined as:

ρ(cj , x) =exp

(

−f lcj (x))

∑Bi=1 exp

(

−f lci(x))

(26)

The marginδ(cj , ci, x) between its confidence scores with twosibling non-leaf nodescj andci (at thelth level of the visualtree) under the same parent nodech (at the(l + 1)th level ofthe visual tree) is defined as:

δ(cj , ci, x) =| ρ(cj , x)− ρ(ci, x) |, cj , ci ∈ ch (27)

Finally, the multi-task softmax classifiers for the siblingprivacy-sensitive object classes (i.e., sibling leaf nodes) underthe selected parent nodec (at the second level of visual tree)are used to decide: (1) The given imagex or the objectproposalx from the image is assigned into one of these siblingprivacy-sensitive object classes under the selected parent nodec if the corresponding confidence scoreρ(·, x) is above a giventhresholdT1 or the marginδ(·, ·, x) is above a given thresholdT2; (2) Otherwise, it is assigned into the uncertain category ornew privacy-sensitive object class.

New(x) =

0, if ρ(·, x) < T1 or δ(·, ·, x) < T2;

1, otherwise(28)

If the appearance of new privacy-sensitive object class isdetected, a new leaf node is inserted under the selected parentnodec because of the atomicity of the new object class. Sinceonly the most confident (best-matching) child node is chosenat each step, we call such approach ashard prediction.

One problem for hard prediction is that only one single childnode is selected at each step, even the difference between theprediction scores for all the sibling child nodes under the sameparent node does not exceed a considerable margin. It is morelikely that a wrong child node would be chosen early andsuch mistake will be propagated along the path until the leafnodes. This inevitably degrade the performance of hierarchicaldetection and recognition of large numbers of privacy-sensitiveobject classes.

To alleviate the performance degradation, asoft predictionapproach is developed, where top 2 child nodes (which are or-dered according to their prediction scores) are simultaneouslychosen at each step if the difference between their predictionscores does not exceed a considerable margin. In such softprediction approach, multiple leaf nodes can be reached bydifferent prediction paths (i.e., the image or the object proposalfrom the image can be assigned into multiple privacy-sensitiveobject classes). However, in the context of detecting and

Fig. 11. The comparison on the accuracy rates on large-scale object de-tection and recognition when different approaches are usedfor classifiertraining: (a) our HD-MTL algorithm; (b) traditional HD-CNN app roach[29].

recognizing privacy-sensitive object classes for privacysettingrecommendation, each image or each object proposal shouldbe assigned into the best-matching privacy-sensitive objectclass rather than assigning it into many ones without orders.Based on this understanding, the chosen leaf nodes (selectedprivacy-sensitive object classes) are sorted according totheirconfidence scores, and each test image or each object proposalfrom the image is assigned into at most top 3 privacy-sensitiveobject classes in our soft prediction approach.

It is worth noting that we do not need to re-train the deepCNNs and the tree classifier for our soft prediction approach.By evaluating multiple prediction paths simultaneously atthetest side, our soft prediction approach can provide a goodsolution to avoid inter-level error propagation effectively, andit can also achieve a good balance between the computationalcost and the accuracy rates for detecting the privacy-sensitiveobjects from the images being shared, e.g., we can spendlittle computational cost (at the image sharing time) to gainreasonable improvement on the accuracy rates and recommendthe best-matching privacy settings for image sharing.

VII. E XPERIMENTAL RESULTS FORALGORITHM AND

SYSTEM EVALUATION

All our experiments for algorithm and system evaluation areconducted on a parallel set that comprises around800, 000social images and their privacy settings. To assess the ef-fectiveness of our proposed algorithms, our evaluation workfocuses on evaluating multiple factors: (a) whether our HD-MTL algorithm can obtain better results on detecting andrecognizing large numbers of privacy-sensitive object classes;(b) whether our HD-MTL algorithm can control the inter-levelerror propagation more effectively as compared with otherbaseline methods and achieve higher accuracy rates on large-scale object detection; (c) whether our HD-MTL algorithmcan provide better solution for detecting new privacy-sensitiveobject class; (d) whether blurring privacy-sensitive objectsmay significantly affect the utility of the shared images; and(e) whether detecting the privacy-sensitive object classes andlearning their object-privacy correspondences can allow usto recommend better privacy settings for image sharing andimage privacy protection.

As shown in Fig. 9 and Fig. 10, one can observe that ourHD-MTL algorithm can effectively detect and recognize the


Fig. 12. The comparison on the accuracy rates on large-scale objectdetection and recognition when different approaches are used for clas-sifier training: (a) our HD-MTL algorithm; (b) traditional hie rarchicalmulti-task learning approach [37]; and (c) label tree [38].

most important privacy-sensitive object class (i.e., human be-ings) from the images being shared. Detecting and recognizingsuch privacy-sensitive objects from images is the first andmost important step for automating image privacy protection.When such privacy-sensitive object classes are detected andrecognized from the images being shared, we can incorporatean simple approach to blur such privacy-sensitive objects orbackgrounds to protect the image privacy as illustrated in Fig.9 (f) and Fig. 10 (f). The average time cost is around 1.2seconds for detecting and recognizing the privacy-sensitiveobjects from an image being shared.

To evaluate the effectiveness of our HD-MTL algorithm oncontrolling the inter-level error propagation, we have comparedour HD-MTL algorithm with three baseline methods: (a)Traditional hierarchical multi-task learning approach [37]; (b)Label tree [38]; and (c) traditional HD-CNN approach [29]. Asshown in Fig. 11 and Fig. 12, the accuracy rates for our HD-MTL algorithm are higher than that for other three baselinemethods. The reasons are: (1) Using inter-level constraints(discriminative regularization terms) can force our HD-MTLalgorithm to learn more discriminative classifiers for high-levelnon-leaf nodes and control inter-level error propagation effec-tively; (2) Leveraging inter-task relationships to regularize themanifold model structures can allow our HD-MTL algorithmto explicitly consider the effects of the inter-task relationshipson multi-task learning; (3) By clustering large numbers ofprivacy-sensitive object classes into many groups accordingto their inter-class visual correlations, our visual tree canprovide a better environment for determining the inter-relatedlearning tasks. Because each group contains a small number ofvisually-similar privacy-sensitive object classes, the gradientsof their joint objective function are more uniform because suchvisually-similar privacy-sensitive object classes sharesimilarlearning complexity, thus our HD-MTL algorithm can obtainthe global optimum effectively. By explicitly leveraging theinter-class visual correlations for multi-task learning,our HD-MTL algorithm can achieve better performance on distinguish-ing the visually-similar privacy-sensitive object classes, whichare usually hard to be distinguished.

As shown in Fig. 13, when a soft prediction approach isfurther used for hierarchical detection and recognition oflargenumbers of privacy-sensitive object classes (i.e., two paths areexplored simultaneously at each step to achieve good balancebetween the accuracy rates and the computational efficiency),

Fig. 13. The comparison on the accuracy rates on detecting andrecognizing large numbers of privacy-sensitive object classes by usingsoft and hard prediction.

Fig. 14. The relationship between the accuracy rate for new objectdetection and the values for two thresholds (T1 and T2 in Eq. (28)).

our HD-MTL algorithm can achieve reasonable improvementon the accuracy rates for large-scale object detection. Byexploring multiple paths simultaneously, our soft predictionapproach has also provided another solution for controllingthe inter-level error propagation effectively.

In our hierarchical approach for detecting new privacy-sensitive object classes, we have used two thresholdsT1 andT2in Eq. (28), thus it is very attractive to evaluate the relationshipbetween the accuracy rate for new object detection and thesetwo thresholds. As shown in Fig. 14, after these two thresholdsreach the optimal values, increasing their values may inducesignificant decreasing of the accuracy rates, thus a cross-validation process is used to determine the optimal values forthese two thresholds.

Blurring the privacy-sensitive objects in the images beingshared may allow us to protect the image privacy effectively,but it may seriously affect the utility of the shared images.It

Fig. 15. The utility of blurred images could be low, i.e., sharingthe blurred images for learning deep CNNs and tree classifiermaydramatically decrease the accuracy rates for object detection.


Fig. 16. The utility of blurred images may change quickly when the sizesof blurred regions are increased.

is worth noting that the definition of image utility could besignificantly different for various tasks. For the task of jointlearning of the deep CNNs and the tree classifier, we havecompared two approaches: (a) using original images; and (b)using blurred images. As shown in Fig. 15, one can observethat using the blurred images may seriously mislead the jointprocess for learning the deep CNNs and the tree classifier,which may dramatically decrease the accuracy rates for objectdetection. For the same object class, as shown in Fig. 16,increasing the sizes of the blurred regions may significantlydecrease the detection accuracy rate because large sizes ofblurred regions may result in unrepresentative deep CNNs andgenerate irrelevant features for training the tree classifier withlower discrimination power.

Some pioneering researches have leveraged various features(both handcrafted and deep features) to train the classifiersfor supporting privacy setting recommendation [17-18, 49-50]. Thus it is also very interesting to evaluate whether usingdifferent types of features may bring significant improvementon privacy setting recommendation, e.g., recommending betterprivacy settings can achieve better protection of image privacyor result in less privacy disclosure. For100, 000 test images,we partition them into1000 smaller sets and perform our al-gorithm evaluation over these1000 image sets independently.For a given image set withT test images, its privacy disclosureis defined as:

PD =1

T

T∑

l=1

δ(PS(Il), OS(Il)) (29)

where T is the total number of test images,PS(Il) indi-cates the predicted privacy setting for thelth given imageIl, OS(Il) is its original privacy setting given by users.δ(PS(Il), OS(Il)) is defined as:

δ(PS(Il), OS(Il)) =

1, if PS(Il) 6= OS(Il)

0, otherwise(30)

The δ(·, ·) function is used to emphasize that the privacydisclosure is counted only if the privacy setting for thegiven image is predicted incorrectly. We sort the image setsaccording to their values of privacy disclosures and illustratethem in orders.

As shown in Fig. 17, we have compared the performanceof our classifier for privacy setting recommendation whendifferent types of features are used for image representationand classifier training. From these comparison experiments,we can observe multiple interesting factors: (1) For mostimage sets, the deep features can achieve the best performance

Fig. 17. The comparison on the effectiveness of our classifier on privacysetting recommendation when different types of features are extractedfor image representation and classifier training.

as compared with other handcrafted features such as SIFT,GIST and color histograms; (2) For some difficult image sets(with huge uncertainty on the object-privacy correspondences),all these features (including deep features) may not be ableto achieve acceptable performance, e.g., all of them havelarge privacy disclosures; (3) For some easy image sets (withvery good object-privacy correspondences), all these features(including handcrafted features) can achieve good performance(with small privacy disclosures).

VIII. C ONCLUSIONS ANDFUTURE WORK

To automate image privacy protection, a new approachcalled iPrivacy is developed in this paper to support privacysetting recommendation for image sharing. By learning theobject-privacy relatedness from massive social images, ourobject-privacy alignment algorithm can allow us to identifya large set of privacy-sensitive object classes and their privacysettings automatically. By learning the deep CNNs and thetree classifier jointly over the visual tree in an end-to-endway, our HD-MTL algorithm can achieve fast and accuratedetection of large numbers of privacy-sensitive object classesand recommend the best-matching privacy settings for imagesharing. An simple solution for image privacy protection isfur-ther provided by automatically blurring the privacy-sensitiveobjects during the image sharing process. Our experimentalresults have demonstrated that our HD-MTL algorithm canachieve very competitive results on both the accuracy ratesand the computational efficiency.

Our future work will focus on developing new algorithmsto: (a) avoid privacy inference through analyzing the inter-classor object-background co-occurrence contexts; (b) leverage ourobject detection results to automatically identify a suitablegroup of friends for image sharing; (c) define the image utilityfor various tasks and quantify the privacy metrics; (d) performuser study for achieving a good balance between the privacydisclosure and the utility (beautifulness) of the images whenblurring function is used for privacy protection, and we willdevelop an interactive system to allow each subject to negotiateamong the privacy disclosure, the image utility (beautifulness)and the significances of social impacts (benefits); and (e)


integrate other information sources (such as social networks)to improve our algorithm for large-scale object detection.

REFERENCES

[1] C. Wang, B. Zhang, K. Ren, J.M. Roveda, “Privacy-assuredoutsourcingof image reconstruction service in cloud”,IEEE Trans. on EmergingTopics in Computing, 2013.

[2] O. Nov, M. Naaman, C. Ye, “Motivational, structural and tenure factorsthat impact online community photo sharing”, ICWSM, 2009.

[3] A. C. Squicciarini, H. Xu, X. Zhang, “CoPE: Enabling collaborativeprivacy management in online social networks”,J. of American Societyfor Information Science and Technology, 2011.

[4] N. Wang, H. Xu, J. Grossklags, “Third-party Apps on Facebook: Privacyand the illusion of control”, ACM CHIMIT’11, 2011.

[5] M. Choudhury, H. Sundaram, Y.-R. Lin, A. John, D. Seligmann,“Connecting content to community in social media via image content,user tags and user communication”, IEEE ICME, 2009.

[6] C. Yeung, L. Kagal, N. Gibbins, N. Shadbolt, “Providing access controlto online albums based on tags and linked data”, AAAI Symposium,2009.

[7] J. Pesce, D. Casas, G. Rauder, V. Almeida, “Privacy attacks in socialmedia using photo tagging networks: A case study with facebook”, ACMPSOSM, 2012.

[8] D. Christin, P. Lopez, A. Reinhardt, M. Hollick, M. Kauer, “Sharing withstrangers: Privacy bubbles as user-centered privacy control for mobilecontent sharing applications”,Information Security Technical Report,vol.17, pp.105-116, 2013.

[9] M. Manban, P. van Oorschot, “Privacy-enhanced sharing of personalcontent on the Web”, ACM WWW, 2008.

[10] N. Vyas, A. Squicciarini, C. Chang, D. Yao, “Towards automatic privacymanagement in Web2.0 with semantic analysis on annotations”, IEEECollaborateCom, 2009.

[11] A. Squicciarini, M. Shehab, F. Paci, “Collective privacy management insocial network”, ACM WWW, 2009.

[12] P. Klemperer, Y. Liang, M. Mazurek, M. Sleeper, B. Ur, L. Bauer, L. F.Cranor, N. Gupta, M. Reiter, “Tag, you can see it!: using tagsfor accesscontrol in photo sharing”, in: CHI, 2012, pp. 377–386.

[13] R. Ravichandran, M. Benisch, P. Kelley, N. Sadeh, “Capturing socialnetworking privacy preferences”, in: SOUPS, 2009.

[14] J. Bonneau, J. Anderson, L. Church, “Privacy suites: shared privacy forsocial networks”, in: SOUPS, 2009.

[15] J. Bonneau, J. Anderson, G. Danezis, “Prying data out ofa socialnetwork”, in: ASONAM, 2009, pp. 249–254.

[16] L. Fang, K. LeFevre, “Privacy wizards for social networking sites’, ACMWWW, 2010.

[17] S. Zerr, S. Siersdorfer, J. hare, E. Demidova, “Privacy-aware imageclassification and search”, ACM SIGIR, 2012.

[18] A. C. Squicciarini, D. Lin, S. Karumanchi, N. DeSisto, “Automatic socialgroup organization and privacy management”, in: CollaborateCom, pp.89–96.

[19] O. Dekel, J. Keshet, Y. Singer, Large margin hierarchical classification,ICML, 2004.

[20] J. Wang, X. Shen, W. Pan, On large margin hierarchical classificationwith multiple paths,Journal of the American Statistical Association,vol.104, no.487, 2009.

[21] T. Evgeniou, C.A. Micchelli, M. Pontil, “Learning multiple tasks withkernel methods”,Journal of Machine Learning Research, vol.6, pp.615-637, 2005.

[22] H. Fei, J. Huan, Structured feature selection and task relationshipinference for multi-task learning, IEEE ICDM, 2011.

[23] A. Krizhevsky, I. Sutskever, G. E. Hinton, ImageNet classification withdeep convolutional neural networks, NIPS, 2012.

[24] Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S.Guadarrama, T. Darrell, Caffe: Convolutional architecturefor fast featureembedding, ACM Multimedia, 2014.

[25] J. Donahue, Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng, T.Darrell, DeCAF: A deep convolutional activation feature for genericvisual recognition, ICML, 2014.

[26] Z. Zhang, P. Luo, C. Loy, X. Tang, “Facial landmark detection by deepmulti-task learning”, ECCV, 2014.

[27] P. Teterwak, L. Torresani, “Shared roots: Regularizing neural networksthrough multitask learning”, TR2014-762, Dartmouth, 2014.

[28] S. Li, Z.-Q. Liu, A.B. Chan, “Heterogeneous multi-task learning forhuman pose estimation with deep convolutional neural network”, Intl J.of Computer Vision, vol.113, no.1, pp.19-36, 2015.

[29] Z. Yang, H. Zhang, R. Piramuthu, V. Jagadeesh, D. DeCoste, W. Di,Y. Yu, “HD-CNN: Hierarchical deep convolutional neural network forimage classification”, IEEE CVPR, 2015.

[30] N. Srivastava, R. Salakhutdinov, “Discriminative transfer learning withtree-based priors”, NIPS, 2013.

[31] Y. Lecun, L. Bottou, Y. Bengio, P. Haffner, “Gradient-based learningapplied to document recognition”,Proceeding of IEEE, 1998.

[32] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever,R. Salakhutdinov,“Dropout: A simple way to prevent neural networks from overfitting”,Journal of Machine Learning Research, vol. 15, pp.1929-1958, 2014.

[33] S. Azadi, S. Sra, “Towards an optimal stochastic alternating directionmethod of multipliers”, ICML, 2014.

[34] H. Ouyang, N. He, L.T. Tran, A. Gray, “Stochastic alternating directionmethod of multipliers”, ICML, 2013.

[35] M. Zinkevich, A. Smola, M. Weimer, H. Li, Parallelized stochasticgradient descent, NIPS, 2010.

[36] L. Bottou, Large-scale machine learning with stochastic gradient de-scent, Proc. of the 19th Intl Conf. on Computational Statistics (COMP-STAT’2010), 2010.

[37] S. Gopal, Y. Yang, A. Niculescu-Mizil, Regularizationframework forlarge-scale hierarchical classification, ECML, 2012.

[38] S. Bengio, J. Weston, D. Grangier, Label embedding treesfor largemulti-class tasks, NIPS, 2010.

[39] M. Ames, M. Naaman, “Why we tag: Motivations for annotationinmobile and online media”, ACM CHI, 2007.

[40] D. Liu, X.-S. Hua, L. Yang, M. Wang, H.-J. Zhang, “Tag ranking”, ACMMultimedia, 2009.

[41] K. Weinberger, M. Slaney, R. van Zwol, “Resolving tag ambiguity”,ACM Multimedia, 2008.

[42] D. Liu, S. Yan, Y. Rui, H.-J. Zhang, “Unified tag analysiswith multi-edge graph”, ACM Multimedia, 2010.

[43] Y. Shen, J. Fan, “Leveraging loosely-tagged images and inter-objectcorrelations for tag recommendation”, ACM Multimedia, 2010.

[44] L-C Chen, G. Papandreou, I. Kokkions, K. Murphy, and A.L. Yuille, “Se-mantic image segmentation with deep convolutional neural networks”,ICLR, 2015.

[45] B. Hariharan, P. Arbelaez, R. Girshick, J. Malik, “Hypercolumns forobject segmentation and fine-grained locallization”, IEEE CVPR, 2015.

[46] P. Pinheiro, R. Collobert, “From image-level to pixel-level labeing withconvolutional networks”, IEEE CVPR, 2015.

[47] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z.Su, D.Du, C. Huang, P. Terr, “Conditional random fields as recurrent neuralnetworks”,

[48] J. Long, E. Shelhamer, T. Darrel, “Fully convolutional networks forsemantic segmentation”, IEEE CVPR, 2015.

[49] A. Tonge, C. Caragea, “Privacy prediction of images shared on socialmedia sites using deep features”, AAAI Symposium, 2015.

[50] E. Spyromitros-Xioufis, S. Papadopoulos, A. Popescu, Y.Kompatsiaris,“Personalized privacy-aware image classification”, ACM ICMR, 2016.

[51] P. Kontschieder, M. Fiterau, A. Criminisi1, S. Rota Bulo, “Deep neuraldecision forests”, IEEE ICCV, 2015.

Jianping Fan received his MS degree in theoryphysics from Northwestern University, Xian, Chinain 1994 and his PhD degree in optical storage andcomputer science from Shanghai Institute of Opticsand Fine Mechanics, Chinese Academy of Sciences,Shanghai, China, in 1997.

He was a Postdoc Researcher at Fudan University,Shanghai, China, during 1997-1998. From 1998 to1999, he was a Researcher with Japan Society ofPromotion of Science (JSPS), Department of In-formation System Engineering, Osaka University,

Osaka, Japan. From 1999 to 2001, he was a Postdoc Researcher in theDepartment of Computer Science, Purdue University, West Lafayette, IN. At2001, he joined UNC-Charlotte as Assistant Professor and hebecame as theFull Professor at 2012. His research interests include image/video privacyprotection, automatic image/video understanding, and statistical machinelearning.


Zhengzhong Kuangreceived the BS degree in com-puter science from China University of Petroleum,Tsingdao, China, in 2014. At 2015, he joined Uni-versity of North Carolina at Charlotte to pursue hisPhD degree on Information Technology. His researchinterests include computer vision, machine learning,and image privacy protection.

Baopeng Zhangreceived his Ph.D. degree in com-puter science from Tsinghua University in 2008.He is currently an assistant professor in the Schoolof Computer and Information Technology, BeijingJiaotong University, China. He is now a visitingscholar at UNC-Charlotte. His research interestsinclude semantic image/video classification and re-trieval, statistical machine learning, and large-scalesemantic data management and analysis.

Jun Yu (M13) received his BEng and PhD fromZhejiang University, Zhejiang, China. He is cur-rently a Professor with the School of ComputerScience and Technology, Hangzhou Dianzi Univer-sity. He was an Associate Professor with Schoolof Information Science and Technology, XiamenUniversity. From 2009 to 2011, he worked in Singa-pore Nanyang Technological University. From 2012-2013, he was a visiting researcher in MicrosoftResearch Asia (MSRA). He is now a visiting scholarat UNC-Charlotte. Over the past years, his research

interests include multimedia analysis, machine learning and image processing.He has authored and co-authored more than 50 scientific articles. He has(co-)chaired for several special sessions, invited sessions, and workshops. Heserved as a program committee member or reviewer top conferencesandprestigious journals. He is a Professional Member of the IEEE, ACM andCCF.

Dan Lin is an Associate Professor

IEEE TRANSACTIONS ON INFORMATION …faculty.missouri.edu/lindan/papers/iprivacy.pdfpersonal...

Documents

Transcript of IEEE TRANSACTIONS ON INFORMATION …faculty.missouri.edu/lindan/papers/iprivacy.pdfpersonal...