Perceived versus Actual Predictability of Personal Information in Social Networks

Eleftherios (Lefteris) Spyromitros-Xioufis (1), Georgios Petkos (1), Symeon Papadopoulos (1), Rob Heyman (2), Yiannis Kompatsiaris (1)
(1) Center for Research and Technology Hellas – Information Technologies Institute (CERTH-ITI)
(2) iMinds-SMIT, Vrije Universiteit Brussel, Brussels, Belgium
INSCI 2016, Sep 12-14, 2016, Florence, Italy




Disclosure of Personal Information in OSNs
- Online Social Networks (OSNs) have had a transformative impact!
  - People use them for communication, as a news source, to do business, etc.
- However, participation in OSNs comes at a price!
  - User-related data is shared with: a) other OSN users, b) the OSN itself, c) third parties (e.g. ad networks)
- Disclosure of specific types of data
  - E.g. gender, age, ethnicity, political or religious beliefs, sexual preferences, financial status, etc.
- Has implications
  - E.g. unjustified discrimination in personnel selection / loan approval
- Information need not be explicitly disclosed!
  - Several types of personal information can be accurately inferred from implicit cues (e.g. Facebook likes) using machine learning!


Inferring Personal Information
[1] Kosinski, et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
[2] Schwartz, et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PloS one, 2013.

Inferred Information & Privacy in OSNs
- The study of user awareness with regard to inferred information has been largely neglected by social research on OSN privacy
- Privacy is usually presented as a question of giving access to, or communicating, personal information to a particular party
  - E.g. Westin's [1] definition of privacy: "The claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others."
- However, access control is non-existent for inferred information:
  - Users are unaware of the inferences being made
  - Users have no control over the logic behind them
- Aim of our work:
  - Investigate if and how users intuitively grasp what can be inferred from their disclosed data!
[1] Alan Westin. Privacy and freedom. Bodley Head, London, 1970.


Main Research Questions
Our study attempts to answer the following questions:
Q1. Predictability
  - How predictable are different types of personal information, based on users' OSN data?
Q2. Actual vs perceived predictability
  - How realistic are user perceptions about the predictability of their personal information?
Q3. Predictability vs sensitivity
  - What is the relationship between perceived sensitivity and predictability of personal information?
Previous work has focused mainly on Q1.
We address Q1 using a variety of data and methods, and additionally address Q2 and Q3.


What data is needed for this study?
We collected 3 types of data about 170 Facebook users:
1. OSN data: likes, posts, images
   - Collected through a test Facebook application (Databait (1), developed within the USEMP (2) FP7 project)
2. Answers to questions about 96 personal attributes, organized (3) into 9 categories (disclosure dimensions)
   - E.g. health factors, sexual orientation, income, political attitude, etc.
3. Answers to questions related to their perceptions about the predictability and sensitivity of the 9 disclosure dimensions
What is the purpose of each data type?
- 1 & 2 allow assessing the actual predictability of personal information
  - They serve as training sets for supervised learning algorithms
- 3 facilitates a comparison between actual predictability and perceived predictability/sensitivity of personal information
(1) https://databait.hwcomms.com (2)


Example from the questionnaire

What is your sexual orientation?
Ground truth!

Response      No. of participants
heterosexual  147
homosexual    14
bisexual      7
n/a           2

Do you think the information on your Facebook profile reveals your sexual orientation? Either because you yourself have put it online, or it could be inferred from a combination of posts.
Measures perceived predictability

Response  No. of participants
yes       134
no        33
n/a       3

How sensitive do you find the information you had to reveal about your sexual orientation in the previous section? (1 = not sensitive at all, 7 = very sensitive)
Measures perceived sensitivity

Predictive Attributes Extracted from OSN Data
- likes: binary vector denoting presence/absence of a like (#3.6K)
- likesCats: histogram of like category frequencies (#191)
- likesTerms: Bag-of-Words (BoW) of terms in the description, title and about sections of likes (#62.5K)
- msgTerms: BoW vector of terms in user posts (#25K)
- lda-t: distribution of topics in the textual contents of both likes (description, title and about sections) and posts
  - Latent Dirichlet Allocation with t = 20, 30, 50, 100
- visual: concepts depicted in user images (#11.9K)
  - Detected using a CNN, top 12 concepts per image, 3 variants:
  - visual-bin: hard 0/1 encoding
  - visual-freq: concept frequency histogram
  - visual-conf: sum of detection scores across all images
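The encodings listed above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the input layout (lists of liked page identifiers, and one dictionary of concept scores per image) and the toy values are assumptions made only for illustration.

```python
from collections import Counter

def likes_binary(user_likes, vocabulary):
    """'likes' encoding: 1 if the user liked the page, else 0 (one entry per page)."""
    liked = set(user_likes)
    return [1 if page in liked else 0 for page in vocabulary]

def likes_cats_histogram(like_categories, categories):
    """'likesCats' encoding: how many of the user's likes fall in each category."""
    counts = Counter(like_categories)
    return [counts[c] for c in categories]

def visual_features(image_detections, concepts):
    """Three 'visual' variants from per-image concept detections.

    image_detections: one dict per image mapping concept -> detection score
    (hypothetical layout; the slides only say the top concepts per image are kept).
    """
    present, freq, conf = set(), Counter(), Counter()
    for detections in image_detections:
        for concept, score in detections.items():
            present.add(concept)      # visual-bin: seen at least once
            freq[concept] += 1        # visual-freq: number of images with the concept
            conf[concept] += score    # visual-conf: summed detection scores
    return {
        "visual-bin": [1 if c in present else 0 for c in concepts],
        "visual-freq": [freq[c] for c in concepts],
        "visual-conf": [conf[c] for c in concepts],
    }

# Toy usage with made-up pages, categories and concepts
print(likes_binary(["pageC", "pageA"], ["pageA", "pageB", "pageC"]))
print(likes_cats_histogram(["Music", "Music", "Sports"], ["Music", "Politics", "Sports"]))
print(visual_features([{"beach": 0.75, "dog": 0.5}, {"dog": 0.25}], ["beach", "dog", "car"]))
```

Each function returns a fixed-length vector indexed by a shared vocabulary, which is the form a supervised classifier typically expects as input.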

Experimental Setup


Results 1: Evaluating Classifiers


Results 2: Evaluating Features



Results 3: Combining Features


Results 4: Best Performance per Attribute


Ranking of Dimensions

Rank  Perceived predictability                  Actual predictability                     Change  Actual predictability according to [1]
1     Demographics                              Demographics                              -       Demographics
2     Relationship status and living condition  Political views                           +3      Political views
3     Sexual orientation                        Sexual orientation                        -       Religious views
4     Consumer profile                          Employment/Income                         +4      Sexual orientation
5     Political views                           Consumer profile                          -1      Health status
6     Personality traits                        Relationship status and living condition  -4      Relationship status and living condition
7     Religious views                           Religious views                           -
8     Employment/Income                         Health status                             +1
9     Health status                             Personality traits                        -3

Change is the perceived rank minus the actual rank of the dimension in the "Actual predictability" column; a bare "-" means no change.

[1] Kosinski, et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
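The rank changes in the table can be recomputed from the two orderings. The following is a small Python sketch using only the rankings shown on this slide; the function name `rank_changes` is an illustrative choice, not something taken from the study's code.

```python
# Dimensions ordered from rank 1 to rank 9, as listed in the ranking table
perceived = [
    "Demographics",
    "Relationship status and living condition",
    "Sexual orientation",
    "Consumer profile",
    "Political views",
    "Personality traits",
    "Religious views",
    "Employment/Income",
    "Health status",
]
actual = [
    "Demographics",
    "Political views",
    "Sexual orientation",
    "Employment/Income",
    "Consumer profile",
    "Relationship status and living condition",
    "Religious views",
    "Health status",
    "Personality traits",
]

def rank_changes(perceived, actual):
    """Perceived rank minus actual rank for each dimension.

    A positive value means the dimension is actually more predictable
    than users perceive it to be; a negative value means the opposite.
    """
    perceived_rank = {dim: i for i, dim in enumerate(perceived, start=1)}
    return {dim: perceived_rank[dim] - i for i, dim in enumerate(actual, start=1)}

changes = rank_changes(perceived, actual)
for dim in actual:
    print(f"{dim}: {changes[dim]:+d}")
```

Under this sign convention, a positive change marks a dimension whose predictability users underestimate.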


Perceived/Actual Predictability vs Sensitivity


Conclusions & Future Work
Conclusions
- Both correct and incorrect perceptions about predictability
- Predictability of sensitive information is underestimated
- Sophisticated privacy assistance tools are needed
  - Support users in managing disclosure of personal information
  - Databait: a privacy assistance tool (still in beta mode)



Thank you!
Resources
- Code/models: https://databait.hwcomms.com
Contact us
- @sympap
- @kompats