Multimedia Privacy

Gerald Friedland, Symeon Papadopoulos, Julia Bernd, Yiannis Kompatsiaris
ACM Multimedia, Amsterdam, October 16, 2016

Transcript of Multimedia Privacy

Page 1: Multimedia Privacy

Multimedia Privacy
Gerald Friedland, Symeon Papadopoulos, Julia Bernd, Yiannis Kompatsiaris

ACM Multimedia, Amsterdam, October 16, 2016

Page 2: Multimedia Privacy

What’s the Big Deal?

Page 3: Multimedia Privacy

Overview of Tutorial

• Part I: Understanding the Problem
• Part II: User Perceptions About Privacy
• Part III: Multimodal Inferences
• Part IV: Some Possible Solutions
• Part V: Future Directions

Page 4: Multimedia Privacy

Part I: Understanding the Problem

Page 5: Multimedia Privacy

What Can a Mindreader Read?

• These vulnerabilities are a problem with any type of public or semi-public post. They are not specific to a particular type of information, e.g. text, image, or video.

• However, let’s focus on multimedia data: images, audio, video, social media context, etc.

Page 6: Multimedia Privacy

Multimedia on the Internet Is Big!

Source: Domosphere

Page 7: Multimedia Privacy

Resulting Problem

• More multimedia data = higher demand for retrieval and organization tools.
• But multimedia retrieval is hard!

• Researchers work on making retrieval better (cf. latest advances in Deep Learning for content-based retrieval).

• Industry develops workarounds to make retrieval easier right away.

Page 8: Multimedia Privacy

Hypothesis

• Retrieval is already good enough to cause major issues for privacy that are not easy to solve.

• Let’s take a look at some retrieval approaches:
  • Image tagging
  • Geo-tagging
  • Multimodal location estimation
  • Audio-based user matching

Page 9: Multimedia Privacy

Workaround: Manual Tagging

Page 10: Multimedia Privacy

Workaround: Geo-Tagging

Source: Wikipedia

Page 11: Multimedia Privacy

Geo-Tagging

Allows easier clustering of photo and video series, among other things.

Page 12: Multimedia Privacy

Geo-Tagging Everywhere

Part of the location-based service hype:

But: Geo-coordinates + Time = Unique ID!

Page 13: Multimedia Privacy

Support for Geo-Tags

• Social media portals provide APIs to connect geo-tags with metadata, accounts, and web content. (A sketch of reading a photo’s geo-tag follows the table below.)

• Allows easy search, retrieval, and ad placement.

Portal  | %*  | Total
YouTube | 3.0 | 3M
Flickr  | 4.5 | 180M

*estimate (2013)
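As a side note, reading the geo-tag out of a photo is trivial. Below is a minimal sketch using the Pillow library; vacation.jpg is a hypothetical file, and exact EXIF value types vary across Pillow versions, so treat this as illustrative rather than production code.

```python
# Hedged sketch: extract GPS coordinates from a photo's EXIF metadata.
from PIL import Image
from PIL.ExifTags import GPSTAGS

def to_degrees(dms):
    # EXIF stores coordinates as (degrees, minutes, seconds) rationals.
    d, m, s = (float(x) for x in dms)
    return d + m / 60.0 + s / 3600.0

def gps_from_photo(path):
    """Return (lat, lon) in decimal degrees, or None if there is no geo-tag."""
    exif = Image.open(path)._getexif()
    if not exif or 34853 not in exif:   # 34853 (0x8825) is the GPSInfo tag
        return None
    gps = {GPSTAGS.get(k, k): v for k, v in exif[34853].items()}
    lat = to_degrees(gps["GPSLatitude"])
    lon = to_degrees(gps["GPSLongitude"])
    if gps.get("GPSLatitudeRef") == "S":
        lat = -lat
    if gps.get("GPSLongitudeRef") == "W":
        lon = -lon
    return lat, lon

print(gps_from_photo("vacation.jpg"))   # hypothetical file
```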

Page 14: Multimedia Privacy

Hypothesis

• Since geo-tagging is a workaround for multimedia retrieval, it allows us to peek into a future where multimedia retrieval works perfectly.

• What if multimedia retrieval actually just worked?

Page 15: Multimedia Privacy

Related Work

“Be careful when using social location sharing services, such as Foursquare.”

Page 16: Multimedia Privacy

Related Work

Mayhemic Labs, June 2010: “Are you aware that Tweets are geo-tagged?”

Page 17: Multimedia Privacy

Can You Do Real Harm?

• Cybercasing: using online (location-based) data and services to enable physical-world crimes.

• Three case studies:

G. Friedland and R. Sommer, "Cybercasing the Joint: On the Privacy Implications of Geotagging," Proceedings of the Fifth USENIX Workshop on Hot Topics in Security (HotSec 10), Washington, D.C., August 2010.

Page 18: Multimedia Privacy

Case Study 1: Twitter

• Pictures in tweets can be geo-tagged
• From a tech-savvy celebrity we found:
  • Home location (several pics)
  • Where the kids go to school
  • Where he/she walks the dog
  • “Secret” office

Page 19: Multimedia Privacy

Celebs Unaware of Geo-Tagging

Source: ABC News

Page 20: Multimedia Privacy

Celebs Unaware of Geotagging

Page 21: Multimedia Privacy

Google Maps Shows Address...

Page 22: Multimedia Privacy

Case Study 2: Craigslist

“For Sale” section of Bay Area Craigslist.com:
• 4 days: 68,729 pictures total, 1.3% geo-tagged

Page 23: Multimedia Privacy

Users Are Unaware of Geo-Tagging

• Many “anonymized” ads had geo-location
• Sometimes selling high-value goods, e.g. cars, diamonds, etc.
• Sometimes “call Sunday after 6pm”
• Multiple photos allow interpolation of coordinates for higher accuracy

Page 24: Multimedia Privacy

Craigslist: Real Example

Page 25: Multimedia Privacy

Geo-Tagging Resolution

Measured accuracy: +/- 1m

[Side-by-side images: iPhone 3G picture vs. Google Street View]

Page 26: Multimedia Privacy

What About Inference?

[Annotated example photo, with labels “Owner” and “Valuable”]

Page 27: Multimedia Privacy

Case Study 3: YouTube

Recall:
• Once data is published, the Internet keeps it (often with many copies).
• APIs are easy to use and allow quick retrieval of large amounts of data.

Can we find people on vacation using YouTube?

Page 28: Multimedia Privacy

Cybercasing on YouTube

Experiment: cybercasing using the YouTube API (240 lines in Python)

Page 29: Multimedia Privacy

Cybercasing on YouTube

Input parameters:
• Location: 37.869885,-122.270539
• Radius: 100km
• Keywords: kids
• Distance: 1000km
• Time-frame: this_week
(A sketch of this kind of query follows.)
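The original 240-line GData script is not reproduced in the slides. As a hedged illustration, a query of the same shape can be written in a few lines against today’s YouTube Data API v3 (location, locationRadius, q, and publishedAfter are real v3 search parameters; API_KEY and the date are placeholders):

```python
# Sketch of the kind of geo + keyword query the experiment ran.
import requests

API_KEY = "..."  # placeholder; supply your own key
resp = requests.get(
    "https://www.googleapis.com/youtube/v3/search",
    params={
        "part": "snippet",
        "type": "video",                          # required for location filters
        "q": "kids",                              # keyword
        "location": "37.869885,-122.270539",      # Berkeley, CA
        "locationRadius": "100km",
        "publishedAfter": "2016-10-09T00:00:00Z", # rough stand-in for "this_week"
        "maxResults": 50,
        "key": API_KEY,
    },
)
for item in resp.json().get("items", []):
    print(item["id"]["videoId"], item["snippet"]["title"])
```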

Page 30: Multimedia Privacy

Cybercasing on YouTube

Output:
• Initial videos: 1000 (max_res)
• User hull: ~50k videos
• Vacation hits: 106
• Cybercasing targets: >12

Page 31: Multimedia Privacy

The Threat Is Real!

Page 32: Multimedia Privacy

Question

Do you think geo-tagging should be illegal?
a) No, people just have to be more careful. The possibilities still outweigh the risks.
b) Maybe it should be regulated somehow to make sure no harm can be done.
c) Yes, absolutely! This information is too dangerous.

Page 33: Multimedia Privacy

But…

Is this really about geo-tags?

(remember: hypothesis)

Page 34: Multimedia Privacy

But…

Is this really about geo-tags?

No, it’s about the privacy implications of multimedia retrieval in general.

Page 35: Multimedia Privacy

Question

And now? What do you think should be done?
a) Nothing can be done. Privacy is dead.
b) I will think before I post, but I don’t know that it matters.
c) We need to educate people about this and try to save privacy. (Fight!)

d) I’ll never post anything ever again! (Flight!)

Page 36: Multimedia Privacy

Observations

• Many applications encourage heavy data sharing, and users go with it.
• Multimedia isn’t only a lot of data, it’s also a lot of implicit information.
• Both users and engineers are often unaware of the hidden retrieval possibilities of shared (multimedia) data.

• Local anonymization and privacy policies may be ineffective against cross-site inference.

Page 37: Multimedia Privacy

Dilemma

• People will continue to want social networks and location-based services.
• Industry and research will continue to improve retrieval techniques.
• Government will continue to do surveillance and intelligence-gathering.

Page 38: Multimedia Privacy

Solutions That Don’t Work

• “I blur the faces.”
  • Audio and image artifacts can still give you away.
• “I only share with my friends.”
  • But who are they sharing with, on what platforms?
• “I don’t do social networking.”
  • Others may do it for you!

Page 39: Multimedia Privacy

Further Observations

• There is not much incentive to worry about privacy, until things go wrong.

• People’s perception of the Internet does not match reality (enough).

Page 40: Multimedia Privacy

Basics: Definitions and Background

Page 41: Multimedia Privacy

Definition

• Privacy is the right to be let alone (Warren and Brandeis)

• Privacy is:
  a) the quality or state of being apart from company or observation
  b) freedom from unauthorized intrusion
  (Merriam-Webster)

Page 42: Multimedia Privacy

Starting Points

• Privacy is a human right. Every individual has a need to keep something about themselves private.

• Companies have a need for privacy.

• Governments have a need for privacy (currently heavily discussed).

Page 43: Multimedia Privacy

Where We’re At (Legally)

Keep an eye out for multimedia inference!

Page 44: Multimedia Privacy

A Taxonomy of Social Networking Data

• Service data: Data you give to an OSN to use it, e.g. name, birthday, etc.

• Disclosed data: What you post on your page/space
• Entrusted data: What you post on other people’s pages, e.g. comments
• Incidental data: What other people post about you
• Behavioural data: Data the site collects about you
• Derived data: Data that a third party infers about you based on all that other data

B. Schneier, "A Taxonomy of Social Networking Data," IEEE Security & Privacy, vol. 8, no. 4, p. 88, July-Aug. 2010.

Page 45: Multimedia Privacy

Privacy Bill of Rights

In February 2012, the US Government released:

Consumer Data Privacy in a Networked World: A Framework for Protecting Privacy and Promoting Innovation in the Global Digital Economy

http://www.whitehouse.gov/sites/default/files/privacy-final.pdf

Page 46: Multimedia Privacy

Privacy Bill of Rights

1) Individual Control: Consumers have a right to exercise control over what personal data organizations collect from them and how they use it.

2) Transparency: Consumers have a right to easily understandable and accessible information about privacy and security practices.

3) Respect for Context: Consumers have a right to expect that organizations will collect, use, and disclose personal data in ways consistent with the context in which consumers provide the data.

Page 47: Multimedia Privacy

Privacy Bill of Rights

4) Security: Consumers have a right to secure and responsible handling of personal data.
5) Access and Accuracy: Consumers have a right to access and correct personal data in usable formats, in a manner that is appropriate to the sensitivity of the data and the risk of adverse consequences to citizens if the data is inaccurate.

Page 48: Multimedia Privacy

Privacy Bill of Rights

6) Focused Collection: Consumers have a right to reasonable limits on the personal data that organizations collect and retain.

7) Accountability: Consumers have a right to have personal data handled by organizations with appropriate measures in place to assure they adhere to the Consumer Privacy Bill of Rights.

Page 49: Multimedia Privacy

One View

The Privacy Bill of Rights could serve as a requirements framework for an ideally privacy-aware Internet service.

...if it were adopted.

Page 50: Multimedia Privacy

Limitations

• The Privacy Bill of Rights is subject to interpretation.

• What is “reasonable”?
• What is “context”?
• What is “personal data”?

• The Privacy Bill of Rights presents technical challenges.

Page 53: Multimedia Privacy

Personal Data Protection in the EU

• The Data Protection Directive* (aka Directive 95/46/EC on the protection of individuals with regard to the processing of personal data and on the free movement of such data) is an EU directive adopted in 1995 which regulates the processing of personal data within the EU. It is an important component of EU privacy and human rights law.

• The General Data Protection Regulation, in progress since 2012 and adopted in April 2016, will supersede the Data Protection Directive and be enforceable as of 25 May 2018.

• Objectives:
  • Give control of personal data to citizens
  • Simplify the regulatory environment for businesses

* A directive is a legal act of the European Union, which requires member states to achieve a particular result without dictating the means of achieving that result.

Page 54: Multimedia Privacy

When Is It Legitimate…?

Collecting and processing the personal data of individuals is only legitimate in one of the following circumstances (Article 7 of the Directive):
• The individual gives unambiguous consent
• If data processing is needed for a contract (e.g. electricity bill)
• If processing is required by a legal obligation
• If processing is necessary to protect the vital interests of the person (e.g. processing medical data of an accident victim)
• If processing is necessary to perform tasks of public interest
• If the data controller or a third party has a legitimate interest in doing so, as long as this does not affect the interests of the data subject or infringe his/her fundamental rights

Page 55: Multimedia Privacy

Obligations of Data Controllers in the EU

Data controllers must respect the following rules:
• Personal data must be collected and used for explicit and legitimate purposes
• It must be adequate, relevant, and not excessive in relation to the above purposes
• It must be accurate and updated when needed
• Data subjects must be able to correct, remove, etc. incorrect data about themselves (access)
• Personal data should not be kept longer than necessary
• Data controllers must protect personal data (incl. from unauthorized access by third parties) using appropriate measures of protection (security, accountability)

Page 56: Multimedia Privacy

Handling Sensitive Data

Definition of sensitive data in the EU:
• religious beliefs
• political opinions
• health
• sexual orientation
• race
• trade union membership

Processing sensitive data comes under a stricter set of rules (Article 8).

Page 57: Multimedia Privacy

Enforcing Data Protection in the EU

• The Directive states that every EU country must provide one or more independent supervisory authorities to monitor its application.

• In principle, all data controllers must notify their supervisory authorities when they process personal data.

• The national authorities are also in charge of receiving and handling complaints from individuals.

Page 58: Multimedia Privacy

Data Protection: US vs. EU

• The US has no legislation that is comparable to the EU’s Data Protection Directive.
• US privacy legislation is adopted on an ad hoc basis, e.g. when certain sectors and circumstances require it (HIPAA, CTPCA, FCRA).
• The US adopts a more laissez-faire approach.
• In general, US privacy legislation is considered “weaker” compared to the EU’s.

Page 59: Multimedia Privacy

Example: What Is Sensitive Data?

Public records indicate you own a house.

Page 60: Multimedia Privacy

Example: What Is Sensitive Data?

A geo-tagged photo taken by a friend reveals who attended your party!

Page 61: Multimedia Privacy

Example: What Is Sensitive Data?

Facial recognition match with a public record: Prior arrest for drug offense!

Page 62: Multimedia Privacy

Example: What Is Sensitive Data?

1) Public records indicate you own a house.
2) A geo-tagged photo taken by a friend reveals who attended your party.
3) Facial recognition match with a public record: prior arrest for drug offense!

→ “You associate with convicts”

Page 63: Multimedia Privacy

Example: What Is Sensitive Data?

“You associate with convicts”

What will this do for your reputation when you:
• Date?
• Apply for a job?
• Want to be elected to public office?

Page 64: Multimedia Privacy

Example: What Is Sensitive Data?

But: Which of these is the sensitive data?
a) Public record: You own a house
b) Geo-tagged photo taken by a friend at your party
c) Public record: A friend’s prior arrest for a drug offense
d) Conclusion: “You associate with convicts.”
e) None of the above.

Page 65: Multimedia Privacy

Who Is to Blame?

a) The government, for its Open Data policy?
b) Your friend who posted the photo?
c) The person who inferred data from publicly available information?

Page 66: Multimedia Privacy

Part II: User Perceptions About Privacy

Page 67: Multimedia Privacy

Study 1: Users’ Understandings of Privacy

Page 68: Multimedia Privacy

The Teaching Privacy Project

• Goal: Create a privacy curriculum for K-12 and undergrad, with lesson plans, teaching tools, visualizations, etc.

• NSF-sponsored (CNS-1065240 and DGE-1419319; all conclusions ours).

• Check It Out: Info, public education, and teaching resources: http://teachingprivacy.org

Page 69: Multimedia Privacy

Based on Several Research Strands

• Joint work between Friedland, Bernd, Serge Egelman, Dan Garcia, Blanca Gordo, and many others!
• Understanding of user perceptions comes from:

• Decades of research comparing privacy comprehension, preferences, concerns, and behaviors, including by Egelman and colleagues at CMU

• Research on new Internet users’ privacy perceptions, including Gordo’s evaluations of digital-literacy programs

• Observation of multimedia privacy leaks, e.g. “cybercasing” study

• Reports from high school and undergraduate teachers about students’ misperceptions

• Summer programs for high schoolers interested in CS

Page 70: Multimedia Privacy

Common Research Threads

• What happens on the Internet affects the “real” world.
• However: group pressure, impulse, convenience, and other factors usually dominate decision making.

• Aggravated by lack of understanding of how sharing on the Internet really works.

• Wide variation in both comprehension and actual preferences.

Page 71: Multimedia Privacy

Multimedia Motivation

• Many current multimedia R&D applications have a high potential to compromise the privacy of Internet users.

• We want to continue pursuing fruitful and interesting research programs!

• But we can also work to mitigate negative effects by using our expertise to educate the public about effects on their privacy.

Page 72: Multimedia Privacy

What Do People Need to Know?

Starting point: 10 observations about frequent misperceptions + 10 “privacy principles” to address them

Illustrations by Ketrina Yim.

Page 73: Multimedia Privacy

Misconception #1

• Perception: I keep track of what I’m posting. I am in control. Websites are like rooms, and I know what’s in each of them.

• Reality: Your information footprint is larger than you think!

• An empty Twitter post has kilobytes of publicly available metadata.

• Your footprint includes what others post about you, hidden data attached by services, records of your offline activities… Not to mention inferences that can be drawn across all those “rooms”!

Page 74: Multimedia Privacy

Misconception #2

• Perception: Surfing is anonymous. Lots of sites allow anonymous posting.
• Reality: There is no anonymity on the Internet.
  • Bits of your information footprint (geo-tags, language patterns, etc.) may make it possible for someone to uniquely identify you, even without a name.

Page 75: Multimedia Privacy

Misconception #3

• Perception: There’s nothing interesting about what I do online.
• Reality: Information about you on the Internet will be used by somebody in their interest, including against you.
  • Every piece of information has value to somebody: other people, companies, organizations, governments...
  • Using or selling your data is how Internet companies that provide “free” services make money.

Page 76: Multimedia Privacy

Misconception #4

• Perception: Communication on the Internet is secure. Only the person I’m sending it to will see the data.
• Reality: Communication over a network, unless strongly encrypted, is never just between two parties.
  • Online data is always routed through intermediary computers and systems…
  • Which are connected to many more computers and systems...

Page 77: Multimedia Privacy

Misconception #5

• Perception: If I make a mistake or say something dumb, I can delete it later. Anyway, people will get what I mean, right?
• Reality: Sharing information over a network means you give up control over that information, forever!
  • The Internet never forgets. Search engines, archives, and reposts duplicate data; you can’t “unshare”.
  • Websites sell your information, and data can be subpoenaed.
  • Anything shared online is open to misinterpretation. The Internet can’t take a joke!

Page 78: Multimedia Privacy

Misconception #6

• Perception: Facial recognition/speaker ID isn’t good enough to find this. As long as no one can find it now, I’m safe.
• Reality: Just because it can’t be found today doesn’t mean it can’t be found tomorrow.
  • Search engines get smarter.
  • Multimedia retrieval gets better.
  • Analog information gets digitized.
  • Laws, privacy settings, and privacy policies change.

Page 79: Multimedia Privacy

Misconception #7

• Perception: What happens on the Internet stays on the Internet.

• Reality: The online world is inseparable from the “real” world.

• Your online activities are as much a part of your life as your offline activities.
• People don’t separate what they know about Internet-you from what they know about in-person you.

Page 80: Multimedia Privacy

Misconception #8

• Perception: I don’t chat with strangers. I don’t “friend” people on Facebook that I don’t know.
• Reality: Are you sure? Identity isn’t guaranteed on the Internet.
  • Most information that “establishes” identity in social networks may already be public.
  • There is no foolproof way to match a real person with their online identity.

Page 81: Multimedia Privacy

Misconception #9

• Perception: I don’t use the Internet. I am safe.
• Reality: You can’t avoid having an information footprint by not going online.
  • Friends and family will post about you.
  • Businesses and government share data about you.
  • Companies track transactions online.
  • Smart cards transmit data online.

Page 82: Multimedia Privacy

Misconception #10

• Perception: There are laws that keep companies and people from sharing my data. If a website has a privacy policy, that means they won’t share my information. It’s all good.
• Reality: Only you have an interest in maintaining your privacy!
  • Internet technology is rarely designed to protect privacy.
  • “Privacy policies” are there to protect providers from lawsuits.
  • Laws are spotty and vary from place to place.
  • Like it or not, your privacy is your own responsibility!

Page 83: Multimedia Privacy

What Came of All This?

Example: “Ready or Not?” educational app

Page 84: Multimedia Privacy

What Came of All This?

Example: “Digital Footprints” video

Page 85: Multimedia Privacy

Study 2: Perceived vs. Actual Predictability of Personal Information in Social Nets

Papadopoulos and Kompatsiaris with Eleftherios Spyromitros-Xioufis, Giorgos Petkos, and Rob Heyman (iMinds)

Page 86: Multimedia Privacy

Personal Information in OSNs

Participation in OSNs comes at a price!
• User-related data is shared with:
  a) other OSN users, b) the OSN itself, c) third parties (e.g. ad networks)
• Disclosure of specific types of data:
  • e.g. gender, age, ethnicity, political or religious beliefs, sexual preferences, employment status, etc.
• Information isn’t always explicitly disclosed!
  • Several types of personal information can be accurately inferred based on implicit cues (e.g. Facebook likes) and machine learning! (cf. Part III)

Page 87: Multimedia Privacy

Inferred Information & Privacy in OSNs

• Study of user awareness with regard to inferred information largely neglected by social research.

• Privacy usually presented as a question of giving access or communicating personal information to some party, e.g.:

“The claim of individuals, groups, or institutions to determine for themselves when, how, and to what extent information about them is communicated to others.” (Westin, 1970)

[1] Alan Westin. Privacy and freedom. Bodley Head, London, 1970.

Page 88: Multimedia Privacy

Inferred Information & Privacy in OSNs

• However, access control is non-existent for inferred information:

• Users are unaware of the inferences being made.
• Users have no control over the way inferences are made.

• Goal: Investigate whether and how users intuitively grasp what can be inferred from their disclosed data!

Page 89: Multimedia Privacy

Main Research Questions

1. Predictability: How predictable are different types of personal information, based on users’ OSN data?
2. Actual vs. perceived predictability: How realistic are user perceptions about the predictability of their personal information?

3. Predictability vs. sensitivity: What is the relationship between perceived sensitivity and predictability of personal information?

• Previous work has focused mainly on Q1.
• We address Q1 using a variety of data and methods, and additionally we address Q2 and Q3.

Page 90: Multimedia Privacy

Data Collection

• Three types of data about 170 Facebook users:

• OSN data: Likes, posts, images -- collected through a test Facebook application

• Answers to questions about 96 personal attributes, organized into 9 categories, e.g. health factors, sexual orientation, income, political attitude, etc.

• Answers to questions related to their perceptions about the predictability and sensitivity of the 9 categories

http://databait.eu http://www.usemp-project.eu

Page 91: Multimedia Privacy

Example From Questionnaire

• What is your sexual orientation? → ground truth
• Do you think the information on your Facebook profile reveals your sexual orientation? Either because you yourself have put it online, or it could be inferred from a combination of posts. → perceived predictability
• How sensitive do you find the information you had to reveal about your sexual orientation? (1 = not sensitive at all, 7 = very sensitive) → perceived sensitivity

Response     | #
heterosexual | 147
homosexual   | 14
bisexual     | 7
n/a          | 2

Response | #
yes      | 134
no       | 33
n/a      | 3

Page 92: Multimedia Privacy

Features Extracted From OSN Data

• likes: binary vector denoting presence/absence of a like (#3.6K)
• likesCats: histogram of like category frequencies (#191)
• likesTerms: Bag-of-Words (BoW) of terms in the description, title, and about sections of likes (#62.5K)
• msgTerms: BoW vector of terms in user posts (#25K)
• lda-t: distribution of topics in the textual contents of both likes (description, title, and about section) and posts
  • Latent Dirichlet Allocation with t = 20, 30, 50, 100
• visual: concepts depicted in user images (#11.9K), detected using a CNN, top 12 concepts per image, 3 variants:
  • visual-bin: hard 0/1 encoding
  • visual-freq: concept frequency histogram
  • visual-conf: sum of detection scores across all images

Page 93: Multimedia Privacy

Experimental Setup

• Evaluation method: repeated random sub-sampling (see the sketch below)
  • Data split randomly n = 10 times into train (67%) / test (33%)
  • Model fit on train / accuracy of inferences assessed on test
  • 96 questions (user attributes) were considered
• Evaluation measure: area under ROC curve (AUC)
  • Appropriate for imbalanced classes
• Classification algorithms
  • Baseline: k-nearest neighbors, decision tree, naïve Bayes
  • SoA: AdaBoost, random forest, regularized logistic regression

Page 94: Multimedia Privacy

Predictability per Attribute

[Chart: predictability (AUC) per individual attribute; example attributes include nationality, is employed, can be moody, smokes cannabis, plays volleyball.]

Page 95: Multimedia Privacy

What Is More Predictable?

Rank | Perceived predictability                 | Actual predictability (rank change)            | SoA*
1    | Demographics                             | Demographics (-)                               | Demographics
2    | Relationship status and living condition | Political views (+3)                           | Political views
3    | Sexual orientation                       | Sexual orientation (-)                         | Religious views
4    | Consumer profile                         | Employment/Income (+4)                         | Sexual orientation
5    | Political views                          | Consumer profile (-1)                          | Health status
6    | Personality traits                       | Relationship status and living condition (-4) | Relationship status and living condition
7    | Religious views                          | Religious views (-)                            |
8    | Employment/Income                        | Health status (+1)                             |
9    | Health status                            | Personality traits (-3)                        |

* Kosinski, et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.

Page 96: Multimedia Privacy

Predictability Versus Sensitivity

Page 97: Multimedia Privacy

Part III: Multimodal Inferences

Page 98: Multimedia Privacy

Personal Data: Truly Multimodal

• Text: posts, comments, content of articles you read/like, etc.
• Images/videos: posted by you, liked by you, posted by others but containing you
• Resources: likes, visited websites, groups, etc.
• Location: check-ins, GPS of posted images, etc.
• Network: what your friends look like, what they post, what they like, the community where you belong
• Sensors: wearables, fitness apps, IoT

Page 99: Multimedia Privacy

What Can Be Inferred?

A lot…

Page 100: Multimedia Privacy

Three Main Approaches

• Content-based
  • What you post is what/where/how/etc. you are
• Supervised learning
  • Learn by example
• Network-based
  • Show me your friends and I’ll tell you who you are

Page 101: Multimedia Privacy

Content-Based

Beware of your posts…

Page 102: Multimedia Privacy

Location

Multimodal Location Estimation

Page 103: Multimedia Privacy

Multimodal Location Estimation

http://mmle.icsi.berkeley.edu

Page 104: Multimedia Privacy

Multimodal Location Estimation

We infer the location of a video based on its visual stream, audio stream, and tags:
• Use geo-tagged data as training data
• Allows faster search, inference, and intelligence-gathering, even without GPS
(A toy sketch of the tag-based idea follows below.)

G. Friedland, O. Vinyals, and T. Darrell: "Multimodal Location Estimation," pp. 1245-1251, ACM Multimedia, Florence, Italy, October 2010.
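A toy illustration of the tag-based part of the idea: predict the geo-location of the training item that shares the most tags with the query. The published system fuses tags with visual and acoustic evidence in a graphical model (see the next slide), so this is only a crude stand-in with made-up data:

```python
# Toy tag-only location estimator (crude stand-in for the multimodal model).
train = [
    ({"berkeley", "sathergate", "campanile"}, (37.8702, -122.2595)),
    ({"berkeley", "haas"},                    (37.8712, -122.2530)),
    ({"paris", "tower"},                      (48.8584,    2.2945)),
]

def estimate_location(query_tags):
    overlap = lambda item: len(query_tags & item[0])
    best = max(train, key=overlap)
    return best[1] if overlap(best) > 0 else None

print(estimate_location({"campanile", "haas"}))  # -> a Berkeley coordinate
```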

Page 105: Multimedia Privacy

Intuition for the Approach

[Graph diagram: videos as nodes with tag sets such as {berkeley, sathergate, campanile}, {berkeley, haas}, {campanile}, and {campanile, haas}, connected by edges.]

Node: Geolocation of video

Edge: Correlated locations (e.g. common tag, visual, acoustic features)

Edge Potential: Strength of an edge (e.g. posterior distribution of locations given common tags)

Page 106: Multimedia Privacy

MediaEval

J. Choi, G. Friedland, V. Ekambaram, K. Ramchandran: "Multimodal Location Estimation of Consumer Media: Dealing with Sparse Training Data," in Proceedings of IEEE ICME 2012, Melbourne, Australia, July 2012.

Page 107: Multimedia Privacy

YouTube Cybercasing Revisited

YouTube Cybercasing With Geo-Tags vs. Multimodal Location Estimation

               | Old Experiment | No Geo-Tags
Initial videos | 1000 (max)     | 107
User hull      | ~50k           | ~2000
Potential hits | 106            | 112
Actual targets | >12            | >12

Page 108: Multimedia Privacy

Account Linking

Can we link accounts based on their content?

Page 109: Multimedia Privacy

Using Internet Videos: Dataset

Test videos from Flickr (~40 sec):
• 121 users to be matched, 50k trials
• 70% have heavy noise
• 50% speech
• 3% professional content

H. Lei, J. Choi, A. Janin, and G. Friedland: “Persona Linking: Matching Uploaders of Videos Across Accounts”, at IEEE International Conference on Acoustic, Speech, and Signal Processing (ICASSP), Prague, May 2011.

Page 110: Multimedia Privacy

Matching Users Within Flickr

Algorithm:
1) Take 10 seconds of the soundtrack of a video
2) Extract the spectral envelope
3) Compare using Manhattan distance
(A simplified sketch follows.)
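A simplified sketch of those three steps: the mean log-magnitude spectrum below is a crude stand-in for the spectral envelope feature used in the paper, and random signals stand in for real soundtracks:

```python
# Sketch: compare two uploads by the L1 (Manhattan) distance between
# crude "spectral envelopes" of 10 s of their audio.
import numpy as np

def spectral_envelope(audio, sr=16000, frame=512):
    """Mean log-magnitude spectrum over up to 10 seconds of mono audio."""
    audio = audio[: 10 * sr]
    n_frames = len(audio) // frame
    frames = audio[: n_frames * frame].reshape(n_frames, frame)
    mags = np.abs(np.fft.rfft(frames * np.hanning(frame), axis=1))
    return np.log(mags + 1e-8).mean(axis=0)

def manhattan(a, b):
    return np.abs(a - b).sum()

a = spectral_envelope(np.random.randn(160000))  # stand-in soundtrack 1
b = spectral_envelope(np.random.randn(160000))  # stand-in soundtrack 2
print(manhattan(a, b))  # smaller distance = more likely the same uploader
```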

Page 111: Multimedia Privacy

Spectral Envelope

Page 112: Multimedia Privacy

User ID on Flickr Videos

Page 113: Multimedia Privacy

Persona Linking Using Internet Videos

Result:
• On average, having 40 seconds in the test and training sets leads to a 99.2% chance of a true positive match!

Page 114: Multimedia Privacy

Another Linkage Attack

Exploiting users’ online activity to link accounts:
• Link based on where and when a user is posting
• Attack model is individual targeting
• Datasets: Yelp, Flickr, Twitter
• Methods (a toy timing-profile sketch follows):
  • Location profile
  • Timing profile
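A toy sketch of the timing-profile method: build a normalized 24-bin histogram of posting hours per account and compare the histograms. The posting hours and the use of cosine similarity here are illustrative assumptions, not the attack’s exact implementation:

```python
# Timing profile: normalized histogram of posting hours, compared by cosine.
import numpy as np

def timing_profile(post_hours):
    hist = np.bincount(post_hours, minlength=24).astype(float)
    return hist / hist.sum()

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

yelp_hours    = [12, 13, 12, 19, 20, 12, 13]   # hypothetical posting times
twitter_hours = [12, 12, 13, 20, 19, 13, 12]
print(cosine(timing_profile(yelp_hours), timing_profile(twitter_hours)))
```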

Page 115: Multimedia Privacy

When a User Is Posting

Page 116: Multimedia Privacy

Where a User Is Posting

[Map: one user’s Twitter locations vs. Yelp locations]

Page 117: Multimedia Privacy

De-Anonymization Model

[Diagram: a targeted account (Yelp users are ID’d) is compared against a candidate list of accounts, asking “how similar?”]

Page 118: Multimedia Privacy

Datasets

• Three social networks: Yelp, Twitter, Flickr
• Two types of data sets:
  • Ground truth data set:
    • Yelp-Twitter: 2,363 -> 342 (with geo-tags) -> 57 (in SF Bay)
    • Flickr-Twitter: 6,196 -> 396 (with geo-tags) -> 27 (in SF Bay)

• Candidate Twitter list data set: 26,204

Page 119: Multimedia Privacy

Performance on Matching

Page 120: Multimedia Privacy

Supervised Learning

Learn by example

Page 121: Multimedia Privacy

Inferring Personal Information

• Supervised learning algorithms
  • Learn a mapping (model) from inputs x_i to outputs y_i by analyzing a set of training examples D = {(x_i, y_i)}, i = 1, …, N
• In this case:
  • y_i corresponds to a personal user attribute, e.g. sexual orientation
  • x_i corresponds to a set of predictive attributes or features, e.g. user likes

• Some previous results (a sketch of this kind of pipeline follows the references):
  • Kosinski et al. [1]: likes features (SVD) + logistic regression: highly accurate inferences of ethnicity, gender, sexual orientation, etc.
  • Schwartz et al. [2]: status updates (PCA) + linear SVM: highly accurate inference of gender

[1] Kosinski et al. Private traits and attributes are predictable from digital records of human behavior. Proceedings of the National Academy of Sciences, 2013.
[2] Schwartz et al. Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE, 2013.
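A minimal sketch of the Kosinski-style pipeline (sparse user x like matrix → truncated SVD → logistic regression). Sizes and data here are toy stand-ins; the PNAS study used far more users and likes:

```python
# Likes features (SVD) + logistic regression, on toy random data.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.decomposition import TruncatedSVD
from sklearn.linear_model import LogisticRegression

likes = sparse_random(500, 2000, density=0.01, format="csr")  # user x like
y = np.random.randint(0, 2, 500)                              # toy attribute

components = TruncatedSVD(n_components=50).fit_transform(likes)
model = LogisticRegression(max_iter=1000).fit(components, y)
print(model.predict_proba(components[:3])[:, 1])  # inferred probabilities
```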

Page 122: Multimedia Privacy

What Do Your Likes Say About You?

M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable from Digital Records of Human Behavior”. PNAS 110: 5802 – 5805, 2013

Page 123: Multimedia Privacy

Results: Prediction Accuracy

M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable from Digital Records of Human Behavior”. PNAS 110: 5802-5805, 2013

Page 124: Multimedia Privacy

The More You Like…

M. Kosinski, D. Stillwell, T. Graepel. “Private Traits and Attributes are Predictable from Digital Records of Human Behavior”. PNAS 110: 5802-5805, 2013

Page 125: Multimedia Privacy

Our Results: USEMP Dataset (Part II)

Testing different classifiers

Page 126: Multimedia Privacy

Our Results: USEMP Dataset (Part II)

Testing different features

Page 127: Multimedia Privacy

Our Results: USEMP Dataset (Part II)

Testing combinations of features

Page 128: Multimedia Privacy

Caution: Reliability of Predictions

[Diagram: an ensemble of models (MODEL 1 … MODEL N), each trained on a random α% sample of the training set; their agreement indicates prediction reliability.]

Page 129: Multimedia Privacy

Caution: Reliability of Predictions

Percentage of users for whom individual models have low agreement (S_x < 0.5).

Classification accuracy for those users.

MyPersonality dataset (subset)

Page 130: Multimedia Privacy

Conclusions

• Representing users as feature vectors and using supervised learning can help achieve pretty good accuracy in several cases.

However:
• There will be several cases where the output of the trained model is unreliable (close to random).
• For many classifiers and for abstract feature representations (e.g. SVD), it is very hard to explain why a particular user has been classified as belonging to a given class.

Page 131: Multimedia Privacy

Network-Based Learning

Show me your friends…

with Georgios Rizos

Page 132: Multimedia Privacy

Network-Based Classification

• People with similar interests tend to connect → homophily
• Knowing about one’s connections could reveal information about them
• Knowing about the whole network structure could reveal even more…

Page 133: Multimedia Privacy

My Social Circles

A variety of affiliations:
• Work
• School
• Family
• Friends…

Page 134: Multimedia Privacy

SoA: User Classification (1)

Graph-based semi-supervised learning (a minimal label-propagation sketch follows):
• Label propagation (Zhu and Ghahramani, 2002)
• Local and global consistency (Zhou et al., 2004)

Other approaches to user classification:
• Hybrid feature engineering for inferring user behaviors (Pennacchiotti et al., 2011; Wagner et al., 2013)
• Crowdsourcing Twitter list keywords for popular users (Ghosh et al., 2012)
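As referenced above, a minimal label-propagation sketch in the spirit of Zhu and Ghahramani (2002): labels spread over the row-normalized adjacency matrix while known labels are clamped. The toy graph and iteration count are assumptions; the published algorithm includes details omitted here:

```python
# Label propagation on a toy 4-node graph with 2 classes.
import numpy as np

A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
P = A / A.sum(axis=1, keepdims=True)   # row-normalized adjacency

Y = np.zeros((4, 2))                   # nodes 0 and 3 are labeled
Y[0, 0] = 1.0
Y[3, 1] = 1.0
labeled = [0, 3]

F = Y.copy()
for _ in range(100):
    F = P @ F                          # propagate labels to neighbors
    F[labeled] = Y[labeled]            # clamp the known labels
print(F.argmax(axis=1))                # inferred class per node
```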

Page 135: Multimedia Privacy

SoA: Graph Feature Extraction (2)

Use of community detection:
• EdgeCluster: edge-centric k-means (Tang and Liu, 2009)
• MROC: binary tree community hierarchy (Wang et al., 2013)

Low-rank matrix representation methods:
• Laplacian Eigenmaps: k eigenvectors of the graph Laplacian (Belkin and Niyogi, 2003; Tang and Liu, 2011)
• Random-Walk Modularity Maximization: does not suffer from the resolution limit of ModMax (Devooght et al., 2014)
• DeepWalk: deep representation learning (Perozzi et al., 2014)

Page 136: Multimedia Privacy

Overview of Framework

Online social interactions (retweets, mentions, etc.) → social interaction user graph → ARCTE → supervised graph feature representation (with partial/sparse annotation) → feature weighting → user label learning → classified users

Page 137: Multimedia Privacy

ARCTE: Intuition

Page 138: Multimedia Privacy

Evaluation: Datasets

Ground truth generation:
• SNOW2014 Graph: Twitter list aggregation & post-processing
• IRMV-PoliticsUK: manual annotation
• ASU-YouTube: user membership in groups
• ASU-Flickr: user subscription to interest groups

Dataset                                     | Labels | Vertices  | Vertex Type     | Edges     | Edge Type
SNOW2014 Graph (Papadopoulos et al., 2014)  | 90     | 533,874   | Twitter account | 949,661   | mentions + retweets
IRMV-PoliticsUK (Greene & Cunningham, 2013) | 5      | 419       | Twitter account | 11,349    | mentions + retweets
ASU-YouTube (Mislove et al., 2007)          | 47     | 1,134,890 | YouTube channel | 2,987,624 | subscriptions
ASU-Flickr (Tang and Liu, 2009)             | 195    | 80,513    | Flickr account  | 5,899,882 | contacts

Page 139: Multimedia Privacy

Example: Twitter

Twitter Handle | Labels
@nytimes       | usa, press, new york
@HuffPostBiz   | finance
@BBCBreaking   | press, journalist, tv
@StKonrath     | journalist

Examples from SNOW 2014 Data Challenge dataset

Page 140: Multimedia Privacy

Evaluation: SNOW 2014 Dataset

SNOW2014 Graph (534K, 950K): Twitter mentions + retweets, ground truth based on Twitter list processing

Page 141: Multimedia Privacy

Evaluation: ASU-YouTube

• ASU-YouTube (1.1M, 3M): YouTube subscriptions, ground truth based on membership in groups

Page 142: Multimedia Privacy

Part IV: Some Possible Solutions

Page 143: Multimedia Privacy

Solution 1: Disclosure Scoring Framework

with Georgios Petkos

Page 144: Multimedia Privacy

Problem and Motivation

• Several studies have shown that privacy is a challenging issue in OSNs.
  • Madejski et al. performed a study with 65 users, asking them to carefully examine their profiles → all of them identified a sharing violation.

• Information about a user may appear not only explicitly, but also implicitly, and may therefore be inferred (also think of institutional privacy).

• Different users have different attitudes towards privacy and online information sharing (Knijnenbourg, 2013).

Madejski et al., “A study of privacy setting errors in an online social network”. PERCOM, 2012.
Knijnenbourg, “Dimensionality of information disclosure behavior”. IJHCS, 2013.

Page 145: Multimedia Privacy

Disclosure Scoring

“A framework for quantifying the type of information one is sharing, and the extent of such disclosure.”

Requirements:
• It must take into account the fact that privacy concerns are different across users.
• Different types of information have different significance to users.
• It must take into account both explicit and inferred information.

Page 146: Multimedia Privacy

Related Work

1. Privacy score [Liu10]: based on the concepts of visibility and sensitivity (a rough sketch follows the list)

2. Privacy Quotient and Leakage [Srivastava13]

3. Privacy Functionality Score [Ferrer10]

4. Privacy index [Nepali13]

5. Privacy Scores [Sramka15]
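As a rough illustration of the visibility-and-sensitivity idea behind the [Liu10] score: sum sensitivity x visibility over the disclosed items. The weights below are made up for the example; the actual paper estimates sensitivity and visibility from data rather than hard-coding them:

```python
# Naive disclosure score: sum of sensitivity x visibility per profile item.
sensitivity = {"birthday": 0.3, "phone": 0.9, "political views": 0.7}
visibility  = {"private": 0.0, "friends": 0.5, "public": 1.0}

def privacy_score(profile):
    """profile maps item -> audience, e.g. {'phone': 'public'}."""
    return sum(sensitivity[item] * visibility[audience]
               for item, audience in profile.items())

print(privacy_score({"birthday": "public", "phone": "friends",
                     "political views": "private"}))  # -> 0.75
```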

Page 147: Multimedia Privacy

Types of Personal Information

aka Disclosure Dimensions

Page 148: Multimedia Privacy

Overview of PScore

[Diagram: observed data (URLs, likes, posts) feeds inference algorithms, which populate user attributes (A.1 … A.6, F.1 … F.3) grouped into disclosure dimensions (A, F).]

Per user attribute:
• Explicitly disclosed / inferred
• Value / predicted value
• Confidence of prediction
• Level of sensitivity

Per disclosure dimension:
• Level of disclosure
• Reach of disclosure
• Level of sensitivity

Page 149: Multimedia Privacy

Example

Page 150: Multimedia Privacy

Visualization

Bubble color/size proportional to disclosure score → red/big corresponds to more sensitive/risky

Page 151: Multimedia Privacy

Visualization

Hierarchical exploration of types of personal information.

http://usemp-mklab.iti.gr/usemp/

Page 152: Multimedia Privacy

Solution 2: Personalized Privacy-Aware Image Classification

with Eleftherios Spyromitros-Xioufis and Adrian Popescu (CEA-LIST)

Page 153: Multimedia Privacy

Privacy-Aware Image Classification

• Photo sharing may compromise privacy

• Can we make photo sharing safer?
  • Yes: build “private” image detectors
    • Alerts whenever a “private” image is shared
• Personalization is needed because privacy is subjective!
  - Would you share such an image?
  - Does it depend on with whom?

Page 154: Multimedia Privacy

Previous Work and Limitations

• Focus on a generic (“community”) notion of privacy

• Models trained on PicAlert [1]: Flickr images annotated according to a common privacy definition

• Consequences:
  • Variability in user perceptions not captured
  • Over-optimistic performance estimates

• Justifications are barely comprehensible

[1] Zerr et al., I know what you did last summer!: Privacy-aware image classification and search, CIKM, 2012.

Page 155: Multimedia Privacy

Goals of the Study

• Study personalization in image privacy classification
  • Compare personalized vs. generic models
  • Compare two types of personalized models
• Semantic visual features
  • Better justifications and privacy insights

• YourAlert: more realistic than existing benchmarks

Page 156: Multimedia Privacy

Personalization Approaches

• Full personalization:
  • A different model for each user, relying only on their feedback
  • Disadvantage: requires a lot of feedback
• Partial personalization:
  • Models rely on user feedback + feedback from other users
  • Amount of personalization controlled via instance weighting (a sketch follows)
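A minimal sketch of the instance-weighting idea using scikit-learn’s sample_weight; the feature dimensions, sizes, and classifier are illustrative assumptions, not the study’s exact setup:

```python
# 'hybrid' model: generic + user examples, user examples weighted w times.
import numpy as np
from sklearn.linear_model import LogisticRegression

X_generic, y_generic = np.random.rand(400, 64), np.random.randint(0, 2, 400)
X_user,    y_user    = np.random.rand(40, 64),  np.random.randint(0, 2, 40)

w = 2.0  # amount of personalization (cf. 'hybrid w=2' later)
X = np.vstack([X_generic, X_user])
y = np.concatenate([y_generic, y_user])
weights = np.concatenate([np.ones(len(y_generic)), w * np.ones(len(y_user))])

model = LogisticRegression(max_iter=1000).fit(X, y, sample_weight=weights)
```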

Page 157: Multimedia Privacy

Visual and Semantic Features

• vlad [1]: aggregation of local image descriptors

• cnn [2]: deep visual features

• semfeat [3]: outputs of ~17K concept detectors
  • Trained using cnn
  • Top 100 concepts per image

[1] Spyromitros-Xioufis et al., A comprehensive study over vlad and product quantization in large-scale image retrieval. IEEE Transactions on Multimedia, 2014.
[2] Simonyan and Zisserman, Very deep convolutional networks for large-scale image recognition, ArXiv, 2014.
[3] Ginsca et al., Large-Scale Image Mining with Flickr Groups, MultiMedia Modeling, 2015.

Page 158: Multimedia Privacy

Explanations via Semfeat

• Semfeat can be used to justify predictions
  • A tag cloud of the most discriminative visual concepts
• Explanations may often be confusing
  • Concept detectors are not perfect
  • The semfeat vocabulary (ImageNet) is not privacy-oriented

[Example tag cloud: knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian]

Page 159: Multimedia Privacy

semfeat-LDA: Enhanced Explanations

• Project semfeat to a latent space (second level semantic representation)

• Images treated as text documents (top 10 concepts)
• Text corpus created from private images (Pic+YourAlert)
• LDA is applied to create a topic model (30 topics); see the sketch below
• 6 privacy-related topics are identified (manually)

Topic     | Top 5 semfeat concepts assigned to each topic
children  | dribbler, child, godson, wimp, niece
drinking  | drinker, drunk, tipper, thinker, drunkard
erotic    | slattern, erotic, cover-girl, maillot, back
relatives | great-aunt, second-cousin, grandfather, mother, great-grandchild
vacations | seaside, vacationer, surf-casting, casting, sandbank
wedding   | groom, bride, celebrant, wedding, costume

Page 160: Multimedia Privacy

semfeat-LDA: Example

[Example: the first-level semantic representation (concepts such as knitwear, young-back, hand-glass, cigar-smoker, smoker, drinker, Freudian) is projected to the second-level (topic) semantic representation.]

Page 161: Multimedia Privacy

YourAlert: A Realistic Benchmark

• User study
  • Participants annotate their own photos (informed consent, only extracted features shared)
  • Annotation based on the following definitions:
    • Private: “would share only with close OSN friends or not at all”
    • Public: “would share with all OSN friends or even make public”
• Resulting dataset: YourAlert
  • 1.5K photos, 27 users, ~16 private / 40 public per user
  • Main advantages:
    • Facilitates realistic evaluation of privacy models
    • Allows development of personalized models

Publicly available at: http://mklab.iti.gr/datasets/image-privacy/

Page 162: Multimedia Privacy

Generic Models: PicAlert vs. YourAlert

Page 163: Multimedia Privacy

Key Findings

• Almost perfect performance for PicAlert with CNN
  • semfeat performs similarly to CNN
• Significantly worse performance for YourAlert
  • Similar performance for all features
• Additional findings
  • Using more generic training examples does not help
  • Large variability in performance across users

Page 164: Multimedia Privacy

Personalized Privacy Models

• Evaluation carried out on YourAlert
  • A modified k-fold cross-validation for unbiased estimates
• Personalized model types
  • ‘user’: only user-specific examples from YourAlert
  • ‘hybrid’: a mixture of user-specific examples from YourAlert and generic examples from PicAlert
    • User-specific examples are weighted higher

Page 165: Multimedia Privacy

Evaluation of Personalized Models

[Diagram, built up step by step across Pages 165-173: for each YourAlert user (u1, u2, u3, …), a modified 3-fold cross-validation is run with k = 1 fold held out as the test set. The training folds are used alone for model type ‘user’, and combined with generic PicAlert examples for model types ‘hybrid w=1’ and ‘hybrid w=2’.]

Page 174: Multimedia Privacy

Results

Page 175: Multimedia Privacy

Privacy Insights via Semfeat

[Example: semfeat concepts most indicative of the private class (child, mate, son) vs. the public class (uphill, lakefront, waterside).]

Page 176: Multimedia Privacy

Identifying Recurring Privacy Themes

• A prototype semfeat-LDA vector for each user
  • The centroid of the semfeat-LDA vectors of their private images
• K-means (k=5) clustering on the prototype vectors (a sketch follows)
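A minimal sketch of this clustering step; the prototype vectors below are random stand-ins for the per-user centroids of private-image semfeat-LDA vectors:

```python
# Cluster per-user prototype topic vectors into 5 recurring privacy themes.
import numpy as np
from sklearn.cluster import KMeans

n_users, n_topics = 27, 30
prototypes = np.random.dirichlet(np.ones(n_topics), size=n_users)  # stand-ins

themes = KMeans(n_clusters=5, n_init=10, random_state=0).fit(prototypes)
print(themes.labels_)   # which theme each user's private photos fall into
```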

Page 177: Multimedia Privacy

Would You Share the Following?

With whom would you share the photos in the following slides?
a) family
b) friends
c) colleagues
d) your Facebook friends
e) everyone (public)

Page 178: Multimedia Privacy
Page 179: Multimedia Privacy
Page 180: Multimedia Privacy
Page 181: Multimedia Privacy

Part V: Future Directions

Page 182: Multimedia Privacy

Towards Private Multimedia Systems

We should:
• Research methods to help mitigate risks and offer choice.
• Develop privacy policies and APIs that take into account multimedia retrieval.
• Educate users and engineers on privacy issues.

...before panic slows progress in the multimedia field.

Page 183: Multimedia Privacy

The Role of Research

Research can help:
• Describe and quantify risk factors
• Visualize and offer choices in UIs
• Identify privacy-breaking information
• Filter out “irrelevant information” through content analysis

Page 184: Multimedia Privacy

Reality Check

Can we build a privacy-proof system?

No. We can’t build a theft-proof car either.

However, we can make it more or less privacy-proof.

Page 185: Multimedia Privacy

Emerging Issue: Internet of Things

Graphic by Applied Materials using International Data Corporation data.

Page 186: Multimedia Privacy

Emerging Issue: Wearables

Source: Amish Gandhi via SlideShare

Page 187: Multimedia Privacy

Multimedia Things

• Much of the IoT data collected is multimedia data.
  • Requires (exciting!) new approaches to real-time multimedia content analysis. →
  • Presents new threats to security and privacy. →
  • Requires new best practices for Security and Privacy by Design and new privacy-enhancing technologies (PETs). →
  • Presents opportunities to work on privacy enhancements to multimedia!

Page 188: Multimedia Privacy

Example IoT Advice From Future of Privacy Forum

• Get creative with using multimedia affordances (visual, audio, tactile) to alert users to data collection.
• Respect for context: Users may have different expectations for data they input manually and data collected by sensors.
• Inform users about how their data will be used.
• Choose de-identification practices according to your specific technical situation.
  • In fact, multimedia expertise can contribute to improving de-identification!

• Build trust by allowing users to engage with their own data, and to control who accesses it.

Source: Christopher Wolf, Jules Polonetsky, and Kelsey Finch, A Practical Privacy Paradigm for Wearables. Future of Privacy Forum, 2015.

Page 189: Multimedia Privacy

One Privacy Design Practice Above All

Think about privacy (and security) as you BEGIN designing a system or planning a research program. Privacy is not an add-on!

Page 190: Multimedia Privacy

Describing Risks

A method from security research:
• Build a model for potential attacks as a set of:
  • attacker properties
  • attack goals
• Proof your system against it as much as possible.
• Update users’ expectations about residual risk.

Page 191: Multimedia Privacy

Attacker Properties: Individual Privacy

• Resources
  • individual / institutional / moderate resources
• Target model
  • targeted individual / easiest k of N / everyone
• Database access
  • full (private, public) data access / well-indexed access / poorly indexed access / hard retrieval / soft retrieval (multimedia)

Page 192: Multimedia Privacy

Goals of Privacy Attacks

• Cybercasing (attack preparation)
• Cyberstalking
• Socio-economic profiling
• Espionage (industry, country)
• Cybervetting
• Cyberframing

Page 193: Multimedia Privacy

Towards Privacy-Proof MM Systems

• Match users’ expectations of privacy in system behavior (e.g. include user evaluation)

• If that’s not possible, educate users about risks
• Ask yourself: What is the best trade-off for the users between privacy, utility, and convenience?
• Don’t expose as much information as possible; expose only as much information as is required!

Page 194: Multimedia Privacy

Engineering Rules From the Privacy Community

• Inform users of the privacy model and quantify the possible audience:

  • Public / link-to-link / semi-public / private
  • How many people will see the information (avg. friends-of-friends on Facebook: 70k people!)
• If users expect anonymity, explain the risks of exposure
  • Self-posting of PII, hidden metadata, etc.
• Provide tools that make it easier to stay (more) anonymous, based on expert knowledge (e.g. erase EXIF; a sketch follows)
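For instance, a minimal sketch of an EXIF-erasing helper using Pillow (file names hypothetical). Re-encoding only the pixel data drops the metadata, including GPS geo-tags, at the cost of also discarding things like ICC profiles and recompressing the image:

```python
# Strip hidden metadata (EXIF, including geo-tags) before sharing a photo.
from PIL import Image

def strip_metadata(src, dst):
    img = Image.open(src)
    clean = Image.new(img.mode, img.size)
    clean.putdata(list(img.getdata()))   # copy pixels only, no EXIF
    clean.save(dst)

strip_metadata("vacation.jpg", "vacation_clean.jpg")  # hypothetical files
```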

Page 195: Multimedia Privacy

Engineering Rules from the Privacy Community

• Show users what metadata is collected by your service/app and to whom it is made available (AKA “Privacy Nutrition Label”)

• At the least, offer an opt-out!
• Make settings easily configurable (Facebook is not easily configurable)
• Offer methods to delete and correct data
  • If possible, trigger search engine updating after deletion
  • If possible, offer “deep deletion” (i.e. delete re-posts, at least within-system)

Page 196: Multimedia Privacy

Closing Thought Exercise: Part 1

Take two minutes to think about the following questions:
• What’s your area of expertise? What are you working on right now?
• How does it interact with privacy? What are the potential attacks and potential consequences?
• What can you do to mitigate negative privacy effects?
• What can you do to educate users about possible privacy implications?

Page 197: Multimedia Privacy

Closing Thought Exercise: Part 2

• Turn to the person next to you and share your thoughts. Ask each other questions!
• You have five minutes.

Page 198: Multimedia Privacy

Acknowledgments

Work together with:
• Jaeyoung Choi, Luke Gottlieb, Robin Sommer, Howard Lei, Adam Janin, Oana Goga, Nicholas Weaver, Dan Garcia, Blanca Gordo, Serge Egelman, and others

• Georgios Petkos, Eleftherios Spyromitros-Xioufis, Adrian Popescu, Rob Heyman, Georgios Rizos, Polychronis Charitidis, Thomas Theodoridis and others

Page 199: Multimedia Privacy

Thank You!

Acknowledgements:
• This material is based upon work supported by the US National Science Foundation under Grants No. CNS-1065240 and DGE-1419319, and by the European Commission under Grant No. 611596 for the USEMP project.

• Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding bodies.