What if Computers Understood Privacy Policies? A Look at Advances in NLP through the Lens of Privacy (2019/05/30)


USABLE PRIVACY POLICY AND PERSONALIZED PRIVACY ASSISTANT PROJECTS

Usable Privacy Policy Project

HONG KONG UNIVERSITY EXPERT ADDRESS

Copyright © 2019, Norman Sadeh

What if Computers Understood Privacy Policies?

A Look at Advances in NLP through the Lens of Privacy

Norman Sadeh

Carnegie Mellon University

www.normsadeh.org

usableprivacy.org privacyassistant.org

explore.usableprivacy.org


What is Privacy?

• Moral right of individuals to be left alone, free from surveillance or interference from other individuals or organizations, including the state

– There are obviously conflicting considerations

• e.g. security and safety

• Legal protection: founding documents of many countries


Data Privacy

• The claim that certain information should not be collected by governments, businesses or other entities – or possibly only under special circumstances and subject to various rules

– Hong Kong Personal Data (Privacy) Ordinance

– EU General Data Protection Regulation (GDPR)

– US Children's Online Privacy Protection Act (COPPA)

– etc.


Privacy Policies

• Disclose an entity’s data practices – “notice and choice”

• In practice: long, complex, vague and ambiguous

• Hardly anyone reads them

Privacy Policy


Yet People Care About Privacy

• Vast majority of people care about privacy

• 91% of people in the US report feeling they have lost control over their information

Pew Survey 2014, http://www.pewinternet.org/2014/11/12/public-privacy-perceptions/


One Size Fits All Doesn’t Work

Attempts to summarize privacy policies have been shown to have limitations

– Either the summary is too long and people still don’t read it

– …or the summary is too short and people fail to understand critical issues

– Different people care about different issues

• Different concerns, different expectations, different levels of prior knowledge

J. Gluck, F. Schaub, A. Friedman, H. Habib, N. Sadeh, L.F. Cranor, Y. Agarwal, "How Short is Too Short? Implications of Length and Framing on the Effectiveness of Privacy Notices", Symposium on Usable Privacy and Security (SOUPS '16), Denver, CO, Jun 2016


What if…

• Computers understood the text of privacy policies?


Annotation Tool

• Select a category

• Select an attribute

• Select a value

• Highlight text span for an attribute-value pair

S. Wilson, F. Schaub, A. Dara, F. Liu, S. Cherivirala, P.G. Leon, M.S. Andersen, S. Zimmeck, K. Sathyendra, N.C. Russell, T.B. Norton, E. Hovy, J.R. Reidenberg, N. Sadeh, "The Creation and Analysis of a Website Privacy Policy Corpus", ACL '16: Annual Meeting of the Association for Computational Linguistics, Aug 2016


OPP-115 Privacy Policy Corpus


Interpreting Annotations


A First Task: Segment Annotation

Privacy policy segment (input to the machine learning model):

“Disclosure of Your Information. Sci-News.com does not sell, trade or rent your personal information to third parties. If we choose to do so in the future, you will be notified by email of our intentions, and have the right to be removed prior to the disclosure.”

Model prediction: this policy segment discusses:

• Third Party Sharing/Collection


A Number of Possible Classifiers

• Traditional Methods

– Bag of N-grams as features

– Multinomial Naïve Bayes (MNB)

– Logistic Regression (LR)

– Support Vector Machines (SVM)

• Neural Methods

– One-hot vector as input

– Recurrent Neural Networks (RNNs)

– Convolutional Neural Networks (CNNs)
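As a rough illustration of the first "traditional" option, the following is a minimal sketch of a bag-of-n-grams Multinomial Naive Bayes classifier written from scratch with the Python standard library. It is not the project's implementation: the class and function names are hypothetical, the four training sentences are invented toy data, and only the two category names are taken from the slides.

```python
import math
from collections import Counter, defaultdict


def ngrams(text, n_max=2):
    """Return the bag of word n-grams (n = 1..n_max) of a lowercased text."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            bag[" ".join(tokens[i:i + n])] += 1
    return bag


class MultinomialNB:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # documents per class
        self.feature_counts = defaultdict(Counter)   # n-gram counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            bag = ngrams(text)
            self.feature_counts[label].update(bag)
            self.vocab.update(bag)
        return self

    def predict(self, text):
        total_docs = sum(self.class_counts.values())
        bag = ngrams(text)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.feature_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)    # log prior
            for feat, k in bag.items():
                if feat in self.vocab:               # ignore unseen n-grams
                    score += k * math.log((counts[feat] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Invented toy training data; category names follow the slides.
TRAIN = [
    ("we do not sell or rent your personal information to third parties",
     "Third Party Sharing/Collection"),
    ("we may share your information with third parties such as advertisers",
     "Third Party Sharing/Collection"),
    ("we use encryption to protect your data from unauthorized access",
     "Data Security"),
    ("we protect your data with secure servers and encryption",
     "Data Security"),
]

model = MultinomialNB().fit([t for t, _ in TRAIN], [c for _, c in TRAIN])
predicted_sharing = model.predict("your information is never sold to third parties")
predicted_security = model.predict("encryption protects your data")
```

With a real corpus, the same interface could be backed by an off-the-shelf library; the from-scratch version is used here only to keep the sketch self-contained.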


Performance (Precision/Recall/F1)

Simple techniques are not easy to beat

Picking a technique purely based on the F1 metric may be simplistic

Could use different techniques for different categories – or just LR for all

Performance is strongly impacted by the number of instances available for training

Precision: when I predict a category, how often am I correct? Recall: what percentage of the true instances do I catch? F1: combines both
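The three metrics can be computed from raw counts as in the sketch below. The function name and the counts are hypothetical and chosen only to illustrate the slide's caution about F1: two classifiers can reach the same F1 while making very different precision/recall trade-offs.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true/false positive and false negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1


# Hypothetical classifier A: cautious, few false alarms, misses more instances.
p_a, r_a, f1_a = precision_recall_f1(tp=18, fp=2, fn=12)

# Hypothetical classifier B: eager, catches more instances, more false alarms.
p_b, r_b, f1_b = precision_recall_f1(tp=18, fp=12, fn=2)
```

Both hypothetical classifiers end up with the same F1, yet A is the better choice when false alarms are costly and B when missed instances are costly.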

Shomir Wilson, Florian Schaub, Frederick Liu, Kanthashree Mysore Sathyendra, Daniel Smullen, Sebastian Zimmeck, Rohan Ramanath, Peter Story, Fei Liu, Norman Sadeh, Noah A. Smith, "Analyzing Privacy Policies at Scale: From Crowdsourcing to Automated Annotations", ACM Transactions on the Web, 13, 1, Dec 2018


Another Task: User Choice Instance Extraction

• User choices often buried deep in the text of long policies

• Is it possible to automatically extract information about such “choice instances” from privacy policies?

• Use Natural Language Toolkit tokenizer to subdivide segments into sentences & build classifiers

Choice instance:

“If you do not want us to use personal information that we gather to allow third parties to personalize advertisements we display to you, please adjust your Advertising Preferences.”
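The two steps on this slide (split a segment into sentences, then decide which sentences offer a choice) can be sketched as follows. This is only an illustration under stated assumptions: a naive regular-expression splitter stands in for the Natural Language Toolkit tokenizer, a hypothetical cue-phrase list (`OPT_OUT_CUES`) stands in for the trained classifiers, and the first sentence of the sample segment is invented.

```python
import re

# Hypothetical cue phrases; the cited work trains classifiers instead.
OPT_OUT_CUES = ("opt out", "opt-out", "unsubscribe",
                "if you do not want", "adjust your")


def split_sentences(segment):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", segment.strip())
    return [p for p in parts if p]


def find_choice_sentences(segment):
    """Return the sentences that contain at least one cue phrase."""
    return [s for s in split_sentences(segment)
            if any(cue in s.lower() for cue in OPT_OUT_CUES)]


segment = (
    "We gather personal information to improve our services. "  # invented
    "If you do not want us to use personal information that we gather "
    "to allow third parties to personalize advertisements we display to you, "
    "please adjust your Advertising Preferences."
)
sentences = split_sentences(segment)
choices = find_choice_sentences(segment)
```

A keyword filter like this would miss paraphrased choices and flag irrelevant sentences, which is one reason a learned classifier over sentence features is the more plausible approach at scale.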

K.M. Sathyendra, S. Wilson, F. Schaub, S. Zimmeck, N. Sadeh, "Identifying the Provision of Choices in Privacy Policies", EMNLP Conference, 2017


Privacy Choices

• A number of choices:

– De-activate account

– Delete account

– Opt in to some practice (e.g. collection of location)

– Opt out (e.g., email, cookies, etc.)


Privacy Choices in OPP-115 Corpus


Initial Results (opt-out links)

Better results since then: Precision of 0.93 and recall of 0.98


Annotated 7,000+ policies

https://explore.usableprivacy.org/


Word Embeddings - I

• Models learnt from raw text are brittle – they fail to exploit similarities between words

• “A word is characterized by the company it keeps”

– John Firth – collocational meaning

• Distributional Semantics: capture semantic similarities between words based on distributional properties in large corpora of documents

• Different contexts – e.g. Apple the fruit vs. Apple the company; or browser cookie vs. chocolate chip cookie

Vinayshekhar Bannihatti Kumar, Abhilasha Ravichander, Peter Story, and Norman Sadeh, "Quantifying the Effect of In-Domain Distributed Word Representations: A Study of Privacy Policies", AAAI Spring Symposium on Privacy Enhancing AI and Language Technologies (PAL 2019), Mar 2019


Word Embeddings - II

• Two styles of word embeddings: a word is represented as either:

– Vectors of co-occurring words

– Vectors of linguistic contexts

• These days, these models are typically trained using neural nets rather than traditional n-gram models.

– Unsupervised learning

• Do domain-specific embeddings help improve performance over generic embeddings?

– GloVe: generic embeddings

– Word2Vec: privacy-specific embeddings we trained using 150,000 mobile app privacy policies (Google Play Store)
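To make the "vectors of co-occurring words" style concrete, here is a minimal count-based sketch, not GloVe or Word2Vec. The function names, the window size and the three-sentence corpus are all assumptions made for illustration; the point is only that words used in similar contexts end up with similar vectors.

```python
import math
from collections import Counter, defaultdict


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Invented toy corpus: "cookie" is used in a browser context only.
corpus = [
    "we store a cookie in your browser",
    "we store a token in your browser",
    "she baked a cake with chocolate chips",
]
vecs = cooccurrence_vectors(corpus)
sim_cookie_token = cosine(vecs["cookie"], vecs["token"])
sim_cookie_cake = cosine(vecs["cookie"], vecs["cake"])
```

In this toy corpus "cookie" lands next to "token" rather than "cake", which mirrors the motivation for training embeddings on privacy policies instead of generic text.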


OPP-115: Number of Instances


Performance Improvement Using Domain-Specific Embeddings

• Comparison between generic GloVe embeddings and Word2Vec embeddings trained on a large corpus of privacy policies

• Significant improvements in categories with a small number of instances (OPP-115)

Data Practice Category    Relative percentage gain in F1 performance (Dev set)
Data Security             9%
Policy Change             6%
Data Retention            10%
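A relative gain is measured against the baseline score, so it is not the same as a difference in F1 points. The sketch below shows the arithmetic; the function name and both F1 values are hypothetical and are not taken from the study.

```python
def relative_gain_percent(baseline_f1, new_f1):
    """Relative improvement of new_f1 over baseline_f1, in percent."""
    if baseline_f1 <= 0:
        raise ValueError("baseline F1 must be positive")
    return (new_f1 - baseline_f1) / baseline_f1 * 100.0


# Hypothetical scores, for illustration only.
generic_f1 = 0.70      # e.g. a model using generic embeddings
in_domain_f1 = 0.77    # e.g. the same model using in-domain embeddings

gain = relative_gain_percent(generic_f1, in_domain_f1)
absolute_points = (in_domain_f1 - generic_f1) * 100.0
```

Here a 7-point absolute improvement corresponds to a 10% relative gain, because the baseline is below 1.0.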


Visualizing Word Embeddings

Note also the different scales!


How many privacy policies do we need?

Varies based on data practice – finer-grained data practices might benefit from a larger corpus


Deep Contextualized Word Embeddings (“BERT”)

[Devlin et al. 2018]

• BERT is a transformer model trained on the BooksCorpus and Wikipedia.

• BERT has contextualised word embeddings, which ensure that a word in different contexts has different embeddings.

• BERT is the state-of-the-art model on most classification tasks.


BERT vs. in-domain word embeddings

[Table comparing BERT with in-domain embeddings; columns: Data Practice, BERT, In-Domain Embedding. Rows: 1st Party Collection & Use; 3rd Party Sharing & Collection; User Choice Control; Data Security; International and Specific Audiences; Access, Edit & Delete; Policy Change; Data Retention; Do Not Track. Numeric values not preserved.]

Some further improvements, though domain-specific embeddings continue to provide value for practices with a small number of instances


Mobile App Privacy Compliance

• Millions of apps in Google Play and iOS/iTunes app stores

• These apps access a large number of sensitive APIs (e.g., location, calendar, camera)

• Most developers lack the resources and know-how to ensure that their apps are compliant

• Regulations are changing (e.g. GDPR, CCPA)

• Manually vetting apps does not scale: in 2014, the Global Privacy Enforcement Network assessed 1,211 apps in one week


Can We Automatically Check for Potential Compliance Issues?

• Training machine learning classifiers to extract relevant policy statements

• Compare these statements against:

– Regulatory requirements

– What the software actually does

• Static and dynamic code analysis
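The comparison step described above can be pictured as a set difference between what the policy discloses and what the code appears to do. The sketch below assumes both sides have already been reduced to sets of practice labels; the function name and the labels are hypothetical placeholders for classifier and code-analysis output.

```python
def potential_compliance_issues(disclosed, detected):
    """Return practices detected in the app's code but not disclosed in its policy."""
    return sorted(set(detected) - set(disclosed))


# Hypothetical output of the policy classifiers for one app.
policy_practices = {"1st party location", "1st party identifier"}

# Hypothetical output of static/dynamic code analysis for the same app.
code_practices = {"1st party location", "3rd party location", "3rd party identifier"}

issues = potential_compliance_issues(policy_practices, code_practices)
```

Each flagged item is only a potential issue: classifier errors and analysis limitations mean a flagged practice still needs human review before it is treated as a violation.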

S. Zimmeck, Z. Wang, L. Zou, R. Iyengar, B. Liu, F. Schaub, S. Wilson, N. Sadeh, S.M. Bellovin, J.R. Reidenberg, "Automated Analysis of Privacy Requirements for Mobile Apps", NDSS'17: Network and Distributed System Security Symposium, Feb 2017


Performance metrics of our classifiers’ ability to determine whether a privacy policy states that a practice is performed, calculated using the held-out test set (n = 100).

+ : number of positive ground-truth instances (cases where policies truly describe the practice being performed)

- : number of negative ground-truth instances (cases where policies truly don’t describe the practice being performed)

NPV stands for negative predictive value, or the precision for negative instances.

Specificity is the recall for negative instances. Negative F1 is the F1 for negative instances.
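The negative-class metrics defined in this caption mirror precision, recall and F1 with the roles of the two classes swapped. A minimal sketch follows; the function name and the counts are hypothetical and do not come from the reported results.

```python
def negative_class_metrics(tn, fp, fn):
    """Compute NPV, specificity and negative F1 from confusion-matrix counts.

    tn: true negatives, fp: false positives, fn: false negatives.
    """
    npv = tn / (tn + fn) if tn + fn else 0.0            # precision for negatives
    specificity = tn / (tn + fp) if tn + fp else 0.0    # recall for negatives
    if npv + specificity == 0:
        return npv, specificity, 0.0
    negative_f1 = 2 * npv * specificity / (npv + specificity)
    return npv, specificity, negative_f1


# Hypothetical counts for a 100-policy test set (40 true positives not needed here).
npv, specificity, negative_f1 = negative_class_metrics(tn=45, fp=5, fn=10)
```

Reporting these alongside the positive-class metrics matters for compliance checking, where correctly recognizing that a policy does not disclose a practice is as important as recognizing that it does.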


Performance metrics of our static analysis’s ability to detect the practices in our app test set (n = 100). Dynamic analysis used as ground truth (conservative).

+ : number of positive ground truth instances

- : number of negative ground truth instances

? : number of instances where either the ground truth is unknown or the manual analysis failed.


Scaling our Analysis

• We analyzed as many free apps as we could find on the US Google Play Store

• Successfully analyzed 1,035,853 free apps

Peter Story, Sebastian Zimmeck, Abhilasha Ravichander, Daniel Smullen, Ziqi Wang, Joel Reidenberg, N. Cameron Russell, and Norman Sadeh, "Natural Language Processing for Mobile App Privacy Compliance", AAAI Spring Symposium on Privacy Enhancing AI and Language Technologies (PAL 2019), Mar 2019


Findings: Number of Potential Compliance Issues

• Average number of potential compliance issues per app is 3.47 and the median is 3

• Note: “No Policies” apps are excluded from the following figures


Findings: Prevalence of Practices and Potential Compliance Issues

• Location and certain Identifiers are quite common

• 3rd Party practices are more common than 1st Party practices


Findings: Play Store Categories

• Lighter colors indicate greater transparency of practices. Darker colors indicate that practices are being performed but not disclosed.

• Cells with fewer than 25 apps performing the practice are annotated with the respective number of apps.

• Third party libraries: transparency issues

• Greater transparency for FAMILY categories, but still not great.


Applications

• COPPA report compiled for FTC

• Focusing on location, apps with a large number of downloads, and companies based in the US

• Work with large European electronics manufacturer – checking for GDPR compliance of mobile apps


Reports View with Search and Filters

Search and filter according to:

- Practices (e.g., GPS location collection)

- Type (Is GPS location not mentioned in the policy, or does the policy explicitly state it is not occurring?)

- Specificity (Perhaps “location” suffices in some jurisdictions that do not specifically require mentioning “GPS.”)

- Many more …


Flashlight App Metadata Details

All available app metadata is extracted from the Google Play Store


Flashlight App Policy Results

Relevant parts of the policies are extracted and displayed alongside the analysis results


Flashlight App Static Analysis Results

Relevant parts of the code are extracted and displayed alongside the analysis results


Answering People’s Privacy Questions

• Generating summaries of privacy policies has significant limitations

• Different people are interested in different sets of issues

• How about developing functionality capable of answering people’s privacy questions?

“One Size Fits All” does not work

Abhilasha Ravichander, Alan Black, Eduard Hovy, Joel Reidenberg, N. Cameron Russell, and Norman Sadeh, "Challenges in Automated Question Answering for Privacy Policies", AAAI Spring Symposium on Privacy Enhancing AI and Language Technologies (PAL 2019), Mar 2019


Research Questions

• What types of questions do people have?

• How do people formulate their privacy questions?

– Are they able to articulate their questions?

• Do their questions make sense?

• Are their questions ambiguous?

• Do their questions require clarifications?

• What types of answers do people find useful?

– Content, length, assumptions about what people understand/know

• Can questions be answered based on the text of privacy policies?

– If not, what other sources of information are available?


Collecting a Corpus of Privacy Questions

Crowdsourcing privacy questions about mobile apps

Scenario: Participants were asked to assume there is a privacy assistant capable of answering their privacy questions


Crowdsourcing Task – Amazon Mechanical Turk


Legal Analysis


Data Collected

• 10 mobile app categories – app categories with more than 2% of total number of apps

• Total of 27 mobile apps – 50% with over 5M downloads and 50% with fewer downloads

• 50 questions per policy – 5 crowdworkers for each app and 10 questions per crowdworker: total of 1,350 questions

• Avg. question length: 8.4 words

• Avg. policy length: 3,273 words

• Avg. answer length: 104.5 words


Types of Questions – Word Analysis


Relevance and Subjectivity

              Privacy Related                                    Not Privacy Related
Subjective    4.9% (“Is my data safe?”)                          1.4% (“Are there any in-game purchases I should be concerned about?”)
Objective     74% (“What information are they collecting?”)      19.7% (“How much does it cost?”)

Not hopeless….but not trivial either…


Data Practice Categories

Relatively well aligned with distribution of practice statements in privacy policies…


What Makes Questions Unanswerable? (I)

• 56% of unanswerable questions would typically not be addressed in the text of privacy policies

– “How does the currency within the game work?”: out of scope

– “Has Viber had data breaches in the past?”: not typically disclosed in a privacy policy; would have to look for other sources of information (e.g. news sites, social media, etc.)

• 24% of unanswerable questions should ideally be answerable based on the text of the privacy policy, but the policy was silent (e.g., “Is my data encrypted?”)

– Public policy implications

– Could potentially fall back on background knowledge


What Makes Questions Unanswerable? (II)

• 6% of unanswerable questions are too vague to be correctly interpreted

– “Who can contact me through the app?”

– Would benefit from dialogue to clarify what the person is asking

• 4% are ambiguously phrased

– “Any difficulty to occupy the privacy assistant?”

– Dialogue could help too


What Makes Questions Unanswerable? (III)

• 3% of unanswerable questions are too specific to typically be addressed in a privacy policy

– “Does it have access to financial apps I use?”

– Could potentially fall back on general knowledge

• 7% are subjective

– “How do I know this app is legit?”

– Could potentially fall back on general knowledge


Observations

• Privacy Q&A could be more effective than one-size-fits-all summaries

• Privacy policies are underspecified

– Would benefit from accessing other sources of information

• Some questions can automatically be answered

• Some questions will require dialogues with users to disambiguate what the user is interested in


Ongoing Work on Privacy Q&A

• Building classifiers to determine if a question is answerable and, if it is not, why (e.g. differentiating between subjective questions, questions that require disambiguation, questions that show lack of knowledge, questions that would benefit from other sources of information)

• Building answer templates – incl. qualifications/disclaimers, background knowledge, etc.


Concluding Remarks - I

• Data-centric economy: impossible for people to keep up with all the different ways in which their data is collected & used

• Privacy policies are required by a number of regulations around the world, but people can’t be expected to read them

• Advances in NLP/ML make it possible to develop solutions that automatically extract statements from the text of policies


Concluding Remarks - II

• Applications

– Tools to help people more effectively navigate the text of privacy policies

– Tools to automatically extract Policy Choices buried in the text of policies (e.g. opt-outs)

– Tools to help regulators, app stores and app developers with privacy compliance issues

– Tools to help answer those privacy questions a given individual cares about

Q&A

Acknowledgements: Work funded by the National Science Foundation, DARPA and Google

The Usable Privacy Policy Project and the Personalized Privacy Assistant Project both involve collaborations with a number of individuals.

See usableprivacy.org and privacyassistant.org for additional details incl. lists of collaborators and publications

Six Principles:

1. Purpose & Manner of Collection has to be disclosed to data subject

2. Accuracy and Duration of Retention of Personal Data: data should be up to date and only retained as long as necessary

3. Use of Personal Data: only for the purpose for which data was collected – unless otherwise agreed by data subject

4. Security of Personal Data: protection against unauthorized or accidental access, processing or deletion

5. Notification: Open policies about data being collected & for what purpose

6. Access to personal data: right to review and correct data about oneself

Hong Kong Personal Data Ordinance (Dec. 1996)

Hong Kong Personal Data Ordinance

• Personal data can only be used for the purpose for which it was collected – no frivolous collection

– This also restricts sharing

• Purpose has to be stated from the beginning

• People should have the right to inspect information held about them within 40 days of their asking

– May involve a fee

• Data has to be corrected if erroneous

• Data has to be secure

• No direct marketing or teleselling if someone opts out

• Individuals can sue if damage results from the release of confidential data, or from inaccurate data or other breach

• Note: This is a very approximate summary – read the text of the Ordinance for a more detailed & accurate understanding

Recent Amendments to HK Personal Data Ordinance

• Effective April 1, 2013

• Imposes additional requirements for data users that seek to:

– Sell personal data

– Use personal data for their own direct marketing purposes

– Provide personal data to another person for that other person’s direct marketing purposes

More Details

EU – GDPR

• Took effect May 25, 2018

• Stricter provisions than EU Data Protection Directive

– Single set of rules and one-stop shop model: each company coordinates with a single Supervisory Authority

– Privacy by Design and by Default – incl. default privacy settings that are protective

– Opt-in: data controllers must be able to prove consent & consent may be withdrawn

– Severe penalties for violations: up to EUR 20 million or 4% of worldwide annual turnover, whichever is greater

– Right to “erasure”

– Data portability
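
The "whichever is greater" penalty ceiling mentioned above can be written out as a small worked example. This is only a sketch of the upper bound as stated on the slide; actual fines are set case by case by supervisory authorities, and the function and parameter names are hypothetical.

```python
# Figures taken from the slide: EUR 20 million or 4% of worldwide turnover.
FLAT_CEILING_EUR = 20_000_000
TURNOVER_PERCENT = 4


def max_gdpr_fine_eur(worldwide_annual_turnover_eur: int) -> float:
    """Return the fine ceiling: the greater of the flat cap and 4% of turnover."""
    turnover_based = worldwide_annual_turnover_eur * TURNOVER_PERCENT / 100
    return max(float(FLAT_CEILING_EUR), turnover_based)
```

With a turnover of EUR 100 million the flat EUR 20 million ceiling applies, since 4% is only EUR 4 million; the turnover-based ceiling takes over once turnover exceeds EUR 500 million.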