Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design,...

16
Privacy-Preserving Data Analysis & Security by Design Daniel Kraschewski Big Techday 11 2018-05-18

Transcript of Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design,...

Page 1: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Privacy-Preserving Data Analysis& Security by Design

Daniel Kraschewski

Big Techday 11

2018-05-18

Page 2: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Motivation

personal data is omnipresent� internet browsing history� cell phone movements� smart metering, smart homes, IoT� social media, cloud� . . .

Photo: “Base.” by Instant Vantage (CC BY-SA 2.0), clipped to fit page layout

Daniel Kraschewski Privacy-Preserving Data Analysis 1 / 15

Page 3: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Use Cases

customer analytics city planning

medical surveys social surveys

Photos by @nordwood, @hellocolor, @rawpixel, @jaseess on Unsplash

Daniel Kraschewski Privacy-Preserving Data Analysis 2 / 15

Page 4: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Legal Situation

GDPR (from 2018-05-25)� strong notion of consent

� informed� freely given� specific� unambiguous� clear affirmative act

� high fines (up to 4% of annual turnover) for data privacy violations

Privacy-Preserving Data Analysis� derive large-scale statistical insights� still preserve individual’s privacy� security by design� provable security/privacy

Daniel Kraschewski Privacy-Preserving Data Analysis 3 / 15

Page 5: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Exemplary Mechanism Stack

privacy-preserving results� aggregated statistics must not reveal personal information� proper formal notion of anonymity� Differential Privacy, Laplace Mechanism

privacy-preserving computation� personal data must be unaccessible even for system owner� Cryptography� Secret Sharing, Secure Multi-Party Computation

privacy-preserving environment� no unauthorized third party may access any data� IT-Security� Access Control, encrypted & authenticated channels

Photo by @kristina on UnsplashDaniel Kraschewski Privacy-Preserving Data Analysis 4 / 15

Page 6: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Privacy-Preserving Computation

Example: Privacy-Preserving Averaging

12 18

27−15 2 16

27 + 2 = 29 −15 + 16 = 1

29

29+12 = 15

Daniel Kraschewski Privacy-Preserving Data Analysis 5 / 15

Page 7: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Secure Multi-Party Computation (MPC)

general setting� mutually mistrusting parties P1,P2, . . .

� secret inputs x1, x2, . . .

� want to compute some agreed on function value f (x1, x2, . . .)

� nothing but f (x1, x2, . . .) should be revealed about x1, x2, . . .

a universal solution1. write f as arithmetic circuit2. transform each xi into

unintelligible Secret Sharing3. evaluate f gate-by-gate,

preserving Secret Sharing4. recombine result

1 0 1 1

+ × + ×

× + × +

× + ×

x1 x2

f (x1, x2)Photo by Sahmeditor on Wikimedia Commons

(CC BY-SA 2.0), clipped to fit page layout

Daniel Kraschewski Privacy-Preserving Data Analysis 6 / 15

Page 8: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

A Concrete Protocol for General MPC

Shared Addition

xA, yA xB, yB

zA zB s.t. zA + zB = x + y

easy: zA = xA + yA and zB = xB + yB

Shared Multiplication

xA, yA xB, yB

zA zB s.t. zA + zB = x × y

problematic: z = xAyA + xAyB + xByA + xByB

missing building block

vA vB

wA wB s.t. wA + wB = vAvB

Daniel Kraschewski Privacy-Preserving Data Analysis 7 / 15

Page 9: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Building Block for Shared Multiplication

vA vB

r A, s A

rB , sB

InvariantsrArB = sA + sB

vArB = sA + wB

vAvB = wA + wB

random rA, rB, sAsB := rArB − sA

vA − rA

wB := sB + rB(vA − rA)

vB − rB

wA := sA + vA(vB − rB)

Daniel Kraschewski Privacy-Preserving Data Analysis 8 / 15

Page 10: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Application to Privacy-Preserving Data Analysis

Secre

t Sha

ring Secret Sharing

request re

quest

MPC

result part resu

lt par

t

Daniel Kraschewski Privacy-Preserving Data Analysis 9 / 15

Page 11: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Anonymity/Privacy Notions

k -Anonymity� published data must coincide with at lest k individuals

De-anonymization attack on correlated data1

� published data: number of people in mobile cell at time ti

991842

814

764830

857

963733

990

t1 = 02:30

991842

814

764831

857

962733

990

t2 = 03:00

991843

814

764830

857

962733

990

t3 = 03:30

991842

815

764830

857

962733

990

t4 = 04:00

� trajectory recovery = optimization problem� higher “costs” for sudden/far movements� higher “costs” for irregular movements and/or movements at night

� 50% – 91% accuracy, depending on space-time resolution1Fengli Xu, Zhen Tu, Yong Li, Pengyu Zhang, Xiaoming Fu, Depeng Jin: Trajectory Recovery From Ash: User Privacy Is NOT

Preserved in Aggregated Mobility Data, 26th International Conference on World Wide Web (WWW 2017)

Daniel Kraschewski Privacy-Preserving Data Analysis 10 / 15

Page 12: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Secure Anonymization

ε-Differential Privacy� statistical similarity: κ(real data) ≈ κ(real data \me) up to factor eε

Laplace Mechanism1. calculate histogram2. add Laplace noise3. output noisy group sizes

Laplace Distributions

Example histogram ( 110 -Differential Privacy)

Age Sex Diagnosis count noise result< 35 f infection 48 9 57< 35 f NCD 61 -1 60< 35 m infection 75 -5 70< 35 m NCD 44 -7 37≥ 35 f infection 165 6 171≥ 35 f NCD 127 -4 123≥ 35 m infection 228 2 230≥ 35 m NCD 168 -2 166

Daniel Kraschewski Privacy-Preserving Data Analysis 11 / 15

Page 13: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

IT-Security

Secre

t Sha

ring Secret Sharing

request reque

st

MPC

result part resu

lt par

t

Daniel Kraschewski Privacy-Preserving Data Analysis 12 / 15

Page 14: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Design Principles

inte

grat

ion Strictness

� fail-safe defaults� need-to-know principle� principle of least privilege

Robustness� separation of duties� multi-factor/layered security� forward secrecy

Consistency� complete security-model� defense in depth� homogeneity/uniformity

simplicity

Photo by BalticServers.com on Wikimedia Commons (CC BY-SA 3.0), clipped to fit page layout

Daniel Kraschewski Privacy-Preserving Data Analysis 13 / 15

Page 15: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Conclusion & Outlook

Summary� large-scale statistics can be calculated in a privacy-preserving way� security/privacy by design, not just by contract� security/privacy mathematically defined and provable� though, inefficient universal solutions

Improvements� less generic, optimized MPC constructions� less MPC, more IT-Security (e.g., “self-sealing” hardware)� tailored Differential Privacy mechanisms� . . .

Photo by @dnevozhai on Unsplash

Daniel Kraschewski Privacy-Preserving Data Analysis 14 / 15

Page 16: Privacy-Preserving Data Analysis & Security by Design · 2018-05-24 · security/privacy by design, not just by contract security/privacy mathematically defined and provable though,

Thank you for your attention!

DATENSCHUTZ IN ZEITEN VON BIG DATAModerne Methoden für gesetzeskonformen Datenschutz

im Kontext von Customer Analytics und Big Data

Autoren: Maximilian Bode, Dr. Daniel Kraschewski, Michael Pisula, Christoph Stock, Dr. Mayra Stuhldreier und Prof. Dr. Gregor Thüsing

Dr. Daniel KraschewskiSenior Consultant

TNG Technology Consulting GmbHBetastraße 13a85774 Unterfohring

tel: +49 89 2158 9960fax: +49 89 2158 9969

[email protected]

click for

download

Cliparts by AhNinniah, Anonymous, gswanson, j4p4n, qubodup, TikiGiki on openclipart, and succo on pixabay

Daniel Kraschewski Privacy-Preserving Data Analysis 15 / 15