Privacy preserving data mining – Introduction and randomization techniques
Li Xiong
CS573 Data Privacy and Anonymity
What Is Data Mining?
- Data mining (knowledge discovery from data)
  - Extraction of interesting (non-trivial, implicit, previously unknown, and potentially useful) patterns or knowledge from huge amounts of data
- Knowledge discovery in databases (KDD), knowledge extraction, data/pattern analysis, information harvesting, business intelligence
Privacy preserving data mining
- Support data mining while preserving privacy
  - Sensitive raw data
  - Sensitive mining results
Seminal work
- Privacy preserving data mining, Agrawal and Srikant, 2000
  - Centralized data
  - Data randomization (additive noise)
  - Decision tree classifier
- Privacy preserving data mining, Lindell and Pinkas, 2000
  - Distributed data
  - Secure multi-party computation
  - Decision tree classifier
Taxonomy of PPDM algorithms
- Data distribution
  - Centralized
  - Distributed – privacy preserving distributed data mining
- Approaches
  - Input perturbation – additive noise (randomization), multiplicative noise, generalization, swapping, sampling
  - Output perturbation – rule hiding
  - Crypto techniques – secure multiparty computation
- Data mining algorithms
  - Classification
  - Association rule mining
  - Clustering
Input Perturbation
- Reveal the entire database, but randomize the entries
- The database holds entries x_1, ..., x_n; the user sees x_1 + ε_1, ..., x_n + ε_n
- Add random noise ε_i to each database entry x_i
- For example, if the distribution of the noise has mean 0, the user can compute the average of the x_i (see the sketch below)
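A minimal Python sketch of this idea, assuming zero-mean Gaussian noise; all names (true_values, noise, released) are illustrative, not from the slides:

import numpy as np

# Input perturbation with additive noise: release x_i + eps_i and
# recover an aggregate statistic, here the mean, from the noisy copies.
rng = np.random.default_rng(0)
true_values = rng.uniform(20, 60, size=10_000)   # hypothetical entries x_i

# Zero-mean noise eps_i, one independent draw per entry.
noise = rng.normal(loc=0.0, scale=15.0, size=true_values.shape)
released = true_values + noise                   # what the user actually sees

# Since E[eps] = 0, the mean of the released data is an unbiased
# estimate of the mean of the original data.
print(true_values.mean(), released.mean())       # the two should be close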
Randomization techniques
- Privacy preserving data mining, Agrawal and Srikant, 2000
  - Seminal work on decision tree classifier
- Limiting Privacy Breaches in Privacy-Preserving Data Mining, Evfimievski and Gehrke, 2003
  - Refined privacy definition
  - Association rule mining
Randomization Based Decision Tree Learning
(Agrawal and Srikant ’00)
- Basic idea: perturb data with value distortion
  - User provides x_i + r instead of x_i
  - r is a random value
    - Uniform: uniform distribution between [-α, α]
    - Gaussian: normal distribution with μ = 0 and standard deviation σ
- Hypothesis
  - Miner doesn't see the real data or can't reconstruct real values
  - Miner can reconstruct enough information to build a decision tree for classification
Randomization Approach
[Diagram: records such as "30 | 70K | ..." pass through a randomizer that adds a random number to Age, so Alice's age 30 becomes 65 (30 + 35); the randomized records "65 | 20K | ..." and "25 | 60K | ..." are fed to the classification algorithm, which outputs a model. The "?" marks the open question of how an accurate model can be learned from randomized data.]
Classification
- Classification
  - Predicts categorical class labels (discrete or nominal)
- Prediction (regression)
  - Models continuous-valued functions, i.e., predicts unknown or missing values
- Typical applications
  - Credit approval
  - Target marketing
  - Medical diagnosis
  - Fraud detection
Motivating Example for Classification – Fruit Identification
Skin   | Color | Size  | Flesh | Conclusion
Hairy  | Brown | Large | Hard  | Safe
Hairy  | Green | Large | Hard  | Safe
Smooth | Red   | Large | Soft  | Dangerous
Hairy  | Green | Large | Soft  | Safe
Smooth | Red   | Small | Hard  | Dangerous
Another Example – Credit Approval
Name   | Age | Income | Credit
Clark  | 35  | High   | Excellent
Milton | 38  | High   | Excellent
Neo    | 25  | Medium | Fair

- Classification rule:
  - If age = "31...40" and income = high then credit_rating = excellent
- Future customers
  - Paul: age = 35, income = high ⇒ excellent credit rating
  - John: age = 20, income = medium ⇒ fair credit rating
Classification—A Two-Step Process
- Model construction: describing a set of predetermined classes
  - Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  - The set of tuples used for model construction is the training set
  - The model is represented as classification rules, decision trees, or mathematical formulae
- Model usage: for classifying future or unknown objects
Training Dataset
age     | income | student | credit_rating | buys_computer
<=30    | high   | no      | fair          | no
<=30    | high   | no      | excellent     | no
31...40 | high   | no      | fair          | yes
>40     | medium | no      | fair          | yes
>40     | low    | yes     | fair          | yes
>40     | low    | yes     | excellent     | no
31...40 | low    | yes     | excellent     | yes
<=30    | medium | no      | fair          | no
<=30    | low    | yes     | fair          | yes
>40     | medium | yes     | fair          | yes
<=30    | medium | yes     | excellent     | yes
31...40 | medium | no      | excellent     | yes
31...40 | high   | yes     | fair          | yes
>40     | medium | no      | excellent     | no
Output: A Decision Tree for “buys_computer”
age?
  <=30: student?
    no: no
    yes: yes
  31..40: yes
  >40: credit rating?
    excellent: no
    fair: yes
Algorithm for Decision Tree Induction
- ID3 (Iterative Dichotomiser), C4.5
- CART (Classification and Regression Trees)
- Basic algorithm (a greedy algorithm) – the tree is constructed in a top-down, recursive, divide-and-conquer manner (see the sketch after this list)
  - At start, all the training examples are at the root
  - A test attribute is selected that "best" separates the data into partitions
    - Heuristic or statistical measure
  - Samples are partitioned recursively based on the selected attributes
- Conditions for stopping partitioning
  - All samples for a given node belong to the same class
  - There are no remaining attributes for further partitioning – majority voting is employed for classifying the leaf
  - There are no samples left
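A minimal Python sketch of this greedy loop, with the stopping conditions above; `select_attribute` stands in for any attribute selection measure from the next slides, and the record format (dicts with a 'class' key) is an illustrative assumption:

from collections import Counter

def majority(rows):
    # Majority class label among rows (used at impure leaves).
    return Counter(r['class'] for r in rows).most_common(1)[0][0]

def build_tree(rows, attrs, select_attribute):
    classes = {r['class'] for r in rows}
    if len(classes) == 1:                  # all samples belong to the same class
        return classes.pop()
    if not attrs:                          # no remaining attributes: majority voting
        return majority(rows)
    best = select_attribute(rows, attrs)   # "best" separating test attribute
    children = {}
    for value in {r[best] for r in rows}:  # partition recursively on its values
        subset = [r for r in rows if r[best] == value]
        children[value] = build_tree(subset,
                                     [a for a in attrs if a != best],
                                     select_attribute)
    return (best, children)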
Attribute Selection Measures
- Idea: select the attribute that partitions samples into homogeneous groups
- Measures
  - Information gain (ID3)
  - Gain ratio (C4.5)
  - Gini index (CART)
Attribute Selection Measure: Information
Gain (ID3)
- Select the attribute with the highest information gain
- Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}|/|D|
- Expected information (entropy) needed to classify a tuple in D:

$Info(D) = -\sum_{i=1}^{m} p_i \log_2 p_i$
- Information needed (after using A to split D into v partitions) to classify D:

$Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)$

- Information gain – the difference between the original information requirement and the new information requirement obtained by branching on attribute A (see the sketch below):

$Gain(A) = Info(D) - Info_A(D)$
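A small sketch of these two formulas in Python (record format as in the earlier tree sketch; names illustrative). On the 14-row buys_computer table above, Info(D) should come out to about 0.940 bits and Gain(age) to about 0.246:

import math
from collections import Counter

def info(rows):
    # Info(D) = -sum_i p_i log2 p_i over the class frequencies in rows.
    n = len(rows)
    counts = Counter(r['class'] for r in rows).values()
    return -sum(c / n * math.log2(c / n) for c in counts)

def gain(rows, attr):
    # Gain(A) = Info(D) - sum_j |D_j|/|D| * Info(D_j).
    n = len(rows)
    info_a = 0.0
    for v in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == v]
        info_a += len(subset) / n * info(subset)
    return info(rows) - info_a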
Attribute Selection Measure: Gini index (CART)
- If a data set D contains examples from n classes, the gini index gini(D) is defined as

$gini(D) = 1 - \sum_{j=1}^{n} p_j^2$

  where p_j is the relative frequency of class j in D
- If a data set D is split on A into two subsets D_1 and D_2, the gini index gini_A(D) is defined as

$gini_A(D) = \frac{|D_1|}{|D|} gini(D_1) + \frac{|D_2|}{|D|} gini(D_2)$

- Reduction in impurity:

$\Delta gini(A) = gini(D) - gini_A(D)$

- The attribute that provides the smallest gini_split(D) (or the largest reduction in impurity) is chosen to split the node (see the sketch below)
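The same, for the gini formulas; a sketch under the same record-format assumption:

from collections import Counter

def gini(rows):
    # gini(D) = 1 - sum_j p_j^2 over the class frequencies p_j in rows.
    n = len(rows)
    return 1.0 - sum((c / n) ** 2
                     for c in Counter(r['class'] for r in rows).values())

def gini_split(d1, d2):
    # gini_A(D) for a binary split of D into subsets d1 and d2.
    n = len(d1) + len(d2)
    return len(d1) / n * gini(d1) + len(d2) / n * gini(d2)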
Information-Gain for Continuous-Value
Attributes
- Let attribute A be a continuous-valued attribute
- Must determine the best split point for A
  - Sort the values of A in increasing order
  - Typically, the midpoint between each pair of adjacent values is considered as a possible split point
    - (a_i + a_{i+1})/2 is the midpoint between the values of a_i and a_{i+1}
  - The point with the minimum expected information requirement for A is selected as the split-point for A (see the sketch below)
- Split:
  - D_1 is the set of tuples in D satisfying A ≤ split-point, and D_2 is the set of tuples in D satisfying A > split-point
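A sketch of the split-point search, reusing the `info` helper from the information-gain sketch (passed in as a parameter; names illustrative):

def best_split_point(rows, attr, info):
    # Pick the midpoint of adjacent sorted values of `attr` that
    # minimizes the expected information requirement Info_A(D).
    values = sorted({r[attr] for r in rows})
    best_mid, best_info = None, float('inf')
    for lo, hi in zip(values, values[1:]):
        mid = (lo + hi) / 2                       # (a_i + a_{i+1}) / 2
        d1 = [r for r in rows if r[attr] <= mid]  # A <= split-point
        d2 = [r for r in rows if r[attr] > mid]   # A >  split-point
        info_a = (len(d1) * info(d1) + len(d2) * info(d2)) / len(rows)
        if info_a < best_info:
            best_mid, best_info = mid, info_a
    return best_mid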
Randomization Approach Overview
[Diagram: as before, records such as "30 | 70K | ..." pass through a randomizer that adds a random number to Age (Alice's age 30 becomes 65 = 30 + 35); from the randomized records "65 | 20K | ..." and "25 | 60K | ...", the distributions of Age and of Salary are first reconstructed, and the classification algorithm then builds the model from the reconstructed distributions.]
Original Distribution Reconstruction
- x_1, x_2, ..., x_n are the n original data values
  - Drawn from n iid random variables X_1, X_2, ..., X_n similar to X
- Using value distortion,
  - The given values are w_1 = x_1 + y_1, w_2 = x_2 + y_2, ..., w_n = x_n + y_n
  - The y_i's are from n iid random variables Y_1, Y_2, ..., Y_n similar to Y
- Reconstruction problem:
  - Given F_Y and the w_i's, estimate F_X
Original Distribution Reconstruction: Method
- Bayes' theorem for continuous distributions
- The estimated density function:

$f_X'(a) = \frac{1}{n} \sum_{i=1}^{n} \frac{f_Y(w_i - a)\, f_X(a)}{\int_{-\infty}^{\infty} f_Y(w_i - z)\, f_X(z)\, dz}$

- Iterative estimation:

$f_X^{j+1}(a) = \frac{1}{n} \sum_{i=1}^{n} \frac{f_Y(w_i - a)\, f_X^{j}(a)}{\int_{-\infty}^{\infty} f_Y(w_i - z)\, f_X^{j}(z)\, dz}$

  - The initial estimate for f_X at j = 0: uniform distribution
  - Stopping criterion: χ² test between successive iterations (a discretized sketch follows)
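A discretized sketch of this iteration, assuming Gaussian noise with known scale and a fixed iteration count in place of the χ² stopping test; all names are illustrative:

import numpy as np
from scipy.stats import norm

def reconstruct(w, noise_pdf, grid, iters=20):
    # Estimate f_X on `grid` from perturbed values w_i, given the noise pdf f_Y.
    pdfs = noise_pdf(w[:, None] - grid[None, :])  # f_Y(w_i - a) for every i, a
    fx = np.full(grid.size, 1.0 / grid.size)      # f_X^0: uniform initial estimate
    for _ in range(iters):
        lik = pdfs * fx                           # f_Y(w_i - a) f_X^j(a)
        # Normalize each row (the integral in the denominator), then
        # average the per-sample posteriors over all w_i.
        fx = (lik / lik.sum(axis=1, keepdims=True)).mean(axis=0)
    return fx

rng = np.random.default_rng(1)
x = rng.normal(40, 5, size=1000)                  # hidden originals (toy data)
w = x + rng.normal(0, 15, size=1000)              # released values w_i = x_i + y_i
grid = np.linspace(0, 80, 161)
fx = reconstruct(w, lambda d: norm.pdf(d, scale=15.0), grid)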
Reconstruction of Distribution
[Figure: number of people vs. age, comparing the original, randomized, and reconstructed distributions.]
Original Distribution Reconstruction for Decision Tree
- When to reconstruct distributions?
  - Global
    - Reconstruct for each attribute once at the beginning
    - Build the decision tree using the reconstructed data
  - ByClass
    - First split the training data
    - Reconstruct for each class separately
    - Build the decision tree using the reconstructed data
  - Local
    - First split the training data
    - Reconstruct for each class separately
    - Reconstruct at each node while building the tree
Accuracy vs. Randomization Level
[Figure: classification accuracy (40–100%) vs. randomization level (10–200) for function Fn 3, comparing the Original, Randomized, and ByClass approaches.]
More Results
- Global performs worse than ByClass and Local
- ByClass and Local have accuracy within 5% to 15% (absolute error) of the Original accuracy
- Overall, all are much better than the Randomized accuracy
How to Measure Privacy Breach
- Weak: no single database entry has been revealed
- Stronger: no single piece of information is revealed (what's the difference from the "weak" version?)
- Strongest: the adversary's beliefs about the data have not changed
Kullback-Leibler Distance
- Measures the "difference" between two probability distributions
- For discrete distributions P and Q: $KL(P \| Q) = \sum_x P(x) \log \frac{P(x)}{Q(x)}$
Privacy of Input Perturbation
- X is a random variable, R is the randomization operator, Y = R(X) is the perturbed database
- Measure the mutual information between the original and randomized databases
  - Average KL distance between (1) the distribution of X and (2) the distribution of X conditioned on Y = y:
    $E_y\left(KL(P_{X|Y=y} \| P_X)\right)$
- Intuition: if this distance is small, then Y leaks little information about the actual values of X (a toy computation is sketched below)
- Why is this definition problematic?
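A toy computation of this average KL distance, which equals the mutual information I(X; Y), for a small discrete X and a "retain with probability 0.4, else uniform" randomizer; the distributions are illustrative:

import numpy as np

px = np.array([0.25, 0.25, 0.25, 0.25])   # prior on 4 possible values of X
# p_trans[x, y]: probability that R maps x to y (keep x w.p. 0.4, else uniform).
p_trans = 0.4 * np.eye(4) + 0.6 * np.full((4, 4), 0.25)

p_xy = px[:, None] * p_trans               # joint P(X=x, Y=y)
py = p_xy.sum(axis=0)                      # marginal P(Y=y)
# I(X;Y) = sum_{x,y} P(x,y) log( P(x|y) / P(x) ), in nats here.
mi = (p_xy * np.log(p_xy / (px[:, None] * py[None, :]))).sum()
print(mi)   # small on average -- yet a single y can still be revealing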
Is the randomization sufficient?
Name: Age database
  Gladys: 85
  Doris: 90
  Beryl: 82

- Age is an integer between 0 and 90
- Randomize the database entries by adding random integers between -20 and 20
- The randomization operator has to be public (why?)

Randomized entries:
  Gladys: 72
  Doris: 110
  Beryl: 85

- Doris's age is 90!! (a randomized value of 110 is only possible if her true age is 90)
Privacy Definitions
- Mutual information can be small on average, but an individual randomized value can still leak a lot of information about the original value
- Better: consider some property Q(x)
  - The adversary has an a priori probability P_i that Q(x_i) is true
- Privacy breach if revealing y_i = R(x_i) significantly changes the adversary's probability that Q(x_i) is true
  - Intuition: the adversary learned something about entry x_i (namely, the likelihood of property Q holding for this entry)
Example
- Data: 0 ≤ x ≤ 1000, p(x = 0) = 0.01, p(x = k) = 0.00099 for k ≠ 0
- Reveal y = R(x)
- Three possible randomization operators R:
  - R1(x) = x with prob. 20%; a uniformly random number with prob. 80%
  - R2(x) = x + ξ mod 1001, ξ uniform in [-100, 100]
  - R3(x) = R2(x) with prob. 50%; a uniformly random number with prob. 50%
- Which randomization operator is better?
Some Properties
- Q1(x): x = 0; Q2(x): x ∉ {200, ..., 800}
- What are the a priori probabilities, for a given x, that these properties hold?
  - Q1(x): 1%, Q2(x): 40.5%
- Now suppose the adversary learned that y = R(x) = 0. What are the probabilities of Q1(x) and Q2(x)?
  - If R = R1 then Q1(x): 71.6%, Q2(x): 83%
  - If R = R2 then Q1(x): 4.8%, Q2(x): 100%
  - If R = R3 then Q1(x): 2.9%, Q2(x): 70.8%
  - (the R1 case is checked numerically below)
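The R1 posteriors can be reproduced with a short Bayes computation; a sketch (variable names illustrative):

import numpy as np

xs = np.arange(1001)
prior = np.where(xs == 0, 0.01, 0.00099)    # p(x=0) = 1%, p(x=k) = 0.099%

# R1 keeps x with prob. 0.2, otherwise outputs a uniform value in 0..1000,
# so p(R1(x) = 0) = 0.2 + 0.8/1001 if x = 0, and 0.8/1001 otherwise.
p_y0 = np.where(xs == 0, 0.2 + 0.8 / 1001, 0.8 / 1001)

post = prior * p_y0 / (prior * p_y0).sum()  # Bayes: P(x | R1(x) = 0)
print(post[0])                              # Q1(x): ~0.716
print(post[(xs < 200) | (xs > 800)].sum())  # Q2(x): ~0.83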
Privacy Breaches
- R1(x) leaks information about property Q1(x)
  - Before seeing R1(x), the adversary thinks that the probability of x = 0 is only 1%, but after noticing that R1(x) = 0, the probability that x = 0 is 72%
- R2(x) leaks information about property Q2(x)
  - Before seeing R2(x), the adversary thinks that the probability of x ∉ {200, ..., 800} is 41%, but after noticing that R2(x) = 0, the probability that x ∉ {200, ..., 800} is 100%
- The randomization operator should be such that the posterior distribution is close to the prior distribution for any property
Privacy Breach: Definitions
- Q(x) is some property; ρ_1, ρ_2 are probabilities
  - ρ_1 ~ "very unlikely", ρ_2 ~ "very likely"
- Straight privacy breach: P(Q(x)) ≤ ρ_1, but P(Q(x) | R(x) = y) ≥ ρ_2 [Evfimievski et al.]
  - Q(x) is unlikely a priori, but likely after seeing the randomized value of x
- Inverse privacy breach: P(Q(x)) ≥ ρ_2, but P(Q(x) | R(x) = y) ≤ ρ_1
  - Q(x) is likely a priori, but unlikely after seeing the randomized value of x
How to check privacy breach
- How to ensure that the randomization operator hides every property?
  - There are 2^|X| properties
  - Often the randomization operator has to be selected even before the distribution P_x is known (why?)
- Idea: look at the operator's transition probabilities
  - How likely is x_i to be mapped to a given y?
  - Intuition: if all possible values of x_i are equally likely to be randomized to a given y, then revealing y = R(x_i) will not reveal much about the actual value of x_i
Amplification
- The randomization operator is γ-amplifying for y if

$\forall x_1, x_2 \in V_x: \frac{p(x_1 \to y)}{p(x_2 \to y)} \le \gamma$

  [Evfimievski et al.]
- For given ρ_1, ρ_2, no straight or inverse privacy breaches occur if

$\frac{\rho_2 (1 - \rho_1)}{\rho_1 (1 - \rho_2)} > \gamma$
Amplification: Example
- R1(x) = x with prob. 20%; a uniformly random number with prob. 80%
- R2(x) = x + ξ mod 1001, ξ uniform in [-100, 100]
- R3(x) = R2(x) with prob. 50%; a uniformly random number with prob. 50%
- For R3,
  $p(x \to y) = \tfrac{1}{2}(1/201 + 1/1001)$ if $y \in [x-100, x+100]$, and $\tfrac{1}{2}(0 + 1/1001)$ otherwise
- Fractional difference = 1 + 1001/201 < 6 (= γ)
- Therefore, no straight or inverse privacy breaches will occur with ρ_1 = 14%, ρ_2 = 50% (checked numerically below)
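A numeric check of this amplification argument for R3, following the transition probabilities on the slide:

# Transition probabilities p(x -> y) for y relative to x under R3:
p_in  = 0.5 * (1 / 201 + 1 / 1001)   # y within [x-100, x+100]
p_out = 0.5 * (0 + 1 / 1001)         # y outside that window

gamma_r3 = p_in / p_out              # worst-case ratio p(x1 -> y) / p(x2 -> y)
print(gamma_r3)                      # 1 + 1001/201, about 5.98 < 6

# Breach bound: no straight or inverse breaches with rho1, rho2 if
# rho2 (1 - rho1) / (rho1 (1 - rho2)) > gamma.
rho1, rho2 = 0.14, 0.50
print(rho2 * (1 - rho1) / (rho1 * (1 - rho2)))   # about 6.14 > 6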