University of Genoa
Polytechnic School and the School of Science
Research and Technology Transfer Laboratory
Differential Privacy and Generalization: Sharper Bounds, Theoretically Grounded Algorithms, and Thresholdout
Luca Oneto
Summer School on Applied Harmonic Analysis, Genoa, Italy, 24th July 2017
SmartLab
24 July 2017 — University of Genoa – DIBRIS – SmartLab — www.lucaoneto.com - [email protected]
D. Anguita, Associate Prof.
S. Ridella, Emeritus Prof.
L. Oneto, Assistant Prof.
G. Clerico, Res. Assistant
A. Lulli, Postdoc
M. Cambiaso, Res. Assistant
I. Orlandi, PhD Stud.
E. Fumeo, PhD Stud.
F. Cipollini, PhD Stud.
P. Sanetti, Res. Assistant
Expertise
Scientific Research
• Topics: Neural Networks, Kernel Methods, Ensemble Methods, Statistical Learning Theory, Machine Learning, Data Mining, High Performance Computing, Big & Small Data Analysis, CBM, EDM, HAR, Sentiment Analysis, Cybersecurity, …
• Publications: > 50 international high-ranked journals, > 100 international high-ranked conferences, …

Technology Transfer
• aizoOn S.r.l. • Ansaldo STS S.p.A. • Brembo S.p.A. • Bombardier Transportation • Cetena S.p.A. • Damen Shipyards Group • Ferrari S.p.A. - Scuderia Ferrari • VarGroup • …
European Projects
• Basic Research
  • EC NeuroNet I & II - Network of Excellence on Neural Networks
  • EC RAIN - Redundant Array of Inexpensive Workstations for Neurocomputing
  • EC EUNITE - European Network of Excellence on Information Technology for Smart Adaptive Systems
  • EC-FET NiSIS - Nature-inspired Smart Information Systems
  • …
• Applied Research
  • EC-H2020 IN2DREAMS
  • EC-H2020 In2Rail
  • EC-FP7 MAXBE
  • …
Privacy
• In recent years researchers have studied many ways to access data privately (aggregation, noise addition, etc.).
• From a data scientist's point of view, privacy is a constraint: we cannot access the data except in aggregate form.
• The breakthrough was to find a way to exploit privacy as a new regularization method and as a tool for better assessing the generalization performance of a learning algorithm.
Supervised Learning
The only thing available for learning is a set of examples of the mapping:

$x \sim P_X, \qquad y = f(x)$

[1] Vapnik, V.N., 1998. Statistical Learning Theory. Wiley, New York.

Deterministic Functions/Learning Algorithms
Given a set of data, the algorithm always returns the same model.
The model is a function chosen in a set of functions: given the function and a point, the predicted output is always the same.
$x \sim P_X, \qquad y = f(x)$
$f = \mathcal{A}(s)$
Randomized Functions
Given a set of data, the algorithm always returns the same model.
The model is a distribution over a set of functions: given the model and a point, the predicted output may differ.

$\rho = \mathcal{A}(s)$
$x \sim P_X, \qquad f \sim \rho, \qquad y = f(x)$
Randomized Learning Algorithms
Given a set of data, the algorithm may return different models.
The model can be a deterministic or randomized function. In our case the function is deterministic.

$x \sim P_X, \qquad y = F(x)$
$F = \mathcal{A}(s)$
Notation
[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
[2] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015. Preserving statistical validity in adaptive data analysis, in: Annual ACM Symposium on Theory of Computing.
$x \in \mathcal{X}$, $y \in \mathcal{Y}$, $z \in \mathcal{Z} = \mathcal{X} \times \mathcal{Y}$, with distributions $P_X$, $P_Y$, $P_Z$

$\mathcal{Z}^n = \mathcal{S} \ni s = \{z_1, \ldots, z_n\} = \{(x_1, y_1), \ldots, (x_n, y_n)\}$ i.i.d. $\sim P_S$

$\mathcal{Z} \ni Z \sim P_Z, \qquad \mathcal{S} \ni S \sim P_S$

$s' = \{z_1, \ldots, z_{i-1}, z'_i, z_{i+1}, \ldots, z_n\}$, with $z'_i$ i.i.d. as $z_i$

$S \subseteq \mathcal{S}$; $f : \mathcal{X} \to \mathcal{Y}$, $f \in F$, $F \subseteq \mathcal{F}$; $\mathcal{A} : \mathcal{S} \to \mathcal{F}$, with distribution $P_A$

$D : \mathcal{F} \to \mathcal{S}, \qquad \ell : \mathcal{F} \times \mathcal{Z} \to [0, 1]$

$L(f) = \mathbb{E}_Z\, \ell(f, Z), \qquad V(f) = \mathbb{E}_Z\, [\ell(f, Z) - L(f)]^2$

$\hat{L}_{s_n}(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f, z_i), \qquad \hat{V}_{s_n}(f) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} [\ell(f, z_i) - \ell(f, z_j)]^2$
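For concreteness, the two empirical estimators above can be computed as follows (an illustrative Python sketch, not part of the slides; `losses` is assumed to hold the values $\ell(f, z_i)$):

```python
import itertools

def empirical_error(losses):
    """L_hat(f): the average of the losses l(f, z_i) over the sample."""
    return sum(losses) / len(losses)

def empirical_variance(losses):
    """V_hat(f): the pairwise estimator
    (1 / (n (n - 1))) * sum_{i < j} (l(f, z_i) - l(f, z_j))^2,
    which coincides with the unbiased sample variance of the losses."""
    n = len(losses)
    pairs = itertools.combinations(losses, 2)
    return sum((a - b) ** 2 for a, b in pairs) / (n * (n - 1))
```

The pairwise form avoids computing the mean first, at the price of a quadratic number of terms.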
Goal
Estimate the true (generalization) error of the model based on the empirical data
$P\{|L(f) - \hat{L}_n(f)| \geq \epsilon\} \leq \delta$

equivalently, $|L(f) - \hat{L}_n(f)| \leq \epsilon$ with probability at least $(1 - \delta)$, with a trade-off $\delta \leftrightarrow \epsilon$
Differentially Private (DP) Randomized Learning Algorithms
A Randomized Learning Algorithm is $(\epsilon, \delta)$-DP if

$P_A\{\mathcal{A}(s) \in F\} \leq e^{\epsilon}\, P_A\{\mathcal{A}(s') \in F\} + \delta, \qquad \forall F \subseteq \mathcal{F},\ \forall s, s' \in \mathcal{S}$

[1] Dwork, C., Roth, A., 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 1–277.
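A classical concrete instance of an $(\epsilon, 0)$-DP randomized algorithm, not discussed on this slide but useful as a reference point, is the Laplace mechanism; the sketch below (illustrative, stdlib only) releases the mean of bounded losses privately:

```python
import math
import random

def laplace_noise(scale, rng=random):
    """Draw Lap(scale) noise via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def laplace_mechanism(data, query, sensitivity, eps, rng=random):
    """Release query(data) + Lap(sensitivity / eps).
    If |query(s) - query(s')| <= sensitivity for every pair of
    neighboring datasets, the released value is eps-DP (delta = 0)."""
    return query(data) + laplace_noise(sensitivity / eps, rng)

# Changing one of n losses in [0, 1] moves their mean by at most 1/n,
# so the mean query has sensitivity 1/n.
losses = [0.2, 0.4, 0.9, 0.1]
private_mean = laplace_mechanism(losses, lambda d: sum(d) / len(d),
                                 sensitivity=1 / len(losses), eps=0.5)
```

Smaller `eps` means stronger privacy and larger noise, which is exactly the $\delta \leftrightarrow \epsilon$ style trade-off discussed above.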
Differentially Private (DP) Randomized Learning Algorithms
A Randomized Learning Algorithm is $\epsilon$-DP if

$\frac{P_A\{\mathcal{A}(s) = f\}}{P_A\{\mathcal{A}(s') = f\}} \leq e^{\epsilon}, \qquad \forall f \in \mathcal{F},\ \forall s, s' \in \mathcal{S}$

Proof:

$P_A\{\mathcal{A}(s) \in F\} = \int_F P_A\{\mathcal{A}(s) = f\}\, df \leq \int_F e^{\epsilon}\, P_A\{\mathcal{A}(s') = f\}\, df = e^{\epsilon}\, P_A\{\mathcal{A}(s') \in F\}$

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Hold Out, Compression, Complexity, Stability, and… Privacy (I)
The first option to estimate the generalization performance of an algorithm is to split the data:

$s = s_1 \cup s_2, \qquad s_1 \cap s_2 = \emptyset$

$P_{S_2}\left\{L(\mathcal{A}(s_1)) - \hat{L}_{S_2}(\mathcal{A}(s_1)) \geq t\right\} \leq e^{-2|s_2| t^2}$

[1] Hoeffding, W., 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58(301), 13–30.
[2] Anguita, D., Ghio, A., Oneto, L., Ridella, S., 2012. In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Transactions on Neural Networks and Learning Systems 23(9), 1390–1406.
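Inverting this Hoeffding bound gives the confidence radius as a function of the hold-out size; a minimal sketch (illustrative, not from the slides):

```python
import math

def holdout_radius(n_test, delta):
    """Set exp(-2 * n_test * t^2) = delta in the Hoeffding hold-out
    bound and solve for t: t = sqrt(ln(1 / delta) / (2 * n_test))."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_test))

# With 200 hold-out samples, the 95%-confidence radius is about 0.087.
radius = holdout_radius(200, 0.05)
```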
Hold Out, Compression, Complexity, Stability, and… Privacy (II)
Another option is to check how much the algorithm compresses the data:

$s' \subseteq s$

$P_S\left\{L(\mathcal{A}(S)) - \hat{L}_{S_n}(\mathcal{A}(S)) \geq t\right\} \leq n \binom{n}{|s'|} e^{-2 n t^2}$

[1] Floyd, S., Warmuth, M., 1995. Sample compression, learnability, and the Vapnik–Chervonenkis dimension. Machine Learning 21(3), 269–304.
[2] Langford, J., McAllester, D., 2004. Computable shell decomposition bounds. Journal of Machine Learning Research 5, 529–547.
Hold Out, Compression, Complexity, Stability, and… Privacy (III)
Another option is to check how large the function space is in which the algorithm chooses its solution:

$C(\mathcal{F}) \in \left\{ d_{VC}(\mathcal{F}),\ e^{n R^2(\mathcal{F})} \right\}$

$P_S\left\{L(\mathcal{A}(S)) - \hat{L}_{S_n}(\mathcal{A}(S)) \geq t\right\} \leq c_2\, C(\mathcal{F})\, e^{-c_1 n t^2}$

$P_S\left\{L(\mathcal{A}(S)) - \hat{L}_{S_n}(\mathcal{A}(S)) \geq t\right\} \leq c_2\, C\!\left(\left\{f : f \in \mathcal{F},\ \hat{L}_{s_n}(f) \leq c_3\right\}\right) e^{-c_1 n t^2}, \qquad c_3(L, C, t, n)$

[1] Vapnik, V.N., 1998. Statistical Learning Theory. Wiley-Interscience.
[2] Oneto, L., Ghio, A., Ridella, S., Anguita, D., 2015. Global Rademacher complexity bounds: From slow to fast convergence rates. Neural Processing Letters 43(2), 567–602.
[3] Oneto, L., Ghio, A., Ridella, S., Anguita, D., 2015. Local Rademacher complexity: Sharper risk bounds with and without unlabeled samples. Neural Networks 65, 115–125.
Hold Out, Compression, Complexity, Stability, and… Privacy (IV)
Another option is to check how close the functions chosen by the algorithm on similar datasets are:

$\left\|\ell(\mathcal{A}(s), \cdot) - \ell(\mathcal{A}(s'), \cdot)\right\|_{\infty} \leq \beta$

$P_S\left\{L(\mathcal{A}(S)) - \hat{L}_{S_n}(\mathcal{A}(S)) \geq t\right\} \leq c_2\, e^{n \beta^2 - c_1 n t^2}$

[1] Bousquet, O., Elisseeff, A., 2002. Stability and generalization. Journal of Machine Learning Research 2, 499–526.
[2] Oneto, L., Ghio, A., Ridella, S., Anguita, D., 2015. Fully empirical and data-dependent stability-based bounds. IEEE Transactions on Cybernetics 45(9), 1913–1926.
[3] Maurer, A., 2017. A second-order look at stability and generalization, in: Conference on Learning Theory, pp. 1461–1475.
DP Main Result
If $P_S\{S \in D(f)\} \leq \beta$ for all $f \in \mathcal{F}$, and $\epsilon \leq \sqrt{\ln(1/\beta)/(2n)}$, then

$P_{S,F}\{S \in D(F)\} \leq 3\sqrt{\beta}$

Proof: rather technical…

[1] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015. Preserving statistical validity in adaptive data analysis, in: Symposium on Theory of Computing.
Hoeffding-type Bounds
$\epsilon \leq t \;\Rightarrow\; P_{S,F}\left\{L(F) \geq \hat{L}_{S_n}(F) + t\right\} \leq 3\, e^{-n t^2}$

$\epsilon \leq \sqrt{t^2 - \ln(2)/(2n)} \;\Rightarrow\; P_{S,F}\left\{|L(F) - \hat{L}_{S_n}(F)| \geq t\right\} \leq 3\sqrt{2}\, e^{-n t^2}$

Proof: $P_S\{L(f) - \hat{L}_{S_n}(f) \geq t\} \leq e^{-2 n t^2}$; take $D(f) = \{s \in \mathcal{S} : L(f) - \hat{L}_{s_n}(f) > t\}$ and $\beta = e^{-2 n t^2}$ in the DP main result.

Rate: $O(1/\sqrt{n})$

[1] Hoeffding, W., 1963. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58(301), 13–30.
[2] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015. Preserving statistical validity in adaptive data analysis, in: Symposium on Theory of Computing.
[3] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Chernoff and Bennett-type Bounds
$\epsilon \leq t \;\Rightarrow\; P_{S,F}\left\{L(F) \geq \hat{L}_{S_n}(F) + \sqrt{4 L(F)\, t}\right\} \leq 3\, e^{-n t^2}$

$\epsilon \leq \sqrt{t^2 - \ln(2)/(2n)} \;\Rightarrow\; P_{S,F}\left\{|L(F) - \hat{L}_{S_n}(F)| \geq \sqrt{6 L(F)\, t}\right\} \leq 3\sqrt{2}\, e^{-n t^2}$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

[1] Chernoff, H., 1952. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics 23(4), 493–507.
[2] Maurer, A., Pontil, M., 2009. Empirical Bernstein bounds and sample variance penalization, in: Conference on Learning Theory.
[3] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Chernoff and Bennett-type Bounds
$\epsilon \leq \sqrt{t^2 - \ln(2)/(2n)} \;\Rightarrow\; P_{S,F}\left\{L(F) \geq \hat{L}_{S_n}(F) + \sqrt{4 \hat{V}_{S_n}(F)\, t} + \frac{14 n t^2}{3(n-1)}\right\} \leq 3\sqrt{2}\, e^{-n t^2}$

$\epsilon \leq \sqrt{t^2 - \ln(3)/(2n)} \;\Rightarrow\; P_{S,F}\left\{|L(F) - \hat{L}_{S_n}(F)| \geq \sqrt{4 \hat{V}_{S_n}(F)\, t} + \frac{14 n t^2}{3(n-1)}\right\} \leq 3\sqrt{3}\, e^{-n t^2}$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

[1] Chernoff, H., 1952. A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations. The Annals of Mathematical Statistics 23(4), 493–507.
[2] Maurer, A., Pontil, M., 2009. Empirical Bernstein bounds and sample variance penalization, in: Conference on Learning Theory.
[3] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Clopper-Pearson (Binary Classification)
$\epsilon \leq \sqrt{\ln(1/\delta)/(2n)} \;\Rightarrow\; P_{S,F}\left\{L(F) \geq Q\!\left[1 - \delta;\, n\hat{L}_{S_n}(F) + 1,\, n - n\hat{L}_{S_n}(F)\right]\right\} \leq 3\sqrt{\delta}$

$\epsilon \leq \sqrt{\ln(1/(2\delta))/(2n)} \;\Rightarrow\; P_{S,F}\left\{L(F) \notin \left[Q\!\left[\delta;\, n\hat{L}_{S_n}(F),\, n - n\hat{L}_{S_n}(F) + 1\right],\; Q\!\left[1 - \delta;\, n\hat{L}_{S_n}(F) + 1,\, n - n\hat{L}_{S_n}(F)\right]\right]\right\} \leq 3\sqrt{2\delta}$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

[1] Clopper, C.J., Pearson, E.S., 1934. The use of confidence or fiducial limits illustrated in the case of the binomial. Biometrika 26(4), 404–413.
[2] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
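Here $Q[1-\delta;\, a, b]$ is the $(1-\delta)$-quantile of a $\mathrm{Beta}(a, b)$ distribution, i.e. the Clopper–Pearson limit. A stdlib-only sketch (illustrative, not from the slides) computes the upper limit by bisecting the binomial CDF:

```python
import math

def binom_cdf(k, n, p):
    """P{Bin(n, p) <= k}, computed directly from the pmf."""
    return sum(math.comb(n, i) * p**i * (1.0 - p)**(n - i)
               for i in range(k + 1))

def clopper_pearson_upper(k, n, delta, tol=1e-10):
    """Q[1 - delta; k + 1, n - k]: the Clopper-Pearson upper limit,
    i.e. the p solving P{Bin(n, p) <= k} = delta, found by bisection.
    It equals the (1 - delta)-quantile of a Beta(k + 1, n - k)."""
    if k >= n:
        return 1.0
    lo, hi = k / n, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if binom_cdf(k, n, mid) > delta:  # CDF decreases in p
            lo = mid
        else:
            hi = mid
    return hi
```

With `k = n * L_hat` observed errors out of `n`, this gives the data-dependent upper confidence limit used in the bound (up to the $3\sqrt{\delta}$ inflation of the confidence level).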
Clopper-Pearson (Regression)
Proof: with $u \sim \mathcal{U}[0, 1]$ independent of $h \in [0, 1]$,

$P\{h \geq u\} = \mathbb{E}_{h,u}\, \mathbb{1}\{h \geq u\} = \mathbb{E}_h \{\mathbb{E}_u\, \mathbb{1}\{h \geq u\}\} = \mathbb{E}_h \{h\}$

where the density of $u$ is $p(u = \alpha) = 1$ for $\alpha \in [0, 1]$ and $p(u = \alpha) = 0$ for $\alpha \notin [0, 1]$.

$\epsilon \leq \sqrt{\ln(1/\delta)/(2n)} \;\Rightarrow\;$
$P_{S,F}\left\{L(F) \geq Q\!\left[1 - \delta;\, \sum_{i=1}^{n} \mathbb{1}[\ell(F, Z_i) \geq u_i] + 1,\, n - \sum_{i=1}^{n} \mathbb{1}[\ell(F, Z_i) \geq u_i]\right]\right\} \leq 3\sqrt{\delta}$

$\epsilon \leq \sqrt{\ln(1/(2\delta))/(2n)} \;\Rightarrow\;$
$P_{S,F}\left\{L(F) \notin \left[Q\!\left[\delta;\, \sum_{i=1}^{n} \mathbb{1}[\ell(F, Z_i) \geq u_i],\, n - \sum_{i=1}^{n} \mathbb{1}[\ell(F, Z_i) \geq u_i] + 1\right],\right.\right.$
$\left.\left.\qquad Q\!\left[1 - \delta;\, \sum_{i=1}^{n} \mathbb{1}[\ell(F, Z_i) \geq u_i] + 1,\, n - \sum_{i=1}^{n} \mathbb{1}[\ell(F, Z_i) \geq u_i]\right]\right]\right\} \leq 3\sqrt{2\delta}$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
[2] Chen, X. A link between binomial parameters and means of bounded random variables. arXiv preprint arXiv:0802.3946.
Example: DP Random Forest (RF) (I)
Datasets from www.openml.org:

Abb.  ID    Name                     n       d
D01   40    sonar                    208     61
D02   59    ionosphere               351     35
D03   785   wind correlations        45      47
D04   882   pollution                60      16
D05   1104  leukemia                 72      7130
D06   1446  CostaMadre1              296     38
D07   1453  PieChart3                1077    38
D08   1458  arcene                   200     10001
D09   1485  madelon                  2600    501
D10   1566  hill-valley              1212    101
D11   37    diabetes                 768     9
D12   1005  glass                    214     10
D13   1494  qsar-biodeg              1055    42
D14   1134  OVA Kidney               1545    10937
D15   1217  Click prediction small   149639  12
D16   1149  AP Ovary Kidney          458     10937
D17   907   chscase census4          400     8
D18   976   kdd JapaneseVowels       9961    15
D19   1443  PizzaCutter1             61      38
D20   871   pollen                   3848    6
Example: DP RF (II)
• the Random Forests (RF): the original RF formulation
• the Random Rotation Ensembles (RRE): a recent improvement over the original RF
• the Random Decision Trees (RDT): a fully random RF implementation which is faster to train
• the Differentially Private RDT (DPRDT): an RDT formulation which is also DP

[1] Breiman, L., 2001. Random forests. Machine Learning 45(1), 5–32.
[2] Blaser, R., Fryzlewicz, P., 2015. Random rotation ensembles. Journal of Machine Learning Research 17(4), 1–15.
[3] Bojarski, M., Choromanska, A., Choromanski, K., LeCun, Y., 2014. Differentially- and non-differentially-private random decision trees. arXiv preprint arXiv:1410.6973.
Example: DP RF (III)
• kCV for RF, RRE, and RDT • DP for DPRDT

[Figure: generalization error (0–1) of RF, RRE, RDT, and DPRDT on datasets D01–D20; $n_t = 50$, $n_d = 30$, $k = 3$]
Randomized Functions or Randomized Algorithms?
For studying Randomized Algorithms we have different options:
• Hold out
• Stability
• DP

For studying Randomized Functions, instead, we only have one powerful option:
• PAC-Bayes theory
PAC-Bayes Theory (I)
$\pi$: prior over $\mathcal{F}$; $\rho$: posterior over $\mathcal{F}$

Gibbs classifier $G_\rho$: $x \sim P_X$, $f \sim \rho$, $y = f(x)$
Bayes (majority vote) classifier $B_\rho$: $x \sim P_X$, $y = \mathbb{E}_\rho\{f(x)\}$, with $L(B_\rho) \leq 2 L(G_\rho)$

[1] Germain, P., Lacasse, A., Laviolette, F., Marchand, M., Roy, J.F., 2015. Risk bounds for the majority vote: From a PAC-Bayesian analysis to a learning algorithm. Journal of Machine Learning Research 16(1), 787–860.
[2] McAllester, D.A., 1998. Some PAC-Bayesian theorems, in: Computational Learning Theory.
[3] Langford, J., 2005. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research.
[4] Germain, P., Lacasse, A., Laviolette, F., Marchand, M., 2009. PAC-Bayesian learning of linear classifiers, in: International Conference on Machine Learning.
PAC-Bayes Theory (II)
$KL(\rho \,\|\, \pi) = \mathbb{E}_\rho \left\{ \ln\left(\frac{\rho}{\pi}\right) \right\}$

$P\left\{ \mathrm{kl}\!\left[\hat{L}_{S_n}(G_\rho) \,\|\, L(G_\rho)\right] \leq \frac{KL(\rho \,\|\, \pi) + \ln\left(\frac{2\sqrt{n}}{\delta}\right)}{n} \right\} \geq 1 - \delta$

Rate: $O\!\left(\sqrt{\ln(n)/n}\right) \div O\!\left(\ln(n)/n\right)$

[1] Kullback, S., Leibler, R.A., 1951. On information and sufficiency. The Annals of Mathematical Statistics 22(1), 79–86.
[2] Bégin, L., Germain, P., Laviolette, F., Roy, J.F., 2016. PAC-Bayesian bounds based on the Rényi divergence, in: Proceedings of the 19th International Conference on Artificial Intelligence and Statistics, pp. 435–444.
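The $\mathrm{kl}[\cdot \| \cdot]$ form of the bound is implicit; in practice it is inverted numerically. A stdlib-only sketch (illustrative, not from the slides) turning a kl-bound into an explicit upper bound on the risk:

```python
import math

def kl_bernoulli(q, p):
    """kl(q || p) between Bernoulli(q) and Bernoulli(p), with 0 log 0 = 0."""
    tiny = 1e-15
    p = min(max(p, tiny), 1.0 - tiny)
    out = 0.0
    if q > 0.0:
        out += q * math.log(q / p)
    if q < 1.0:
        out += (1.0 - q) * math.log((1.0 - q) / (1.0 - p))
    return out

def kl_inverse_upper(q_hat, bound, tol=1e-10):
    """Largest p with kl(q_hat || p) <= bound, found by bisection:
    given the empirical Gibbs risk q_hat and the right-hand side of
    the PAC-Bayes bound, this is the explicit upper bound on L(G_rho)."""
    lo, hi = q_hat, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if kl_bernoulli(q_hat, mid) <= bound:  # kl grows in p >= q_hat
            lo = mid
        else:
            hi = mid
    return lo
```

For example, `kl_inverse_upper(0.0, math.log(2))` returns $1 - e^{-\ln 2} = 0.5$, since $\mathrm{kl}(0 \| p) = -\ln(1-p)$.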
Catoni’s Result: CatoniRandomized Function (CRF)
[1] O.Catoni.Pac-bayesian supervised classification: The thermodynamics of statistical learning. arXivpreprint arXiv:0712. 0248, 2007. [2] Lever G. Laviolette F. Shawe-Taylor J. Tighter PAC-Bayes bounds through distribution-dependent priors. Theoretical Computer Science. 2013[3] Oneto, Luca, Davide Anguita, and Sandro Ridella. PAC-bayesian analysis of distribution dependent priors: Tighter risk bounds and stability analysis. Pattern Recognition Letters 80 (2016): 200-207.
24 July 2017 University of Genoa – DIBRIS – SmartLabwww.lucaoneto.com - [email protected] 29
⇡(f) =1
Zexp(��L(f))
⇢(f) =1
Z 0 exp(��bLsn(f))
KL(⇢||⇡) �2
2n+ �
vuut2 ln
h2pn
�
i
n, at: (1� �)
Why not a Catoni Randomized Algorithm (CRA)?
Instead of building a CRF based on Catoni's posterior, we can think of a Randomized Algorithm which chooses, inside our space of functions, the best function (the one with the smallest empirical error) perturbed with Catoni's noise (CRA).

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
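For a finite function class, such a Gibbs-style choice can be sketched as follows (an illustrative stdlib-only example, not the slides' implementation; `empirical_errors[i]` is assumed to hold $\hat{L}_{s_n}(f_i)$):

```python
import math
import random

def catoni_sample(empirical_errors, gamma, rng=random):
    """Pick index i with probability proportional to
    exp(-gamma * L_hat(f_i)): a Gibbs draw from Catoni's posterior
    over a finite function class. As gamma grows, the draw
    concentrates on the function with the smallest empirical error."""
    best = min(empirical_errors)  # subtract the min for numerical safety
    weights = [math.exp(-gamma * (e - best)) for e in empirical_errors]
    u = rng.random() * sum(weights)
    acc = 0.0
    for i, w in enumerate(weights):
        acc += w
        if u <= acc:
            return i
    return len(weights) - 1  # guard against floating-point round-off
```

With `gamma = 0` the draw is uniform; with large `gamma` it is essentially the empirical risk minimizer plus exponentially small noise, which is the CRA picture above.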
CRA is DP
$\frac{P\{\mathcal{A}(s) = f\}}{P\{\mathcal{A}(s') = f\}} = \frac{e^{-\frac{\gamma}{n} \sum_{i=1}^{n} \ell(f, z_i)}}{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n} \sum_{i=1}^{n} \ell(f_1, z_i)}} \cdot \frac{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n} \left(\sum_{i=1, i \neq j}^{n} \ell(f_1, z_i) + \ell(f_1, z'_j)\right)}}{e^{-\frac{\gamma}{n} \left(\sum_{i=1, i \neq j}^{n} \ell(f, z_i) + \ell(f, z'_j)\right)}}$

$\leq \frac{e^{-\frac{\gamma}{n} \sum_{i=1, i \neq j}^{n} \ell(f, z_i)}\, e^{0}}{e^{-\frac{\gamma}{n} \sum_{i=1, i \neq j}^{n} \ell(f, z_i)}\, e^{-\frac{\gamma}{n}}} \cdot \frac{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n} \sum_{i=1, i \neq j}^{n} \ell(f_1, z_i)}\, e^{0}}{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n} \sum_{i=1, i \neq j}^{n} \ell(f_1, z_i)}\, e^{-\frac{\gamma}{n}}} = e^{\frac{2\gamma}{n}}$

since $\ell \in [0, 1]$: changing one sample shifts each exponent by at most $\gamma/n$, hence the CRA is $\frac{2\gamma}{n}$-DP.

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
CRF and CRA Generalization Properties
CRA: with $\gamma = \frac{1}{2}\sqrt{n \ln\left(\frac{3\sqrt{2}}{2\delta}\right)}$,

$P_{S,F}\left\{ Q\!\left[\frac{\delta^2}{18};\, n\hat{L}_{S_n}(F),\, n - n\hat{L}_{S_n}(F) + 1\right] \leq L(F) \leq Q\!\left[1 - \frac{\delta^2}{18};\, n\hat{L}_{S_n}(F) + 1,\, n - n\hat{L}_{S_n}(F)\right] \right\} \geq 1 - \delta$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

CRF: with $\gamma = \frac{1}{2}\sqrt{n \ln\left(\frac{3\sqrt{2}}{2\delta}\right)}$,

$P_S\left\{ \mathrm{kl}\!\left[\hat{L}_{S_n}(G_\rho) \,\|\, L(G_\rho)\right] \leq \frac{\ln\left(\frac{3\sqrt{2}}{\delta}\right)}{8n} + \sqrt{\frac{2 \ln\left(\frac{3\sqrt{2}}{\delta}\right) \ln\left(\frac{2\sqrt{n}}{\delta}\right)}{4n}} + \frac{\ln\left(\frac{2\sqrt{n}}{\delta}\right)}{n} \right\} \geq 1 - 2\delta$

Rate: $O\!\left(\sqrt[4]{\ln(n)/n}\right) \div O\!\left(\sqrt{\ln(n)/n}\right)$
Example: CRF and CRA
Function space: trees built with a hold-out set.

[Figure: generalization error (0–1) of CRC, CPD, CRF, and CRA on datasets D01–D20; $n_t = 50$, $k = 3$]
Non-Adaptive Data Analysis (I)
NAS: in the non-adaptive setting the procedures for building our models exploit just the training set.

Algorithm 1: Union Bound for the NAS
Input: $s_t$, $s_h$, and $P_1, \ldots, P_m$
Output: $\hat{L}_{s_h}(f_1), \ldots, \hat{L}_{s_h}(f_m)$
1: for $i \leftarrow 1$ to $m$ do
2:     $f_i = P_i(s_t)$ and compute $\hat{L}_{s_h}(f_i)$

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Non-Adaptive Data Analysis (II)
In the non-adaptive setting we can use the Bonferroni correction:

$P_{S_h}\left\{ \exists i \in I_m : \left|L(P_i(s_t)) - \hat{L}_{S_h}(P_i(s_t))\right| \geq \sqrt{\frac{\ln\left(\frac{2m}{\delta}\right)}{2n}} \right\} \leq \delta$

Rate: $O\!\left(\sqrt{\ln(m)/n}\right) \div O\!\left(\ln(m)/n\right)$

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Adaptive Data Analysis (I)
AS: in the adaptive setting the procedures for building our models exploit both the training set and the performance of the previous procedures on the hold-out set.

Algorithm 1: Hold out for the AS
Input: $s_t$, $s_h$, and $P_1, \ldots, P_m$
Output: $\hat{L}_{s_h^1}(f_1), \ldots, \hat{L}_{s_h^m}(f_m)$
1: split $s_h$ into $s_h^i$ with $i \in I_m$
2: for $i \leftarrow 1$ to $m$ do
3:     $f_i = P_i\!\left(s_t, \hat{L}_{s_h^1}(f_1), \ldots, \hat{L}_{s_h^{i-1}}(f_{i-1})\right)$ and compute $\hat{L}_{s_h^i}(f_i)$

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Adaptive Data Analysis (II)
In the adaptive setting we need one test set at each step:

$f_i = P_i(s_t, P_{i-1}, \ldots, P_1)$

$P_{S_h^i}\left\{ \exists i \in I_m : \left|L(f_i) - \hat{L}_{S_h^i}(f_i)\right| \geq \sqrt{\frac{m \ln\left(\frac{2}{\delta}\right)}{2n}} \right\} \leq \delta$

Rate: $O\!\left(\sqrt{m/n}\right) \div O\!\left(m/n\right)$

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
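The gap between the two settings is easy to make concrete: for the same total hold-out budget, the non-adaptive radius grows like $\sqrt{\ln m}$ while the naive adaptive one grows like $\sqrt{m}$. A small illustrative sketch (the numbers below are hypothetical):

```python
import math

def nonadaptive_radius(n, m, delta):
    """Bonferroni over m non-adaptive procedures sharing one hold-out
    set of size n: sqrt(ln(2m / delta) / (2n)), growing like sqrt(ln m)."""
    return math.sqrt(math.log(2.0 * m / delta) / (2.0 * n))

def adaptive_radius(n, m, delta):
    """A fresh hold-out chunk of size n / m per adaptive step:
    sqrt(m * ln(2 / delta) / (2n)), growing like sqrt(m)."""
    return math.sqrt(m * math.log(2.0 / delta) / (2.0 * n))

# e.g. n = 10000, m = 100, delta = 0.05:
# non-adaptive radius ~ 0.020, naive adaptive radius ~ 0.136
```

This is exactly the inefficiency Thresholdout (next slide) is designed to mitigate.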
Thresholdout
The idea is to look at the test set error only when it is far from the training set error and, when we do look at the test set error, to look at it in a "private way".

Algorithm 1: Thresholdout for the AS
Input: $s_t$, $s_h$, $T$, $\sigma$, $B$, and $P_1, \ldots, P_m$
Output: $a_1, \ldots, a_m$
1: $\gamma \sim \mathrm{Lap}(2\sigma)$, $\hat{T} = T + \gamma$
2: for $i \leftarrow 1$ to $m$ do
3:     if $B < 1$ then
4:         $a_i = \perp$
5:     else
6:         $f_i = P_i(s_t, a_1, \ldots, a_{i-1})$, $\eta \sim \mathrm{Lap}(4\sigma)$
7:         if $|\hat{L}_{s_h}(f_i) - \hat{L}_{s_t}(f_i)| \geq \hat{T} + \eta$ then
8:             $\xi \sim \mathrm{Lap}(\sigma)$, $\gamma \sim \mathrm{Lap}(2\sigma)$, $\hat{T} = T + \gamma$, $B = B - 1$
9:             $a_i = \hat{L}_{s_h}(f_i) + \xi$
10:        else
11:            $a_i = \hat{L}_{s_t}(f_i)$

[1] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015. The reusable holdout: Preserving validity in adaptive data analysis. Science 349, 636–638.
[2] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
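The pseudocode above can be sketched in Python as follows (an illustrative, stdlib-only transcription; how the train/hold-out errors are computed from $s_t$ and $s_h$ is abstracted into the caller-supplied procedures):

```python
import math
import random

def laplace(scale, rng):
    """Draw Lap(scale) noise via the inverse CDF."""
    u = rng.random() - 0.5
    return -scale * math.copysign(1.0, u) * math.log(1.0 - 2.0 * abs(u))

def thresholdout(procedures, T, sigma, B, rng=None):
    """Each procedure receives the answers so far and returns the pair
    (train_error, holdout_error) of the model it builds; the mechanism
    returns the answers a_1, ..., a_m (None when the budget B is spent)."""
    rng = rng or random.Random(0)
    answers = []
    T_hat = T + laplace(2 * sigma, rng)          # noisy threshold
    for proc in procedures:
        if B < 1:                                # budget exhausted
            answers.append(None)
            continue
        err_t, err_h = proc(list(answers))
        eta = laplace(4 * sigma, rng)
        if abs(err_h - err_t) >= T_hat + eta:    # large gap: spend budget,
            B -= 1                               # refresh the threshold,
            T_hat = T + laplace(2 * sigma, rng)  # and reveal a noisy
            answers.append(err_h + laplace(sigma, rng))  # hold-out error
        else:
            answers.append(err_t)                # small gap: the training
    return answers                               # error is a safe answer

# Two adaptive "procedures": the first does not overfit, the second does.
out = thresholdout([lambda a: (0.10, 0.11), lambda a: (0.10, 0.50)],
                   T=0.1, sigma=1e-9, B=5)
```

Only queries whose hold-out and training errors disagree consume the budget $B$, which is what lets the hold-out set be reused many times.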
Hoeffding-type Bound
We obtain a result analogous to the one of the NAS setting:

$P_{A^i, F^i}\left\{ \exists i \in I_m : A_i \neq \perp \,\wedge\, |A_i - L(F_i)| \geq 40 \sqrt{\frac{B \ln\left(\frac{12m}{\delta}\right)}{n}} \right\} \leq \delta$

Rate: $O\!\left(\sqrt{\ln(m)/n}\right)$

[1] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015. The reusable holdout: Preserving validity in adaptive data analysis. Science 349, 636–638.
[2] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Chernoff-type Bound
We can obtain a sharper result (at least asymptotically). Let

$t = 40 \sqrt{\frac{B \ln\left(\frac{12m}{\delta}\right)}{n}}$

then

$P_{A^i, F^i}\left\{ \exists i \in I_m : A_i \neq \perp \,\wedge\, |A_i - L(F_i)| \geq 30\sqrt{A_i\, t} + 50\, t^2 \right\} \leq \delta$

Rate: $O\!\left(\sqrt{\ln(m)/n}\right) \div O\!\left(\ln(m)/n\right)$

[1] Oneto, L., Ridella, S., Anguita, D., 2017. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters 89, 31–38.
Open Problems
• Is privacy reducing our ability to learn something from data?
• Can we improve the rate of convergence and the constants in the bounds?
• How can we exploit privacy to derive new learning algorithms?
• Can we improve the Thresholdout?
• How many times can we access the data without compromising privacy?