unige.it - Differential Privacy and Generalization: Sharper...

University of Genoa, Polytechnic School and the School of Science, Research and Technology Transfer Laboratory. Differential Privacy and Generalization: Sharper Bounds, Theoretically Grounded Algorithms, and Thresholdout. Luca Oneto, www.lucaoneto.com, [email protected]. Summer School on Applied Harmonic Analysis, Genoa, Italy, 24th July 2017.


Page 1: unige.it - Differential Privacy and Generalization: Sharper ... anarm.dima.unige.it/genova2017/files/oneto.pdf

University of Genoa

Polytechnic School and the School of Science

Research and Technology Transfer Laboratory

Differential Privacy and Generalization: Sharper Bounds, Theoretically Grounded Algorithms, and Thresholdout

Luca Oneto

[email protected]

Summer School on Applied Harmonic Analysis, Genoa, Italy, 24th July 2017

Page 2:

SmartLab

24 July 2017, University of Genoa – DIBRIS – SmartLab, www.lucaoneto.com - [email protected]

• D. Anguita, Associate Prof.
• S. Ridella, Emeritus Prof.
• L. Oneto, Assistant Professor
• G. Clerico, Res. Assistant
• A. Lulli, Postdoc
• M. Cambiaso, Res. Assistant
• I. Orlandi, PhD Stud.
• E. Fumeo, PhD Stud.
• F. Cipollini, PhD Stud.
• P. Sanetti, Res. Assistant

Page 3:

Expertise

Scientific Research

• Topics
  • Neural Networks
  • Kernel Methods
  • Ensemble Methods
  • Statistical Learning Theory
  • Machine Learning
  • Data Mining
  • High Performance Computing
  • Big & Small Data Analysis
  • CBM, EDM, HAR, Sentiment Analysis, Cybersecurity
  • ···
• Publications
  • > 50 International High Ranked Journals
  • > 100 International High Ranked Conferences
  • ···

Technology Transfer

• aizoOn S.r.l.
• Ansaldo STS S.p.A.
• Brembo S.p.A.
• Bombardier Transportation
• Cetena S.p.A.
• Damen Shipyards Group
• Ferrari S.p.A. - Scuderia Ferrari
• VarGroup
• ···

European Projects

• Basic Research
  • EC NeuroNet I & II - Network of Excellence on Neural Networks
  • EC RAIN - Redundant Array of Inexpensive Workstations for Neurocomputing
  • EC EUNITE - European Network of Excellence on Information Technology for Smart Adaptive Systems
  • EC-FET NiSIS - Nature-inspired Smart Information Systems
  • ···
• Applied Research
  • EC-H2020 IN2DREAMS
  • EC-H2020 In2Rail
  • EC-FP7 MAXBE
  • ···

Page 4:

Privacy

• In recent years researchers have studied many ways to access data privately (aggregation, noise, etc.)

• From a data scientist's point of view, privacy is an obstacle (we cannot access the data unless they are aggregated, etc.)

• The breakthrough was to find a way to exploit privacy as a new regularization method and as a tool for better assessing the generalization performance of a learning algorithm

Page 5:

Supervised Learning

The only thing available for learning is a set of examples of the mapping.

$$x \sim P_X, \quad y = f(x)$$

[1] Vapnik, V.N., 1998. Statistical learning theory. Wiley New York.

Page 6:

Deterministic Functions/Learning Algorithms

Given a set of data, the algorithm always returns the same model.

The model is a function chosen in a set of functions: given the function and a point, the predicted output is always the same.

$$x \sim P_X, \quad y = f(x)$$

$$f = \mathcal{A}(s)$$

Page 7:

Randomized Functions

Given a set of data, the algorithm always returns the same model.

The model is a distribution over a set of functions. Given the model and a point, the predicted output may be different.

$$\rho \leftarrow \mathcal{A}(s)$$

$$x \sim P_X, \quad f \sim \rho, \quad y = f(x)$$

Page 8:

Randomized Learning Algorithms

Given a set of data, the algorithm may return different models.

The model can be a deterministic or a randomized function. In our case the function is deterministic.

$$x \sim P_X, \quad y = F(x)$$

$$F = \mathcal{A}(s)$$

Page 9:

Notation

$$x \in \mathcal{X}, \quad y \in \mathcal{Y}, \quad z \in \mathcal{Z} = \mathcal{X} \times \mathcal{Y}, \quad P_X, P_Y, P_Z$$

$$\mathcal{Z}^n = \mathcal{S} \ni s = \{z_1, \cdots, z_n\} = \{(x_1, y_1), \cdots, (x_n, y_n)\} \ \text{i.i.d.}, \quad P_S$$

$$\mathcal{Z} \ni Z \sim P_Z, \quad \mathcal{S} \ni S \sim P_S$$

$$\tilde{s} = \{z_1, \cdots, z_{i-1}, \tilde{z}_i, z_{i+1}, \cdots, z_n\}, \quad \tilde{z}_i \ \text{i.i.d.} \ z_i$$

$$\mathbb{S} \subseteq \mathcal{S}, \quad f : \mathcal{X} \to \mathcal{Y}, \quad f \in \mathcal{F}, \quad \mathbb{F} \subseteq \mathcal{F}, \quad \mathcal{A} : \mathcal{S} \to \mathcal{F}, \quad P_{\mathcal{A}}$$

$$D : \mathcal{F} \to \mathbb{S}, \quad \ell : \mathcal{F} \times \mathcal{Z} \to [0, 1]$$

$$L(f) = \mathbb{E}_Z \, \ell(f, Z), \quad V(f) = \mathbb{E}_Z [\ell(f, Z) - L(f)]^2$$

$$\hat{L}^s_n(f) = \frac{1}{n} \sum_{i=1}^{n} \ell(f, z_i), \quad \hat{V}^s_n(f) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j=i+1}^{n} [\ell(f, z_i) - \ell(f, z_j)]^2$$

Here $\tilde{s}$ is a sample that differs from $s$ only in the $i$-th example, and $\mathbb{S}$, $\mathbb{F}$ denote generic subsets of $\mathcal{S}$, $\mathcal{F}$.

[1] Oneto, Luca, Sandro Ridella, and Davide Anguita. "Differential privacy and generalization: Sharper bounds with applications." Pattern Recognition Letters 89 (2017): 31-38.
[2] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015b. Preserving statistical validity in adaptive data analysis, in: Annual ACM Symposium on Theory of Computing.
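As a small illustration of the two empirical quantities defined above, here is a minimal Python sketch. The function names and the toy loss values are assumptions made for this example; only the formulas for $\hat{L}^s_n(f)$ and $\hat{V}^s_n(f)$ come from the slide.

```python
import statistics
from itertools import combinations

def empirical_risk(losses):
    """Empirical error: the mean of the per-example losses l(f, z_i)."""
    return sum(losses) / len(losses)

def empirical_variance(losses):
    """Pairwise form from the slide: 1/(n(n-1)) * sum_{i<j} (l_i - l_j)^2."""
    n = len(losses)
    pair_sum = sum((a - b) ** 2 for a, b in combinations(losses, 2))
    return pair_sum / (n * (n - 1))

# Hypothetical 0/1 losses of one fixed model f on n = 4 examples.
losses = [0.0, 1.0, 0.0, 1.0]
risk = empirical_risk(losses)
variance = empirical_variance(losses)
# The pairwise form coincides with the usual unbiased sample variance.
same_as_sample_variance = abs(variance - statistics.variance(losses)) < 1e-12
```

The pairwise form avoids computing the mean first, which is why it appears in empirical Bernstein-type bounds.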

Page 10:

Goal

Estimate the true (generalization) error of the model based on the empirical data:

$$P\{|L(f) - \hat{L}_n(f)| \ge \epsilon\} \le \delta$$

$$|L(f) - \hat{L}_n(f)| \le \epsilon, \quad \text{with probability } (1 - \delta)$$

$$\delta \leftrightarrow \epsilon$$

Page 11:

Differentially Private (DP) Randomized Learning Algorithms

A Randomized Learning Algorithm is $(\epsilon, \delta)$-DP if

$$P_{\mathcal{A}}\{\mathcal{A}(s) \in \mathbb{F}\} \le e^{\epsilon} P_{\mathcal{A}}\{\mathcal{A}(\tilde{s}) \in \mathbb{F}\} + \delta, \quad \forall \mathbb{F} \subseteq \mathcal{F}, \ \forall s, \tilde{s} \in \mathcal{S}$$

where $\tilde{s}$ differs from $s$ in a single example and $\mathbb{F}$ is any subset of the function space $\mathcal{F}$.

[1] Dwork, C., Roth, A., 2014. The algorithmic foundations of differential privacy. Foundations and Trends in Theoretical Computer Science 9, 1–277.

Page 12:

Differentially Private (DP) Randomized Learning Algorithms

A Randomized Learning Algorithm is $\epsilon$-DP if

$$\frac{P_{\mathcal{A}}\{\mathcal{A}(s) = f\}}{P_{\mathcal{A}}\{\mathcal{A}(\tilde{s}) = f\}} \le e^{\epsilon}, \quad \forall f \in \mathcal{F}, \ \forall s, \tilde{s} \in \mathcal{S}$$

where $\tilde{s}$ differs from $s$ in a single example.

Proof: for any subset $\mathbb{F} \subseteq \mathcal{F}$,

$$P_{\mathcal{A}}\{\mathcal{A}(s) \in \mathbb{F}\} = \int_{\mathbb{F}} P_{\mathcal{A}}\{\mathcal{A}(s) = f\} \, df \le \int_{\mathbb{F}} e^{\epsilon} P_{\mathcal{A}}\{\mathcal{A}(\tilde{s}) = f\} \, df = e^{\epsilon} P_{\mathcal{A}}\{\mathcal{A}(\tilde{s}) \in \mathbb{F}\}$$

[1] Oneto, Luca, Sandro Ridella, and Davide Anguita. "Differential privacy and generalization: Sharper bounds with applications." Pattern Recognition Letters 89 (2017): 31-38.

Page 13:

Hold Out, Compression, Complexity, Stability, and… Privacy (I)

The first option to estimate the generalization performance of an algorithm is to split the data.

$$s = s_1 \cup s_2, \quad s_1 \cap s_2 = \emptyset$$

$$P_{S_2}\left\{L(\mathcal{A}(s_1)) - \hat{L}^{S_2}_{|S_2|}(\mathcal{A}(s_1)) \ge t\right\} \le e^{-2|s_2|t^2}$$

[1] Hoeffding, Wassily. "Probability inequalities for sums of bounded random variables." Journal of the American Statistical Association 58.301 (1963): 13-30.
[2] Anguita D, Ghio A, Oneto L, Ridella S. In-sample and out-of-sample model selection and error estimation for support vector machines. IEEE Transactions on Neural Networks and Learning Systems. 2012 Sep;23(9):1390-406.
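The hold-out bound above can be turned into a confidence radius by solving $e^{-2|s_2|t^2} = \delta$ for $t$. A minimal Python sketch follows; the function names are hypothetical, and the loss is assumed to take values in $[0, 1]$ as in the notation slide.

```python
import math

def holdout_radius(n_holdout, delta):
    """Solve exp(-2 * n_holdout * t**2) = delta for t."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n_holdout))

def holdout_upper_bound(holdout_error, n_holdout, delta):
    """Upper bound on L(A(s_1)) holding with probability at least 1 - delta."""
    return min(1.0, holdout_error + holdout_radius(n_holdout, delta))

# Example: 1000 held-out points, observed error 0.12, confidence 95%.
bound = holdout_upper_bound(0.12, 1000, 0.05)
```

The radius shrinks as $1/\sqrt{|s_2|}$, which is the rate the later slides try to match or improve without sacrificing data for the hold-out set.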

Page 14:

Hold Out, Compression, Complexity, Stability, and… Privacy (II)

Another option is to check how much the algorithm compresses the data.

$$s' \subseteq s$$

$$P_{S}\left\{L(\mathcal{A}(S)) - \hat{L}^{S}_{n}(\mathcal{A}(S)) \ge t\right\} \le n \binom{n}{|s'|} e^{-2nt^2}$$

[1] Floyd S, Warmuth M. Sample compression, learnability, and the Vapnik-Chervonenkis dimension. Machine Learning. 1995 Dec 1;21(3):269-304.
[2] Langford J, McAllester D. Computable shell decomposition bounds. Journal of Machine Learning Research. 2004;5(May):529-47.

Page 15:

Hold Out, Compression, Complexity, Stability, and… Privacy (III)

Another option is to check how large the function space is in which the algorithm chooses the solution.

$$C(\mathcal{F}) : \quad n^{d_{VC}(\mathcal{F})}, \quad e^{nR^2(\mathcal{F})}$$

$$P_{S}\left\{L(\mathcal{A}(S)) - \hat{L}^{S}_{n}(\mathcal{A}(S)) \ge t\right\} \le c_2 \, C(\mathcal{F}) \, e^{-c_1 n t^2}$$

$$P_{S}\left\{L(\mathcal{A}(S)) - \hat{L}^{S}_{n}(\mathcal{A}(S)) \ge t\right\} \le c_2 \, C\left(\left\{f : f \in \mathcal{F}, \ \hat{L}^{s}_{n}(f) \le c_3\right\}\right) e^{-c_1 n t^2}, \quad c_3(L, C, t, n)$$

[1] V. N. Vapnik, Statistical learning theory, Wiley-Interscience, 1998.
[2] L. Oneto, A. Ghio, S. Ridella, D. Anguita, Global Rademacher complexity bounds: From slow to fast convergence rates, Neural Processing Letters 43 (2) (2015) 567–602.
[3] L. Oneto, A. Ghio, S. Ridella, D. Anguita, Local Rademacher complexity: Sharper risk bounds with and without unlabeled samples, Neural Networks 65 (2015) 115–125.

Page 16:

Hold Out, Compression, Complexity, Stability, and… Privacy (IV)

Another way is to check how close to each other the functions chosen by the algorithm are when a single example in the sample changes.

$$\left|\ell(\mathcal{A}(s), \cdot) - \ell(\mathcal{A}(\tilde{s}), \cdot)\right|_{\infty} \le \beta$$

$$P_{S}\left\{L(\mathcal{A}(S)) - \hat{L}^{S}_{n}(\mathcal{A}(S)) \ge t\right\} \le c_2 \, e^{n\beta^2 - c_1 n t^2}$$

[1] O. Bousquet, A. Elisseeff, Stability and generalization, The Journal of Machine Learning Research 2 (2002) 499–526.
[2] L. Oneto, A. Ghio, S. Ridella, D. Anguita, Fully empirical and data-dependent stability-based bounds, IEEE Transactions on Cybernetics 45 (9) (2015) 1913–1926.
[3] Maurer A. A Second-order Look at Stability and Generalization. In Conference on Learning Theory 2017 Jun 18 (pp. 1461-1475).

Page 17:

DP Main Result

$$\text{If } P_{S}\{S \in D(f)\} \le \delta \ \ \forall f \in \mathcal{F} \ \text{ and } \ \epsilon \le \sqrt{\frac{\ln(1/\delta)}{2n}} \ \rightarrow \ P_{S,F}\{S \in D(F)\} \le 3\sqrt{\delta}$$

where $F = \mathcal{A}(S)$ is the output of an $\epsilon$-DP randomized learning algorithm.

Proof: rather technical…

[1] C. Dwork, V. Feldman, M. Hardt, T. Pitassi, O. Reingold, A. Roth, Preserving statistical validity in adaptive data analysis, in: Symposium on Theory of Computing, 2015.

Page 18:

Hoeffding-type Bounds

$$\epsilon \le t \ \rightarrow \ P_{S,F}\left\{L(F) \ge \hat{L}^{S}_{n}(F) + t\right\} \le 3e^{-nt^2}$$

$$\epsilon \le \sqrt{t^2 - \frac{\ln(2)}{2n}} \ \rightarrow \ P_{S,F}\left\{|L(F) - \hat{L}^{S}_{n}(F)| \ge t\right\} \le 3\sqrt{2}\,e^{-nt^2}$$

Proof:

$$P_{S}\left\{L(f) - \hat{L}^{S}_{n}(f) \ge t\right\} \le e^{-2nt^2}$$

$$D(f) = \left\{s \in \mathcal{S} : L(f) - \hat{L}^{s}_{n}(f) > t\right\}, \quad \delta = e^{-2nt^2}$$

Rate: $O(1/\sqrt{n})$

[1] Hoeffding, Wassily. Probability inequalities for sums of bounded random variables. Journal of the American Statistical Association 58.301 (1963): 13-30.
[2] C. Dwork, V. Feldman, M. Hardt, T. Pitassi, O. Reingold, A. Roth, Preserving statistical validity in adaptive data analysis, in: Symposium on Theory of Computing, 2015.
[3] L. Oneto, S. Ridella, D. Anguita, Differential privacy and generalization: Sharper bounds with applications, Pattern Recognition Letters 89 (2017) 31–38.
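To see what the one-sided Hoeffding-type bound costs relative to the plain hold-out bound, one can invert $3e^{-nt^2} = \delta$. The sketch below is only illustrative: the function names are hypothetical, and the `max` with `epsilon` encodes the slide's requirement $\epsilon \le t$ (a radius smaller than the privacy parameter is not certified by the result).

```python
import math

def dp_hoeffding_radius(n, delta, epsilon):
    """Smallest t with epsilon <= t and 3 * exp(-n * t**2) <= delta."""
    t = math.sqrt(math.log(3.0 / delta) / n)
    return max(t, epsilon)

def plain_hoeffding_radius(n, delta):
    """Radius for a single fixed function: exp(-2 * n * t**2) = delta."""
    return math.sqrt(math.log(1.0 / delta) / (2.0 * n))

n, delta = 1000, 0.05
dp_t = dp_hoeffding_radius(n, delta, epsilon=0.01)
fixed_t = plain_hoeffding_radius(n, delta)
# Price of using all the data for both learning and error estimation.
ratio = dp_t / fixed_t
```

Both radii decay as $O(1/\sqrt{n})$; the DP version only pays a constant factor, and no data are set aside.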

Page 19:

Chernoff and Bennett-type Bounds

$$\epsilon \le t \ \rightarrow \ P_{S,F}\left\{L(F) \ge \hat{L}^{S}_{n}(F) + \sqrt{4L(F)}\,t\right\} \le 3e^{-nt^2}$$

$$\epsilon \le \sqrt{t^2 - \frac{\ln(2)}{2n}} \ \rightarrow \ P_{S,F}\left\{|L(F) - \hat{L}^{S}_{n}(F)| \ge \sqrt{6L(F)}\,t\right\} \le 3\sqrt{2}\,e^{-nt^2}$$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

[1] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, The Annals of Mathematical Statistics 23 (4) (1952) 493–507.
[2] A. Maurer, M. Pontil, Empirical Bernstein bounds and sample variance penalization, in: Conference on Learning Theory, 2009.
[3] L. Oneto, S. Ridella, D. Anguita, Differential privacy and generalization: Sharper bounds with applications, Pattern Recognition Letters 89 (2017) 31–38.

Page 20:

Chernoff and Bennett-type Bounds

$$\epsilon \le \sqrt{t^2 - \frac{\ln(2)}{2n}} \ \rightarrow \ P_{S,F}\left\{L(F) \ge \hat{L}^{S}_{n}(F) + \sqrt{4\hat{V}^{S}_{n}(F)}\,t + \frac{14nt^2}{3(n-1)}\right\} \le 3\sqrt{2}\,e^{-nt^2}$$

$$\epsilon \le \sqrt{t^2 - \frac{\ln(3)}{2n}} \ \rightarrow \ P_{S,F}\left\{\left|L(F) - \hat{L}^{S}_{n}(F)\right| \ge \sqrt{4\hat{V}^{S}_{n}(F)}\,t + \frac{14nt^2}{3(n-1)}\right\} \le 3\sqrt{3}\,e^{-nt^2}$$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

[1] H. Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, The Annals of Mathematical Statistics 23 (4) (1952) 493–507.
[2] A. Maurer, M. Pontil, Empirical Bernstein bounds and sample variance penalization, in: Conference on Learning Theory, 2009.
[3] L. Oneto, S. Ridella, D. Anguita, Differential privacy and generalization: Sharper bounds with applications, Pattern Recognition Letters 89 (2017) 31–38.

Page 21:

Clopper-Pearson (Binary Classification)

$$\epsilon \le \sqrt{\frac{\ln(1/\delta)}{2n}} \ \rightarrow \ P_{S,F}\left\{L(F) \ge Q\left[1 - \delta; \ n\hat{L}^{S}_{n}(F) + 1, \ n - n\hat{L}^{S}_{n}(F)\right]\right\} \le 3\sqrt{\delta}$$

$$\epsilon \le \sqrt{\frac{\ln(1/2\delta)}{2n}} \ \rightarrow \ P_{S,F}\left\{Q\left[\delta; \ n\hat{L}^{S}_{n}(F), \ n - n\hat{L}^{S}_{n}(F) + 1\right] \ge L(F) \ \vee \ L(F) \ge Q\left[1 - \delta; \ n\hat{L}^{S}_{n}(F) + 1, \ n - n\hat{L}^{S}_{n}(F)\right]\right\} \le 3\sqrt{2\delta}$$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

[1] C. J. Clopper, E. S. Pearson, The use of confidence or fiducial limits illustrated in the case of the binomial, Biometrika 26 (4) (1934) 404–413.
[2] L. Oneto, S. Ridella, D. Anguita, Differential privacy and generalization: Sharper bounds with applications, Pattern Recognition Letters 89 (2017) 31–38.
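The one-sided bound can be evaluated numerically. The sketch below assumes that $Q[p; a, b]$ denotes the $p$-quantile of a Beta$(a, b)$ distribution, as in the standard Clopper-Pearson interval (the slide does not define $Q$), and uses the identity between the Beta quantile and the binomial tail so that only the standard library is needed. All function names are hypothetical.

```python
import math

def binom_cdf(k, n, p):
    """P(Bin(n, p) <= k)."""
    return sum(math.comb(n, i) * p ** i * (1 - p) ** (n - i) for i in range(k + 1))

def clopper_pearson_upper(k, n, delta, tol=1e-10):
    """(1 - delta)-quantile of Beta(k + 1, n - k): the p solving P(Bin(n, p) <= k) = delta."""
    if k >= n:
        return 1.0
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if binom_cdf(k, n, mid) > delta:  # the CDF decreases in p: mid is still too small
            lo = mid
        else:
            hi = mid
    return hi

def dp_clopper_pearson_upper(k, n, delta_total):
    """Bound with total failure probability delta_total = 3 * sqrt(delta).

    Returns the bound and the largest privacy parameter epsilon the slide allows.
    """
    delta = (delta_total / 3.0) ** 2
    max_epsilon = math.sqrt(math.log(1.0 / delta) / (2.0 * n))
    return clopper_pearson_upper(k, n, delta), max_epsilon

# Example: k = 2 errors on n = 50 examples, 95% confidence.
fixed_bound = clopper_pearson_upper(2, 50, 0.05)
dp_bound, eps_max = dp_clopper_pearson_upper(2, 50, 0.05)
```

Here `k` plays the role of $n\hat{L}^{S}_{n}(F)$, the number of errors on the sample.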

Page 22:

Clopper-Pearson (Regression)

$$\epsilon \le \sqrt{\frac{\ln(1/\delta)}{2n}} \ \rightarrow \ P_{S,F}\left\{L(F) \ge Q\left[1 - \delta; \ \sum_{i=1}^{n} [\ell(F, Z_i) \ge u_i] + 1, \ n - \sum_{i=1}^{n} [\ell(F, Z_i) \ge u_i]\right]\right\} \le 3\sqrt{\delta}$$

$$\epsilon \le \sqrt{\frac{\ln(1/2\delta)}{2n}} \ \rightarrow \ P_{S,F}\left\{Q\left[\delta; \ \sum_{i=1}^{n} [\ell(F, Z_i) \ge u_i], \ n - \sum_{i=1}^{n} [\ell(F, Z_i) \ge u_i] + 1\right] \ge L(F) \ \vee \ L(F) \ge Q\left[1 - \delta; \ \sum_{i=1}^{n} [\ell(F, Z_i) \ge u_i] + 1, \ n - \sum_{i=1}^{n} [\ell(F, Z_i) \ge u_i]\right]\right\} \le 3\sqrt{2\delta}$$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

Proof: let $h \in [0, 1]$ and let $u$ be uniformly distributed on $[0, 1]$,

$$\begin{cases} P\{u = \alpha, \ \alpha \in [0, 1]\} = 1 \\ P\{u = \alpha, \ \alpha \notin [0, 1]\} = 0 \end{cases}$$

where $[\cdot]$ is 1 when the condition holds and 0 otherwise. Then

$$P\{h \ge u\} = \mathbb{E}_{h,u}\{[h \ge u]\} = \mathbb{E}_{h}\{\mathbb{E}_{u}\{[h \ge u]\}\} = \mathbb{E}_{h}\{h\}$$

[1] L. Oneto, S. Ridella, D. Anguita, Differential privacy and generalization: Sharper bounds with applications, Pattern Recognition Letters 89 (2017) 31–38.
[2] X. Chen, A link between binomial parameters and means of bounded random variables, arXiv preprint arXiv:0802.3946.

Page 23:

Example: DP Random Forest (RF) (I)

Datasets from www.openml.org

Abb.  ID    Name                    n       d
D01   40    sonar                   208     61
D02   59    ionosphere              351     35
D03   785   wind correlations       45      47
D04   882   pollution               60      16
D05   1104  leukemia                72      7130
D06   1446  CostaMadre1             296     38
D07   1453  PieChart3               1077    38
D08   1458  arcene                  200     10001
D09   1485  madelon                 2600    501
D10   1566  hill-valley             1212    101
D11   37    diabetes                768     9
D12   1005  glass                   214     10
D13   1494  qsar-biodeg             1055    42
D14   1134  OVA Kidney              1545    10937
D15   1217  Click prediction small  149639  12
D16   1149  AP Ovary Kidney         458     10937
D17   907   chscase census4         400     8
D18   976   kdd JapaneseVowels      9961    15
D19   1443  PizzaCutter1            61      38
D20   871   pollen                  3848    6

Page 24:

Example: DP RF (II)

• Random Forests (RF): the original RF formulation
• Random Rotation Ensembles (RRE): a recent improvement over the original RF
• Random Decision Trees (RDT): a fully random RF implementation which is faster to train
• Differentially Private RDT (DPRDT): an RDT formulation which is also DP

[1] L. Breiman, Random forests, Machine Learning 45 (1) (2001) 5–32.
[2] R. Blaser, P. Fryzlewicz, Random rotation ensembles, Journal of Machine Learning Research 17 (4) (2015) 1–15.
[3] M. Bojarski, A. Choromanska, K. Choromanski, Y. LeCun, Differentially- and non-differentially-private random decision trees, in: arXiv preprint arXiv:1410.6973, 2014.

Page 25:

Example: DP RF (III)

• kCV for RF, RRE, and RDT
• DP for DPRDT

[Figure: Generalization Error (y-axis, from 0 to 1) for each Dataset D01 to D20 (x-axis), comparing RF, RRE, RDT, and DPRDT; $n_t = 50$, $n_d = 30$, $k = 3$]

Page 26:

Randomized Functions or Randomized Algorithms?

For studying Randomized Algorithms we have different options:
• Hold out
• Stability
• DP

For studying Randomized Functions, instead, we have only one powerful option:
• PAC-Bayes theory

Page 27:

PAC-Bayes Theory (I)

$\pi$: Prior over $\mathcal{F}$

$\rho$: Posterior over $\mathcal{F}$

$G_{\rho}(X)$: $x \sim P_X$, $f \sim \rho$, $y = f(x)$

$B_{\rho}(X)$: $x \sim P_X$, $y = \mathbb{E}_{\rho}\{f(x)\}$

$$L(B_{\rho}) \le 2L(G_{\rho})$$

[1] Germain, P., et al. Risk bounds for the majority vote: From a PAC-Bayesian analysis to a learning algorithm. The Journal of Machine Learning Research 16.1 (2015): 787-860.
[2] McAllester, D. A. Some PAC-Bayesian theorems. Computational Learning Theory. 1998.
[3] Langford, J. Tutorial on practical prediction theory for classification. Journal of Machine Learning Research. 2005.
[4] Germain, P., Lacasse, A., Laviolette, F., Marchand, M. PAC-Bayesian learning of linear classifiers. International Conference on Machine Learning. 2009.

Page 28:

PAC-Bayes Theory (II)

$$KL(\rho \| \pi) = \mathbb{E}_{\rho}\left\{\ln\left(\frac{\rho}{\pi}\right)\right\}$$

$$P\left\{kl\left[\hat{L}^{S}_{n}(G_Q) \,\|\, L(G_Q)\right] \ge \frac{KL + \ln\left[\frac{2\sqrt{n}}{\delta}\right]}{n}\right\} \le \delta$$

Rate: $O\left(\sqrt{\ln(n)/n}\right) \div O\left(\ln(n)/n\right)$

[1] Kullback, S., & Leibler, R. A. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79-86.
[2] Bégin, L., Germain, P., Laviolette, F., & Roy, J. F. (2016). PAC-Bayesian Bounds based on the Rényi Divergence. In Proceedings of the 19th International Conference on Artificial Intelligence and Statistics (pp. 435-444).
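The PAC-Bayes statement bounds the binary $kl$ divergence between the empirical and true error of the Gibbs classifier, so a usable error bound requires inverting $kl$. The following is a minimal sketch of that inversion by bisection; the function names and the example numbers are assumptions for illustration, not part of the slides.

```python
import math

def kl_bernoulli(q, p):
    """kl[q || p] between Bernoulli(q) and Bernoulli(p)."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)  # keep the logarithms finite
    total = 0.0
    if q > 0:
        total += q * math.log(q / p)
    if q < 1:
        total += (1 - q) * math.log((1 - q) / (1 - p))
    return total

def kl_upper_inverse(q, bound, tol=1e-10):
    """Largest p >= q with kl[q || p] <= bound, found by bisection."""
    lo, hi = q, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(q, mid) <= bound:
            lo = mid
        else:
            hi = mid
    return lo

def pac_bayes_kl_bound(emp_gibbs_risk, kl_posterior_prior, n, delta):
    """Upper bound on the Gibbs risk holding with probability at least 1 - delta."""
    rhs = (kl_posterior_prior + math.log(2 * math.sqrt(n) / delta)) / n
    return kl_upper_inverse(emp_gibbs_risk, rhs)

# Hypothetical values: empirical Gibbs risk 0.1, KL(rho || pi) = 5, n = 1000.
bound = pac_bayes_kl_bound(0.1, 5.0, 1000, 0.05)
```

When the empirical risk is zero the inversion gives $1 - e^{-\text{rhs}}$, which is where the fast $O(\ln(n)/n)$ end of the rate range comes from.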

Page 29:

Catoni's Result: Catoni Randomized Function (CRF)

$$\pi(f) = \frac{1}{Z} \exp(-\gamma L(f))$$

$$\rho(f) = \frac{1}{Z'} \exp(-\gamma \hat{L}^{s}_{n}(f))$$

$$KL(\rho \| \pi) \le \frac{\gamma^2}{2n} + \gamma \sqrt{\frac{2 \ln\left[\frac{2\sqrt{n}}{\delta}\right]}{n}}, \quad \text{at: } (1 - \delta)$$

[1] O. Catoni. PAC-Bayesian supervised classification: The thermodynamics of statistical learning. arXiv preprint arXiv:0712.0248, 2007.
[2] Lever G., Laviolette F., Shawe-Taylor J. Tighter PAC-Bayes bounds through distribution-dependent priors. Theoretical Computer Science. 2013.
[3] Oneto, Luca, Davide Anguita, and Sandro Ridella. PAC-Bayesian analysis of distribution dependent priors: Tighter risk bounds and stability analysis. Pattern Recognition Letters 80 (2016): 200-207.

Page 30:

Why not a Catoni Randomized Algorithm (CRA)?

Instead of building a CRF based on Catoni's posterior, we can think of a Randomized Algorithm which chooses, inside our space of functions, the best function (the one with the smallest empirical error) perturbed with Catoni's noise (CRA).

[1] L. Oneto, S. Ridella, D. Anguita, Differential privacy and generalization: Sharper bounds with applications, Pattern Recognition Letters 89 (2017) 31–38.

Page 31:

CRA is DP

Let $\tilde{s}$ differ from $s$ only in the $j$-th example, with $\tilde{z}_j$ in place of $z_j$. Then

$$\frac{P\{\mathcal{A}(s) = f\}}{P\{\mathcal{A}(\tilde{s}) = f\}} = \frac{e^{-\frac{\gamma}{n} \sum_{i=1}^{n} \ell(f, z_i)}}{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n} \sum_{i=1}^{n} \ell(f_1, z_i)}} \cdot \frac{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n}\left(\sum_{i=1, i \ne j}^{n} \ell(f_1, z_i) + \ell(f_1, \tilde{z}_j)\right)}}{e^{-\frac{\gamma}{n}\left(\sum_{i=1, i \ne j}^{n} \ell(f, z_i) + \ell(f, \tilde{z}_j)\right)}}$$

$$\le \frac{e^{0}}{e^{-\frac{\gamma}{n}}} \cdot \frac{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n} \sum_{i=1, i \ne j}^{n} \ell(f_1, z_i)} \, e^{0}}{\sum_{f_1 \in \mathcal{F}} e^{-\frac{\gamma}{n} \sum_{i=1, i \ne j}^{n} \ell(f_1, z_i)} \, e^{-\frac{\gamma}{n}}} = e^{\frac{2\gamma}{n}}.$$

[1] L. Oneto, S. Ridella, D. Anguita, Differential privacy and generalization: Sharper bounds with applications, Pattern Recognition Letters 89 (2017) 31–38.
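The ratio bound can be checked numerically on a toy problem. The sketch below assumes a finite function class, represented by a hypothetical matrix of loss values in $[0, 1]$, and a neighbouring sample that changes only one example. It illustrates the inequality $e^{2\gamma/n}$; it is not the authors' implementation.

```python
import math

def cra_distribution(loss_matrix, gamma):
    """loss_matrix[f][i] = l(f, z_i) in [0, 1]; returns P{A(s) = f} for each f."""
    n = len(loss_matrix[0])
    weights = [math.exp(-gamma / n * sum(row)) for row in loss_matrix]
    z = sum(weights)
    return [w / z for w in weights]

# Hypothetical class of 3 functions evaluated on n = 4 examples.
s = [[0, 1, 0, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
# Neighbouring sample: only the losses on the last example change.
s_tilde = [[0, 1, 0, 0], [1, 1, 0, 1], [0, 0, 0, 0]]

gamma, n = 2.0, 4
p = cra_distribution(s, gamma)
p_tilde = cra_distribution(s_tilde, gamma)
worst_ratio = max(max(a / b, b / a) for a, b in zip(p, p_tilde))
dp_limit = math.exp(2 * gamma / n)  # the e^{2 gamma / n} bound from the slide
```

In this example every probability ratio stays below `dp_limit`, in both directions, which is what $\epsilon$-DP with $\epsilon = 2\gamma/n$ requires.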

Page 32:

CRF and CRA Generalization Properties

CRA:

$$\gamma = \frac{1}{2}\sqrt{n \ln\left(\frac{3\sqrt{2}}{2\delta}\right)} \ \rightarrow \ P_{S,F}\left\{Q\left[\frac{\delta^2}{18}; \ n\hat{L}^{S}_{n}(F), \ n - n\hat{L}^{S}_{n}(F) + 1\right] \ge L(F) \ \vee \ L(F) \ge Q\left[1 - \frac{\delta^2}{18}; \ n\hat{L}^{S}_{n}(F) + 1, \ n - n\hat{L}^{S}_{n}(F)\right]\right\} \le \delta$$

Rate: $O(1/\sqrt{n}) \div O(1/n)$

CRF:

$$\gamma = \frac{1}{2}\sqrt{n \ln\left(\frac{3\sqrt{2}}{2\delta}\right)} \ \rightarrow \ P_{S}\left\{kl\left[\hat{L}^{S}_{n}(G_Q) \,\|\, L(G_Q)\right] \ge \frac{\ln\left(\frac{3\sqrt{2}}{2\delta}\right)}{8n} + \sqrt{\frac{2 \ln\left(\frac{3\sqrt{2}}{2\delta}\right) \ln\left(\frac{2\sqrt{n}}{\delta}\right)}{4n}} + \frac{\ln\left(\frac{2\sqrt{n}}{\delta}\right)}{n}\right\} \le 2\delta.$$

Rate: $O\left(\sqrt[4]{\ln(n)/n}\right) \div O\left(\sqrt{\ln(n)/n}\right)$

Page 33:

Example: CRF and CRA

The function space consists of trees built with a hold-out set.

[Figure: Generalization Error (y-axis, from 0 to 1) for each Dataset D01 to D20 (x-axis); legend entries CRC, CPD, CRF, and CRA; $n_t = 50$, $k = 3$]

Page 34:

Non-Adaptive Data Analysis (I)

NAS: the non-adaptive setting is the case when the procedures for building our models exploit just the training set.

Algorithm 1: Union Bound for the NAS
Input: $s_t$, $s_h$, and $\mathcal{P}_1, \cdots, \mathcal{P}_m$
Output: $\hat{L}^{s_h}_{n}(f_1), \cdots, \hat{L}^{s_h}_{n}(f_m)$
for $i \leftarrow 1$ to $m$ do
    $f_i = \mathcal{P}_i(s_t)$ and compute $\hat{L}^{s_h}_{n}(f_i)$;

[1] Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters. 2017 Apr 1;89:31-8.

Page 35:

Non-Adaptive Data Analysis (II)

NAS: the non-adaptive setting is the case when the procedures for building our models exploit just the training set.

Then we can use the Bonferroni Correction:

$$P_{S_h}\left\{\exists i \in I_m : \left|L(\mathcal{P}_i(s_t)) - \hat{L}^{S_h}_{n}(\mathcal{P}_i(s_t))\right| \ge \sqrt{\frac{\ln\left(\frac{2m}{\delta}\right)}{2n}}\right\} \le \delta$$

Rate: $O\left(\sqrt{\ln(m)/n}\right) \div O\left(\ln(m)/n\right)$

[1] Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters. 2017 Apr 1;89:31-8.
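The non-adaptive procedure and its Bonferroni radius fit in a few lines. In the sketch below the procedures, the loss, and the toy data are hypothetical placeholders; only the loop structure (each procedure sees the training set alone) and the radius $\sqrt{\ln(2m/\delta)/2n}$ come from the slides.

```python
import math

def bonferroni_radius(m, n, delta):
    """Radius valid simultaneously for m models tested on n hold-out points."""
    return math.sqrt(math.log(2.0 * m / delta) / (2.0 * n))

def evaluate_non_adaptive(procedures, s_t, s_h, loss):
    """Each procedure builds a model from s_t only; every model is tested on s_h."""
    errors = []
    for procedure in procedures:
        f = procedure(s_t)
        errors.append(sum(loss(f, z) for z in s_h) / len(s_h))
    return errors

# Toy setup: two constant classifiers and a 0/1 loss (all hypothetical).
procedures = [lambda s: (lambda x: 0), lambda s: (lambda x: 1)]
zero_one = lambda f, z: float(f(z[0]) != z[1])
s_t = [(0, 0), (1, 1)]
s_h = [(0, 0), (1, 1), (2, 1), (3, 1)]

errors = evaluate_non_adaptive(procedures, s_t, s_h, zero_one)
radius = bonferroni_radius(len(procedures), len(s_h), 0.05)
```

Because the number of models enters only through $\ln(m)$, testing many models on the same hold-out set is cheap as long as none of them depends on the hold-out results.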

Page 36:

Adaptive Data Analysis (I)

AS: the adaptive setting is the case when the procedures for building our models exploit both the training set and the performance of the procedures at the previous steps over the hold out set.

Algorithm 1: Hold out for the AS
Input: $s_t$, $s_h$, and $\mathcal{P}_1, \cdots, \mathcal{P}_m$
Output: $\hat{L}^{s_h^1}_{n}(f_1), \cdots, \hat{L}^{s_h^m}_{n}(f_m)$
Split $s_h$ in $s_h^i$ with $i \in I_m$;
for $i \leftarrow 1$ to $m$ do
    $f_i = \mathcal{P}_i\left(s_t, \hat{L}^{s_h^1}_{n}(f_1), \cdots, \hat{L}^{s_h^{i-1}}_{n}(f_{i-1})\right)$ and compute $\hat{L}^{s_h^i}_{n}(f_i)$;

[1] Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters. 2017 Apr 1;89:31-8.

Page 37:

Adaptive Data Analysis (II)

AS: the adaptive setting is the case when the procedures for building our models exploit both the training set and the performance of the procedures at the previous steps over the hold out set.

$$f_i = \mathcal{P}_i(s_t, \mathcal{P}_{i-1}, \cdots, \mathcal{P}_1)$$

Then we need one test set at each step:

$$P_{S_h^i}\left\{\exists i \in I_m : \left|L(f_i) - \hat{L}^{S_h^i}_{n}(f_i)\right| \ge \sqrt{\frac{m \ln\left(\frac{2}{\delta}\right)}{2n}}\right\} \le \delta$$

Rate: $O\left(\sqrt{m/n}\right) \div O\left(m/n\right)$

[1] Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters. 2017 Apr 1;89:31-8.
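A quick numerical comparison shows why splitting the hold-out set is expensive. The sketch evaluates the two radii as written on the slides, $\sqrt{\ln(2m/\delta)/2n}$ for the non-adaptive setting and $\sqrt{m\ln(2/\delta)/2n}$ for the split hold-out; the function names and the chosen values of $m$, $n$, $\delta$ are assumptions.

```python
import math

def nas_radius(m, n, delta):
    """Non-adaptive setting: one hold-out set of size n shared by m models."""
    return math.sqrt(math.log(2.0 * m / delta) / (2.0 * n))

def as_split_radius(m, n, delta):
    """Adaptive setting with the hold-out set split into m parts."""
    return math.sqrt(m * math.log(2.0 / delta) / (2.0 * n))

m, n, delta = 100, 10_000, 0.05
non_adaptive = nas_radius(m, n, delta)
adaptive_split = as_split_radius(m, n, delta)
# The split radius grows like sqrt(m), the shared one only like sqrt(ln m).
penalty = adaptive_split / non_adaptive
```

This gap between $\sqrt{m}$ and $\sqrt{\ln m}$ is the motivation for reusing a single hold-out set in a controlled way.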

Page 38:

Thresholdout

The idea is to look at the test set error only when it is far from the one on the training set; and, when we do look at the error on the test set, we look at it in a "private way".

Algorithm 1: Thresholdout for the AS
Input: $s_t$, $s_h$, $T$, $\sigma$, $B$, and $\mathcal{P}_1, \cdots, \mathcal{P}_m$
Output: $a_1, \cdots, a_m$
$\gamma \sim \text{Lap}(2\sigma)$, $\hat{T} = T + \gamma$;
for $i \leftarrow 1$ to $m$ do
    if $B < 1$ then
        $a_i = \bot$;
    else
        $f_i = \mathcal{P}_i(s_t, a_1, \cdots, a_{i-1})$, $\eta \sim \text{Lap}(4\sigma)$;
        if $|\hat{L}^{s_h}_{n}(f_i) - \hat{L}^{s_t}_{n}(f_i)| \ge \hat{T} + \eta$ then
            $\xi \sim \text{Lap}(\sigma)$, $\gamma \sim \text{Lap}(2\sigma)$, $\hat{T} = T + \gamma$, $B = B - 1$;
            $a_i = \hat{L}^{s_h}_{n}(f_i) + \xi$;
        else
            $a_i = \hat{L}^{s_t}_{n}(f_i)$;

[1] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015c. The reusable holdout: Preserving validity in adaptive data analysis. Science 349, 636–638.
[2] Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters. 2017 Apr 1;89:31-8.
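The Thresholdout steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: the procedures are hypothetical callables taking the training set and the previous answers, `None` stands for the "no answer" symbol, and Laplace noise is drawn as the difference of two exponential variables so that only the standard library is needed.

```python
import random

def laplace(scale, rng):
    """Lap(scale) sample: difference of two exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

def thresholdout(procedures, s_t, s_h, loss, T, sigma, B, seed=0):
    rng = random.Random(seed)
    mean_loss = lambda f, s: sum(loss(f, z) for z in s) / len(s)
    T_hat = T + laplace(2 * sigma, rng)
    answers = []
    for procedure in procedures:
        if B < 1:
            answers.append(None)               # budget exhausted: no answer
            continue
        f = procedure(s_t, list(answers))      # f_i may depend on a_1..a_{i-1}
        eta = laplace(4 * sigma, rng)
        err_h, err_t = mean_loss(f, s_h), mean_loss(f, s_t)
        if abs(err_h - err_t) >= T_hat + eta:
            xi = laplace(sigma, rng)
            T_hat = T + laplace(2 * sigma, rng)  # refresh the noisy threshold
            B -= 1                               # each hold-out access costs budget
            answers.append(err_h + xi)
        else:
            answers.append(err_t)                # training error is close enough
    return answers

# Toy setup (hypothetical): two constant classifiers and a 0/1 loss.
procs = [lambda s, a: (lambda x: 0), lambda s, a: (lambda x: 1)]
zero_one = lambda f, z: float(f(z[0]) != z[1])
s_t = [(0, 0), (1, 1), (2, 0), (3, 0)]
s_h = [(0, 0), (1, 1), (2, 1), (3, 1)]
answers = thresholdout(procs, s_t, s_h, zero_one, T=0.1, sigma=0.01, B=2)
```

The hold-out set is consulted on every step, but its value is released only when it disagrees with the training error, and then only with added noise; that is what keeps the released answers differentially private.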

Page 39:

Hoeffding-type Bound

We obtain a result which is analogous to the one for the NAS:

$$P_{A_i, F_i}\left\{\exists i \in I_m : A_i \ne \bot \ \wedge \ |A_i - L(F_i)| \ge 40\sqrt{\frac{B \ln\left(\frac{12m}{\delta}\right)}{n}}\right\} \le \delta$$

Rate: $O\left(\sqrt{\ln(m)/n}\right)$

[1] Dwork, C., Feldman, V., Hardt, M., Pitassi, T., Reingold, O., Roth, A., 2015c. The reusable holdout: Preserving validity in adaptive data analysis. Science 349, 636–638.
[2] Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters. 2017 Apr 1;89:31-8.

Page 40:

Chernoff-type Bound

We can obtain a sharp result (at least asymptotically).

$$t = 40\sqrt{\frac{B \ln\left(\frac{12m}{\delta}\right)}{n}}$$

$$P_{A_i, F_i}\left\{\exists i \in I_m : A_i \ne \bot \ \wedge \ |A_i - L(F_i)| \ge 30\sqrt{A_i}\,t + 50t^2\right\} \le \delta$$

Rate: $O\left(\sqrt{\ln(m)/n}\right) \div O\left(\ln(m)/n\right)$

[1] Oneto L, Ridella S, Anguita D. Differential privacy and generalization: Sharper bounds with applications. Pattern Recognition Letters. 2017 Apr 1;89:31-8.
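To get a feeling for the constants, the two Thresholdout radii can be evaluated directly. The sketch uses only the expressions from the slides, $t = 40\sqrt{B\ln(12m/\delta)/n}$ and $30\sqrt{A_i}\,t + 50t^2$; the function names and the example values are assumptions.

```python
import math

def thresholdout_t(B, m, n, delta):
    """Hoeffding-type radius t = 40 * sqrt(B * ln(12 m / delta) / n)."""
    return 40.0 * math.sqrt(B * math.log(12.0 * m / delta) / n)

def chernoff_radius(answer, t):
    """Chernoff-type radius 30 * sqrt(a_i) * t + 50 * t**2 for a released answer a_i."""
    return 30.0 * math.sqrt(answer) * t + 50.0 * t ** 2

# Hypothetical setting: budget B = 1, m = 10 models, a very large sample.
t = thresholdout_t(B=1, m=10, n=10 ** 8, delta=0.05)
radius_zero_error = chernoff_radius(0.0, t)   # released answer equal to 0
radius_large_error = chernoff_radius(0.5, t)  # released answer equal to 0.5
```

When the released error is zero the Chernoff-type radius reduces to $50t^2$, which falls below $t$ once $t < 1/50$; with these constants that requires a large $n$, which is why the result is described as sharp only asymptotically.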

Page 41:

Open Problems

• Is privacy reducing our ability to learn something from data?

• Can we improve the rate of convergence and the constants in the bounds?

• How can we exploit privacy to derive new learning algorithms?

• Can we improve the Thresholdout?

• How many times can we access the data without compromising the privacy?