Correlation Immune Functions and Learning
![Page 1: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/1.jpg)
Correlation Immune Functions and Learning
Lisa Hellerstein
Polytechnic Institute of NYU
Brooklyn, NY
Includes joint work with Bernard Rosell (AT&T), Eric Bach and David Page (U. of Wisconsin), and Soumya Ray (Case
Western)
![Page 2: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/2.jpg)
2
Identifying relevant variables from random examples
f(x1, x2, …, x10) is a function of x2, x7, x10 only
x                        f(x)
(1,1,0,0,0,1,1,0,1,0)    1
(0,1,0,0,1,0,1,1,0,1)    1
(1,0,0,1,0,1,0,0,1,0)    0
![Page 3: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/3.jpg)
3
Technicalities
• Assume random examples drawn from uniform distribution over {0,1}^n
• Have access to source of random examples
![Page 4: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/4.jpg)
4
Detecting that a variable is relevant
• Look for dependence between input variables and output
If xi irrelevant P(f=1|xi=1) = P(f=1|xi=0)
If xi relevant P(f=1|xi=1) ≠ P(f=1|xi=0)
for previous function f
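The dependence test above can be tried on simulated data. The sketch below is only an illustration: the target function (an OR of x2, x7, x10), the sample size, and the 0.1 threshold are assumptions chosen for the demo, not taken from the slides.

```python
import random

def f(x):
    # Hypothetical target (assumption): OR of x2, x7, x10 (0-based indices 1, 6, 9).
    return x[1] | x[6] | x[9]

def estimate_gaps(f, n, m, rng):
    """Estimate P(f=1|xi=1) - P(f=1|xi=0) for each i from m uniform examples."""
    ones = [[0, 0] for _ in range(n)]   # ones[i][b]: examples with xi=b and f=1
    total = [[0, 0] for _ in range(n)]  # total[i][b]: examples with xi=b
    for _ in range(m):
        x = [rng.randint(0, 1) for _ in range(n)]
        y = f(x)
        for i, b in enumerate(x):
            total[i][b] += 1
            ones[i][b] += y
    return [ones[i][1] / total[i][1] - ones[i][0] / total[i][0] for i in range(n)]

gaps = estimate_gaps(f, n=10, m=20000, rng=random.Random(0))
# Report variables (1-indexed) whose estimated gap is clearly non-zero.
relevant = [i + 1 for i, g in enumerate(gaps) if abs(g) > 0.1]
print(relevant)
```

For this assumed target the true gap is 1/4 for each relevant variable and 0 for the others, so a modest sample separates them.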
![Page 5: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/5.jpg)
5
Unfortunately…
f(x1, x2, …, x10) = parity(x2, x7, x10)
xi relevant: P(f=1|xi=1) = 1/2 = P(f=1|xi=0)
xi irrelevant: P(f=1|xi=1) = 1/2 = P(f=1|xi=0)
Finding a relevant variable easy for some functions.
Not so easy for others.
![Page 6: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/6.jpg)
6
How to find the relevant variables
• Suppose you know r (# of relevant vars)
Assume r << n
(Think of r = log n)
• Get m random examples, where
m = poly(2^r, log n, 1/δ)
• With probability > 1-δ, have enough info to determine which r variables are relevant
– All other sets of r variables can be ruled out
![Page 7: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/7.jpg)
7
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)    1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)    0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)    1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)    0
(1, 1, 1, 0, 0, 0, 1, 1, 1, 1)    0
![Page 8: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/8.jpg)
8
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)    1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)    0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)    1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)    0
(1, 1, 1, 0, 0, 0, 1, 1, 0, 1)    0
![Page 9: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/9.jpg)
9
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)    1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)    0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)    1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)    0
(1, 1, 1, 0, 0, 0, 1, 1, 0, 1)    0
x3, x5, x9 can’t be the relevant variables
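The ruling-out step can be written as a short check: a candidate set of variables is eliminated as soon as two examples agree on all of its variables but carry different labels. The sketch below uses the five examples from this slide; the helper name `is_consistent` is made up for the illustration.

```python
def is_consistent(examples, subset):
    """True if no two examples agree on `subset` but differ in label."""
    seen = {}
    for x, y in examples:
        key = tuple(x[i - 1] for i in subset)  # variables are 1-indexed
        if seen.setdefault(key, y) != y:
            return False
    return True

examples = [
    ((1, 1, 0, 1, 1, 0, 1, 0, 1, 0), 1),
    ((0, 1, 1, 1, 1, 0, 1, 1, 0, 0), 0),
    ((1, 1, 1, 0, 0, 0, 0, 0, 0, 0), 1),
    ((0, 0, 0, 1, 1, 0, 0, 0, 0, 0), 0),
    ((1, 1, 1, 0, 0, 0, 1, 1, 0, 1), 0),
]
print(is_consistent(examples, (3, 5, 9)))   # False: rows 3 and 5 agree on x3, x5, x9
print(is_consistent(examples, (1, 3, 10)))  # True: no conflicting pair
```

Rows 3 and 5 both have (x3, x5, x9) = (1, 0, 0) but labels 1 and 0, which is what rules the set out.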
![Page 10: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/10.jpg)
10
x1 x2 x3 x4 x5 x6 x7 x8 x9 x10    f
(1, 1, 0, 1, 1, 0, 1, 0, 1, 0)    1
(0, 1, 1, 1, 1, 0, 1, 1, 0, 0)    0
(1, 1, 1, 0, 0, 0, 0, 0, 0, 0)    1
(0, 0, 0, 1, 1, 0, 0, 0, 0, 0)    0
(1, 1, 1, 0, 0, 0, 1, 1, 1, 1)    0
x1, x3, x10 ok
![Page 11: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/11.jpg)
11
• Naïve algorithm: Try all combinations of r variables. Time ≈ n^r
• Mossel, O’Donnell, Servedio [STOC 2003]
– Algorithm that takes time ≈ n^(cr) where c ≈ .704
– Subroutine: Find a single relevant variable
Still open: Can this bound be improved?
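The naïve algorithm from the first bullet can be sketched as an exhaustive search over all size-r subsets. The hidden target below (parity of three variables among n = 8), the seed, and the sample size are assumptions made only for this demo.

```python
import random
from itertools import combinations

def consistent(examples, subset):
    """True if no two examples agree on `subset` but differ in label."""
    seen = {}
    for x, y in examples:
        key = tuple(x[i] for i in subset)
        if seen.setdefault(key, y) != y:
            return False
    return True

def naive_relevant_sets(examples, n, r):
    """Try all C(n, r) subsets; keep those that no pair of examples rules out."""
    return [s for s in combinations(range(n), r) if consistent(examples, s)]

rng = random.Random(1)
n, r = 8, 3
target = (1, 4, 6)  # hypothetical hidden relevant variables (0-based)
examples = []
for _ in range(200):
    x = tuple(rng.randint(0, 1) for _ in range(n))
    examples.append((x, sum(x[i] for i in target) % 2))

survivors = naive_relevant_sets(examples, n, r)
print(survivors)
```

The search visits on the order of n^r subsets, which is the cost the slide refers to. It works even for parity, where single-variable statistics carry no signal.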
![Page 12: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/12.jpg)
12
If output of f is dependent on xi, can detect dependence (whp) in time poly(n, 2^r) and identify xi as relevant.
Problematic Functions
Every variable is independent of output of f:
P[f=1|xi=0] = P[f=1|xi=1] for all xi
Equivalently, all degree 1 Fourier coeffs = 0
Functions with this property are said to be CORRELATION-IMMUNE
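For small n the defining condition can be checked exactly by enumerating all inputs. This is a brute-force sketch (exponential in n) meant only to make the definition concrete; the function names are illustrative.

```python
from itertools import product

def is_correlation_immune(f, n):
    """Exact check under the uniform distribution by enumerating {0,1}^n."""
    for i in range(n):
        ones = [0, 0]  # ones[b] = number of x with x_i = b and f(x) = 1
        for x in product((0, 1), repeat=n):
            ones[x[i]] += f(x)
        # Both conditioning events contain 2^(n-1) inputs, so comparing
        # counts is the same as comparing conditional probabilities.
        if ones[0] != ones[1]:
            return False
    return True

parity = lambda x: sum(x) % 2
disjunction = lambda x: int(any(x))
print(is_correlation_immune(parity, 3))       # True
print(is_correlation_immune(disjunction, 3))  # False
```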
![Page 13: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/13.jpg)
13
P[f=1|xi=0] = P[f=1|xi=1] for all xi
Geometrically:
00 01
10 11
e.g. n=2
![Page 14: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/14.jpg)
14
P[f=1|xi=0] = P[f=1|xi=1] for all xi
Geometrically:
00: 0    01: 1
10: 1    11: 0
Parity(x1,x2)
![Page 15: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/15.jpg)
15
P[f=1|xi=0] = P[f=1|xi=1] for all xi
Geometrically:
X1=0:    00: 0    01: 1
X1=1:    10: 1    11: 0
![Page 16: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/16.jpg)
16
P[f=1|xi=0] = P[f=1|xi=1] for all xi
X2=0     X2=1
00: 0    01: 1
10: 1    11: 0
![Page 17: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/17.jpg)
17
• Other correlation-immune functions besides parity?
– f(x1,…,xn) = 1 iff x1 = x2 = … = xn
![Page 18: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/18.jpg)
18
• Other correlation-immune functions besides parity?
– All reflexive functions: f(x) = f(x̄) for all x (x̄ = bitwise complement of x)
![Page 19: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/19.jpg)
19
• Other correlation-immune functions besides parity?
– All reflexive functions: f(x) = f(x̄) for all x (x̄ = bitwise complement of x)
– More…
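A quick exhaustive check for n = 3 illustrates the claim, reading "reflexive" as f(x) = f(x̄), where x̄ is the bitwise complement of x. Complementing maps the positive inputs with xi = 1 one-to-one onto those with xi = 0, so the two counts must match. The code is a sketch for small n only.

```python
from itertools import product

n = 3
inputs = list(product((0, 1), repeat=n))

def complement(x):
    return tuple(1 - b for b in x)

def is_correlation_immune(table):
    # table maps each input tuple to 0/1; compare positive counts on each side.
    return all(
        sum(table[x] for x in inputs if x[i] == 1)
        == sum(table[x] for x in inputs if x[i] == 0)
        for i in range(n)
    )

reflexive = []
for values in product((0, 1), repeat=len(inputs)):
    table = dict(zip(inputs, values))
    if all(table[x] == table[complement(x)] for x in inputs):
        reflexive.append(table)

print(len(reflexive))                                    # 16 = 2^(2^(n-1))
print(all(is_correlation_immune(t) for t in reflexive))  # True
```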
![Page 20: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/20.jpg)
20
Correlation-immune functions and decision tree learners
• Decision tree learners in ML
– Popular machine learning approach (CART, C4.5)
– Given set of examples of Boolean function, build a decision tree
• Heuristics for decision tree learning
– Greedy, top-down
– Differ in way choose which variable to put in node
– Pick variable having highest “gain”
– P[f=1|xi=1] = P[f=1|xi=0] means 0 gain
• Correlation-immune functions problematic for decision tree learners
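To see why a greedy learner stalls, the sketch below computes information gain (one common choice of "gain"; the slides do not fix a particular measure) on the full truth table of a parity function and, for contrast, an OR. Both targets are illustrative assumptions.

```python
from itertools import product
from math import log2

def entropy(q):
    """Binary entropy of a label distribution with P[label = 1] = q."""
    return 0.0 if q in (0.0, 1.0) else -q * log2(q) - (1 - q) * log2(1 - q)

def information_gain(examples, i):
    """Entropy of the labels minus expected entropy after splitting on x_i."""
    labels = [y for _, y in examples]
    gain = entropy(sum(labels) / len(labels))
    for b in (0, 1):
        part = [y for x, y in examples if x[i] == b]
        if part:
            gain -= len(part) / len(examples) * entropy(sum(part) / len(part))
    return gain

# Four variables, of which only x1 and x2 are relevant.
parity_examples = [(x, (x[0] + x[1]) % 2) for x in product((0, 1), repeat=4)]
or_examples = [(x, x[0] | x[1]) for x in product((0, 1), repeat=4)]
print([information_gain(parity_examples, i) for i in range(4)])
print([round(information_gain(or_examples, i), 3) for i in range(4)])
```

For parity every variable, relevant or not, has zero gain, so the greedy choice at the root is arbitrary. For OR the two relevant variables stand out.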
![Page 21: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/21.jpg)
21
• Lookahead
• Skewing: An efficient alternative to lookahead for decision tree induction. IJCAI 2003 [Page, Ray]
• Why skewing works: learning difficult Boolean functions with greedy tree learners. ICML 2005 [Rosell, Hellerstein, Ray, Page]
![Page 22: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/22.jpg)
22
Story, Part One
![Page 23: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/23.jpg)
23
• How many difficult functions?
• More than 2^(2^(n-1))
n        0    1    2    3    4     5
# fns    2    2    4    18   648   3140062
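The first entries of the table can be reproduced by brute force. The sketch below encodes each truth table as an integer and counts those satisfying the correlation-immunity condition; it is feasible only up to about n = 4 (2^16 functions), and the encoding is just one convenient choice.

```python
def count_correlation_immune(n):
    """Count f: {0,1}^n -> {0,1} with P[f=1|xi=0] = P[f=1|xi=1] for all i."""
    size = 1 << n
    # masks[i] has bit x set exactly when input x has x_i = 1.
    masks = [sum(1 << x for x in range(size) if (x >> i) & 1) for i in range(n)]
    count = 0
    for table in range(1 << size):  # each integer encodes one truth table
        weight = bin(table).count("1")
        # Immune iff exactly half of the positive inputs have x_i = 1, for every i.
        if all(2 * bin(table & m).count("1") == weight for m in masks):
            count += 1
    return count

counts = [count_correlation_immune(n) for n in range(5)]
print(counts)
```

This should reproduce the entries 2, 2, 4, 18, 648 for n = 0 through 4. The n = 5 entry would need 2^32 truth tables and is out of reach for this approach.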
![Page 24: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/24.jpg)
24
• How many different hard functions?
• More than 2^(2^(n/2))
n        0    1    2    3    4     5
# fns    2    2    4    18   648   3140062
SOMEONE MUST HAVE STUDIED THESE FUNCTIONS BEFORE…
![Page 25: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/25.jpg)
25
![Page 26: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/26.jpg)
26
![Page 27: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/27.jpg)
27
Story, Part Two
![Page 28: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/28.jpg)
28
• I had lunch with Eric Bach
![Page 29: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/29.jpg)
29
Roy, B. K. 2002. A Brief Outline of Research on Correlation Immune Functions. In Proceedings of the 7th Australian Conference on Information Security and Privacy (July 03 - 05, 2002). L. M. Batten and J. Seberry, Eds. Lecture Notes in Computer Science, vol. 2384. Springer-Verlag, London, 379-394.
![Page 30: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/30.jpg)
30
Correlation-immune functions
• k-correlation immune function
– For every subset S of the input variables s.t. 1 ≤ |S| ≤ k, P[f | S] = P[f]
– [Xiao, Massey 1988] Equivalently, all Fourier coefficients of degree i are 0, for 1 ≤ i ≤ k
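The Fourier characterization gives a direct way to compute the immunity order of a small function. The sketch below uses the convention that the coefficient for a set S is the average of f(x)·(−1)^(sum of x_i, i in S) with f taking values in {0,1}. For non-empty S this vanishes exactly when the ±1-valued version does. The function names are illustrative.

```python
from itertools import combinations, product

def fourier_coefficient(f, n, S):
    """E[f(x) * (-1)^(sum of x_i for i in S)] under the uniform distribution."""
    total = 0
    for x in product((0, 1), repeat=n):
        total += f(x) * (-1) ** sum(x[i] for i in S)
    return total / 2 ** n

def immunity_order(f, n):
    """Largest k such that every coefficient of degree 1..k is zero."""
    for degree in range(1, n + 1):
        for S in combinations(range(n), degree):
            if fourier_coefficient(f, n, S) != 0:
                return degree - 1
    return n

parity3 = lambda x: sum(x) % 2
parity_of_two = lambda x: (x[0] + x[1]) % 2
majority = lambda x: int(sum(x) >= 2)
print(immunity_order(parity3, 3))        # 2
print(immunity_order(parity_of_two, 3))  # 1
print(immunity_order(majority, 3))       # 0
```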
![Page 31: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/31.jpg)
31
Siegenthaler’s Theorem
If f is k-correlation immune, then the GF[2] polynomial for f has degree at most n-k.
![Page 32: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/32.jpg)
32
Siegenthaler’s Theorem [1984]
If f is k-correlation immune, then the GF[2] polynomial for f has degree at most n-k.
Algorithm of Mossel, O’Donnell, Servedio [STOC 2003] based on this theorem
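Siegenthaler's bound can be sanity-checked exhaustively for n = 3. The sketch below computes the GF(2) polynomial degree with a Möbius transform and the immunity order from the Fourier condition, then counts functions that would violate degree ≤ n − k. The degree of the all-zero function is taken to be 0 here, which is a convention assumed for the demo.

```python
from itertools import combinations, product

def anf_degree(table, n):
    """Degree of the GF(2) polynomial of f, via the Moebius transform."""
    coeffs = list(table)  # coeffs[x] starts as f(x), x read as a bit mask
    for i in range(n):
        for x in range(1 << n):
            if (x >> i) & 1:
                coeffs[x] ^= coeffs[x ^ (1 << i)]
    # coeffs[x] is now the coefficient of the monomial with variable set x.
    return max((bin(x).count("1") for x in range(1 << n) if coeffs[x]), default=0)

def immunity_order(table, n):
    """Largest k with sum_x f(x) * (-1)^(S.x) = 0 for all 1 <= |S| <= k."""
    for k in range(1, n + 1):
        for S in combinations(range(n), k):
            mask = sum(1 << i for i in S)
            total = sum(table[x] * (-1) ** bin(x & mask).count("1")
                        for x in range(1 << n))
            if total != 0:
                return k - 1
    return n

n = 3
violations = 0
for values in product((0, 1), repeat=1 << n):
    if anf_degree(values, n) > n - immunity_order(values, n):
        violations += 1
print(violations)  # 0
```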
![Page 33: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/33.jpg)
33
End of Story
![Page 34: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/34.jpg)
34
Non-uniform distributions
• Correlation-immune functions are defined wrt the uniform distribution
• What if distribution is biased?
e.g. each bit 1 with probability ¾
![Page 35: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/35.jpg)
35
f(x1,x2) = parity(x1,x2)
each bit 1 with probability 3/4
x parity(x) P[x]
00 0 1/16
01 1 3/16
10 1 3/16
11 0 9/16
P[f=1|x1=1] ≠ P[f=1|x1=0]
![Page 36: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/36.jpg)
36
f(x1,x2) = parity(x1,x2)
each bit 1 with probability 3/4
x parity(x) P[x]
00 0 1/16
01 1 3/16
10 1 3/16
11 0 9/16
P[f=1|x1=1] ≠ P[f=1|x1=0]
For added irrelevant variables, would be equal
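The numbers behind this slide can be computed exactly with rational arithmetic. The sketch below adds one irrelevant variable x3 (an assumption for the demo) to show that its two conditional probabilities stay equal while those of x1 separate.

```python
from fractions import Fraction
from itertools import product

def conditional_prob(f, n, p, i, b):
    """P[f=1 | x_i=b] when each bit is 1 independently with probability p."""
    num = den = Fraction(0)
    for x in product((0, 1), repeat=n):
        if x[i] != b:
            continue
        weight = Fraction(1)
        for bit in x:
            weight *= p if bit else 1 - p
        den += weight
        num += weight * f(x)
    return num / den

f = lambda x: (x[0] + x[1]) % 2  # parity(x1, x2); x3 is irrelevant
p = Fraction(3, 4)
print(conditional_prob(f, 3, p, 0, 1), conditional_prob(f, 3, p, 0, 0))  # 1/4 3/4
print(conditional_prob(f, 3, p, 2, 1), conditional_prob(f, 3, p, 2, 0))  # 3/8 3/8
```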
![Page 37: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/37.jpg)
37
Correlation-immunity wrt p-biased distributions
Definitions
• f is correlation-immune wrt distribution D if PD[f=1|xi=1] = PD[f=1|xi=0] for all xi
• p-biased distribution Dp: each bit set to 1 independently with probability p
– For all p-biased distributions D, PD[f=1|xi=1] = PD[f=1|xi=0] for all irrelevant xi
![Page 38: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/38.jpg)
38
Lemma: Let f(x1,…,xn) be a Boolean function with r relevant variables. Then f is correlation immune w.r.t. Dp for at most r-1 values of p.
Pf: Correlation immune wrt Dp means
P[f=1|xi=1] – P[f=1|xi=0] = 0 (*)
for all xi.
Consider fixed f and xi. Can write lhs of (*) as polynomial h(p).
![Page 39: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/39.jpg)
39
• e.g. f(x1,x2,x3) = parity(x1,x2,x3), p-biased distribution Dp
h(p) = PDp[f=1|x1=1] - PDp[f=1|x1=0]
= ( p^2 + (1-p)^2 ) – ( p(1-p) + (1-p)p ) = (1-2p)^2
If add irrelevant variable, this polynomial doesn’t change
• h(p) for arbitrary f, variable xi, has degree <= r-1, where r is number of relevant variables.
• f correlation-immune wrt at most r-1 values of p, unless h(p) identically 0 for all xi.
![Page 40: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/40.jpg)
40
h(p) = PDp[f=1|xi=1] - PDp[f=1|xi=0]
$$P_{D_p}[f=1 \mid x_i=1] = \sum_{d=0}^{n-1} w_d \, p^d (1-p)^{n-1-d}$$
where wd is number of inputs x for which f(x)=1, xi=1, and x contains exactly d additional 1’s.
i.e. wd = number of positive assignments of f_{xi←1} of Hamming weight d
• Similar expression for PDp[f=1|xi=0]
![Page 41: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/41.jpg)
41
$$P_{D_p}[f=1 \mid x_i=1] - P_{D_p}[f=1 \mid x_i=0] = \sum_{d=0}^{n-1} (w_d - r_d) \, p^d (1-p)^{n-1-d}$$
where wd = number of positive assignments of f_{xi←1} of Hamming weight d
rd = number of positive assignments of f_{xi←0} of Hamming weight d
Not identically 0 iff wd ≠ rd for some d
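The counts wd and rd, and the polynomial they define, can be computed directly for a small example. The sketch below evaluates h(p) = Σ_d (wd − rd) p^d (1−p)^(n−1−d) exactly for parity on three variables; the function names are illustrative.

```python
from fractions import Fraction
from itertools import product

def weight_counts(f, n, i, b):
    """counts[d] = number of x with x_i = b, f(x) = 1, and d ones among the other bits."""
    counts = [0] * n
    for x in product((0, 1), repeat=n):
        if x[i] == b and f(x):
            counts[sum(x) - b] += 1
    return counts

def h(f, n, i, p):
    """P_Dp[f=1|x_i=1] - P_Dp[f=1|x_i=0], written in terms of w_d and r_d."""
    w = weight_counts(f, n, i, 1)
    r = weight_counts(f, n, i, 0)
    return sum((w[d] - r[d]) * p**d * (1 - p) ** (n - 1 - d) for d in range(n))

parity = lambda x: sum(x) % 2
w = weight_counts(parity, 3, 0, 1)
r = weight_counts(parity, 3, 0, 0)
print(w, r)  # [1, 0, 1] [0, 2, 0]
values = [h(parity, 3, 0, Fraction(j, 4)) for j in (1, 2, 3)]
print(values)
```

Here h(p) = (1−p)^2 − 2p(1−p) + p^2 = (1−2p)^2, which vanishes only at p = 1/2, the uniform distribution.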
![Page 42: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/42.jpg)
42
Property of Boolean functions
Lemma: If f has at least one relevant variable, then for some relevant variable xi and some d,
wd ≠ rd
where
wd = number of positive assignments of f_{xi←1} of Hamming weight d
rd = number of positive assignments of f_{xi←0} of Hamming weight d
![Page 43: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/43.jpg)
43
How much does it help to have access to examples from different distributions?
![Page 44: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/44.jpg)
44
How much does it help to have access to examples from different distributions?
Exploiting Product Distributions to Identify Relevant Variables of Correlation Immune Functions [Hellerstein, Rosell, Bach, Ray, Page]
![Page 45: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/45.jpg)
45
• Even if f is not correlation-immune wrt Dp, may need very large sample to detect relevant variable
– if value of p very near root of h(p)
• Lemma: If h(p) not identically 0, then for some value of p in the set
{ 1/(r+1), 2/(r+1), 3/(r+1), …, (r+1)/(r+1) },
|h(p)| ≥ 1/(r+1)^(r-1)
![Page 46: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/46.jpg)
46
• Algorithm to find a relevant variable
– Uses examples from distributions Dp, for p = 1/(r+1), 2/(r+1), 3/(r+1), …, (r+1)/(r+1)
– Sample size poly((r+1)^r, log n, log 1/δ)
[Essentially same algorithm found independently by Arpe and Mossel, using very different techniques]
• Another algorithm to find a relevant variable
– Based on proving (roughly) that if choose random p, then h^2(p) likely to be reasonably large. Uses prime number theorem.
– Uses examples from poly(2^r, log 1/δ) distributions Dp.
– Sample size poly(2^r, log n, log 1/δ)
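The first algorithm can be imitated in a few lines: draw examples from Dp for each p on the grid j/(r+1), estimate P[f=1|xi=1] − P[f=1|xi=0] for every variable, and flag those whose estimate is clearly non-zero. This is only a sketch in the spirit of the slide, not the authors' procedure. The target (parity of three of six bits), the sample size, and the 0.125 threshold are assumptions tuned to this demo rather than derived from the lemma's bound.

```python
import random

def sample(n, p, rng):
    """One draw from the p-biased distribution D_p."""
    return tuple(1 if rng.random() < p else 0 for _ in range(n))

def estimate_h(f, n, p, m, rng):
    """Estimate P[f=1|xi=1] - P[f=1|xi=0] for every i from m draws of D_p."""
    ones = [[0, 0] for _ in range(n)]
    total = [[0, 0] for _ in range(n)]
    for _ in range(m):
        x = sample(n, p, rng)
        y = f(x)
        for i, b in enumerate(x):
            total[i][b] += 1
            ones[i][b] += y
    estimates = []
    for i in range(n):
        if 0 in total[i]:  # one side never observed (e.g. p = 1)
            estimates.append(0.0)
        else:
            estimates.append(ones[i][1] / total[i][1] - ones[i][0] / total[i][0])
    return estimates

def find_relevant(f, n, r, m, threshold, rng):
    found = set()
    for j in range(1, r + 2):  # p = 1/(r+1), ..., (r+1)/(r+1)
        p = j / (r + 1)
        for i, est in enumerate(estimate_h(f, n, p, m, rng)):
            if abs(est) > threshold:
                found.add(i)
    return found

target = lambda x: (x[0] + x[2] + x[5]) % 2  # hypothetical: parity of 3 of 6 bits
found = sorted(find_relevant(target, n=6, r=3, m=8000, threshold=0.125,
                             rng=random.Random(0)))
print(found)
```

For this target h(p) = (1−2p)^2, so the uniform grid point p = 1/2 gives no signal while p = 1/4 and p = 3/4 each give a gap of 1/4 on the relevant variables.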
![Page 47: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/47.jpg)
47
Better algorithms?
![Page 48: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/48.jpg)
48
Summary
• Finding relevant variables (junta-learning)
• Correlation-immune functions
• Learning from p-biased distributions
![Page 49: Correlation Immune Functions and Learning](https://reader035.fdocuments.in/reader035/viewer/2022062321/56812ddb550346895d932a0e/html5/thumbnails/49.jpg)
49
Moral of the Story
• Handbook of integer sequences can be useful in doing literature search
• Eating lunch with the right person can be much more useful