Uniform Approximation of Functions with Random...

19
Uniform Approximation of Functions with Random Bases Ali Rahimi Intel Research Ben Recht UW Madison

Transcript of Uniform Approximation of Functions with Random...

Page 1: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Uniform Approximation of Functions with Random Bases

Ali Rahimi Intel Research

Ben Recht UW Madison

Page 2: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

•  Goal: Find a class F which is easy to search over, but can approximate complex behavior.

dictated by application Which space of functions?

classes covariate

state

Typically a list of

example inputs

Page 3: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Approximation Schemes

•  Approximate by

•  Jones (1992),

•  Barron (1993),

•  Girosi & Anzellotti (1995),

•  Using nearly identical analysis, all of these schemes achieve

Page 4: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Approximation Schemes

•  Approximate by

•  Parameter tuning is tricky…

•  (Can achieve via a greedy “algorithm”).

Simultaneously optimize

Page 5: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Randomize, don’t optimize

•  Approximate by

•  For which functions can we achieve ?

•  How are these functions related to objects we already know and love?

•  Practical Implementations

optimize sample

Page 6: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Function Class

•  Fix parameterized basis functions

•  Fix a probability distribution

•  Our target space will be:

•  With the convention that

Page 7: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Random Features: Example

•  Fourier basis functions:

•  Gaussian parameters

•  If , then means

that the frequency distribution of f has subgaussian tails.

Page 8: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

•  Thm: Let f be in with . Let θ1,…, θn be sampled iid from p. Then with probability at least 1 - δ:

•  If additionally, φ(x;θ)=φ(θ'x), with φ:R→R L-Lipschitz, φ(0)=0, and |φ|<1 and p has a finite second moment, then with probability at least 1- δ

where

Page 9: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Reproducing Kernel Hilbert Spaces

•  A symmetric function k:X£X ! R is a positive definite kernel if for all N

•  Reproducing Kernel Hilbert Space:

•  Extensive Applications: Support Vector Machines, Kernel Machines, etc.

Page 10: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

•  RKHS generated by k:

•  Fp is dense in H, and for any f 2 Fp

Page 11: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Gaussian RKHS vs Random Features

•  Representer Theorem: for many applications, the optimal function in an RKHS is of the form

•  RKHS form is preferred: when number of data points is small or the function is not smooth

•  Random Features are preferred: when number of data points is very large or the Representer theorem doesn’t apply

given data set

Page 12: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Fourier Random Features

RKHS is dense in continuous functions

Page 13: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Random Decision Stumps

Boosting Features

Page 14: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Binning Random Features

Lay a random grid so that for any x and y Pr[x and y are binned together] = k(x,y)

φ(x) is the bin ID, encoded as a binary indicator vector.

1 2 3 4

5 6 7 8

9 10 11 12

13 14 15 …

δ -δ

1

Page 15: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

% Approximates Gaussian Process regression

% with Gaussian kernel of variance gamma

% lambda: regularization parameter

% dataset: X is dxN, y is 1xN

% test: xtest is dx1

% D: dimensionality of random feature

% training

w = randn(D, size(X,1));

b = 2*pi*rand(D,1);

Z = cos(sqrt(gamma)*w*X + repmat(b,1,size(X,2)));

alpha = (lambda*eye(size(X,2)+Z*Z')\(Z*y);

% testing

ztest = alpha(:)’*cos( sqrt(gamma)*w*xtest(:) + …

+ repmat(b,1,size(X,2)) );

Page 16: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,
Page 17: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,
Page 18: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,
Page 19: Uniform Approximation of Functions with Random Basespages.cs.wisc.edu/~brecht/cs838docs/brecht.project.pdf · Gaussian RKHS vs Random Features • Representer Theorem: for many applications,

Randomize

Optimize