Nonparametric Information Estimation Using Rank-Based Euclidean Graph Optimization Methods
Barnabás Póczos
Department of Computing Science,
University of Alberta, Canada
Machine Learning Lunch
Carnegie Mellon University
Feb 8, 2010
Outline
GOAL: Information estimation
Background on information and entropy
Applications in machine learning
Graph optimization methods
TSP, MST, kNN
Copula transformation
Information estimation
Background on information and entropy
Who cares about dependence?
• Observing physical, biological, chemical systems
– Which observations are dependent / independent?
– Is there dependence between inputs and outputs?
• Neuroscience: dependence between stimuli and neural response
• Stock markets
– which stocks are dependent on each other, which are independent?
– is there dependence on other events?
What is dependence?
We want to measure dependence
Correlation?
… nope, that's not good enough…
Correlation can be zero even when there is dependence between the variables: for example, if X is symmetric around zero and Y = X², then corr(X, Y) = 0 although Y is a deterministic function of X.
Isn't this just an artificial problem?
No, it is a serious problem…
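To check this numerically, here is a minimal NumPy sketch (our own toy example, not from the slides): Y is fully determined by X, yet their correlation is essentially zero.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=100_000)  # symmetric around 0
y = x ** 2                                # fully determined by x

# Pearson correlation is ~0 even though y is a function of x.
print(np.corrcoef(x, y)[0, 1])  # ~0.00, up to sampling noise
```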
Correlation? … nope, that's not good enough
Correlation measures only pairwise dependencies.
What can we do when we have more variables?
[Scatter plot of samples (X1, X2): covariance matrix = I, cross-correlation = 0, but there is dependence.]

That is why PCA cannot separate mixed sources, and we need ICA methods that also take higher-order dependencies into account.
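A small sketch of this point, assuming scikit-learn is available (the rotation-mixing toy setup is ours): the mixture is white, so PCA has no second-order structure to exploit, while FastICA can use the higher-order structure to separate the sources.

```python
import numpy as np
from sklearn.decomposition import PCA, FastICA

rng = np.random.default_rng(0)
S = rng.uniform(-1, 1, size=(10_000, 2))        # two independent uniform sources
theta = np.pi / 4                               # mix them by a rotation
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
X = S @ A.T

# Cov(X) is proportional to I, so PCA sees no preferred directions:
print(np.round(np.cov(X.T), 3))
S_pca = PCA().fit_transform(X)                   # still a rotated (mixed) version
S_ica = FastICA(random_state=0).fit_transform(X) # recovers the sources up to
                                                 # permutation, sign, and scale
```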
What is dependence?
Alfréd Rényi (1961, Hungary)
α-divergence (Imre Csiszár, 1967, Hungary):

D_α(p ∥ q) = 1/(α−1) · log ∫ p(x)^α q(x)^{1−α} dx

Special cases:
• Rényi's α-information: I_α(X₁, …, X_d) = D_α( joint density ∥ product of the marginal densities )
• Shannon's mutual information: the limit α → 1

Theorem: D_α(p ∥ q) ≥ 0, with equality iff p = q; hence I_α = 0 exactly when the variables are independent.
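To make the α → 1 limit concrete, here is a small numerical check of our own (not from the slides): for two unit-variance Gaussians, the α-divergence computed by numerical integration approaches the KL divergence 0.5 as α → 1.

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import trapezoid

p = norm(loc=0.0, scale=1.0)      # N(0, 1)
q = norm(loc=1.0, scale=1.0)      # N(1, 1)
x = np.linspace(-12.0, 12.0, 100_001)

def renyi_divergence(alpha):
    """D_alpha(p || q) = log( integral of p^alpha * q^(1-alpha) ) / (alpha - 1)."""
    integrand = p.pdf(x) ** alpha * q.pdf(x) ** (1.0 - alpha)
    return np.log(trapezoid(integrand, x)) / (alpha - 1.0)

# KL(p || q) = (mu_p - mu_q)^2 / 2 = 0.5 for these two Gaussians.
for alpha in (0.5, 0.9, 0.99, 0.999):
    print(alpha, renyi_divergence(alpha))   # tends to 0.5 as alpha -> 1
```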
Measuring Dependence and Uncertainty (Claude Shannon, 1948, Princeton)

Shannon mutual information:
I(X; Y) = ∫∫ p(x, y) log [ p(x, y) / ( p(x) p(y) ) ] dx dy

Measuring uncertainty (Shannon entropy):
H(X) = − ∫ p(x) log p(x) dx
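On a small discrete joint distribution these quantities reduce to finite sums; a toy example of ours:

```python
import numpy as np

# Joint distribution of two binary variables, p(x, y).
pxy = np.array([[0.4, 0.1],
                [0.1, 0.4]])
px = pxy.sum(axis=1)   # marginal of X
py = pxy.sum(axis=0)   # marginal of Y

H = -np.sum(px * np.log(px))                      # Shannon entropy H(X)
I = np.sum(pxy * np.log(pxy / np.outer(px, py)))  # mutual information I(X; Y)
print(H, I)  # H(X) = log 2 ~ 0.693; I > 0 because X and Y are dependent
```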
Rényi’s information and entropy
Rényi’s entropy
Rényi’s information
Applications of mutual information in machine learning
Feature selection (Peng et al., 2005)

Goal:
• find a set of features that has the largest dependence with the target class
• find dependent and independent features

Microarray data processing: data matrix Xᵀ ∈ ℝ^{n×d} with class labels,
d = number of genes, n = number of patients
Feature selection (Peng et al., 2005)

Data matrix Xᵀ ∈ ℝ^{n×d} with labels; d = number of genes, n = number of patients.

• Max-dependency: choose the feature set S that maximizes I(x_S; y).
• Max-relevance: maximize the average relevance (1/|S|) Σ_{i∈S} I(x_i; y).
• Min-redundancy, Max-relevance (mRMR): additionally minimize the average redundancy (1/|S|²) Σ_{i,j∈S} I(x_i; x_j), since we don't want dependent features (a greedy sketch follows below).
Independent Subspace Analysis

Observation: x = A s
Hidden, independent sources (subspaces): s = (s¹; …; s^m), where the subspace components s¹, …, s^m are mutually independent.

Goal: estimate A and S observing samples from X only.
Independent Subspace Analysis

In case of perfect separation, WA is a block-permutation matrix.

Objective: find a demixing matrix W that makes the estimated subspace components as independent as possible, e.g. min_W I(y¹, …, y^m), where y = W x.
Mutual Information Applications
• Information theory, Channel capacity
• Information geometry
• Clustering
• Optimal experiment design, active learning
• Prediction of protein structure
• Drug design
• Boosting
• fMRI data processing
• …
The estimation problem
The estimation problem

GOAL: given i.i.d. samples X₁, …, X_n from an unknown density, estimate Rényi's information I_α and entropy H_α without estimating the density itself.
How can we estimate them?
Graph optimization methods
TSP, MST, kNN
History

• 1959, Beardwood, Halton, Hammersley
– given uniform i.i.d. samples on [0,1]², what is the length of the shortest TSP tour?
– more generally: uniform i.i.d. samples on [0,1]^d, TSP length = ?

• 1976, Karp
– randomly distributed cities
– this asymptotic result can be used to develop a deterministic polynomial-time near-optimal (1+ε) TSP algorithm (a.s.)
– first to show that there exists an algorithm for a stochastic version of an NP-complete problem (TSP) that gives a near-optimal solution in polynomial time (a.s.)

John Hammersley (Oxford, UK), Richard Karp (Berkeley, CA)
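TSP tours are expensive to compute, but the analogous asymptotics also hold for the MST (next slides), which is cheap. A small simulation of ours illustrating the d = 2 scaling, L(X_n) ≈ β √n:

```python
import numpy as np
from scipy.sparse.csgraph import minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
for n in (100, 400, 1600):
    pts = rng.uniform(size=(n, 2))             # uniform i.i.d. samples on [0,1]^2
    mst = minimum_spanning_tree(squareform(pdist(pts)))
    print(n, mst.sum() / np.sqrt(n))           # ratio stabilizes near a constant beta
```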
• 1981: TSP ⇒ MST and Minimal Matching graphs; a conjecture for kNN graphs (Michael Steele, Joseph Yukich, Wansoo Rhee)

• Beardwood, Halton, Hammersley: for observations X₁, …, X_n with density f on [0,1]^d,
L(X₁, …, X_n) / n^{(d−1)/d} → β ∫ f(x)^{(d−1)/d} dx (a.s.)

• Rényi's α-entropy estimation? The limit contains exactly ∫ f^α with α = (d−1)/d, the integral appearing in Rényi's entropy.

• 1999, Hero, Costa & Michel: turned this observation into graph-based Rényi entropy estimators.
Steele, Yukich theorem for MST, TSP, Minimal Matching: with edge weights |e|^p,
L_p(X₁, …, X_n) / n^{(d−p)/d} → β_p ∫ f(x)^{(d−p)/d} dx (a.s.)

(Fig. taken from J. Costa and A. Hero.)
Sensitive to outliers…!
A new result: this procedure works on Set-Nearest-Neighborhood graphs, too.

Calculate the total p-weighted edge length of the graph in which every node is connected to its s-th nearest neighbors for all s ∈ S:

S = {1, 3, 7} ⇒ consider the 1st, 3rd, and 7th nearest neighbors of each node
S = {k} ⇒ consider the k-th nearest neighbors of each node
S = {1, 2, …, k} ⇒ kNN graphs
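A sketch of the S-nearest-neighbor graph-length computation (our implementation, using SciPy's cKDTree; the set S and the edge-weight power p are the method's parameters):

```python
import numpy as np
from scipy.spatial import cKDTree

def set_nn_graph_length(points, S, p=1.0):
    """Sum of |edge|^p over the S-nearest-neighbor edges of every node."""
    tree = cKDTree(points)
    k_max = max(S)
    # Query k_max + 1 neighbors: column 0 is the point itself (distance 0),
    # and the s-th nearest neighbor sits in column s.
    dists, _ = tree.query(points, k=k_max + 1)
    cols = sorted(S)
    return np.sum(dists[:, cols] ** p)

rng = np.random.default_rng(0)
X = rng.uniform(size=(1000, 2))
print(set_nn_graph_length(X, S={1, 3, 7}, p=1.0))
```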
Rényi’s entropy estimators using Set-Nearest Neighborhood graphs
(Fig. taken from J. Costa and A. Hero)
How can we get information estimators from entropy estimators?
How can we make the estimators more robust?
The invariance trick
Information is preserved under monotonic transformations: I(g₁(X₁), …, g_d(X_d)) = I(X₁, …, X_d) for strictly increasing g₁, …, g_d.
Transformation to get uniform marginals

Monotone transformation leading to uniform marginals?

Prob theory 101: if X_j has continuous distribution function F_j, then F_j(X_j) is uniform on [0, 1].

The transformation (copula transformation): Z = ( F₁(X₁), …, F_d(X_d) ) is a monotone transform with uniform marginals.

• The information estimation problem is reduced to Rényi entropy estimation: I_α(X) = −H_α(Z).
• A little problem: we don't know the F_i distribution functions.
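The "Prob theory 101" fact is easy to verify numerically; a snippet of ours applying the true CDF to Gaussian samples and testing the result for uniformity:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=2.0, size=100_000)

u = stats.norm(loc=3.0, scale=2.0).cdf(x)   # F(X): the copula transform in 1D

# Kolmogorov-Smirnov test against Uniform[0,1]: a large p-value means
# the transformed sample is indistinguishable from uniform.
print(stats.kstest(u, "uniform"))
```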
Copula methods
• Very popular in financial analysis
– Capture dependence between marginal variables
– An alternative form of normalization
• … but not so widespread in machine learning…
Well, in financial analysis, they led to the global financial crisis…
Maybe we should use them in ML more often!
The copula transformation

Monotone transform ⇒ uniform marginals.
A little problem: we don't know the F_i distribution functions.

Solution:
Use the empirical distribution functions F_j^n and the empirical copula transform.
We need to estimate distribution functions in 1D only! ⇒ no curse of dimensionality.
Empirical copula transformation

We don't know the F₁, …, F_d distribution functions
⇒ estimate them with the empirical distribution functions:
F_j^n(x) = (1/n) Σ_{i=1}^n 1{ X_j(i) ≤ x }
Empirical copula transformation

"true" copula transformation: Z(i) = ( F₁(X₁(i)), …, F_d(X_d(i)) )
empirical copula transformation: Ẑ(i) = ( F₁^n(X₁(i)), …, F_d^n(X_d(i)) )
empirical copula transformation of the observations: the j-th coordinate of Ẑ(i) is rank(X_j(i)) / n, i.e. it uses the ranks only.
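In code, the empirical copula transform is just a column-wise rank transform; a minimal sketch using scipy.stats.rankdata:

```python
import numpy as np
from scipy.stats import rankdata

def empirical_copula(X):
    """Map each coordinate X[:, j] to rank / n: values on a grid in (0, 1]."""
    n = X.shape[0]
    return rankdata(X, axis=0) / n

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))
print(empirical_copula(X))   # each column is a permutation of {1/n, ..., 1}
```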
REGO Algorithm: Rank-based Euclidean Graph Optimization
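The algorithm details were on the original slide's figures; the following is our own minimal reconstruction of the pipeline described so far (rank-transform, measure a p-weighted kNN graph length, plug it into the graph-based entropy formula, and calibrate the constant β by Monte Carlo on uniform samples). All names and the calibration shortcut are ours, a sketch rather than the authors' reference implementation.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import rankdata

def knn_graph_length(Z, k=3, p=1.0):
    """Sum of |edge|^p over edges to the 1..k nearest neighbors of each point."""
    dists, _ = cKDTree(Z).query(Z, k=k + 1)    # column 0 is the point itself
    return np.sum(dists[:, 1:] ** p)

def rego_information(X, k=3, p=1.0, n_mc=20, seed=0):
    """Sketch of a REGO-style estimate of Renyi's alpha-information, alpha = (d-p)/d."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    alpha = (d - p) / d
    Z = rankdata(X, axis=0) / n                # empirical copula transform (ranks only)

    # Calibrate the unknown constant beta by Monte Carlo on uniform samples,
    # for which the copula is uniform and I_alpha = 0.
    beta = np.mean([knn_graph_length(rng.uniform(size=(n, d)), k, p)
                    for _ in range(n_mc)]) / n ** alpha

    H = np.log(knn_graph_length(Z, k, p) / (beta * n ** alpha)) / (1.0 - alpha)
    return -H                                  # I_alpha(X) = -H_alpha(copula of X)

rng = np.random.default_rng(1)
x = rng.normal(size=2000)
X_dep = np.column_stack([x, x + 0.1 * rng.normal(size=2000)])  # strongly dependent
X_ind = rng.normal(size=(2000, 2))                             # independent
print(rego_information(X_dep))   # clearly positive
print(rego_information(X_ind))   # close to 0
```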
Theoretical Results
consistency
Is consistency trivial?

Difficulties:
• The true copula transformation is not known… only the empirical one.
• The transformed samples lie on a grid; they do not come from the true copula distribution.

[Figure: samples after the true copula transform vs. after the empirical copula transform.]
Is consistency trivial?

Difficulties:
• The empirical-copula-transformed points are not i.i.d. in n
⇒ Steele & Yukich's theorem cannot be applied directly.

We need different proofs for the two cases:
• 0 < p ≤ 1: the weight |·|^p satisfies the triangle inequality, but it is not Lipschitz
• 1 < p < d/2: |·|^p is Lipschitz, but there is no triangle inequality
Consistency Results

Main theorem: the rank-based graph estimator is a strongly consistent estimator of Rényi's α-information, Î_α → I_α (a.s.), for α = (d−p)/d.

Note:
• Consistent MI estimator using ranks only
• Ranks only ⇒ robust
• We don't have theoretical results for other d and α values…
Empirical Results
Empirical results, consistency in 2D
[Plot: estimate vs. sample size n, using the k-MST graph with k_MST = 0.8 n.]
Empirical results, consistency in 10D
Image registration
n = 4900 = 70 × 70 samples, with 200 outliers
Independent subspace analysis
Independent subspace analysis in the presence of outliers
Infinitesimal robustness (finite-sample influence function)

The finite-sample influence function measures the amount of change caused by adding one outlier, x (with some modification).

For the rank-based estimator, this change cannot be arbitrarily big!
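One way to see why ranks bound the influence of a single outlier (our illustration, not from the slides): adding one extreme point changes a raw-distance graph length by an amount that grows with the outlier's magnitude, but barely moves the rank-transformed graph, because ranks are bounded.

```python
import numpy as np
from scipy.spatial import cKDTree
from scipy.stats import rankdata

def knn_length(Z, k=3):
    dists, _ = cKDTree(Z).query(Z, k=k + 1)
    return np.sum(dists[:, 1:])

def ranks(A):
    return rankdata(A, axis=0) / A.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 2))
X_out = np.vstack([X, [[1e6, 1e6]]])           # add one huge outlier

print(knn_length(X_out) - knn_length(X))       # grows with the outlier's magnitude
print(knn_length(ranks(X_out)) - knn_length(ranks(X)))  # stays small
```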
Conclusion
The marriage of seemingly unrelated mathematical areas produced a consistent and robust estimator of Rényi's information:

Graph optimization methods (TSP, MST, kNN) + Copula transformation ⇒ Information estimation
Thanks for your attention!
“Hey Rege, Róka Rege, Hey REGŐ Rejtem…
I am calling my grandfather, I feel his soul here.
I am listening to his words, they are breaking the silence.
Impart your knowledge to your grandson please,
1000 years have already passed…,
Knowledge equals life, hey!”