Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way...
Date posted: 21-Dec-2015
![Page 1: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/1.jpg)
Object Oriented Data Analysis, Last Time
• Kernel Embedding
  – Use linear methods in a non-linear way
• Support Vector Machines
  – Completely Non-Gaussian Classification
• Distance Weighted Discrimination
  – HDLSS Improvement of SVM
  – Used in microarray data combination
  – Face Data, Male vs. Female
![Page 2: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/2.jpg)
Support Vector Machines
Forgotten last time,
Important Extension:
Multi-Class SVMs
Hsu & Lin (2002)
Lee, Lin, & Wahba (2002)
• Defined for “implicit” version
• “Direction Based” variation???
![Page 3: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/3.jpg)
Distance Weighted Discrim’n
2-d Visualization:
Pushes Plane Away From Data
All Points Have Some Influence
$$\min_{w,\beta} \sum_{i=1}^{n} \frac{1}{r_i}$$
![Page 4: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/4.jpg)
Distance Weighted Discrim’n
Maximal Data Piling
![Page 5: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/5.jpg)
HDLSS Discrim’n Simulations
Main idea:
Comparison of
• SVM (Support Vector Machine)
• DWD (Distance Weighted Discrimination)
• MD (Mean Difference, a.k.a. Centroid)
Linear versions, across dimensions
![Page 6: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/6.jpg)
HDLSS Discrim’n Simulations
Overall Approach:
• Study different known phenomena
  – Spherical Gaussians
  – Outliers
  – Polynomial Embedding
• Common Sample Sizes: $n_+ = n_- = 25$
• But wide range of dimensions: $d = 10, 40, 100, 400, 1600$
![Page 7: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/7.jpg)
HDLSS Discrim’n Simulations
Spherical Gaussians:
![Page 8: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/8.jpg)
HDLSS Discrim’n Simulations
Spherical Gaussians:
• Same setup as before
• Means shifted in dim 1 only, $\mu_1 = 2.2$
• All methods pretty good
• Harder problem for higher dimension
• SVM noticeably worse
• MD best (Likelihood method)
• DWD very close to MD
• Methods converge for higher dimension??
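The spherical Gaussian setting can be imitated in a few lines of NumPy. This is a minimal sketch, not the code behind the plotted results: it only evaluates the Mean Difference rule, and it assumes the two classes sit at +2.2 and −2.2 in dimension 1 with 25 training points each. The function name `simulate_md_error`, the test-set size and the seed are illustrative choices.

```python
import numpy as np

def simulate_md_error(d, n_per_class=25, shift=2.2, n_test=1000, seed=0):
    """Test error of the Mean Difference (centroid) rule on spherical Gaussians."""
    rng = np.random.default_rng(seed)
    mu = np.zeros(d)
    mu[0] = shift  # assumption: classes differ only in dimension 1, at +/- shift

    def draw(n, sign):
        return rng.standard_normal((n, d)) + sign * mu

    x_plus, x_minus = draw(n_per_class, +1), draw(n_per_class, -1)
    # MD direction is the difference of class means; cut at the midpoint
    w = x_plus.mean(axis=0) - x_minus.mean(axis=0)
    midpoint = 0.5 * (x_plus.mean(axis=0) + x_minus.mean(axis=0))

    t_plus, t_minus = draw(n_test, +1), draw(n_test, -1)
    err_plus = np.mean((t_plus - midpoint) @ w <= 0)   # "+" points called "-"
    err_minus = np.mean((t_minus - midpoint) @ w > 0)  # "-" points called "+"
    return 0.5 * (err_plus + err_minus)

errors = {d: simulate_md_error(d) for d in (10, 40, 100, 400, 1600)}
for d, e in errors.items():
    print(f"d={d:5d}  MD test error={e:.3f}")
```

With the sample size fixed, the estimated direction picks up noise from every added coordinate, which is one way to see why the problem gets harder as the dimension grows.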
![Page 9: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/9.jpg)
HDLSS Discrim’n Simulations
Outlier Mixture:
![Page 10: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/10.jpg)
HDLSS Discrim’n Simulations
Outlier Mixture:
80% dim. 1 $\mu_1 = 2.2$, other dims 0
20% dim. 1 ±100, dim. 2 ±500, others 0
• MD is a disaster, driven by outliers
• SVM & DWD are both very robust
• SVM is best
• DWD very close to SVM (insig’t difference)
• Methods converge for higher dimension??
Ignore RLR (a mistake)
![Page 11: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/11.jpg)
HDLSS Discrim’n Simulations
Wobble Mixture:
![Page 12: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/12.jpg)
HDLSS Discrim’n Simulations
Wobble Mixture:
80% dim. 1 $\mu_1 = 2.2$, other dims 0
20% dim. 1 ±0.1, rand dim ±100, others 0
• MD still very bad, driven by outliers
• SVM & DWD are both very robust
• SVM loses (affected by margin push)
• DWD slightly better (by w’ted influence)
• Methods converge for higher dimension??
Ignore RLR (a mistake)
![Page 13: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/13.jpg)
HDLSS Discrim’n Simulations
Nested Spheres:
![Page 14: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/14.jpg)
HDLSS Discrim’n Simulations
Nested Spheres:
1st d/2 dim’s, Gaussian with var 1 or C
2nd d/2 dim’s, the squares of the 1st dim’s
(as for 2nd degree polynomial embedding)
• Each method best somewhere
• MD best in highest d (data non-Gaussian)
• Methods not comparable (realistic)
• Methods converge for higher dimension??
• HDLSS space is a strange place
Ignore RLR (a mistake)
![Page 15: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/15.jpg)
HDLSS Discrim’n Simulations
Conclusions:
• Everything (sensible) is best sometimes
• DWD often very near best
• MD weak beyond Gaussian
Caution about simulations (and examples):
• Very easy to cherry pick best ones
• Good practice in Machine Learning
  – “Ignore method proposed, but read paper for useful comparison of others”
![Page 16: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/16.jpg)
HDLSS Discrim’n Simulations
Caution: There are additional players
E.g. Regularized Logistic Regression
looks also very competitive
Interesting Phenomenon:
All methods come together
in very high dimensions???
![Page 17: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/17.jpg)
UNC, Stat & OR
HDLSS Asymptotics: Simple Paradoxes, I
For $d$ dim’al Standard Normal dist’n:
$$Z = \begin{pmatrix} Z_1 \\ \vdots \\ Z_d \end{pmatrix} \sim N_d(0, I_d)$$
Euclidean Distance to Origin (as $d \to \infty$):
$$\|Z\| = \sqrt{d} + O_p(1)$$
- Data lie roughly on surface of sphere of radius $\sqrt{d}$
- Yet origin is point of highest density???
- Paradox resolved by: density w. r. t. Lebesgue Measure
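A quick numerical check of this concentration is easy to run. The following is a minimal sketch under arbitrary choices of dimensions, number of draws and seed: it draws standard normal vectors and compares their Euclidean norms with $\sqrt{d}$.

```python
import numpy as np

rng = np.random.default_rng(0)
n_draws = 200  # number of simulated vectors per dimension (arbitrary)
summary = {}
for d in (10, 100, 1000, 10000):
    z = rng.standard_normal((n_draws, d))   # rows are N_d(0, I_d) draws
    norms = np.linalg.norm(z, axis=1)       # Euclidean distance to origin
    summary[d] = (norms.mean(), norms.std())
    print(f"d={d:6d}  sqrt(d)={np.sqrt(d):7.2f}  "
          f"mean norm={norms.mean():7.2f}  sd={norms.std():.2f}")
```

The mean norm tracks $\sqrt{d}$ while the spread stays bounded, so the relative spread shrinks as $d$ grows.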
![Page 18: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/18.jpg)
HDLSS Asymptotics: Simple Paradoxes, II
For $d$ dim’al Standard Normal dist’n: $Z_1$ indep. of $Z_2 \sim N_d(0, I_d)$
Euclidean Dist. between $Z_1$ and $Z_2$ (as $d \to \infty$):
Distance tends to non-random constant:
$$\|Z_1 - Z_2\| = \sqrt{2d} + O_p(1)$$
Can extend to $Z_1, \ldots, Z_n$
Where do they all go???
(we can only perceive 3 dim’ns)
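The pairwise version can be checked the same way. This sketch (sample size, dimension and seed are arbitrary choices) computes all pairwise distances among a handful of standard normal vectors and divides them by $\sqrt{2d}$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 10, 20000
z = rng.standard_normal((n, d))

# All pairwise Euclidean distances via the Gram matrix
gram = z @ z.T
sq_norms = np.diag(gram)
sq_dist = sq_norms[:, None] + sq_norms[None, :] - 2.0 * gram
dist = np.sqrt(np.maximum(sq_dist, 0.0))    # guard against tiny negatives

pairwise = dist[np.triu_indices(n, k=1)]    # the n(n-1)/2 distinct pairs
ratio = pairwise / np.sqrt(2 * d)
print(f"min ratio={ratio.min():.4f}  max ratio={ratio.max():.4f}")
```

All ratios sit close to 1, i.e. every pair of points is at nearly the same distance.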
![Page 19: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/19.jpg)
HDLSS Asymptotics: Simple Paradoxes, III
For $d$ dim’al Standard Normal dist’n: $Z_1$ indep. of $Z_2 \sim N_d(0, I_d)$
High dim’al Angles (as $d \to \infty$):
$$\mathrm{Angle}(Z_1, Z_2) = 90^\circ + O_p(d^{-1/2})$$
- Everything is orthogonal???
- Where do they all go??? (again our perceptual limitations)
- Again 1st order structure is non-random
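The near-orthogonality can also be seen numerically. In this minimal sketch (the helper name `pairwise_angles_deg`, the dimensions and the seed are illustrative choices), the largest deviation from 90 degrees among all pairs is recorded for several dimensions.

```python
import numpy as np

def pairwise_angles_deg(z):
    """Angles (in degrees) between all distinct pairs of rows of z."""
    unit = z / np.linalg.norm(z, axis=1, keepdims=True)
    cos = np.clip(unit @ unit.T, -1.0, 1.0)   # clip for numerical safety
    return np.degrees(np.arccos(cos[np.triu_indices(len(z), k=1)]))

rng = np.random.default_rng(2)
spread = {}
for d in (10, 1000, 100000):
    angles = pairwise_angles_deg(rng.standard_normal((10, d)))
    spread[d] = np.abs(angles - 90.0).max()
    print(f"d={d:6d}  max |angle - 90| = {spread[d]:.2f} degrees")
```

The deviation from a right angle shrinks as the dimension grows, in line with the $d^{-1/2}$ rate.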
![Page 20: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/20.jpg)
HDLSS Asy’s: Geometrical Representation, I
Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Subspace Generated by Data
Hyperplane through 0, of dimension $n$
Points are “nearly equidistant to 0”, & dist $\sim \sqrt{d}$
Within plane, can “rotate towards $\sqrt{d} \times$ Unit Simplex”
All Gaussian data sets are “near Unit Simplex Vertices”!!!
“Randomness” appears only in rotation of simplex
Hall, Marron & Neeman (2005)
![Page 21: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/21.jpg)
HDLSS Asy’s: Geometrical Representation, II
Assume $Z_1, \ldots, Z_n \sim N_d(0, I_d)$, let $d \to \infty$
Study Hyperplane Generated by Data
$n - 1$ dimensional hyperplane
Points are pairwise equidistant, dist $\sim \sqrt{2d}$
Points lie at vertices of “regular $n$-hedron”
Again “randomness in data” is only in rotation
Surprisingly rigid structure in data?
![Page 22: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/22.jpg)
HDLSS Asy’s: Geometrical Representation, III
Simulation View: shows “rigidity after rotation”
![Page 23: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/23.jpg)
HDLSS Asy’s: Geometrical Representation, III
Straightforward Generalizations:
non-Gaussian data: only need moments
non-independent: use “mixing conditions”
Mild Eigenvalue condition on Theoretical Cov. (with J. Ahn, K. Muller & Y. Chi)
All based on simple “Laws of Large Numbers”
![Page 24: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/24.jpg)
HDLSS Asy’s: Geometrical Representation, IV
Explanation of Observed (Simulation) Behavior:
“everything similar for very high d ”
2 popn’s are 2 simplices (i.e. regular n-hedrons)
All are same distance from the other class
i.e. everything is a support vector
i.e. all sensible directions show “data piling”
so “sensible methods are all nearly the same”
Including 1 - NN
![Page 25: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/25.jpg)
HDLSS Asy’s: Geometrical Representation, V
Further Consequences of Geometric Representation
1. Inefficiency of DWD for uneven sample size
   (motivates weighted version, work in progress)
2. DWD more stable than SVM
   (based on deeper limiting distributions)
   (reflects intuitive idea feeling sampling variation)
   (something like mean vs. median)
3. 1-NN rule inefficiency is quantified.
![Page 26: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/26.jpg)
The Future of Geometrical Representation?
HDLSS version of “optimality” results?
“Contiguity” approach? Params depend on d?
Rates of Convergence?
Improvements of DWD?
(e.g. other functions of distance than inverse)
It is still early days …
![Page 27: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/27.jpg)
NCI 60 Data
Recall from Sept. 6 & 8
NCI 60 Cell Lines
Interesting benchmark, since same cells
Data Web available: http://discover.nci.nih.gov/datasetsNature2000.jsp
Both cDNA and Affymetrix Platforms
![Page 28: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/28.jpg)
NCI 60: Fully Adjusted Data, Melanoma Cluster
BREAST.MDAMB435 BREAST.MDN MELAN.MALME3M MELAN.SKMEL2 MELAN.SKMEL5 MELAN.SKMEL28 MELAN.M14 MELAN.UACC62 MELAN.UACC257
![Page 29: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/29.jpg)
NCI 60: Fully Adjusted Data, Leukemia Cluster
LEUK.CCRFCEM LEUK.K562 LEUK.MOLT4 LEUK.HL60 LEUK.RPMI8266 LEUK.SR
![Page 30: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/30.jpg)
NCI 60: Views using DWD Dir’ns (focus on biology)
![Page 31: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/31.jpg)
Real Clusters in NCI 60 Data?
From Sept. 8: Simple Visual Approach:
• Randomly relabel data (Cancer Types)
• Recompute DWD dir’ns & visualization
• Get heuristic impression from this
• Some types appeared signif’ly different
• Others did not
Deeper Approach:
• Formal Hypothesis Testing
![Page 32: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/32.jpg)
HDLSS Hypothesis Testing
Approach: DiProPerm Test
Direction – Projection – Permutation
Ideas:
• Find an appropriate Direction vector
• Project data into that 1-d subspace
• Construct a 1-d test statistic
• Analyze significance by Permutation
![Page 33: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/33.jpg)
HDLSS Hypothesis Testing – DiProPerm test
DiProPerm Test
Context:
Given 2 sub-populations, X & Y
Are they from the same distribution?
Or significantly different?
$H_0: \mathcal{L}_X = \mathcal{L}_Y$ vs. $H_1: \mathcal{L}_X \neq \mathcal{L}_Y$
![Page 34: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/34.jpg)
HDLSS Hypothesis Testing – DiProPerm test
Reasonable Direction vectors:
Mean Difference
SVM
Maximal Data Piling
DWD (used in the following)
Any good discrimination direction…
![Page 35: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/35.jpg)
HDLSS Hypothesis Testing – DiProPerm test
Reasonable Projected 1-d statistics:
Two sample t-test (used here)
Chi-square test for different variances
Kolmogorov - Smirnov
Any good distributional test…
![Page 36: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/36.jpg)
HDLSS Hypothesis Testing – DiProPerm test
DiProPerm Test Steps:
1. For original data:
   • Find Direction vector
   • Project Data, Compute True Test Statistic
2. For (many) random relabellings of data:
   • Find Direction vector
   • Project Data, Compute Perm’d Test Stat
3. Compare:
   • True Stat among population of Perm’d Stat’s
   • Quantile gives p-value
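The three steps can be sketched in NumPy as follows. This is an illustration under stated assumptions, not the implementation used for the NCI 60 results: the Mean Difference direction stands in for DWD (it is one of the listed reasonable directions and needs no optimizer), the 1-d statistic is a Welch two-sample t-statistic, and the function names, toy data and number of permutations are all hypothetical choices.

```python
import numpy as np

def t_stat(a, b):
    """Two-sample (Welch) t-statistic for 1-d samples a and b."""
    se = np.sqrt(a.var(ddof=1) / len(a) + b.var(ddof=1) / len(b))
    return (a.mean() - b.mean()) / se

def md_direction(x, y):
    """Unit Mean Difference direction (stand-in for DWD in this sketch)."""
    w = x.mean(axis=0) - y.mean(axis=0)
    return w / np.linalg.norm(w)

def diproperm(x, y, n_perm=500, seed=0):
    """Direction, Projection, Permutation test for rows of x vs. rows of y."""
    rng = np.random.default_rng(seed)

    def statistic(a, b):
        w = md_direction(a, b)           # Direction
        return t_stat(a @ w, b @ w)      # Projection and 1-d statistic

    observed = statistic(x, y)
    pooled = np.vstack([x, y])
    perm_stats = np.empty(n_perm)
    for k in range(n_perm):              # Permutation: direction is re-found
        idx = rng.permutation(len(pooled))
        perm_stats[k] = statistic(pooled[idx[:len(x)]], pooled[idx[len(x):]])
    # One-sided p-value: share of permuted statistics at least as large
    p_value = (1 + np.sum(perm_stats >= observed)) / (1 + n_perm)
    return observed, perm_stats, p_value

# Toy HDLSS data: 20 + 20 points in d = 100, shifted in the first 20 coordinates
rng = np.random.default_rng(42)
x = rng.standard_normal((20, 100))
x[:, :20] += 1.0
y = rng.standard_normal((20, 100))
observed, perm_stats, p_value = diproperm(x, y)
print(f"observed t = {observed:.1f}, "
      f"median permuted t = {np.median(perm_stats):.1f}, p = {p_value:.3f}")
```

Recomputing the direction inside every permutation is the important design choice: the direction is chosen to separate the two groups, so even relabelled data give large t-statistics, and the observed value has to be judged against that inflated reference population.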
![Page 37: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/37.jpg)
HDLSS Hypothesis Testing – DiProPerm test
Remarks:
• Generally can’t use standard null dist’ns…
• e.g. Student’s t-table, for t-statistic
• Because Direction and Projection give nonstandard context
• I.e. violate traditional assumptions
• E.g. DWD finds separating directions
• Giving completely invalid test
• This motivates Permutation approach
![Page 38: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/38.jpg)
Improved Statistical Power - NCI 60 Melanoma
![Page 39: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/39.jpg)
Improved Statistical Power - NCI 60 Leukemia
![Page 40: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/40.jpg)
Improved Statistical Power - NCI 60 NSCLC
![Page 41: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/41.jpg)
Improved Statistical Power - NCI 60 Renal
![Page 42: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/42.jpg)
Improved Statistical Power - NCI 60 CNS
![Page 43: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/43.jpg)
Improved Statistical Power - NCI 60 Ovarian
![Page 44: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/44.jpg)
Improved Statistical Power - NCI 60 Colon
![Page 45: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/45.jpg)
Improved Statistical Power - NCI 60 Breast
![Page 46: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/46.jpg)
Improved Statistical Power - Summary
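| Type | cDNA-t | Affy-t | Comb-t | Affy-P | Comb-P |
|---|---|---|---|---|---|
| Melanoma | 36.8 | 39.9 | 51.8 | e-7 | 0 |
| Leukemia | 18.3 | 23.8 | 27.5 | 0.12 | 0.00001 |
| NSCLC | 17.3 | 25.1 | 23.5 | 0.18 | 0.02 |
| Renal | 15.6 | 20.1 | 22.0 | 0.54 | 0.04 |
| CNS | 13.4 | 18.6 | 18.9 | 0.62 | 0.21 |
| Ovarian | 11.2 | 20.8 | 17.0 | 0.21 | 0.27 |
| Colon | 10.3 | 17.4 | 16.3 | 0.74 | 0.58 |
| Breast | 13.8 | 19.6 | 19.3 | 0.51 | 0.16 |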
![Page 47: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/47.jpg)
HDLSS Hypothesis Testing – DiProPerm test
Many Open Questions on DiProPerm Test:
Which Direction is “Best”?
Which 1-d Projected test statistic?
Permutation vs. altern’es (bootstrap?)???
How do these interact?
What are asymptotic properties?
![Page 48: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/48.jpg)
Independent Component Analysis
Idea: Find dir’ns that maximize indepen’ce
Motivating Context: Signal Processing
Blind Source Separation
References:
• Cardoso (1989)
• Cardoso & Souloumiac (1993)
• Lee (1998)
• Hyvärinen and Oja (1999)
• Hyvärinen, Karhunen and Oja (2001)
![Page 49: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/49.jpg)
Independent Component Analysis
ICA, motivating example: Cocktail party problem
Hear several simultaneous conversations
would like to “separate them”
Model for “conversations”: time series:
$s_1(t)$ and $s_2(t)$
![Page 50: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/50.jpg)
Independent Component Analysis
Cocktail Party Problem
![Page 51: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/51.jpg)
Independent Component Analysis
ICA, motivating example: Cocktail party problem
What the ears hear:
Ear 1: Mixed version of signals:
$$x_1(t) = a_{11} s_1(t) + a_{12} s_2(t)$$
Ear 2: A second mixture:
$$x_2(t) = a_{21} s_1(t) + a_{22} s_2(t)$$
![Page 52: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/52.jpg)
Independent Component Analysis
What the ears hear: Mixed versions
![Page 53: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/53.jpg)
Independent Component Analysis
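Goal: Recover “signal”
$$s(t) = \begin{pmatrix} s_1(t) \\ s_2(t) \end{pmatrix}$$
from “data”
$$x(t) = \begin{pmatrix} x_1(t) \\ x_2(t) \end{pmatrix}$$
for unknown “mixture matrix”
$$A = \begin{pmatrix} a_{11} & a_{12} \\ a_{21} & a_{22} \end{pmatrix},$$
where $x = A s$, for all $t$
Goal is to find “separating weights”, $W$,
so that $s = W x$, for all $t$
Problem: $W = A^{-1}$ would be fine,
but $A$ is unknown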
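The mixing model can be made concrete with a toy example. This is a minimal sketch in which the two source signals and the entries of the mixing matrix are hypothetical choices; it shows only the easy case where the mixing matrix is known, so that its inverse separates the signals exactly.

```python
import numpy as np

t = np.linspace(0.0, 1.0, 500)
s = np.vstack([
    np.sin(2 * np.pi * 5 * t),        # s_1(t): sinusoid (illustrative)
    2.0 * ((3.0 * t) % 1.0) - 1.0,    # s_2(t): sawtooth (illustrative)
])

A = np.array([[1.0, 0.5],
              [0.3, 1.0]])            # hypothetical mixture matrix
x = A @ s                             # rows are what ear 1 and ear 2 hear

W = np.linalg.inv(A)                  # only available when A is known
s_recovered = W @ x
print(np.allclose(s_recovered, s))
```

In the blind setting the matrix is not available, so the separating weights have to be estimated from the mixed data alone.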
![Page 54: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/54.jpg)
Independent Component Analysis
Solution 1: PCA
![Page 55: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/55.jpg)
Independent Component Analysis
Solution 2: ICA
![Page 56: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/56.jpg)
Independent Component Analysis
“Solutions” for Cocktail Party example:
Approach 1: PCA (on “population of 2-d vectors”)
Directions of Greatest Variability do not solve this problem
Approach 2: ICA (will describe method later)
Independent Component directions do solve the problem
(modulo “sign changes” and “identification”)
![Page 57: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/57.jpg)
Independent Component Analysis
Relation to FDA: recall “data matrix”
$$X = \begin{pmatrix} X_{11} & \cdots & X_{1n} \\ \vdots & \ddots & \vdots \\ X_{d1} & \cdots & X_{dn} \end{pmatrix}$$
Signal Processing: focus on rows ($d$ time series, for $t = 1, \ldots, n$)
Functional Data Analysis: focus on columns ($n$ data vectors)
Note: same 2 different viewpoints as dual problems in PCA
![Page 58: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/58.jpg)
Independent Component Analysis
FDA Style Scatterplot View - Signals
$\{(s_1(t), s_2(t)) : t = 1, \ldots, n\}$
![Page 59: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/59.jpg)
Independent Component Analysis
FDA Style Scatterplot View - Data
$\{(x_1(t), x_2(t)) : t = 1, \ldots, n\}$
![Page 60: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/60.jpg)
Independent Component Analysis
FDA Style Scatterplot View:
• Scatterplots give hint how blind recovery is possible
• Affine Transformation $x = A s$ stretches indep’t signals into dependent
• Inversion is key to ICA (even when $A$ is unknown)
![Page 61: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/61.jpg)
Independent Component Analysis
Why not PCA?
• Finds direction of greatest variability
• Wrong direction for signal separation
![Page 62: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/62.jpg)
Independent Component Analysis
ICA Step 1:
• “sphere the data” (i.e. find linear transfo to make mean = $0$, cov = $I$)
• i.e. work with $Z = \hat{\Sigma}^{-1/2} (X - \bar{X})$
• requires $X$ of full rank (at least $n \geq d$, i.e. no HDLSS)
• search for independence beyond linear and quadratic structure
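Sphering can be sketched with an eigendecomposition of the sample covariance. The following is a minimal sketch, assuming a $d \times n$ data matrix whose columns are data vectors; the function name `sphere`, the rank tolerance and the toy data are illustrative choices.

```python
import numpy as np

def sphere(x):
    """Sphere a d x n data matrix (columns are data vectors)."""
    centered = x - x.mean(axis=1, keepdims=True)
    cov = centered @ centered.T / (x.shape[1] - 1)
    evals, evecs = np.linalg.eigh(cov)          # cov is symmetric
    if evals.min() <= 1e-12 * evals.max():
        raise ValueError("sample covariance is not of full rank")
    inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T   # cov^(-1/2)
    return inv_sqrt @ centered

rng = np.random.default_rng(0)
A = np.array([[1.0, 0.5],
              [0.3, 1.0]])                      # hypothetical mixing matrix
x = A @ rng.uniform(-1.0, 1.0, size=(2, 1000))  # mixed, correlated data
z = sphere(x)
print(np.round(np.cov(z), 6))                   # sample covariance after sphering
```

The rank check reflects the full-rank requirement: with fewer data vectors than dimensions the sample covariance is singular and the inverse square root does not exist.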
![Page 63: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/63.jpg)
Independent Component Analysis
ICA Step 2:
• Find directions that make (sphered) data as independent as possible
• Worst case: Gaussian
  Sphered data are independent
Interesting “converse application” of C.L.T.:
• For $S_1$ and $S_2$ independent (& non-Gaussian)
• $X_1 = u S_1 + \sqrt{1 - u^2}\, S_2$ is “more Gaussian” for $0 < |u| < 1$
• so maximal independence comes from least Gaussian directions
![Page 64: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/64.jpg)
Independent Component Analysis
ICA Step 2:
• Find dir’ns that make (sphered) data as independent as possible
Recall “independence” means:
Joint distribution is product of Marginals
In cocktail party example:
• Happens only when rotated so support parallel to axes
• Otherwise have blank areas, while marginals are non-zero
![Page 65: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/65.jpg)
Independent Component Analysis
Parallel Idea (and key to algorithm):
Find directions that max non-Gaussianity
Reason:
• starting from independent coordinates
most projections are Gaussian
(since projection is “linear combo”)
Mathematics behind this:
Diaconis and Freedman (1984)
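In two dimensions, "find directions that max non-Gaussianity" can be sketched as a brute-force scan over projection angles. This is only an illustration, assuming uniform sources, a hypothetical mixing rotation `theta0`, and absolute kurtosis as the non-Gaussianity measure; it is not the search used by any particular ICA package.

```python
import numpy as np

def kurt(y):
    """Sample version of E y^4 - 3 (E y^2)^2 for centered y."""
    y = y - y.mean()
    return np.mean(y ** 4) - 3 * np.mean(y ** 2) ** 2

rng = np.random.default_rng(1)
n = 5000
# Two independent, unit-variance, non-Gaussian (uniform) sources.
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, n))

theta0 = 0.4                                    # hypothetical mixing rotation
R = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
Z = R @ S                                       # rotated data stay sphered

# Scan unit directions (cos t, sin t); score each projection by |kurtosis|.
thetas = np.linspace(0, np.pi / 2, 181)
scores = [abs(kurt(np.cos(t) * Z[0] + np.sin(t) * Z[1])) for t in thetas]
theta_hat = thetas[int(np.argmax(scores))]      # should land near theta0
```

Projections at intermediate angles mix both sources and so look more Gaussian; the score peaks where the projection isolates a single source.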
![Page 66: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/66.jpg)
Independent Component Analysis
Worst case for ICA:
• Gaussian marginals
• Then sphered data are independent
• So have independence in all directions
• Thus can’t find useful directions
Gaussian distribution is characterized by:
Independent & spherically symmetric
![Page 67: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/67.jpg)
Independent Component Analysis
Criteria for non-Gaussianity / independence:
• kurtosis ($E X^4 - 3 (E X^2)^2$, 4th order cumulant)
• negative entropy
• mutual information
• nonparametric maximum likelihood
• “infomax” in neural networks
• interesting connections between these
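A short worked example shows why the kurtosis criterion measures departure from Gaussianity. The two distributions below are chosen here for illustration, and $X$ is assumed to have mean zero.

```latex
\begin{align*}
\kappa(X) &= E X^4 - 3\,(E X^2)^2 \\
X \sim N(0, \sigma^2): \quad & E X^2 = \sigma^2,\ E X^4 = 3\sigma^4
  \ \Rightarrow\ \kappa = 3\sigma^4 - 3\sigma^4 = 0 \\
X \sim U(-a, a): \quad & E X^2 = \tfrac{a^2}{3},\ E X^4 = \tfrac{a^4}{5}
  \ \Rightarrow\ \kappa = \tfrac{a^4}{5} - \tfrac{a^4}{3} = -\tfrac{2a^4}{15}
\end{align*}
```

For a unit-variance uniform ($a = \sqrt{3}$) this gives $\kappa = -1.2$, so directions with kurtosis far from zero are far from Gaussian.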
![Page 68: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/68.jpg)
Independent Component Analysis
Matlab Algorithm (optimizing any of above):
FastICA
• http://www.cis.hut.fi/projects/ica/fastica/
• Numerical gradient search method
• Can find directions iteratively
• Or by simultaneous optimization
• Appears fast, with good defaults
• Should we worry about local optima???
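For the kurtosis criterion, the fixed-point update usually associated with Hyvärinen and Oja's FastICA is $w \leftarrow E[z (w^T z)^3] - 3w$, followed by renormalization. The sketch below applies that update to find a single direction. It is an illustration in Python, not the Matlab package's code; the function name, defaults, and toy data are assumptions.

```python
import numpy as np

def one_unit_kurtosis_ica(Z, n_iter=100, tol=1e-10, seed=0):
    """Find one unit vector w for which w @ Z has extreme kurtosis.

    Z is d x n and is assumed to be already sphered.
    """
    rng = np.random.default_rng(seed)
    w = rng.standard_normal(Z.shape[0])
    w /= np.linalg.norm(w)
    for _ in range(n_iter):
        y = w @ Z                                    # current projection
        w_new = (Z * y ** 3).mean(axis=1) - 3 * w    # E[z (w'z)^3] - 3 w
        w_new /= np.linalg.norm(w_new)
        converged = abs(abs(w_new @ w) - 1) < tol    # same direction up to sign
        w = w_new
        if converged:
            break
    return w

rng = np.random.default_rng(2)
S = rng.uniform(-np.sqrt(3), np.sqrt(3), size=(2, 5000))   # independent sources
theta0 = 0.4                                               # hypothetical rotation
R = np.array([[np.cos(theta0), -np.sin(theta0)],
              [np.sin(theta0),  np.cos(theta0)]])
Z = R @ S
w = one_unit_kurtosis_ica(Z)
alignment = np.abs(R.T @ w)    # |cosine| between w and each source direction
```

Which source direction is found depends on the random start, which relates to the local optima question above: different starts can converge to different, equally valid directions.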
![Page 69: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/69.jpg)
Independent Component Analysis
Notational summary:
1. First sphere data: $Z = \hat{\Sigma}^{-1/2} X$
2. Apply ICA: find rotation $W_S$ to make rows of $S_S = W_S Z$ independent
3. Can transform back to original data scale: $S = \hat{\Sigma}^{1/2} S_S$
![Page 70: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/70.jpg)
Independent Component Analysis
Identifiability problem 1:
Generally can’t order rows of $S_S$ (& $S$)
Since for a permutation matrix $P$
(pre-multiplication by $P$ swaps rows)
(post-multiplication by $P$ swaps columns)
for each column, $z = A_S s_S = A_S P^{-1} P s_S$, i.e. $P s_S = P W_S z$
So $P S_S$ and $P W_S$ are also solutions (i.e. $P S_S = P W_S Z$)
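The row-order ambiguity can be checked numerically. In this sketch the mixing matrix `A_S`, the sources, and the permutation are all hypothetical toy values.

```python
import numpy as np

rng = np.random.default_rng(3)
A_S = np.array([[0.8, -0.6],
                [0.6,  0.8]])              # hypothetical mixing matrix (a rotation)
W_S = np.linalg.inv(A_S)                    # corresponding unmixing matrix
S_S = rng.uniform(-1, 1, size=(2, 6))       # rows are the source signals
Z = A_S @ S_S                               # observed (sphered) data

P = np.array([[0.0, 1.0],
              [1.0, 0.0]])                  # permutation matrix: swaps two rows

S_perm = P @ S_S                            # pre-multiplication swaps rows
A_perm = A_S @ np.linalg.inv(P)             # post-multiplication swaps columns
W_perm = P @ W_S

same_data = np.allclose(A_perm @ S_perm, Z)         # A_S P^{-1} P S_S = A_S S_S
also_solution = np.allclose(W_perm @ Z, S_perm)     # P W_S Z = P S_S
```

Both checks hold, so the data alone cannot distinguish $(W_S, S_S)$ from $(P W_S, P S_S)$.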
![Page 71: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/71.jpg)
Independent Component Analysis
Identifiability problem 1: Row Order
Saw this in Cocktail Party Example
FastICA: orders by non-Gaussian-ness?
![Page 72: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/72.jpg)
Independent Component Analysis
Identifiability problem 2: Can’t find scale of elements of $s$
Since for a (full rank) diagonal matrix $D$
(pre-multiplication by $D$ is scalar multiplication of rows)
(post-multiplication by $D$ is scalar multiplication of columns)
for each column, $z = A_S s_S = A_S D^{-1} D s_S$, i.e. $D s_S = D W_S z$
So $D S_S$ and $D W_S$ are also solutions
![Page 73: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/73.jpg)
Independent Component Analysis
Identifiability problem 2: Signal Scale
Not so clear in Cocktail Party Example
![Page 74: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/74.jpg)
Independent Component Analysis
Signal Processing Scale identification: (Hyvärinen and Oja)
Choose scale so each signal $s_i(t)$ has unit average energy (based on $\sum_t s_i(t)^2$):
• Preserves energy along rows of data matrix
• Explains same scales in Cocktail Party Example
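The scale convention can be sketched as a row-wise normalization. This assumes "average energy" means the mean over $t$ of $s_i(t)^2$; the function name and the scaling matrix `D` are hypothetical.

```python
import numpy as np

def unit_average_energy(S):
    """Rescale each row s_i so that the mean over t of s_i(t)^2 equals 1."""
    energy = np.mean(S ** 2, axis=1, keepdims=True)
    return S / np.sqrt(energy)

rng = np.random.default_rng(4)
S_S = rng.uniform(-1, 1, size=(2, 1000))    # toy source signals in rows
D = np.diag([5.0, 0.2])                     # hypothetical arbitrary row scaling

a = unit_average_energy(S_S)
b = unit_average_energy(D @ S_S)            # same signals after rescaling by D
```

With positive diagonal entries in `D` the two normalized versions coincide, so the convention removes the scale ambiguity. A negative entry would still flip the sign of a row, which this normalization does not resolve.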
![Page 75: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/75.jpg)
Independent Component Analysis
Would like to do:
• More toy examples
• Illustrating how non-Gaussianity works
Like to see some?
Check out old course notes:
http://www.stat.unc.edu/postscript/papers/marron/Teaching/CornellFDA/Lecture03-11-02/FDA03-11-02.pdf
http://www.stat.unc.edu/postscript/papers/marron/Teaching/CornellFDA/Lecture03-25-02/FDA03-25-02.pdf
![Page 76: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/76.jpg)
Independent Component Analysis
One more “Would like to do”:
ICA testing of multivariate Gaussianity
Usual approaches: 1-d tests on marginals
New Idea: use ICA to find “least Gaussian
Directions”, and base test on those.
Koch, Marron and Chen (2004)
![Page 77: Object Orie’d Data Analysis, Last Time Kernel Embedding –Use linear methods in a non-linear way Support Vector Machines –Completely Non-Gaussian Classification.](https://reader038.fdocuments.in/reader038/viewer/2022103022/56649d575503460f94a35642/html5/thumbnails/77.jpg)
Unfortunately Not Covered
• DWD & Micro-array Outcomes Data
• Windup from FDA04-22-02.doc
– General Conclusion
– Validation