Learning Statistical Property Testers · vakilian/TTI-slides/kannan.pdf · 2019-08-21 · *...
Learning Statistical Property Testers
Sreeram Kannan, University of Washington, Seattle
Collaborators
University of Washington, Seattle: Arman Rahimzamani, Himanshu Asnani, Sudipto Mukherjee
UT Austin: Rajat Sen
IBM Research: Karthikeyan Shanmugam
Statistical Property Testing
✤ Closeness testing
✤ Independence testing
✤ Conditional Independence testing
✤ Information estimation
Testing Total Variation Distance
[Figure: two distributions P and Q, with n samples drawn from each]
Estimate $D_{TV}(P, Q)$?
P and Q can be arbitrary.
Search beyond traditional density estimation methods.
Testing Total Variation: Prior Art
✤ Lots of work in CS theory on $D_{TV}$ testing
✤ Based on closeness testing between P and Q
✤ Sample complexity = $O(n^a)$, where n = alphabet size
✤ Curse of dimensionality: if $n = 2^d$, the complexity is $O(2^{ad})$
* Chan et al, Optimal Algorithms for testing closeness of discrete distributions, SODA 2014
* Sriperumbudur et al, Kernel choice and classifiability for RKHS embeddings of probability distributions, NIPS 2009
Classifiers beat curse-of-dimensionality
✤ Deep NN and boosted random forests achieve state-of-the-art performance
✤ Work very well in practice, even when X is high-dimensional.
✤ Exploit generic inductive biases:
✤ Invariance
✤ Hierarchical Structure
✤ Symmetry
Theoretical guarantees lag severely behind practice!
Distance Estimation via Classification
[Figure: n samples $\sim P$ (label 0) and n samples $\sim Q$ (label 1) are fed to a classifier]
Classification error of the optimal Bayes classifier $= \frac{1}{2} - \frac{1}{2} D_{TV}(P, Q)$.
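The identity holds because, with equal class priors, the Bayes classifier picks the more likely label at each x and so errs with probability $\frac{1}{2}\min(P(x), Q(x))$ there; summing over x gives $\frac{1}{2}(1 - D_{TV}(P, Q))$. A minimal exact check for discrete distributions, assuming distributions are given as Python dicts of probabilities (the function names are illustrative, not from the slides):

```python
from fractions import Fraction


def total_variation(p, q):
    """D_TV(P, Q) = 1/2 * sum_x |P(x) - Q(x)| for finite distributions given as dicts."""
    support = set(p) | set(q)
    return sum(abs(p.get(x, 0) - q.get(x, 0)) for x in support) / 2


def bayes_error(p, q):
    """Error of the Bayes classifier for label 0 ~ P vs label 1 ~ Q with equal priors.

    At each x it predicts the label with the larger likelihood, so it errs
    with probability 1/2 * min(P(x), Q(x)) at that x.
    """
    support = set(p) | set(q)
    return sum(min(p.get(x, 0), q.get(x, 0)) for x in support) / 2


P = {"a": Fraction(1, 2), "b": Fraction(1, 2)}
Q = {"a": Fraction(1, 4), "b": Fraction(1, 4), "c": Fraction(1, 2)}
assert bayes_error(P, Q) == Fraction(1, 2) - total_variation(P, Q) / 2  # both sides are 1/4
```

Exact fractions are used so the equality can be checked without floating-point tolerance.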
With a trained classifier (deep NN, boosted trees, etc.) in place of the Bayes classifier:
Classification error of any classifier $\geq \frac{1}{2} - \frac{1}{2} D_{TV}(P, Q)$, so $1 - 2 \times (\text{classification error})$ is a lower bound on $D_{TV}(P, Q)$.
Can get p-value control.
* Lopez-Paz et al, Revisiting Classifier two-sample tests, ICLR 2017
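A minimal sketch of such a classifier two-sample test on scalar data. As assumptions for illustration only: a one-dimensional decision stump stands in for the deep NN or boosted trees, half of the pooled data is used for training and half for testing, and the p-value uses the approximation that held-out accuracy is Binomial(n_test, 1/2) when P = Q. The helper names `fit_stump` and `classifier_two_sample_test` are hypothetical.

```python
import math
import random


def fit_stump(xs, ys):
    """Fit a 1-D decision stump that predicts label 1 iff sign * (x - thr) > 0.

    Returns the (thr, sign) pair with the smallest training 0-1 error.
    """
    pairs = sorted(zip(xs, ys))
    n, total_ones = len(pairs), sum(ys)
    best_err, best = n + 1, (0.0, 1)
    ones_left = 0
    for i in range(n + 1):  # split: pairs[:i] on the left, pairs[i:] on the right
        if i > 0:
            ones_left += pairs[i - 1][1]
        if i == 0:
            thr = -math.inf
        elif i == n:
            thr = math.inf
        else:
            thr = (pairs[i - 1][0] + pairs[i][0]) / 2
        zeros_left = i - ones_left
        ones_right = total_ones - ones_left
        zeros_right = (n - i) - ones_right
        # sign=+1 predicts 1 on the right side; sign=-1 predicts 1 on the left side.
        for sign, err in ((1, ones_left + zeros_right), (-1, zeros_left + ones_right)):
            if err < best_err:
                best_err, best = err, (thr, sign)
    return best


def classifier_two_sample_test(p_samples, q_samples, seed=0):
    """Label P-samples 0 and Q-samples 1, train on one half, test on the other.

    Returns (lower bound on D_TV(P, Q), one-sided p-value for H0: P = Q).
    """
    rng = random.Random(seed)
    data = [(x, 0) for x in p_samples] + [(x, 1) for x in q_samples]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    thr, sign = fit_stump([x for x, _ in train], [y for _, y in train])
    correct = sum(int(sign * (x - thr) > 0) == y for x, y in test)
    n_te = len(test)
    error = 1 - correct / n_te
    # Under H0 the held-out accuracy is approximately Binomial(n_te, 1/2).
    p_value = sum(math.comb(n_te, k) for k in range(correct, n_te + 1)) / 2 ** n_te
    # error >= 1/2 - D_TV/2 rearranges to D_TV >= 1 - 2 * error (up to sampling noise).
    return max(0.0, 1 - 2 * error), p_value


if __name__ == "__main__":
    rng = random.Random(1)
    p = [rng.gauss(0.0, 1.0) for _ in range(500)]
    q = [rng.gauss(3.0, 1.0) for _ in range(500)]
    print(classifier_two_sample_test(p, q))
```

For N(0, 1) versus N(3, 1) the true $D_{TV}$ is about 0.87, so the returned bound should land near that value with a very small p-value; for two samples from the same distribution the bound should stay near 0.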
Independence Testing
n samples $\{(x_i, y_i)\}_{i=1}^n$
$H_0: X \perp\!\!\!\perp Y$ ($P_{CI}$) vs. $H_1: X \not\perp\!\!\!\perp Y$ ($P$)
Classify $P = p(x, y)$ vs. $P_{CI} = p(x)p(y)$, obtaining samples of $P_{CI}$ by permutation.
Split the n samples $\{(x_i, y_i)\}_{i=1}^n$ equally:
✤ First half: kept as is, so distributed as $P = p(x, y)$; label 0.
✤ Second half: the $y_i$'s are permuted, emulating $P_{CI} = p(x)p(y)$; label 1.
P-value control.
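The split-and-permute construction can be sketched as follows, assuming the samples arrive as a list of (x, y) pairs; the function name is hypothetical, and the labelled triples would then be handed to any classifier, exactly as in the two-sample setting.

```python
import random


def make_independence_dataset(samples, seed=0):
    """Build the labelled dataset for a classifier-based independence test.

    samples: list of (x, y) pairs drawn i.i.d. from p(x, y).
    Returns (x, y, label) triples: label 0 for the untouched half (~ p(x, y)),
    label 1 for the half whose y's were permuted (approximately ~ p(x) p(y)).
    """
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    half = len(samples) // 2
    joint, rest = samples[:half], samples[half:]
    ys = [y for _, y in rest]
    rng.shuffle(ys)  # re-pairs each x with an unrelated y, breaking any dependence
    product = [(x, y) for (x, _), y in zip(rest, ys)]
    return [(x, y, 0) for x, y in joint] + [(x, y, 1) for x, y in product]
```

Permuting keeps both marginals exactly, which is why the second half mimics $p(x)p(y)$ without estimating any density.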
Conditional Independence Testing
n samples $\{(x_i, y_i, z_i)\}_{i=1}^n$
$H_0: X \perp\!\!\!\perp Y \mid Z$ ($P_{CI}$) vs. $H_1: X \not\perp\!\!\!\perp Y \mid Z$ ($P$)
Classify $P = p(x, y, z)$ vs. $P_{CI} = p(z)p(x|z)p(y|z)$.
How to get samples of $P_{CI} = p(z)p(x|z)p(y|z)$?
Given samples $\sim p(x, z)$, how to emulate $p(y|z)$?
Emulate $p(y|z)$ as $q(y|z)$, and classify $P = p(x, y, z)$ vs. $P_{CI} = p(z)p(x|z)q(y|z)$:
✤ KNN based methods
✤ Kernel methods
✤ [KCIT] Zhang et al, Kernel-based conditional independence test and application in causal discovery, UAI 2011
✤ [KCIPT] Doran et al, A permutation-based kernel conditional independence test, UAI 2014
✤ [CCIT] Sen et al, Model-Powered Conditional Independence Test, NIPS 2017
✤ [RCIT] Strobl et al, Approximate Kernel-based Conditional Independence Tests for Fast Non-Parametric Causal Discovery, arXiv
✤ Limited to low-dimensional Z.
In practice, Z is often high-dimensional (e.g., in a graphical model, the conditioning set can be the entire graph).
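A minimal sketch of a KNN-based emulation of $p(y|z)$, assuming z is a tuple of floats and Euclidean distance: each query point borrows the y of the pool sample whose z is closest. This is a simplified illustration in the spirit of nearest-neighbour approaches, not a reproduction of any of the cited tests; the function name `knn_mimic` is hypothetical.

```python
import math


def knn_mimic(pool, queries):
    """Emulate y ~ p(y|z) by borrowing the y of the nearest neighbour in z.

    pool:    (y, z) pairs drawn from p(y, z); z is a tuple of floats.
    queries: (x, z) pairs; each is paired with the y of the pool sample
             whose z is closest in Euclidean distance.
    Returns (x, y', z) triples, a stand-in for samples of p(z)p(x|z)q(y|z).
    """
    out = []
    for x, z in queries:
        y_nn, _ = min(pool, key=lambda pair: math.dist(z, pair[1]))
        out.append((x, y_nn, z))
    return out
```

Nearest neighbours in z become less informative as the dimension of z grows, which is one way to see the low-dimensional limitation.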
Generative Models beat curse-of-dimensionality
[Figure: a generator maps z in a low-dimensional latent space to x in a high-dimensional data space]
✤ Trained on real samples of x
✤ Can generate any number of new samples
How loose can the estimate of $P_{CI}$, i.e., of $q(y|z)$, be?
A novel bias-cancellation method makes Mimic-and-Classify work as long as the density $q(y|z) > 0$ whenever $p(y, z) > 0$.
Mimic functions: GANs, regressors, etc.
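As a minimal sketch of a regressor-based mimic function, assuming scalar y and z: fit $y \approx a + b z$ by least squares and sample $y'$ from a Gaussian around the fit. The added Gaussian noise keeps $q(y|z) > 0$ for every y, so the positivity condition holds even when the linear fit is poor. The names `fit_regression_mimic` and `mimic_step` are hypothetical; `mimic_step` applies the function to a dataset of (x, y, z) triples.

```python
import random
import statistics


def fit_regression_mimic(ys, zs, seed=0):
    """Hypothetical regressor-based MIMIC function for scalar y and z.

    Fits y ≈ a + b*z by ordinary least squares and returns a sampler that
    draws y' ~ N(a + b*z, s^2), with s the residual standard deviation.
    """
    mz, my = statistics.fmean(zs), statistics.fmean(ys)
    b = sum((z - mz) * (y - my) for y, z in zip(ys, zs)) / sum((z - mz) ** 2 for z in zs)
    a = my - b * mz
    s = statistics.pstdev([y - (a + b * z) for y, z in zip(ys, zs)])
    s = max(s, 1e-6)  # strictly positive noise keeps q(y|z) > 0 everywhere
    rng = random.Random(seed)
    return lambda z: rng.gauss(a + b * z, s)


def mimic_step(d2, mimic):
    """Replace each (x, y, z) by (x, y', z), with y' drawn from the mimic given z."""
    return [(x, mimic(z), z) for x, _, z in d2]
```

A deliberately simple mimic is enough here, because the classify step is designed to cancel the bias introduced by an imperfect $q(y|z)$.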
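Mimic and Classify
Mimic step: split $D \sim p(x, y, z)$ into $D_1 \sim p(x, y, z)$ and $D_2 \sim p(x, y, z)$. For each $(x_i, y_i, z_i)$ in dataset $D_2$, feed $z_i$ to the MIMIC function to obtain $y'_i$, and form $(x_i, y'_i, z_i)$; these triples make up dataset $D'$.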
Classify step: the mimicked dataset $D' \sim p(z)p(x|z)q(y|z)$ is pooled with $D_1 \sim p(x, y, z)$ into $D = D_1 \cup D'$, with $D_1$ labelled 0 and $D'$ labelled 1.
✤ Classifier on $D$ using $(x, y, z)$: classification error $E_{xyz}$, governed by $D\big(p(x,y,z),\, p(x,z)q(y|z)\big)$.
✤ Drop x to form $D_{-x}$; classifier on $(y, z)$ only: classification error $E_{yz}$, governed by $D\big(p(y,z),\, p(z)q(y|z)\big)$.
Statistic $= E_{xyz} - E_{yz}$: cancels the bias due to $q(y|z)$.
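A toy end-to-end sketch of Mimic-and-Classify on binary x, y, z. Assumptions for illustration only: a made-up joint $p(x, y, z)$, a deliberately crude mimic $q(y|z) = 1/2$, and a per-cell majority-vote (plug-in) classifier in place of a deep classifier; all function names are hypothetical. For this toy the population value of $E_{xyz} - E_{yz}$ is 0 under conditional independence and $0.25 - 0.35 = -0.10$ in the dependent case, even though $q(y|z)$ is wrong in both.

```python
import random
from collections import Counter


def sample_p(n, dependent, rng):
    """Toy p(x, y, z) on bits: z ~ Bern(1/2), y = z w.p. 0.8.

    If dependent, x = y (X and Y dependent given Z); otherwise
    x = z w.p. 0.8 independently of y (X independent of Y given Z).
    """
    out = []
    for _ in range(n):
        z = rng.randint(0, 1)
        y = z if rng.random() < 0.8 else 1 - z
        x = y if dependent else (z if rng.random() < 0.8 else 1 - z)
        out.append((x, y, z))
    return out


def plug_in_error(train, test, keep):
    """Majority-vote classifier per discrete feature cell; returns test error."""
    votes = Counter()
    for feats, label in train:
        votes[(keep(feats), label)] += 1
    errors = 0
    for feats, label in test:
        cell = keep(feats)
        pred = int(votes[(cell, 1)] > votes[(cell, 0)])
        errors += pred != label
    return errors / len(test)


def mimic_and_classify(data, rng):
    half = len(data) // 2
    d1, d2 = data[:half], data[half:]
    # Mimic step with a crude q(y|z) = 1/2: positive everywhere, but biased.
    d_prime = [(x, rng.randint(0, 1), z) for x, _, z in d2]
    pooled = [(s, 0) for s in d1] + [(s, 1) for s in d_prime]
    rng.shuffle(pooled)
    cut = len(pooled) // 2
    train, test = pooled[:cut], pooled[cut:]
    e_xyz = plug_in_error(train, test, keep=lambda s: s)      # uses (x, y, z)
    e_yz = plug_in_error(train, test, keep=lambda s: s[1:])   # drops x
    return e_xyz - e_yz
```

Each error on its own is far from 1/2 because $q(y|z)$ is biased; only the difference isolates the dependence of x on y given z.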
Mimic and Classify
As long as the density $q(y|z) > 0$ whenever $p(y, z) > 0$:
$$|\mathbb{E}_D[E_{xyz}] - \mathbb{E}_D[E_{yz}]| = 0 \iff H_0 \text{ is true},$$
where
$$2\,|\mathbb{E}_D[E_{xyz}] - \mathbb{E}_D[E_{yz}]| = D_{TV}\big(p(z,x,y),\, p(z)q(y|z)p(x|z)\big) - D_{TV}\big(p(y,z),\, p(z)q(y|z)\big).$$
*The errors here are the corresponding optimal Bayes classifier errors.
Mimic and Classify (Theory)
$$2\,|\mathbb{E}_D[E_{xyz}] - \mathbb{E}_D[E_{yz}]| = D_{TV}\big(p(z,x,y),\, p(z)q(y|z)p(x|z)\big) - D_{TV}\big(p(y,z),\, p(z)q(y|z)\big) \geq \int_{y,z} \min\big(p(z)q(y|z),\, p(z)p(y|z)\big)\,\big(1 - \epsilon(y,z)\big)\, d(y,z),$$
where
$$\epsilon(y,z) = \max_{\pi \in \Pi(p(x|z),\, p(x'|y,z))} \mathbb{E}_\pi\big[\mathbb{1}\{x = x'\} \mid y, z\big]$$
and $\Pi(\cdot, \cdot)$ is the set of couplings of the two conditionals.
Conditional dependence $\iff \epsilon(y,z) < 1$ with non-zero probability.
Theorem 1: As long as the density $q(y|z) > 0$ whenever $p(y, z) > 0$, conditional dependence implies that $2\,|\mathbb{E}_D[E_{xyz}] - \mathbb{E}_D[E_{yz}]| > 0$.
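For discrete x the maximum in $\epsilon(y,z)$ is attained by a maximal coupling, so $\epsilon(y,z) = \sum_x \min\big(p(x|z), p(x|y,z)\big) = 1 - D_{TV}\big(p(x|z), p(x|y,z)\big)$, which equals 1 exactly when $p(x|y,z) = p(x|z)$. A small sketch, assuming the two conditionals are given as dicts (the function name is hypothetical):

```python
from fractions import Fraction


def coupling_agreement(p_x_given_z, p_x_given_yz):
    """epsilon(y, z): the largest P(x = x') over couplings of the two conditionals.

    For discrete x the maximal coupling achieves sum_x min(p(x|z), p(x|y,z)),
    which equals 1 - D_TV(p(x|z), p(x|y,z)).
    """
    support = set(p_x_given_z) | set(p_x_given_yz)
    return sum(min(p_x_given_z.get(x, 0), p_x_given_yz.get(x, 0)) for x in support)


# Toy dependent case (x = y, y = z w.p. 4/5): p(x|z=0) = (4/5, 1/5),
# but p(x|y=0, z=0) puts all its mass on x = 0, so epsilon = 4/5 < 1.
eps_dependent = coupling_agreement({0: Fraction(4, 5), 1: Fraction(1, 5)}, {0: Fraction(1)})
# Conditionally independent case: p(x|y, z) = p(x|z), so epsilon = 1.
eps_independent = coupling_agreement({0: Fraction(3, 4), 1: Fraction(1, 4)},
                                     {0: Fraction(3, 4), 1: Fraction(1, 4)})
```

A value below 1 on a set of positive probability is what makes the lower bound in Theorem 1 strictly positive.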
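Mimic and Classify (Theory)
Conditional independence implies $p(x, y, z) = p(z)\,p(y \mid z)\,p(x \mid z)$.
$2\,\lvert \mathbb{E}_D[\mathcal{E}_{xyz}] - \mathbb{E}_D[\mathcal{E}_{yz}] \rvert = D_{TV}\big(p(z, x, y),\, p(z)\,q(y \mid z)\,p(x \mid z)\big) - D_{TV}\big(p(y, z),\, p(z)\,q(y \mid z)\big)$
$D_{TV}\big(p(z)\,p(y \mid z),\, p(z)\,q(y \mid z)\big) = D_{TV}\big(p(x \mid z)\,p(z)\,p(y \mid z),\, p(x \mid z)\,p(z)\,q(y \mid z)\big) = D_{TV}\big(p(x, y, z),\, p(x \mid z)\,p(z)\,q(y \mid z)\big)$
The first equality holds because both distributions share the factor $p(x \mid z)$, which integrates out of the total variation distance; the second uses the conditional independence factorization. Since $p(z)\,p(y \mid z) = p(y, z)$, the two terms on the right-hand side of the first identity are equal.
Theorem 2
Conditional independence implies that $2\,\lvert \mathbb{E}_D[\mathcal{E}_{xyz}] - \mathbb{E}_D[\mathcal{E}_{yz}] \rvert = 0$.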
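A quick numerical check of the two identities on this slide, on a hypothetical binary model in which $X \perp Y \mid Z$ holds by construction and the mimic $q(y \mid z)$ is deliberately imperfect. The pmfs and the `tv` helper are illustrative assumptions, not part of the original method.

```python
from itertools import product

def tv(p, q):
    """Total variation distance between two pmfs stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Hypothetical binary model: p(x, y, z) = p(z) p(y|z) p(x|z), so X is independent of Y given Z.
p_z = {0: 0.4, 1: 0.6}
p_x_given_z = {0: {0: 0.7, 1: 0.3}, 1: {0: 0.2, 1: 0.8}}
p_y_given_z = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.9, 1: 0.1}}
# Imperfect mimic q(y|z), deliberately different from p(y|z).
q_y_given_z = {0: {0: 0.6, 1: 0.4}, 1: {0: 0.7, 1: 0.3}}

cells = list(product((0, 1), repeat=3))  # all (x, y, z)
p_xyz = {(x, y, z): p_z[z] * p_y_given_z[z][y] * p_x_given_z[z][x] for x, y, z in cells}
mimic_xyz = {(x, y, z): p_z[z] * q_y_given_z[z][y] * p_x_given_z[z][x] for x, y, z in cells}
p_yz = {(y, z): p_z[z] * p_y_given_z[z][y] for y, z in product((0, 1), repeat=2)}
mimic_yz = {(y, z): p_z[z] * q_y_given_z[z][y] for y, z in product((0, 1), repeat=2)}

tv_xyz = tv(p_xyz, mimic_xyz)  # D_TV(p(x,y,z), p(x|z) p(z) q(y|z))
tv_yz = tv(p_yz, mimic_yz)     # D_TV(p(y,z), p(z) q(y|z))
print(f"{tv_xyz:.4f} {tv_yz:.4f}")
```

Both distances come out equal (0.16 here), so their difference, and hence the statistic in Theorem 2, is zero even though $q(y \mid z) \neq p(y \mid z)$.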
Mimic and Classify (Theory)
Combining Theorem 1 and Theorem 2:
Theorem 3
As long as the density function $q(y \mid z) > 0$ whenever $p(y, z) > 0$,
$\lvert \mathbb{E}_D[\mathcal{E}_{xyz}] - \mathbb{E}_D[\mathcal{E}_{yz}] \rvert = 0 \iff H_0$ is true.
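For the converse direction, the sketch below idealizes both classifiers as Bayes optimal with balanced classes, so that each expected error equals $(1 - D_{TV})/2$, and uses a perfect mimic $q(y \mid z) = p(y \mid z)$, which satisfies the positivity condition. Both idealizations and the example pmfs are assumptions made for illustration.

```python
from itertools import product

VALS = (0, 1)

def tv(p, q):
    """Total variation distance between two pmfs stored as dicts."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

def bayes_error(p, q):
    """Error of the Bayes-optimal classifier for p vs. q with equal class priors."""
    return 0.5 * (1.0 - tv(p, q))

def mc_statistic(p_xyz):
    """2|E_xyz - E_yz| with Bayes-optimal classifiers and a perfect mimic q(y|z) = p(y|z)."""
    p_z = {z: sum(p_xyz[x, y, z] for x in VALS for y in VALS) for z in VALS}
    p_yz = {(y, z): sum(p_xyz[x, y, z] for x in VALS) for y in VALS for z in VALS}
    p_xz = {(x, z): sum(p_xyz[x, y, z] for y in VALS) for x in VALS for z in VALS}
    # Mimicked triples follow p(z) q(y|z) p(x|z) = p(y, z) p(x, z) / p(z).
    mimic_xyz = {(x, y, z): p_yz[y, z] * p_xz[x, z] / p_z[z]
                 for x, y, z in product(VALS, repeat=3)}
    e_xyz = bayes_error(p_xyz, mimic_xyz)
    e_yz = bayes_error(p_yz, dict(p_yz))  # perfect mimic: (y, z) pairs are indistinguishable
    return 2 * abs(e_xyz - e_yz)

# Hypothetical examples: uniform (conditionally independent) vs. X copying Y w.p. 0.9.
independent = {k: 0.125 for k in product(VALS, repeat=3)}
dependent = {(x, y, z): 0.25 * (0.9 if x == y else 0.1) for x, y, z in product(VALS, repeat=3)}
print(mc_statistic(independent), mc_statistic(dependent))
```

The statistic is 0 for the conditionally independent pmf and 0.4 for the one where $X$ copies $Y$ with probability 0.9.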
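Deep Learning based MIMIC Functions
MIMIFY - CGAN
[Figure: a generator $G(z, s)$ takes $z$ and a sample $s$ from a Gaussian latent space and outputs $y \sim q(y \mid z)$; a discriminator $D(y, z)$ maps a pair $(y, z)$ to $[0, 1]$.]
MIMIFY - REG
Regress to estimate $r(z) = \mathbb{E}[Y \mid Z = z]$
$y = r(z) + \text{Gaussian noise} \sim q(y \mid z)$ (or Laplacian noise)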
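A minimal sketch of the MIMIFY - REG idea for scalar $z$: fit $r(z)$, then add noise around it. The slide does not say which regressor or noise scale to use; ordinary least squares and a noise scale matched to the residual standard deviation are assumptions here, and `mimify_reg` and `fit_line` are hypothetical names.

```python
import math
import random

def fit_line(zs, ys):
    """Ordinary least squares fit y ~ a*z + b, standing in for any regressor r(z)."""
    n = len(zs)
    mz, my = sum(zs) / n, sum(ys) / n
    szz = sum((z - mz) ** 2 for z in zs)
    szy = sum((z - mz) * (y - my) for z, y in zip(zs, ys))
    a = szy / szz
    return a, my - a * mz

def mimify_reg(zs, ys, noise="gaussian", rng=random):
    """Hypothetical MIMIFY-REG sketch: y' = r(z) + noise, one mimicked sample per z."""
    a, b = fit_line(zs, ys)
    resid = [y - (a * z + b) for z, y in zip(zs, ys)]
    sigma = math.sqrt(sum(r * r for r in resid) / len(resid))  # assumed noise scale
    out = []
    for z in zs:
        if noise == "gaussian":
            eps = rng.gauss(0.0, sigma)
        elif noise == "laplace":
            # Difference of two Exp(1) draws is standard Laplace (variance 2);
            # scaling by sigma / sqrt(2) matches the residual variance.
            eps = sigma / math.sqrt(2) * (rng.expovariate(1.0) - rng.expovariate(1.0))
        else:
            raise ValueError(f"unknown noise type: {noise}")
        out.append(a * z + b + eps)
    return out

rng = random.Random(0)
zs = [rng.uniform(-1.0, 1.0) for _ in range(5000)]
ys = [2.0 * z + rng.gauss(0.0, 0.3) for z in zs]
y_gauss = mimify_reg(zs, ys, rng=rng)
y_laplace = mimify_reg(zs, ys, noise="laplace", rng=rng)
```

Swapping `fit_line` for a neural regressor would not change the structure: only $r(z)$ is learned, and the noise model is a fixed choice.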
Post-Nonlinear Noise Synthetic Experiments: AUROC
Experiments
Flow-cytometry Data
Experiments
Gene Regulatory Network Inference (DREAM)
Experiments
Estimating Information Measures
Estimating Kullback-Leibler Distance
[Figure: histograms of $n$ samples from $P$ and $n$ samples from $Q$]
Estimate $D_{KL}(P \,\|\, Q)$?
Curse of dimensionality: sample complexity $O(n/\log n)$ with $n = 2^d$
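To make the growth concrete, the helper below simply evaluates $n / \log n$ at $n = 2^d$; the constant hidden by $O(\cdot)$ is ignored, so the numbers show scaling only, not actual sample counts. `samples_scale` is a hypothetical name.

```python
import math

def samples_scale(d):
    """n / log n with n = 2**d (natural log; O() constants ignored)."""
    n = 2 ** d
    return n / math.log(n)

for d in (10, 20, 30, 40):
    print(f"d = {d:2d}: n / log n = {samples_scale(d):.2e}")
```

Each extra dimension nearly doubles the value (the ratio between consecutive $d$ is $2d/(d+1)$).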
MINE: Neural Network Approximation
[Figure: $n$ samples $\sim P$ and $n$ samples $\sim Q$]
Donsker-Varadhan Dual Representation:
$D_{KL}(P \,\|\, Q) = \sup_T \; \mathbb{E}_P[T] - \log\big(\mathbb{E}_Q[e^T]\big)$
• $T$: rich NN class
• $\mathbb{E}$: sample averages
• $\sup_T$: obtained via stochastic gradient search
* Belghazi et al., MINE: Mutual Information Neural Estimation, ICML 2018
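A minimal sketch of the DV objective with sample averages. Instead of a neural network trained by stochastic gradient search, it evaluates two fixed critics $T$ on an assumed Gaussian pair $P = \mathcal{N}(1, 1)$, $Q = \mathcal{N}(0, 1)$, for which $D_{KL}(P \,\|\, Q) = 0.5$ and the optimal critic is $T^*(x) = x - 0.5$.

```python
import math
import random

def dv_estimate(T, xs_p, xs_q):
    """Donsker-Varadhan objective E_P[T] - log E_Q[exp(T)], expectations as sample averages."""
    mean_p = sum(T(x) for x in xs_p) / len(xs_p)
    mean_exp_q = sum(math.exp(T(x)) for x in xs_q) / len(xs_q)
    return mean_p - math.log(mean_exp_q)

rng = random.Random(0)
n = 100_000
xs_p = [rng.gauss(1.0, 1.0) for _ in range(n)]  # assumed P = N(1, 1)
xs_q = [rng.gauss(0.0, 1.0) for _ in range(n)]  # assumed Q = N(0, 1)

best = dv_estimate(lambda x: x - 0.5, xs_p, xs_q)   # T* = log p(x)/q(x); population value 0.5
other = dv_estimate(lambda x: 0.5 * x, xs_p, xs_q)  # a worse critic; population value 0.375
print(f"{best:.3f} {other:.3f}")
```

Any other critic gives a smaller value, here about 0.375 for $T(x) = 0.5x$, which is why the supremum over $T$ recovers the divergence.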
MINE is Unstable to Train
[Figure: MINE estimates from $n$ samples $\sim P$ and $n$ samples $\sim Q$ during training, compared with the true $I(X;Y)$]
* Belghazi et al., MINE: Mutual Information Neural Estimation, ICML 2018
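Divergence estimation via Classification
• $D_{KL}(P \,\|\, Q) = \sup_{f \in \mathcal{F}} \; \mathbb{E}_{x \sim p(x)}[f(x)] - \log\big(\mathbb{E}_{x \sim q(x)}[e^{f(x)}]\big)$
$f^*(x) = \log \frac{p(x)}{q(x)}$ (log of the RN derivative)
Classifiers can estimate $f^*$:
[Figure: samples $(x_1, x_2) \sim p(x_1, x_2)$ labeled $l = 1$ and samples $(x_1, x_2) \sim q(x_1, x_2)$ labeled $l = 0$; a classifier outputs $p(l = 1 \mid x_1, x_2)$ and $p(l = 0 \mid x_1, x_2)$]
Label 1 for $p(x)$, label 0 for $q(x)$
With equally many samples from each, $f^*(x) = \log \frac{p(l = 1 \mid x)}{p(l = 0 \mid x)}$
Plug into the DV bound
Highly stable training! True lower bound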
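A minimal sketch of the classification route on an assumed Gaussian pair: $P = \mathcal{N}(1, 1)$ (label 1) and $Q = \mathcal{N}(0, 1)$ (label 0), with equal sample sizes. A one-dimensional logistic regression trained by full-batch gradient descent stands in for the neural network classifier (an assumption); its logit is plugged into the DV bound on fresh samples. Because the true log ratio $x - 0.5$ is linear here, the logistic model is well specified.

```python
import math
import random

def train_logistic(xs, labels, lr=0.5, steps=300):
    """Full-batch gradient descent on the mean log loss for p(l=1|x) = sigmoid(w*x + b)."""
    w = b = 0.0
    n = len(xs)
    for _ in range(steps):
        gw = gb = 0.0
        for x, l in zip(xs, labels):
            s = 1.0 / (1.0 + math.exp(-(w * x + b)))
            gw += (s - l) * x
            gb += s - l
        w -= lr * gw / n
        b -= lr * gb / n
    return w, b

rng = random.Random(0)
train_p = [rng.gauss(1.0, 1.0) for _ in range(1000)]  # label 1
train_q = [rng.gauss(0.0, 1.0) for _ in range(1000)]  # label 0
w, b = train_logistic(train_p + train_q, [1] * 1000 + [0] * 1000)

def f_hat(x):
    # Classifier logit log p(l=1|x)/p(l=0|x); with balanced classes it estimates log p(x)/q(x).
    return w * x + b

test_p = [rng.gauss(1.0, 1.0) for _ in range(50_000)]
test_q = [rng.gauss(0.0, 1.0) for _ in range(50_000)]
kl_hat = (sum(f_hat(x) for x in test_p) / len(test_p)
          - math.log(sum(math.exp(f_hat(x)) for x in test_q) / len(test_q)))
print(f"w={w:.3f} b={b:.3f} KL estimate={kl_hat:.3f}")
```

In population, the DV value for a logit $wx + b$ on this pair is $w - w^2/2$, so small errors in $w$ barely move the estimate away from the true 0.5.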
Classifiers require calibration
The estimator plugs in the predicted probability $p(l = 1 \mid x)$ itself, so the classifier must be calibrated, not merely accurate
Can get well-calibrated neural networks
Lakshminarayanan et al., Simple and scalable predictive uncertainty estimation using deep ensembles, NeurIPS 2017
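Why calibration matters, in closed form: take the assumed Gaussian pair $P = \mathcal{N}(1, 1)$, $Q = \mathcal{N}(0, 1)$ and a classifier whose logit is the correct $x - 0.5$ multiplied by a temperature $c$ ($c > 1$ overconfident, $c < 1$ underconfident). Temperature scaling is used here only as a stand-in for miscalibration.

```python
def dv_value_with_temperature(c):
    """Population DV objective for P = N(1, 1), Q = N(0, 1) with critic c * (x - 0.5).

    E_P[c(x - 0.5)] = 0.5c and log E_Q[exp(c(x - 0.5))] = c^2/2 - c/2,
    so the objective is c - c^2/2, maximized (= KL = 0.5) at c = 1.
    """
    return 0.5 * c - (c * c / 2 - c / 2)

for c in (0.5, 1.0, 2.0):
    print(c, dv_value_with_temperature(c))
# 0.5 0.375
# 1.0 0.5
# 2.0 0.0
```

Every $c > 0$ gives the same decision rule ($x > 0.5$), hence the same accuracy, yet only $c = 1$ recovers $D_{KL} = 0.5$; at $c = 2$ the estimate collapses to 0.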
Mutual Information Estimation
$I(X;Y) = D_{KL}(P_{XY} \,\|\, P_X P_Y)$
[Figure: samples from $P_{XY}$ and permuted samples standing in for $P_X P_Y$]
Classifier-MI: use a classifier to estimate $I(X;Y)$
Theorem-1: NeuralNet-Classifier-MI is consistent
Theorem-2: Classifier-MI is a "true" lower bound on MI
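The permutation trick can be checked in isolation: shuffling the $Y$ column breaks the pairing, so permuted rows behave approximately like samples from $P_X P_Y$. To keep the sketch short, the classifier is replaced by the known log density ratio of an assumed bivariate Gaussian with correlation $\rho = 0.5$; in Classifier-MI that ratio would come from a trained classifier instead.

```python
import math
import random

RHO = 0.5  # assumed correlation

def log_normal(v, var):
    """Log density of N(0, var) at v."""
    return -0.5 * math.log(2 * math.pi * var) - v * v / (2 * var)

def log_ratio(x, y, rho=RHO):
    """log p(x, y) - log p(x) p(y) = log p(y | x) - log p(y), with y | x ~ N(rho x, 1 - rho^2)."""
    return log_normal(y - rho * x, 1 - rho ** 2) - log_normal(y, 1.0)

rng = random.Random(1)
n = 50_000
xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
ys = [RHO * x + math.sqrt(1 - RHO ** 2) * rng.gauss(0.0, 1.0) for x in xs]

# Permuted samples: shuffle Y across rows to approximate draws from P_X P_Y.
ys_perm = ys[:]
rng.shuffle(ys_perm)

joint_term = sum(log_ratio(x, y) for x, y in zip(xs, ys)) / n
perm_term = math.log(sum(math.exp(log_ratio(x, y)) for x, y in zip(xs, ys_perm)) / n)
mi_estimate = joint_term - perm_term  # DV bound with joint vs. permuted samples
mi_true = -0.5 * math.log(1 - RHO ** 2)
print(f"{mi_estimate:.4f} {mi_true:.4f}")
```

For $\rho = 0.5$ the true value is $-\tfrac{1}{2}\log(0.75) \approx 0.144$ nats, and the estimate should be close to it.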
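Performance: MI Estimation
$(x_i, y_i) \sim \mathcal{N}\left(0, \begin{pmatrix} 1 & \rho \\ \rho & 1 \end{pmatrix}\right)$, $X = (x_1, x_2, \dots, x_d)$, $Y = (y_1, y_2, \dots, y_d)$, $x_i \perp y_j \ (i \neq j)$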
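A hypothetical generator for this benchmark, assuming $d$ is the dimension and each coordinate pair is a standard bivariate Gaussian with correlation $\rho$. Because the $d$ pairs are independent, the true MI is additive, $I(X;Y) = -\tfrac{d}{2}\log(1 - \rho^2)$, which gives ground truth for comparing estimators.

```python
import math
import random

def correlated_gaussians(n, d, rho, rng):
    """n draws of (X, Y) in R^d x R^d: coordinate pairs (x_i, y_i) have correlation rho,
    and coordinates with different indices are independent."""
    X, Y = [], []
    s = math.sqrt(1 - rho ** 2)
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0) for _ in range(d)]
        X.append(x)
        Y.append([rho * xi + s * rng.gauss(0.0, 1.0) for xi in x])
    return X, Y

def true_mi(d, rho):
    """Ground-truth I(X;Y) in nats: d independent pairs, each contributing -log(1 - rho^2)/2."""
    return -0.5 * d * math.log(1 - rho ** 2)

X, Y = correlated_gaussians(2000, 3, 0.5, random.Random(0))
print(true_mi(3, 0.5))
```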
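Performance: Conditional MI Estimation
• Modular Estimation
• $I(X;Y \mid Z) = I(X;Y,Z) - I(X;Z)$
[Diagram: CGAN/CVAE/1-NN turns $\{(y_i, z_i)\}_{i=1}^n$ into $\{(y'_i, z_i)\}_{i=1}^n$ with $y' \sim p(y \mid z)$; Classifier-MI on $\{(x_i, y_i, z_i)\}_{i=1}^n$ vs. $\{(x_i, y'_i, z_i)\}_{i=1}^n$ gives $D_{KL}\big(p(x, y, z) \,\|\, p(x, z)\,p(y \mid z)\big)$]
Estimate $I(X;Y \mid Z) = D_{KL}(p_{XYZ} \,\|\, p_{XZ}\,p_{Y \mid Z})$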
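The two CMI formulas on this slide can be checked exactly on a hypothetical discrete pmf over binary $(x, y, z)$: the KL form $D_{KL}(p_{XYZ} \,\|\, p_{XZ}\,p_{Y \mid Z})$ and the modular form $I(X;Y,Z) - I(X;Z)$ agree by the chain rule for mutual information. The weights and helpers are illustrative; no generator or classifier is involved.

```python
import math
from itertools import product

def kl(p, q):
    """D_KL(p || q) in nats for pmfs stored as dicts (q must cover p's support)."""
    return sum(pv * math.log(pv / q[k]) for k, pv in p.items() if pv > 0)

def marginal(p, idx):
    """Marginalize a pmf keyed by tuples onto the coordinates listed in idx."""
    out = {}
    for k, v in p.items():
        key = tuple(k[i] for i in idx)
        out[key] = out.get(key, 0.0) + v
    return out

# Hypothetical pmf over binary (x, y, z); arbitrary positive weights.
weights = [3, 1, 2, 2, 1, 4, 2, 5]
p = {k: w / sum(weights) for k, w in zip(product((0, 1), repeat=3), weights)}

p_x, p_z = marginal(p, (0,)), marginal(p, (2,))
p_xz, p_yz = marginal(p, (0, 2)), marginal(p, (1, 2))

# KL form: I(X;Y|Z) = D_KL(p_XYZ || p_XZ p_{Y|Z}), with p_{Y|Z} = p_YZ / p_Z.
cmi_kl = kl(p, {(x, y, z): p_xz[x, z] * p_yz[y, z] / p_z[(z,)] for x, y, z in p})

# Modular form: I(X;Y,Z) - I(X;Z), each MI written as a KL to a product of marginals.
mi_x_yz = kl(p, {(x, y, z): p_x[(x,)] * p_yz[y, z] for x, y, z in p})
mi_x_z = kl(p_xz, {(x, z): p_x[(x,)] * p_z[(z,)] for x, z in p_xz})
cmi_modular = mi_x_yz - mi_x_z
print(f"{cmi_kl:.6f} {cmi_modular:.6f}")
```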
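Performance: CMI
Model – I (Linear)
$X \sim \mathcal{N}(0, 1)$, $Z \sim \mathcal{U}(-0.5, 0.5)^{d_z}$, $\epsilon \sim \mathcal{N}(Z_1, 0.01)$
$Y = X + \epsilon$
Model – II (Linear)
$X \sim \mathcal{N}(0, 1)$, $Z \sim \mathcal{N}(0, 1)^{d_z}$, $\epsilon \sim \mathcal{N}(w^T Z, 0.01)$
$Y = X + \epsilon$
Non-Linear models: $Z \sim \mathcal{N}(1, I_{d_z})$, $X = f_1(\eta_1)$, $Y = f_2(A_{xy} X + A_{zy} Z + \eta_2)$
$f_1, f_2 \sim \{\tanh, \cos, \exp(-\lvert \cdot \rvert)\}$, $\eta_1, \eta_2 \sim \mathcal{N}(0, 0.1)$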
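A sketch of a Model I generator, plus its ground-truth CMI. Two readings are assumptions: the second argument of $\mathcal{N}(Z_1, 0.01)$ is taken as a variance, and $Z$ is drawn fresh per sample. Given $Z$, $Y = X + \epsilon$ is a Gaussian channel with independent noise, so $I(X;Y \mid Z) = \tfrac{1}{2}\log(1 + 1/0.01) \approx 2.31$ nats under that reading. `model_one` and `model_one_cmi` are hypothetical names.

```python
import math
import random

def model_one(n, d_z, rng, noise_var=0.01):
    """Samples (x, y, z) from Model I: X ~ N(0,1), Z ~ U(-0.5, 0.5)^d_z,
    eps ~ N(Z_1, noise_var), Y = X + eps. noise_var is read as a variance (assumption)."""
    data = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        z = [rng.uniform(-0.5, 0.5) for _ in range(d_z)]
        eps = rng.gauss(z[0], math.sqrt(noise_var))  # mean Z_1, std sqrt(noise_var)
        data.append((x, x + eps, z))
    return data

def model_one_cmi(noise_var=0.01):
    """I(X;Y|Z) in nats: Gaussian channel with signal variance 1 and noise variance noise_var."""
    return 0.5 * math.log(1 + 1.0 / noise_var)

data = model_one(1000, 5, random.Random(0))
print(model_one_cmi())
```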
Performance: CMI
$d_z = 10$, $n = 10{,}000$
Model-1 (linear)
Performance: CMI
Model-2 (linear)
$d_z = 10$, $n = 10{,}000$
Performance: CMI
Model-3 (Non-linear)
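Performance: Conditional Indep Testing
• $Z \sim \mathcal{N}(1, I_{d_z})$, $X = \cos(a^T Z + \eta_1)$
$Y = \begin{cases} \cos(b^T Z + \eta_2) & \text{if } X \perp Y \mid Z \\ \cos(\alpha X + b^T Z + \eta_2) & \text{if } X \not\perp Y \mid Z \end{cases}$
$a, b \sim \mathcal{U}(0, 1)^{d_z}$, $\|a\| = \|b\| = 1$, $\alpha \sim \mathcal{U}(0, 2)$, $\eta_i \sim \mathcal{N}(0, 0.25)$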
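A hypothetical generator for the post-nonlinear model above. Assumptions not fixed by the slide: $a$, $b$, $\alpha$ are drawn once per dataset, $\mathcal{N}(0, 0.25)$ is read as variance 0.25 (standard deviation 0.5), and $\mathcal{N}(1, I_{d_z})$ means each coordinate of $Z$ is drawn from $\mathcal{N}(1, 1)$.

```python
import math
import random

def post_nonlinear(n, d_z, dependent, rng):
    """Samples (x, y, z); X is independent of Y given Z unless dependent=True."""
    def unit_uniform_vector():
        # Entries ~ U(0, 1), then normalized to unit Euclidean norm.
        v = [rng.uniform(0.0, 1.0) for _ in range(d_z)]
        norm = math.sqrt(sum(t * t for t in v))
        return [t / norm for t in v]

    a, b = unit_uniform_vector(), unit_uniform_vector()
    alpha = rng.uniform(0.0, 2.0)
    data = []
    for _ in range(n):
        z = [rng.gauss(1.0, 1.0) for _ in range(d_z)]
        az = sum(ai * zi for ai, zi in zip(a, z))
        bz = sum(bi * zi for bi, zi in zip(b, z))
        x = math.cos(az + rng.gauss(0.0, 0.5))         # eta_1 with std 0.5
        shift = alpha * x if dependent else 0.0        # alpha * X term only in the dependent case
        y = math.cos(shift + bz + rng.gauss(0.0, 0.5))  # eta_2 with std 0.5
        data.append((x, y, z))
    return data

rng = random.Random(0)
ci_data = post_nonlinear(500, 4, False, rng)
nci_data = post_nonlinear(500, 4, True, rng)
```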
Performance: Real Data (Flow Cytometry)
• 50 CI and 50 NI relations
• Examples: pkc ⊥ akt | raf, mek, p38, pka, jnk; pip3 ⊥ pip2 | p38, raf, pkc, jnk, erk, akt
• CCMI (mean AUROC) = 0.75; CCIT (mean AUROC) = 0.66
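The slide reports mean AUROC without detailing the scoring; as a sketch, AUROC for one set of relations can be computed as the fraction of (NI, CI) pairs in which the NI relation gets the larger dependence score, counting ties as one half (the normalized Mann-Whitney statistic). Treating NI as the positive class and the scores below are assumptions.

```python
def auroc(pos_scores, neg_scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    wins = 0.0
    for p in pos_scores:
        for q in neg_scores:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical dependence scores (for example, estimated CMI): larger means "more dependent".
ni_scores = [0.9, 0.8, 0.3]  # relations that are not conditionally independent
ci_scores = [0.1, 0.4, 0.2]  # conditionally independent relations
print(auroc(ni_scores, ci_scores))  # 8 of 9 pairs ranked correctly
```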
Open Problems
✤ Closeness testing problems
✤ Beyond $D_{TV}$: distance measure estimation using classifiers?
✤ Stable trainability
✤ Time-series data (directed information estimation and testing)
✤ Statistical property testing
✤ Uniformity testing