Statistical Challenges in Agent-Based Computational Modeling
Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational...
Transcript of Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational...
![Page 1: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/1.jpg)
Computational and Statistical Challengesin High Dimensional Statistical Models
Ilias Zadik
Operations Research Center (ORC), MIT
PhD Thesis Defense
Commitee: David Gamarnik (PhD advisor), Guy Bresler, Lester Mackey
June 12, 2019
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 1 / 31
![Page 2: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/2.jpg)
Introduction- Big Data Challenges
Over the recent years, the number and magnitude of available datasetshave been growing enormously.
Big impact across science:From artificial intelligence to economics to medicine and many others.
Required novel statistical and computational tools on dealing withissues such as high dimensionality.
Many open challenging theoretical questionseven for simple high dimensional statistical models!
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 2 / 31
![Page 3: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/3.jpg)
Introduction- Big Data Challenges
Over the recent years, the number and magnitude of available datasetshave been growing enormously.
Big impact across science:From artificial intelligence to economics to medicine and many others.
Required novel statistical and computational tools on dealing withissues such as high dimensionality.
Many open challenging theoretical questionseven for simple high dimensional statistical models!
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 2 / 31
![Page 4: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/4.jpg)
Introduction- Big Data Challenges
Over the recent years, the number and magnitude of available datasetshave been growing enormously.
Big impact across science:From artificial intelligence to economics to medicine and many others.
Required novel statistical and computational tools on dealing withissues such as high dimensionality.
Many open challenging theoretical questionseven for simple high dimensional statistical models!
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 2 / 31
![Page 5: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/5.jpg)
Introduction- Big Data Challenges
Over the recent years, the number and magnitude of available datasetshave been growing enormously.
Big impact across science:From artificial intelligence to economics to medicine and many others.
Required novel statistical and computational tools on dealing withissues such as high dimensionality.
Many open challenging theoretical questionseven for simple high dimensional statistical models!
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 2 / 31
![Page 6: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/6.jpg)
Thesis Overview: The Models
Two Long-Studied Stylized High Dimensional Models:
(1) High Dimensional Linear Regression Model (HDLR), [Tibshirani ’96]Recover vector of coefficients from few noisy linear samples.Motivation: Fit linear models in high dimensional data.
(2) Planted Clique Model (PC) [Jerrum ’92]Recover planted clique from a large observed network.Motivation: Community detection in large networks.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 3 / 31
![Page 7: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/7.jpg)
Thesis Overview: Contributions
HDLR (signal strength = sample size), PC (signal strength = clique size):
Under assumptions,
• Compute the exact statistical limit of the HDLR model(“All-to-Nothing Phase Transition”)• Explain computational-statistical gaps of HDLR and PC models,
through statistical-physics based methods. (”Overlap Gap Property”)•
Papers:(Gamarnik, Z. COLT ’17, AOS (major rev.) ’18+)(Gamarnik, Z. AOS (major rev.) ’18+), (Gamarnik, Z. NeurIPS ’18)(Reeves, Xu, Z. COLT ’19), (Gamarnik, Z. ’19+)
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 4 / 31
![Page 8: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/8.jpg)
Thesis Overview: Contributions
HDLR (signal strength = sample size), PC (signal strength = clique size):
Under assumptions,• Compute the exact statistical limit of the HDLR model
(“All-to-Nothing Phase Transition”)• Explain computational-statistical gaps of HDLR and PC models,
through statistical-physics based methods. (”Overlap Gap Property”)• Improved computational limit for noiseless HDLR model
using lattice basis reduction (”One Sample Suffices”)Papers:(Gamarnik, Z. COLT ’17, AOS (major rev.) ’18+)(Gamarnik, Z. AOS (major rev.) ’18+), (Gamarnik, Z. NeurIPS ’18)(Reeves, Xu, Z. COLT ’19), (Gamarnik, Z. ’19+)
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 4 / 31
![Page 9: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/9.jpg)
Thesis Overview: Contributions
HDLR (signal strength = sample size), PC (signal strength = clique size):
Under assumptions,• Compute the exact statistical limit of the HDLR model
(“All-to-Nothing Phase Transition”)• Explain computational-statistical gaps of HDLR and PC models,
through statistical-physics based methods. (”Overlap Gap Property”)• Improved computational limit for noiseless HDLR model
using lattice basis reduction (”One Sample Suffices”)Papers:(Gamarnik, Z. COLT ’17, AOS (major rev.) ’18+)(Gamarnik, Z. AOS (major rev.) ’18+), (Gamarnik, Z. NeurIPS ’18)(Reeves, Xu, Z. COLT ’19), (Gamarnik, Z. ’19+)
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 4 / 31
![Page 10: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/10.jpg)
Outline of the Talk
(1) Introduction and Thesis Overview
(2) High Dimensional Linear Regression ModelI BackgroundI Statistical Limit: All-or-Nothing PhenomenonI Computational-Statistical Gap and Overlap Gap Property
(3) Planted Clique Model and Overlap Gap Property
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 5 / 31
![Page 11: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/11.jpg)
Outline of the Talk
(1) Introduction and Thesis Overview
(2) High Dimensional Linear Regression ModelI BackgroundI Statistical Limit: All-or-Nothing PhenomenonI Computational-Statistical Gap and Overlap Gap Property
(3) Planted Clique Model and Overlap Gap Property
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 5 / 31
![Page 12: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/12.jpg)
High Dimensional Linear Regression
Let (unknown) β∗ ∈ Rp. p number of features.For data matrix X ∈ Rn×p , and noise W ∈ Rn,observe n noisy linear samples of β∗, Y = Xβ∗ + W.
Goal: Given (Y, X), recover β∗ with minimum n possible.
High-dimensional regime: n� p, p→ +∞.
n < p implies assumptions on β∗ are necessary.Reason: even if W = 0, Y = Xβ∗ underdetermined.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 6 / 31
![Page 13: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/13.jpg)
High Dimensional Linear Regression
Let (unknown) β∗ ∈ Rp. p number of features.For data matrix X ∈ Rn×p , and noise W ∈ Rn,observe n noisy linear samples of β∗, Y = Xβ∗ + W.
Goal: Given (Y, X), recover β∗ with minimum n possible.
High-dimensional regime: n� p, p→ +∞.n < p implies assumptions on β∗ are necessary.Reason: even if W = 0, Y = Xβ∗ underdetermined.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 6 / 31
![Page 14: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/14.jpg)
Assumptions on β∗ and X, W
Assumptions on β∗:
(1) β∗ is k-sparse: k non-zero coordinates, k/p→ 0, as p→ +∞.(A lot of research, e.g. Compressed Sensing, Genomics, MRI.)
(2) β∗ is binary valued: β∗ ∈ {0, 1}p. (†)
Distributional Assumptions on X, W:
(1) X ∈ Rn×p has i.i.d. N (0, 1) entries.
(2) W ∈ Rn has i.i.d. N(0,σ2
)entries.
(†) (non-trivial) simplification of well-studied β∗min := minβ∗i 6=0 |β∗i | = Θ (1) > 0
and support recovery task.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 7 / 31
![Page 15: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/15.jpg)
Assumptions on β∗ and X, W
Assumptions on β∗:
(1) β∗ is k-sparse: k non-zero coordinates, k/p→ 0, as p→ +∞.(A lot of research, e.g. Compressed Sensing, Genomics, MRI.)
(2) β∗ is binary valued: β∗ ∈ {0, 1}p. (†)
Distributional Assumptions on X, W:
(1) X ∈ Rn×p has i.i.d. N (0, 1) entries.
(2) W ∈ Rn has i.i.d. N(0,σ2
)entries.
(†) (non-trivial) simplification of well-studied β∗min := minβ∗i 6=0 |β∗i | = Θ (1) > 0
and support recovery task.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 7 / 31
![Page 16: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/16.jpg)
The Model
Setup
Let β∗ ∈ {0, 1}p be a binary k-sparse vector, k/p→ 0, as p→ +∞.For
• X ∈ Rn×p consisting of i.i.d N (0, 1) entries
• W ∈ Rn consisting of i.i.d. N (0,σ2) entries
we get n noisy linear samples of β∗, Y ∈ Rn, given by,
Y := Xβ∗ + W.
Goal: Statistical and Computational Limit
Minimum n so that given (Y, X), β∗ is (efficiently) recoverablewith probability tending to 1 as n, k, p→ +∞ (w.h.p.).
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 8 / 31
![Page 17: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/17.jpg)
The Model
Setup
Let β∗ ∈ {0, 1}p be a binary k-sparse vector, k/p→ 0, as p→ +∞.For
• X ∈ Rn×p consisting of i.i.d N (0, 1) entries
• W ∈ Rn consisting of i.i.d. N (0,σ2) entries
we get n noisy linear samples of β∗, Y ∈ Rn, given by,
Y := Xβ∗ + W.
Goal: Statistical and Computational Limit
Minimum n so that given (Y, X), β∗ is (efficiently) recoverablewith probability tending to 1 as n, k, p→ +∞ (w.h.p.).
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 8 / 31
![Page 18: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/18.jpg)
Rise of a Computational-Statistical Gap
Computational Results ([Wainwright ’09],[Fletcher et al ’11])
Set nalg = 2k log p. Assume SNR = kσ2 → +∞.
Ifn > (1 + ε)nalg
LASSO (convex relaxation) and OMP (greedy algorithm) succeed w.h.p.
Statistical Results
Let n∗ := 2k log pk/ log
(kσ2 + 1
). Assume SNR = k
σ2 → +∞.
• If n < (1 – ε)n∗ no algorithm can succeed w.h.p. [Wang et al ’10]
• For some large C > 0, if n ≥ Cn∗, MLE succeeds [Rad’ 11].
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 9 / 31
![Page 19: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/19.jpg)
Rise of a Computational-Statistical Gap
Computational Results ([Wainwright ’09],[Fletcher et al ’11])
Set nalg = 2k log p. Assume SNR = kσ2 → +∞.
Ifn > (1 + ε)nalg
LASSO (convex relaxation) and OMP (greedy algorithm) succeed w.h.p.
Statistical Results
Let n∗ := 2k log pk/ log
(kσ2 + 1
). Assume SNR = k
σ2 → +∞.
• If n < (1 – ε)n∗ no algorithm can succeed w.h.p. [Wang et al ’10]
• For some large C > 0, if n ≥ Cn∗, MLE succeeds [Rad’ 11].
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 9 / 31
![Page 20: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/20.jpg)
Rise of a Computational-Statistical Gap
Computational Results ([Wainwright ’09],[Fletcher et al ’11])
Set nalg = 2k log p. Assume SNR = kσ2 → +∞.
Ifn > (1 + ε)nalg
LASSO (convex relaxation) and OMP (greedy algorithm) succeed w.h.p.
Statistical Results
Let n∗ := 2k log pk/ log
(kσ2 + 1
). Assume SNR = k
σ2 → +∞.
• If n < (1 – ε)n∗ no algorithm can succeed w.h.p. [Wang et al ’10]
• For some large C > 0, if n ≥ Cn∗, MLE succeeds [Rad’ 11].
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 9 / 31
![Page 21: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/21.jpg)
Rise of a Computational-Statistical Gap
Computational Results ([Wainwright ’09],[Fletcher et al ’11])
Set nalg = 2k log p. Assume SNR = kσ2 → +∞.
Ifn > (1 + ε)nalg
LASSO (convex relaxation) and OMP (greedy algorithm) succeed w.h.p.
Statistical Results
Let n∗ := 2k log pk/ log
(kσ2 + 1
). Assume SNR = k
σ2 → +∞.
• If n < (1 – ε)n∗ no algorithm can succeed w.h.p. [Wang et al ’10]
• For some large C > 0, if n ≥ Cn∗, MLE succeeds [Rad’ 11].
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 9 / 31
![Page 22: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/22.jpg)
Rise of a Computational-Statistical Gap
Computational Results ([Wainwright ’09],[Fletcher et al ’11])
Set nalg = 2k log p. Assume SNR = kσ2 → +∞.
Ifn > (1 + ε)nalg
LASSO (convex relaxation) and OMP (greedy algorithm) succeed w.h.p.
Statistical Results
Let n∗ := 2k log pk/ log
(kσ2 + 1
). Assume SNR = k
σ2 → +∞.
• If n < (1 – ε)n∗ no algorithm can succeed w.h.p. [Wang et al ’10]
• For some large C > 0, if n ≥ Cn∗, MLE succeeds [Rad’ 11].
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 9 / 31
![Page 23: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/23.jpg)
Pictorial Representation
Figure: Computational-Statistical Gap
Contributions
(1) n∗ = 2k log pk/ log
(kσ2 + 1
)is the exact statistical limit
(All-or-Nothing Phase Transition).
(2) nalg = 2k log p is the phase transition point for (landscape) hardness(Overlap Gap Property Phase Transition).
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 10 / 31
![Page 24: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/24.jpg)
Pictorial Representation
Figure: Computational-Statistical Gap
Contributions
(1) n∗ = 2k log pk/ log
(kσ2 + 1
)is the exact statistical limit
(All-or-Nothing Phase Transition).
(2) nalg = 2k log p is the phase transition point for (landscape) hardness(Overlap Gap Property Phase Transition).
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 10 / 31
![Page 25: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/25.jpg)
Outline of the Talk
(1) Introduction and Thesis Overview
(2) High Dimensional Linear Regression ModelI BackgroundI Statistical Limit: All-or-Nothing PhenomenonI Computational-Statistical Gap and Overlap Gap Property
(3) Planted Clique Model and Overlap Gap Property
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 11 / 31
![Page 26: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/26.jpg)
Maximum Likelihood Estimator (MLE)
Y = Xβ∗ + W with W iid N(0,σ2) entries.
The MLE
β̂MLE is the optimal solution of least-squares
(LS) : minβ∈{0,1}p,‖β‖0=k
‖Y – Xβ‖2
[Rad ’11]: success with Cn∗ samples.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 12 / 31
![Page 27: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/27.jpg)
All or Nothing Phenomenon- Result
Definition
For β ∈ {0, 1}p, k-sparse we define
overlap(β) := |Support(β∗) ∩ Support(β)|.
Theorem (“All or Nothing Phase Transition” (GZ ’17), (RXZ ’19))
Let ε > 0 be arbitrary. Assume k� p and k/σ2 ≥ C(ε) > 0,
• If n > (1 + ε) n∗, then
1
koverlap
(β̂MLE
)→ 1, whp, as n, p, k→ +∞.
• If n < (1 – ε) n∗, and k� √p, then ∀β̂ = β̂ (Y, X)
1
koverlap
(β̂)→ 0, whp, as n, p, k→ +∞.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 13 / 31
![Page 28: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/28.jpg)
All or Nothing Phenomenon- Result
Definition
For β ∈ {0, 1}p, k-sparse we define
overlap(β) := |Support(β∗) ∩ Support(β)|.
Theorem (“All or Nothing Phase Transition” (GZ ’17), (RXZ ’19))
Let ε > 0 be arbitrary. Assume k� p and k/σ2 ≥ C(ε) > 0,
• If n > (1 + ε) n∗, then
1
koverlap
(β̂MLE
)→ 1, whp, as n, p, k→ +∞.
• If n < (1 – ε) n∗, and k� √p, then ∀β̂ = β̂ (Y, X)
1
koverlap
(β̂)→ 0, whp, as n, p, k→ +∞.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 13 / 31
![Page 29: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/29.jpg)
All or Nothing Phenomenon - Comments
An “All or Nothing” phase transition!
• With n ≥ (1 + ε)n∗,MLE recovers all but o(1)-fraction of the support.
• With n ≤ (1 – ε)n∗,every estimator recovers at most o(1)-fraction of the support.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 14 / 31
![Page 30: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/30.jpg)
All or Nothing Phenomenon - Comments
An “All or Nothing” phase transition!
• With n ≥ (1 + ε)n∗,MLE recovers all but o(1)-fraction of the support.
• With n ≤ (1 – ε)n∗,every estimator recovers at most o(1)-fraction of the support.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 14 / 31
![Page 31: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/31.jpg)
All or Nothing Theorem - Proof Sketch
Negative Result for n ≤ (1 – ε)n∗:
• Step 1:“Impossibility of Testing”: Data Look Like Pure Noise.
Let P the law of (Y = Xβ∗ + W, X),and Q the law of (Y = λW, X), for some λ > 0.We show,
DKL (P||Q)→ 0, as p→ +∞.
• Step 2:“Impossibility of Testing” implies “Impossibility of Estimation”.We show that for any estimator β̂ = β̂ (Y, X):
overlap(β̂)
/k ≤(
1 + σ2/k)
DKL (P||Q) .
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 15 / 31
![Page 32: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/32.jpg)
All or Nothing Theorem - Proof Sketch
Negative Result for n ≤ (1 – ε)n∗:
• Step 1:“Impossibility of Testing”: Data Look Like Pure Noise.Let P the law of (Y = Xβ∗ + W, X),and Q the law of (Y = λW, X), for some λ > 0.
We show,DKL (P||Q)→ 0, as p→ +∞.
• Step 2:“Impossibility of Testing” implies “Impossibility of Estimation”.We show that for any estimator β̂ = β̂ (Y, X):
overlap(β̂)
/k ≤(
1 + σ2/k)
DKL (P||Q) .
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 15 / 31
![Page 33: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/33.jpg)
All or Nothing Theorem - Proof Sketch
Negative Result for n ≤ (1 – ε)n∗:
• Step 1:“Impossibility of Testing”: Data Look Like Pure Noise.Let P the law of (Y = Xβ∗ + W, X),and Q the law of (Y = λW, X), for some λ > 0.We show,
DKL (P||Q)→ 0, as p→ +∞.
• Step 2:“Impossibility of Testing” implies “Impossibility of Estimation”.We show that for any estimator β̂ = β̂ (Y, X):
overlap(β̂)
/k ≤(
1 + σ2/k)
DKL (P||Q) .
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 15 / 31
![Page 34: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/34.jpg)
All or Nothing Theorem - Proof Sketch
Negative Result for n ≤ (1 – ε)n∗:
• Step 1:“Impossibility of Testing”: Data Look Like Pure Noise.Let P the law of (Y = Xβ∗ + W, X),and Q the law of (Y = λW, X), for some λ > 0.We show,
DKL (P||Q)→ 0, as p→ +∞.
• Step 2:“Impossibility of Testing” implies “Impossibility of Estimation”.
We show that for any estimator β̂ = β̂ (Y, X):
overlap(β̂)
/k ≤(
1 + σ2/k)
DKL (P||Q) .
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 15 / 31
![Page 35: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/35.jpg)
All or Nothing Theorem - Proof Sketch
Negative Result for n ≤ (1 – ε)n∗:
• Step 1:“Impossibility of Testing”: Data Look Like Pure Noise.Let P the law of (Y = Xβ∗ + W, X),and Q the law of (Y = λW, X), for some λ > 0.We show,
DKL (P||Q)→ 0, as p→ +∞.
• Step 2:“Impossibility of Testing” implies “Impossibility of Estimation”.We show that for any estimator β̂ = β̂ (Y, X):
overlap(β̂)
/k ≤(
1 + σ2/k)
DKL (P||Q) .
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 15 / 31
![Page 36: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/36.jpg)
Summary for n∗ contribution
Sharp Information-Theoretic Limit n∗
(1 + ε)n∗ samples MLE (asymptotically) succeeds.(1 – ε)n∗ samples all estimators (asymptotically) strongly fail.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 16 / 31
![Page 37: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/37.jpg)
Outline of the Talk
(1) Introduction and Thesis Overview
(2) High Dimensional Linear Regression ModelI BackgroundI Statistical Limit: All-or-Nothing PhenomenonI Computational-Statistical Gap and Overlap Gap Property
(3) Planted Clique Model and Overlap Gap Property
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 17 / 31
![Page 38: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/38.jpg)
Computational-Statistical Gap
Contribution through Landscape Analysis
nalg is a phase transition point for certain Overlap Gap Property (OGP)on the space of binary k-sparse vectors (origin in spin glass theory).Conjecture computational hardness!
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 18 / 31
![Page 39: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/39.jpg)
Computational-Statistical Gap
Contribution through Landscape Analysis
nalg is a phase transition point for certain Overlap Gap Property (OGP)on the space of binary k-sparse vectors (origin in spin glass theory).Conjecture computational hardness!
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 18 / 31
![Page 40: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/40.jpg)
Computational Hardness: A Spin Glass Perspective
Computational gaps appear frequently in random environments
(1) randoms CSPs,such as random-k-SAT (e.g. [MMZ ’05], [ACORT ’11])
(2) average-case combinatorial opt problemssuch as max-independent set in ER graphs (e.g. [GS ’17], [RV ’17])
Between easy and hard regime there is an “abrupt change in thegeometry of the space of (near-optimal) solutions” [ACO ’08].
(Vague) Strategy of Studying the Geometry
Study realizable overlap sizes between “near-optimal” solutions.Algorithms appear to work as long as there are no gaps in the overlaps.
Overlap Gap Property, Shattering, Clustering, Free Energy Wells etc
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 19 / 31
![Page 41: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/41.jpg)
Computational Hardness: A Spin Glass Perspective
Computational gaps appear frequently in random environments
(1) randoms CSPs,such as random-k-SAT (e.g. [MMZ ’05], [ACORT ’11])
(2) average-case combinatorial opt problemssuch as max-independent set in ER graphs (e.g. [GS ’17], [RV ’17])
Between easy and hard regime there is an “abrupt change in thegeometry of the space of (near-optimal) solutions” [ACO ’08].
(Vague) Strategy of Studying the Geometry
Study realizable overlap sizes between “near-optimal” solutions.Algorithms appear to work as long as there are no gaps in the overlaps.
Overlap Gap Property, Shattering, Clustering, Free Energy Wells etc
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 19 / 31
![Page 42: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/42.jpg)
Computational Hardness: A Spin Glass Perspective
Computational gaps appear frequently in random environments
(1) randoms CSPs,such as random-k-SAT (e.g. [MMZ ’05], [ACORT ’11])
(2) average-case combinatorial opt problemssuch as max-independent set in ER graphs (e.g. [GS ’17], [RV ’17])
Between easy and hard regime there is an “abrupt change in thegeometry of the space of (near-optimal) solutions” [ACO ’08].
(Vague) Strategy of Studying the Geometry
Study realizable overlap sizes between “near-optimal” solutions.Algorithms appear to work as long as there are no gaps in the overlaps.
Overlap Gap Property, Shattering, Clustering, Free Energy Wells etc
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 19 / 31
![Page 43: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/43.jpg)
Computational Hardness: A Spin Glass Perspective
Computational gaps appear frequently in random environments
(1) randoms CSPs,such as random-k-SAT (e.g. [MMZ ’05], [ACORT ’11])
(2) average-case combinatorial opt problemssuch as max-independent set in ER graphs (e.g. [GS ’17], [RV ’17])
Between easy and hard regime there is an “abrupt change in thegeometry of the space of (near-optimal) solutions” [ACO ’08].
(Vague) Strategy of Studying the Geometry
Study realizable overlap sizes between “near-optimal” solutions.Algorithms appear to work as long as there are no gaps in the overlaps.
Overlap Gap Property, Shattering, Clustering, Free Energy Wells etc
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 19 / 31
![Page 44: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/44.jpg)
The Overlap Gap Property (OGP) for Linear Regression
“Near-optimal solutions” {β ∈ {0, 1}p : ‖β‖0 = k, “small” ‖Y – Xβ‖2}.
Idea: Study overlaps between β and β∗.overlap(β) = |Support(β) ∩ Support(β∗)|.
The OGP (informally)
The set of β ′s with “small” ‖Y – Xβ‖2 partitions inone group where β have low overlap with the ground truth β∗ andthe other group where β have high overlap with the ground truth β∗.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 20 / 31
![Page 45: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/45.jpg)
The Overlap Gap Property (OGP) for Linear Regression
“Near-optimal solutions” {β ∈ {0, 1}p : ‖β‖0 = k, “small” ‖Y – Xβ‖2}.Idea: Study overlaps between β and β∗.overlap(β) = |Support(β) ∩ Support(β∗)|.
The OGP (informally)
The set of β ′s with “small” ‖Y – Xβ‖2 partitions inone group where β have low overlap with the ground truth β∗ andthe other group where β have high overlap with the ground truth β∗.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 20 / 31
![Page 46: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/46.jpg)
The Overlap Gap Property for Linear Regression-definition
For r > 0, set Sr := {β ∈ {0, 1}p : ‖β‖0 = k, n–12 ‖Y – Xβ‖2 < r}.
Definition (The Overlap Gap Property)
The linear regression problem satisfies OGP if there exists r > 0 and0 < ζ1 < ζ2 < 1 such that
(a) For every β ∈ Sr,
1
koverlap (β) < ζ1 or
1
koverlap (β) > ζ2.
(b) Both the sets
Sr ∩ {β :1
koverlap (β) < ζ1} and Sr ∩ {β :
1
koverlap (β) > ζ2}
are non-empty.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 21 / 31
![Page 47: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/47.jpg)
OGP Phase Transition at Θ(nalg)
Theorem (GZ ’17a), (GZ ’17b)
Suppose k ≤ exp(√
log p). There exists C > 1 > c > 0 such that,
• If n∗ < n < cnalg then w.h.p. OGP holds.
• If n > Cnalg then w.h.p. OGP does not hold.
Figure: n < cnalg Figure: n > Cnalg
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 22 / 31
![Page 48: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/48.jpg)
OGP Phase Transition at Θ(nalg)
Theorem (GZ ’17a), (GZ ’17b)
Suppose k ≤ exp(√
log p). There exists C > 1 > c > 0 such that,
• If n∗ < n < cnalg then w.h.p. OGP holds.
• If n > Cnalg then w.h.p. OGP does not hold.
OGP coincides with the failure ofconvex relaxation and compressed sensing methods!
Figure: n∗ < n < cnalg Figure: n > Cnalg
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 22 / 31
![Page 49: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/49.jpg)
OGP and Local Search
Local Step: β → β ′ if dH(β,β ′) = 2. E.g.
∗01∗
→∗10∗
(LS): minβ∈{0,1}p,‖β‖0=k ‖Y – Xβ‖2.
Corollary: Local Search Barrier [GZ’17a]
Under OGP, there are low-overlap local minima in (LS).If n < cnalg, greedy local-search algorithm fails (worst-case) w.h.p.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 23 / 31
![Page 50: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/50.jpg)
OGP and Local Search
Local Step: β → β ′ if dH(β,β ′) = 2. E.g.
∗01∗
→∗10∗
(LS): minβ∈{0,1}p,‖β‖0=k ‖Y – Xβ‖2.
Corollary: Local Search Barrier [GZ’17a]
Under OGP, there are low-overlap local minima in (LS).If n < cnalg, greedy local-search algorithm fails (worst-case) w.h.p.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 23 / 31
![Page 51: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/51.jpg)
OGP and Local Search
Local Step: β → β ′ if dH(β,β ′) = 2. E.g.
∗01∗
→∗10∗
(LS): minβ∈{0,1}p,‖β‖0=k ‖Y – Xβ‖2.
Corollary: Local Search Barrier [GZ’17a]
Under OGP, there are low-overlap local minima in (LS).If n < cnalg, greedy local-search algorithm fails (worst-case) w.h.p.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 23 / 31
![Page 52: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/52.jpg)
OGP and Local Search
Theorem (GZ ’17b)
If n > Cnalg, the only local minimum in (LS) is β∗ whp
and greedy local search algorithm succeeds in O(k/σ2) iterations whp.
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 24 / 31
![Page 53: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/53.jpg)
Summary of Contribution
Sharp Information-Theoretic Limit n∗
(1 + ε)n∗ samples MLE (asymptotically) succeeds.(1 – ε)n∗ samples all estimators (asymptotically) strongly fail.
OGP Phase Transition at nalg
n < cnalg OGP holds and n > Cnalg OGP does not hold.Computational Hardness conjectured!
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 25 / 31
![Page 54: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/54.jpg)
Outline of the Talk
(1) Introduction and Thesis Overview
(2) High Dimensional Linear Regression ModelI BackgroundI Statistical Limit: All-or-Nothing PhenomenonI Computational-Statistical Gap and Overlap Gap Property
(3) Planted Clique Model and Overlap Gap Property
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 26 / 31
![Page 55: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/55.jpg)
The Planted Clique Model
The Planted Clique Model [Jerrum ’92]
Graph Generating Assumptions:
• Stage 1: G0 is an Erdos-Renyi G(n, 1/2):n-vertex undirected graph, each edge appears w.p. 1/2.
• Stage 2: k out of the n vertices of G0 are chosen u.a.r. to form ak-vertex clique, PC. Call G the final graph.
Goal: Recover PC from observing G.Question: For how small k = kn can we recover?Statistical limit + Computational limit.
n = 7, k = 3, G0 (left) and G (right) :
Figure: Figure:
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 27 / 31
![Page 56: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/56.jpg)
The Planted Clique Model-Literature
Literature:• Statistical Limit (unique k-clique): k = (2 + ε) log2 n, for any ε > 0.• (Apparent) Computational Limit: k = c
√n, for any c > 0.
[AKS’98],[FR’10],[DM’13],[DGGP’14]
Long-studied comp-stats gap [BR’13], [BHK+’16], [BBH’18]
Question: Is there an OGP phase transition around k =√
n?
Figure: G0,n = 7, k = 3
Figure: G,n = 7, k = 3
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 28 / 31
![Page 57: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/57.jpg)
The Planted Clique Model-Literature
Literature:• Statistical Limit (unique k-clique): k = (2 + ε) log2 n, for any ε > 0.• (Apparent) Computational Limit: k = c
√n, for any c > 0.
[AKS’98],[FR’10],[DM’13],[DGGP’14]
Long-studied comp-stats gap [BR’13], [BHK+’16], [BBH’18]
Question: Is there an OGP phase transition around k =√
n?
Figure: G0,n = 7, k = 3
Figure: G,n = 7, k = 3
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 28 / 31
![Page 58: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/58.jpg)
The Planted Clique Model-Literature
Literature:• Statistical Limit (unique k-clique): k = (2 + ε) log2 n, for any ε > 0.• (Apparent) Computational Limit: k = c
√n, for any c > 0.
[AKS’98],[FR’10],[DM’13],[DGGP’14]
Long-studied comp-stats gap [BR’13], [BHK+’16], [BBH’18]Question: Is there an OGP phase transition around k =
√n?
Figure: G0,n = 7, k = 3
Figure: G,n = 7, k = 3
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 28 / 31
![Page 59: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/59.jpg)
The Overlap Gap Property for Planted Clique: Results
• Focus on subgraphs of G of fixed vertex size (”k-sparse binary β”)with many edges, dense, (”small error ‖Y – Xβ‖2”)and study their overlap with PC (”overlap with β∗”) .OGP: dense subgraphs have either high or low overlap with PC.
• Strong evidence for OGP phase transition at k =√
n.(Possible explanation for a long-studied hardness!).• Proof OGP appears if k ≤ n0.0917.• Assumption 1:
Concentration of the value of k-densest subgraph problem of G(n, 12).
Known for k = Θ (log n) [BBSV’18], proven for k ≤ n0.0917..[GZ’19],conjectured for all k = o
(√n).
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 29 / 31
![Page 60: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/60.jpg)
The Overlap Gap Property for Planted Clique: Results
• Focus on subgraphs of G of fixed vertex size (”k-sparse binary β”)with many edges, dense, (”small error ‖Y – Xβ‖2”)and study their overlap with PC (”overlap with β∗”) .OGP: dense subgraphs have either high or low overlap with PC.• Strong evidence for OGP phase transition at k =
√n.
(Possible explanation for a long-studied hardness!).• Proof OGP appears if k ≤ n0.0917.
• Assumption 1:Concentration of the value of k-densest subgraph problem of G(n, 12).
Known for k = Θ (log n) [BBSV’18], proven for k ≤ n0.0917..[GZ’19],conjectured for all k = o
(√n).
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 29 / 31
![Page 61: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/61.jpg)
The Overlap Gap Property for Planted Clique: Results
• Focus on subgraphs of G of fixed vertex size (”k-sparse binary β”)with many edges, dense, (”small error ‖Y – Xβ‖2”)and study their overlap with PC (”overlap with β∗”) .OGP: dense subgraphs have either high or low overlap with PC.• Strong evidence for OGP phase transition at k =
√n.
(Possible explanation for a long-studied hardness!).• Proof OGP appears if k ≤ n0.0917.• Assumption 1:
Concentration of the value of k-densest subgraph problem of G(n, 12).
Known for k = Θ (log n) [BBSV’18], proven for k ≤ n0.0917..[GZ’19],conjectured for all k = o
(√n).
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 29 / 31
![Page 62: Computational and Statistical Challenges in High ...izadik/files/PhDThesisDefense.pdfComputational and Statistical Challenges in High Dimensional Statistical Models Ilias Zadik Operations](https://reader035.fdocuments.in/reader035/viewer/2022062604/5fbc9b7f28e2df5d167ae989/html5/thumbnails/62.jpg)
Thesis Overview: Contributions
HDLR (signal strength = sample size), PC (signal strength = clique size):
Under assumptions,• Compute the exact statistical limit of the HDLR model
(“All-to-Nothing Phase Transition”)• Explain computational-statistical gaps of HDLR and PC models,
through statistical-physics based methods. (”Overlap Gap Property”)• Improved computational limit for noiseless HDLR model
using lattice basis reduction (”One Sample Suffices”)Papers:(Gamarnik, Z. COLT ’17, AOS (major rev.) ’18+)(Gamarnik, Z. AOS (major rev.) ’18+), (Gamarnik, Z. NeurIPS ’18)(Reeves, Xu, Z. COLT ’19), (Gamarnik, Z. ’19+)
Ilias Zadik (ORC) Challenges in High Dimensional Statistics June 12, 2019 30 / 31