Statistical Inference and Random Matrices
N.S. Witte
Institute of Fundamental Sciences, Massey University
New Zealand
5-12-2017
Joint work with Peter Forrester
6th Wellington Workshop in Probability and Mathematical Statistics
4-6 December 2017
N.S. Witte Statistical Inference and Random Matrices 1-1
Applications
Historical Origins:
Integrals over Classical Groups U(N),O(N), Sp(2N) Hurwitz 1897, Haar 1933
Mathematical Statistics: Wishart 1928, James 1954-64, Constantine 1964, Mathai 1997
Quantisation of Classically Chaotic Systems: Wigner 1955, 1958
Contemporary Applications:
Principal Component Analysis, sample covariance matrices, Wishart matrices, null and non-null SCM
mathematical finance, cross correlations of financial data, sample correlation matrices,
Polynuclear growth models, random permutations, last passage percolation, queuing models,
Biogeographic pattern of species nested-ness, ordered binary presence-absence matrices,
distribution of mutation fitness effects across species, Fisher’s geometrical model,
complex networks modeled by random graphs, e.g. adjacency matrices
data analysis and statistical learning
Stable signal recovery from incomplete and inaccurate measurements
Compressed sensing, best k-term approximation, n-widths
Wireless communication, antenna networks,
quantum entanglement
quantum chaos, semi-classical approximation
quantum transport in mesoscopic systems
What is a random matrix?
E.g. the Gaussian Orthogonal Ensemble of Random Matrices: GOE, aka Gaussian Wigner matrices. I.i.d. random variables
x_{j,j} ∼ N[0, 1],  x_{j,k} ∼ N[0, 1/√2]
Construct the n × n real symmetric matrix X = (x_{j,k})_{j,k=1}^n.
Joint p.d.f. of the elements
P(X) = (1/C_n) ∏_{j=1}^n e^{−x_{j,j}²/2} ∏_{1≤j<k≤n} e^{−x_{j,k}²} = (1/C_n) ∏_{1≤j,k≤n} e^{−x_{j,k}²/2} = (1/C_n) e^{−(1/2)Tr(X²)}
Invariance under orthogonal transformations X ↦ OXO†, OO† = I
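This construction is easy to realise numerically. A minimal numpy sketch (the size n and the seed are arbitrary illustrative choices), sampling a GOE matrix and checking that Tr(X²), which determines P(X), is invariant under conjugation by an orthogonal matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Diagonal entries x_{j,j} ~ N[0,1]; off-diagonal x_{j,k} ~ N[0, 1/sqrt(2)],
# then symmetrise so that X is real symmetric.
X = np.diag(rng.standard_normal(n))
iu = np.triu_indices(n, k=1)
X[iu] = rng.standard_normal(n * (n - 1) // 2) / np.sqrt(2)
X = X + np.triu(X, k=1).T

# P(X) depends on X only through Tr(X²), which is invariant
# under X -> O X O^T with O orthogonal (here: QR of a Gaussian matrix,
# Haar-distributed up to column signs).
O, _ = np.linalg.qr(rng.standard_normal((n, n)))
Y = O @ X @ O.T
assert np.allclose(X, X.T)
assert np.isclose(np.trace(X @ X), np.trace(Y @ Y))
```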
Spectral Decomposition of a Matrix
n × n real symmetric matrices X = (x_{j,k})_{1≤j,k≤n}
Eigenvalue analysis λ1, …, λn:
X = OΛO†,  Λ = diag(λ1, …, λn), orthogonal eigenvectors O = (O1, …, On), OO† = I
Volume form (dX) = ∧_{j=1}^n dx_{j,j} ∧_{1≤j<k≤n} dx_{j,k}
Change of (1/2)n(n + 1) variables {x_{j,k}} ↦ {λ_j, O_j} with Jacobian
(dX) = (O†dO) ∏_{1≤j<k≤n} |λ_j − λ_k| ∧_{j=1}^n dλ_j
n × m real matrices X: singular value decomposition σ1, …, σm:
X = OΣP†,  Σ = diag(σ1, …, σm) ∈ R^{n×m}, orthogonal O ∈ R^{n×n}, P ∈ R^{m×m}
Gram-Schmidt orthogonalisation; QR, LU, Cholesky, Hessenberg decompositions
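Both factorisations are available in any dense linear algebra library; a small numpy sketch (sizes are arbitrary) verifying X = OΛO† for a symmetric matrix and X = OΣP† for a rectangular one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Eigendecomposition of a real symmetric matrix: X = O Λ O^T, O orthogonal.
n = 5
X = rng.standard_normal((n, n))
X = (X + X.T) / 2
lam, O = np.linalg.eigh(X)           # eigenvalues ascending, orthonormal columns
assert np.allclose(O @ np.diag(lam) @ O.T, X)
assert np.allclose(O @ O.T, np.eye(n))

# Singular value decomposition of a rectangular matrix: X = O Σ P^T.
m = 3
Y = rng.standard_normal((n, m))
U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
assert np.allclose(U @ np.diag(sig) @ Vt, Y)
assert np.all(sig >= 0)
```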
Putting it all together: GOE or Gaussian Wigner Matrices
Joint p.d.f. of the eigenvalues
P(λ) = (1/C_n) ∏_{j=1}^n e^{−λ_j²/2} ∏_{1≤j<k≤n} |λ_j − λ_k|¹,  λ_j ∈ R
N.B.
repulsion parameter (Dyson index) β = 1; in the Stieltjes picture, a "log-gas"
Hermite weight, one of the classical orthogonal polynomial weights
normalisation is a Selberg integral
Principal Component Analysis
X ∈ F^{n×p} (F = R, C) with X = (x_k^{(j)})_{j=1,…,n; k=1,…,p}, where
p = # of variables,
n = # of data points
p × p covariance matrix
A = X†X = (∑_{j=1}^n x_{k1}^{(j)} x_{k2}^{(j)})_{k1,k2=1,…,p}
A is a Wishart matrix if the x_{j,k} are i.i.d. random variables drawn from N[0, 1].
Joint eigenvalue p.d.f.
(1/C) ∏_{k=1}^p λ_k^{βa/2} e^{−βλ_k/2} ∏_{1≤j<k≤p} |λ_j − λ_k|^β,  λ_k ∈ [0,∞)
i.e. Laguerre weight, a = n − p + 1 − 2/β, n ≥ p, where β = 1 (real, R) or β = 2 (complex, C)
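A null (Σ = I) Wishart matrix in a few lines of numpy, in the β = 1 real case (n, p, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50                       # n data points, p variables, n >= p

X = rng.standard_normal((n, p))      # i.i.d. N[0,1] entries
A = X.T @ X                          # p x p Wishart matrix A = X†X

lam = np.linalg.eigvalsh(A)
assert np.allclose(A, A.T)
assert np.all(lam > 0)               # a.s. positive definite since n >= p
assert np.isclose(lam.sum(), (X**2).sum())   # Tr A = squared Frobenius norm of X
```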
Translation
Single-Wishart: null hypothesis μ = 0, Σ = I
p-variate, degrees of freedom = n
Laguerre LβE, weight e^{−(β/2)λ} λ^{(β/2)(n−p+1−2/β)}
Double-Wishart: null hypothesis μ1 = 0, Σ1 = I, μ2 = 0, Σ2 = I
1st variate p-dimensional, 2nd variate q-dimensional, degrees of freedom = n
Jacobi JβE, weight (1−λ)^{(β/2)(q−p+1−2/β)} (1+λ)^{(β/2)(n−q−p+1−2/β)}
Global Properties of the spectra: Empirical Spectral Distribution
n → ∞: Wigner semicircle law for the global density of eigenvalues
ρ(λ) = (1/π)√(2n − λ²)
n, p → ∞: Marchenko-Pastur law for the eigenvalues of X†X
ρ(λ) = (1/(2πλ))√((λ − nx₋)(nx₊ − λ))
where x± = (c^{−1/2} ± 1)²,  c = p/n ≤ 1
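A quick empirical check of the Marchenko-Pastur law, written as a sketch in the common normalisation where one studies (1/n)X†X, whose spectrum concentrates on [(1 − √c)², (1 + √c)²]:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 4000, 1000
c = p / n                                # c = p/n <= 1

X = rng.standard_normal((n, p))
lam = np.linalg.eigvalsh(X.T @ X / n)    # spectrum of (1/n) X†X

lo, hi = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
# The empirical spectrum concentrates on the Marchenko-Pastur support [lo, hi]
assert lam.min() > lo - 0.1 and lam.max() < hi + 0.1
# and its mean tends to 1, since Tr((1/n) X†X)/p -> E[x²] = 1
assert abs(lam.mean() - 1) < 0.05
```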
[Figure: histograms of the scaled eigenvalue density 2√n ρ(2√n x) for the GUE and GOE at n = 24, plotted over −1.5 ≤ x ≤ 1.5.]
Global Properties of the spectra: Empirical Spectral Distribution
Prop. 2.2 and Lemma 4.1 of Haagerup and Thorbjørnsen [2012]
Theorem
The eigenvalue density ρ(x) for the GUE satisfies the third-order, homogeneous ordinary differential equation
ρ′′′ + (4n − x²)ρ′ + xρ = 0
subject to certain boundary conditions, for fixed n as x → ±∞.
W & Forrester [2013]
Theorem
The eigenvalue density ρ(x) for the GOE satisfies the fifth-order, linear homogeneous ordinary differential equation
−4ρ^(5) + 5(x² − 4n + 2)ρ′′′ − 6xρ′′ + [−x⁴ + (8n − 4)x² − 16n² + 16n + 2]ρ′ + x(x² − 4n + 2)ρ = 0
again subject to certain boundary conditions, for fixed n as x → ±∞.
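The GUE equation can be checked symbolically at n = 1, where in this normalisation the density is just the standard normal; a sympy sketch:

```python
import sympy as sp

x = sp.symbols('x')
n = 1
# n = 1 GUE density in the normalisation of the theorem: the standard normal
rho = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

# Left-hand side of rho''' + (4n - x^2) rho' + x rho = 0
lhs = sp.diff(rho, x, 3) + (4 * n - x**2) * sp.diff(rho, x) + x * rho
assert sp.simplify(lhs) == 0
```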
The many ways to look at these problems
Statistic on Spec(X) = {λ1, …, λn}, and the corresponding regime:
density ρ(λ): global spectrum
m-point correlation functions ρm(λ1, …, λm)
linear spectral statistics ∑_{i=1}^n f(λi): hypothesis tests, distribution theory
extreme eigenvalues λmax, λmin: large deviations, spectrum edge
eigenvalue spacings λ_{i+1} − λi: bulk or edge spectrum
spectral gaps ∀j, λj ∉ J ⊂ Spec(X)
condition numbers λmax/λmin
determinants, characteristic polynomials ∏_{i=1}^n (ζ − λi)
Tools one can use
Approach, and its primary object:
Moment Methods
Concentration Inequalities
Large Deviation Theory: potential problems and equilibrium measures
Free Probability: Stieltjes transform
Loop Equations: Stieltjes transform, resolvents
Hypergeometric Functions of Matrix Argument: Zonal and Jack polynomials
Orthogonal/Bi-orthogonal Polynomials: Riemann-Hilbert asymptotics
Integrable Systems, Painlevé equations: gap probabilities, characteristic polynomials
The Spectrum Edge: Soft Edge Tracy-Widom Distribution F2(s), β = 2
Gap probability, i.e. the probability of no eigenvalues of the n × n GUE in (t, ∞), denoted E_{2,n}(0; (t, ∞)).
Shift and scale t as
t = √(2n) + s/(√2 n^{1/6})
Take the limit n → ∞ of the gap probability:
lim_{n→∞} P[√2 n^{1/6}(λmax − √(2n)) ≤ s] = F2(s)
Tracy-Widom [1994] Fredholm determinant
F2(s) = det(1 − K2)|_{L²(s,∞)}
where the integral operator K2 has as kernel the Airy kernel
K2(x, y) = (Ai(x)Ai′(y) − Ai(y)Ai′(x))/(x − y)
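Numerically, this Fredholm determinant is well suited to Nyström discretisation with Gauss-Legendre quadrature (Bornemann's approach); a sketch assuming scipy, truncating the half-line at s + L, which is harmless because the Airy kernel decays super-exponentially:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.special import airy

def tracy_widom_F2(s, m=60, L=12.0):
    """Approximate F2(s) = det(1 - K2) on L^2(s, infinity)."""
    t, w = leggauss(m)                  # nodes/weights on [-1, 1]
    x = s + L * (t + 1) / 2             # map to (s, s + L)
    w = w * L / 2
    Ai, Aip, _, _ = airy(x)
    # Airy kernel K2(x,y) = (Ai(x)Ai'(y) - Ai(y)Ai'(x)) / (x - y)
    num = np.outer(Ai, Aip) - np.outer(Aip, Ai)
    den = x[:, None] - x[None, :]
    np.fill_diagonal(den, 1.0)          # avoid 0/0; diagonal fixed below
    K = num / den
    np.fill_diagonal(K, Aip**2 - x * Ai**2)   # limit K2(x, x)
    sq = np.sqrt(w)
    return np.linalg.det(np.eye(m) - sq[:, None] * K * sq[None, :])
```

A few dozen quadrature points already give many digits of accuracy for moderate s; the returned values increase from near 0 to near 1 as s runs over roughly (−5, 3).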
Tracy-Widom Distribution F2(s) and the second Painleve transcendent PII
The PII transcendent q(t; α) satisfies the standard form of the second Painlevé equation
d²q/dt² = 2q³ + tq + α,  α ∈ C
The gap probability, i.e. the Tracy-Widom distribution F2(s), is
E_2^{soft edge}(0; (s,∞)) = exp(−∫_s^∞ dt (t − s) q(t)²)
where q(t) is the α = 0 solution of PII with the boundary condition q(t) ∼ Ai(t) as t → ∞:
the Hastings-McLeod solution; see Hastings, S. P. and McLeod, J. B., A boundary value problem associated with the second Painlevé transcendent and the Korteweg-de Vries equation, Arch. Rational Mech. Anal., 1980, 73(1), 31-51.
Forrester & W [2012], tails: as s → −∞ with n ≪ |s|,
log E_β^{soft edge}(n; (s,∞)) ∼ −(β/24)|s|³ + (√2/3)|s|^{3/2}(βn + β/2 − 1) + [(β/2)n² + (β/2 − 1)n + (1/6)(1 − (2/β)(1 − β/2)²)] log|s|^{−3/4}
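The Hastings-McLeod solution can be approximated by integrating PII backwards from Airy initial data at moderately large t; a sketch with scipy (note that this naive shooting is eventually unstable, since the Hastings-McLeod solution is a separatrix, so it is only trusted here down to t ≈ −2.5):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import airy

def pii(t, y, alpha=0.0):
    q, qp = y
    return [qp, 2 * q**3 + t * q + alpha]

# Airy initial data at t0: the Hastings-McLeod solution is exponentially
# close to Ai(t) there, so the truncation error is negligible.
t0 = 6.0
Ai0, Aip0 = airy(t0)[0], airy(t0)[1]
sol = solve_ivp(pii, (t0, -2.5), [Ai0, Aip0], method='DOP853',
                rtol=1e-12, atol=1e-15, dense_output=True)

q0 = sol.sol(0.0)[0]   # Hastings-McLeod value q(0), about 0.367
```

To the left, q(t) tracks the asymptotic behaviour q(t) ≈ √(−t/2); to the right it stays glued to Ai(t).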
Universality Hypothesis
Extension of central limit theorems and the Gaussian distribution
Conjecture
As the rank of the random matrix ensembles n → ∞, with or without a similar scaling of other parameters, the ensembles
have well-defined limits,
these limits define new distributions which are insensitive to details of the finite model other than their symmetry class β,
and are characterised by the solutions of integrable dynamical systems, e.g. of integrable hierarchies such as the Toda lattice, KdV or KP systems, or, more precisely, by Painlevé-type equations.
Proven Cases:
– "Four Moments Theorems": Tao, T. and Vu, V., Random covariance matrices: universality of local statistics of eigenvalues, Ann. Probab., 2012, 40, 1285-1315.
– "Riemann-Hilbert Approach": Deift, P. and Gioev, D., Random Matrix Theory: Invariant Ensembles and Universality, Amer. Math. Soc., 2009.
Beyond the Null case: Example 1, appearance of a "phase transition phenomenon"
Baik, J., Ben Arous, G. and Péché, S., Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Ann. Probab., 2005, 33(5), 1643-1697.
β = 2; λ1 is the largest eigenvalue of the sample covariance matrix, γ² = n/p, ℓ1 ≥ ℓ2.
Population covariance matrix
Σ = diag(ℓ1, ℓ2, I_{p−2})
As p, n → ∞, either
P([λ1 − (1 + γ^{−1})²] (γ/(1 + γ)^{4/3}) n^{2/3} ≤ x) →
F2(x),  0 < ℓ1, ℓ2 < 1 + γ^{−1}
F_2^1(x),  0 < ℓ2 < 1 + γ^{−1} = ℓ1
F_2^2(x),  ℓ1 = ℓ2 = 1 + γ^{−1}
or
P([λ1 − ℓ1(1 + γ^{−2}/(ℓ1 − 1))] n^{1/2}/(ℓ1 √(1 − γ^{−2}/(ℓ1 − 1)²)) ≤ x) →
G1(x),  ℓ1 > 1 + γ^{−1}, ℓ1 > ℓ2
G2(x),  ℓ1 = ℓ2 > 1 + γ^{−1}
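The supercritical branch is easy to observe in simulation: for a spike well above threshold, λ1 of the sample covariance matrix concentrates near ℓ1(1 + γ^{−2}/(ℓ1 − 1)) rather than near ℓ1. A sketch with real Gaussian entries (the theorem above is the β = 2 complex case, but the almost-sure location of λ1 is the same):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 4000, 400
c = p / n                    # c = gamma^{-2} = p/n
ell1 = 4.0                   # supercritical spike: ell1 > 1 + sqrt(c)

# Population covariance: one spiked variance ell1, the rest equal to 1
sigma = np.ones(p)
sigma[0] = ell1
X = rng.standard_normal((n, p)) * np.sqrt(sigma)   # rows ~ N(0, Sigma)
lam1 = np.linalg.eigvalsh(X.T @ X / n)[-1]         # largest sample eigenvalue

predicted = ell1 * (1 + c / (ell1 - 1))   # BBP location, here 4(1 + 0.1/3)
bulk_edge = (1 + np.sqrt(c))**2           # null Marchenko-Pastur edge
assert abs(lam1 - predicted) < 0.4
assert lam1 > bulk_edge + 0.5
```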
Beyond the Null case: The moral of the story
As n → ∞ with p ≪ n, PCA works: sample CM → population CM
As n, p → ∞ with p = O(n): ???
Issue: How close are the eigenvalues of the sample PCA to the population PCA?
"Even though for n finite there is no phase transition (n → ∞), as a function of n or some other parameter the eigenvector of the sample PCA (e.g. associated with eigenvalue λ1) may exhibit a sharp loss of tracking, suddenly losing its relation to the eigenvector of the population PCA."
Nadler, B., Finite sample approximation results for principal component analysis: a matrix perturbation approach, Ann. Statist., 2008, 36(6), 2791-2817.
Beyond the Null case: Example 2, "Spiked Population models"
Johnstone, I. M., On the distribution of the largest eigenvalue in principal components analysis, Ann. Statist., 2001, 29(2), 295-327.
A model:
x_i = μ + A u_i + σ z_i,  i = 1, …, n
where
p = number of variables
M = number of spikes
x_i = observation p-vector
μ = p-vector of means
A = p × M factor loading matrix
u_i = M-vector of random factors
z_i = p-vector of white noise
Population covariance matrix
Σ = ∑_{j=1}^M ℓ_j² q_j q_j^T + σ² I_p
where Φ is the M × M covariance matrix of the u_i, and ℓ_j, q_j, j = 1, …, M, are the eigenvalues/eigenvectors of AΦA^T.
Ma, Z., Sparse principal component analysis and iterative thresholding, Ann. Statist., 2013, 41(2), 772-801.
Painlevé Transcendents
P-I: d²y/dx² = 6y² + x
P-II: d²y/dx² = 2y³ + xy + ν
P-III′: d²y/dx² = (1/y)(dy/dx)² − (1/x)(dy/dx) + (y²/(4x²))(γy + α) + β/(4x) + δ/(4y),  γ = 4, δ = −4
P-IV: d²y/dx² = (1/(2y))(dy/dx)² + (3/2)y³ + 4xy² + 2(x² − α)y + β/y
P-V: d²y/dx² = {1/(2y) + 1/(y − 1)}(dy/dx)² − (1/x)(dy/dx) + ((y − 1)²/x²){αy + β/y} + γy/x + δ y(y + 1)/(y − 1),  δ = −1/2
P-VI: d²y/dx² = (1/2){1/y + 1/(y − 1) + 1/(y − x)}(dy/dx)² − {1/x + 1/(x − 1) + 1/(y − x)}(dy/dx) + (y(y − 1)(y − x)/(x²(x − 1)²)){α + βx/y² + γ(x − 1)/(y − 1)² + δx(x − 1)/(y − x)²}
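These normal forms are straightforward to sanity-check symbolically; for example, y = −1/x is a classical rational solution of P-II at ν = 1, as a short sympy check confirms:

```python
import sympy as sp

x, nu = sp.symbols('x nu')
y = -1 / x                      # candidate rational solution of P-II

# Residual of d²y/dx² = 2y³ + xy + nu
residual = sp.diff(y, x, 2) - (2 * y**3 + x * y + nu)
# The residual vanishes identically exactly at nu = 1
assert sp.simplify(residual.subs(nu, 1)) == 0
```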
Painlevé Equations: Digital Library of Mathematical Functions, Chapter 32, http://dlmf.nist.gov/
Classical solution, equation, and affine Weyl group:
PI: no classical solutions
PII: Airy Ai(x), Bi(x); A1
PIII: Bessel Iν(x), Kν(x); B2
PIV: Hermite-Weber Dν(x); A2
PV: Confluent Hypergeometric 1F1(a, c; x); A3
PVI: Gauss Hypergeometric 2F1(a, b; c; x); D4
Reference Monographs/Reviews
Muirhead, R. J., Aspects of Multivariate Statistical Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons Inc., 1982.
Bai, Z. and Silverstein, J. W., Spectral Analysis of Large Dimensional Random Matrices, 2nd Edition, Springer, New York, 2010.
Mehta, M. L., Random Matrices, Pure and Applied Mathematics (Amsterdam), Vol. 142, 3rd Edition, Elsevier/Academic Press, Amsterdam, 2004.
Anderson, G. W., Guionnet, A. and Zeitouni, O., An Introduction to Random Matrices, Cambridge University Press, Cambridge, 2010.
Forrester, P. J., Log Gases and Random Matrices, Princeton University Press, 2010.
Akemann, G., Baik, J. and Di Francesco, P., Handbook on Random Matrix Theory, Oxford University Press, 2011.
Couillet, R. and Debbah, M., Random Matrix Methods for Wireless Communications, Cambridge University Press, 2011.
Johnstone, I. M., High Dimensional Statistical Inference and Random Matrices, International Congress of Mathematicians, Vol. I, Eur. Math. Soc., Zürich, 2006, pp. 307-333.
Paul, D. and Aue, A., Random matrix theory in statistics: a review, J. Statist. Plann. Inference, 150, 1-29, 2014.