Statistical Inference and Random Matrices
N.S. Witte
Institute of Fundamental Sciences, Massey University
New Zealand
5-12-2017
Joint work with Peter Forrester
6th Wellington Workshop in Probability and Mathematical Statistics
4-6 December 2017
N.S. Witte Statistical Inference and Random Matrices 1-1
Applications
Historical Origins:
Integrals over Classical Groups U(N),O(N), Sp(2N) Hurwitz 1897, Haar 1933
Mathematical Statistics: Wishart 1928, James 1954-64, Constantine 1964, Mathai 1997
Quantisation of Classically Chaotic Systems: Wigner 1955, 1958
Contemporary Applications:
Principal Component Analysis, sample covariance matrices, Wishart matrices, null and non-null SCM
mathematical finance, cross correlations of financial data, sample correlation matrices,
Polynuclear growth models, random permutations, last passage percolation, queuing models,
Biogeographic pattern of species nested-ness, ordered binary presence-absence matrices,
distribution of mutation fitness effects across species, Fisher’s geometrical model,
complex networks modeled by random graphs, e.g. adjacency matrices
data analysis and statistical learning
Stable signal recovery from incomplete and inaccurate measurements
Compressed sensing, best k-term approximation, n-widths
Wireless communication, antenna networks,
quantum entanglement
quantum chaos, semi-classical approximation
quantum transport in mesoscopic systems
What is a random matrix?
E.g. the Gaussian Orthogonal Ensemble of Random Matrices: GOE, aka Gaussian Wigner matrices. I.i.d. random variables
x_{j,j} ∼ N[0, 1],  x_{j,k} ∼ N[0, 1/√2]
Construct the n × n real symmetric matrix X = (x_{j,k})_{j,k=1}^n.
Joint p.d.f. of the elements
P(X) = (1/C_n) ∏_{j=1}^n e^{−x_{j,j}²/2} ∏_{1≤j<k≤n} e^{−x_{j,k}²} = (1/C_n) ∏_{1≤j,k≤n} e^{−x_{j,k}²/2} = (1/C_n) e^{−(1/2)Tr(X²)}
Invariance under orthogonal transformations X ↦ OXO†, OO† = I
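This construction is easy to realise numerically. A minimal numpy sketch (the size n and the seed are arbitrary illustrative choices), sampling a GOE matrix and checking that Tr(X²), which determines P(X), is invariant under conjugation by an orthogonal matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 4

# Diagonal entries x_{j,j} ~ N[0,1]; off-diagonal x_{j,k} ~ N[0, 1/sqrt(2)],
# then symmetrise so that X is real symmetric.
X = np.diag(rng.standard_normal(n))
iu = np.triu_indices(n, k=1)
X[iu] = rng.standard_normal(n * (n - 1) // 2) / np.sqrt(2)
X = X + np.triu(X, k=1).T

# P(X) depends on X only through Tr(X²), which is invariant
# under X -> O X O^T with O orthogonal (here: QR of a Gaussian matrix,
# Haar-distributed up to column signs).
O, _ = np.linalg.qr(rng.standard_normal((n, n)))
Y = O @ X @ O.T
assert np.allclose(X, X.T)
assert np.isclose(np.trace(X @ X), np.trace(Y @ Y))
```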
Spectral Decomposition of a Matrix
n × n real symmetric matrices X = (x_{j,k})_{1≤j,k≤n}
Eigenvalue analysis λ1, …, λn:
X = OΛO†,  Λ = diag(λ1, …, λn), orthogonal eigenvectors O = (O1, …, On), OO† = I
Volume form (dX) = ∧_{j=1}^n dx_{j,j} ∧_{1≤j<k≤n} dx_{j,k}
Change of (1/2)n(n + 1) variables {x_{j,k}} ↦ {λ_j, O_j} with Jacobian
(dX) = (O†dO) ∏_{1≤j<k≤n} |λ_j − λ_k| ∧_{j=1}^n dλ_j
n × m real matrices X: singular value decomposition σ1, …, σm:
X = OΣP†,  Σ = diag(σ1, …, σm) ∈ R^{n×m}, orthogonal O ∈ R^{n×n}, P ∈ R^{m×m}
Gram-Schmidt orthogonalisation; QR, LU, Cholesky, Hessenberg decompositions
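Both factorisations are available in any dense linear algebra library; a small numpy sketch (sizes are arbitrary) verifying X = OΛO† for a symmetric matrix and X = OΣP† for a rectangular one:

```python
import numpy as np

rng = np.random.default_rng(1)

# Eigendecomposition of a real symmetric matrix: X = O Λ O^T, O orthogonal.
n = 5
X = rng.standard_normal((n, n))
X = (X + X.T) / 2
lam, O = np.linalg.eigh(X)           # eigenvalues ascending, orthonormal columns
assert np.allclose(O @ np.diag(lam) @ O.T, X)
assert np.allclose(O @ O.T, np.eye(n))

# Singular value decomposition of a rectangular matrix: X = O Σ P^T.
m = 3
Y = rng.standard_normal((n, m))
U, sig, Vt = np.linalg.svd(Y, full_matrices=False)
assert np.allclose(U @ np.diag(sig) @ Vt, Y)
assert np.all(sig >= 0)
```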
Putting it all together: GOE or Gaussian Wigner Matrices
Joint p.d.f. of the eigenvalues
P(λ) = (1/C_n) ∏_{j=1}^n e^{−λ_j²/2} ∏_{1≤j<k≤n} |λ_j − λ_k|¹,  λ_j ∈ R
N.B.
repulsion parameter (Dyson index) β = 1; in the Stieltjes picture, a "log-gas"
Hermite weight, one of the classical orthogonal polynomial weights
normalisation is a Selberg integral
Principal Component Analysis
X ∈ F^{n×p} (F = R, C) with X = (x_k^{(j)})_{j=1,…,n; k=1,…,p}, where
p = # of variables,
n = # of data points
p × p covariance matrix
A = X†X = (∑_{j=1}^n x_{k1}^{(j)} x_{k2}^{(j)})_{k1,k2=1,…,p}
A is a Wishart matrix if the x_{j,k} are i.i.d. random variables drawn from N[0, 1].
Joint eigenvalue p.d.f.
(1/C) ∏_{k=1}^p λ_k^{βa/2} e^{−βλ_k/2} ∏_{1≤j<k≤p} |λ_j − λ_k|^β,  λ_k ∈ [0,∞)
i.e. Laguerre weight, a = n − p + 1 − 2/β, n ≥ p, where β = 1 (real, R) or β = 2 (complex, C)
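A null (Σ = I) Wishart matrix in a few lines of numpy, in the β = 1 real case (n, p, and the seed are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 50                       # n data points, p variables, n >= p

X = rng.standard_normal((n, p))      # i.i.d. N[0,1] entries
A = X.T @ X                          # p x p Wishart matrix A = X†X

lam = np.linalg.eigvalsh(A)
assert np.allclose(A, A.T)
assert np.all(lam > 0)               # a.s. positive definite since n >= p
assert np.isclose(lam.sum(), (X**2).sum())   # Tr A = squared Frobenius norm of X
```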
Translation
Single-Wishart: null hypothesis μ = 0, Σ = I
p-variate, degrees of freedom = n
Laguerre LβE, weight e^{−(β/2)λ} λ^{(β/2)(n−p+1−2/β)}
Double-Wishart: null hypothesis μ1 = 0, Σ1 = I, μ2 = 0, Σ2 = I
1st variate p-dimensional, 2nd variate q-dimensional, degrees of freedom = n
Jacobi JβE, weight (1−λ)^{(β/2)(q−p+1−2/β)} (1+λ)^{(β/2)(n−q−p+1−2/β)}
Global Properties of the spectra: Empirical Spectral Distribution
n → ∞: Wigner semicircle law for the global density of eigenvalues
ρ(λ) = (1/π)√(2n − λ²)
n, p → ∞: Marchenko-Pastur law for the eigenvalues of X†X
ρ(λ) = (1/(2πλ))√((λ − nx₋)(nx₊ − λ))
where x± = (c^{−1/2} ± 1)²,  c = p/n ≤ 1
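A quick empirical check of the Marchenko-Pastur law, written as a sketch in the common normalisation where one studies (1/n)X†X, whose spectrum concentrates on [(1 − √c)², (1 + √c)²]:

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 4000, 1000
c = p / n                                # c = p/n <= 1

X = rng.standard_normal((n, p))
lam = np.linalg.eigvalsh(X.T @ X / n)    # spectrum of (1/n) X†X

lo, hi = (1 - np.sqrt(c))**2, (1 + np.sqrt(c))**2
# The empirical spectrum concentrates on the Marchenko-Pastur support [lo, hi]
assert lam.min() > lo - 0.1 and lam.max() < hi + 0.1
# and its mean tends to 1, since Tr((1/n) X†X)/p -> E[x²] = 1
assert abs(lam.mean() - 1) < 0.05
```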
[Figure: histograms of the scaled eigenvalue density 2√n ρ(2√n x) for the GUE and GOE at n = 24, plotted over −1.5 ≤ x ≤ 1.5.]
Global Properties of the spectra: Empirical Spectral Distribution
Prop. 2.2 and Lemma 4.1 of Haagerup and Thorbjørnsen [2012]
Theorem
The eigenvalue density ρ(x) for the GUE satisfies the third-order, homogeneous ordinary differential equation
ρ′′′ + (4n − x²)ρ′ + xρ = 0
subject to certain boundary conditions, for fixed n as x → ±∞.
W & Forrester [2013]
Theorem
The eigenvalue density ρ(x) for the GOE satisfies the fifth-order, linear homogeneous ordinary differential equation
−4ρ^(5) + 5(x² − 4n + 2)ρ′′′ − 6xρ′′ + [−x⁴ + (8n − 4)x² − 16n² + 16n + 2]ρ′ + x(x² − 4n + 2)ρ = 0
again subject to certain boundary conditions, for fixed n as x → ±∞.
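The GUE equation can be checked symbolically at n = 1, where in this normalisation the density is just the standard normal; a sympy sketch:

```python
import sympy as sp

x = sp.symbols('x')
n = 1
# n = 1 GUE density in the normalisation of the theorem: the standard normal
rho = sp.exp(-x**2 / 2) / sp.sqrt(2 * sp.pi)

# Left-hand side of rho''' + (4n - x^2) rho' + x rho = 0
lhs = sp.diff(rho, x, 3) + (4 * n - x**2) * sp.diff(rho, x) + x * rho
assert sp.simplify(lhs) == 0
```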
The many ways to look at these problems
Statistic on Spec(X) = {λ1, …, λn}, and the corresponding regime:
density ρ(λ): global spectrum
m-point correlation functions ρm(λ1, …, λm)
linear spectral statistics ∑_{i=1}^n f(λi): hypothesis tests, distribution theory
extreme eigenvalues λmax, λmin: large deviations, spectrum edge
eigenvalue spacings λ_{i+1} − λi: bulk or edge spectrum
spectral gaps ∀j, λj ∉ J ⊂ Spec(X)
condition numbers λmax/λmin
determinants, characteristic polynomials ∏_{i=1}^n (ζ − λi)
Tools one can use
Approach, and its primary object:
Moment Methods
Concentration Inequalities
Large Deviation Theory: potential problems and equilibrium measures
Free Probability: Stieltjes transform
Loop Equations: Stieltjes transform, resolvents
Hypergeometric Functions of Matrix Argument: Zonal and Jack polynomials
Orthogonal/Bi-orthogonal Polynomials: Riemann-Hilbert asymptotics
Integrable Systems, Painlevé equations: gap probabilities, characteristic polynomials
The Spectrum Edge: Soft Edge Tracy-Widom Distribution F2(s), β = 2
Gap probability, i.e. the probability of no eigenvalues of the n × n GUE in (t, ∞), denoted E_{2,n}(0; (t, ∞)).
Shift and scale t as
t = √(2n) + s/(√2 n^{1/6})
Take the limit n → ∞ of the gap probability:
lim_{n→∞} P[√2 n^{1/6}(λmax − √(2n)) ≤ s] = F2(s)
Tracy-Widom [1994] Fredholm determinant
F2(s) = det(1 − K2)|_{L²(s,∞)}
where the integral operator K2 has as kernel the Airy kernel
K2(x, y) = (Ai(x)Ai′(y) − Ai(y)Ai′(x))/(x − y)
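Numerically, this Fredholm determinant is well suited to Nyström discretisation with Gauss-Legendre quadrature (Bornemann's approach); a sketch assuming scipy, truncating the half-line at s + L, which is harmless because the Airy kernel decays super-exponentially:

```python
import numpy as np
from numpy.polynomial.legendre import leggauss
from scipy.special import airy

def tracy_widom_F2(s, m=60, L=12.0):
    """Approximate F2(s) = det(1 - K2) on L^2(s, infinity)."""
    t, w = leggauss(m)                  # nodes/weights on [-1, 1]
    x = s + L * (t + 1) / 2             # map to (s, s + L)
    w = w * L / 2
    Ai, Aip, _, _ = airy(x)
    # Airy kernel K2(x,y) = (Ai(x)Ai'(y) - Ai(y)Ai'(x)) / (x - y)
    num = np.outer(Ai, Aip) - np.outer(Aip, Ai)
    den = x[:, None] - x[None, :]
    np.fill_diagonal(den, 1.0)          # avoid 0/0; diagonal fixed below
    K = num / den
    np.fill_diagonal(K, Aip**2 - x * Ai**2)   # limit K2(x, x)
    sq = np.sqrt(w)
    return np.linalg.det(np.eye(m) - sq[:, None] * K * sq[None, :])
```

A few dozen quadrature points already give many digits of accuracy for moderate s; the returned values increase from near 0 to near 1 as s runs over roughly (−5, 3).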
Tracy-Widom Distribution F2(s) and the second Painleve transcendent PII
The PII transcendent q(t; α) satisfies the standard form of the second Painlevé equation
d²q/dt² = 2q³ + tq + α,  α ∈ C
The gap probability, i.e. the Tracy-Widom distribution F2(s), is
E_2^{soft edge}(0; (s,∞)) = exp(−∫_s^∞ dt (t − s) q(t)²)
where q(t) is the α = 0 solution of PII with the boundary condition q(t) ∼ Ai(t) as t → ∞:
the Hastings-McLeod solution; see Hastings, S. P. and McLeod, J. B., A boundary value problem associated with the second Painlevé transcendent and the Korteweg-de Vries equation, Arch. Rational Mech. Anal., 1980, 73(1), 31-51.
Forrester & W [2012], tails: as s → −∞ with n ≪ |s|,
log E_β^{soft edge}(n; (s,∞)) ∼ −(β/24)|s|³ + (√2/3)|s|^{3/2}(βn + β/2 − 1) + [(β/2)n² + (β/2 − 1)n + (1/6)(1 − (2/β)(1 − β/2)²)] log|s|^{−3/4}
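The Hastings-McLeod solution can be approximated by integrating PII backwards from Airy initial data at moderately large t; a sketch with scipy (note that this naive shooting is eventually unstable, since the Hastings-McLeod solution is a separatrix, so it is only trusted here down to t ≈ −2.5):

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import airy

def pii(t, y, alpha=0.0):
    q, qp = y
    return [qp, 2 * q**3 + t * q + alpha]

# Airy initial data at t0: the Hastings-McLeod solution is exponentially
# close to Ai(t) there, so the truncation error is negligible.
t0 = 6.0
Ai0, Aip0 = airy(t0)[0], airy(t0)[1]
sol = solve_ivp(pii, (t0, -2.5), [Ai0, Aip0], method='DOP853',
                rtol=1e-12, atol=1e-15, dense_output=True)

q0 = sol.sol(0.0)[0]   # Hastings-McLeod value q(0), about 0.367
```

To the left, q(t) tracks the asymptotic behaviour q(t) ≈ √(−t/2); to the right it stays glued to Ai(t).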
Universality Hypothesis
Extension of central limit theorems and the Gaussian distribution
Conjecture
As the rank of the random matrix ensembles n → ∞, with or without a similar scaling of other parameters, the ensembles
have well-defined limits,
these limits define new distributions which are insensitive to details of the finite model other than their symmetry class β,
and are characterised by the solutions of integrable dynamical systems, e.g. of integrable hierarchies such as the Toda lattice, KdV or KP systems, or, more precisely, by Painlevé-type equations.
Proven Cases:
– "Four Moments Theorems": Tao, T. and Vu, V., Random covariance matrices: universality of local statistics of eigenvalues, Ann. Probab., 2012, 40, 1285-1315.
– "Riemann-Hilbert Approach": Deift, P. and Gioev, D., Random Matrix Theory: Invariant Ensembles and Universality, Amer. Math. Soc., 2009.
Beyond the Null case: Example 1, appearance of a "phase transition phenomenon"
Baik, J., Ben Arous, G. and Péché, S., Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices, Ann. Probab., 2005, 33(5), 1643-1697.
β = 2; λ1 is the largest eigenvalue of the sample covariance matrix, γ² = n/p, ℓ1 ≥ ℓ2.
Population covariance matrix
Σ = diag(ℓ1, ℓ2, I_{p−2})
As p, n → ∞, either
P([λ1 − (1 + γ^{−1})²] (γ/(1 + γ)^{4/3}) n^{2/3} ≤ x) →
F2(x),  0 < ℓ1, ℓ2 < 1 + γ^{−1}
F_2^1(x),  0 < ℓ2 < 1 + γ^{−1} = ℓ1
F_2^2(x),  ℓ1 = ℓ2 = 1 + γ^{−1}
or
P([λ1 − ℓ1(1 + γ^{−2}/(ℓ1 − 1))] n^{1/2}/(ℓ1 √(1 − γ^{−2}/(ℓ1 − 1)²)) ≤ x) →
G1(x),  ℓ1 > 1 + γ^{−1}, ℓ1 > ℓ2
G2(x),  ℓ1 = ℓ2 > 1 + γ^{−1}
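The supercritical branch is easy to observe in simulation: for a spike well above threshold, λ1 of the sample covariance matrix concentrates near ℓ1(1 + γ^{−2}/(ℓ1 − 1)) rather than near ℓ1. A sketch with real Gaussian entries (the theorem above is the β = 2 complex case, but the almost-sure location of λ1 is the same):

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 4000, 400
c = p / n                    # c = gamma^{-2} = p/n
ell1 = 4.0                   # supercritical spike: ell1 > 1 + sqrt(c)

# Population covariance: one spiked variance ell1, the rest equal to 1
sigma = np.ones(p)
sigma[0] = ell1
X = rng.standard_normal((n, p)) * np.sqrt(sigma)   # rows ~ N(0, Sigma)
lam1 = np.linalg.eigvalsh(X.T @ X / n)[-1]         # largest sample eigenvalue

predicted = ell1 * (1 + c / (ell1 - 1))   # BBP location, here 4(1 + 0.1/3)
bulk_edge = (1 + np.sqrt(c))**2           # null Marchenko-Pastur edge
assert abs(lam1 - predicted) < 0.4
assert lam1 > bulk_edge + 0.5
```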
Beyond the Null case: The moral of the story
As n → ∞ with p ≪ n, PCA works: sample CM → population CM
As n, p → ∞ with p = O(n): ???
Issue: How close are the eigenvalues of the sample PCA to the population PCA?
"Even though for n finite there is no phase transition (n → ∞), as a function of n or some other parameter the eigenvector of the sample PCA (e.g. associated with eigenvalue λ1) may exhibit a sharp loss of tracking, suddenly losing its relation to the eigenvector of the population PCA."
Nadler, B., Finite sample approximation results for principal component analysis: a matrix perturbation approach, Ann. Statist., 2008, 36(6), 2791-2817.
Beyond the Null case: Example 2, "Spiked Population models"
Johnstone, I. M., On the distribution of the largest eigenvalue in principal components analysis, Ann. Statist., 2001, 29(2), 295-327.
A model:
x_i = μ + A u_i + σ z_i,  i = 1, …, n
where
p = number of variables
M = number of spikes
x_i = observation p-vector
μ = p-vector of means
A = p × M factor loading matrix
u_i = M-vector of random factors
z_i = p-vector of white noise
Population covariance matrix
Σ = ∑_{j=1}^M ℓ_j² q_j q_j^T + σ² I_p
where Φ is the M × M covariance matrix of the u_i, and ℓ_j, q_j, j = 1, …, M, are the eigenvalues/eigenvectors of AΦA^T.
Ma, Z., Sparse principal component analysis and iterative thresholding, Ann. Statist., 2013, 41(2), 772-801.
Painlevé Transcendents
P-I: d²y/dx² = 6y² + x
P-II: d²y/dx² = 2y³ + xy + ν
P-III′: d²y/dx² = (1/y)(dy/dx)² − (1/x)(dy/dx) + (y²/(4x²))(γy + α) + β/(4x) + δ/(4y),  γ = 4, δ = −4
P-IV: d²y/dx² = (1/(2y))(dy/dx)² + (3/2)y³ + 4xy² + 2(x² − α)y + β/y
P-V: d²y/dx² = {1/(2y) + 1/(y − 1)}(dy/dx)² − (1/x)(dy/dx) + ((y − 1)²/x²){αy + β/y} + γy/x + δ y(y + 1)/(y − 1),  δ = −1/2
P-VI: d²y/dx² = (1/2){1/y + 1/(y − 1) + 1/(y − x)}(dy/dx)² − {1/x + 1/(x − 1) + 1/(y − x)}(dy/dx) + (y(y − 1)(y − x)/(x²(x − 1)²)){α + βx/y² + γ(x − 1)/(y − 1)² + δx(x − 1)/(y − x)²}
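These normal forms are straightforward to sanity-check symbolically; for example, y = −1/x is a classical rational solution of P-II at ν = 1, as a short sympy check confirms:

```python
import sympy as sp

x, nu = sp.symbols('x nu')
y = -1 / x                      # candidate rational solution of P-II

# Residual of d²y/dx² = 2y³ + xy + nu
residual = sp.diff(y, x, 2) - (2 * y**3 + x * y + nu)
# The residual vanishes identically exactly at nu = 1
assert sp.simplify(residual.subs(nu, 1)) == 0
```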
Painlevé Equations: Digital Library of Mathematical Functions, Chapter 32, http://dlmf.nist.gov/
Classical solution, equation, and affine Weyl group:
PI: no classical solutions
PII: Airy Ai(x), Bi(x); A1
PIII: Bessel Iν(x), Kν(x); B2
PIV: Hermite-Weber Dν(x); A2
PV: Confluent Hypergeometric 1F1(a, c; x); A3
PVI: Gauss Hypergeometric 2F1(a, b; c; x); D4
Reference Monographs/Reviews
Muirhead, R. J., Aspects of Multivariate Statistical Theory, Wiley Series in Probability and Mathematical Statistics, John Wiley & Sons Inc., 1982.
Bai, Z. and Silverstein, J. W., Spectral Analysis of Large Dimensional Random Matrices, 2nd Edition, Springer, New York, 2010.
Mehta, M. L., Random Matrices, Pure and Applied Mathematics (Amsterdam), Vol. 142, 3rd Edition, Elsevier/Academic Press, Amsterdam, 2004.
Anderson, G. W., Guionnet, A. and Zeitouni, O., An Introduction to Random Matrices, Cambridge University Press, Cambridge, 2010.
Forrester, P. J., Log Gases and Random Matrices, Princeton University Press, 2010.
Akemann, G., Baik, J. and Di Francesco, P., Handbook on Random Matrix Theory, Oxford University Press, 2011.
Couillet, R. and Debbah, M., Random Matrix Methods for Wireless Communications, Cambridge University Press, 2011.
Johnstone, I. M., High Dimensional Statistical Inference and Random Matrices, International Congress of Mathematicians, Vol. I, Eur. Math. Soc., Zürich, 2006, pp. 307-333.
Paul, D. and Aue, A., Random matrix theory in statistics: a review, J. Statist. Plann. Inference, 150, 1-29, 2014.