UCLA Department of Statistics - CARME 2011
carme2011.agrocampus-ouest.fr/slides/deleeuw.pdf
UCLA Department of Statistics
History and Theory of Nonlinear Principal Component Analysis
Jan de Leeuw
February 11, 2011
Jan de Leeuw
NLPCA History UCLA Department of Statistics
Abstract
Relationships between Multiple Correspondence Analysis (MCA) and Nonlinear Principal Component Analysis (NLPCA), which is defined as PCA with Optimal Scaling (OS), are discussed. We review the history of NLPCA.
We discuss forms of NLPCA that have been proposed over the years: Shepard-Kruskal-Breiman-Friedman-Gifi PCA with optimal scaling, Aspect Analysis of correlations, Guttman's MSA, Logit and Probit PCA of binary data, and Logistic Homogeneity Analysis.
Since I am trying to summarize 40+ years of work, the presentation will be rather dense.
Linear PCA: History
(Linear) Principal Components Analysis (PCA) is sometimes attributed to Hotelling (1933), but that is surely incorrect.
The equations for the principal axes of quadratic forms and surfaces, in various forms, were known from classical analytic geometry (notably from work by Cauchy and Jacobi in the mid 19th century).
There are some modest beginnings in Galton's Natural Inheritance of 1889, where the principal axes are connected for the first time with the "correlation ellipsoid".
There is a full-fledged (although tedious) discussion of the technique in Pearson (1901), and there is a complete application (7 physical traits of 3000 criminals) in MacDonell (1902), by a Pearson co-worker.
There is proper attribution in: Burt, C., Alternative Methods of Factor Analysis and their Relations to Pearson's Method of "Principle Axes", Br. J. Psych., Stat. Sec., 2 (1949), pp. 98-121.
Linear PCA: How To
Hotelling's introduction of PCA follows the now familiar route of making successive orthogonal linear combinations with maximum variance. He does this by using power iterations (without reference), discussed in 1929 by Von Mises and Pollaczek-Geiringer.
Pearson, following Galton, used the correlation ellipsoid throughout. This seems to me the more basic approach.
He cast the problem in terms of finding low-dimensional subspaces (lines and planes) of best (least squares) fit to a cloud of points, and connected the solution to the principal axes of the correlation ellipsoid.
In modern notation, this means minimizing SSQ(Y − XB′) over n × r matrices X and m × r matrices B. For r = 1 this is the best line, etc.
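Pearson's least squares problem can be sketched in a few lines of numpy. This is a minimal illustration (the helper name `best_fit_subspace` is mine, not from the slides): by the Eckart-Young theorem the minimizer of SSQ(Y − XB′) is the truncated SVD of the centered data.

```python
import numpy as np

def best_fit_subspace(Y, r=1):
    """Minimize SSQ(Y - X B') over n x r scores X and m x r loadings B.

    By Eckart-Young, the minimizer is the truncated SVD of Y.
    """
    U, s, Vt = np.linalg.svd(Y, full_matrices=False)
    X = U[:, :r] * s[:r]   # component scores
    B = Vt[:r].T           # loadings; columns span the best-fitting subspace
    return X, B

rng = np.random.default_rng(0)
Y = rng.normal(size=(100, 5))
Y -= Y.mean(axis=0)        # center: Pearson's cloud of points around its mean
X, B = best_fit_subspace(Y, r=2)
loss = np.sum((Y - X @ B.T) ** 2)
```

The residual `loss` equals the sum of the discarded squared singular values, which is why the principal axes of the correlation ellipsoid solve the best-fit problem.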
Correspondence Analysis: History
Simple Correspondence Analysis (CA) of a bivariate frequency table was first discussed, in fairly rudimentary form, by Pearson (1905), by looking at transformations linearizing regressions. See De Leeuw, On the Prehistory of Correspondence Analysis, Statistica Neerlandica, 37, 1983, 161-164.
This was taken up by Hirschfeld (Hartley) in 1935, where the technique was presented in a fairly complete form (to maximize correlation and decompose contingency). This approach was later adopted by Gebelein, and by Rényi and his students in their study of maximal correlation.
Correspondence Analysis: History
In the 1938 edition of Statistical Methods for Research Workers Fisher scores a categorical variable to maximize a ratio of variances (quadratic forms). This is not quite CA, because it is presented in an (asymmetric) regression context.
Symmetric CA and the reciprocal averaging algorithm are discussed, however, in Fisher (1940) and applied by his co-worker Maung (1941a,b).
In the early sixties the chi-square metric, relating CA to metric multidimensional scaling (MDS), with an emphasis on geometry and plotting, was introduced by Benzécri (thesis of Cordier, 1965).
Multiple Correspondence Analysis: History
Different weighting schemes to combine quantitative variables into an index that optimizes some variance-based discrimination or homogeneity criterion were proposed in the late thirties by Horst (1936), by Edgerton and Kolbe (1936), and by Wilks (1938).
The same idea was applied to qualitative variables in a seminal paper by Guttman (1941), which presents, for the first time, the equations defining Multiple Correspondence Analysis (MCA).
The equations are presented in the form of a row-eigen (scores), a column-eigen (weights), and a singular value (joint) problem.
The paper introduces the "codage disjonctif complet" as well as the "Tableau de Burt", and points out the connections with the chi-square metric.
There is no geometry, and the emphasis is on constructing a single scale. In fact Guttman warns against extracting and using additional eigen-pairs.
Multiple Correspondence Analysis: Further History
In Guttman (1946) scale or index construction was extended to paired comparisons and ranks. In Guttman (1950) it was extended to scalable binary items.
In the fifties and sixties Hayashi introduced the quantification techniques of Guttman in Japan, where they were widely disseminated through the work of Nishisato. Various extensions and variations were added by the Japanese school.
Starting in 1968, MCA was studied as a form of metric MDS by De Leeuw.
Although the equations defining MCA were the same as those defining PCA, the relationship between the two remained problematic.
These problems are compounded by "horse shoes" or the "effect Guttman", i.e. artificial curvilinear relationships between successive dimensions (eigenvectors).
Nonlinear PCA: What?
PCA can be made non-linear in various ways.
1. First, we could seek indices which discriminate maximally and are non-linear combinations of the variables. This generalizes the weighting approach (Hotelling).
2. Second, we could find nonlinear combinations of components that are close to the observed variables. This generalizes the reduced rank approach (Pearson).
3. Third, we could look for transformations of the variables that optimize the linear PCA fit. This is known (term of Darrell Bock) as the optimal scaling (OS) approach.
Nonlinear PCA: Forms
The first approach has not been studied much, although there are some relations with Item Response Theory.
The second approach is currently popular in Computer Science, as "nonlinear dimension reduction". I am currently working on a polynomial version, but there is no unified theory, and the papers are usually of the "well, we could also do this" type familiar from cluster analysis.
The third approach preserves many of the properties of linear PCA and can be connected with MCA as well. We shall follow its history and discuss the main results.
Nonlinear PCA: PCA with OS
Guttman observed in 1959 that if we require that the regressions between monotonically transformed variables are linear, then the transformations are uniquely defined. In general, however, we need approximations.
The loss function for PCA-OS is SSQ(Y − XB′), as before, but now we minimize over components X, loadings B, and transformations Y.
Transformations are defined column-wise (over variables) and belong to some restricted class (monotone, step, polynomial, spline).
Algorithms often are of the alternating least squares type, where optimal transformation and low-rank matrix approximation are alternated until convergence.
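The alternating scheme for monotone transformations can be sketched in numpy. This is an illustrative sketch, not any of the historical programs: `pava` is a bare-bones pool-adjacent-violators monotone regression, and `pca_os` alternates a rank-r SVD fit with a monotone rescoring of each column (function names and normalization choices are mine).

```python
import numpy as np

def pava(y):
    """Pool-adjacent-violators: least squares monotone increasing fit to y."""
    blocks = []                        # [mean, size] per monotone block
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] >= blocks[-1][0]:
            m2, s2 = blocks.pop()
            m1, s1 = blocks.pop()
            blocks.append([(m1 * s1 + m2 * s2) / (s1 + s2), s1 + s2])
    return np.concatenate([[m] * s for m, s in blocks])

def pca_os(Y, r=1, iters=50):
    """Alternate a rank-r least squares fit and monotone rescoring per column."""
    Z = (Y - Y.mean(axis=0)) / Y.std(axis=0)     # start from standardized data
    for _ in range(iters):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        Zhat = (U[:, :r] * s[:r]) @ Vt[:r]       # current rank-r approximation
        for j in range(Z.shape[1]):
            order = np.argsort(Y[:, j])          # monotone in the raw variable
            t = np.empty(Z.shape[0])
            t[order] = pava(Zhat[order, j])      # optimal monotone transform
            t = (t - t.mean()) / max(t.std(), 1e-12)   # renormalize the column
            Z[:, j] = t
    return Z
```

Each transformed column is monotone in the corresponding raw variable, and the low-rank approximation and the transformations are improved in turn, as in the ALS programs described below.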
PCA-OS: History of Programs
Shepard and Kruskal used the monotone regression machinery of non-metric MDS to construct the first PCA-OS programs around 1962. The paper describing the technique was not published until 1975.
Around 1970 versions of PCA-OS (sometimes based on Guttman's rank image principle) were developed by Lingoes and Roskam.
In 1973 De Leeuw, Young, and Takane started the ALSOS project, which resulted in PRINCIPALS (published in 1978), and PRINQUAL in SAS.
In 1980 De Leeuw (with Heiser, Meulman, Van Rijckevorsel, and many others) started the Gifi project, which resulted in PRINCALS, in SPSS CATPCA, and in the R package homals by De Leeuw and Mair (2009).
In 1983 Winsberg and Ramsay published a PCA-OS version using monotone spline transformations.
In 1987 Koyak, using the ACE smoothing methodology of Breiman and Friedman (1985), introduced mdrace.
PCA/MCA: The Gifi Project
The Gifi project followed the ALSOS project. It has or had as its explicit goals:
1. Unify a large class of multivariate analysis methods by combining a single loss function, parameter constraints (as in MDS), and ALS algorithms.
2. Give a very general definition of component analysis (to be called homogeneity analysis) that would cover CA, MCA, linear PCA, nonlinear PCA, regression, discriminant analysis, and canonical analysis.
3. Write code and analyze examples for homogeneity analysis.
Gifi: Loss of Homogeneity
The basic Gifi loss function is

σ(X, Y) = ∑(j=1..m) SSQ(X − GjYj).

The n × kj matrices Gj are the data, coded as indicator matrices (or dummies). Alternatively, Gj can be a B-spline basis. Also, Gj can have zero rows for missing data.
X is an n × p matrix of object scores, satisfying the normalization conditions X′X = I.
The Yj are kj × p matrices of category quantifications. There can be rank, level, and additivity constraints on the Yj.
Gifi: ALS
The basic Gifi algorithm alternates

1. X(k) = ORTH(∑(j=1..m) GjYj(k)).
2. Yj(k+1) = argmin over Yj ∈ 𝒴j of tr (Ŷj(k+1) − Yj)′Dj(Ŷj(k+1) − Yj).

We use the following notation.
Superscript (k) is used for iterations.
ORTH() is any orthogonalization method such as QR, Gram-Schmidt, or SVD.
Dj = Gj′Gj are the marginals.
Ŷj(k+1) = Dj⁻¹Gj′X(k) are the category centroids.
The constraints on Yj are written as Yj ∈ 𝒴j.
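For the unconstrained case (𝒴j free, which is MCA), the two ALS steps can be sketched directly in numpy. This is a minimal sketch, not the homals package: `indicator` and `homals` are hypothetical names, ORTH() is taken to be QR, and the centroid step uses Dj = Gj′Gj exactly as above.

```python
import numpy as np

def indicator(v):
    """Code a categorical vector as an n x k_j indicator matrix G_j."""
    cats = np.unique(v)
    return (np.asarray(v)[:, None] == cats[None, :]).astype(float)

def homals(data, p=2, iters=100, seed=0):
    """Unconstrained homogeneity analysis (= MCA) via the Gifi ALS scheme."""
    G = [indicator(col) for col in data.T]
    n = data.shape[0]
    rng = np.random.default_rng(seed)
    start = rng.normal(size=(n, p))
    start -= start.mean(axis=0)              # centered random start
    X = np.linalg.qr(start)[0]               # so that X'X = I
    for _ in range(iters):
        # centroid step: Yhat_j = D_j^{-1} G_j' X with D_j = G_j' G_j
        Y = [np.linalg.solve(g.T @ g, g.T @ X) for g in G]
        # object score step: orthogonalize the sum of the G_j Y_j
        S = sum(g @ y for g, y in zip(G, Y))
        S -= S.mean(axis=0)                  # keep object scores centered
        X = np.linalg.qr(S)[0]               # ORTH() step, here via QR
    Y = [np.linalg.solve(g.T @ g, g.T @ X) for g in G]
    loss = sum(np.sum((X - g @ y) ** 2) for g, y in zip(G, Y))
    return X, Y, loss
```

With rank-one (single) constraints added to the Yj update, the same loop would give NLPCA instead of MCA.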
Gifi: Star Plots
Let’s look at some movies.
GALO: 1290 students, 4 variables. We show both MCA and NLPCA.
Senate: 100 senators, 20 votes. Since the variables are binary, MCA = NLPCA.
objplot galo

[Object score plot of the 1290 GALO students; the points are labeled 1-1290.]
objplot senate

[Object score plot of the 100 senators, with points labeled by senator name.]
Gifi: Single Variables
If there are no constraints on the Yj, homogeneity analysis is MCA.
We will not go into additivity constraints, because they take us from PCA and MCA towards regression and canonical analysis. See the homals paper and package.
A single variable has constraints Yj = zjaj′, i.e. the category quantifications are of rank one. In a given analysis some variables can be single while others can be multiple (unconstrained). More generally, there can be rank constraints on the Yj.
This can be combined with level constraints on the single quantifications zj, which can be numerical, polynomial, ordinal, or nominal.
If all variables are single, homogeneity analysis is NLPCA (i.e. PCA-OS). This relationship follows from the form of the loss function.
Gifi: Multiple Variables
There is another relationship, which is already implicit in Guttman (1941). If we transform the variables to maximize the dominant eigenvalue of the correlation matrix, then we find both the first MCA dimension and the one-dimensional nominal PCA solution.
But there are deeper relations between MCA and NLPCA. These were developed in a series of papers by De Leeuw and co-workers, starting in 1980. Their analysis also elucidates the "Effect Guttman".
These relationships are most easily illustrated by performing an MCA of a continuous standardized multivariate normal, say on m variables, analyzed in the form of a Burt Table (with doubly-infinite subtables). Suppose the correlation matrix of this distribution is the m × m matrix R = {rjℓ}.
Gifi: Multinormal MCA
Suppose I = R[0], R = R[1], R[2], ... is the infinite sequence of Hadamard (elementwise) powers of R.
Suppose λj[s] are the m eigenvalues of R[s] and yj[s] are the corresponding eigenvectors.
The eigenvalues of the MCA solution are the m × ∞ eigenvalues λj[s].
The MCA eigenvector corresponding to λj[s] consists of the m functions yjℓ[s]Hs, with Hs the s-th normalized Hermite polynomial.
An MCA eigenvector thus consists of m linear transformations, or m quadratic transformations, and so on. There are m linear eigenvectors, m quadratic eigenvectors, and so on.
Gifi: Multinormal MCA
The same theory applies to what Yule (1900) calls "strained multinormal" variables zj, for which there exist diffeomorphisms φj such that the φj(zj) are jointly multinormal (Gaussian copulas are an example).
And the same theory also applies, except for the polynomial part, when separate transformations of the variables exist that linearize all bivariate regressions (this generalizes a result of Pearson from 1905).
Under all these scenarios, MCA solutions are NLPCA solutions, and vice versa.
With the provision that NLPCA solutions are always selected from the same R[s], while MCA solutions come from all R[s].
Also, generally, the dominant eigenvalue is λ1[1] and the second largest one is either λ1[2] or λ2[1]. In the first case we have a horseshoe.
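The eigenvalue ordering is easy to check numerically. As an illustration under an assumed equicorrelation structure (m = 4 variables, common correlation r = 0.7, my choice of example), we compute the eigenvalues of the first few Hadamard powers of R and test the horseshoe condition:

```python
import numpy as np

m, r = 4, 0.7
R = (1 - r) * np.eye(m) + r * np.ones((m, m))   # equicorrelated multinormal

# Eigenvalues of the Hadamard (elementwise) powers R^[s]; together these
# form the MCA spectrum of the multinormal, sorted within each power.
lam = {s: np.sort(np.linalg.eigvalsh(R ** s))[::-1] for s in (1, 2, 3)}

# Horseshoe check: is the largest quadratic eigenvalue lambda_1[2] bigger
# than the second linear eigenvalue lambda_2[1]?
horseshoe = lam[2][0] > lam[1][1]
```

For this R the linear eigenvalues are 1 + 3r = 3.1 and 1 − r = 0.3 (three times), while the largest eigenvalue of R^[2] is 1 + 3r² = 2.47, so the second MCA dimension is quadratic and a horseshoe appears.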
Gifi: Bilinearizability
"Joint bilinearizability" also occurs (trivially) if m = 2, i.e. in CA, and if kj = 2 for all j, i.e. for binary variables.
If there is joint bilinearizability then the joint first-order asymptotic normal distribution of the induced correlation coefficients does not depend on the standard errors of the computed optimal transformations (no matter if they come from MCA or NLPCA or any other OS method).
There is additional horseshoe theory, due mostly to Schriever (1986), that uses the Krein-Gantmacher-Karlin theory of total positivity. It is not based on families of orthogonal polynomials, but on (higher-order) order relations.
This was, once again, anticipated by Guttman (1950), who used finite difference equations to derive the horseshoe MCA/NLPCA for the binary items defining a perfect scale.
More: Pavings
If we have a mapping of n objects into R^p, then a categorical variable can be used to label the objects.
The subsets corresponding to the categories of the variable are supposed to be homogeneous.
This can be formalized either as being small or as being separated by lines or curves. There are many ways to quantify this in loss functions.
MCA (multiple variables, star plots) tends to think small (within vs between); NLPCA tends to think separable.
Guttman's MSA defines outer and inner points of a category. An outer point is a closest point for any point not in the category.
The closest outer point for an inner point should belong to the same category as the inner point. This is a nice "topological" way to define separation, but it is hard to quantify.
More: Aspects
The aspect approach (De Leeuw and Mair, JSS, 2010, using theory from De Leeuw, 1990) goes back to Guttman's original 1941 motivation.
An aspect is any function of all the correlations (and/or the correlation ratios) between the m transformed variables.
Now choose the transformations/quantifications such that the aspect is maximized. We use majorization to turn this into a sequence of least squares problems.
For MCA the aspect is the largest eigenvalue; for NLPCA it is the sum of the largest p eigenvalues.
Determinants, multiple correlations, and canonical correlations can also be used as aspects.
Or: the sum of the differences of the correlation ratios and the squared correlation coefficients.
Multinormal, strained multinormal, and bilinearizability theory applies to all aspects.
More: Logistic
Instead of using least squares throughout, we can build a similar system using logit or probit log-likelihoods. This is in the development stages.
The basic loss function, corresponding to the Gifi loss function, is

∑(i=1..n) ∑(j=1..m) ∑(ℓ=1..kj) gijℓ log [ exp{−φ(xi, yjℓ)} / ∑(ν=1..kj) exp{−φ(xi, yjν)} ]

where the data are indicators, as before. The function φ can be distance, squared distance, or negative inner product.
This emphasizes separation, because we want the xi closest to the yjℓ for which gijℓ = 1.
We use majorization to turn this into a sequence of reweighted least squares MDS or PCA problems.
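The loss above is just a multinomial log-likelihood and is straightforward to evaluate. As a minimal sketch (function name mine, φ taken to be squared Euclidean distance, one of the three choices mentioned above), returning the negated log-likelihood so that smaller is better:

```python
import numpy as np

def logistic_homogeneity_loss(X, Ys, G):
    """Negative multinomial log-likelihood with phi = squared distance.

    X  : n x p object scores
    Ys : list of k_j x p category points y_jl, one entry per variable
    G  : list of n x k_j indicator matrices with entries g_ijl
    """
    total = 0.0
    for g, y in zip(G, Ys):
        # phi(x_i, y_jl) = squared Euclidean distance between the points
        d2 = ((X[:, None, :] - y[None, :, :]) ** 2).sum(axis=2)
        logits = -d2
        logZ = np.logaddexp.reduce(logits, axis=1)     # stable log-sum-exp
        total -= np.sum(g * (logits - logZ[:, None]))  # minus sum g log p
    return total
```

The loss is near zero when every object score sits on its own category point and far from the others, which is exactly the separation the slide describes.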